DOC/BUG: broken example in read_parquet with selecting columns #18628

Closed
@jorisvandenbossche

In the dev docs, the example that selects a subset of columns with read_parquet is broken for the pyarrow engine: http://pandas-docs.github.io/pandas-docs-travis/io.html#io-parquet

In [514]: result = pd.read_parquet('example_pa.parquet', engine='pyarrow', columns=['a', 'b'])
...
IndexError: Table column index 6 is out of range

In [515]: result = pd.read_parquet('example_fp.parquet', engine='fastparquet', columns=['a', 'b'])

In [516]: result.dtypes
Out[516]: 
a    object
b     int64
dtype: object

This is due to a bug in pyarrow, which I am reporting over there: it comes from how pyarrow deals with the pandas metadata when not all columns are present. In the meantime, we should also fix our docs so they no longer show this broken example.
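
Until the upstream fix lands, one possible user-side workaround is to read the whole file and select the columns afterwards. The sketch below is only an illustration, not the fix proposed for the docs: the helper name `read_parquet_subset` and its `reader` parameter are hypothetical, and it assumes the failure surfaces as the `IndexError` shown above.

```python
def read_parquet_subset(path, columns, engine="pyarrow", reader=None):
    """Read only `columns` from a parquet file, with a full-read fallback.

    `reader` is a hypothetical hook (defaults to pandas.read_parquet) so the
    helper can be exercised without a parquet engine installed.
    """
    if reader is None:
        import pandas as pd  # imported lazily; only needed for the default reader
        reader = pd.read_parquet

    wanted = list(columns)
    try:
        # Preferred path: let the engine read just the requested columns.
        return reader(path, engine=engine, columns=wanted)
    except IndexError:
        # Assumption: this is the "Table column index ... is out of range"
        # failure from the report. Read every column, then subset afterwards.
        full = reader(path, engine=engine)
        return full[wanted]
```

Calling `read_parquet_subset('example_pa.parquet', ['a', 'b'])` would then return the two columns either way, at the cost of loading the full file whenever the fallback is taken.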

    Labels

    Compat, Docs, IO Parquet