Closed
Description
In the dev docs, the example that reads a subset of columns with read_parquet
is broken for the pyarrow engine: http://pandas-docs.github.io/pandas-docs-travis/io.html#io-parquet
In [514]: result = pd.read_parquet('example_pa.parquet', engine='pyarrow', columns=['a', 'b'])
...
IndexError: Table column index 6 is out of range
In [515]: result = pd.read_parquet('example_fp.parquet', engine='fastparquet', columns=['a', 'b'])
In [516]: result.dtypes
Out[516]:
a    object
b     int64
dtype: object
This is due to a bug in how pyarrow handles the pandas metadata when not all columns are present (I am reporting it over there).
In the meantime, we should also fix our docs so they do not show this failing example.
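Until the pyarrow fix lands, one possible user-side workaround is to fall back to reading the whole file and selecting the columns in pandas. The sketch below is only an illustration: the helper name `read_parquet_subset` and its `reader` parameter are hypothetical (not part of pandas), and it assumes the failure surfaces as the `IndexError` shown above.

```python
import pandas as pd


def read_parquet_subset(path, columns, engine="pyarrow", reader=pd.read_parquet):
    """Read only `columns` from a parquet file.

    Hypothetical helper: `reader` defaults to pd.read_parquet and is only
    exposed so the fallback can be exercised without a parquet engine.
    """
    try:
        # Preferred path: let the engine read just the requested columns.
        return reader(path, engine=engine, columns=columns)
    except IndexError:
        # Assumed workaround for the error in this report: read every
        # column, then select the subset in pandas.
        df = reader(path, engine=engine)
        return df[columns]
```

For example, `read_parquet_subset('example_pa.parquet', ['a', 'b'])` would return the same two columns as the fastparquet call above, at the cost of reading the full file whenever the fallback is triggered.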