Closed
Wes is doing great work in Apache Arrow on parquet's categorical support, which means that roundtripping to parquet with `to_parquet`/`read_parquet` will preserve categorical dtypes (and with much better performance than before).
See https://issues.apache.org/jira/browse/ARROW-3246 (and linked issues), apache/arrow#5110
We will need to:
- update the tests for pyarrow to test this faithful roundtrip (depending on the pyarrow version):
  - pandas/pandas/tests/io/test_parquet.py, line 409 in 802f670
  - pandas/pandas/tests/io/test_parquet.py, line 451 in 802f670
- update the documentation, e.g. the caveats section at https://dev.pandas.io/user_guide/io.html#parquet
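
A rough sketch of what the version-dependent test expectation could look like. This is not the existing test code: the helper names `expected_after_roundtrip` and `roundtrip` are made up for illustration, and since the pyarrow version that ships the fix is not pinned down here, the switch is a plain boolean rather than a version comparison.

```python
import io

import pandas as pd


def expected_after_roundtrip(df, preserves_categorical):
    """Return the frame we expect back from a parquet roundtrip.

    `preserves_categorical` is a hypothetical flag; a real test would
    derive it from the installed pyarrow version.
    """
    if preserves_categorical:
        # Faithful roundtrip: the categorical dtype survives unchanged.
        return df
    # Older behaviour: categorical columns come back as plain object columns.
    expected = df.copy()
    for col in expected.columns:
        if isinstance(expected[col].dtype, pd.CategoricalDtype):
            expected[col] = expected[col].astype(object)
    return expected


def roundtrip(df):
    """Write df to an in-memory parquet file with pyarrow and read it back."""
    buf = io.BytesIO()
    df.to_parquet(buf, engine="pyarrow")
    buf.seek(0)
    return pd.read_parquet(buf, engine="pyarrow")


df = pd.DataFrame({"a": pd.Categorical(list("abc"))})
old_expected = expected_after_roundtrip(df, preserves_categorical=False)
new_expected = expected_after_roundtrip(df, preserves_categorical=True)
```

With pyarrow installed, a test could then compare `roundtrip(df)` against the matching expectation using `pd.testing.assert_frame_equal`, picking the flag based on the pyarrow version under test.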