fix: handle partitions with empty table in read_parquet with dataset=True (#2983)
* BUG: fix read_parquet with dataset=True when the first partition is empty.
When reading a set of parquet files with dataset=True, if the first
partition is empty, the current dtype inference logic fails. It
raises exceptions such as:
```
pyarrow.lib.ArrowTypeError: Unable to merge: Field col0 has incompatible
types: dictionary<values=null, indices=int32, ordered=0> vs
dictionary<values=string, indices=int32, ordered=0>
```
To fix this, we filter out empty tables before merging the remaining
ones into one parquet file.
* [style]: forgot to run ruff on the new code.
* bug: fix the corner case where every table is empty.
While that corner case was caught by the full test suite, we add a mock
test for it to allow a quicker turnaround.
---------
Co-authored-by: David Cournapeau <cournape@amazon.com>
Co-authored-by: Anton Kukushkin <kukushkin.anton@gmail.com>
Co-authored-by: jaidisido <jaidisido@gmail.com>