pd.read_parquet causing Python to crash #39031

Open
@scottmsul

These are the commands I ran in IPython:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(data={'x': [1,2,3]})

In [3]: df.to_parquet('test.parquet')

In [4]: with open('test.parquet', 'rb') as f:
   ...:     df2 = pd.read_parquet(f)
   ...:

At this point IPython exits without displaying an error or stack trace. Maybe this is some kind of segfault? It also fails when run from a script, and when reading from a BytesIO instead of a file.
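Since the process dies silently, one way to get more information might be to run the reproduction in a child interpreter with the standard-library faulthandler enabled and then look at the exit code and stderr. This is only a sketch: `run_isolated` and `REPRO` are names made up for illustration, the BytesIO variant is reconstructed from the description above, and whether faulthandler actually prints a traceback for this particular crash is an assumption.

```python
import subprocess
import sys

# Reproduction from this report, using the BytesIO variant.
# Assumes pandas and pyarrow are installed in the child interpreter.
REPRO = """
import io
import pandas as pd

df = pd.DataFrame(data={'x': [1, 2, 3]})
df.to_parquet('test.parquet')
with open('test.parquet', 'rb') as f:
    buf = io.BytesIO(f.read())
df2 = pd.read_parquet(buf)
print(df2)
"""


def run_isolated(code):
    """Run `code` in a separate Python process with faulthandler enabled.

    Returns (returncode, stdout, stderr) so a hard crash in the child
    does not take down the calling session.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    returncode, out, err = run_isolated(REPRO)
    print("exit code:", returncode)
    print("stdout:", out)
    print("stderr:", err)
```

A nonzero exit code with no Python exception in stderr would be consistent with a crash in native code rather than an ordinary Python error.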

This is using pandas==1.2.0, pyarrow==2.0.0, and Python 3.7.3. It was run on Windows from PowerShell.

If it helps, this is the output from pd.show_versions():

INSTALLED VERSIONS
------------------
commit           : 3e89b4c4b1580aa890023fc550774e63d499da25
python           : 3.7.3.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.17763
machine          : AMD64
processor        : Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.2.0
numpy            : 1.19.4
pytz             : 2020.5
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 51.1.0.post20201221
Cython           : 0.29.14
pytest           : None
hypothesis       : None
sphinx           : 2.0.0
blosc            : None
feather          : None
xlsxwriter       : 1.1.5
lxml.etree       : 4.4.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.4.0
pandas_datareader: None
bs4              : 4.7.1
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 2.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : 1.3.2
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
