Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
from pathlib import Path
random_data = np.random.standard_normal(size=(1000, 3))
data = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
csv_file = Path('random_data.csv')
# archive_name is passed as a pathlib.Path rather than a str
data.to_csv(csv_file.with_suffix('.zip'), index=False, compression={'method': 'zip', 'archive_name': csv_file})
Problem description
The Python standard library has made a tremendous effort to support os.PathLike objects, especially pathlib.Path. ZipFile was somewhat recently patched to add PathLike support for externally facing paths, but that support does not appear to have been extended to ZipInfo, which is used to interface with the files within a zip archive.
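To illustrate the split described above, here is a minimal standard-library sketch. It assumes only that ZipFile accepts a path-like archive location, and it converts the member name with os.fspath() before it reaches ZipInfo. The file names and CSV content are placeholders chosen for this example.

```python
import os
import tempfile
import zipfile
from pathlib import Path

member = Path("random_data.csv")  # name of the file inside the archive

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "random_data.zip"
    # ZipFile itself accepts a path-like object for the archive location.
    with zipfile.ZipFile(archive, mode="w") as zf:
        # The member name is converted explicitly, since ZipInfo expects a str.
        zf.writestr(os.fspath(member), "A,B,C\n1,2,3\n")
    # Re-open the archive to list what was stored.
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)
```

The explicit os.fspath() call is the only step that differs from passing the Path straight through.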
DataFrame's to_csv() method was recently updated to handle passing additional arguments when using zip compression. If compression is specified as a dict and archive_name is passed as a key, the value currently must be a str because ZipInfo requires a string. Since pandas has exposed this to the user, it would be nice for PathLike objects like pathlib.Path to get converted to a str before being passed to ZipInfo.
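The requested conversion could look roughly like the sketch below. normalize_archive_name is a hypothetical helper written for this report, not an existing pandas function, and where such a step would belong inside pandas is left open.

```python
import os
from pathlib import Path


def normalize_archive_name(compression):
    """Return a copy of a compression dict with archive_name coerced via os.fspath.

    Hypothetical helper for illustration; not part of pandas.
    """
    options = dict(compression)
    archive_name = options.get("archive_name")
    if isinstance(archive_name, os.PathLike):
        # os.fspath returns the plain str form of a pathlib.Path
        options["archive_name"] = os.fspath(archive_name)
    return options


csv_file = Path("random_data.csv")
options = normalize_archive_name({"method": "zip", "archive_name": csv_file})
print(options)
```

Until something like this exists in pandas, the same effect can be had on the caller's side by passing str(csv_file) as archive_name in the compression dict.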
If someone wants to raise this upstream on the Python issue tracker, that's also an acceptable outcome. Perhaps this was just an oversight in the previous issue, which did not treat the arguments to ZipInfo as externally facing.
Expected Output
- Zip file is saved with no exceptions
- Zip file contains a *.csv file (passing archive_name is necessary for what should be default behavior only because pandas defaults to creating a zip file containing an identically named zip file)
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.6.2
Cython : 0.29.14
pytest : 4.2.0
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.3.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.1
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.2.0
pyxlsb : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.6
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0