Description
Code Sample, a copy-pastable example if possible
df.to_csv('file.txt.gz', sep='\t', compression='gzip')
Problem description
I get this error when writing a very large DataFrame to a file:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-28-48e45479ccfb> in <module>()
----> 1 df.to_csv('file.txt.gz', sep='\t', compression='gzip')
~/.pyenv/versions/3.6.5/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1743 doublequote=doublequote,
1744 escapechar=escapechar, decimal=decimal)
-> 1745 formatter.save()
1746
1747 if path_or_buf is None:
~/.pyenv/versions/3.6.5/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/formats/csvs.py in save(self)
156 f.close()
157 with open(self.path_or_buf, 'r') as f:
--> 158 data = f.read()
159 f, handles = _get_handle(self.path_or_buf, self.mode,
160 encoding=encoding,
OSError: [Errno 22] Invalid argument
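Judging from the traceback, pandas 0.23.0 handles `compression='gzip'` with a file path in three steps. It first writes the uncompressed CSV, then reopens that file and loads it back with a single `f.read()`, and only then writes the compressed copy. For a frame this size, that one call has to return several gigabytes at once. For comparison only (this is not pandas' code), the hypothetical helper `gzip_file_in_chunks` below compresses an already-written text file while reading it in bounded chunks via `shutil.copyfileobj`. The 16 MiB chunk size is an arbitrary assumption.

```python
import gzip
import shutil


def gzip_file_in_chunks(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Hypothetical helper: gzip src_path into dst_path without one huge read()."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # copyfileobj loops over src.read(chunk_size), so no single read()
        # call requests more than chunk_size bytes from the file.
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, you could write without compression using `df.to_csv('file.txt', sep='\t')` and then call `gzip_file_in_chunks('file.txt', 'file.txt.gz')`. That should sidestep the single large read, at the cost of temporarily keeping an uncompressed copy on disk.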
I cannot share the data, but df.info() reports the following:
<class 'pandas.core.frame.DataFrame'>
Index: 10319 entries, Sample1 to Sample10319
Columns: 33707 entries, A1BG to ZZZ3
dtypes: float64(33707)
memory usage: 2.6+ GB
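As a rough check on the sizes involved, the row and column counts from df.info() give the raw float64 payload. It already exceeds 2**31 - 1 bytes. The tab-separated text that `f.read()` has to return is likely to be larger still, because each value is written as a decimal string followed by a separator. One plausible but unconfirmed explanation for Errno 22 is that macOS rejects a single read() call larger than about 2 GB.

```python
# Figures taken from the df.info() output above.
rows, cols = 10319, 33707
bytes_per_float64 = 8

raw_bytes = rows * cols * bytes_per_float64
print(raw_bytes)                  # 2782580264
print(round(raw_bytes / 2**30, 2))  # 2.59 (GiB), consistent with "2.6+ GB"
print(raw_bytes > 2**31 - 1)      # True: above the signed 32-bit limit
```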
Inspecting the output on disk, the DataFrame appears to have been written only partially, and the file was not compressed.
I am working on macOS 10.13.4 (17E202) with 16 GB of RAM.
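A possible workaround, sketched under the assumption that the re-read in the traceback only happens when a file path is combined with `compression=...`, is to open the gzip stream yourself and pass the handle to `to_csv` without the `compression` argument. The helper name `to_gzipped_tsv` is hypothetical.

```python
import gzip

import pandas as pd


def to_gzipped_tsv(df: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: write df as tab-separated text into a gzip file."""
    # gzip.open in text mode ("wt") returns a file object that compresses as
    # it is written; to_csv writes rows straight into it, so no uncompressed
    # copy of the file is created and then re-read.
    # Note: compression= is deliberately not passed, since a handle is given.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        df.to_csv(handle, sep="\t")
```

Usage: `to_gzipped_tsv(df, 'file.txt.gz')`. The result can be read back with `pd.read_csv('file.txt.gz', sep='\t', index_col=0)`, which infers gzip from the `.gz` extension.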
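Expected Output
The DataFrame is written to file.txt.gz as a complete, gzip-compressed, tab-separated file, without raising an error.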
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None