
BUG: to_json fails writing to GCS with compression #39985

Closed
@dariobig

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame({'numbers': list(range(1, 10))})
df.to_json('gcs://test-bucket/test.json.gz')

Problem description

Writing a gzip-compressed file to GCS with to_json raises a TypeError (compression is inferred from the .gz extension). The same call without compression works fine.

import pandas as pd...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/workspaces/charybdis/test_pg.py in <module>
      254 
      255 df = pd.DataFrame({'numbers': list(range(1, 10))})
----> 256 df.to_json('gcs://river-categorizer-data-us-central1/test.jsonl.gz')

/workspaces/charybdis/.venv/lib/python3.8/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent, storage_options)
   2463         indent = indent or 0
   2464 
-> 2465         return json.to_json(
   2466             path_or_buf=path_or_buf,
   2467             obj=self,

/workspaces/charybdis/.venv/lib/python3.8/site-packages/pandas/io/json/_json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent, storage_options)
    100     if path_or_buf is not None:
    101         # apply compression and byte/text conversion
--> 102         with get_handle(
    103             path_or_buf, "wt", compression=compression, storage_options=storage_options
    104         ) as handles:

/workspaces/charybdis/.venv/lib/python3.8/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    590                 )
    591             else:
--> 592                 handle = gzip.GzipFile(
    593                     fileobj=handle,  # type: ignore[arg-type]
    594                     mode=ioargs.mode,

/usr/local/lib/python3.8/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
    202 
    203         if self.mode == WRITE:
--> 204             self._write_gzip_header(compresslevel)
    205 
    206     @property

/usr/local/lib/python3.8/gzip.py in _write_gzip_header(self, compresslevel)
    230 
    231     def _write_gzip_header(self, compresslevel):
--> 232         self.fileobj.write(b'\037\213')             # magic header
    233         self.fileobj.write(b'\010')                 # compression method
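
The last frames suggest that gzip.GzipFile is writing its bytes header into a handle that only accepts text (the handle is requested in "wt" mode in the _json.py frame). This reading is an assumption, since the traceback is cut off before the error message. A minimal local sketch of that failure mode, with io.StringIO standing in for a text-mode GCS handle and gzip_header_error being a hypothetical helper name:

```python
import gzip
import io


def gzip_header_error(fileobj):
    """Wrap fileobj in a write-mode GzipFile; return the TypeError raised, or None."""
    try:
        # In write mode, GzipFile writes its bytes header during construction.
        gz = gzip.GzipFile(fileobj=fileobj, mode="wb")
    except TypeError as exc:
        return exc
    gz.close()
    return None


# Assumption: StringIO behaves like the text-mode handle in the traceback.
text_error = gzip_header_error(io.StringIO())
# A binary handle accepts the header without complaint.
binary_error = gzip_header_error(io.BytesIO())

print("text handle:", type(text_error).__name__ if text_error else "ok")
print("binary handle:", "ok" if binary_error is None else binary_error)
```

If the assumption holds, the text handle fails the same way as the traceback and the binary handle does not, which points at the handle's mode rather than at GCS itself.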

Expected Output

No error
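
A possible workaround, sketched under assumptions rather than taken from pandas: compress the JSON text yourself and write the resulting bytes through a binary handle. write_gzipped_json is a hypothetical helper, io.BytesIO stands in for a handle opened in binary mode (for example via fsspec.open with mode "wb"), and the JSON string stands in for the output of df.to_json().

```python
import gzip
import io
import json


def write_gzipped_json(json_text, binary_handle):
    """Gzip-compress json_text as UTF-8 and write it to a binary file-like object."""
    binary_handle.write(gzip.compress(json_text.encode("utf-8")))


# Stand-in for df.to_json(), which is column-oriented by default for a DataFrame.
json_text = json.dumps({"numbers": {str(i): n for i, n in enumerate(range(1, 10))}})

# Stand-in for a binary handle pointing at the bucket.
buffer = io.BytesIO()
write_gzipped_json(json_text, buffer)

# Decompress again to check that the payload survived the round trip.
round_trip = json.loads(gzip.decompress(buffer.getvalue()).decode("utf-8"))
print(round_trip["numbers"])
```

Because the compression happens before the handle is touched, the handle only ever receives bytes, which avoids the text/bytes mismatch regardless of how pandas opens the file.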

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : 7d32926
python : 3.8.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.121-linuxkit
Version : #1 SMP Tue Dec 1 17:50:32 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.2
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 0.8.5
fastparquet : None
gcsfs : 0.7.2
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : 1.3.23
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
numba : None

Labels

  • Bug
  • IO JSON (read_json, to_json, json_normalize)
  • Regression (functionality that used to work in a prior pandas version)