
to_csv / to_pickle optionally use fast gzip compressionlevel=1 #33196

Closed
@kernc

Description


Code Sample, a copy-pastable example if possible

Enhancement proposal

>>> df.to_csv('data.csv.gz')  # Awfully slow

>>> df.to_pickle('data.pkl.bz2')  # Awfully slow

>>> df.to_csv('data.csv.gz', fast=True)  # Uses fast compressionlevel=1

# or, better:

>>> pd.options.io.compressionlevel = 1

>>> df.to_pickle('data.pkl.bz2')  # Uses fast compressionlevel=1

...
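Until such an option exists, one workaround is to open the compressed stream yourself with a low compression level and hand pandas the open handle (`to_csv` accepts file objects). A minimal sketch using only the standard library, with `csv` standing in for pandas so it stays self-contained:

```python
import csv
import gzip

# Open the gzip stream ourselves so we control the compression level;
# compresslevel=1 trades a slightly larger file for much faster writes.
with gzip.open("data.csv.gz", "wt", newline="", compresslevel=1) as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b"])
    writer.writerows([[1, 2], [3, 4]])

# Round-trip check: the file reads back as ordinary CSV.
with gzip.open("data.csv.gz", "rt", newline="") as f:
    rows = list(csv.reader(f))
```

With pandas, `df.to_csv(f)` on the same gzip handle should achieve the equivalent, since the handle, not pandas, picks the compression level.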

Problem description

Compression of large objects in pandas is slow.

Popular benchmarks comparing compression levels on typical payloads show [1] [2] that compressed size usually varies far less across levels than compression time, which often differs several-fold.
compressionlevel=1 is dramatically faster, whereas compressionlevel=9 typically yields output only about 10% smaller.

One optimizes for size, the other for speed, and far fewer people ever need something in between.
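The trade-off above can be checked with a quick standard-library sketch. Timings vary by machine, so only the size relationship is asserted; the repetitive payload is a stand-in for typical tabular text:

```python
import gzip
import time

# Compressible payload: repetitive text, roughly like CSV data.
payload = b"2020-01-01,category_a,1234.5678\n" * 200_000

results = {}
for level in (1, 9):
    t0 = time.perf_counter()
    blob = gzip.compress(payload, compresslevel=level)
    # Record (elapsed seconds, compressed size) per level.
    results[level] = (time.perf_counter() - t0, len(blob))

# Level 9 compresses no worse than level 1, but usually takes
# much longer; the size gap is small relative to the time gap.
```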

Expected Output

Output of pd.show_versions()

1.1.0.dev0+786.gec7734169

Metadata


Labels

IO Data — IO issues that don't fit into a more specific label
