Support for partition_cols in to_parquet #23321
Changes from all commits: 41c2828, 0d9f878, 1636681, 14a2580, 7bc337b, 6670adf, 971ba54, 112d6e9, 441f879, 6cb196d, 6e06646, a5164b8, 1f0978f, ddfa789, ee7707f, 79f1615, 514c5c0, eb86de0, 8b45547
```diff
@@ -1970,7 +1970,7 @@ def to_feather(self, fname):
         to_feather(self, fname)

     def to_parquet(self, fname, engine='auto', compression='snappy',
-                   index=None, **kwargs):
+                   index=None, partition_cols=None, **kwargs):
         """
         Write a DataFrame to the binary parquet format.
```
```diff
@@ -1984,7 +1984,11 @@ def to_parquet(self, fname, engine='auto', compression='snappy',
         Parameters
         ----------
         fname : str
-            String file path.
+            File path or Root Directory path. Will be used as Root Directory
+            path while writing a partitioned dataset.
+
+            .. versionchanged:: 0.24.0
+
         engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
             Parquet library to use. If 'auto', then the option
             ``io.parquet.engine`` is used. The default ``io.parquet.engine``
```

> **Review comment** (on `fname : str`): side issue. we use …
>
> **Review comment:** we actually use `path` on the top-level
```diff
@@ -1999,6 +2003,12 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

             .. versionadded:: 0.24.0

+        partition_cols : list, optional, default None
+            Column names by which to partition the dataset.
+            Columns are partitioned in the order they are given.
+
+            .. versionadded:: 0.24.0
+
         **kwargs
             Additional arguments passed to the parquet library. See
             :ref:`pandas io <io.parquet>` for more details.
```
```diff
@@ -2027,7 +2037,8 @@ def to_parquet(self, fname, engine='auto', compression='snappy',
         """
         from pandas.io.parquet import to_parquet
         to_parquet(self, fname, engine,
-                   compression=compression, index=index, **kwargs)
+                   compression=compression, index=index,
+                   partition_cols=partition_cols, **kwargs)

     @Substitution(header='Write out the column names. If a list of strings '
                   'is given, it is assumed to be aliases for the '
```
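For context, the behavior this PR exposes can be illustrated without pandas installed: `partition_cols` makes the writer lay the dataset out as Hive-style directories (`col=value/...`) under the root path, partitioning by each column in the order given. The sketch below simulates only that directory-naming scheme with the standard library; the helper name `partition_paths` and the sample records are hypothetical, not part of the PR.

```python
import os
from collections import defaultdict

def partition_paths(records, partition_cols):
    """Group records into Hive-style partition directories
    (the layout produced when writing with partition_cols).
    Hypothetical helper for illustration only."""
    groups = defaultdict(list)
    for row in records:
        # Columns are partitioned in the order they are given,
        # e.g. partition_cols=['year', 'month'] -> 'year=2018/month=10'.
        key = os.path.join(*["{}={}".format(c, row[c]) for c in partition_cols])
        # The partition columns themselves are not stored in the data files.
        groups[key].append({k: v for k, v in row.items()
                            if k not in partition_cols})
    return dict(groups)

records = [
    {"year": 2018, "month": 10, "value": 1.0},
    {"year": 2018, "month": 11, "value": 2.0},
    {"year": 2019, "month": 10, "value": 3.0},
]
layout = partition_paths(records, ["year", "month"])
for path in sorted(layout):
    print(path)
```

With pandas and a parquet engine installed, the equivalent call added by this PR would be along the lines of `df.to_parquet('root_dir', partition_cols=['year', 'month'])`, producing one subdirectory per distinct `(year, month)` pair.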