CLN/DOC: DataFrame.to_parquet supports file-like objects #35235

Merged
merged 1 commit into from
Jul 17, 2020

Conversation

rhshadrach
Member

Adds documentation and type hints for supporting file-like objects when engine == 'pyarrow'; relevant to #30081. Tests for this behavior already exist in io.test_parquet.py:

@td.skip_if_no("pyarrow")
def test_read_file_like_obj_support(self, df_compat):
    buffer = BytesIO()
    df_compat.to_parquet(buffer)
    df_from_buf = pd.read_parquet(buffer)
    tm.assert_frame_equal(df_compat, df_from_buf)

Perhaps the restrictions on the arguments when path is not a string:

  • partition_cols must be None; and
  • engine must end up being pyarrow

should be checked directly in DataFrame.to_parquet, but I'm leaving this out as that is an API change that could be made in a subsequent PR. Violating the engine restriction gives the clear error message TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO, but violating the partition_cols restriction raises AttributeError: 'NoneType' object has no attribute '_isfilestore', which is slightly confusing.
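
For illustration only, an up-front check of the two restrictions above might look roughly like the sketch below. The helper name check_to_parquet_args and its error messages are hypothetical and not part of pandas; the sketch also assumes the engine has already been resolved (so 'auto' is not handled) and that anything other than str, bytes or os.PathLike counts as file-like.

```python
import os
from io import BytesIO
from typing import Any, Optional, Sequence


def check_to_parquet_args(
    path: Any,
    engine: str,
    partition_cols: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical up-front validation; not part of pandas."""
    # Strings, bytes and os.PathLike objects are ordinary paths,
    # so neither restriction applies to them.
    if isinstance(path, (str, bytes, os.PathLike)):
        return
    # Anything else is treated as a file-like object (e.g. BytesIO).
    if partition_cols is not None:
        raise ValueError(
            "partition_cols must be None when path is a file-like object"
        )
    if engine != "pyarrow":
        raise ValueError(
            "engine must resolve to 'pyarrow' when path is a file-like object"
        )


# A buffer with the pyarrow engine and no partitioning passes silently.
check_to_parquet_args(BytesIO(), engine="pyarrow")
```

Raising from one place like this would give both restrictions an equally clear message, instead of relying on whichever error the underlying engine happens to surface.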

Another API change that could be made subsequently is changing the path argument to path_or_buf, consistent with DataFrame.to_csv.

@jreback added the Docs, IO Parquet, and Typing labels Jul 13, 2020
Contributor

@jreback jreback left a comment

cc @TomAugspurger lgtm just typing

@jreback jreback added this to the 1.1 milestone Jul 17, 2020
Contributor

jreback commented Jul 17, 2020

would take a separate PR for path -> path_or_buf (needs deprecation)
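
As a rough sketch of what such a deprecation could involve, the old keyword might keep working for a while but emit a warning. The function resolve_path_or_buf below is hypothetical and is not how pandas implements keyword deprecations; it only shows the general pattern of accepting both names.

```python
import warnings
from typing import Any, Optional


def resolve_path_or_buf(
    path_or_buf: Optional[Any] = None,
    path: Optional[Any] = None,
) -> Any:
    """Hypothetical shim: accept the old ``path`` keyword with a warning."""
    if path is not None:
        # Passing both spellings is ambiguous, so reject it outright.
        if path_or_buf is not None:
            raise TypeError("pass only one of 'path' and 'path_or_buf'")
        warnings.warn(
            "the 'path' keyword is deprecated; use 'path_or_buf' instead",
            FutureWarning,
            stacklevel=2,
        )
        return path
    if path_or_buf is None:
        raise TypeError("missing required argument: 'path_or_buf'")
    return path_or_buf
```

Existing callers that pass path=... would keep working during the deprecation period and see a FutureWarning, while new code would use path_or_buf.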

@TomAugspurger TomAugspurger merged commit 1fa3747 into pandas-dev:master Jul 17, 2020
@TomAugspurger
Contributor

Thanks @rhshadrach.

@rhshadrach rhshadrach deleted the to_parquet branch July 17, 2020 13:08