BUG: to/read_* do not use user-provided file handle if handle implements os.PathLike and also opened the file #38125

Closed
@twoertwein

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas (earlier versions should in theory be affected as well; my example needs a binary file handle and so does not work on <1.2, but other examples should trigger the bug there).

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import os

import fsspec
import pandas as pd

# create a 'normal' file handle
with open("abc.test", mode="w") as open_obj:
    assert not isinstance(open_obj, os.PathLike)  # is not converted to a string
    position = open_obj.tell()

    # let to_csv write to the opened file
    pd.DataFrame({"a": [1, 2, 3]}).to_csv(open_obj)

    # the position of the file buffer should have changed if to_csv used it
    assert open_obj.tell() != position


# create a file handle that also implements os.PathLike/has __fspath__
fsspec_obj = fsspec.open("file://abc.test", mode="wb").open()
with fsspec_obj:
    assert isinstance(fsspec_obj, os.PathLike)  # is converted to a string
    position = fsspec_obj.tell()

    # let to_csv write to the opened file
    pd.DataFrame({"a": [1, 2, 3]}).to_csv(fsspec_obj)

    # the position of the file buffer should have changed if to_csv used it
    assert fsspec_obj.tell() != position  # fails

Problem description

get_filepath_or_buffer (<1.2) or get_handle (1.2) calls stringify_path to convert pathlib.Path and other os.PathLike objects to a string. This string is then opened later. It seems that there is at least one file object (the fsspec handle above) that implements os.PathLike but has also already opened the file. In this case, all to_*/read_* functions that use get_handle (or get_filepath_or_buffer in <1.2) extract the string and open the file a second time, even though the user already opened it.

I'm not sure whether there are other examples. I will look into how to fix this.
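
To make the failure mode concrete, here is a minimal standard-library sketch. It is not pandas' actual code: `PathLikeHandle`, `naive_stringify`, and `guarded_stringify` are hypothetical names introduced only for illustration. The first function models the behaviour described above, and the second shows one possible guard, which is an assumption rather than the fix that was adopted.

```python
import io
import os
import pathlib


class PathLikeHandle(io.BytesIO):
    """Hypothetical stand-in for an open file object that also defines __fspath__."""

    def __init__(self, path):
        super().__init__()
        self._path = path

    def __fspath__(self):
        return self._path


def naive_stringify(obj):
    # Simplified model of the reported behaviour: every os.PathLike object
    # is reduced to its path string, so an already-open handle is discarded.
    if isinstance(obj, os.PathLike):
        return os.fspath(obj)
    return obj


def guarded_stringify(obj):
    # One possible guard (an assumption, not pandas' implementation):
    # objects that already look like open handles are passed through
    # unchanged, even if they also implement os.PathLike.
    looks_like_handle = hasattr(obj, "read") or hasattr(obj, "write")
    if isinstance(obj, os.PathLike) and not looks_like_handle:
        return os.fspath(obj)
    return obj


handle = PathLikeHandle("abc.test")
naive_result = naive_stringify(handle)      # the handle is lost, only the path remains
guarded_result = guarded_stringify(handle)  # the handle is preserved
path_result = guarded_stringify(pathlib.Path("abc.test"))  # plain paths are still converted
```

With the naive version the caller's handle is replaced by a string, which a later step would open again. With the guard, the handle is kept, while a plain pathlib.Path is still converted to a string.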

Labels: Bug, IO Data