Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas. Versions before 1.2 should be affected as well, but my example does not work there because it needs a binary file handle; other examples should trigger this bug in <1.2.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import os

import fsspec
import pandas as pd

# create a 'normal' file handle
with open("abc.test", mode="w") as open_obj:
    assert not isinstance(open_obj, os.PathLike)  # is not converted to a string
    position = open_obj.tell()
    # let to_csv write to the opened file
    pd.DataFrame({"a": [1, 2, 3]}).to_csv(open_obj)
    # the position of the file buffer should have changed if to_csv used it
    assert open_obj.tell() != position

# create a file handle that also implements os.PathLike/has __fspath__
fsspec_obj = fsspec.open("file://abc.test", mode="wb").open()
with fsspec_obj:
    assert isinstance(fsspec_obj, os.PathLike)  # is converted to a string
    position = fsspec_obj.tell()
    # let to_csv write to the opened file
    pd.DataFrame({"a": [1, 2, 3]}).to_csv(fsspec_obj)
    # the position of the file buffer should have changed if to_csv used it
    assert fsspec_obj.tell() != position  # fails
Problem description
get_filepath_or_buffer (<1.2) or get_handle (1.2) call stringify_path to convert pathlib.Path and other os.PathLike objects to a string. This string is then opened later. It seems that there is at least one file object that implements os.PathLike but has already opened the file. In this case, all to_*/read_* functions that use get_handle (or get_filepath_or_buffer in <1.2) extract the string and then open the file again, even though the user has already opened it.
I'm not sure whether there are other examples. I will look into how to fix this.
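To make the mechanism concrete without depending on fsspec, here is a minimal sketch. It is not pandas' code: OpenedPathLike, naive_stringify, and guarded_stringify are hypothetical names made up for illustration. The guard shown (skip conversion when the object already has read or write) is only one possible direction, not a confirmed fix.

```python
import io
import os
from pathlib import Path


class OpenedPathLike(io.BytesIO):
    """Hypothetical stand-in: an already-open handle that also defines __fspath__."""

    def __init__(self, path):
        super().__init__()
        self._path = path

    def __fspath__(self):
        return self._path


def naive_stringify(obj):
    """Simplified illustration of the problem: every os.PathLike becomes a string."""
    if isinstance(obj, os.PathLike):
        return os.fspath(obj)
    return obj


def guarded_stringify(obj):
    """Assumed guard: leave objects alone if they already look like open handles."""
    looks_open = hasattr(obj, "read") or hasattr(obj, "write")
    if isinstance(obj, os.PathLike) and not looks_open:
        return os.fspath(obj)
    return obj


handle = OpenedPathLike("abc.test")

# The open handle is reduced to its path, so a caller would reopen the file.
print(naive_stringify(handle))

# With the guard, the handle is passed through and real paths are still converted.
print(guarded_stringify(handle) is handle)
print(guarded_stringify(Path("abc.test")))
```

With the naive version, the caller only ever sees the path string, which matches the failing assertion in the code sample: the user's handle is never written to, so its position does not change.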