BUG: df.to_csv() fails to a not-yet-created file when the path is fsspec-based (#55828) #56309
Closed
@@ -382,6 +382,19 @@ def _get_filepath_or_buffer(
        # urlopen function defined elsewhere in this module
        import urllib.request

        # Fix for GH #55828
        parsed_url = parse_url(filepath_or_buffer)
        if parsed_url.scheme == "file":
            file_path = urllib.request.url2pathname(parsed_url.path)
            file_path = os.path.normpath(file_path)
            return IOArgs(
                filepath_or_buffer=open(file_path, "rb"),
                encoding=encoding,
                compression=compression,
                should_close=True,
                mode=fsspec_mode,
            )

        # assuming storage_options is to be interpreted as headers
        req_info = urllib.request.Request(filepath_or_buffer, headers=storage_options)
        with urlopen(req_info) as req:

Review comment on the open(file_path, "rb") line: Shouldn't this still respect mode?
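The review question about respecting mode can be illustrated with a minimal sketch. It assumes the goal is to open a local file:// URL with whatever mode the caller requested, so that writing to a not-yet-created file works. The helper name open_file_url is hypothetical and is not part of pandas; only standard-library calls are used.

```python
import os
import urllib.request
from urllib.parse import urlparse


def open_file_url(url, mode="rb"):
    """Open a file:// URL as a local file, honoring the requested mode.

    Hypothetical helper for illustration only; not pandas code.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"expected a file:// URL, got scheme {parsed.scheme!r}")
    # Convert the URL path to a local filesystem path and normalize it.
    path = os.path.normpath(urllib.request.url2pathname(parsed.path))
    # Passing the caller's mode through (instead of hard-coding "rb")
    # lets write modes create the file when it does not exist yet.
    return open(path, mode)
```

With the mode hard-coded to "rb", as in the hunk above, the open call would raise FileNotFoundError for a target that has not been created yet, which matches the failure described in the PR title.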
I believe is_url should not be true for fsspec urls. So that might be a much nicer way of fixing this issue (I think @krehm was also hinting at that in the issue) - I'm not familiar with the urllib regex, we might need to exclude more fsspec URLs from it.
@twoertwein it did seem to me that any fsspec url in is_url is guaranteed to fail in this case, which seemed like a logic flaw to me. But I'm not familiar with the urllib code either, so was hesitant to specify a particular solution.
The overlap between urllib/fsspec is at the moment:
Could have something like this:
Technically this is a behavior change for sftp, git, ... (might be okay, probably not used frequently?). fsspec should have available_protocols since early 2022 (fsspec/filesystem_spec#913); we might need to double check whether we need to bump the minimum version of fsspec. This might make the regex in is_fsspec_url obsolete. @mroeschke
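A minimal sketch of how the urllib/fsspec overlap could be computed. It assumes that is_url accepts the schemes listed in urllib.parse (uses_relative, uses_netloc, uses_params) and that the fsspec protocol names are supplied by the caller; with fsspec installed they could come from fsspec.available_protocols(). The function name scheme_overlap and the sample protocol list are hypothetical.

```python
from urllib.parse import uses_netloc, uses_params, uses_relative

# Schemes urllib knows about; the empty string stands for "no scheme".
URLLIB_SCHEMES = set(uses_relative + uses_netloc + uses_params) - {""}


def scheme_overlap(fsspec_protocols, keep_for_urllib=("http", "https")):
    """Return schemes claimed by both urllib and fsspec.

    Schemes in keep_for_urllib are left out of the result because they
    are meant to stay on the urllib code path. Illustration only.
    """
    overlap = URLLIB_SCHEMES & set(fsspec_protocols)
    return overlap - set(keep_for_urllib)


# Sample input (an assumption, not the real fsspec registry contents).
sample_protocols = ["file", "sftp", "git", "s3", "http", "https"]
print(sorted(scheme_overlap(sample_protocols)))
```

Any scheme in the result would change from the urllib path to the fsspec path if is_url stopped accepting it, which is the behavior change mentioned for sftp and git.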
@twoertwein I didn't want to touch is_url as it's used in a few other places and I wasn't sure if it would break anything. Is it okay to do so?
These are the places is_url is used without a corresponding is_fsspec_url call:
Thank you for checking that!
Do you think it is possible to replace
if isinstance(filepath_or_buffer, str) and is_url(filepath_or_buffer):
with
if isinstance(filepath_or_buffer, str) and is_url(filepath_or_buffer) and not is_fsspec_url(filepath_or_buffer):
?
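A rough sketch of what the proposed condition would do. The two predicates below are simplified stand-ins written for this example, under the assumption that is_url checks the scheme against urllib.parse's known schemes and that is_fsspec_url matches any "scheme://" prefix other than http(s). They are not copied from pandas, and should_use_urllib is a hypothetical name.

```python
import re
from urllib.parse import urlparse, uses_netloc, uses_params, uses_relative

_URLLIB_SCHEMES = set(uses_relative + uses_netloc + uses_params) - {""}
_SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+\-.]*://")


def is_url(obj):
    # Stand-in: true when the scheme is one urllib knows about.
    return isinstance(obj, str) and urlparse(obj).scheme in _URLLIB_SCHEMES


def is_fsspec_url(obj):
    # Stand-in: any "scheme://" string except http(s) is left to fsspec.
    return (
        isinstance(obj, str)
        and bool(_SCHEME_PREFIX.match(obj))
        and not obj.startswith(("http://", "https://"))
    )


def should_use_urllib(filepath_or_buffer):
    # The condition proposed in the comment above.
    return (
        isinstance(filepath_or_buffer, str)
        and is_url(filepath_or_buffer)
        and not is_fsspec_url(filepath_or_buffer)
    )
```

Under these assumptions only http and https URLs stay on the urllib path. A file:// URL satisfies is_url but is excluded by the added clause, so it would fall through to the fsspec branch, as would ftp, sftp and git URLs.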