BUG: fixes bug when using sep=None and comment keyword for read_csv #31667

Merged
merged 8 commits into from
Mar 3, 2020

Conversation

s-scherrer
Contributor

@s-scherrer s-scherrer commented Feb 4, 2020

This makes read_csv work when sep=None and comment is set to a value.
Fixes pandas-dev#31396.
@jreback jreback added Bug IO CSV read_csv, to_csv labels Feb 5, 2020
Contributor

@jreback jreback left a comment

Can you add a whatsnew note? Let's target 1.1 with this.

@jreback jreback requested a review from gfyoung February 5, 2020 00:56
Added a note in whatsnew/v1.0.0.rst and moved test for pandas-dev#31396 to the end
of tests/io/parser/test_python_parser_only.py.
@s-scherrer
Contributor Author

s-scherrer commented Feb 5, 2020

When adding dict(sep=None) as a parameter to test_line_comment, I found that my current fix does not work if the first line is a comment.

Overall, I think the flow of comment removal and column separation is a bit inconsistent. While searching for the bug, I found that PythonParser._check_comments is sometimes called before the line is split into columns (e.g. when sniffing the separator) and sometimes afterwards. In my opinion, it would make more sense to remove comments first in all cases.

The next commit fixes the issue with comments on the first line.
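
To illustrate the "remove comments first" ordering described above, here is a minimal standard-library sketch. It is not the pandas implementation: `strip_comments` and `read_rows` are hypothetical helper names, and the naive split on the comment character assumes that character never appears inside a quoted field.

```python
import csv


def strip_comments(lines, comment="#"):
    """Yield data lines with comments removed (hypothetical helper).

    Comment-only lines are dropped; trailing comments are cut off.
    Assumes the comment character never occurs inside a quoted field.
    """
    for line in lines:
        body = line.split(comment, 1)[0].rstrip()
        if body:
            yield body


def read_rows(text, comment="#"):
    """Strip comments first, then sniff the delimiter, then parse."""
    cleaned = list(strip_comments(text.splitlines(), comment))
    # Sniff only comment-free text, so a leading comment line
    # cannot influence delimiter detection.
    dialect = csv.Sniffer().sniff("\n".join(cleaned))
    return list(csv.reader(cleaned, dialect))


sample = (
    "# exported, by, some, tool\n"
    "a;b;c\n"
    "1;2;3  # trailing comment\n"
    "4;5;6\n"
)
rows = read_rows(sample)
```

Because the sniffer only ever sees `a;b;c`, `1;2;3` and `4;5;6`, the commas in the leading comment line play no part in choosing the delimiter, which is the case the first version of the fix missed.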

Deleted old test case and added the test setup as a parameter to
`test_line_comment`.
The new test showed that the previous fix did not work when the
first line was a comment line. This commit should fix this.
@s-scherrer
Contributor Author

The Travis CI build failed with the following message:

UnavailableInvalidChannel: The channel is not accessible or is invalid.
  channel name: c3i_test
  channel url: https://conda.anaconda.org/c3i_test
  error code: 404

Is there a way to rerun it?

@gfyoung
Member

gfyoung commented Feb 7, 2020

@s-scherrer: This is actually an unrelated error that we need to resolve on master.

cc @pandas-dev/pandas-core

@s-scherrer s-scherrer requested a review from jreback February 10, 2020 10:19
Member

@WillAyd WillAyd left a comment

lgtm

@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Mar 2, 2020
@jreback jreback merged commit 861df91 into pandas-dev:master Mar 3, 2020
@jreback
Contributor

jreback commented Mar 3, 2020

thanks @s-scherrer

Labels
Bug IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError when using 'comment=...' in read_csv from a file
5 participants