DOC: update the pandas.errors.DtypeWarning docstring #20208
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,9 +38,53 @@ class ParserError(ValueError): | |
|
||
class DtypeWarning(Warning): | ||
""" | ||
Warning that is raised for a dtype incompatibility. This | ||
can happen whenever `pd.read_csv` encounters non- | ||
uniform dtypes in a column(s) of a given CSV file. | ||
Warning raised when importing different dtypes in a column from a file. | ||
|
||
Raised for a dtype incompatibility. This can happen whenever `pd.read_csv` | ||
Can you also use
It's done! |
||
or `pd.read_table` encounter non-uniform dtypes in a column(s) of a given | ||
CSV file. | ||
|
||
It only happens when dealing with larger files. | ||
Instead of this comment, you can add in the Notes section that this can happen in larger files. It's because the dtype checking happens per chunk that is read.
Tks! |
||
|
||
See Also | ||
-------- | ||
pd.read_csv : Read CSV (comma-separated) file into a DataFrame. | ||
pd -> pandas I think.
OK! |
||
pd.read_table : Read general delimited file into a DataFrame. | ||
|
||
Notes | ||
----- | ||
Despite the warning, the CSV file is imported with mixed types in a single | ||
column. See the examples below to better understand this issue. | ||
imported -> read. The dtype of the column will be object. |
||
|
||
Examples | ||
-------- | ||
This example creates and reads a large CSV file with a column that contains | ||
`int` and `str`. | ||
|
||
>>> df = pd.DataFrame({'a':['1']*100000 + ['X']*100000 + ['1']*100000, | ||
... 'b':['b']*300000}) | ||
>>> df.to_csv('test', sep='\t', index=False, na_rep='NA') | ||
Can you add '.csv' for the temp file?
Done! |
||
>>> df2 = pd.read_csv('test', sep='\t') | ||
Traceback (most recent call last): | ||
I think if you add a line like
The doctest might work.
Yeah, I don't think it will be possible to get Warnings to work with doctest, so adding a
I'm sorry, but I couldn't understand what the doctest: +SKIP is supposed to do. I've added it right after the read_csv and I just got an error. Can you help me?
################################################################################
Line 28, in pandas.errors.DtypeWarning |
||
... | ||
DtypeWarning: Columns (0) have mixed types... | ||
|
||
Important to notice that df2 will contain both `str` and `int` for the | ||
same input, '1'. | ||
|
||
>>> df2.iloc[262140,0] | ||
PEP8: space after comma
Done! |
||
'1' | ||
>>> type(df2.iloc[262140,0]) | ||
<class 'str'> | ||
>>> df2.iloc[262150,0] | ||
1 | ||
>>> type(df2.iloc[262150,0]) | ||
<class 'int'> | ||
|
||
One way to solve this issue is to use the `converters` parameter of the | |
`read_csv` and `read_table` functions to make the conversion explicit: | |
|
||
>>> df2 = pd.read_csv('test', sep='\t', converters={'a': str}) | ||
FYI, this is still
Another thing: I think we should recommend |
||
""" | ||
|
||
|
||
|
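A minimal sketch of how a caller might observe and avoid the warning described in the docstring above. The temporary file name and the use of the `dtype` argument are illustrative choices, not part of the PR. Whether `DtypeWarning` is actually emitted depends on the pandas version, the parser engine, and how the file is split into chunks, so the code records warnings instead of assuming one is raised.

```python
import os
import tempfile
import warnings

import pandas as pd
from pandas.errors import DtypeWarning

# Build a column that mixes numeric-looking and non-numeric strings,
# following the docstring example.
df = pd.DataFrame({'a': ['1'] * 100000 + ['X'] * 100000 + ['1'] * 100000,
                   'b': ['b'] * 300000})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.csv')  # hypothetical file name
    df.to_csv(path, sep='\t', index=False)

    # Record warnings instead of printing them, so they can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        mixed = pd.read_csv(path, sep='\t')
    dtype_warnings = [w for w in caught
                      if issubclass(w.category, DtypeWarning)]

    # Declaring the dtype up front keeps every value in column 'a' a str.
    fixed = pd.read_csv(path, sep='\t', dtype={'a': str})

print(len(dtype_warnings))
print({type(v) for v in fixed['a']})
```

Passing `dtype` (or `converters`, as in the docstring) tells the parser what to produce for the column, so the result no longer depends on per-chunk type inference.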
when reading (not importing)
Tks!
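On the question in the thread about what `# doctest: +SKIP` does: the directive tells doctest not to execute that one example, so its printed output is never compared. The sketch below uses only the standard library; `noisy_read` and `calls` are hypothetical names standing in for a call whose output cannot be matched reliably.

```python
import doctest

calls = []  # records every real invocation of noisy_read


def noisy_read():
    """
    Hypothetical stand-in for a call whose output cannot be matched.

    >>> 1 + 1
    2
    >>> noisy_read()  # doctest: +SKIP
    this output is never checked
    """
    calls.append('called')
    return None


# Parse the docstring into a DocTest and run it directly.
parser = doctest.DocTestParser()
test = parser.get_doctest(noisy_read.__doc__, {'noisy_read': noisy_read},
                          'noisy_read', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed)
print(calls)
```

The directive has to sit on the same line as the `>>>` statement it applies to; the skipped example is still shown to readers of the docstring, it just is not run.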