Commit df87fd3

hissashirocha authored and jorisvandenbossche committed
DOC: update the pandas.errors.DtypeWarning docstring (#20208)
1 parent 71e42a8 commit df87fd3

1 file changed: +54 -3 lines changed
pandas/errors/__init__.py

Lines changed: 54 additions & 3 deletions
@@ -38,9 +38,60 @@ class ParserError(ValueError):
 
 class DtypeWarning(Warning):
     """
-    Warning that is raised for a dtype incompatibility. This
-    can happen whenever `pd.read_csv` encounters non-
-    uniform dtypes in a column(s) of a given CSV file.
+    Warning raised when reading different dtypes in a column from a file.
+
+    Raised for a dtype incompatibility. This can happen whenever `read_csv`
+    or `read_table` encounter non-uniform dtypes in a column(s) of a given
+    CSV file.
+
+    See Also
+    --------
+    pandas.read_csv : Read CSV (comma-separated) file into a DataFrame.
+    pandas.read_table : Read general delimited file into a DataFrame.
+
+    Notes
+    -----
+    This warning is issued when dealing with larger files because the dtype
+    checking happens per chunk read.
+
+    Despite the warning, the CSV file is read with mixed types in a single
+    column which will be an object type. See the examples below to better
+    understand this issue.
+
+    Examples
+    --------
+    This example creates and reads a large CSV file with a column that contains
+    `int` and `str`.
+
+    >>> df = pd.DataFrame({'a': (['1'] * 100000 + ['X'] * 100000 +
+    ...                          ['1'] * 100000),
+    ...                    'b': ['b'] * 300000})
+    >>> df.to_csv('test.csv', index=False)
+    >>> df2 = pd.read_csv('test.csv')
+
+    DtypeWarning: Columns (0) have mixed types
+
+    Important to notice that ``df2`` will contain both `str` and `int` for the
+    same input, '1'.
+
+    >>> df2.iloc[262140, 0]
+    '1'
+    >>> type(df2.iloc[262140, 0])
+    <class 'str'>
+    >>> df2.iloc[262150, 0]
+    1
+    >>> type(df2.iloc[262150, 0])
+    <class 'int'>
+
+    One way to solve this issue is using the `dtype` parameter in the
+    `read_csv` and `read_table` functions to explicit the conversion:
+
+    >>> df2 = pd.read_csv('test.csv', sep=',', dtype={'a': str})
+
+    No warning was issued.
+
+    >>> import os
+    >>> os.remove('test.csv')
     """
 
 
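Note on the `dtype` fix described in the new docstring: the sketch below is one way to check that passing `dtype` yields a single Python type in the column and that no `DtypeWarning` is recorded. It is an illustration, not part of the commit. The helper name `read_with_dtype` is hypothetical, and the data is read from an in-memory `io.StringIO` buffer instead of `test.csv` (an assumption made so that no file has to be cleaned up).

```python
import io
import warnings

import pandas as pd
from pandas.errors import DtypeWarning

# Build CSV text whose column 'a' mixes digit strings and letters,
# mirroring the shape of the docstring example (sizes are illustrative).
values = ["1"] * 100000 + ["X"] * 100000 + ["1"] * 100000
csv_text = "a,b\n" + "\n".join(f"{v},b" for v in values) + "\n"


def read_with_dtype(text):
    """Hypothetical helper: read the CSV with column 'a' forced to str
    and return the frame plus any DtypeWarning recorded during the read."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        frame = pd.read_csv(io.StringIO(text), dtype={"a": str})
    dtype_warnings = [w for w in caught if issubclass(w.category, DtypeWarning)]
    return frame, dtype_warnings


df_str, dtype_warnings = read_with_dtype(csv_text)

# Collect the Python types actually present in column 'a'.
value_types = {type(v) for v in df_str["a"]}
print(value_types, len(dtype_warnings))
```

Because the dtype is stated up front, the parser does not need to infer a type per chunk, so the `'1'` entries before and after the `'X'` block come back as the same type.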