Description
Short summary
to_numeric downcasts integers "safely": it only returns a downcast result if that result == the argument. But it downcasts floats "unsafely" (too aggressively): it forces a downcast result even when that result != the argument.
Illustration for integers: Behavior is as expected
For big integers that must be represented by int64 (because they are greater than np.iinfo('int32').max), forcing a downcast to int32 with .astype('int32') is destructive: the result is no longer == the argument. But to_numeric with downcast='integer' is "safe": it refuses to downcast and instead returns a result that is still int64.
import pandas as pd
s = pd.Series(9876543210)
s.astype('int32') # 1286608618; dtype: int32
s.astype('int32') == s # False
pd.to_numeric(s, downcast='integer') # 9876543210; dtype: int64
pd.to_numeric(s, downcast='integer') == s # True
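The specific value 1286608618 is what an unchecked integer cast produces. It keeps the low 32 bits of the int64 value and reinterprets them as a signed two's-complement int32. Below is a small sketch of that arithmetic, with NumPy's own int64 to int32 cast for comparison.

```python
import numpy as np

value = 9876543210

# Keep the low 32 bits, then reinterpret them as a signed (two's-complement) int32.
low_bits = value % 2**32
wrapped = low_bits - 2**32 if low_bits >= 2**31 else low_bits
print(wrapped)  # 1286608618

# NumPy's default (unchecked) cast from int64 to int32 gives the same number.
print(np.array([value], dtype=np.int64).astype(np.int32)[0])  # 1286608618
```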
It looks like this behavior was discussed in the resolved issue #14941.
Illustration for floats: Behavior is unexpected and potentially harmful
For big floats, to_numeric with downcast='float' appears to be just as forceful as .astype('float32'): it returns a downcast result even if that result is no longer == the argument.
import pandas as pd
pd.set_option('display.float_format', '{:.2f}'.format)
s = pd.Series(9876543210.0)
s.astype('float32') # 9876543488.00; dtype: float32
s.astype('float32') == s # False
pd.to_numeric(s, downcast='float') # 9876543488.00; dtype: float32
pd.to_numeric(s, downcast='float') == s # False
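The value 9876543488 comes from float32's limited precision. float32 has a 24-bit significand, so between 2**33 and 2**34 consecutive float32 values are 2**(33 - 23) = 1024 apart. 9876543210 therefore rounds to the nearest multiple of 1024. The following NumPy check walks through that reasoning.

```python
import numpy as np

value = 9876543210.0
as_f32 = np.float32(value)

# Distance from as_f32 to the next representable float32 value.
print(np.spacing(as_f32))     # 1024.0
print(int(as_f32))            # 9876543488, the nearest multiple of 1024
print(float(as_f32) - value)  # 278.0, the error introduced by the downcast
```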
Expected output:
s = pd.Series(9876543210.0)
pd.to_numeric(s, downcast='float') # 9876543210.00; dtype: float64
pd.to_numeric(s, downcast='float') == s # True
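A possible user-side workaround applies to floats the same round-trip test the integer path uses. It downcasts, compares the result with the original, and keeps the original if anything changed. This is a minimal sketch, not a pandas API. downcast_float_if_lossless is a hypothetical name, and the sketch assumes the input Series already holds floats (e.g. float64).

```python
import numpy as np
import pandas as pd


def downcast_float_if_lossless(s: pd.Series) -> pd.Series:
    """Hypothetical helper: downcast a float Series only if no value changes.

    Assumes `s` already holds floats. The check runs on the result, so it gives
    the same answer whether or not pandas itself rejects lossy float downcasts.
    """
    down = pd.to_numeric(s, downcast="float")
    # Cast back to the original dtype and compare; Series.equals requires
    # matching dtypes and treats NaNs in the same position as equal.
    if down.astype(s.dtype).equals(s):
        return down
    return s


print(downcast_float_if_lossless(pd.Series([9876543210.0])).dtype)  # float64
print(downcast_float_if_lossless(pd.Series([1.5, np.nan])).dtype)   # float32
```

This test uses exact equality. Values such as 0.1, which are not exactly representable in float32, also stay float64.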
Output of pd.show_versions()
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
pyarrow: 0.8.0
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None