Description
Edited to add information.
Code Sample, a copy-pastable example if possible
import pandas as pd

a = pd.Series(['123', '345', '456'])
a.astype(int)      # works
a.astype('Int64')  # raises TypeError: object cannot be converted to an IntegerDtype
Problem description
Currently, converting an object-dtype Series that contains strings to Int64 with astype fails, even though it should work. It raises a TypeError with a long traceback (shown at the end of this section).
Important to note: the above is trying to convert to Int64 with the capital I. Those are the new nullable-integer arrays that got added to pandas in version 0.24. pandas seems to support them, yet I think something inside astype wasn't updated to reflect that.
In essence, the above should work; there is no reason why it should fail, and it's quite simply a bug (in answer to some comments). Moreover, to_numeric is not a sufficient replacement here: when there are missing values it doesn't convert to Int64 but automatically converts to float instead. This is actually a non-trivial problem when dealing with long integer identifiers, such as GAIA target identifiers, because float64 cannot represent every integer above 2**53 exactly.
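To make the precision concern concrete, here is a minimal sketch of the float fallback together with one possible interim workaround. The identifiers are made-up values, and `to_nullable_int` is a hypothetical helper written for this illustration, not a pandas function. It parses each non-missing string with Python's `int()` and only then builds the Int64 Series, so the values never pass through float64.

```python
import pandas as pd

# Made-up identifiers; the last one is 2**53 + 1, which float64 cannot hold exactly.
ids = pd.Series(['4295806720', None, '9007199254740993'], dtype=object)

# to_numeric falls back to float because of the missing value.
as_float = pd.to_numeric(ids)
print(as_float.dtype)
print(int(as_float[2]) == 9007199254740993)  # compares the round-tripped value


def to_nullable_int(series):
    """Hypothetical helper: parse non-missing entries with int(), keep gaps as None."""
    parsed = [None if pd.isna(value) else int(value) for value in series]
    return pd.Series(parsed, index=series.index, dtype='Int64')


as_int = to_nullable_int(ids)
print(as_int.dtype)
print(as_int[2] == 9007199254740993)
```

The element-wise loop is slower than a vectorised cast, so this is only a stopgap until astype('Int64') accepts strings directly. When the data has no missing values, chaining a.astype(int).astype('Int64') should be enough.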
Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-5778d49bfa2e> in <module>()
2 a = pd.Series(['123', '345', '456'])
3 print(a.astype(int))
----> 4 print(a.astype('Int64'))
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
5880 # else, only a single dtype is given
5881 new_data = self._data.astype(
-> 5882 dtype=dtype, copy=copy, errors=errors, **kwargs
5883 )
5884 return self._constructor(new_data).__finalize__(self)
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py in astype(self, dtype, **kwargs)
579
580 def astype(self, dtype, **kwargs):
--> 581 return self.apply("astype", dtype=dtype, **kwargs)
582
583 def convert(self, **kwargs):
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
436 kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
437
--> 438 applied = getattr(b, f)(**kwargs)
439 result_blocks = _extend_blocks(applied, result_blocks)
440
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
557
558 def astype(self, dtype, copy=False, errors="raise", values=None, **kwargs):
--> 559 return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
560
561 def _astype(self, dtype, copy=False, errors="raise", values=None, **kwargs):
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
641 # _astype_nansafe works fine with 1-d only
642 vals1d = values.ravel()
--> 643 values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
644
645 # TODO(extension)
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
648 # dispatch on extension dtype if needed
649 if is_extension_array_dtype(dtype):
--> 650 return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
651
652 if not isinstance(dtype, np.dtype):
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in _from_sequence(cls, scalars, dtype, copy)
321 @classmethod
322 def _from_sequence(cls, scalars, dtype=None, copy=False):
--> 323 return integer_array(scalars, dtype=dtype, copy=copy)
324
325 @classmethod
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in integer_array(values, dtype, copy)
105 TypeError if incompatible types
106 """
--> 107 values, mask = coerce_to_array(values, dtype=dtype, copy=copy)
108 return IntegerArray(values, mask)
109
/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
190 ]:
191 raise TypeError(
--> 192 "{} cannot be converted to an IntegerDtype".format(values.dtype)
193 )
194
TypeError: object cannot be converted to an IntegerDtype
Expected Output
a.astype('Int64') should return a Series with dtype Int64 containing 123, 345 and 456, i.e. the same values that a.astype(int) produces.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.16-041816-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 0.25.1
numpy : 1.14.3
pytz : 2018.4
dateutil : 2.7.3
pip : 19.1.1
setuptools : 39.1.0
Cython : 0.28.2
pytest : 3.5.1
hypothesis : None
sphinx : 1.7.4
blosc : None
feather : None
xlsxwriter : 1.0.4
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 6.4.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.1.1
numexpr : 2.6.5
odfpy : None
openpyxl : 2.5.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.2.7
tables : 3.4.3
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : 1.0.4