Series of object/strings cannot be converted to Int64Dtype() #28599

Closed
@mar-ses

Edited to add information.

Code Sample, a copy-pastable example if possible

import pandas as pd

a = pd.Series(['123', '345', '456'])
a.astype(int)        # works
a.astype('Int64')    # raises TypeError (see traceback below)

Problem description

Currently, converting an object-dtype Series that contains strings to Int64 fails, even though it should work. It raises a long TypeError (full traceback below).

Important to note: the above is trying to convert to Int64 with a capital I. That is the new nullable integer dtype that was added to pandas. pandas supports it, yet I think something inside astype wasn't updated to reflect that.
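To make the distinction concrete, here is a minimal sketch of what the capital-I dtype offers over plain int. The values are made up for illustration; it only assumes a pandas version that ships the nullable integer dtype.

```python
import pandas as pd

# A nullable integer Series: the missing entry stays missing,
# and the remaining values stay integers instead of becoming floats.
nullable = pd.Series([1, None, 3], dtype="Int64")

print(nullable.dtype)
print(nullable.isna().tolist())
```

With a plain NumPy integer dtype there is no way to hold the missing entry, which is why the same data would otherwise end up as float64.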

In essence, the above should work; there is no reason for it to fail, and it is quite simply a bug (in answer to some comments). Moreover, to_numeric is not a sufficient replacement here: when there are missing values it does not convert to Int64 but silently converts to float instead (this is actually a non-trivial problem when dealing with long integer identifiers, such as GAIA target identifiers).
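The float fallback can be seen with a small sketch. The identifier below is hypothetical, chosen just above 2**53, where float64 can no longer represent every integer exactly.

```python
import pandas as pd

# Hypothetical long identifier (2**53 + 1) next to a missing value.
original_id = 9007199254740993
ids = pd.Series([str(original_id), None, "456"])

as_numeric = pd.to_numeric(ids)

# The missing value forces a float result.
print(as_numeric.dtype)  # float64

# Round-tripping through float64 does not give back the original identifier.
print(int(as_numeric[0]) == original_id)
```

For short integers the float detour is harmless, but for identifiers of this size the value has already changed by the time any later cast to Int64 could happen.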

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-5778d49bfa2e> in <module>()
      2 a  = pd.Series(['123', '345', '456'])
      3 print(a.astype(int))
----> 4 print(a.astype('Int64'))

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   5880             # else, only a single dtype is given
   5881             new_data = self._data.astype(
-> 5882                 dtype=dtype, copy=copy, errors=errors, **kwargs
   5883             )
   5884             return self._constructor(new_data).__finalize__(self)

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py in astype(self, dtype, **kwargs)
    579 
    580     def astype(self, dtype, **kwargs):
--> 581         return self.apply("astype", dtype=dtype, **kwargs)
    582 
    583     def convert(self, **kwargs):

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    436                     kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
    437 
--> 438             applied = getattr(b, f)(**kwargs)
    439             result_blocks = _extend_blocks(applied, result_blocks)
    440 

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    557 
    558     def astype(self, dtype, copy=False, errors="raise", values=None, **kwargs):
--> 559         return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
    560 
    561     def _astype(self, dtype, copy=False, errors="raise", values=None, **kwargs):

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/internals/blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    641                     # _astype_nansafe works fine with 1-d only
    642                     vals1d = values.ravel()
--> 643                     values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
    644 
    645                 # TODO(extension)

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    648     # dispatch on extension dtype if needed
    649     if is_extension_array_dtype(dtype):
--> 650         return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
    651 
    652     if not isinstance(dtype, np.dtype):

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in _from_sequence(cls, scalars, dtype, copy)
    321     @classmethod
    322     def _from_sequence(cls, scalars, dtype=None, copy=False):
--> 323         return integer_array(scalars, dtype=dtype, copy=copy)
    324 
    325     @classmethod

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in integer_array(values, dtype, copy)
    105     TypeError if incompatible types
    106     """
--> 107     values, mask = coerce_to_array(values, dtype=dtype, copy=copy)
    108     return IntegerArray(values, mask)
    109 

/home/sestovic/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
    190         ]:
    191             raise TypeError(
--> 192                 "{} cannot be converted to an IntegerDtype".format(values.dtype)
    193             )
    194 

TypeError: object cannot be converted to an IntegerDtype
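
Until astype handles this case, one possible workaround is to parse the strings in Python and build the nullable array directly. This is only a sketch: `strings_to_int64` is a hypothetical helper name, not a pandas function, and it assumes every non-missing entry is a string that `int()` can parse.

```python
import pandas as pd


def strings_to_int64(series):
    """Hypothetical helper: parse strings with int(), keep missing entries missing."""
    # Python ints have arbitrary precision, so long identifiers are
    # never routed through float64 on the way in.
    parsed = [None if pd.isna(value) else int(value) for value in series]
    return pd.Series(pd.array(parsed, dtype="Int64"), index=series.index)


ids = pd.Series(["9007199254740993", None, "456"])
result = strings_to_int64(ids)

print(result.dtype)
print(result[0] == 9007199254740993)
```

The per-element loop is slower than a vectorised cast, so this is a stopgap rather than a replacement for a working `astype('Int64')`.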

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.16-041816-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 0.25.1
numpy : 1.14.3
pytz : 2018.4
dateutil : 2.7.3
pip : 19.1.1
setuptools : 39.1.0
Cython : 0.28.2
pytest : 3.5.1
hypothesis : None
sphinx : 1.7.4
blosc : None
feather : None
xlsxwriter : 1.0.4
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 6.4.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.1.1
numexpr : 2.6.5
odfpy : None
openpyxl : 2.5.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.2.7
tables : 3.4.3
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : 1.0.4
