pd.concat() does not work generically for ExtensionArrays #20735

Closed
@bfollinprm

Description

Code Sample, a copy-pastable example if possible

Sadly, this is pseudocode because the underlying container is proprietary, but the problem is generic. If I get a chance this weekend, I will write up a mock, open-source-compliant example for illustration and testing.

import pandas as pd
from pandas.api.extensions import ExtensionArray

class MyArray(ExtensionArray):
    def __init__(self, values, **kwargs):
        # All the things
        # MyNotNumpyContainer (proprietary) has no `dtype` attribute
        self.values = MyNotNumpyContainer(values)
    # All the other methods

df = pd.DataFrame({'mycolumn': MyArray(values)})

# Raises AttributeError: 'MyNotNumpyContainer' object has no attribute 'dtype'

Problem description

The top of _isna_ndarraylike() reads dtype from obj.values before it checks whether obj is an extension array:

def _isna_ndarraylike(obj):
    values = getattr(obj, 'values', obj)
    dtype = values.dtype

    if is_extension_array_dtype(obj):
        if isinstance(obj, (ABCIndexClass, ABCSeries)):
            values = obj._values
        else:
            values = obj
        result = values.isna()

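This fails for ExtensionArray objects that define a values attribute whose value has no dtype attribute: getattr(obj, 'values', obj) returns that inner container, and values.dtype then raises AttributeError before the extension-array branch is ever reached.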
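The failure mechanism can be reproduced without pandas or the proprietary container. The sketch below is only an illustration: MockExtensionArray, MyNotNumpyContainer (as a stand-in) and lookup_dtype are hypothetical names, and lookup_dtype copies just the first two lines of the function quoted above.

```python
class MyNotNumpyContainer(object):
    """Hypothetical stand-in for the proprietary container (no ``dtype``)."""

    def __init__(self, values):
        self.items = list(values)


class MockExtensionArray(object):
    """Hypothetical stand-in for an ExtensionArray subclass."""

    dtype = "my_dtype"  # the array itself does carry a dtype

    def __init__(self, values):
        # A public ``values`` attribute is what the lookup below picks up.
        self.values = MyNotNumpyContainer(values)


def lookup_dtype(obj):
    # Mirrors the first two lines of _isna_ndarraylike() as quoted.
    values = getattr(obj, 'values', obj)
    return values.dtype


arr = MockExtensionArray([1, 2, 3])
try:
    lookup_dtype(arr)
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

The lookup never consults arr.dtype, because getattr() finds arr.values first and the container it returns has no dtype.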

I ran into this call via pd.concat, but it no doubt occurs elsewhere.

Expected Output

Since dtype is a required attribute of an ExtensionArray, _isna_ndarraylike() should at least get the dtype from the ExtensionArray itself. Really, a check for whether we are dealing with an ExtensionArray should occur somewhere upstream, since there is no guarantee that an ExtensionArray is backed by a numpy array-like object.
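One possible shape for the reordering, as a pandas-free sketch and not a patch: the extension-array check runs first, so obj.values is never touched for extension arrays. The _is_mock_extension marker, MockExtensionArray and isna_sketch are hypothetical names invented for this illustration; in pandas the check would presumably be is_extension_array_dtype().

```python
class MyNotNumpyContainer(object):
    """Hypothetical stand-in for the proprietary container (no ``dtype``)."""

    def __init__(self, values):
        self.items = list(values)


class MockExtensionArray(object):
    """Hypothetical stand-in for an ExtensionArray subclass."""

    dtype = "my_dtype"
    _is_mock_extension = True  # marker used only by this sketch

    def __init__(self, values):
        self.values = MyNotNumpyContainer(values)

    def isna(self):
        # Treat None as the missing-value sentinel for this mock.
        return [item is None for item in self.values.items]


def isna_sketch(obj):
    # Check for the extension array first, before reading values.dtype.
    if getattr(obj, '_is_mock_extension', False):
        return obj.isna()
    values = getattr(obj, 'values', obj)
    dtype = values.dtype  # only reached for non-extension inputs
    raise NotImplementedError("ndarray path omitted in this sketch: %s" % dtype)


result = isna_sketch(MockExtensionArray([1, None, 3]))
print(result)
```

With this ordering the array's own isna() decides what is missing, and the shape of its internal container no longer matters.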

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1048-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0.dev0+762.gbb095a6
pytest: 3.2.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.28.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: ExtensionArray (Extending pandas with custom dtypes or arrays.)
