Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

class A:
    _dat = 5

# succeeds
pd.DataFrame(data={1: range(10), 2: range(10)}, index=[A() for _ in range(10)])

class B:
    _data = 5

# fails: B instances are assumed array-like, so pandas tries and fails to build a MultiIndex
pd.DataFrame(data={1: range(10), 2: range(10)}, index=[B() for _ in range(10)])
Issue Description
In pandas.core.indexes.base, ensure_index() calls pandas._libs.lib.is_all_arraylike to determine whether a list of entries should be treated as a regular index or a multi-index.
However, this function incorrectly assumes that any instance with a _data attribute is "array-like". This is highly nonstandard. It would seem better to look for something canonical such as __array_interface__ or similar. Just looking for a _data attribute is both nonspecific (lots of classes have _data attributes) and misses any array-like instances that don't happen to have a _data attribute. (Even NumPy arrays don't have _data: is_all_arraylike has to special-case actual arrays...)
The result is that instances of non-array-like classes that happen to have a _data attribute cannot be used as entries of a pandas index.
Here's the source on the main branch:
https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/lib.pyx#L762
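The behavior described above can be approximated in pure Python. This is only a sketch of the check as this report describes it, not the Cython source linked above: the name looks_all_arraylike is hypothetical, and the special case for actual NumPy arrays is left out so the snippet has no dependencies.

```python
def looks_all_arraylike(items):
    """Hypothetical stand-in for the check described in this report.

    An item counts as "array-like" if it is a list or merely has a
    `_data` attribute. (The NumPy-array special case is omitted here.)
    """
    return all(isinstance(v, list) or hasattr(v, "_data") for v in items)


class A:
    _dat = 5


class B:
    _data = 5


print(looks_all_arraylike([A(), A()]))  # False: treated as plain index entries
print(looks_all_arraylike([B(), B()]))  # True: mistaken for MultiIndex levels
```

Under these assumptions, the only difference between A and B is the attribute name, which is enough to flip the result.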
Expected Behavior
pandas._libs.lib.is_all_arraylike should correctly report array-like things as array-like, and things like class B above as not array-like.
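One possible shape for a stricter test, following the suggestion to look for something canonical, is sketched below. It is an illustration and not a proposed patch: is_arraylike_strict and FakeArray are hypothetical names, and the choice to accept lists plus objects exposing NumPy's __array_interface__ or __array__ hooks is an assumption about what "array-like" should mean here.

```python
def is_arraylike_strict(value):
    """Hypothetical stricter check: lists, or objects exposing a NumPy array hook."""
    if isinstance(value, list):
        return True
    return hasattr(value, "__array_interface__") or hasattr(value, "__array__")


class FakeArray:
    """Hypothetical array type that opts in through the `__array__` hook."""

    def __array__(self, dtype=None, copy=None):
        raise NotImplementedError("illustration only")


class B:
    _data = 5


print(is_arraylike_strict(FakeArray()))  # True: opts in via a canonical hook
print(is_arraylike_strict(B()))          # False: a `_data` attribute alone is ignored
```

With a check along these lines, a class like B would be treated as an ordinary index entry, while types that deliberately implement the array protocol would still be recognized.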
Installed Versions
My versions don't matter (they're below anyway), as the offending code is on the main branch:
https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/lib.pyx#L762
pd.show_versions()
INSTALLED VERSIONS
commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-18-cloud-amd64
Version : #1 SMP Debian 4.19.208-1 (2021-09-29)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.27.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.9.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : 0.15.0
pyarrow : 4.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : 1.4.28
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : 0.54.0