Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
# Age categories for 'with children'
age_cat_w_children = pd.DataFrame(
    data={
        'Result': ['001', '002', '003', '004', '005', '006', '007', '008', '009', '010']
    },
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, 65, 70, np.inf],
        closed='left',
        name='age',
    )
)
# Age categories for 'without children'
age_cat_wo_children = pd.DataFrame(
    data={
        'Result': ['101', '102', '103', '104', '105', '106', '107', '108']
    },
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, np.inf],
        closed='left',
        name='age',
    )
)
# Combine both categories into one.
mapper = pd.concat(
    objs=[age_cat_wo_children, age_cat_w_children],
    axis=0,
    keys=[False, True],
    names=['children']
)
# These work, as they are supposed to.
print(mapper.loc[(True, 40)])
print(mapper.loc[(False, 40)])
print(mapper.loc[[(False, 45), (False, 35)], :])
# However, these fail.
print(mapper.loc[(False, 99)])
print(mapper.loc[[(False, 45), (False, 99)], :])
Issue Description
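Selecting rows with .loc from a MultiIndex that contains an IntervalIndex level fails inconsistently: lookups with the ages 35, 40 and 45 succeed, while lookups with the age 99, which falls in the last 'without children' interval [59, inf), raise a ValueError.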
print(mapper.loc[(False, 99)])
gives
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 961, in __getitem__
return self._getitem_tuple(key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1140, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 859, in _getitem_lowerdim
return self._handle_lowerdim_multi_index_axis0(tup)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1160, in _handle_lowerdim_multi_index_axis0
return self._get_label(tup, axis=axis)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1153, in _get_label
return self.obj.xs(label, axis=axis)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3857, in xs
loc, new_index = index._get_loc_level(key, level=0)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 3043, in _get_loc_level
return (self._engine.get_loc(key), None)
File "pandas\_libs\index.pyx", line 777, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Expected Behavior
mapper.loc[(False, 99)]
and
mapper.loc[[(False, 45), (False, 99)], :]
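should both work and return the matching row(s), since 99 falls in the interval [59, inf) of the 'without children' categories.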
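Until this is fixed, one possible workaround is to skip the MultiIndex and look up each age in a separate, interval-indexed table per 'children' flag. The sketch below is only an illustration under that assumption; make_table, tables and lookup are hypothetical names, not part of pandas or of the example above.

```python
import numpy as np
import pandas as pd


def make_table(breaks, results):
    """Build a Series of results keyed by left-closed age intervals."""
    index = pd.IntervalIndex.from_breaks(breaks, closed="left", name="age")
    return pd.Series(results, index=index, name="Result")


# Hypothetical structure: one interval-indexed table per 'children' flag,
# using the same breaks and results as the reproducible example.
tables = {
    False: make_table(
        [-np.inf, 29, 35, 40, 45, 50, 55, 59, np.inf],
        ["101", "102", "103", "104", "105", "106", "107", "108"],
    ),
    True: make_table(
        [-np.inf, 29, 35, 40, 45, 50, 55, 59, 65, 70, np.inf],
        ["001", "002", "003", "004", "005", "006", "007", "008", "009", "010"],
    ),
}


def lookup(children, age):
    """Return the result whose age interval contains `age`."""
    # Plain IntervalIndex lookup; no MultiIndex is involved here.
    return tables[children].loc[age]
```

With these tables, lookup(False, 99) resolves against the interval [59, inf) directly, which is the row the failing mapper.loc[(False, 99)] call is expected to find.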
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Danish_Denmark.1252
pandas : 1.4.2
numpy : 1.22.3
pytz : 2021.3
dateutil : 2.8.2
pip : 20.2.1
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
brotli :
fastparquet : None
fsspec : 2021.10.1
gcsfs : None
markupsafe : 1.1.1
matplotlib : 3.5.1
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
snappy : None
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None