
BUG: MultiIndex with IntervalIndex level fails when indexing into interval with inf #46699

Open
@dr-vinnie

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

# Age categories for 'with children'
age_cat_w_children = pd.DataFrame(
    data={
        'Result': ['001', '002', '003', '004', '005', '006', '007', '008', '009', '010']
    },
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, 65, 70, np.inf],
        closed='left',
        name='age',
    )
)
# Age categories for 'without children'
age_cat_wo_children = pd.DataFrame(
    data={
        'Result': ['101', '102', '103', '104', '105', '106', '107', '108']
    },
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, np.inf],
        closed='left',
        name='age',
    )
)
# Combine both categories into one.
mapper = pd.concat(
    objs=[age_cat_wo_children, age_cat_w_children],
    axis=0,
    keys=[False, True],
    names=['children']
)

# These work, as expected.
print(mapper.loc[(True, 40)])
print(mapper.loc[(False, 40)])
print(mapper.loc[[(False, 45), (False, 35)], :])
# However, these fail.
print(mapper.loc[(False, 99)])
print(mapper.loc[[(False, 45), (False, 99)], :])

Issue Description

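Selecting rows from a MultiIndex that has an IntervalIndex level fails inconsistently. The lookups above for ages 35, 40 and 45 work, but looking up age 99, which falls into the open-ended interval [59.0, inf) of the False group, raises an error.
print(mapper.loc[(False, 99)]) gives: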

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 961, in __getitem__
    return self._getitem_tuple(key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1140, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 859, in _getitem_lowerdim
    return self._handle_lowerdim_multi_index_axis0(tup)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1160, in _handle_lowerdim_multi_index_axis0
    return self._get_label(tup, axis=axis)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1153, in _get_label
    return self.obj.xs(label, axis=axis)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3857, in xs
    loc, new_index = index._get_loc_level(key, level=0)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 3043, in _get_loc_level
    return (self._engine.get_loc(key), None)
  File "pandas\_libs\index.pyx", line 777, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
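To see what the lookup is working against, one can count how many intervals in the MultiIndex's shared age level contain each probed age. This is a diagnostic sketch that only inspects the index; it does not confirm the root cause. Because pd.concat builds one interval level for both children groups, that level holds the union of both sets of intervals, so it contains both [59.0, inf) and [70.0, inf). The sketch suggests (but does not prove) that 99 matching more than one level entry, while 35, 40 and 45 match exactly one, may be what trips the engine lookup. The names age_level and matches are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild `mapper` exactly as in the reproducible example.
age_cat_w_children = pd.DataFrame(
    data={"Result": ["001", "002", "003", "004", "005", "006", "007", "008", "009", "010"]},
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, 65, 70, np.inf],
        closed="left",
        name="age",
    ),
)
age_cat_wo_children = pd.DataFrame(
    data={"Result": ["101", "102", "103", "104", "105", "106", "107", "108"]},
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, np.inf],
        closed="left",
        name="age",
    ),
)
mapper = pd.concat(
    objs=[age_cat_wo_children, age_cat_w_children],
    axis=0,
    keys=[False, True],
    names=["children"],
)

# Level 1 is the "age" level shared by both groups (union of their intervals).
age_level = mapper.index.levels[1]
print(age_level)

# IntervalIndex.contains checks elementwise which intervals contain the value;
# summing the boolean mask counts the matching level entries per probed age.
matches = {age: int(age_level.contains(age).sum()) for age in (35, 40, 45, 99)}
print(matches)
```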

Expected Behavior

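mapper.loc[(False, 99)] should return the row whose age interval in the False group contains 99, i.e. [59.0, inf) with Result '108'. Likewise, mapper.loc[[(False, 45), (False, 99)], :] should return the rows for [45.0, 50.0) and [59.0, inf), with Results '105' and '108', just as the lookups for 35, 40 and 45 already do.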
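The expected results can be checked by running the same lookups against the un-concatenated frame for the False group, whose plain IntervalIndex has no overlapping intervals. This is a sketch that rebuilds age_cat_wo_children from the example; row_99 and rows are illustrative names, and this is not presented as an official workaround.

```python
import numpy as np
import pandas as pd

# The "without children" frame from the reproducible example.
age_cat_wo_children = pd.DataFrame(
    data={"Result": ["101", "102", "103", "104", "105", "106", "107", "108"]},
    index=pd.IntervalIndex.from_breaks(
        breaks=[-np.inf, 29, 35, 40, 45, 50, 55, 59, np.inf],
        closed="left",
        name="age",
    ),
)

# Scalar lookup on a plain IntervalIndex: returns the row whose interval
# contains 99, i.e. [59.0, inf).
row_99 = age_cat_wo_children.loc[99]
print(row_99)

# List lookup: one row per requested age, in the order requested.
rows = age_cat_wo_children.loc[[45, 99], :]
print(rows)
```

These are the values the MultiIndex lookups on mapper would be expected to return for the False group.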

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Danish_Denmark.1252
pandas : 1.4.2
numpy : 1.22.3
pytz : 2021.3
dateutil : 2.8.2
pip : 20.2.1
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
brotli :
fastparquet : None
fsspec : 2021.10.1
gcsfs : None
markupsafe : 1.1.1
matplotlib : 3.5.1
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
snappy : None
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

Labels: Bug, Indexing, Interval