Closed
### Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
### Reproducible Example
This code raises a KeyError whose message is incorrect (i.e. it reports the wrong key):
import pandas as pd

# Tuple keys give the DataFrame MultiIndex columns
df = pd.DataFrame({(1, 2): ['a', 'b', 'c'],
                   (1, 3): ['d', 'e', 'f'],
                   (2, 2): ['g', 'h', 'i'],
                   (2, 4): ['j', 'k', 'l']})
print(df)
# (1, 4) is invalid as a pair, but
# 1 is a valid primary key with [2, 3] as secondary keys
# and 4 is a valid secondary key for the primary key 2
df.loc[0, (1, 4)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1101, in __getitem__
return self._getitem_tuple(key)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1284, in _getitem_tuple
return self._getitem_lowerdim(tup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 980, in _getitem_lowerdim
return self._getitem_nested_tuple(tup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1081, in _getitem_nested_tuple
obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1347, in _getitem_axis
return self._get_label(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1297, in _get_label
return self.obj.xs(label, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/generic.py", line 4069, in xs
return self[key]
~~~~^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3739, in __getitem__
return self._getitem_multilevel(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3794, in _getitem_multilevel
loc = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/multi.py", line 2825, in get_loc
return self._engine.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/index.pyx", line 832, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2152, in pandas._libs.hashtable.UInt64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2176, in pandas._libs.hashtable.UInt64HashTable.get_item
KeyError: 20
FWIW, this is a similarly invalid key, but notice the different integer in the KeyError:
df.loc[0, (2, 3)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1101, in __getitem__
return self._getitem_tuple(key)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1284, in _getitem_tuple
return self._getitem_lowerdim(tup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 980, in _getitem_lowerdim
return self._getitem_nested_tuple(tup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1081, in _getitem_nested_tuple
obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1347, in _getitem_axis
return self._get_label(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexing.py", line 1297, in _get_label
return self.obj.xs(label, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/generic.py", line 4069, in xs
return self[key]
~~~~^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3739, in __getitem__
return self._getitem_multilevel(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3794, in _getitem_multilevel
loc = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/multi.py", line 2825, in get_loc
return self._engine.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/index.pyx", line 832, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2152, in pandas._libs.hashtable.UInt64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2176, in pandas._libs.hashtable.UInt64HashTable.get_item
KeyError: 27
### Issue Description
In a multi-indexed DataFrame, `DataFrame.loc` with a partially or completely incorrect key raises a KeyError and reports the missing key in the error message. However, when a key pair `(kouter_1, kinner_1)` is not a valid pair but `(kouter_2, kinner_1)` is (i.e. the secondary key exists in the union of all secondary keys but is not a valid partner for that primary key), `.loc` raises the expected KeyError but reports an unhelpful, seemingly random integer instead of the key it was attempting to retrieve from the df. Interestingly, the integer isn't the same across machines or versions, but on the same machine and version the value is consistent. I've tested this with 1.3.4, 1.5.0, 1.5.3, and 2.0.0rc0.
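A possible explanation for the integers, offered as an assumption rather than a confirmed diagnosis: both tracebacks end in `UInt64HashTable.get_item`, reached from `BaseMultiIndexCodesEngine.get_loc`, which suggests the engine packs the per-level positions of the key into a single unsigned integer before the lookup. The sketch below is plain Python, not pandas code. The helper name `packed_key` and the offset `nulls_shift=2` are hypothetical, chosen because they reproduce the two values reported above for this particular set of columns.

```python
import math


def packed_key(levels, key, nulls_shift=2):
    """Pack one label per level into a single integer (illustrative only).

    Assumption: each label's position in its level is offset by
    `nulls_shift`, and each level gets just enough bits to hold
    len(level) + nulls_shift distinct values.
    """
    # Number of bits reserved for each level
    sizes = [math.ceil(math.log2(len(level) + nulls_shift)) for level in levels]
    packed = 0
    for i, (level, label) in enumerate(zip(levels, key)):
        # Shift this level's code past the bits used by all later levels
        offset = sum(sizes[i + 1:])
        code = level.index(label) + nulls_shift
        packed |= code << offset
    return packed


# Unique labels per column level in the example DataFrame
levels = [[1, 2], [2, 3, 4]]

print(packed_key(levels, (1, 4)))  # 20
print(packed_key(levels, (2, 3)))  # 27
```

If this reading is right, the integer is the encoded form of the requested key rather than a random value, which would also explain why it is stable on a given machine and version.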
### Expected Behavior
The call to `df.loc[0, (1, 4)]` above should raise a KeyError that names the requested key:
KeyError: (1, 4)
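Until the message is fixed, one way to get the expected error from user code is to check the column key before calling `.loc`. This is a minimal sketch of a workaround, not a proposed patch to pandas; the helper name `loc_with_clear_error` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({(1, 2): ['a', 'b', 'c'],
                   (1, 3): ['d', 'e', 'f'],
                   (2, 2): ['g', 'h', 'i'],
                   (2, 4): ['j', 'k', 'l']})


def loc_with_clear_error(frame, row, col_key):
    """Hypothetical helper: validate the column key, then delegate to .loc."""
    # Membership tests on the column index return False for a missing key
    # instead of raising, so the original tuple can be reported.
    if col_key not in frame.columns:
        raise KeyError(col_key)
    return frame.loc[row, col_key]


print(loc_with_clear_error(df, 0, (1, 2)))

try:
    loc_with_clear_error(df, 0, (1, 4))
except KeyError as err:
    print("KeyError:", err.args[0])
```

The membership check costs one extra index lookup per call, so it is better suited to debugging than to tight loops.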
### Installed Versions
/usr/local/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit : 1a2e300170efc08cb509a0b4ff6248f8d55ae777
python : 3.11.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1030-aws
Version : #34~20.04.1-Ubuntu SMP Tue Jan 24 15:16:46 UTC 2023
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.0rc0
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None