Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
df = pd.DataFrame({"A": [12,23,34,45]}, index = [list("aabb"), [0,1,2,3]])
print(df)
print("- - -")
print(df.loc[df.A < 30, :].loc[["b"], :]) # empty as expected
print("- - -")
print(df.loc[df.A < 10, :].loc[["b"], :]) # raises ValueError
Complete Output of Code Sample
      A
a 0  12
  1  23
b 2  34
  3  45
- - -
Empty DataFrame
Columns: [A]
Index: []
- - -
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1060, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 791, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 865, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1113, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1053, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1254, in _get_listlike_indexer
    indexer, keyarr = ax._convert_listlike_indexer(key)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\multi.py", line 2559, in _convert_listlike_indexer
    _, indexer = self.reindex(keyarr, level=level)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\multi.py", line 2470, in reindex
    target, indexer, _ = self._join_level(
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\base.py", line 3924, in _join_level
    ngroups = 1 + new_lev_codes.max()
  File "C:\test\venv124\lib\site-packages\numpy\core\_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
Problem description
In both cases, df.loc[df.A...] returns a dataframe that doesn't contain any rows with an index value of "b". Accordingly, in the first case the result of .loc[["b"], :] is an empty dataframe, but in the second case a ValueError is raised. The difference between the cases is that in the first case df.loc[df.A...] returns a dataframe with some rows (though none with index value "b"), while in the second case it returns a dataframe with zero rows.
I think that shouldn't make a difference.
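The last pandas frame in the traceback (ngroups = 1 + new_lev_codes.max()) hints at why the row count matters. The sketch below only mimics that one line with plain NumPy arrays; the arrays empty_codes and codes are stand-ins I made up, and I have not traced how pandas actually builds new_lev_codes.

```python
import numpy as np

# Stand-in for the level codes of a zero-row frame (an assumption, not
# the array pandas really constructs).
empty_codes = np.array([], dtype=np.intp)

message = None
try:
    ngroups = 1 + empty_codes.max()
except ValueError as exc:
    # max() has no identity value, so reducing a zero-size array fails.
    message = str(exc)
    print(message)

# Stand-in for the codes of a frame that still has two "a" rows.
codes = np.array([0, 0], dtype=np.intp)
ngroups_nonempty = 1 + codes.max()
print(ngroups_nonempty)
```

If this is indeed what happens, a frame with at least one row always gives max() something to reduce over, which would explain why only the zero-row case fails.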
In the original code, .loc[df.A...] and .loc[["b"], :] are not combined directly in one expression. The first one creates a selection of rows of the dataframe, this selection is processed further, and during that processing another expression uses the second .loc.
The traceback looks very similar to the one in #40235. Maybe both bugs have a common root cause.
Expected Output
df.loc[df.A < 10, :].loc[["b"], :] returns an empty dataframe, like df.loc[df.A < 30, :].loc[["b"], :] does.
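Until both cases behave the same, a possible workaround is to avoid the list-like .loc lookup and filter on the first index level with a boolean mask instead. This is only a sketch: select_first_level is a hypothetical helper name, not a pandas function, and it assumes that matching on level 0 is all the original lookup needs.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 23, 34, 45]}, index=[list("aabb"), [0, 1, 2, 3]])


def select_first_level(frame, labels):
    """Hypothetical helper: keep rows whose first index level is in `labels`."""
    # get_level_values and isin both accept an empty index, so the mask
    # is simply empty when the frame has zero rows.
    mask = frame.index.get_level_values(0).isin(labels)
    return frame[mask]


some_rows = select_first_level(df.loc[df.A < 30, :], ["b"])
no_rows = select_first_level(df.loc[df.A < 10, :], ["b"])
print(some_rows)
print(no_rows)
```

Both calls should give an empty dataframe with column A, which is the behavior I would expect from the chained .loc calls as well.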
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
...
pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
...