Skip to content

BUG: Series.loc raising KeyError for Iterator indexer in case of setitem #39623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -691,6 +691,8 @@ Indexing
- Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` with empty :class:`DataFrame` and specified columns for string indexer and non empty :class:`DataFrame` to set (:issue:`38831`)
- Bug in :meth:`DataFrame.loc.__setitem__` raising ValueError when expanding unique column for :class:`DataFrame` with duplicate columns (:issue:`38521`)
- Bug in :meth:`DataFrame.iloc.__setitem__` and :meth:`DataFrame.loc.__setitem__` with mixed dtypes when setting with a dictionary value (:issue:`38335`)
- Bug in :meth:`Series.loc.__setitem__` and :meth:`DataFrame.loc.__setitem__` raising ``KeyError`` for boolean Iterator indexer (:issue:`39614`)
- Bug in :meth:`Series.iloc` and :meth:`DataFrame.iloc` raising ``KeyError`` for Iterator indexer (:issue:`39614`)
- Bug in :meth:`DataFrame.__setitem__` not raising ``ValueError`` when right hand side is a :class:`DataFrame` with wrong number of columns (:issue:`38604`)
- Bug in :meth:`Series.__setitem__` raising ``ValueError`` when setting a :class:`Series` with a scalar indexer (:issue:`38303`)
- Bug in :meth:`DataFrame.loc` dropping levels of :class:`MultiIndex` when :class:`DataFrame` used as input has only one row (:issue:`10521`)
Expand Down
10 changes: 10 additions & 0 deletions pandas/core/indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -703,6 +703,7 @@ def _ensure_listlike_indexer(self, key, axis=None, value=None):

def __setitem__(self, key, value):
if isinstance(key, tuple):
key = tuple(list(x) if is_iterator(x) else x for x in key)
key = tuple(com.apply_if_callable(x, self.obj) for x in key)
else:
key = com.apply_if_callable(key, self.obj)
Expand Down Expand Up @@ -911,6 +912,7 @@ def _convert_to_indexer(self, key, axis: int, is_setter: bool = False):

def __getitem__(self, key):
if type(key) is tuple:
key = tuple(list(x) if is_iterator(x) else x for x in key)
key = tuple(com.apply_if_callable(x, self.obj) for x in key)
if self._is_scalar_access(key):
with suppress(KeyError, IndexError, AttributeError):
Expand Down Expand Up @@ -1242,6 +1244,9 @@ def _convert_to_indexer(self, key, axis: int, is_setter: bool = False):

elif is_list_like_indexer(key):

if is_iterator(key):
key = list(key)

if com.is_bool_indexer(key):
key = check_bool_indexer(labels, key)
(inds,) = key.nonzero()
Expand Down Expand Up @@ -1530,6 +1535,9 @@ def _getitem_axis(self, key, axis: int):
if isinstance(key, slice):
return self._get_slice_axis(key, axis=axis)

if is_iterator(key):
key = list(key)

if isinstance(key, list):
key = np.asarray(key)

Expand Down Expand Up @@ -1571,6 +1579,8 @@ def _convert_to_indexer(self, key, axis: int, is_setter: bool = False):

def _get_setitem_indexer(self, key):
# GH#32257 Fall through to let numpy do validation
if is_iterator(key):
return list(key)
return key

# -------------------------------------------------------------------
Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/frame/indexing/test_getitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,21 @@ def test_getitem_listlike(self, idx_type, levels, float_frame):
with pytest.raises(KeyError, match="not in index"):
frame[idx]

def test_getitem_iloc_generator(self):
# GH#39614
df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
indexer = (x for x in [1, 2])
result = df.iloc[indexer]
expected = DataFrame({"a": [2, 3], "b": [5, 6]}, index=[1, 2])
tm.assert_frame_equal(result, expected)

def test_getitem_iloc_two_dimensional_generator(self):
df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
indexer = (x for x in [1, 2])
result = df.iloc[indexer, 1]
expected = Series([5, 6], name="b", index=[1, 2])
tm.assert_series_equal(result, expected)


class TestGetitemCallable:
def test_getitem_callable(self, float_frame):
Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -610,6 +610,21 @@ def test_setitem_list_of_tuples(self, float_frame):
expected = Series(tuples, index=float_frame.index, name="tuples")
tm.assert_series_equal(result, expected)

def test_setitem_iloc_generator(self):
# GH#39614
df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
indexer = (x for x in [1, 2])
df.iloc[indexer] = 1
expected = DataFrame({"a": [1, 1, 1], "b": [4, 1, 1]})
tm.assert_frame_equal(df, expected)

def test_setitem_iloc_two_dimensional_generator(self):
df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
indexer = (x for x in [1, 2])
df.iloc[indexer, 1] = 1
expected = DataFrame({"a": [1, 2, 3], "b": [4, 1, 1]})
tm.assert_frame_equal(df, expected)


class TestSetitemTZAwareValues:
@pytest.fixture
Expand Down
6 changes: 6 additions & 0 deletions pandas/tests/series/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -556,6 +556,9 @@ def test_int_key(self, obj, key, expected, val, indexer_sli, is_inplace):
indkey = np.array(ilkey)
self.check_indexer(obj, indkey, expected, val, indexer_sli, is_inplace)

genkey = (x for x in [key])
self.check_indexer(obj, genkey, expected, val, indexer_sli, is_inplace)

def test_slice_key(self, obj, key, expected, val, indexer_sli, is_inplace):
if not isinstance(key, slice):
return
Expand All @@ -570,6 +573,9 @@ def test_slice_key(self, obj, key, expected, val, indexer_sli, is_inplace):
indkey = np.array(ilkey)
self.check_indexer(obj, indkey, expected, val, indexer_sli, is_inplace)

genkey = (x for x in indkey)
self.check_indexer(obj, genkey, expected, val, indexer_sli, is_inplace)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are good, but only cover Series. do we need analogous DataFrame tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phofl i think this was the main outstanding question. if the answer is "no" then this should be good to go (merge master just to be on the safe side)

Copy link
Member Author

@phofl phofl Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long wait. I have more time on my handy in the coming weeks and months again :)

One-dimensional indexers on DataFrames work, but two dimensionals like

df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

indexer = (x for x in [1, 2])
df.iloc[indexer, 1]

still don't, because we access the tuple elements to check things before doing anything, which leaves an empty generator behind. Only way I am seeing currently is converting this directly in setitem and getitem. Thoughts @jbrockmendel ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only way I am seeing currently is converting this directly in setitem and getitem

I expect this is probably right

Copy link
Member Author

@phofl phofl Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Put something up then tomorrow or the day after

def test_mask_key(self, obj, key, expected, val, indexer_sli):
# setitem with boolean mask
mask = np.zeros(obj.shape, dtype=bool)
Expand Down