Skip to content

BUG: Fixed assign failure when with Copy-on-Write #60941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ Other enhancements
updated to work correctly with NumPy >= 2 (:issue:`57739`)
- :meth:`Series.str.decode` result now has ``StringDtype`` when ``future.infer_string`` is True (:issue:`60709`)
- :meth:`~Series.to_hdf` and :meth:`~DataFrame.to_hdf` now round-trip with ``StringDtype`` (:issue:`60663`)
- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option (:issue:`60309`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give a little more detail here. Perhaps adding

after subsetting the columns of a DataFrame and using a slice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, Solved in 725b41b

- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns when backed by PyArrow (:issue:`60633`)
- The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`)

Expand Down
7 changes: 6 additions & 1 deletion pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -572,7 +572,12 @@ def setitem(self, indexer, value) -> Self:
0, blk_loc, values
)
# first block equals values
self.blocks[0].setitem((indexer[0], np.arange(len(blk_loc))), value)
col_indexer: slice | np.ndarray
if isinstance(indexer[1], slice) and indexer[1] == slice(None):
col_indexer = slice(None)
else:
col_indexer = np.arange(len(blk_loc))
self.blocks[0].setitem((indexer[0], col_indexer), value)
Comment on lines +576 to +580
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why it's correct to pass through slice(None) but not other cases, e.g. slice(0, 3, 2) or [0, 1].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. My current changes failed under the following test case. I'll mark the PR as a draft until I properly handle the bug and will also add more test cases. Thanks!

dftest3 = pd.DataFrame(
    {"A": [1, 4, 1, 5], "B": [2, 5, 2, 6], "C": [3, 6, 1, 7], "D": [8, 9, 10, 11]}
)
df3 = dftest3[["B", "C", "D"]]
df3.iloc[[1, 3], 0:3:2] = [[2, 2], [2, 2]]

return self
# No need to split if we either set all columns or on a single block
# manager
Expand Down
24 changes: 24 additions & 0 deletions pandas/tests/indexing/test_iloc.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
date_range,
interval_range,
isna,
option_context,
to_datetime,
)
import pandas._testing as tm
Expand Down Expand Up @@ -1448,3 +1449,26 @@ def test_iloc_nullable_int64_size_1_nan(self):
result = DataFrame({"a": ["test"], "b": [np.nan]})
with pytest.raises(TypeError, match="Invalid value"):
result.loc[:, "b"] = result.loc[:, "b"].astype("Int64")

def test_iloc_setitem_list_with_cow(self):
# GH#60309
with option_context("mode.copy_on_write", True):
dftest = DataFrame(
{"A": [1, 4, 1, 5], "B": [2, 5, 2, 6], "C": [3, 6, 1, 7]}
)
df = dftest[["B", "C"]]

# Perform the iloc operation
df.iloc[[1, 3], :] = [[2, 2], [2, 2]]

# Check that original DataFrame is unchanged
expected_orig = DataFrame(
{"A": [1, 4, 1, 5], "B": [2, 5, 2, 6], "C": [3, 6, 1, 7]}
)
tm.assert_frame_equal(dftest, expected_orig)

# Check that view is modified correctly
expected_view = DataFrame(
{"B": [2, 2, 2, 2], "C": [3, 2, 1, 2]}, index=df.index
)
tm.assert_frame_equal(df, expected_view)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Once the assignment is done, this is no longer a view. Can you change to just "df" instead of "view".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sovled in fcffbf2

Loading