CLN: Ensure that setitem ops don't coerce values #51671

phofl · 2023-02-27T11:45:48Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

The goal is to ensure that we don't accidentally coerce dtypes in the future. We shouldn't reach this code path with a DataFrame if we can avoid it. The behavior of the single_block case was already changed a couple of weeks ago.

jbrockmendel · 2023-02-27T18:51:06Z

pandas/tests/frame/indexing/test_insert.py

@@ -100,6 +100,6 @@ def test_insert_frame(self):
        # GH#42403
        df = DataFrame({"col1": [1, 2], "col2": [3, 4]})

-        msg = r"Expected a 1D array, got an array with shape \(2, 2\)"
+        msg = "Expected a one-dimensional object, got a DataFrame with 2 instead."


"2" here sounds like it is referring to 2 dimensions, which is accurate but, I think, not intended.

Yikes, yes, I forgot to add "columns" to the message, thanks.

Updated, otherwise ok?

I think making this stricter helps in the long run
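For readers following along, here is a minimal sketch of the behavior the updated test exercises: inserting a two-column DataFrame as a single column is rejected. The helper name `insert_frame_error` is hypothetical and not part of pandas or this PR. The exact wording of the message depends on the installed pandas version, so the sketch only captures it rather than asserting a specific string.

```python
import pandas as pd


def insert_frame_error(frame):
    """Hypothetical helper: try to insert a whole frame as one column.

    Returns the ValueError message, or None if no ValueError was raised.
    """
    try:
        # A 2-column DataFrame cannot be stored as a single column.
        frame.insert(1, "newcol", frame)
    except ValueError as err:
        return str(err)
    return None


df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
message = insert_frame_error(df)
print(message)  # wording differs between pandas versions
```

With the message fixed as discussed above, the "2" would be read as a column count rather than a number of dimensions.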

jbrockmendel · 2023-02-27T21:33:51Z

pandas/core/frame.py

-            return _reindex_for_setitem(value, self.index)
-        elif is_dict_like(value):
+        # Using a DataFrame would mean coercing values to one dtype
+        assert not isinstance(value, DataFrame)


Are there no other cases that can be affected by disallowing this here?

None that I can think of or find. isetitem iterates over the columns now, so I don't think we can get here with a DataFrame anymore.
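As a rough illustration of the code comment in the diff ("Using a DataFrame would mean coercing values to one dtype"), the sketch below contrasts converting a mixed-dtype frame to a single array with iterating over its columns. It uses only public pandas API and does not reproduce the internal code path touched by this PR; the variable names are made up for the example.

```python
import pandas as pd

value = pd.DataFrame({"ints": [1, 2], "floats": [1.5, 2.5]})

# Treating the whole frame as one array forces a single common dtype,
# so the integer column is upcast to float64.
as_array = value.to_numpy()
print(as_array.dtype)

# Iterating column by column keeps each column's own dtype.
per_column = {name: str(col.dtype) for name, col in value.items()}
print(per_column)
```

This is why handling the frame column by column avoids the accidental coercion that the assertion is meant to guard against.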

Thanks for checking.

Not for this PR, but some things I saw while double-checking:

In the isinstance(value, DataFrame) case of isetitem, should we check that len(loc) == len(value.columns)?
_iset_item is only called from _replace_columnwise, where we know we have a Series with a matching index, so it might be simpler to skip sanitize_column?
It might make sense to refactor _set_item back into __setitem__, as the latter isn't that complicated and the former has a name very similar to a bunch of other names.

Yes we should probably raise if they don't match

Should be faster at least

Makes setitem pretty long, but shouldn't really hurt?
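The first follow-up idea (checking that len(loc) == len(value.columns) before setting) could look roughly like the sketch below. `isetitem_checked` is a hypothetical wrapper around the public `DataFrame.isetitem` (available in pandas 1.5 and later), written only to show the shape of the check; it is not the change made in this PR or in pandas itself, and the error text is invented.

```python
import pandas as pd


def isetitem_checked(df, loc, value):
    """Hypothetical wrapper: validate column counts, then call isetitem."""
    if isinstance(value, pd.DataFrame):
        locs = [loc] if isinstance(loc, int) else list(loc)
        if len(locs) != len(value.columns):
            # Illustrative message, not taken from pandas.
            raise ValueError(
                f"Got {len(locs)} positions but value has "
                f"{len(value.columns)} columns."
            )
    df.isetitem(loc, value)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
new = pd.DataFrame({"x": [10, 20], "y": [30, 40]})

# Matching counts: columns at positions 0 and 1 are replaced.
isetitem_checked(df, [0, 1], new)

# Mismatched counts: the wrapper raises before touching the frame.
try:
    isetitem_checked(df, [0], new)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Raising early, as agreed in the replies above, keeps a mismatch from silently setting only some of the columns.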

jbrockmendel

LGTM

CLN: Ensure that setitem ops don't coerce values

99e4a77

phofl added the Refactor (Internal refactoring of code) label Feb 27, 2023

jbrockmendel reviewed Feb 27, 2023

Fix message

3365a61

jbrockmendel reviewed Feb 27, 2023

jbrockmendel approved these changes Feb 28, 2023

phofl added this to the 2.1 milestone Feb 28, 2023

phofl added the Indexing (Related to indexing on series/frames, not to indexes themselves) label Feb 28, 2023

phofl merged commit 7e410c1 into pandas-dev:main Feb 28, 2023

phofl deleted the indexing_coercion branch February 28, 2023 10:24
