BUG: ensure reindex / getitem to select columns properly copies data for extension dtypes #51197
I encountered this while writing more tests for Copy-on-Write. Currently, the general rule is that selecting columns with a list-like indexer using getitem gives a copy:
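A minimal sketch of that rule, assuming a small made-up frame with NumPy-backed `float64` columns. The names `df` and `subset` follow the description; the data and the warning filter are illustrative assumptions, not part of the original report.

```python
import warnings

import pandas as pd

# Hypothetical data; both columns use the NumPy float64 dtype.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, dtype="float64")

# Selecting columns with a list-like indexer via getitem.
subset = df[["a"]]

with warnings.catch_warnings():
    # Some pandas versions may emit SettingWithCopyWarning here; silence it for the demo.
    warnings.simplefilter("ignore")
    subset.iloc[0, 0] = 100.0

# For NumPy dtypes the parent is not affected by the write into subset.
parent_value = float(df.loc[0, "a"])
print(parent_value)
```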
However, that doesn't seem to be the case when the columns we select are extension dtypes. When using `dtype="Float64"` in the above example, the original `df` gets updated because `subset` isn't a copy.

While I am not sure this is an explicitly documented rule (AFAIK this is de-facto behaviour, and only described as such in the discussions related to copy/view and CoW), I do think it would be expected that extension dtypes behave the same as numpy dtypes on this front.
(It also makes writing tests for copy/view behaviour harder if the behaviour depends not only on whether CoW is enabled, but also on numpy vs extension dtypes. This is where I encountered the issue.)
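One way to probe this across dtypes is a small check that writes into the selection and reports whether the parent changed. This is only a sketch: `parent_updated_by_subset_write` is a hypothetical helper, not part of pandas or its test suite, and the result for `"Float64"` depends on the pandas version and on whether this fix is included.

```python
import warnings

import pandas as pd


def parent_updated_by_subset_write(dtype):
    """Return True if writing into df[["a"]] also changes df (hypothetical helper)."""
    df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, dtype=dtype)
    subset = df[["a"]]
    with warnings.catch_warnings():
        # Silence a possible SettingWithCopyWarning on some pandas versions.
        warnings.simplefilter("ignore")
        subset.iloc[0, 0] = 100.0
    return bool(df.loc[0, "a"] == 100.0)


# Compare a NumPy dtype with the corresponding nullable extension dtype.
results = {
    dtype: parent_updated_by_subset_write(dtype) for dtype in ["float64", "Float64"]
}
print(results)
```

If extension dtypes follow the same rule as NumPy dtypes, both entries should be `False`.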
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.