-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
CLN: In DecimalArray implement aliases as properties #26900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,14 +59,30 @@ def __init__(self, values, dtype=None, copy=False, context=None): | |
values = np.asarray(values, dtype=object) | ||
|
||
self._data = values | ||
# Some aliases for common attribute names to ensure pandas supports | ||
# these | ||
self._items = self.data = self._data | ||
# those aliases are currently not working due to assumptions | ||
# in internal code (GH-20735) | ||
# self._values = self.values = self.data | ||
self._dtype = DecimalDtype(context) | ||
|
||
# Certain pandas function currently refer to these attributes | ||
# expecting them to return a numpy array of values. here we | ||
# just treat them as synonyms for `self._data`. | ||
@property | ||
def _items(self): | ||
return self._data | ||
|
||
@property | ||
def data(self): | ||
return self._data | ||
|
||
# those aliases are currently not working due to assumptions | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't added commented code unless its germane (its not here) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Which assertion do you mean? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There were 2 commented out attributes. I tuned them into properties with the rest, then commented them out to not delete what the author saw fit to leave in. Not sure what to do here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it is fine to keep this in comments. But for briefness, I think it is enough to have only one property (I would take |
||
# in internal code (GH-20735) | ||
# @property | ||
# def _values(self): | ||
# return self._data | ||
|
||
# @property | ||
# def values(self): | ||
# return self._data | ||
# end aliases | ||
|
||
@property | ||
def dtype(self): | ||
return self._dtype | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, so some historical note (not that you should have known, so no blame!): pandas relied in several places on doing things like
getattr(object, 'values')
to ensure to get the underlying ndarray values of an object if it was not yet an ndarray (eg where the object could have been an Index or Series). That's something we don't really like and are trying to get more structure in (but also for us, the pandas code is huge and complex). But so those attributes we added here are rather to ensure that we are not using those wrongly. As from pandas code, we should never directly access the "underlying data" of an EA; all interaction with the EA should go through the official documented interface.So those attributes here were added in order to have an EA exercise the tests that has some additional attributes that are not part of the official interface, to make sure they don't accidentally mess up some functionality covered in the tests.
And since
values
and_values
are still commented out, it seems we were (at the time) not yet sure that it would work, or it probably did not yet work. It would be good (but not here in this PR) to at some point try to fix this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See eg #20735 as one example of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, I see. Not brief, but clear!
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this PR is trying to correct something which was put there as a canary, to catch these bugs? that still doesn't make sense. If you want to catch accesses to these, this is a bad way to do it, since you're relying on the access to have an effect you test for. It won't catch reads using the wrong attribute name for example.
Why not make them properties which throw when accessed?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to to ensure pandas does not try to "access" the value (which this tests are a way to do), but I agree that "setting" the value (which we also don't want pandas to do) can throw an error. So converting to the property instead of attribute is indeed fine.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could the original approach catch reads? OTOH, converting to properties will completely hide the problem even of writes, unless the properties are boobytrapped.
Maybe I still don't understand what you were trying to accomplish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: uncomment the
values
and_values
aliasing, run the decimals tests, and see if any test fail.Tests will be failing, because pandas is assuming it can do
getattr(obj, 'values'
) on some object and expect that it extracts the data of eg a Series, but not the private data of an EA. See #20735 for an example.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. and what about data and _items which weren't commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the tests are working, pandas does not call those (anymore). Or at least not in a way that gives incorrect results.
But we added them here to have that tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the failing test due to _values, and why it's difficult to figure out the cases. This PR is kind of mired in that purgatory, so I think I'll just close. Ok?