Skip to content

CLN: In DecimalArray implement aliases as properties #26900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 22 additions & 6 deletions pandas/tests/extension/decimal/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,14 +59,30 @@ def __init__(self, values, dtype=None, copy=False, context=None):
values = np.asarray(values, dtype=object)

self._data = values
# Some aliases for common attribute names to ensure pandas supports
# these
self._items = self.data = self._data
# those aliases are currently not working due to assumptions
# in internal code (GH-20735)
# self._values = self.values = self.data
self._dtype = DecimalDtype(context)

# Certain pandas function currently refer to these attributes
# expecting them to return a numpy array of values. here we
# just treat them as synonyms for `self._data`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, so some historical note (not that you should have known, so no blame!): pandas relied in several places on doing things like getattr(object, 'values') to ensure to get the underlying ndarray values of an object if it was not yet an ndarray (eg where the object could have been an Index or Series). That's something we don't really like and are trying to get more structure in (but also for us, the pandas code is huge and complex). But so those attributes we added here are rather to ensure that we are not using those wrongly. As from pandas code, we should never directly access the "underlying data" of an EA; all interaction with the EA should go through the official documented interface.

So those attributes here were added in order to have an EA exercise the tests that has some additional attributes that are not part of the official interface, to make sure they don't accidentally mess up some functionality covered in the tests.

And since values and _values are still commented out, it seems we were (at the time) not yet sure that it would work, or it probably did not yet work. It would be good (but not here in this PR) to at some point try to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See eg #20735 as one example of this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, I see. Not brief, but clear!

Copy link
Author

@ghost ghost Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this PR is trying to correct something which was put there as a canary, to catch these bugs? that still doesn't make sense. If you want to catch accesses to these, this is a bad way to do it, since you're relying on the access to have an effect you test for. It won't catch reads using the wrong attribute name for example.

Why not make them properties which throw when accessed?

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to to ensure pandas does not try to "access" the value (which this tests are a way to do), but I agree that "setting" the value (which we also don't want pandas to do) can throw an error. So converting to the property instead of attribute is indeed fine.

Copy link
Author

@ghost ghost Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could the original approach catch reads? OTOH, converting to properties will completely hide the problem even of writes, unless the properties are boobytrapped.

Maybe I still don't understand what you were trying to accomplish.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: uncomment the values and _values aliasing, run the decimals tests, and see if any test fail.

Tests will be failing, because pandas is assuming it can do getattr(obj, 'values') on some object and expect that it extracts the data of eg a Series, but not the private data of an EA. See #20735 for an example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. and what about data and _items which weren't commented out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the tests are working, pandas does not call those (anymore). Or at least not in a way that gives incorrect results.
But we added them here to have that tested.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the failing test due to _values, and why it's difficult to figure out the cases. This PR is kind of mired in that purgatory, so I think I'll just close. Ok?

@property
def _items(self):
return self._data

@property
def data(self):
return self._data

# those aliases are currently not working due to assumptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't added commented code unless its germane (its not here)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which assertion do you mean?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were 2 commented out attributes. I tuned them into properties with the rest, then commented them out to not delete what the author saw fit to leave in. Not sure what to do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is fine to keep this in comments.

But for briefness, I think it is enough to have only one property (I would take data) and then you can simply do items = data as alias, and keep the values and _values in a similar way as alias in comments.

# in internal code (GH-20735)
# @property
# def _values(self):
# return self._data

# @property
# def values(self):
# return self._data
# end aliases

@property
def dtype(self):
return self._dtype
Expand Down