-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
API: rename labels to codes in core/groupby #29402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: rename labels to codes in core/groupby #29402
Conversation
@@ -240,9 +240,7 @@ class Grouping: | |||
------- | |||
**Attributes**: | |||
* indices : dict of {group -> index_list} | |||
* labels : ndarray, group labels | |||
* ids : mapping of label -> group | |||
* counts : array of group counts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No attributes on Grouping
are actually named ids
and counts
.
@@ -654,16 +654,16 @@ def value_counts( | |||
rep = partial(np.repeat, repeats=np.add.reduceat(inc, idx)) | |||
|
|||
# multi-index components | |||
labels = list(map(rep, self.grouper.recons_labels)) + [llab(lab, inc)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a list comprehension here
pandas/core/groupby/generic.py
Outdated
@@ -693,14 +693,14 @@ def value_counts( | |||
# for compat. with libgroupby.value_counts need to ensure every | |||
# bin is present at every index level, null filled with zeros | |||
diff = np.zeros(len(out), dtype="bool") | |||
for lab in labels[:-1]: | |||
diff |= np.r_[True, lab[1:] != lab[:-1]] | |||
for codes_ in codes[:-1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code? val?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codes is a list of arrays here, so I can call it level_codes.
pandas/core/groupby/generic.py
Outdated
@@ -710,7 +710,7 @@ def value_counts( | |||
out, left[-1] = out[sorter], left[-1][sorter] | |||
|
|||
# build the multi-index w/ full levels | |||
codes = list(map(lambda lab: np.repeat(lab[diff], nbin), labels[:-1])) | |||
codes = list(map(lambda codes: np.repeat(codes[diff], nbin), codes[:-1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a different name for the lambda arg? also list comprehension pls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR - wasn't aware myself that these were in fact not actually labels
pandas/core/groupby/grouper.py
Outdated
if self._labels is None: | ||
self._make_labels() | ||
return self._labels | ||
def codes(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a good docstring in mind having looked at this would be very helpful for others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto on annotations...for any of these functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typed it up a little, though there's still more to do.
pandas/core/groupby/grouper.py
Outdated
@@ -59,7 +59,7 @@ class Grouper: | |||
<http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_. | |||
axis : number/name of the axis, defaults to 0 | |||
sort : bool, default to False | |||
whether to sort the resulting labels | |||
whether to sort the resulting codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this remain as labels? I think our sorting is actually done that way and not off of any internal codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, I'll revert that. As a sidenote, it sort by calling self.values
so it's very slow. I think it should be changed to be based on codes (future PR, but that breaks the API, which isn't good after 1.0?).
@WillAyd, Thanks. There are a few places where they actually are labels, and I even found one place where it started out as "real" labels and later changed into codes, while still being called labels...not easy to keep track of. I thinkt we should try to get the label names that actually are code arrays to be renamed to codes. For example in |
a23b1ca
to
862d611
Compare
Is this ok to take in, @WillAyd ? |
looks good. can you merge master and ping on green (we just pushed some groupby changes). |
862d611
to
e6b2819
Compare
cd29f5c
to
709439b
Compare
709439b
to
785e2f9
Compare
This PR renames the various
*label*
names in core/groupby to like-named*codes*
.I think the name
label
can be confused by the single values in a index, andcodes
sound smore like an array of ints, so by renaming we get a cleaner nomenclature, IMO.All these attributes/methods are internal, so no deprecations needed.