Skip to content

API: rename labels to codes in core/groupby #29402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 7, 2019

Conversation

topper-123
Copy link
Contributor

This PR renames the various *label* names in core/groupby to like-named *codes*.

I think the name label can be confused by the single values in a index, and codes sound smore like an array of ints, so by renaming we get a cleaner nomenclature, IMO.

All these attributes/methods are internal, so no deprecations needed.

@@ -240,9 +240,7 @@ class Grouping:
-------
**Attributes**:
* indices : dict of {group -> index_list}
* labels : ndarray, group labels
* ids : mapping of label -> group
* counts : array of group counts
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No attributes on Grouping are actually named ids and counts.

@@ -654,16 +654,16 @@ def value_counts(
rep = partial(np.repeat, repeats=np.add.reduceat(inc, idx))

# multi-index components
labels = list(map(rep, self.grouper.recons_labels)) + [llab(lab, inc)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use a list comprehension here

@@ -693,14 +693,14 @@ def value_counts(
# for compat. with libgroupby.value_counts need to ensure every
# bin is present at every index level, null filled with zeros
diff = np.zeros(len(out), dtype="bool")
for lab in labels[:-1]:
diff |= np.r_[True, lab[1:] != lab[:-1]]
for codes_ in codes[:-1]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code? val?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codes is a list of arrays here, so I can call it level_codes.

@@ -710,7 +710,7 @@ def value_counts(
out, left[-1] = out[sorter], left[-1][sorter]

# build the multi-index w/ full levels
codes = list(map(lambda lab: np.repeat(lab[diff], nbin), labels[:-1]))
codes = list(map(lambda codes: np.repeat(codes[diff], nbin), codes[:-1]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use a different name for the lambda arg? also list comprehension pls

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR - wasn't aware myself that these were in fact not actually labels

if self._labels is None:
self._make_labels()
return self._labels
def codes(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a good docstring in mind having looked at this would be very helpful for others

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on annotations...for any of these functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typed it up a little, though there's still more to do.

@@ -59,7 +59,7 @@ class Grouper:
<http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_.
axis : number/name of the axis, defaults to 0
sort : bool, default to False
whether to sort the resulting labels
whether to sort the resulting codes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this remain as labels? I think our sorting is actually done that way and not off of any internal codes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, I'll revert that. As a sidenote, it sort by calling self.values so it's very slow. I think it should be changed to be based on codes (future PR, but that breaks the API, which isn't good after 1.0?).

@topper-123
Copy link
Contributor Author

topper-123 commented Nov 5, 2019

@WillAyd, Thanks. There are a few places where they actually are labels, and I even found one place where it started out as "real" labels and later changed into codes, while still being called labels...not easy to keep track of.

I thinkt we should try to get the label names that actually are code arrays to be renamed to codes. For example in pd.factorize the labels name is still used.

@topper-123 topper-123 added this to the 1.0 milestone Nov 5, 2019
@topper-123 topper-123 force-pushed the rename_groupby_labels branch 2 times, most recently from a23b1ca to 862d611 Compare November 5, 2019 08:16
@topper-123
Copy link
Contributor Author

Is this ok to take in, @WillAyd ?

@jreback
Copy link
Contributor

jreback commented Nov 6, 2019

looks good. can you merge master and ping on green (we just pushed some groupby changes).

@topper-123 topper-123 force-pushed the rename_groupby_labels branch from 862d611 to e6b2819 Compare November 6, 2019 21:18
@topper-123 topper-123 force-pushed the rename_groupby_labels branch 2 times, most recently from cd29f5c to 709439b Compare November 6, 2019 23:39
@topper-123 topper-123 force-pushed the rename_groupby_labels branch from 709439b to 785e2f9 Compare November 7, 2019 00:29
@topper-123 topper-123 merged commit e131b21 into pandas-dev:master Nov 7, 2019
@topper-123 topper-123 deleted the rename_groupby_labels branch November 7, 2019 01:34
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants