ENH add cumcount groupby method #5510
@@ -2560,6 +2560,57 @@ def test_groupby_with_empty(self):
         grouped = series.groupby(grouper)
         assert next(iter(grouped), None) is None
 
+    def test_cumcount(self):
+        df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'])
+        g = df.groupby('A')
+        sg = g.A
+
+        expected = Series([0, 1, 2, 0, 3])
+
+        assert_series_equal(expected, g.cumcount())
+        assert_series_equal(expected, sg.cumcount())
maybe test what happens with an empty DataFrame? a grouped Series? cumcount on something that's not a column (i.e., passed into the object)? and maybe one different dtype for good measure?
yup, tests are incredibly light. sg is a grouped Series. Will add empty, it does work. Not sure what you mean by not a column.... :S
not that that matters for your implementation, but might be good to have if we replace with something faster for some reason
replaced with something faster, and added this test.
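For anyone following the thread: the expected values in these tests (`[0, 1, 2, 0, 3]`) follow from numbering each row by how many earlier rows share its group key, regardless of whether the key is a column or a list passed in. Below is a minimal pure-Python sketch of that semantics. It is not the implementation in this PR, and `cumcount_reference` is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict


def cumcount_reference(keys):
    """Return, for each key, the 0-based count of earlier equal keys.

    Hypothetical reference helper: it mirrors the values the tests
    expect from cumcount(), but ignores the index entirely.
    """
    seen = defaultdict(int)  # key -> number of times seen so far
    result = []
    for key in keys:
        result.append(seen[key])
        seen[key] += 1
    return result


# Same grouping keys as the 'A' column in the tests.
print(cumcount_reference(['a', 'a', 'a', 'b', 'a']))  # [0, 1, 2, 0, 3]
# Same keys as the "not a column" case, where a list is passed in.
print(cumcount_reference([0, 0, 0, 1, 0]))            # [0, 1, 2, 0, 3]
# Empty input yields an empty result.
print(cumcount_reference([]))                         # []
```

A reference like this could serve as an oracle if the vectorised version is swapped out later, which is the concern raised above.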
+
+    def test_cumcount_empty(self):
+        ge = DataFrame().groupby()
+        se = Series().groupby()
+
+        e = Series(dtype='int')  # edge case, as this is usually considered float
+
+        assert_series_equal(e, ge.cumcount())
+        assert_series_equal(e, se.cumcount())
+
+    def test_cumcount_dupe_index(self):
+        df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], index=[0] * 5)
+        g = df.groupby('A')
+        sg = g.A
+
+        expected = Series([0, 1, 2, 0, 3], index=[0] * 5)
+
+        assert_series_equal(expected, g.cumcount())
+        assert_series_equal(expected, sg.cumcount())
+
+    def test_cumcount_mi(self):
+        mi = MultiIndex.from_tuples([[0, 1], [1, 2], [2, 2], [2, 2], [1, 0]])
+        df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], index=mi)
+        g = df.groupby('A')
+        sg = g.A
+
+        expected = Series([0, 1, 2, 0, 3], index=mi)
+
+        assert_series_equal(expected, g.cumcount())
+        assert_series_equal(expected, sg.cumcount())
+
+    def test_cumcount_groupby_not_col(self):
+        df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], index=[0] * 5)
+        g = df.groupby([0, 0, 0, 1, 0])
+        sg = g.A
+
+        expected = Series([0, 1, 2, 0, 3], index=[0] * 5)
+
+        assert_series_equal(expected, g.cumcount())
+        assert_series_equal(expected, sg.cumcount())
+
+
     def test_filter_series(self):
         import pandas as pd
         s = pd.Series([1, 3, 20, 5, 22, 24, 7])
@@ -3180,7 +3231,7 @@ def test_tab_completion(self):
             'min','name','ngroups','nth','ohlc','plot', 'prod',
             'size','std','sum','transform','var', 'count', 'head', 'describe',
             'cummax', 'dtype', 'quantile', 'rank', 'cumprod', 'tail',
-            'resample', 'cummin', 'fillna', 'cumsum'])
+            'resample', 'cummin', 'fillna', 'cumsum', 'cumcount'])
         self.assertEqual(results, expected)
 
 def assert_fp_equal(a, b):
mind removing this blank line if you're editing this?
I added it in specifically, pep8 says it should be there, right?
okay, no idea.