Hello everyone,
I stumbled upon the following behavior of groupby with categoricals, which seems inconsistent with the way groupby usually operates.
When grouping on a string-type column with sort=False, the order of the groups is the order in which the keys first appear in the column.
However, when grouping on a categorical column, the groups always seem to be ordered by the categories, even when sort=False.
import numpy as np
import pandas as pd
d = {'foo': [10, 8, 5, 6, 4, 1, 7], 'bar': [10, 20, 30, 40, 50, 60, 70],
'baz': ['d', 'c', 'e', 'a', 'a', 'd', 'c']}
df = pd.DataFrame(d)
cat = pd.cut(df['foo'], np.linspace(0, 10, 5))
df['range'] = cat
groups = df.groupby('range', sort=True)
# Expected behaviour
result = groups.agg('mean')
# Why are the categories still sorted in this case?
groups2 = df.groupby('range', sort=False)
result2 = groups2.agg('mean')
# I would expect an output like this one: keep the order in which the groups
# are first encountered
groups3 = df.groupby('baz', sort=False)
result3 = groups3.agg('mean')
result
range | bar | foo |
---|---|---|
(0, 2.5] | 60 | 1.0 |
(2.5, 5] | 40 | 4.5 |
(5, 7.5] | 55 | 6.5 |
(7.5, 10] | 15 | 9.0 |
result2
range | bar | foo |
---|---|---|
(0, 2.5] | 60 | 1.0 |
(2.5, 5] | 40 | 4.5 |
(5, 7.5] | 55 | 6.5 |
(7.5, 10] | 15 | 9.0 |
result3
baz | bar | foo |
---|---|---|
d | 35 | 5.5 |
c | 45 | 7.5 |
e | 30 | 5.0 |
a | 45 | 5.0 |
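The comparison between group order and first-appearance order can also be checked programmatically instead of by eye. The following is only a minimal sketch: the helper name `groups_follow_first_appearance` is hypothetical (not part of pandas), and it is demonstrated here on the string column only, since the result for a categorical key depends on the pandas version.

```python
import pandas as pd


def groups_follow_first_appearance(frame, key):
    """Return True if groupby(key, sort=False) yields groups in first-seen order.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Order in which each distinct key first shows up in the column.
    first_seen = list(pd.unique(frame[key]))
    # Order of the group labels produced by groupby with sort=False.
    group_order = list(frame.groupby(key, sort=False).size().index)
    return group_order == first_seen


df = pd.DataFrame({'foo': [10, 8, 5, 6, 4, 1, 7],
                   'baz': ['d', 'c', 'e', 'a', 'a', 'd', 'c']})

# String key: groups are expected to come out as d, c, e, a.
print(groups_follow_first_appearance(df, 'baz'))
```

The same helper could be pointed at the 'range' column; going by the output above, it would presumably report a mismatch on 0.15.1.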
pd.__version__
Out[110]: '0.15.1'
Setting as_index=False does not change the presented behavior.
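In case it helps anyone hitting the same thing, here is a possible workaround sketch, not a fix: it assumes that casting the categorical key to plain strings makes groupby treat it like any other string column, so that sort=False keeps the order of first appearance. The names `key` and `workaround` are just illustrative, and the numeric columns are selected explicitly so that only 'foo' and 'bar' are averaged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [10, 8, 5, 6, 4, 1, 7],
                   'bar': [10, 20, 30, 40, 50, 60, 70]})
df['range'] = pd.cut(df['foo'], np.linspace(0, 10, 5))

# Assumption: a string-typed key is grouped in first-seen order when
# sort=False, so cast the categorical bins to strings before grouping.
key = df['range'].astype(str)

# Average only the numeric columns; the group labels are now strings.
workaround = df.groupby(key, sort=False)[['foo', 'bar']].mean()
print(workaround)
```

The trade-off is that the group labels become plain strings, so the category ordering and any empty bins are lost in the result.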