BUG: sort=False ignored when grouping with a categorical column #8868

Closed
@aimboden

Hello everyone,

I stumbled upon the following behavior of groupby with categoricals, which seems inconsistent with the way groupby usually operates.

When grouping on a string type column with sort=False, the order of the groups is the order in which the keys first appear in the column.

However, when grouping with a categorical column, the groups seem to be always ordered by the categorical, even when sort=False.

import numpy as np
import pandas as pd

d = {'foo': [10, 8, 5, 6, 4, 1, 7], 'bar': [10, 20, 30, 40, 50, 60, 70],
     'baz': ['d', 'c', 'e', 'a', 'a', 'd', 'c']}
df = pd.DataFrame(d)
# Bin 'foo' into four equal-width intervals; the result is categorical
cat = pd.cut(df['foo'], np.linspace(0, 10, 5))
df['range'] = cat
groups = df.groupby('range', sort=True)
# Expected behaviour
result = groups.agg('mean')

# Why are the categories still sorted in this case?
groups2 = df.groupby('range', sort=False)
result2 = groups2.agg('mean')

# I would expect an output like this one: keep the order in which the groups
# are first encountered
groups3 = df.groupby('baz', sort=False)
result3 = groups3.agg('mean')
result
           bar  foo
range
(0, 2.5]    60  1.0
(2.5, 5]    40  4.5
(5, 7.5]    55  6.5
(7.5, 10]   15  9.0
result2
           bar  foo
range
(0, 2.5]    60  1.0
(2.5, 5]    40  4.5
(5, 7.5]    55  6.5
(7.5, 10]   15  9.0
result3
     bar  foo
baz
d     35  5.5
c     45  7.5
e     30  5.0
a     45  5.0
pd.__version__
Out[110]: '0.15.1'

Setting as_index=False does not change the behavior shown above.
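A possible workaround, offered only as a sketch and not as a fix: if string labels are acceptable as group keys, converting the categorical to strings before grouping should give the first-appearance order, since string keys follow that order with sort=False. The names first_seen and labels are illustrative, and the exact label text may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [10, 8, 5, 6, 4, 1, 7],
                   'bar': [10, 20, 30, 40, 50, 60, 70]})
df['range'] = pd.cut(df['foo'], np.linspace(0, 10, 5))

# String version of the bins (assumption: losing the categorical dtype is acceptable)
labels = df['range'].astype(str)

# Order in which each bin first appears in the column
first_seen = labels.drop_duplicates().tolist()

# Grouping on plain strings with sort=False keeps first-appearance order
result = df.groupby(labels, sort=False)[['foo', 'bar']].mean()
print(result)
```

The trade-off is that the result index holds plain strings, so the interval ordering is lost unless the index is converted back to a categorical afterwards.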
