BUG: groupby nunique with Categorical and missing categories gives ValueError #11635

Closed
@jorisvandenbossche

Description

From SO: http://stackoverflow.com/questions/33775560/how-to-group-categorical-values-in-pandas

import pandas as pd

df = pd.DataFrame()
df['A'] = ['C1', 'C1', 'C2', 'C2', 'C3', 'C3']
df['B'] = [1,2,3,4,5,6]

df['A'] = df.loc[:,'A'].astype('category')
df2 = df[0:3]

result = df2.groupby(by='A')['B'].nunique()

This worked in 0.16.2 but is broken in 0.17.0, where it raises ValueError: Wrong number of items passed 2, placement implies 3.

It is related to the fact that not all categories are present in the actual values (due to the slicing): df2 still declares the category 'C3', but no row of df2 contains it.

Probably related to the new nunique implementation in 0.17.0 (#10894, #11079).
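To make the mismatch concrete, here is a minimal sketch (not part of the original report) that compares the declared categories with the ones actually present after slicing. The counts 3 and 2 presumably correspond to the numbers in the error message. The second half shows one possible workaround, assuming that dropping the unused category with `Series.cat.remove_unused_categories()` before grouping is acceptable for your use case; whether it avoids the error on 0.17.0 specifically is an assumption, not something confirmed in this issue.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "B": [1, 2, 3, 4, 5, 6],
})
df["A"] = df["A"].astype("category")
df2 = df.iloc[:3]

# Slicing keeps the full category list, even though 'C3' no longer occurs.
declared = list(df2["A"].cat.categories)
present = sorted(set(df2["A"]))
print(declared)  # three declared categories
print(present)   # only two of them occur in the sliced data

# Possible workaround (an assumption, not taken from the issue):
# drop the unused category, then group by the cleaned key.
cleaned = df2["A"].cat.remove_unused_categories()
result = df2["B"].groupby(cleaned).nunique()
print(result)
```

Grouping by the cleaned key yields one row per category that actually occurs, so the number of groups and the number of categories agree.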

Metadata

Labels: Bug, Categorical, Groupby, Regression
