
BUG: groupby.rank with non-unique index groupers #16577

Closed
@jreback

Description


`groupby` combined with `rank` on non-unique groupers that include NaN raises an odd error (the second one below). The real problem is that the result cannot be reindexed properly, which is what the first error reports.
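The underlying failure is the generic one pandas raises whenever it has to reindex an object whose axis contains duplicate labels. A minimal sketch of that behaviour in isolation, using a hypothetical two-row Series `s` rather than the frame from this report. The message text differs between pandas versions, so the sketch only checks the exception type:

```python
import pandas as pd

# Hypothetical two-row Series whose index label "a" appears twice.
s = pd.Series([1.0, 2.0], index=["a", "a"])

try:
    # Reindexing needs a one-to-one label lookup, which a duplicated axis cannot provide.
    s.reindex(["a", "b"])
    raised = False
except ValueError:
    raised = True

print(raised)  # True
```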

```
In [103]: df = DataFrame({'A': [1., 2., 3., np.nan], 'value': 1.}, index=[pd.Timestamp('20170101', tz='US/Eastern')] * 4)

In [104]: df.groupby([df.index, 'A']).value.rank(ascending=True, pct=True)
ValueError: cannot reindex from a duplicate axis
AttributeError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'
```
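Both conditions named in the title can be checked directly on the frame from the report. A minimal sketch that rebuilds it with explicit imports and inspects the groupers; the variable names are illustrative. By default pandas leaves rows with a NaN key out of every group, so four rows yield only three (timestamp, A) groups:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report: four rows sharing one tz-aware timestamp.
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, np.nan], "value": 1.0},
    index=[pd.Timestamp("20170101", tz="US/Eastern")] * 4,
)

# The index grouper is non-unique: every row carries the same label.
index_has_dupes = df.index.has_duplicates

# The row whose 'A' key is NaN is not assigned to any group by default,
# so only three (timestamp, A) groups are formed from four rows.
n_groups = df.groupby([df.index, "A"]).ngroups

print(index_has_dupes, n_groups)
```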

But it works when the timestamps are a column rather than the index:

```
In [105]: df.reset_index().groupby([df.index, 'A']).value.rank(ascending=True, pct=True)
Out[105]:
0    1.0
1    1.0
2    1.0
3    NaN
Name: value, dtype: float64
```
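Until the index-based path works, the column-based call can be wrapped so that the caller still gets a result aligned with the original, duplicated index. A minimal sketch, assuming a single-level index and no existing column called `__key__`; the helper name `rank_with_index_key` and that temporary column name are hypothetical, not pandas API:

```python
import numpy as np
import pandas as pd


def rank_with_index_key(df, by, column, **rank_kwargs):
    """Hypothetical helper: rank `column` within groups of (index, *by)
    by first moving the index into an ordinary column."""
    # Move the (possibly duplicated) index into a column named "__key__";
    # reset_index leaves a unique RangeIndex behind.
    flat = df.rename_axis("__key__").reset_index()
    ranked = flat.groupby(["__key__", *by])[column].rank(**rank_kwargs)
    # rank preserves row order, so the original index can be reattached positionally.
    ranked.index = df.index
    return ranked


# Usage with the frame from the report.
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, np.nan], "value": 1.0},
    index=[pd.Timestamp("20170101", tz="US/Eastern")] * 4,
)
out = rank_with_index_key(df, ["A"], "value", ascending=True, pct=True)
print(out)
```

Moving the key into a column sidesteps any lookup on the duplicated axis; the price is one temporary copy of the frame.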

So there are two interrelated bugs here.
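The issue carries the *Needs Tests* label. A minimal pytest-style regression test sketch, which checks that grouping on the duplicated index gives the same ranks as grouping on the equivalent column; the test name is hypothetical and the test is not taken from the pandas test suite:

```python
import numpy as np
import pandas as pd


def test_groupby_rank_with_nonunique_index_grouper():
    # Frame from the report: duplicated tz-aware index, NaN in one grouping key.
    ts = pd.Timestamp("20170101", tz="US/Eastern")
    df = pd.DataFrame(
        {"A": [1.0, 2.0, 3.0, np.nan], "value": 1.0},
        index=[ts] * 4,
    )

    # Grouping on the (duplicated) index plus a column should not raise.
    result = df.groupby([df.index, "A"])["value"].rank(ascending=True, pct=True)

    # Same grouping, but with the timestamps moved into an ordinary column.
    flat = df.rename_axis("ts").reset_index()
    expected = flat.groupby(["ts", "A"])["value"].rank(ascending=True, pct=True)

    # The result keeps the caller's (duplicated) index ...
    assert result.index.equals(df.index)
    # ... and matches the column-based computation value for value.
    pd.testing.assert_series_equal(
        result.reset_index(drop=True), expected.reset_index(drop=True)
    )
```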

xref to #11759 

Labels: Algos, Groupby, Needs Tests, good first issue