additional keys in groupby indices when NAs are present #9304

Closed
@josepm

Description

In [386]: h = pd.DataFrame({'a':[1,2,1,np.nan,1], 'b':[1,2,3,3,2], 'c':[2,3,1,4,2]})

In [387]: gh=h.groupby(['a', 'b'])

In [388]: gh.groups.keys()
Out[388]: [(1.0, 2), (nan, 3), (1.0, 3), (1.0, 1), (2.0, 2)]

In [389]: gh.indices.keys()
Out[389]: [(1.0, 2), (1.0, 3), (2.0, 3), (1.0, 1), (2.0, 2)]  # Incorrect

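The tuple (2.0, 3) should not appear in gh.indices: no row has a == 2 and b == 3. It shows up where gh.groups has (nan, 3).
The problem goes away when there are no NAs in the grouping columns.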
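
A small check along these lines may help confirm the report on a given pandas version. It is only a sketch: the helper names `expected_group_keys` and `spurious_index_keys` are made up for illustration and are not part of pandas, and it assumes that rows with an NA in a grouping column should contribute no key.

```python
import numpy as np
import pandas as pd


def expected_group_keys(df, cols):
    """Key tuples taken straight from the rows, skipping rows with an NA key.

    Hypothetical helper, not a pandas API.
    """
    complete = df.dropna(subset=cols)
    return set(complete[cols].itertuples(index=False, name=None))


def spurious_index_keys(df, cols):
    """Keys reported by groupby(cols).indices that no complete row produces.

    Hypothetical helper, not a pandas API.
    """
    reported = set(df.groupby(cols).indices)
    return reported - expected_group_keys(df, cols)


# Same frame as in the report.
h = pd.DataFrame({'a': [1, 2, 1, np.nan, 1],
                  'b': [1, 2, 3, 3, 2],
                  'c': [2, 3, 1, 4, 2]})
```

For this frame, `expected_group_keys(h, ['a', 'b'])` contains (1.0, 1), (2.0, 2), (1.0, 3) and (1.0, 2). Compared against the keys shown in Out[389], the only extra entry is (2.0, 3), so `spurious_index_keys(h, ['a', 'b'])` should be non-empty on an affected version.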

Labels

Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
