BUG: has_duplicates misbehaves when multiindex has a NaN #5873

Closed
@felixlawrence

When at least one entry of a MultiIndex contains a NaN, has_duplicates returns True even though no entry is repeated:

>>> idx = pd.MultiIndex.from_arrays([[101, 102], [3.5, np.nan]])
>>> idx
MultiIndex
[(101, 3.5), (102, nan)]
>>> idx.has_duplicates
True
>>> idx.get_duplicates()
[]

I would expect has_duplicates to return False here, because 102 is not the same as 101.

I would also expect it to return False for the MultiIndex

MultiIndex
[(101, 3.5), (101, nan)]

since 3.5 != NaN, but this case is more debatable.
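
To make the two possible readings of the debatable case concrete, here is a minimal pure-Python sketch. It is not how pandas implements has_duplicates; `has_duplicate_rows`, `_key`, and the `nan_equal` flag are hypothetical names made up for illustration. The idea is that an index entry should only count as a duplicate if the whole tuple repeats, with a switch for whether two NaNs are considered the same value.

```python
import math


def _key(value, position, nan_equal):
    """Map one value to a hashable key (hypothetical helper).

    If nan_equal is True, every NaN maps to the same key, so two NaNs match.
    If nan_equal is False, each NaN gets a key unique to its position,
    so a NaN never matches anything, not even another NaN.
    """
    if isinstance(value, float) and math.isnan(value):
        return ("nan",) if nan_equal else ("nan", position)
    return ("val", value)


def has_duplicate_rows(rows, nan_equal=True):
    """Return True only if some complete tuple in `rows` occurs more than once."""
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(_key(v, (i, j), nan_equal) for j, v in enumerate(row))
        if key in seen:
            return True
        seen.add(key)
    return False


nan = float("nan")
print(has_duplicate_rows([(101, 3.5), (102, nan)]))                   # False
print(has_duplicate_rows([(101, 3.5), (101, nan)]))                   # False
print(has_duplicate_rows([(101, nan), (101, nan)]))                   # True
print(has_duplicate_rows([(101, nan), (101, nan)], nan_equal=False))  # False
```

Under either setting of `nan_equal`, both indexes shown above come out as duplicate-free; the flag only matters when two entries agree everywhere except that both hold a NaN in the same position.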

This is important because you can't call .unstack() on a Series whose MultiIndex reports has_duplicates as True, even if the MultiIndex has many levels and the levels containing the NaN(s) are not involved in the operation.

This is with pandas 0.12.0.
