MultiIndex.get_level_values() replaces NA by another value #5074

Closed
@goyodiaz

Test case:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: index = pd.MultiIndex.from_arrays([
    ['a', 'b', 'b'],
    [1, np.nan, 2]
])

In [4]: index.get_level_values(1)
Out[4]: Float64Index([1.0, 2.0, 2.0], dtype=object)

The expected output is
Float64Index([1.0, nan, 2.0], dtype=object)

This happens because NA values are not stored in the MultiIndex levels; instead, the corresponding label is set to -1. When get_level_values() then uses the labels as positions into the level values, that -1 selects the last (non-null) value.
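The mechanism can be reproduced outside pandas. This is only a sketch with plain NumPy arrays standing in for the level values and labels (the names `levels` and `labels` are chosen for illustration and are not the pandas internals themselves):

```python
import numpy as np

# Stand-ins for the second level of the MultiIndex in the test case.
levels = np.array([1.0, 2.0])   # unique non-null values; NaN is not stored here
labels = np.array([0, -1, 1])   # positions into `levels`; -1 marks the missing entry

# Taking by position treats -1 as "last element", so the missing entry
# silently becomes 2.0 instead of NaN.
result = levels.take(labels)
print(result)  # [1. 2. 2.]
```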

I tried to fix this by appending an NA to the values if -1 is in the labels.
https://github.com/goyodiaz/pandas/commit/f028513ad96a
It needs to be improved so that it returns the proper NA value (NaN, None, maybe NaT?) depending on the index type. Does this approach make sense?
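To make the idea concrete, here is a rough sketch of the approach using plain NumPy. It is not the code in the linked commit; `na_for` and `take_with_na` are hypothetical helper names, and the mapping from dtype to NA value is an assumption about what "proper" could mean:

```python
import numpy as np

def na_for(values):
    """Hypothetical helper: pick a missing-value marker that fits the dtype."""
    kind = values.dtype.kind
    if kind == "f":
        return np.nan                  # float levels
    if kind == "M":
        return np.datetime64("NaT")    # datetime levels
    return None                        # everything else falls back to None

def take_with_na(values, labels):
    """Take `values` at `labels`, mapping label -1 to an NA value."""
    if not (labels == -1).any():
        return values.take(labels)
    na = na_for(values)
    if na is None:
        # None can only live in an object array.
        values = values.astype(object)
    # With one extra slot at the end, position -1 now points at the NA.
    padded = np.append(values, na)
    return padded.take(labels)

out = take_with_na(np.array([1.0, 2.0]), np.array([0, -1, 1]))
print(out)  # [ 1. nan  2.]
```

Integer levels have no native NA in this sketch, so they are cast to object and get None; whether that is the right choice for an index is part of the open question.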

Labels: Bug, Indexing, Missing-data
