MultiIndex.get_level_values() replaces NA by another value #5074

Test case:

```
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: index = pd.MultiIndex.from_arrays([
    ['a', 'b', 'b'],
    [1, np.nan, 2]
])

In [4]: index.get_level_values(1)
Out[4]: Float64Index([1.0, 2.0, 2.0], dtype=object)
```

The expected output is
`Float64Index([1.0, nan, 2.0], dtype=object)`

This happens because NA values are not stored in the MultiIndex levels; instead, the corresponding label is set to -1. When `get_level_values()` then uses the labels as positional indexes into the level values, that -1 selects the last (non-null) value.
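To make the mechanism concrete, here is a minimal plain-Python sketch. The names `level_values` and `labels` are illustrative stand-ins for the internal data of one level, not the actual pandas attributes, and the real code indexes arrays rather than lists.

```python
# Illustrative stand-ins for one MultiIndex level (not pandas internals).
level_values = [1.0, 2.0]   # unique non-null values stored for the level
labels = [0, -1, 1]         # per-row position into level_values; -1 marks NA

# Using the labels directly as indexes: -1 wraps around to the last value.
naive = [level_values[i] for i in labels]
print(naive)  # [1.0, 2.0, 2.0]
```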

I tried to fix this by appending an NA to the values if -1 is in the labels:
https://github.com/goyodiaz/pandas/commit/f028513ad96a
It still needs to be improved so that it returns the proper NA value (NaN, None, maybe NaT?) depending on the index type. Does this approach make sense?
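The idea can be sketched in plain Python as follows. This is not the code from the linked commit; `take_with_na` and its `na_value` parameter are hypothetical names used only for illustration, and the caller is assumed to pick the NA value that suits the level's type.

```python
import math

def take_with_na(level_values, labels, na_value=float("nan")):
    """Hypothetical helper: look up labels so that -1 yields na_value."""
    padded = list(level_values)
    if -1 in labels:
        # Append the NA marker so that index -1 (the last slot) points at it.
        padded.append(na_value)
    return [padded[i] for i in labels]

floats = take_with_na([1.0, 2.0], [0, -1, 1])
print(floats)  # [1.0, nan, 2.0]

# For an object level, None may be the more natural NA value.
objects = take_with_na(["a", "b"], [1, -1], na_value=None)
print(objects)  # ['b', None]
```

Padding only when -1 is present leaves levels without missing values untouched.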

