BUG: Setting a MultiIndex through set_index where one of the columns contains exclusively NaTs converts them to NaNs #38025

Closed
@Xnot

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


import pandas as pd

df1 = pd.DataFrame({
    "date": ["2015-03-22", pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3]
})

df1 = df1.set_index("date") # works as intended
print(df1)
df1 = df1.reset_index()
df1 = df1.set_index(["date", "a"]) # works as intended
print(df1)

df2 = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3]
})

df2 = df2.set_index("date") # works as intended
print(df2)
df2 = df2.reset_index()
df2 = df2.set_index(["date", "a"]) # date will be converted to NaN here
print(df2)

df3 = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3]
})

# working alternative, but at what cost
df3.index = pd.MultiIndex.from_frame(df3.loc[:, ["date", "a"]])
df3 = df3.drop(columns=["date", "a"]) 
print(df3)
df3.reset_index() # ValueError thrown here

python 3.8.3
pandas 1.1.3

The bug does not occur if the column in question contains any value other than NaT, or if only a single column is set as the index.

With df1, everything works as expected.

In df2 (identical to df1, except that "date" contains exclusively pd.NaT values), calling set_index with more than one column converts the date level of the resulting MultiIndex to NaNs.
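The conversion can be observed directly by comparing the dtype of the "date" column with the dtype of the "date" level after set_index. This is a minimal sketch; the helper name date_level_dtypes is hypothetical and not part of pandas, and the printed result depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd


def date_level_dtypes(frame):
    """Return (column dtype, index-level dtype) for the 'date' column.

    Hypothetical helper for illustration only; not a pandas API.
    """
    before = frame["date"].dtype
    indexed = frame.set_index(["date", "a"])
    after = indexed.index.get_level_values("date").dtype
    return before, after


df = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3],
})

before, after = date_level_dtypes(df)
print("column dtype:", before)
print("level dtype: ", after)
# False here would indicate that the NaT values were converted
print("dtype preserved:", before == after)
```

If the last line prints False, the installed version shows the behavior described in this report.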

With df3 (identical to df2), set_index is circumvented and the MultiIndex is set as intended. However, this leads to another issue: calling reset_index raises "ValueError: cannot convert float NaN to integer".
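To check whether a given pandas installation is affected by the reset_index failure, the round trip can be wrapped in a try/except. This is a sketch under the same setup as df3; the helper name try_reset_index is hypothetical, and which branch runs depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3],
})

# Build the MultiIndex directly from the frame, bypassing set_index.
df.index = pd.MultiIndex.from_frame(df[["date", "a"]])
df = df.drop(columns=["date", "a"])


def try_reset_index(frame):
    """Return (restored frame or None, error message or None).

    Hypothetical helper for illustration only; not a pandas API.
    """
    try:
        return frame.reset_index(), None
    except ValueError as exc:
        return None, str(exc)


restored, error = try_reset_index(df)
if error is None:
    print("reset_index succeeded; columns:", list(restored.columns))
else:
    print("reset_index failed:", error)
```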

Labels

  • Indexing: Related to indexing on series/frames, not to indexes themselves

  • Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

  • Needs Tests: Unit test(s) needed to prevent regressions

  • good first issue
