Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2015-03-22", pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3]
})
df1 = df1.set_index("date")  # works as intended
print(df1)
df1 = df1.reset_index()
df1 = df1.set_index(["date", "a"])  # works as intended
print(df1)

df2 = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3]
})
df2 = df2.set_index("date")  # works as intended
print(df2)
df2 = df2.reset_index()
df2 = df2.set_index(["date", "a"])  # date will be converted to NaN here
print(df2)

df3 = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3]
})
# working alternative, but at what cost
df3.index = pd.MultiIndex.from_frame(df3.loc[:, ["date", "a"]])
df3 = df3.drop(columns=["date", "a"])
print(df3)
df3.reset_index()  # ValueError thrown here
```
Python 3.8.3
pandas 1.1.3
This bug does not occur if the column in question contains at least one value other than NaT, nor when only a single column is set as the index.
With df1, everything works as expected.
In df2 (identical to df1 except that "date" contains only pd.NaT values), calling set_index with more than one column converts the "date" level of the resulting MultiIndex to NaN.
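One way to observe the dtype loss is to compare the dtype of the "date" column with the dtype of the "date" level after set_index. The sketch below only inspects and prints; it assumes that a non-datetime level dtype (for example float64 holding NaN) is the symptom described above, and the result is expected to differ between affected and fixed pandas versions.

```python
import pandas as pd

# Same data as df2 in the report: "date" holds only NaT values.
df2 = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3],
})
indexed = df2.set_index(["date", "a"])

# Values of the "date" level, expanded back to one entry per row.
date_level = indexed.index.get_level_values("date")
print("column dtype:", df2["date"].dtype)
print("level dtype: ", date_level.dtype)

# False here would indicate the datetime dtype was lost during set_index,
# as reported for pandas 1.1.3; the values are missing either way.
dtype_preserved = pd.api.types.is_datetime64_any_dtype(date_level)
print("datetime dtype preserved:", dtype_preserved)
```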
With df3 (identical to df2), set_index is bypassed by building the MultiIndex directly with pd.MultiIndex.from_frame, which produces the intended index; however, calling reset_index on it then raises "ValueError: cannot convert float NaN to integer".
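A possible workaround for the df3 case is to avoid letting reset_index reinsert the levels and instead rebuild the columns from MultiIndex.to_frame. This is a sketch under an assumption we have not verified on pandas 1.1.3: that to_frame fills missing codes with NaT and so does not hit the same failing code path. The helper name reset_index_via_to_frame is hypothetical, not part of pandas.

```python
import pandas as pd


def reset_index_via_to_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: move index levels back into columns
    without asking reset_index to reinsert them."""
    # One column per index level, built from the expanded level values.
    levels = df.index.to_frame(index=False)
    # drop=True discards the old index and leaves a default RangeIndex.
    body = df.reset_index(drop=True)
    # Both frames now share a RangeIndex, so they align row by row.
    return pd.concat([levels, body], axis=1)


# df3 from the report: MultiIndex built with from_frame, "date" all NaT.
df3 = pd.DataFrame({
    "date": [pd.NaT, pd.NaT],
    "a": ["a", "aa"],
    "b": [2, 3],
})
df3.index = pd.MultiIndex.from_frame(df3.loc[:, ["date", "a"]])
df3 = df3.drop(columns=["date", "a"])

restored = reset_index_via_to_frame(df3)
print(restored)
print(restored.dtypes)
```

Unlike reset_index, this helper names unnamed levels 0, 1, ... rather than level_0, level_1, ..., and it does not guard against level names that clash with existing column names.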