Description
When a MultiIndex level contains NaN values, reset_index replaces them with incorrect values: in the example below, every NaN in the 'second' level comes back as 'f'. This happens even when reset_index is applied to a different level than the one containing the NaN values:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({
   ...:     'col1' : [1,2,3,4,5,6,7,8],
   ...:     'col2' : [8,7,6,5,4,3,2,1],
   ...: })
In [4]: arrays = [
   ...:     ['a','a','a','a','b','b','b','b'],
   ...:     ['c',None,'d',None,'e','e',None,'f']
   ...: ]
In [6]: idx = pd.MultiIndex.from_tuples(zip(*arrays),
   ...:                                 names=['first', 'second'])
In [7]: df.index = idx
In [8]: df
Out[8]:
              col1  col2
first second
a     c          1     8
      NaN        2     7
      d          3     6
      NaN        4     5
b     e          5     4
      e          6     3
      NaN        7     2
      f          8     1
In [9]: df.reset_index()
Out[9]:
  first second  col1  col2
0     a      c     1     8
1     a      f     2     7
2     a      d     3     6
3     a      f     4     5
4     b      e     5     4
5     b      e     6     3
6     b      f     7     2
7     b      f     8     1
In [10]: df.reset_index('second')
Out[10]:
      second  col1  col2
first
a          c     1     8
a          f     2     7
a          d     3     6
a          f     4     5
b          e     5     4
b          e     6     3
b          f     7     2
b          f     8     1
In [11]: df.reset_index('first')
Out[11]:
       first  col1  col2
second
c          a     1     8
f          a     2     7
d          a     3     6
f          a     4     5
e          b     5     4
e          b     6     3
f          b     7     2
f          b     8     1
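The session above could be condensed into a script that checks whether the missing entries in 'second' survive each reset_index call. This is only a sketch: `missing_mask_after_reset` and `expected_missing` are hypothetical names introduced here, the ground truth is taken from the input list rather than from the index (the index itself may be affected), and the zip is wrapped in `list(...)` on the assumption that this keeps it working across Python and pandas versions. What it prints depends on the installed pandas version.

```python
import pandas as pd

arrays = [
    ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    ['c', None, 'd', None, 'e', 'e', None, 'f'],
]

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6, 7, 8],
    'col2': [8, 7, 6, 5, 4, 3, 2, 1],
})
df.index = pd.MultiIndex.from_tuples(list(zip(*arrays)),
                                     names=['first', 'second'])

# Ground truth comes from the raw input, not from the (possibly wrong) index.
expected_missing = [value is None for value in arrays[1]]


def missing_mask_after_reset(frame, level=None):
    """Hypothetical helper: report where 'second' is missing after reset_index.

    level=None resets every level; otherwise only the named level is reset.
    """
    out = frame.reset_index() if level is None else frame.reset_index(level)
    # 'second' ends up as a column, or stays behind as the remaining index.
    values = out['second'] if 'second' in out.columns else out.index
    return pd.isna(values).tolist()


for level in (None, 'second', 'first'):
    mask = missing_mask_after_reset(df, level)
    status = 'NaN preserved' if mask == expected_missing else 'NaN lost'
    print(f'reset_index({level!r}): {status}')
```

On a version that shows the behaviour reported here, all three calls would be expected to report that the NaN values were lost, since 'f' appears wherever NaN should be.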