Closed
Description
related to #3588
This test
def test_pivot_index_with_nan(self):
# GH 3588
nan = np.nan
df = DataFrame({"a":['R1', 'R2', nan, 'R4'], 'b':["C1", "C2", "C3" , "C4"], "c":[10, 15, nan , 20]})
result = df.pivot('a','b','c')
expected = DataFrame([[nan,nan,nan,nan],[nan,10,nan,nan],
[nan,nan,nan,nan],[nan,nan,15,20]],
index = Index(['R1','R2',nan,'R4'],name='a'),
columns = Index(['C1','C2','C3','C4'],name='b'))
tm.assert_frame_equal(result, expected)
seems very odd to me, even though the expected result is constructed by hand.
Here's df
and result
:
In [2]: df
Out[2]:
a b c
0 R1 C1 10
1 R2 C2 15
2 NaN C3 NaN
3 R4 C4 20
In [3]: result
Out[3]:
b C1 C2 C3 C4
a
R1 NaN NaN NaN NaN
R2 NaN 10 NaN NaN
NaN NaN NaN NaN NaN
R4 NaN NaN 15 20
The way I understand pivot
here is that it makes a DataFrame
with a
as the index
, b
as the columns and then uses the third argument, in this case c
as values, where a
and b
form a kind of coordinate system for c
. Thus, instead of the current output, I would expect the result to be
In [14]: e
Out[14]:
b C1 C2 C3 C4
a
R1 10 NaN NaN NaN
R2 NaN 15 NaN NaN
NaN NaN NaN NaN NaN
R4 NaN NaN NaN 20
If, on the other hand, you don't have any nan
s in your frame, the result is what I would expect:
In [16]: df.loc[2, 'a'] = 'R3'
In [17]: df.loc[2, 'c'] = 17
In [18]: df
Out[18]:
a b c
0 R1 C1 10
1 R2 C2 15
2 R3 C3 17
3 R4 C4 20
In [19]: df.pivot('a','b','c')
Out[19]:
b C1 C2 C3 C4
a
R1 10 NaN NaN NaN
R2 NaN 15 NaN NaN
R3 NaN NaN 17 NaN
R4 NaN NaN NaN 20
I'll have a look at #3588 to see if this is a regression, or if I'm just misunderstanding how this is supposed to work.
I have a suspicion that this is related to #7403