Assigning dataframe by specifying both row/col doesn't handle nan correctly #3626

Closed
@jianpan

If I run the following code under pandas 0.11, the two identical assignment statements at the bottom produce different results:

import numpy as np
import pandas as pd

if __name__ == '__main__':
    df = pd.DataFrame({'FC': ['a', 'b', 'a', 'b', 'a', 'b'],
                       'PF': [0, 0, 0, 0, 1, 1],
                       'col1': np.random.rand(6),
                       'col2': np.random.rand(6)})
    df.ix[1, 0] = np.nan          # blank out one FC value
    mask = ~df.FC.isnull()        # rows where FC is present
    cols = ['col1', 'col2']

    dft = df * 2
    dft.ix[3, 3] = np.nan         # NaN in the source at row 3, col2

    # bug? The same assignment, run twice, prints different results.
    df.ix[mask, cols] = dft.ix[mask, cols]
    print df
    df.ix[mask, cols] = dft.ix[mask, cols]
    print df

Notice that in the second result the NaN at [3,3] has disappeared and the col2 values below it have shifted up; row 5 now repeats the value from row 0.

    FC  PF      col1      col2
0    a   0  0.679388  0.508501
1  NaN   0  0.168159  0.346730
2    a   0  1.802611  0.870135
3    b   0  0.577989       NaN
4    a   1  1.027070  0.530890
5    b   1  0.383662  1.855584

    FC  PF      col1      col2
0    a   0  0.679388  0.508501
1  NaN   0  0.168159  0.346730
2    a   0  1.802611  0.870135
3    b   0  0.577989  0.530890
4    a   1  1.027070  1.855584
5    b   1  0.383662  0.508501    
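
For anyone checking this on a current pandas release, where .ix is no longer available, the same scenario can be sketched with .loc. This is an adaptation, not the original reproduction: the fixed seed is an assumption added for repeatability, and only the numeric columns are doubled so the sketch does not depend on how string columns multiply. With label-aligned assignment, the NaN at row 3 of col2 would be expected to survive both assignments.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, not in the report
df = pd.DataFrame({'FC': ['a', 'b', 'a', 'b', 'a', 'b'],
                   'PF': [0, 0, 0, 0, 1, 1],
                   'col1': rng.random(6),
                   'col2': rng.random(6)})
df.loc[1, 'FC'] = np.nan
mask = df['FC'].notna()          # rows where FC is present
cols = ['col1', 'col2']

# Double only the numeric columns, then put a NaN in the source.
dft = df.copy()
dft[cols] = df[cols] * 2
dft.loc[3, 'col2'] = np.nan

# Run the same assignment twice, as in the report.
for _ in range(2):
    df.loc[mask, cols] = dft.loc[mask, cols]
    print(df)
```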

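The issue seems to be in pandas index.py, line 143:
v = v.reindex(self.obj[item].reindex(v.index).dropna().index)
Notice that it drops NA values from the target. The first assignment works because the target has no NaN yet; by the second assignment the target holds the NaN at [3,3], so that row is dropped.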
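
A minimal sketch of what that expression does once the target column already contains a NaN. The names target and v are stand-ins for self.obj[item] and the value being assigned, and the numbers are made up for illustration; this is not the pandas internals, only the same reindex/dropna chain applied to two small Series.

```python
import numpy as np
import pandas as pd

# Target column after the first assignment: it now holds a NaN at label 3.
target = pd.Series([0.5, 0.3, 0.8, np.nan, 0.53, 1.85], name='col2')

# Values to assign, restricted to the masked rows (label 1 is excluded).
v = pd.Series([0.5, 0.8, np.nan, 0.53, 1.85], index=[0, 2, 3, 4, 5])

# Same chain as the quoted line, with `target` standing in for self.obj[item].
kept = target.reindex(v.index).dropna().index
v_aligned = v.reindex(kept)

print(list(kept))                # labels that survive dropna()
print(len(v), len(v_aligned))    # the value is now one row shorter
```

The value ends up with four rows while the mask still selects five, which is consistent with the col2 values shifting up in the second printout.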

I then tried df.ix[mask, cols] = dft.ix[mask, cols].values to bypass this, but that failed as well. The problem is in pandas index.py, line 149:
if len(labels) != len(value):
Notice that it compares the number of columns being assigned against the number of rows in the ndarray.
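
The mismatch can be seen with plain NumPy: len() of a 2-D array is its number of rows, not columns. The sketch below assumes labels is the list of column names being assigned and value is the 5 x 2 array from the masked rows; the variable names mirror the quoted line but the setup is illustrative.

```python
import numpy as np

labels = ['col1', 'col2']        # columns being assigned
value = np.zeros((5, 2))         # 5 masked rows x 2 columns

print(len(labels))               # number of columns: 2
print(len(value))                # first dimension of the array (rows): 5
print(value.shape[1])            # number of columns in the array: 2
```

Comparing len(labels) against value.shape[1] would line up columns with columns for a 2-D value.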
