Skip to content

updating row data within apply from other rows doesn't work #8662

Closed
@lamakaha

Description

@lamakaha

I'm grouping rows by column investor id. within apply, when a group has more than 1 row, i'm copying a value in the second row in the group into the a column of the first row in that group. I'm using groupby with apply but the assembled result is incorrect - all rows except row 1 show the same investor B - seems like a bug?

In [1]: import pandas as pd
In [2]: import numpy as np

In [3]: df=pd.DataFrame({'InvestorID':[6,6,17,17,19,19,30,40],
   ...:                  'Investor':['AAA','AAA','BBB','BBB','CCC','CCC','EEE','FFF'],
   ...:                  'CurrentPeriodEnd':['7/31/2014','5/31/2014','7/31/2014','5/31/2014','7/31/2014','5/31/2014','7/31/2014','7/31/2014'],
   ...:                  'PriorPeriodEnd':np.nan*8})

In [4]: df
Out[4]: 
  CurrentPeriodEnd Investor  InvestorID  PriorPeriodEnd
0        7/31/2014      AAA           6             NaN
1        5/31/2014      AAA           6             NaN
2        7/31/2014      BBB          17             NaN
3        5/31/2014      BBB          17             NaN
4        7/31/2014      CCC          19             NaN
5        5/31/2014      CCC          19             NaN
6        7/31/2014      EEE          30             NaN
7        7/31/2014      FFF          40             NaN

this function updates the first positional row in a group with values from the second row
where a group has 2 rows or more

In [5]: def f(x):
   ...:     if len(x) >1:
   ...:         x.iloc[0,x.columns.get_loc('PriorPeriodEnd')]=x.iloc[1,x.columns.get_loc('CurrentPeriodEnd')] 
   ...:     return x.iloc[0]

In [6]: df.groupby(['InvestorID']).apply(f).reset_index(drop=True)

this is not an expected result - the investor column should not have repeating BBB values

Out[6]: 
  CurrentPeriodEnd Investor  InvestorID PriorPeriodEnd
0        7/31/2014      AAA           6      5/31/2014
1        7/31/2014      BBB          17      5/31/2014
2        7/31/2014      BBB          19      5/31/2014
3        7/31/2014      BBB          30      5/31/2014
4        7/31/2014      BBB          40      5/31/2014

the function below copies the first row first before making updates and it works as expected

In [7]: def f(x):
   ...:     top_row=pd.Series(x.iloc[0])
   ...:     if len(x) >1:
   ...:         top_row.iloc[x.columns.get_loc('PriorPeriodEnd')]=x.iloc[1,x.columns.get_loc('CurrentPeriodEnd')] 
   ...:     return top_row

this is what the result should look like

In [8]: df.groupby(['InvestorID']).apply(f).reset_index(drop=True)
Out[8]: 
  CurrentPeriodEnd Investor  InvestorID PriorPeriodEnd
0        7/31/2014      AAA           6      5/31/2014
1        7/31/2014      BBB          17      5/31/2014
2        7/31/2014      CCC          19      5/31/2014
3        7/31/2014      EEE          30            NaN
4        7/31/2014      FFF          40            NaN

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions