Closed
Description
I'm grouping rows by column investor id. within apply, when a group has more than 1 row, i'm copying a value in the second row in the group into the a column of the first row in that group. I'm using groupby with apply but the assembled result is incorrect - all rows except row 1 show the same investor B - seems like a bug?
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df=pd.DataFrame({'InvestorID':[6,6,17,17,19,19,30,40],
...: 'Investor':['AAA','AAA','BBB','BBB','CCC','CCC','EEE','FFF'],
...: 'CurrentPeriodEnd':['7/31/2014','5/31/2014','7/31/2014','5/31/2014','7/31/2014','5/31/2014','7/31/2014','7/31/2014'],
...: 'PriorPeriodEnd':np.nan*8})
In [4]: df
Out[4]:
CurrentPeriodEnd Investor InvestorID PriorPeriodEnd
0 7/31/2014 AAA 6 NaN
1 5/31/2014 AAA 6 NaN
2 7/31/2014 BBB 17 NaN
3 5/31/2014 BBB 17 NaN
4 7/31/2014 CCC 19 NaN
5 5/31/2014 CCC 19 NaN
6 7/31/2014 EEE 30 NaN
7 7/31/2014 FFF 40 NaN
this function updates the first positional row in a group with values from the second row
where a group has 2 rows or more
In [5]: def f(x):
...: if len(x) >1:
...: x.iloc[0,x.columns.get_loc('PriorPeriodEnd')]=x.iloc[1,x.columns.get_loc('CurrentPeriodEnd')]
...: return x.iloc[0]
In [6]: df.groupby(['InvestorID']).apply(f).reset_index(drop=True)
this is not an expected result - the investor column should not have repeating BBB values
Out[6]:
CurrentPeriodEnd Investor InvestorID PriorPeriodEnd
0 7/31/2014 AAA 6 5/31/2014
1 7/31/2014 BBB 17 5/31/2014
2 7/31/2014 BBB 19 5/31/2014
3 7/31/2014 BBB 30 5/31/2014
4 7/31/2014 BBB 40 5/31/2014
the function below copies the first row first before making updates and it works as expected
In [7]: def f(x):
...: top_row=pd.Series(x.iloc[0])
...: if len(x) >1:
...: top_row.iloc[x.columns.get_loc('PriorPeriodEnd')]=x.iloc[1,x.columns.get_loc('CurrentPeriodEnd')]
...: return top_row
this is what the result should look like
In [8]: df.groupby(['InvestorID']).apply(f).reset_index(drop=True)
Out[8]:
CurrentPeriodEnd Investor InvestorID PriorPeriodEnd
0 7/31/2014 AAA 6 5/31/2014
1 7/31/2014 BBB 17 5/31/2014
2 7/31/2014 CCC 19 5/31/2014
3 7/31/2014 EEE 30 NaN
4 7/31/2014 FFF 40 NaN