Suppose we wish to concatenate two DataFrames whose columns have the same name. Because the columns share the same label, `loc` cannot distinguish them, so we use `iloc` to manipulate them individually by position.
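A short sketch of why label-based selection cannot separate the two columns. It rebuilds the same `df2` as this report; `by_label` and `by_position` are just illustrative names. With duplicate labels, `loc` returns every column whose label matches, while `iloc` picks exactly one position.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame(np.arange(10))
df1 = pd.DataFrame(np.arange(10, 20))
df2 = pd.concat([df0, df1], axis=1)

# Both columns carry the label 0 after the concat.
print(list(df2.columns))      # [0, 0]

# Label-based: 0 matches both columns, so a two-column DataFrame comes back.
by_label = df2.loc[:, 0]
print(by_label.shape)         # (10, 2)

# Position-based: 0 means "the first column", so a single Series comes back.
by_position = df2.iloc[:, 0]
print(by_position.shape)      # (10,)
```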
```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame(np.arange(10))
df1 = pd.DataFrame(np.arange(10, 20))

# 1) Wrong: after concat, both columns carry the label 0
df2 = pd.concat([df0, df1], axis=1)
df2.iloc[:, 0] /= 2.
```
This gives a surprising result: the halved first column is written into both columns, so the second column (10 to 19) is overwritten instead of being left unchanged.
```
     0    0
0  0.0  0.0
1  0.5  0.5
2  1.0  1.0
3  1.5  1.5
4  2.0  2.0
5  2.5  2.5
6  3.0  3.0
7  3.5  3.5
8  4.0  4.0
9  4.5  4.5
```
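The numbers in both columns above are exactly the correctly halved first column, which suggests the read and the division work and it is the write-back that misbehaves. That fits how Python evaluates the statement: `df2.iloc[:,0] /= 2.` is expanded into a read through `__getitem__`, the division, and a write back through `__setitem__` with the same key. The toy `RecordingIndexer` class here is hypothetical (it only logs calls and is not how pandas implements `iloc`), but it shows the expansion:

```python
class RecordingIndexer:
    """Hypothetical stand-in for an indexer such as df.iloc; it only logs calls."""

    def __init__(self, data):
        self.data = data
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return self.data[key]

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))
        self.data[key] = value


idx = RecordingIndexer([4.0, 10.0])
idx[0] /= 2.0        # same shape of statement as df2.iloc[:, 0] /= 2.
print(idx.calls)     # [('get', 0), ('set', 0, 2.0)]
print(idx.data)      # [2.0, 10.0]
```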
The expected output is:
```
     0   0
0  0.0  10
1  0.5  11
2  1.0  12
3  1.5  13
4  2.0  14
5  2.5  15
6  3.0  16
7  3.5  17
8  4.0  18
9  4.5  19
```
The expected result can be obtained with the following code, although it requires renaming the columns first:
```python
df2 = pd.concat([df0, df1], axis=1)
df2.columns = range(df2.shape[1])  # relabel the columns 0, 1, ...
df2.iloc[:, 0] /= 2.
```
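The relabelling can also be done inside the `concat` call: with `ignore_index=True`, the concatenation axis is labelled 0, ..., n-1, which for `axis=1` means the columns. It is still a relabelling, so it avoids the problem rather than explaining it. In this sketch I also assign the column by label (`df2[0] = ...`) instead of using `iloc` in place; that is my own choice, made because it replaces the whole column, so the int-to-float change does not rely on an in-place write into an integer column.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame(np.arange(10))
df1 = pd.DataFrame(np.arange(10, 20))

# ignore_index=True relabels the concatenation axis (here: the columns) 0..n-1,
# the same effect as assigning range(df2.shape[1]) afterwards.
df2 = pd.concat([df0, df1], axis=1, ignore_index=True)
print(list(df2.columns))   # [0, 1]

# With unique labels, assigning the column by label is unambiguous.
df2[0] = df2[0] / 2.
```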
Why does `iloc` give a different result depending on the column labels? I would understand renaming the columns mattering if we were using `loc`, but `iloc` is meant to be purely position-based. Please help me understand why `iloc` cares about column names.
I'm using pandas 0.17.1 and numpy 1.9.2.
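For triage, the expected behaviour can be written as a self-check. This is a sketch: `iloc_write_is_positional` is a hypothetical helper, it uses float data so that no int-to-float upcast is involved, and it spells out the read, divide, and write that `/=` performs as one explicit assignment. I have not run it on 0.17.1; if the behaviour reported here also reproduces with float data, it should return `False` there, and `True` on a version where `iloc` writes are purely positional.

```python
import numpy as np
import pandas as pd


def iloc_write_is_positional() -> bool:
    """Hypothetical check: does an iloc write touch only the targeted position
    when two columns share a label? Float data avoids any int->float upcast."""
    df0 = pd.DataFrame(np.arange(10, dtype=float))
    df1 = pd.DataFrame(np.arange(10, 20, dtype=float))
    df2 = pd.concat([df0, df1], axis=1)   # columns are [0, 0]

    # Explicit form of df2.iloc[:, 0] /= 2.
    df2.iloc[:, 0] = df2.iloc[:, 0] / 2.

    first_ok = np.allclose(df2.iloc[:, 0].values, np.arange(10) / 2.)
    second_ok = np.allclose(df2.iloc[:, 1].values, np.arange(10, 20))
    return bool(first_ok and second_ok)


print(pd.__version__, iloc_write_is_positional())
```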
Thank you.