Description
I know iterrows is not the most recommended function, but I noticed a strange behaviour (triggered by a problem reported by a geopandas user: geopandas/geopandas#348). When using iterrows on a DataFrame with mixed dtypes (so the resulting series is of object dtype), the numeric values are converted to Python types, while with loc/iloc the numpy types are preserved:
In [254]: df = pd.DataFrame({'int':[0,1], 'float':[0.1,0.2], 'str':['a','b']})
In [255]: df
Out[255]:
   float  int str
0    0.1    0   a
1    0.2    1   b
In [256]: row1 = df.iloc[0]
In [257]: i, row2 = next(df.iterrows())
In [258]: row3 = next(df.itertuples())
In [260]: type(row1['float'])
Out[260]: numpy.float64
In [261]: type(row2['float'])
Out[261]: float
In [269]: type(row3.float)
Out[269]: numpy.float64
Is this intentional? (It's a consequence of using self.values in the implementation: numpy converts the numeric values to Python types when casting to an object array.) And if so, is this worth documenting?
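A minimal sketch of where the conversion happens (assuming the current behaviour of .values and numpy's cast to object dtype): collapsing a mixed-dtype frame into a single object ndarray makes numpy box the numeric entries as plain Python scalars:

import numpy as np
import pandas as pd

df = pd.DataFrame({'int': [0, 1], 'float': [0.1, 0.2], 'str': ['a', 'b']})

# .values on a mixed-dtype frame is a single object-dtype ndarray;
# the cast to object boxes the numeric entries as plain Python scalars
vals = df.values
print(vals.dtype)                                   # object
print(type(vals[0, df.columns.get_loc('float')]))   # <class 'float'>

# the same conversion happens with a bare numpy astype(object) cast
arr = np.array([0.1, 0.2]).astype(object)
print(type(arr[0]))                                 # <class 'float'>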
(Note that it was actually the numpy types in an object-dtyped series that caused the issue for the geopandas user, because fiona couldn't handle numpy scalars in an object-dtyped column, but that's not something to blame pandas for.)
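For reference, a hypothetical workaround on the consumer side (not part of this report; the to_native helper is made up for illustration): numpy scalars in an object-dtyped row can be unboxed with .item() before handing the values to a library such as fiona that only accepts native Python types:

import numpy as np
import pandas as pd

df = pd.DataFrame({'int': [0, 1], 'float': [0.1, 0.2], 'str': ['a', 'b']})

def to_native(value):
    # np.generic is the base class of all numpy scalars (np.int64, np.float64, ...)
    return value.item() if isinstance(value, np.generic) else value

row = df.iloc[0]  # object-dtyped Series; numpy scalar types are preserved here
clean = {key: to_native(val) for key, val in row.items()}
print({key: type(val) for key, val in clean.items()})
# every value is now a native Python int/float/str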