Description
I have tried df.iterrows(), but its performance is horrible. That is not surprising, given that iterrows() yields a Series for each row, with full schema and metadata, not just the values (which are all that I need).
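To illustrate the overhead, here is a minimal sketch on a toy frame (the column names a and b are made up for this example). It only shows what each iteration step produces; it is not a benchmark.

```python
import pandas as pd

# Toy frame; column names are hypothetical.
df = pd.DataFrame({"a": pd.array([1, 2], dtype="int64"),
                   "b": [0.5, 1.5]})

# iterrows() yields (index_label, Series) pairs, one per row.
first_idx, first_row = next(df.iterrows())

# Each row is a full Series: it carries the column labels as its index
# and a single dtype for the whole row, not just the raw values.
print(type(first_row).__name__)
print(list(first_row.index))
print(first_row.dtype)
```

Building one such Series per row is the likely source of the slowdown.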
The second method that I have tried is for row in df.values, which is significantly faster. However, I have recently realized that df.values is not the internal data storage of the DataFrame, because df.values converts all columns to a common dtype. For example, one of my columns has dtype int64, but df.values has dtype float64 throughout. So I suspect that df.values actually creates another copy of the internal data.
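A minimal sketch that reproduces the dtype conversion, assuming a toy frame with one int64 column and one float64 column (the names count and price are made up). The memory check at the end is only a hint: whether df.values copies may depend on the pandas version and on how the frame is laid out internally.

```python
import numpy as np
import pandas as pd

# Toy frame with mixed dtypes; column names are hypothetical.
df = pd.DataFrame({
    "count": np.array([1, 2, 3], dtype="int64"),
    "price": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

# Per-column dtypes as stored in the DataFrame.
column_dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}
print(column_dtypes)

# df.values returns one 2-D ndarray, so it must pick a single dtype
# that can hold every column; the int64 column is upcast.
values = df.values
print(values.dtype)

# Check whether the 2-D array overlaps the memory of the int column.
print(np.shares_memory(values, df["count"].to_numpy()))
```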
Another requirement is that the row iteration must return a list of values that preserves the original dtype of each column.