What is the most efficient way to iterate over Pandas's DataFrame row by row? #10334

Closed
@zer0n

I have tried df.iterrows(), but its performance is horrible. That is not surprising, given that iterrows() builds a full Series for each row, with index and metadata, not just the values (which are all that I need).

The second method that I have tried is `for row in df.values`, which is significantly faster. However, I have recently realized that df.values is not the internal data storage of the DataFrame, because df.values converts all columns to a common dtype. For example, one of my columns has dtype int64, but df.values has dtype float64. So I suspect that df.values actually creates another copy of the internal data.
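To make the upcasting concrete, here is a minimal sketch on a made-up two-column frame (the column names `a` and `b` and the data are assumptions for illustration only). It checks the dtype of `df.values` and uses `np.shares_memory` to see whether the result is backed by the same buffer as the int64 column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one int64 column and one float64 column.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int64"),
    "b": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

values = df.values

# Per-column dtypes are kept inside the DataFrame...
print(df.dtypes.to_dict())

# ...but the 2-D array has a single common dtype.
print(values.dtype)

# For a mixed-dtype frame the 2-D array should be a new allocation,
# so it should not share memory with the int64 column.
shares = np.shares_memory(values, df["a"].to_numpy())
print(shares)
```

On a frame where every column already has the same dtype the result may be a view rather than a copy, so this only illustrates the mixed-dtype case described above.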

Another requirement is that each row must come back as a list of values that preserve the original dtypes of the data.
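Two candidate approaches that avoid building a Series per row are sketched below, again on a made-up frame (column names and data are assumptions). `DataFrame.itertuples(index=False, name=None)` yields plain tuples, and zipping the per-column arrays yields NumPy scalars that keep each column's own dtype. Neither is claimed to be the fastest option; timing on real data is the only way to tell.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int64"),
    "b": np.array([0.5, 1.5, 2.5], dtype="float64"),
    "c": ["x", "y", "z"],
})

# Option 1: plain tuples, one per row; ints stay ints, floats stay floats.
rows_tuples = [list(row) for row in df.itertuples(index=False, name=None)]

# Option 2: zip the columns' own arrays, so no common dtype is imposed.
# Numeric values come back as NumPy scalars (e.g. np.int64, np.float64).
columns = [df[col].to_numpy() for col in df.columns]
rows_zip = [list(row) for row in zip(*columns)]

print(rows_tuples[0])
print([type(v).__name__ for v in rows_zip[0]])
```

The difference between the two is the scalar type: option 1 gives Python scalars for numeric columns, while option 2 keeps the exact NumPy dtype, which matters if the width (for example int32 versus int64) must be preserved.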
