What is the most efficient way to iterate over Pandas's DataFrame row by row?

I have tried `df.iterrows()`, but its performance is horrible. That is not surprising, given that `iterrows()` yields each row as a `Series` with full schema and metadata, not just the values (which is all that I need).

The second method that I have tried is `for row in df.values`, which is significantly faster. However, I have recently realized that `df.values` is not the internal data storage of the DataFrame, because `df.values` converts all column `dtypes` to a single common `dtype`. For example, one of my columns has dtype `int64`, but the array returned by `df.values` has dtype `float64`. So I suspect that `df.values` actually creates another copy of the internal data.
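A minimal sketch that reproduces the upcasting described above. The column names `a` and `b` and their values are made up for illustration; the memory check uses NumPy's `np.shares_memory` and only inspects one column, so it is a hint rather than proof of how the data is stored internally.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: one int64 column, one float64 column.
df = pd.DataFrame({
    "a": pd.Series([1, 2, 3], dtype="int64"),
    "b": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
})

print(df.dtypes)      # per-column dtypes are kept inside the DataFrame

values = df.values
print(values.dtype)   # a single common dtype for the whole 2-D array

# Check whether the 2-D array overlaps in memory with column "a".
shares_memory = np.shares_memory(values, df["a"].to_numpy())
print(shares_memory)
```

For a mixed `int64`/`float64` frame like this one, the array comes back as `float64` and does not share memory with the integer column, which is consistent with the suspicion that a new array is built.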

Another requirement is that the row iteration must return a list of values that preserves the original `dtype` of each column.
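One candidate that appears to meet this requirement is `DataFrame.itertuples`, which iterates column by column instead of building a `Series` per row. The sketch below uses a made-up two-column frame and assumes that plain lists are the desired output; note that the values come back as Python scalars (for example `int` rather than `numpy.int64`), so "preserving the dtype" here means integers stay integers rather than being upcast to floats. It is not benchmarked here.

```python
import numbers

import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({
    "a": pd.Series([1, 2, 3], dtype="int64"),
    "b": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
})

# index=False drops the index from each row; name=None yields plain tuples
# instead of namedtuples, which avoids the namedtuple construction cost.
rows = [list(row) for row in df.itertuples(index=False, name=None)]

first = rows[0]
# The value from column "a" is still integral, not converted to float.
is_integral = isinstance(first[0], numbers.Integral)
print(first, is_integral)
```

If the list conversion is not needed, iterating over the tuples directly avoids one allocation per row.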


What is the most efficient way to iterate over Pandas's DataFrame row by row? #10334
