Avoiding DataFrame.apply unintended side effect when result_type is not specified.

According to the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html):

"In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects..."
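A minimal sketch of how the extra call can be observed. The frame, the `seen` list, and the `record` function are made up for illustration. How many times the first column shows up depends on the pandas version in use, so no expected output is given here.

```python
import pandas as pd

# Hypothetical example data, not taken from the original report.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

seen = []  # side effect: records every column func is called on


def record(col):
    # Appending to an outer list is the kind of side effect the docs warn about.
    seen.append(col.name)
    return col.sum()


result = df.apply(record)

# If the first column is evaluated twice, "a" appears twice in this list.
print(seen)
print(result)
```

If `seen` is longer than the number of columns, the function was called more often than once per column, and any state it mutates is affected accordingly.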

It is definitely there in the docs, but it still took me several hours to trace the bug down to this "feature".

So I think it would be cleaner either to fully support side effects in apply (e.g. by calling func on a copy of the first column/row during the testing phase) or to ban them completely, if that is technically possible.
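Until something like that exists, one possible user-side workaround is to make func safe to call more than once. This is only a sketch: `once_per_label` is a hypothetical helper, not part of pandas, and it assumes the labels are unique and hashable and that any probing call passes a Series carrying the same `name` as the real call.

```python
import pandas as pd


def once_per_label(func):
    """Hypothetical helper: run func at most once per column/row label."""
    cache = {}

    def wrapper(series):
        key = series.name  # assumes unique, hashable labels
        if key not in cache:
            cache[key] = func(series)
        return cache[key]

    return wrapper


calls = []


def record_sum(col):
    calls.append(col.name)  # the side effect we want to happen only once
    return col.sum()


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
result = df.apply(once_per_label(record_sum))

print(calls)
print(result)
```

The cache lives inside the wrapper, so a fresh wrapper is needed for every `apply` call. This does not help when func mutates the Series it receives, which is the case the copy-based suggestion above is meant to cover.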

I know there are plans to ban modification when using `groupby.apply` (#12653).
I don't see any issue with mutation inside a (non-groupby) apply per se, but I may be wrong.

I should also note that the passage from the docs quoted above is not entirely accurate: if `result_type` is specified, the first row/column is not necessarily processed twice.





Avoiding DataFrame.apply unintended side effect when result_type is not specified. #24614
