current implementation of DataFrame.apply where passed function has side effects is a real "gotcha" #10222

Closed
@jrenner

From the documentation for DataFrame.apply:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
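To make the quoted note concrete, here is a minimal sketch in plain Python. `apply_with_probe` is a hypothetical toy model of the behaviour the note describes (one extra probe call on the first column, then one call per column). It is not pandas's actual implementation, and the names `total_and_log` and `seen` are made up for illustration.

```python
def apply_with_probe(columns, func):
    """Toy model: probe func on the first column, then apply it to every column.

    `columns` is a dict mapping column name -> list of values.
    This only imitates the documented double call; it is not pandas code.
    """
    names = list(columns)
    if names:
        func(columns[names[0]])  # probe call; its result is discarded
    return {name: func(columns[name]) for name in names}


seen = []  # side effect target: every call to func appends here


def total_and_log(values):
    result = sum(values)
    seen.append(result)  # the side effect
    return result


data = {"a": [1, 2], "b": [10, 20]}
totals = apply_with_probe(data, total_and_log)

print(totals)  # {'a': 3, 'b': 30}
print(seen)    # [3, 3, 30]  <- the first column was logged twice
```

The returned totals look correct, which is what makes the problem hard to spot: only the side-effect log reveals the extra call.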

I am sure there are well-thought-out reasons why it is currently implemented this way, but this behavior can be a real "gotcha". I grant that having side effects inside an apply function is not good standard practice, but I would argue that there are times when it is a good solution to a problem. (I can elaborate on this if needed.) Thankfully this behavior is documented, but I think it is reasonable to expect that most users will not always be mindful of secondary notes that exist throughout the documentation, and the source of problems caused by this behavior is not at all obvious.

While I don't understand the engineering issues regarding the fast/slow path, I would suggest that some better solution to the problem be introduced.
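Independently of any change in pandas itself, one user-side way to cope is to make the side effect idempotent, for example by keying it on the column label (when applying column-wise, each column arrives as a Series whose `name` is that label). The sketch below is plain Python, and `once_per_key` is a hypothetical helper, not part of pandas.

```python
def once_per_key(side_effect):
    """Wrap side_effect so it runs at most once for each distinct key."""
    done = set()

    def guarded(key, *args):
        if key in done:
            return False  # already handled this key; skip the side effect
        done.add(key)
        side_effect(key, *args)
        return True

    return guarded


log = []
record = once_per_key(lambda key, value: log.append((key, value)))

record("a", 3)
record("a", 3)   # repeated call for the same column label is ignored
record("b", 30)

print(log)  # [('a', 3), ('b', 30)]
```

This only helps when a repeated call for the same label should be a no-op; it does not remove the extra call, it just makes it harmless.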

Labels: API Design, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
