PERF: use numpy.random.Generator #28440

Closed
@thrasibule

Description

NumPy 1.17 introduced a new random module with faster PRNGs. It drops the guarantee of strictly reproducible random streams, which allows some algorithmic improvements. In particular, the choice method is now much faster in the replace=False case. Would it make sense for random_state to return a np.random.Generator instead of a np.random.RandomState when the NumPy version is >= 1.17, here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/common.py#L408? This would automatically speed up the DataFrame.sample method, for instance.
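To make the replace=False claim concrete, here is a rough timing sketch that compares the legacy RandomState.choice with Generator.choice. The population size, sample size, and repeat count are arbitrary values picked for illustration, and the numbers will vary by machine and NumPy version, so treat it as a way to check the difference locally rather than as a benchmark result.

```python
import timeit

import numpy as np

# Arbitrary illustration values: draw a few items from a large population.
n, k = 1_000_000, 10

legacy = np.random.RandomState(0)   # pre-1.17 style generator
modern = np.random.default_rng(0)   # np.random.Generator, available from NumPy 1.17

# Both objects expose choice(a, size=..., replace=...) with the same call shape.
t_legacy = timeit.timeit(lambda: legacy.choice(n, size=k, replace=False), number=5)
t_modern = timeit.timeit(lambda: modern.choice(n, size=k, replace=False), number=5)

print(f"RandomState.choice: {t_legacy:.4f}s for 5 calls")
print(f"Generator.choice:   {t_modern:.4f}s for 5 calls")
```

Note that the two objects produce different samples for the same seed, which is the reproducibility trade-off mentioned above.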
I can write a PR, but I'm not sure how to handle the different NumPy versions. Should I just do the version check inside random_state, or is this something that needs to go inside numpy.compat?
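One possible shape for the check, as a minimal sketch: the helper name random_state_sketch and the flag _HAS_GENERATOR are hypothetical and are not pandas' actual implementation. It uses feature detection (hasattr) instead of parsing the version string, which is a swapped-in technique, and it assumes that existing RandomState objects should be passed through unchanged.

```python
import numpy as np

# Feature detection rather than a version comparison: np.random.Generator
# only exists from NumPy 1.17 onward. (Hypothetical module-level flag.)
_HAS_GENERATOR = hasattr(np.random, "Generator")


def random_state_sketch(state=None):
    """Hypothetical stand-in for a random_state helper.

    Returns a np.random.Generator when NumPy provides one, otherwise a
    np.random.RandomState. Existing generator objects are returned as-is.
    """
    # Keep legacy objects working so callers who pass a RandomState
    # still get their own stream back.
    if isinstance(state, np.random.RandomState):
        return state

    if _HAS_GENERATOR:
        if isinstance(state, np.random.Generator):
            return state
        # default_rng accepts None, an int, or an array-like of ints.
        return np.random.default_rng(state)

    # Fallback for NumPy < 1.17.
    return np.random.RandomState(state)


rng = random_state_sketch(42)
idx = rng.choice(1_000, size=10, replace=False)
```

With this layout the version-dependent branch lives in one place, so callers such as a sampling routine only need an object with a choice method. Whether the flag belongs next to the helper or in a compat module is the open design question raised above.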

Labels: Performance (memory or execution speed performance)
