ENH: implement fast isin() for nullable dtypes #38340

Closed
@jorisvandenbossche

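Currently, `isin()` on a Series with a nullable dtype can be much slower than on the equivalent NumPy-backed Series. In the example below, `Int64` is roughly 8x slower than the plain NumPy integer Series: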

In [41]: arr = np.random.randint(0, 10, 1_000_001)

In [42]: s = pd.Series(arr)

In [43]: %timeit s.isin([1, 2, 3, 20])
2.71 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [44]: s = pd.Series(arr, dtype="Int64")

In [45]: %timeit s.isin([1, 2, 3, 20])
22.9 ms ± 96.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
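
One possible direction, sketched here only as an illustration and not as the actual pandas implementation: run the membership test on the underlying NumPy values and then blank out the positions that are missing. `masked_isin` is a hypothetical helper name, and the sketch assumes an integer nullable Series and a `values` list that contains no missing values.

```python
import numpy as np
import pandas as pd


def masked_isin(series, values):
    """Hypothetical helper: isin() for a nullable integer Series.

    Assumes `values` holds plain integers (no pd.NA / None).
    """
    # Positions that hold pd.NA in the nullable Series.
    mask = series.isna().to_numpy()
    # Plain int64 view of the data; missing slots get a placeholder of 0.
    data = series.to_numpy(dtype="int64", na_value=0)
    # Fast NumPy membership test on the raw values.
    result = np.isin(data, np.asarray(values))
    # The placeholder must never count as a match, so force NA slots to False.
    result[mask] = False
    return pd.Series(result, index=series.index)


s = pd.Series([1, 2, None, 20, 5], dtype="Int64")
out = masked_isin(s, [1, 2, 3, 20])
```

Note that this returns a plain `bool` Series and treats missing entries as non-matching; whether the result should instead be a nullable `boolean` Series, and how a missing value in `values` should be handled, are design questions this sketch leaves open.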

Metadata

    Labels

    Enhancement; NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Performance (Memory or execution speed performance)