
Parameterize NA sentinel for hashtables #20328

Closed
@TomAugspurger

Description


xref #19938. There, I'm abusing the fact that the `Int64HashTable` class treats `iNaT` as a missing value. It'd be cleaner to just pass the Categorical's codes along with a flag saying that -1 is the missing-value marker.
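A minimal pure-Python sketch of the workaround described above. `unique_skip_inat` is a hypothetical stand-in for `Int64HashTable` that skips only `iNaT`, and `INAT` is assumed to be the int64 minimum (the value pandas uses for `iNaT`). Since `Categorical` codes mark missing entries with -1, the workaround rewrites -1 to `iNaT` before hashing so the table drops them.

```python
# Assumed value of pandas' iNaT: the smallest int64.
INAT = -(2 ** 63)


def unique_skip_inat(values):
    """Hypothetical stand-in for Int64HashTable: only INAT counts as missing."""
    seen = set()
    out = []
    for val in values:
        if val == INAT:  # the only missing-value condition this table knows
            continue
        if val not in seen:
            seen.add(val)
            out.append(val)
    return out


# Categorical codes use -1 for missing values.
codes = [0, -1, 1, 0, -1]

# Without remapping, -1 is kept as an ordinary value.
naive = unique_skip_inat(codes)  # [0, -1, 1]

# The workaround: smuggle -1 through as INAT so the table skips it.
workaround = unique_skip_inat([INAT if c == -1 else c for c in codes])  # [0, 1]
```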

IIUC, the missing-value condition is baked into the class definition through the template's `null_condition` field:

```
{{py:

# name, dtype, null_condition, float_group
dtypes = [('Float64', 'float64', 'val != val', True),
          ('UInt64', 'uint64', 'False', False),
          ('Int64', 'int64', 'val == iNaT', False)]

}}
```
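To make the per-dtype `null_condition` strings concrete, here is a sketch that expresses each one as a Python predicate. It assumes `iNaT` is the int64 minimum; the `NULL_CONDITIONS` mapping is illustrative only, not the generated Cython.

```python
import math

# Assumed value of pandas' iNaT (the int64 minimum).
INAT = -(2 ** 63)

# Python equivalents of the template's null_condition strings, keyed by class name.
NULL_CONDITIONS = {
    "Float64": lambda val: val != val,  # only NaN compares unequal to itself
    "UInt64": lambda val: False,        # uint64 has no missing-value marker
    "Int64": lambda val: val == INAT,   # iNaT doubles as the int64 missing value
}

is_na_float = NULL_CONDITIONS["Float64"](math.nan)  # True
is_na_uint = NULL_CONDITIONS["UInt64"](0)           # False
is_na_int = NULL_CONDITIONS["Int64"](INAT)          # True
is_na_code = NULL_CONDITIONS["Int64"](-1)           # False: -1 is not treated as missing
```

The last line is the crux of the issue: with the condition fixed per class, a -1 code is indistinguishable from a real value.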

I'm not familiar enough with Cython to know how that could be made a parameter. Possibly as an optional scalar `na_sentinel`: if that value is not `None`, check whether `val == na_sentinel`; otherwise, use the condition from the class definition?
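One way to read that proposal, sketched in plain Python rather than Cython. `ToyHashTable` and its `unique` method are hypothetical names, not pandas' API; `na_sentinel` follows the name used in the question. When a sentinel is given, exactly that value is treated as missing; otherwise the table falls back to the condition fixed at construction, mirroring the template's `null_condition`.

```python
from typing import Callable, Iterable, List, Optional

# Assumed value of pandas' iNaT (the int64 minimum).
INAT = -(2 ** 63)


class ToyHashTable:
    """Hypothetical hash table whose missing-value test can be overridden per call."""

    def __init__(self, null_condition: Callable[[int], bool]):
        # Default condition fixed at construction, like the template's null_condition.
        self.null_condition = null_condition

    def unique(self, values: Iterable[int], na_sentinel: Optional[int] = None) -> List[int]:
        # Decide once, outside the loop, which missing-value test to use.
        is_na: Callable[[int], bool]
        if na_sentinel is not None:
            is_na = lambda val: val == na_sentinel  # caller-supplied marker
        else:
            is_na = self.null_condition  # class-level default

        seen = set()
        out = []
        for val in values:
            if is_na(val) or val in seen:
                continue
            seen.add(val)
            out.append(val)
        return out


int64_table = ToyHashTable(lambda val: val == INAT)
codes = [0, -1, 1, 0, -1]

default_result = int64_table.unique(codes)                   # [0, -1, 1]
sentinel_result = int64_table.unique(codes, na_sentinel=-1)  # [0, 1]
```

In Cython the same idea would presumably hoist the `na_sentinel is not None` check into a flag before the loop, so each element costs one extra comparison at most; that is a design suggestion, not a description of how pandas implements it.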

Metadata


Assignees: No one assigned
Labels: Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
