Dispatch pd.isna for scalar extension value #27825

Open
@TomAugspurger

Right now, I don't believe there's a way for an ExtensionDtype to declare a custom scalar NA value and have pd.isna(scalar) do the right thing.

```python
from typing import Type

from pandas.api.extensions import ExtensionDtype, register_extension_dtype

# Private sentinel: the only value NaSType will accept in its constructor.
_nas = object()


class NaSType(str):
    """
    NA for String type.
    """

    # TODO: enforce singleton

    def __new__(cls, value):
        if value is not _nas:
            raise ValueError("Cannot create NaS from '{}'".format(value))
        return super().__new__(cls, value)

    def __eq__(self, other):
        # TODO: array comparisons, etc.
        return False

    def __str__(self):
        return "NaS"

    def __repr__(self):
        return str(self)


NaS = NaSType(_nas)


@register_extension_dtype
class StringDtype(ExtensionDtype):

    @property
    def na_value(self):
        return NaS

    @property
    def type(self) -> Type:
        return str

    @property
    def name(self) -> str:
        return "string"

    @classmethod
    def construct_from_string(cls, string: str):
        if string in {"string", "str"}:
            return cls()
        return super().construct_from_string(string)

    @classmethod
    def construct_array_type(cls) -> "Type[StringArray]":
        # StringArray is the companion ExtensionArray (not shown here).
        return StringArray
```

```python
In [18]: NaS
Out[18]: NaS

In [19]: pd.isna(NaS)
Out[19]: False
```

That should be True. In https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/missing.py#L131-L132 we go straight to `libmissing.checknull(obj)` for scalar values, so an extension dtype's `na_value` is never consulted.
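For illustration only, here is a minimal pure-Python sketch of what a scalar dispatch step could look like. Everything in it is an assumption: `_scalar_na_values`, `register_scalar_na`, and `isna_scalar` are hypothetical names, not pandas APIs, and the built-in check is a simplified stand-in for `libmissing.checknull`. The `NaSType` here is a cut-down copy of the one above.

```python
import math

# Hypothetical registry of scalar NA sentinels; not a pandas API.
_scalar_na_values = []


def register_scalar_na(value):
    """Record a sentinel that isna_scalar should treat as missing."""
    _scalar_na_values.append(value)
    return value


def isna_scalar(obj):
    """Stand-in for the scalar branch of pd.isna, with a dispatch step added."""
    # Identity, not equality: NaS.__eq__ always returns False, and defining
    # __eq__ without __hash__ makes the sentinel unhashable, so a set or
    # dict lookup is not an option.
    if any(obj is na for na in _scalar_na_values):
        return True
    # Simplified stand-in for the built-in check (None and float NaN only).
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


class NaSType(str):
    """Cut-down copy of the NaS sentinel from the example above."""

    def __eq__(self, other):
        return False

    def __repr__(self):
        return "NaS"


NaS = register_scalar_na(NaSType("NaS"))

print(isna_scalar(NaS))            # True
print(isna_scalar("NaS"))          # False: a plain str is not the sentinel
print(isna_scalar(float("nan")))   # True
```

Whether the lookup should go through registered dtypes' `na_value` or through a protocol on the scalar itself is an open design question; the sketch only shows that an identity-based check ahead of the built-in one is enough to make the example return True.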

Labels: Enhancement, ExtensionArray (extending pandas with custom dtypes or arrays), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
