API: infer_dtype with skipna=True only skip valid-for-dtype NAs #45022

Closed
@jbrockmendel

I'm working on making lib.infer_dtype copy-free and finding that things would be easier and more consistent if we tweaked the meaning of the skipna keyword.

In particular, instead of doing values = values[~isnaobj(values)], followed by e.g. is_string_array(values), we could do is_string_array(values, skipna=skipna). This would change the results in cases where we have NA values that are not considered valid_na by is_string_array, e.g. in the status quo:

import pandas as pd
import numpy as np
from pandas._libs import lib

arr = np.array(["foo", pd.NaT, "bar"], dtype=object)

In [2]: lib.infer_dtype(arr, skipna=True)
Out[2]: 'string'

In [3]: lib.is_string_array(arr, skipna=True)
Out[3]: False

So the suggestion here is to change [2] to give 'mixed'. I'm finding that to make this work without breaking the world, we also need to change StringValidator.is_valid_null to accept np.nan and None (it currently accepts only pd.NA).
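To make the difference concrete, here is a rough pure-Python sketch of the two behaviours. It is not the Cython implementation in pandas: `NA` and `NaT` are hypothetical stand-in sentinels so the snippet runs without pandas, the helper names (`is_any_na`, `is_valid_string_na`, `infer_status_quo`, `infer_proposed`) are made up for illustration, and it only distinguishes 'string' from 'mixed'.

```python
import math


class _Sentinel:
    """Hypothetical stand-in for pd.NA / pd.NaT (illustration only)."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


NA = _Sentinel("NA")
NaT = _Sentinel("NaT")


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def is_any_na(value):
    # Status quo mask: every NA-like value is dropped, whatever the dtype.
    return value is None or value is NA or value is NaT or _is_nan(value)


def is_valid_string_na(value):
    # Proposed rule for strings: NA, None and nan count as valid NAs; NaT does not.
    return value is None or value is NA or _is_nan(value)


def infer_status_quo(values, skipna=True):
    if skipna:
        # Filtering up front builds a new list, i.e. a copy.
        values = [v for v in values if not is_any_na(v)]
    return "string" if all(isinstance(v, str) for v in values) else "mixed"


def infer_proposed(values, skipna=True):
    # Single pass, no copy: skip only NAs that are valid for the string dtype.
    for v in values:
        if isinstance(v, str):
            continue
        if skipna and is_valid_string_na(v):
            continue
        return "mixed"
    return "string"


arr = ["foo", NaT, "bar"]
print(infer_status_quo(arr))  # NaT is masked out before the check
print(infer_proposed(arr))    # NaT is not a valid string NA
```

Under this sketch, a list such as `["foo", None, "bar"]` is still inferred as 'string' by both versions; only NAs that are invalid for the dtype, like NaT among strings, change the result.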

Labels: Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)