Description
I'm working on making lib.infer_dtype copy-free and finding things would be easier/more consistent if we tweaked the meaning of the skipna keyword.
In particular, instead of doing values = values[~isnaobj(values)], followed by e.g. is_string_array(values), we could do is_string_array(values, skipna=skipna). This would change the results in cases where we have NA values that are not considered valid_na by is_string_array, e.g. in the status quo:
import pandas as pd
import numpy as np
from pandas._libs import lib
arr = np.array(["foo", pd.NaT, "bar"], dtype=object)
In [2]: lib.infer_dtype(arr, skipna=True)
Out[2]: 'string'
In [3]: lib.is_string_array(arr, skipna=True)
Out[3]: False
So the suggestion here is to change [2] to give 'mixed'. I'm finding that, to make this work without breaking the world, we also need to change StringValidator.is_valid_null to accept np.nan and None (it currently accepts only pd.NA).
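To make the difference concrete, here is a rough pure-Python sketch of the two strategies. It is not the Cython code in pandas._libs.lib: the sentinels NaT and NA are stand-ins for pd.NaT and pd.NA so that it runs without pandas, and is_missing, is_valid_string_null, infer_mask_first and infer_skipna are hypothetical names invented for illustration. It also ignores the other dtypes and the empty case that infer_dtype handles.

```python
import math


class _Sentinel:
    """Stand-in for a pandas missing-value singleton (not the real object)."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


NaT = _Sentinel("NaT")  # stand-in for pd.NaT
NA = _Sentinel("NA")    # stand-in for pd.NA


def _is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def is_missing(x):
    """Rough analogue of isnaobj: True for any NA-like value."""
    return x is None or x is NaT or x is NA or _is_nan(x)


def is_valid_string_null(x):
    """Hypothetical analogue of StringValidator.is_valid_null after the
    proposed change: accepts NA, None and nan, but not NaT."""
    return x is NA or x is None or _is_nan(x)


def infer_mask_first(values):
    """Status quo: drop every NA-like value (a copy), then check the rest."""
    remaining = [v for v in values if not is_missing(v)]
    return "string" if all(isinstance(v, str) for v in remaining) else "mixed"


def infer_skipna(values):
    """Proposal: one pass, no copy, skipping only nulls valid for strings."""
    for v in values:
        if isinstance(v, str) or is_valid_string_null(v):
            continue
        return "mixed"
    return "string"


arr = ["foo", NaT, "bar"]
print(infer_mask_first(arr))  # NaT is masked out before the check
print(infer_skipna(arr))      # NaT is not a valid string null
```

Under these assumptions the two functions disagree only when the array contains an NA-like value that the string validator does not accept, which is exactly the NaT case above.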