BUG: Inconsistency in DataFrame.where between inplace and not inplace with na like value for StringArray #46512

Open
@simonjayhawkins

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

print(pd.__version__)
df = pd.DataFrame({"A": ["1", "", "3"]}, dtype="string")

# inplace=False: returns a new DataFrame
try:
    result = df.where(df != "", np.nan)
    arr = result["A"]._values
    print(arr)
    print(type(arr[1]))
except Exception as e:
    print(e)

# inplace=True: modifies df directly
df.where(df != "", np.nan, inplace=True)
print(df)
arr = df["A"]._values
print(arr)
print(type(arr[1]))

Issue Description

Code sample based on #46366. Output on 1.4.1, followed by output on main:

1.4.1
StringArray requires a sequence of strings or pandas.NA
     A
0    1
1  NaN
2    3
<StringArray>
['1', nan, '3']
Length: 3, dtype: string
<class 'float'>
1.5.0.dev0+595.gf99ec8bf80
<StringArray>
['1', <NA>, '3']
Length: 3, dtype: string
<class 'pandas._libs.missing.NAType'>
     A
0    1
1  NaN
2    3
<StringArray>
['1', nan, '3']
Length: 3, dtype: string
<class 'float'>

Expected Behavior

The behavior of the inplace=False case has changed between 1.4.1 and main: on 1.4.1 it raises, while on main np.nan is converted to pd.NA, because #45168 allows other NA values in the StringArray constructor.

Whether this is correct for DataFrame.where may need discussion. Either way, the result of the inplace=True case looks incorrect to me: it stores a float nan in the StringArray on both versions. It should be consistent with the inplace=False case.
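
As an illustration only, here is a minimal sketch of how one might avoid the inconsistent path until this is resolved: pass pd.NA rather than np.nan and use the non-inplace call. The helper na_sentinel_types is a hypothetical name introduced here, not part of pandas. It reports which Python types are used for the missing entries of a column, so a stray float nan would show up. This is a workaround sketch under those assumptions, not a proposed fix.

```python
import pandas as pd


def na_sentinel_types(series: pd.Series) -> set:
    """Return the Python types of the missing entries in a Series.

    Hypothetical helper for illustration; not a pandas API.
    """
    return {type(value) for value in series[series.isna()].tolist()}


df = pd.DataFrame({"A": ["1", "", "3"]}, dtype="string")

# Non-inplace call with pd.NA as the replacement value; df itself is untouched.
cleaned = df.where(df != "", pd.NA)

# A set containing float here would indicate a nan stored in the string column.
print(na_sentinel_types(cleaned["A"]))
```

Running the same helper on a column produced by the inplace=True call in the example above would be one way to confirm whether a float nan ended up in the array.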

Installed Versions

.

Labels: Bug, ExtensionArray, Strings
