
convert numeric column to dedicated pd.StringDtype() #31204

Closed
@vadella

Description


Code Sample, a copy-pastable example if possible

import pandas as pd

pd.Series(range(5, 10), dtype="Int64").astype("string")

raises TypeError: data type not understood

while

pd.Series(range(5, 10)).astype("string")

raises ValueError: StringArray requires a sequence of strings or missing values.
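The two calls above fail with different exceptions. A small harness makes it easier to compare them side by side, for example when retesting on another pandas version. This is only a sketch: `try_astype` is a hypothetical helper, not part of pandas, and what each call returns depends on the installed pandas version.

```python
import pandas as pd


def try_astype(s: pd.Series, dtype):
    """Return the converted Series, or "ExcName: message" if astype raises.

    Hypothetical helper for comparing conversion paths; not a pandas API.
    """
    try:
        return s.astype(dtype)
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


# The two cases from the report; the outcome depends on the pandas version.
for label, series in [
    ("Int64", pd.Series(range(5, 10), dtype="Int64")),
    ("int64", pd.Series(range(5, 10))),
]:
    result = try_astype(series, "string")
    print(label, "->", result if isinstance(result, str) else result.dtype)
```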

If you first do astype(str):

pd.Series(range(5, 10)).astype(str).astype("string")

and

pd.Series(range(5, 10), dtype="Int64").astype(str).astype("string")

work as expected:

0    5
1    6
2    7
3    8
4    9
dtype: string
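One caveat with the astype(str) workaround is that missing values are also passed through str(). Depending on the pandas version, they can end up as literal strings such as "nan" instead of staying missing. A minimal sketch of an alternative, assuming you want missing entries kept as pd.NA, converts each non-missing value with str() and builds the "string" Series directly. `to_string_dtype` is a hypothetical name, not a pandas function.

```python
import pandas as pd


def to_string_dtype(s: pd.Series) -> pd.Series:
    """Convert any Series to the "string" dtype, keeping missing values missing.

    Hypothetical helper: non-missing values go through str(), while missing
    values (NaN, None, pd.NA) become pd.NA instead of text like "nan" or "<NA>".
    """
    values = [pd.NA if pd.isna(v) else str(v) for v in s]
    return pd.Series(values, index=s.index, name=s.name, dtype="string")


print(to_string_dtype(pd.Series(range(5, 10), dtype="Int64")))
print(to_string_dtype(pd.Series([5, None], dtype="Int64")))
```

Floats keep Python's str() representation, so 5.0 becomes "5.0". Format them explicitly first if another representation is wanted.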

If you first do astype(object) instead, both cases raise ValueError: StringArray requires a sequence of strings or missing values.
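The astype(object) route fails because casting to object only changes the container. The elements are still Python ints, not strings. The sketch below mirrors the condition stated in the error message (every element must be a string or a missing value). `is_valid_string_input` is a hypothetical helper for illustration, not the check pandas actually runs internally.

```python
import pandas as pd


def is_valid_string_input(values) -> bool:
    """True if every element is a str or a missing value (None, NaN, pd.NA).

    Hypothetical check mirroring the wording of the StringArray error message.
    """
    return all(isinstance(v, str) or pd.isna(v) for v in values)


as_object = pd.Series(range(5, 10)).astype(object)
print(as_object.dtype)                               # object
print(type(as_object.iloc[0]).__name__)              # int
print(is_valid_string_input(as_object))              # False
print(is_valid_string_input(as_object.astype(str)))  # True
```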

Problem description

I understand the ValueError, since the values passed to the StringArray are not strings. Ideally, astype("string") would convert the values to strings, or astype(str) would return a StringArray. In any case, I would expect pd.Series(range(5, 10), dtype="Int64").astype("string") and pd.Series(range(5, 10)).astype("string") to raise the same error.

Expected Output

0    5
1    6
2    7
3    8
4    9
dtype: string

or

ValueError: StringArray requires a sequence of strings or missing values.


Metadata


Assignees

No one assigned

Labels

Bug
Dtype Conversions: Unexpected or buggy dtype conversions
ExtensionArray: Extending pandas with custom dtypes or arrays.
NA - MaskedArrays: Related to pd.NA and nullable extension arrays
