DataFrame constructor is inconsistent when coercing values to strings with dtype=str. #24388

Closed
@TomAugspurger

When providing dtype=str to the DataFrame constructor, we're inconsistent about coercing values to strings.

When there's no overlap between the keys of data and columns, things are probably OK.

In [4]: pd.DataFrame(index=[0, 1], columns=[0, 1], dtype=str)
Out[4]:
     0    1
0  NaN  NaN
1  NaN  NaN

(those values are np.nan).

But when there is an overlap between keys of data and columns, the newly introduced values are coerced to strings.

In [8]: pd.DataFrame({'A': [1, 2]}, index=[0, 1], columns=['A', 'B'], dtype=str)
Out[8]:
   A    B
0  1  nan
1  2  nan

(everything in that dataframe is a string, like "1" or "nan")
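The difference between np.nan and the string "nan" is easy to miss in the printed output, so it can help to look at the Python type of each cell. The following is a minimal sketch: `cell_types` and `reproduce` are hypothetical helper names, not part of pandas, and what they report may depend on the pandas version in use.

```python
import pandas as pd


def cell_types(df):
    """Map each column label to the Python type names of its cells."""
    return {col: [type(v).__name__ for v in df[col]] for col in df.columns}


def reproduce():
    """Build the two frames from this report and return their cell types."""
    no_overlap = pd.DataFrame(index=[0, 1], columns=[0, 1], dtype=str)
    overlap = pd.DataFrame({'A': [1, 2]}, index=[0, 1], columns=['A', 'B'], dtype=str)
    return cell_types(no_overlap), cell_types(overlap)


if __name__ == "__main__":
    for types in reproduce():
        print(types)
```

A "float" entry corresponds to a real missing value such as np.nan, while a "str" entry in a column that had no data corresponds to the text "nan".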

That's because init_dict relies on arrays_to_mgr to coerce the values to the dtype, and arrays_to_mgr only gets a single dtype, so it is applied to every column, including the ones that were just filled with missing values.
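To illustrate why a blanket cast produces "nan", here is a small NumPy-only sketch. It is not the pandas internals, just an assumption-free demonstration of what casting an array of missing values to str does; the variable names are made up for this example.

```python
import numpy as np

# Stand-in for a column that had no data and was filled with missing values.
missing = np.array([np.nan, np.nan], dtype=object)

# Applying the one requested dtype to it turns each np.nan into the text "nan".
coerced = missing.astype(str)
print(coerced.tolist())
```

After the cast, the values are ordinary strings and no longer register as missing.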

Labels: DataFrame (DataFrame data structure), Dtype Conversions (Unexpected or buggy dtype conversions)