Closed
Description
When providing dtype=str to the DataFrame constructor, we're inconsistent about coercing values to strings.
When there's no overlap between the keys of data and columns, things are probably OK.
In [4]: pd.DataFrame(index=[0, 1], columns=[0, 1], dtype=str)
Out[4]:
     0    1
0  NaN  NaN
1  NaN  NaN
(Those values are np.nan, not strings.)
But when there is an overlap between the keys of data and columns, the newly introduced values are coerced to strings.
In [8]: pd.DataFrame({'A': [1, 2]}, index=[0, 1], columns=['A', 'B'], dtype=str)
Out[8]:
   A    B
0  1  nan
1  2  nan
(Everything in that DataFrame is a string, like "1" or "nan".)
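One way to check the difference between the two cases is to look at the Python type of every cell rather than at the printed repr. The snippet below is a minimal sketch; cell_type_names is a hypothetical helper written for this illustration, not part of pandas, and what it prints depends on the installed pandas version.

```python
import pandas as pd


def cell_type_names(df):
    """Map each column label to the Python type names of its cells.

    Hypothetical helper for this illustration; not a pandas API.
    """
    return {col: [type(v).__name__ for v in values] for col, values in df.items()}


# No overlap between the keys of data and columns.
no_overlap = pd.DataFrame(index=[0, 1], columns=[0, 1], dtype=str)

# Column 'A' comes from data, column 'B' is newly introduced.
overlap = pd.DataFrame({'A': [1, 2]}, index=[0, 1], columns=['A', 'B'], dtype=str)

print(cell_type_names(no_overlap))
print(cell_type_names(overlap))
```

A cell reported as float is a real np.nan, while a cell reported as str in a missing column is the string "nan".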
That's because init_dict relies on arrays_to_mgr to coerce the values to the dtype, and arrays_to_mgr only gets a single dtype.
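The effect of applying a single dtype to every column can be reproduced with plain NumPy. This is a toy model only: coerce_all is a hypothetical stand-in and does not reflect how arrays_to_mgr is actually implemented. It just shows that casting a NaN placeholder column to str, along with the real data, produces the string "nan".

```python
import numpy as np


def coerce_all(arrays, dtype):
    """Toy stand-in (not pandas code): cast every column to the same dtype."""
    return {name: np.asarray(values).astype(dtype) for name, values in arrays.items()}


columns = {
    "A": [1, 2],                                     # values supplied in data
    "B": np.array([np.nan, np.nan], dtype=object),   # placeholder for a column missing from data
}

coerced = coerce_all(columns, str)
print(coerced["A"].tolist())
print(coerced["B"].tolist())
```

Because the cast makes no distinction between real values and placeholders, the missing entries in "B" stop being np.nan.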