
API: common dtype for bool + numeric: upcast to object or coerce to numeric?  #39817

Closed
@jorisvandenbossche


We currently have an inconsistency in how we determine the common dtype for bool + numeric.

Numpy coerces booleans to numeric values when combining with numeric dtype:

>>> np.concatenate([np.array([True]), np.array([1])])
array([1, 1])
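
The same promotion can be checked on the dtypes directly, without building arrays. A minimal sketch using `np.result_type`; the variable names are only illustrative:

```python
import numpy as np

# np.result_type applies NumPy's promotion rules to dtypes directly.
bool_int = np.result_type(np.bool_, np.int64)
bool_float = np.result_type(np.bool_, np.float64)

# Concatenation follows the same rules: True is coerced to 1.0.
combined = np.concatenate([np.array([True]), np.array([1.0])])

print(bool_int, bool_float, combined.dtype)
```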

In pandas, Series does the same:

>>> pd.concat([pd.Series([True], dtype=bool), pd.Series([1.0], dtype=float)])
0    1.0
0    1.0
dtype: float64

except when the Series are empty, in which case the result is object dtype:

>>> pd.concat([pd.Series([], dtype=bool), pd.Series([], dtype=float)])
Series([], dtype: object)

And for DataFrame we return object dtype in all cases:

>>> pd.concat([pd.DataFrame({'a': np.array([], dtype=bool)}), pd.DataFrame({'a': np.array([], dtype=float)})]).dtypes
a    object
dtype: object
>>> pd.concat([pd.DataFrame({'a': np.array([True], dtype=bool)}), pd.DataFrame({'a': np.array([1.0], dtype=float)})]).dtypes
a    object
dtype: object

For the nullable dtypes, the behaviour has also been implemented somewhat inconsistently so far:

>>> pd.concat([pd.Series([True], dtype="boolean"), pd.Series([1], dtype="Int64")])
0    1
0    1
dtype: Int64

>>> pd.concat([pd.Series([True], dtype="boolean"), pd.Series([1], dtype="Float64")])
0    True
0     1.0
dtype: object

So here we preserve the numeric dtype for Integer, but convert to object for Float. The reason is that IntegerDtype._get_common_dtype handles the boolean dtype explicitly and then uses the numpy rules to determine the result dtype, whereas FloatingDtype does not yet handle non-float dtypes and therefore falls back to object dtype for anything else (including the numpy float dtype, which is clearly a bug / missing feature).
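
To make the two candidate rules concrete, here is a rough sketch for plain numpy dtypes only. The function names `common_dtype_coerce` and `common_dtype_object` are hypothetical and are not pandas API; the real logic lives in each extension dtype's `_get_common_dtype` and would also have to cover the nullable dtypes.

```python
import numpy as np


def common_dtype_coerce(dtypes):
    """Option 1 (hypothetical helper): follow NumPy promotion, so bool is coerced to numeric."""
    dtypes = [np.dtype(d) for d in dtypes]
    return np.result_type(*dtypes)


def common_dtype_object(dtypes):
    """Option 2 (hypothetical helper): mixing bool with a non-bool dtype upcasts to object."""
    dtypes = [np.dtype(d) for d in dtypes]
    has_bool = any(d.kind == "b" for d in dtypes)
    has_other = any(d.kind != "b" for d in dtypes)
    if has_bool and has_other:
        return np.dtype(object)
    # No bool/non-bool mix: fall back to the ordinary NumPy rules.
    return np.result_type(*dtypes)


print(common_dtype_coerce([bool, np.float64]))
print(common_dtype_object([bool, np.float64]))
```

Whichever rule is chosen, the point is that Series, DataFrame, empty inputs and the nullable dtypes would all go through the same decision.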


Basically, we need to decide on the desired behaviour for the bool + numeric dtype combination: coerce to numeric or upcast to object? The inconsistencies can then be fixed according to the decided rule.
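
For anyone who wants to survey the behaviour on their own install, a small script along these lines tabulates the `pd.concat` result dtype for each combination shown in this issue. The helper name `concat_result_dtypes` and the `PAIRS` list are made up for illustration, and the output depends on the pandas version, so none is shown here.

```python
import warnings

import pandas as pd

# (bool-like dtype, numeric dtype) pairs taken from the examples in this issue.
PAIRS = [
    ("bool", "int64"),
    ("bool", "float64"),
    ("boolean", "Int64"),
    ("boolean", "Float64"),
]


def concat_result_dtypes(empty=False):
    """Return {(left, right, container): result dtype name} for each pair."""
    results = {}
    for left, right in PAIRS:
        a = pd.Series([] if empty else [True], dtype=left)
        b = pd.Series([] if empty else [1], dtype=right)
        with warnings.catch_warnings():
            # Silence version-specific deprecation warnings.
            warnings.simplefilter("ignore")
            results[(left, right, "Series")] = str(pd.concat([a, b]).dtype)
            frame = pd.concat([a.to_frame("a"), b.to_frame("a")])
            results[(left, right, "DataFrame")] = str(frame["a"].dtype)
    return results


for empty in (False, True):
    for key, dtype in concat_result_dtypes(empty=empty).items():
        print("empty" if empty else "non-empty", key, "->", dtype)
```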

Labels: API - Consistency (Internal Consistency of API/Behavior); Deprecate (Functionality to remove in pandas); Dtype Conversions (Unexpected or buggy dtype conversions)
