Description
We currently have an inconsistency in how we determine the common dtype for bool + numeric.
NumPy coerces booleans to numeric values when combining them with a numeric dtype:
>>> np.concatenate([np.array([True]), np.array([1])])
array([1, 1])
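The NumPy rule can be queried directly with `np.result_type`, which applies the same promotion that `np.concatenate` uses. The sketch below also checks the empty case. Unlike the pandas behaviour shown next, NumPy derives the result from the input dtypes only, so empty inputs get the same numeric result. The explicit `int64` is only there to avoid platform-dependent default integer sizes.

```python
import numpy as np

# bool sits below every numeric kind in NumPy's promotion order,
# so it is absorbed by the numeric dtype.
print(np.result_type(np.bool_, np.int64))    # int64
print(np.result_type(np.bool_, np.float64))  # float64

# np.concatenate follows the same rule: True becomes 1.
arr = np.concatenate([np.array([True]), np.array([1], dtype="int64")])
print(arr, arr.dtype)  # [1 1] int64

# The result depends only on the dtypes, so empty inputs behave the same.
empty = np.concatenate([np.array([], dtype=bool), np.array([], dtype=float)])
print(empty.dtype)  # float64
```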
In pandas, Series does the same:
>>> pd.concat([pd.Series([True], dtype=bool), pd.Series([1.0], dtype=float)])
0 1.0
0 1.0
dtype: float64
except when they are empty, in which case we ensure the result is object dtype:
>>> pd.concat([pd.Series([], dtype=bool), pd.Series([], dtype=float)])
Series([], dtype: object)
And for DataFrame we return object dtype in all cases:
>>> pd.concat([pd.DataFrame({'a': np.array([], dtype=bool)}), pd.DataFrame({'a': np.array([], dtype=float)})]).dtypes
a object
dtype: object
>>> pd.concat([pd.DataFrame({'a': np.array([True], dtype=bool)}), pd.DataFrame({'a': np.array([1.0], dtype=float)})]).dtypes
a object
dtype: object
For the nullable dtypes, this has also been implemented somewhat inconsistently so far:
>>> pd.concat([pd.Series([True], dtype="boolean"), pd.Series([1], dtype="Int64")])
0 1
0 1
dtype: Int64
>>> pd.concat([pd.Series([True], dtype="boolean"), pd.Series([1], dtype="Float64")])
0 True
0 1.0
dtype: object
So here we preserve the numeric dtype for Integer, but convert to object for Float. The reason is that IntegerDtype._get_common_dtype handles the boolean dtype and then uses the NumPy rules to determine the result dtype, while FloatingDtype does not yet handle non-float dtypes and therefore returns object dtype for anything else (even for NumPy float dtypes, which is clearly a bug or missing feature).
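To make the mechanism concrete, here is a minimal sketch of the "translate to NumPy, promote, translate back" approach attributed to IntegerDtype._get_common_dtype above. It is not pandas code: the mapping `NULLABLE_TO_NUMPY` and the function `common_nullable_dtype` are hypothetical, and only three nullable dtype names are covered. Applying it uniformly to boolean, Int64 and Float64 gives the numeric result in both the Integer and Float cases, rather than object for Float.

```python
import numpy as np

# Hypothetical mapping from pandas nullable dtype names to the NumPy dtypes
# backing their values (illustration only, not pandas' internal registry).
NULLABLE_TO_NUMPY = {
    "boolean": np.dtype("bool"),
    "Int64": np.dtype("int64"),
    "Float64": np.dtype("float64"),
}
NUMPY_TO_NULLABLE = {v: k for k, v in NULLABLE_TO_NUMPY.items()}


def common_nullable_dtype(names):
    """Return the common nullable dtype name for `names`, or "object".

    Translate each dtype to NumPy, let NumPy's promotion rules pick the
    result, then translate back. Unknown dtypes fall back to "object".
    """
    try:
        np_dtypes = [NULLABLE_TO_NUMPY[name] for name in names]
    except KeyError:
        return "object"
    result = np.result_type(*np_dtypes)
    return NUMPY_TO_NULLABLE.get(result, "object")


print(common_nullable_dtype(["boolean", "Int64"]))    # Int64
print(common_nullable_dtype(["boolean", "Float64"]))  # Float64
print(common_nullable_dtype(["Int64", "Float64"]))    # Float64
print(common_nullable_dtype(["boolean", "string"]))   # object
```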
So we need to decide on the desired behaviour for the bool + numeric dtype combination: coerce to numeric, or upcast to object? Then we can fix the inconsistencies according to the chosen rule.
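The two options can be expressed as a single switch in a common-dtype resolver. The sketch below is hypothetical (`common_dtype` and its `bool_numeric` parameter are illustrative names, not a pandas API) and works on plain NumPy dtypes. Whichever option is chosen, the rule looks only at the dtypes and never at the length of the data, so empty and non-empty inputs, and Series and DataFrame, would get the same answer.

```python
import numpy as np

NUMERIC_KINDS = {"i", "u", "f", "c"}


def common_dtype(dtypes, bool_numeric="coerce"):
    """Hypothetical resolver for the bool + numeric decision.

    bool_numeric="coerce": follow NumPy, bool is absorbed by the numeric dtype.
    bool_numeric="object": any mix of bool and numeric gives object dtype.
    """
    if bool_numeric not in ("coerce", "object"):
        raise ValueError(f"unknown bool_numeric option: {bool_numeric!r}")
    dtypes = [np.dtype(d) for d in dtypes]
    kinds = {d.kind for d in dtypes}
    mixes_bool_and_numeric = "b" in kinds and bool(kinds & NUMERIC_KINDS)
    if mixes_bool_and_numeric and bool_numeric == "object":
        return np.dtype(object)
    # Otherwise defer to NumPy's promotion rules.
    return np.result_type(*dtypes)


print(common_dtype([bool, "float64"]))                            # float64
print(common_dtype([bool, "float64"], bool_numeric="object"))     # object
print(common_dtype(["int64", "float64"], bool_numeric="object"))  # float64
```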