Description
On today's dev call we discussed the dtype_backend global option (we decided to revert it for 2.0) and the use_nullable_dtypes keyword in IO methods (we decided to rename it to dtype_backend; where the keyword already exists in 1.5, it will be deprecated in favor of dtype_backend).
Some of the reasoning centered on the fact that constructors do not currently recognize the dtype_backend option, and there was an offhand reference to adding it as a keyword to the constructors. I think we should avoid adding keywords/options when there are viable alternatives. Going one step further: long term I think we need neither a dtype_backend option nor a dtype_backend keyword anywhere.
For IO functions, the "engine" keyword (where applicable) should determine what kind of dtypes you get back. The most relevant case is engine="pyarrow". If you want something else, you can use obj.convert_dtypes(...).
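As a minimal sketch of the "convert afterwards" alternative (using only the long-standing convert_dtypes method, no dtype_backend keyword or global option):

```python
import pandas as pd

# Plain construction gives NumPy dtypes by default;
# a missing value forces the integer column to float64.
df = pd.DataFrame({"a": [1, 2, None]})
print(df["a"].dtype)  # float64

# Opting in to nullable dtypes after the fact, rather than
# threading an extra keyword through every IO function:
converted = df.convert_dtypes()
print(converted["a"].dtype)  # Int64 (nullable; missing value becomes <NA>)
```

The same pattern applies to IO results, e.g. pd.read_csv(...).convert_dtypes().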
For constructors, the "dtype" keyword should be sufficient in most cases. The two cases where it is not are:
a) dtype=None
The natural thing to do is infer based on the data. If the data is already a pandas object we retain the dtype. If it is a numpy object we use a numpy dtype. If it is a pyarrow object we use pd.ArrowDtype. That leaves cases where it is e.g. a list. For that we could plausibly use a global option, but it'd be simpler to just have a sensible default and tell users to use convert_dtypes (or pass a keyword) if desired.
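A quick sketch of the inference rules described above, as they behave today (the pyarrow case is noted in a comment rather than run, since it requires pyarrow installed):

```python
import numpy as np
import pandas as pd

# Data is already a pandas object: the existing dtype is retained.
nullable = pd.array([1, 2, 3], dtype="Int64")
print(pd.Series(nullable).dtype)  # Int64

# Data is a NumPy object: a NumPy dtype is used.
print(pd.Series(np.array([1, 2, 3], dtype=np.int32)).dtype)  # int32

# Data is e.g. a plain list: inference falls back to a default.
print(pd.Series([1, 2, 3]).dtype)  # int64

# With pyarrow installed, a pyarrow array would analogously map to
# pd.ArrowDtype under the proposal.
```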
b) dtype=int|"int"|"int64"|... where we could plausibly default to either np.int64 or pa.int64. As above, could have a global option but better to just have a sensible default and tell users to be more specific or use convert_dtypes.
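To illustrate case (b): today the ambiguous spellings resolve to NumPy dtypes, and users who want something else can already be more specific (pyarrow spellings shown only in a comment, since they require pyarrow):

```python
import pandas as pd

# Ambiguous spelling: "int64" resolves to the NumPy dtype today.
print(pd.Series([1, 2, 3], dtype="int64").dtype)  # int64

# Being more specific removes the ambiguity without any global option:
print(pd.Series([1, 2, 3], dtype="Int64").dtype)  # Int64 (nullable)

# With pyarrow installed, one could instead request e.g.
# pd.ArrowDtype(pa.int64()) (or the "int64[pyarrow]" string alias).
```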