
API: dtype_backend and constructors long-term #51846

@jbrockmendel


On today's dev call we discussed the dtype_backend global option (we decided to revert it for 2.0) and the use_nullable_dtypes keyword in IO methods (we decided to rename it to dtype_backend, except in methods where it already existed in 1.5; there it will be deprecated in favor of dtype_backend).

Some of the reasoning centered around the fact that constructors do not currently recognize the dtype_backend option, and there was an offhand reference to adding that as a keyword to the constructors. I think we should avoid adding keywords/options when there are viable alternatives. Going one step further: long term I think we need neither a dtype_backend option nor keyword anywhere.

For IO functions, the "engine" keyword (where applicable) should determine what kind of dtypes you get back. The most relevant case is engine="pyarrow". If you want something else, you can use obj.convert_dtypes(...).
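A minimal sketch of the convert_dtypes escape hatch. It uses the default C engine so it runs without pyarrow installed, which means it does not demonstrate the proposed engine-driven behavior. The CSV contents are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column CSV with one missing value in column "b".
csv = io.StringIO("a,b\n1,1.5\n2,\n")

# Default engine: NumPy-backed dtypes (a -> int64, b -> float64 with NaN).
df = pd.read_csv(csv)

# Opt in to nullable extension dtypes after reading (a -> Int64, b -> Float64).
converted = df.convert_dtypes()

print(df.dtypes)
print(converted.dtypes)
```

The point is that the choice of dtypes can be deferred to an explicit conversion step instead of being controlled by a global option.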

For constructors, the "dtype" keyword should be sufficient in most cases. The two cases where it is not are:

a) dtype=None
The natural thing to do is infer based on the data. If the data is already a pandas object we retain the dtype. If it is a numpy object we use a numpy dtype. If it is a pyarrow object we use pd.ArrowDtype. That leaves cases where it is e.g. a list. For that we could plausibly use a global option, but it'd be simpler to just have a sensible default and tell users to use convert_dtypes (or pass a keyword) if desired.

b) dtype=int|"int"|"int64"|... where we could plausibly default to either np.int64 or pa.int64. As above, we could have a global option, but it would be better to just have a sensible default and tell users to be more specific or use convert_dtypes.
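A rough sketch of how cases (a) and (b) behave in recent pandas versions, as a baseline for the proposal. It is an illustration under that assumption, not a statement of the long-term design. The pyarrow branch of (a) is left out so the sketch runs without pyarrow installed.

```python
import numpy as np
import pandas as pd

# (a) dtype=None: the dtype is inferred from the input.
from_pandas = pd.Series(pd.Series([1, None], dtype="Int64"))  # pandas input keeps its dtype (Int64)
from_numpy = pd.Series(np.array([1, 2], dtype=np.int32))      # NumPy input keeps its NumPy dtype (int32)
from_list = pd.Series([1, 2])                                 # plain list: falls back to the default (int64)
list_opt_in = from_list.convert_dtypes()                      # opt in to a nullable dtype afterwards (Int64)

# (b) a generic spelling vs. a specific one.
generic = pd.Series([1, 2, 3], dtype="int64")                 # resolves to NumPy int64
nullable = pd.Series([1, 2, 3], dtype="Int64")                # explicit pandas nullable integer dtype
# With pyarrow installed, dtype="int64[pyarrow]" or pd.ArrowDtype(pa.int64())
# would be the explicit Arrow-backed spelling.

for name, s in [
    ("from_pandas", from_pandas),
    ("from_numpy", from_numpy),
    ("from_list", from_list),
    ("list_opt_in", list_opt_in),
    ("generic", generic),
    ("nullable", nullable),
]:
    print(name, s.dtype)
```

Users who want something other than the default either spell the dtype out or call convert_dtypes, without needing a dtype_backend option.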
