API: New global option to set the default dtypes to use #61620

Open
@datapythonista

Description

This was already implemented before 2.0 in #50748, but it was removed before the release in #51853 because the option wasn't being respected in too many cases.

The idea is to have a global option to let pandas know which dtype kind to use when data is created (the exact option name needs to be discussed, but I'll use use_arrow to illustrate):

pandas.options.mode.use_arrow = True

df = pandas.read_csv(...)  # The returned DataFrame will use pyarrow dtypes
df["foo"] = 1  # The added column will use pyarrow dtypes
df = pandas.DataFrame(...)  # The returned DataFrame will use pyarrow dtypes
...
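
For comparison, here is a minimal sketch of what is possible today without a global option, assuming pandas >= 2.0. The `dtype_backend` parameter of `read_csv` is an existing per-call opt-in; the sample data and column names are made up for illustration, and `"numpy_nullable"` is used instead of `"pyarrow"` only so the snippet runs without pyarrow installed.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample data, only for illustration.
csv_data = io.StringIO("a,b\n1,x\n2,y\n")

# Existing per-call opt-in: the backend has to be requested on every read.
df = pd.read_csv(csv_data, dtype_backend="numpy_nullable")
print(df["a"].dtype)  # Int64

# A column added afterwards does not inherit the backend of the frame;
# it falls back to a regular NumPy dtype.
df["foo"] = 1
print(isinstance(df["foo"].dtype, np.dtype))  # True
```

This is the gap a global option would close: the backend choice would apply to every creation path, not only to the calls that accept `dtype_backend`.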

I don't think adding the option is controversial, as it has no impact on users unless set, and it was already implemented without objections in the past.

I think the implementation requires a bit of discussion, as the exact behavior to implement is not immediately obvious, at least to me. The main points I can see:

  1. Should we have an option to set pyarrow as the default (since those should be the types we expect people to use in the future), or a more generic option to set dtype_backend to numpy|nullable|pyarrow?
  2. I think at least initially it makes sense that if a user is specific about the dtype they want to use (e.g. Series([1, 2], dtype="Int32")) we let them do it. But could it make sense to have a second option force_arrow or force_dtype_backend so any operation that would use another dtype kind would fail? I think this could be helpful for users that only want to live in the pyarrow world, and it would also be helpful to identify undesired casts for us.
  3. The exact namespace (mode vs future vs others) and name of the option, which clearly will depend on the previous points
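
To make point 2 more concrete, a strict mode could be approximated in user code today along these lines. This is only a sketch: `backend_of` and `check_backend` are hypothetical helpers, not pandas APIs, and the classification is a rough heuristic (any extension dtype that is not a `pd.ArrowDtype` is treated as "nullable", which also catches things like categoricals).

```python
import numpy as np
import pandas as pd


def backend_of(dtype) -> str:
    """Roughly classify a dtype as 'pyarrow', 'numpy' or 'nullable' (hypothetical helper)."""
    if isinstance(dtype, pd.ArrowDtype):
        return "pyarrow"
    if isinstance(dtype, np.dtype):
        return "numpy"
    # Assumption: every other extension dtype is counted as "nullable".
    return "nullable"


def check_backend(df: pd.DataFrame, expected: str) -> None:
    """Raise if any column uses another dtype kind (stand-in for a force_dtype_backend option)."""
    offending = {
        str(name): str(dtype)
        for name, dtype in df.dtypes.items()
        if backend_of(dtype) != expected
    }
    if offending:
        raise TypeError(f"columns not using the {expected!r} backend: {offending}")


df = pd.DataFrame({"a": pd.array([1, 2], dtype="Int32")})
check_backend(df, "nullable")  # passes: the only column is a nullable Int32

df["b"] = 1.5  # the new column silently becomes a NumPy float column
try:
    check_backend(df, "nullable")
except TypeError as exc:
    print(exc)  # reports column "b" as the undesired cast
```

A built-in option would presumably fail at the point where the cast happens, not in an after-the-fact check, but the sketch shows the kind of undesired cast such an option could surface.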

Labels: API Design, Needs Discussion (requires discussion from core team before further action)