
ENH: Adding pd.options.observed_true_on_all_groupbys #49904

Closed
@PMLP-novo

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I have added observed=True to a lot of my groupbys to avoid memory crashes. I wish I could change the default in a way that I can put at the top of my script.
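A small sketch of why this matters (the column names and category counts are made up for illustration): with categorical keys, observed=False creates one group for every combination of declared categories, while observed=True keeps only the combinations that actually occur in the data.

```python
import pandas as pd

# Two categorical columns, each declaring 300 categories,
# but only 3 (a, b) combinations actually appear in the data.
cats = [f"c{i}" for i in range(300)]
df = pd.DataFrame({
    "a": pd.Categorical(["c0", "c1", "c2"], categories=cats),
    "b": pd.Categorical(["c0", "c1", "c2"], categories=cats),
    "x": [1, 2, 3],
})

# observed=False: one row per combination of declared categories (300 * 300)
all_groups = df.groupby(["a", "b"], observed=False)["x"].sum()
# observed=True: only the combinations present in the data
seen_groups = df.groupby(["a", "b"], observed=True)["x"].sum()

print(len(all_groups))   # 90000
print(len(seen_groups))  # 3
```

With more keys or more categories the observed=False result grows multiplicatively, which is how the memory crashes described above arise.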

Feature Description

I want to be able to set:

import pandas as pd
pd.options.observed_true_on_all_groupbys = True

I know this would mean that the following is no longer respected:
df.groupby("var", observed=False)
But I don't think anybody would want that, and I have tried to make this as clear as possible in the naming.
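Until such an option exists, one per-script workaround is a small wrapper that defaults observed to True. This is a minimal sketch; the name groupby_observed is hypothetical and not part of the pandas API. Unlike the proposed global option, it still honors an explicit observed=False.

```python
import pandas as pd

def groupby_observed(df, by, **kwargs):
    """Hypothetical helper: like df.groupby, but defaults to observed=True.

    An explicit observed=... argument is passed through unchanged.
    """
    kwargs.setdefault("observed", True)
    return df.groupby(by, **kwargs)

# Usage: one declared-but-unused category pair ("y", "z")
df = pd.DataFrame({
    "k": pd.Categorical(["x", "x"], categories=["x", "y", "z"]),
    "v": [1, 2],
})
print(len(groupby_observed(df, "k")["v"].sum()))                  # 1
print(len(groupby_observed(df, "k", observed=False)["v"].sum()))  # 3
```

Keeping the default in one helper means the choice lives in a single place in the script, which is close to what the proposed option would give without changing pandas itself.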

Alternative Solutions

Implement #43999.
Or emit a warning about memory usage when, for example, more than 100,000,000 buckets are used and there are fewer than 1,000,000 unique values in any of the variables.
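The warning alternative could be prototyped outside pandas as a pre-flight check. This is a sketch under stated assumptions: the function name check_groupby_buckets is hypothetical, the thresholds are the example numbers from this issue (not pandas defaults), and "unique values" is read here as the number of key combinations that actually occur in the data, which is one possible interpretation.

```python
import math
import warnings

import pandas as pd

def check_groupby_buckets(df, keys, max_buckets=100_000_000, max_observed=1_000_000):
    """Hypothetical check to run before df.groupby(keys, observed=False)."""
    # Approximate number of groups observed=False would create: the product of
    # declared category counts (categorical keys) or distinct values (other keys).
    sizes = []
    for k in keys:
        col = df[k]
        if isinstance(col.dtype, pd.CategoricalDtype):
            sizes.append(len(col.cat.categories))
        else:
            sizes.append(col.nunique(dropna=True))
    buckets = math.prod(sizes)
    # Number of key combinations that actually occur in the data.
    observed = len(df[keys].drop_duplicates())
    if buckets > max_buckets and observed < max_observed:
        warnings.warn(
            f"groupby with observed=False would create {buckets:,} groups, "
            f"but only {observed:,} key combinations occur; consider observed=True"
        )
    return buckets, observed
```

For example, two categorical keys with 20,000 declared categories each give 400,000,000 buckets, so the check warns even if the data contains only a handful of rows.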

Additional Context

A lot of people are facing this problem: https://stackoverflow.com/questions/50051210/avoiding-memory-issues-for-groupby-on-large-pandas-dataframe
