ENH: groupby.max() should not cast int to int64 but keep original data type  #42275

Closed
@rd-andreas-lay

Description

Is your feature request related to a problem?

In pandas 1.2.5, calling groupby.max() on a large DataFrame of int8 values (all 0 or 1) makes pandas cast the data to int64, which fails with:

MemoryError: Unable to allocate 76.4 GiB for an array with shape (1915674, 5356) and data type int64

Traceback:

/python3.9/site-packages/pandas/core/dtypes/common.py in ensure_int_or_float(arr, copy)
    143     try:
    144         # error: Unexpected keyword argument "casting" for "astype"
--> 145         return arr.astype("int64", copy=copy, casting="safe")  # type: ignore[call-arg]
    146     except TypeError:
    147         pass
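
The 76.4 GiB in the error message can be checked with simple arithmetic. The sketch below only uses the shape reported in the MemoryError and the fixed element sizes of int64 (8 bytes) and int8 (1 byte). The variable names are made up for illustration.

```python
# Shape taken from the MemoryError message above.
rows, cols = 1915674, 5356
n_elements = rows * cols

# Bytes per element: int64 is 8 bytes, int8 is 1 byte.
bytes_int64 = n_elements * 8
bytes_int8 = n_elements * 1

GIB = 2 ** 30  # the error message reports binary gibibytes
gib_int64 = bytes_int64 / GIB
gib_int8 = bytes_int8 / GIB

print(f"int64 copy:  {gib_int64:.1f} GiB")
print(f"int8 source: {gib_int8:.1f} GiB")
```

So the temporary int64 copy is eight times the size of the int8 data it is made from.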

Describe the solution you'd like

groupby.max() should keep the original data type (int8 in this case) instead of casting to int64.
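
As an illustration of what "keeping the original datatype" could look like, here is a minimal sketch of a per-group maximum that never leaves the input dtype. It is a NumPy-only workaround written for this issue, not how pandas implements groupby.max(). The helper name `grouped_max_preserving_dtype` is hypothetical, and the sketch assumes a 2-D integer array with one key per row.

```python
import numpy as np

def grouped_max_preserving_dtype(values, keys):
    """Column-wise max per group, computed in values.dtype (hypothetical helper).

    Assumes `values` is a 2-D integer array and `keys` has one entry per row.
    """
    values = np.asarray(values)
    # Map each row's key to a dense group code 0..n_groups-1.
    uniques, codes = np.unique(np.asarray(keys), return_inverse=True)
    codes = codes.reshape(-1)
    # Start every cell at the smallest value the input dtype can hold,
    # so the output array is allocated in the same dtype as the input.
    out = np.full(
        (len(uniques), values.shape[1]),
        np.iinfo(values.dtype).min,
        dtype=values.dtype,
    )
    # Unbuffered in-place maximum: out[codes[i]] = max(out[codes[i]], values[i]).
    np.maximum.at(out, codes, values)
    return uniques, out

# Small int8 example with 0/1 values, as in the report.
values = np.array([[0, 1], [1, 0], [0, 0]], dtype=np.int8)
keys = ["a", "b", "a"]
uniques, out = grouped_max_preserving_dtype(values, keys)
print(uniques, out.dtype)
print(out)
```

The only extra allocation is the output array, which has one row per group and stays in int8.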

Metadata

Labels

Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff)
Groupby
Performance (Memory or execution speed performance)
