BUG: DataFrame.rank does not return EA types when original type was an EADtype #52829

Open
@tinadu0806

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> import pyarrow as pa
>>> s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
>>> r1 = s.rank(method="min")
>>> df = s.to_frame(name="a")
>>> r2 = df.rank(method="min")
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object

Issue Description

When a DataFrame is backed by pyarrow dtypes, calling df.rank(method="min") returns a result that is no longer pyarrow-backed: the column comes back as float64. Series.rank() does not have this problem; its result is still an Arrow-backed Series (uint64[pyarrow] in the example above).

Incorrect:

>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a    float64
dtype: object

Correct:

>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]

Expected Behavior

DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame holds pyarrow-backed data, matching what Series.rank already does.

Installed Versions

>>> pd.__version__
'2.0.0'

Labels

  • Bug
  • Dtype Conversions: Unexpected or buggy dtype conversions
  • ExtensionArray: Extending pandas with custom dtypes or arrays.
  • Reduction Operations: sum, mean, min, max, etc.
  • pyarrow dtype retention: op with pyarrow dtype -> expect pyarrow result
