Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pyarrow as pa
s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
r1 = s.rank(method="min")
df = s.to_frame(name="a")
r2 = df.rank(method="min")
>>> s
0 1
1 2
dtype: int32[pyarrow]
>>> df.dtypes
a int32[pyarrow]
dtype: object
>>> r1
0 1
1 2
dtype: uint64[pyarrow]
>>> r2
a
0 1.0
1 2.0
>>> r2.dtypes
a float64
dtype: object
Issue Description
When a DataFrame is backed by pyarrow dtypes, calling df.rank(method="min") returns a result that is no longer Arrow-backed: the column comes back as float64. Series.rank() does not have this problem; its result is still an Arrow-backed Series (uint64[pyarrow] in the example above).
Incorrect:
>>> df.dtypes
a int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a float64
dtype: object
Correct:
>>> s
0 1
1 2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]
Expected Behavior
DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame's columns are pyarrow-backed.
Installed Versions
pd.__version__
'2.0.0'