Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
import ray
df = pd.DataFrame(np.zeros((100, 100)))
# Pickling the entire dataframe is zero-copy
try:
    ray.get(ray.put(df)).values[0, 0] = 10
except ValueError as err:
    # ValueError: assignment destination is read-only
    print("Zero-copy pickling")
else:
    raise RuntimeError("Not zero-copy pickling")
# Pickling a subset of the dataframe isn't zero-copy
subset = df.iloc[:, 0:32]
# It looks like additional memory is allocated when the object is recreated.
# Can this be avoided?
# No ValueError is raised here: the assignment succeeds, so the data was copied.
ray.get(ray.put(subset)).values[0, 0] = 10
Issue Description
If it is possible to zero-copy pickle a subset of a dataframe, this would be an excellent memory optimization.
It looks like the logic responsible for this behavior is implemented in internals.pyx.
@jbrockmendel I saw that you recently edited code nearby; maybe you know whether and how this can be done?
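For context, here is a minimal sketch that takes Ray and pandas out of the picture and looks only at how NumPy arrays behave under pickle protocol 5 with out-of-band buffers. It rests on an assumption that has not been verified against internals.pyx: that the block backing the column subset is a non-contiguous view, and that this is why its data ends up copied. The helper names `count_out_of_band_buffers` and `roundtrip_shares_memory` are made up for this illustration.

```python
import pickle

import numpy as np


def count_out_of_band_buffers(obj):
    """Pickle obj with protocol 5 and count the buffers handed out-of-band."""
    buffers = []
    # list.append returns None, which tells pickle to keep the buffer out-of-band.
    pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return len(buffers)


def roundtrip_shares_memory(arr):
    """Round-trip arr through protocol 5 and report whether the result aliases arr."""
    buffers = []
    payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
    restored = pickle.loads(payload, buffers=buffers)
    return np.shares_memory(arr, restored)


full = np.zeros((100, 100))
# A column slice of a C-ordered array is neither C- nor F-contiguous.
column_slice = full[:, 0:32]

print("full:", count_out_of_band_buffers(full), roundtrip_shares_memory(full))
print("slice:", count_out_of_band_buffers(column_slice), roundtrip_shares_memory(column_slice))
```

If the assumption holds, the contiguous array should be exported as a single out-of-band buffer and round-trip without a copy, while the slice should be serialized in-band and come back as a fresh allocation. That would match the difference seen between `df` and `subset` above, but whether pandas could hand out the parent block's buffer for a subset is exactly the open question.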
Expected Behavior
Pickling a subset of a dataframe is zero-copy.
Installed Versions
numpy: 1.25.2
pandas: 2.1.2
ray: 2.7.1