BUG: Pickling the subset of dataframe isn't zero-copy #55781

Open
@anmyachev


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
import ray

df = pd.DataFrame(np.zeros((100, 100)))

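# Pickling the entire dataframe is zero-copy.
# Ray returns zero-copy NumPy arrays as read-only, so a failed write
# indicates that no copy was made.
try:
    ray.get(ray.put(df)).values[0, 0] = 10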
except ValueError as err:
    # ValueError: assignment destination is read-only
    print("Zero-copy pickling")
else:
    raise RuntimeError("Not zero-copy pickling")

# Pickling a subset of the dataframe isn't zero-copy
subset = df.iloc[:, 0:32]

# No ValueError is raised here: the restored array is writable, so it looks
# like additional memory is allocated when the object is recreated.
# Can this be avoided?
ray.get(ray.put(subset)).values[0, 0] = 10

Issue Description

If a subset of a dataframe could be pickled zero-copy, it would be an excellent optimization for memory use.

It looks like the logic responsible for this behavior is implemented in internals.pyx.

@jbrockmendel I saw that you recently edited pieces of code nearby. Do you know whether and how this can be done?
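For context, here is a NumPy-only sketch of one possible contributing factor. It does not use Ray or pandas internals. It assumes that the column subset ends up backed by a non-contiguous view of the original 2D array, which I have not verified in internals.pyx. The helper name `roundtrip_out_of_band` is made up for this illustration. With pickle protocol 5, NumPy hands a contiguous array to the pickler as an out-of-band buffer, while a non-contiguous view is serialized through a temporary copy:

```python
import pickle

import numpy as np


def roundtrip_out_of_band(arr):
    """Pickle `arr` with protocol 5, collecting out-of-band buffers, then restore it.

    Returns (number of out-of-band buffers, whether the restored array
    shares memory with the original).
    """
    buffers = []
    payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
    restored = pickle.loads(payload, buffers=buffers)
    return len(buffers), bool(np.shares_memory(arr, restored))


full = np.zeros((100, 100))      # C-contiguous array
column_subset = full[:, 0:32]    # view over columns: not contiguous
row_subset = full[0:32, :]       # view over rows: still C-contiguous

for name, arr in [
    ("full", full),
    ("column_subset", column_subset),
    ("row_subset", row_subset),
]:
    n_buffers, shared = roundtrip_out_of_band(arr)
    contiguous = arr.flags.c_contiguous or arr.flags.f_contiguous
    print(f"{name}: contiguous={contiguous}, "
          f"out-of-band buffers={n_buffers}, shares memory={shared}")
```

If the same thing happens for the array behind `subset`, a zero-copy path would probably need either a contiguous layout for the selected columns or a way to pickle the parent buffer together with the slice information.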

Expected Behavior

Pickling a subset of a dataframe is zero-copy, as it already is for the entire dataframe.

Installed Versions

numpy: 1.25.2
pandas: 2.1.2
ray: 2.7.1
