BUG: pd.concat(..., copy=False) still causes copy on block consolidation. #34825

Open
@DamianBarabonkovQC

Problem description

When concatenating columns of the same dtype, pd.concat consolidates them into a single block even when copy=False is passed. Consolidation copies the data and performs a costly vstack, so copy=False ends up slower than the default copy=True. This is misleading for an option whose name suggests that no copy is made.

Some applications do not need consolidated data, so for them this performance penalty is unnecessary.
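To illustrate why consolidation cannot avoid a copy, here is a minimal NumPy-only sketch. It does not call any pandas internals; the names `col_a`, `col_b`, and `block` are made up for the example and simply stand in for the values of two same-dtype columns.

```python
import numpy as np

# Two same-dtype "columns", standing in for the values of two Series.
col_a = np.arange(5, dtype=np.int64)
col_b = np.arange(5, 10, dtype=np.int64)

# Packing the columns into one 2-D array: np.vstack allocates a new
# buffer and copies every element into it.
block = np.vstack([col_a, col_b])

print(block.shape)                     # (2, 5)
print(np.shares_memory(block, col_a))  # False: the block is a fresh copy
print(np.shares_memory(block, col_b))  # False
```

Because the stacked array never aliases its inputs, any step that stacks columns into one block pays for a full copy regardless of what the caller requested.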

Sample Program

import time

import pandas as pd

# Build 1000 independent Series of 10,000 integers each.
template_series = pd.Series(range(10000))
series_ls = [template_series.copy() for _ in range(1000)]

# Time the concatenation with copy=False.
start_time = time.perf_counter()
df_no_copy = pd.concat(series_ls, copy=False)
print("No copy elapsed", time.perf_counter() - start_time)

# Time the concatenation with copy=True (the default setting).
start_time = time.perf_counter()
df_copy = pd.concat(series_ls, copy=True)
print("Copy elapsed", time.perf_counter() - start_time)

Execution Time

No copy elapsed 0.07740044593811035
Copy elapsed 0.05434751510620117

Execution time is in seconds.

Problem Trace

The consolidation is triggered by the following code, located near pandas/core/reshape/concat.py:499:

if not self.copy:
    new_data._consolidate_inplace()
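One way to observe the effect from user code, without touching private attributes, is to check whether the result still aliases the inputs. The sketch below is only an illustration: `count_shared_columns` is a hypothetical helper written for this example, `axis=1` is an assumption chosen to exercise column-wise concatenation, and the count it prints may differ between pandas versions.

```python
import warnings

import numpy as np
import pandas as pd


def count_shared_columns(result, sources):
    """Count columns of `result` that are views onto the matching source Series."""
    return sum(
        bool(np.shares_memory(result.iloc[:, i].to_numpy(), src.to_numpy()))
        for i, src in enumerate(sources)
    )


# Four same-dtype columns to concatenate side by side.
columns = [pd.Series(np.arange(1000, dtype=np.int64)) for _ in range(4)]

with warnings.catch_warnings():
    # Some pandas releases may warn about the `copy` keyword; silence that here.
    warnings.simplefilter("ignore")
    result = pd.concat(columns, axis=1, copy=False)

# 0 means every column was copied; 4 means none were.
print("columns sharing memory with inputs:", count_shared_columns(result, columns))
```

A count of 0 with copy=False would be consistent with the behaviour described in this report, since the data was copied even though no copy was requested.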

Metadata

Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
