PERF: DataFrame.round() unnecessarily slow compared to np.round() #17254

Closed
@aberres

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

np_df = np.random.randn(10000, 4000)
df = pd.DataFrame(np_df)

%timeit np.round(np_df, 2)
# 416 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.round(2)
# 1.69 s ± 27.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.round(df, 2)
# 1.74 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Problem description

Completely unexpectedly, DataFrame.round() showed up as a major hotspot during profiling.
Looking at the code, we see that even when the complete data frame is rounded to a single number of decimals, it is split into Series objects, which are then rounded one by one.

I am wondering whether there is a reason not to pass the data underlying the data frame to NumPy and do the rounding there in this case.
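To make the two paths concrete, here is a minimal sketch, assuming an all-float frame. It is not the actual pandas implementation: `round_per_column` and `round_whole_block` are hypothetical names used only for illustration. The first rounds each column as a separate Series and reassembles the result; the second rounds the underlying array in one NumPy call.

```python
import numpy as np
import pandas as pd

def round_per_column(df, decimals):
    # Hypothetical stand-in for the per-Series path: round each column
    # on its own, then glue the rounded columns back together.
    return pd.concat([df[col].round(decimals) for col in df.columns], axis=1)

def round_whole_block(df, decimals):
    # Round the whole underlying array with a single NumPy call.
    # Assumes a single numeric dtype; mixed dtypes would yield an object array.
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, index=df.index, columns=df.columns)

small = pd.DataFrame(np.random.randn(100, 20))
per_column = round_per_column(small, 2)
whole_block = round_whole_block(small, 2)

# Both paths are expected to give the same values.
print(per_column.equals(whole_block))
```

The results should match; the difference is only in how many separate rounding calls and intermediate objects are involved.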

A quick test showed that something like this would give us the numpy performance:

def faster_round(df, decimals):
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

%timeit faster_round(df, 2)
# 417 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
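For anyone reproducing this outside IPython, the comparison can be sketched with the standard `timeit` module instead of the `%timeit` magic. This is only a rough sketch: it repeats `faster_round` from above, uses a smaller frame (`df_small`, a name introduced here) so it finishes quickly, and assumes an all-float frame. Timings will vary by machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

def faster_round(df, decimals):
    # Same helper as above: round the underlying array once with NumPy.
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

# Smaller than the 10000 x 4000 frame above so the check runs quickly.
df_small = pd.DataFrame(np.random.randn(1000, 400))

# Sanity check: both paths should produce the same frame.
pd.testing.assert_frame_equal(df_small.round(2), faster_round(df_small, 2))

# Time five calls of each variant.
t_pandas = timeit.timeit(lambda: df_small.round(2), number=5)
t_numpy = timeit.timeit(lambda: faster_round(df_small, 2), number=5)
print(f"df.round: {t_pandas:.3f} s, faster_round: {t_numpy:.3f} s (5 runs each)")
```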

Labels: Performance (memory or execution speed performance)
