copy_to_readonly_numpy_array needlessly copies pandas series objects #1081

Closed
@RZachLamberty

Description

The function _plotly_utils.basevalidators.copy_to_readonly_numpy_array performs a full, slow copy of pd.Series objects, even though a series already holds its data as an np.ndarray in its values attribute. We could use the values attribute to dramatically speed up trace generation, especially for large dataframes.

environment: plotly version 3.1.0, macOS High Sierra 10.13.6, plotly installed via conda

working example:

import plotly
import _plotly_utils.basevalidators
import numpy as np
import pandas as pd

print('plotly version: {}'.format(plotly.__version__))

df = pd.DataFrame({'x': np.random.randint(0, 100, 1000000)})

# using `ipython` time magic
print('\ncoercing series')
%time v1 = _plotly_utils.basevalidators.copy_to_readonly_numpy_array(df.x)

print('\naccessing np values directly')
%time v2 = _plotly_utils.basevalidators.copy_to_readonly_numpy_array(df.x.values)

example output:

plotly version: 3.1.0

coercing series
CPU times: user 987 ms, sys: 35.5 ms, total: 1.02 s
Wall time: 854 ms

accessing np values directly
CPU times: user 1.45 ms, sys: 17 µs, total: 1.46 ms
Wall time: 1.49 ms

so a performance difference of roughly 570x in wall time (854 ms vs. 1.49 ms), or about 700x in total CPU time
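
A minimal sketch of the proposed change, assuming the fix is simply to unwrap the series to its values attribute before coercing. The helper name to_readonly_array is hypothetical and this is not plotly's actual implementation; it only illustrates the idea for plain numeric data:

```python
import numpy as np
import pandas as pd


def to_readonly_array(v):
    """Hypothetical helper: copy `v` into a read-only numpy array.

    Not plotly's implementation; a sketch of the idea in this issue.
    """
    # Assumption: for pandas containers, start from the underlying
    # array instead of handing the Series/Index object to numpy.
    if isinstance(v, (pd.Series, pd.Index)):
        v = v.values

    # Make one independent copy so the caller's data is never aliased.
    arr = np.array(v, copy=True)

    # Mark the copy as read-only.
    arr.flags.writeable = False
    return arr


# Usage with a series similar to the one in the example above
s = pd.Series(np.random.randint(0, 100, 1000))
out = to_readonly_array(s)
```

The result is still an independent copy, so later changes to the dataframe do not leak into the returned array. Series with non-numeric or extension dtypes may need extra handling that this sketch does not attempt.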
