PERF: fix JSON performance regression from 0.12 (GH5765) #6137

Komnomnomnom · 2014-01-28T08:05:02Z

This fixes the main JSON performance regression in v0.13 (closes #5765). The main bottleneck was the use of intermediate NumPy scalars (from PR #4498), so I introduced code to avoid them. This PR also:

- checks the current locale so locale fudging is only done if necessary
- adds numpy 1.6 preprocessor switches, as numpy 1.6 still requires the use of scalars due to its weird datetime handling
- reorganises some of the if/else checks in objToJSON.c based on likelihood
- adds some JSON benchmarks to vbench to try and avoid future regressions
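
To give a feel for why intermediate NumPy scalars are costly, here is a minimal Python-level sketch. It is only an analogy: the actual change lives in the C code of objToJSON.c, and the function names below are hypothetical, chosen for illustration. It compares building one `np.float64` scalar object per element against converting the whole buffer to built-in floats in a single call.

```python
import timeit

import numpy as np

values = np.random.rand(100_000)


def via_numpy_scalars(arr):
    # Hypothetical helper: indexing a float64 array element by element
    # creates a separate np.float64 scalar object for every value.
    return [arr[i] for i in range(len(arr))]


def via_native_floats(arr):
    # Hypothetical helper: tolist() converts the whole buffer to built-in
    # Python floats in one call, with no per-element NumPy scalar.
    return arr.tolist()


scalars = via_numpy_scalars(values)
natives = via_native_floats(values)

# Same values, different element types.
print(type(scalars[0]).__name__, type(natives[0]).__name__)

# Timings vary by machine and NumPy version, so none are quoted here.
print("numpy scalars:", timeit.timeit(lambda: via_numpy_scalars(values), number=5))
print("native floats:", timeit.timeit(lambda: via_native_floats(values), number=5))
```

Both paths yield the same numbers; only the per-element object creation differs, which is the kind of overhead this PR removes on the C side.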

0.12

In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 119 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.12.0', '1.8.0')

This PR

In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 119 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.13.0-406-ga18e5e6', '1.8.0')

While this solves the main performance issue, vbench shows this PR is still slightly slower than v0.12 (roughly 2-5%), and it's unclear where the remaining slowdown comes from.

@jreback do you still have a Windows box / VM that you could test on?

jreback · 2014-01-28T12:40:12Z


-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
packers_write_json                           |  23.3610 |  42.2543 |   0.5529 |
packers_write_json_date_index                |  31.3267 |  45.5254 |   0.6881 |
packers_read_json                            |  39.0206 |  39.3570 |   0.9915 |
packers_read_json_date_index                 |  39.2287 |  39.2920 |   0.9984 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [0975572] : Merge branch 'json-0.13-slowdown' of https://github.com/Komnomnomnom/pandas into Komnomnomnom-json-0.13-slowdown
Base   [464c1f9] : Add Scatter-CI link to README.md
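
For readers unfamiliar with the table, the figures are consistent with the ratio column being head time divided by base time. A minimal sketch that recomputes it from the timings above (the dictionary and function names are illustrative, not part of vbench):

```python
# (head_ms, base_ms) pairs copied from the vbench table above.
timings = {
    "packers_write_json": (23.3610, 42.2543),
    "packers_write_json_date_index": (31.3267, 45.5254),
    "packers_read_json": (39.0206, 39.3570),
    "packers_read_json_date_index": (39.2287, 39.2920),
}


def ratio(head_ms, base_ms):
    # Assumed definition: target (head) time relative to baseline time.
    return head_ms / base_ms


ratios = {name: round(ratio(h, b), 4) for name, (h, b) in timings.items()}

for name, r in ratios.items():
    verdict = "faster than base" if r < 1.0 else "not faster than base"
    print(f"{name:32s} {r:.4f}  {verdict}")
```

Under that reading, the two write benchmarks take roughly 55% and 69% of their baseline time, while the read benchmarks are essentially unchanged.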

jreback · 2014-01-28T12:41:07Z

Looks good, tested fine on Windows. FYI, you can watch http://scatterci.github.io/ScatterCI-Pandas/ for all builds (Linux, Windows, and SPARC builds too).

Komnomnomnom · 2014-01-28T12:44:47Z

Awesome! That looks pretty sweet, thanks.

PERF: fix JSON performance regression from 0.12 (GH5765)

a18e5e6

jreback added a commit that referenced this pull request Jan 28, 2014

Merge pull request #6137 from Komnomnomnom/json-0.13-slowdown

54945de

PERF: fix JSON performance regression from 0.12 (GH5765)

jreback merged commit 54945de into pandas-dev:master Jan 28, 2014

Komnomnomnom deleted the json-0.13-slowdown branch January 28, 2014 12:45
