PERF: fix JSON performance regression from 0.12 (GH5765) #6137

Merged
merged 1 commit into from
Jan 28, 2014

Conversation

Komnomnomnom
Contributor

This fixes the main JSON performance regression in v0.13 (closes #5765). The main bottleneck was the use of intermediate NumPy scalars (from PR #4498), so I added code that avoids them. Also:

  • checks the current locale so that locale fudging is only done when necessary
  • adds NumPy 1.6 preprocessor switches, as 1.6 still requires the use of scalars due to its weird datetime handling
  • reorganises some of the if/else checks in objToJSON.c based on likelihood
  • adds some JSON benchmarks to vbench to try to avoid future regressions
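
As a rough illustration of the first bullet (this is not the C code from this PR): assuming the "locale fudging" means temporarily switching the numeric locale so that the decimal separator is `.`, the check-before-switch idea could look like this in Python. The `c_numeric_locale` helper is a hypothetical name used only for this sketch.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_numeric_locale():
    """Hypothetical helper: switch LC_NUMERIC to "C" only if it is needed.

    Yields True when the locale was actually switched, False otherwise.
    """
    # Cheap check first: if the decimal separator is already ".",
    # skip the save/switch/restore dance entirely.
    if locale.localeconv()["decimal_point"] == ".":
        yield False
        return

    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield True
    finally:
        # Always restore the caller's locale, even if the body raised.
        locale.setlocale(locale.LC_NUMERIC, saved)


with c_numeric_locale() as switched:
    # Inside the block the decimal separator is guaranteed to be ".".
    text = locale.format_string("%.2f", 1234.5)

print(switched, text)
```

The point of the early return is that the common case (a locale that already uses `.`) pays for one lookup rather than two `setlocale` calls per serialisation.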

0.12

In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 119 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.12.0', '1.8.0')

This PR

In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 119 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.13.0-406-ga18e5e6', '1.8.0')

While this solves the main performance issue, vbench shows this PR is still a bit slower than v0.12 (~2-5%), and it's unclear where the remaining slowdown comes from.
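
For anyone wanting to repeat the comparison outside IPython, here is a minimal sketch of the same measurement as a standalone script. It assumes only that `pandas` and `numpy` are installed; the `time_to_json` helper and its parameters are made up for this example and are not part of vbench or pandas.

```python
import timeit

import numpy as np
import pandas as pd


def time_to_json(n_rows=100000, n_cols=10, repeat=3, number=10):
    """Return the best per-call time in ms for df.to_json(orient='split').

    Mirrors the "%timeit" sessions above: `number` calls per loop,
    best of `repeat` loops. The helper name is hypothetical.
    """
    df = pd.DataFrame(np.random.rand(n_rows, n_cols))
    timings = timeit.repeat(
        lambda: df.to_json(orient="split"), repeat=repeat, number=number
    )
    # timeit.repeat returns total seconds per loop; convert to ms per call.
    return min(timings) / number * 1000.0


if __name__ == "__main__":
    print(pd.__version__, np.__version__)
    print("%.1f ms per loop" % time_to_json())
```

Running the script once per installed pandas version gives numbers that can be compared directly, provided the machine and NumPy version are the same.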

@jreback do you still have a windows box / vm that you could test on?

@jreback
Contributor

jreback commented Jan 28, 2014


-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
packers_write_json                           |  23.3610 |  42.2543 |   0.5529 |
packers_write_json_date_index                |  31.3267 |  45.5254 |   0.6881 |
packers_read_json                            |  39.0206 |  39.3570 |   0.9915 |
packers_read_json_date_index                 |  39.2287 |  39.2920 |   0.9984 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [0975572] : Merge branch 'json-0.13-slowdown' of https://github.com/Komnomnomnom/pandas into Komnomnomnom-json-0.13-slowdown
Base   [464c1f9] : Add Scatter-CI link to README.md
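
To make the ratio column concrete, here is a small sketch that recomputes it from the head and base timings in the table above and expresses each as a percentage change. The `ratio` helper is only illustrative and is not vbench's own code.

```python
# (test name, head ms, base ms), copied from the vbench table above.
results = [
    ("packers_write_json", 23.3610, 42.2543),
    ("packers_write_json_date_index", 31.3267, 45.5254),
    ("packers_read_json", 39.0206, 39.3570),
    ("packers_read_json_date_index", 39.2287, 39.2920),
]


def ratio(head_ms, base_ms):
    """Head time divided by base time; below 1.0 means head is faster."""
    return head_ms / base_ms


for name, head_ms, base_ms in results:
    r = ratio(head_ms, base_ms)
    # Percentage of the baseline time saved by the target commit.
    saved_pct = (1.0 - r) * 100.0
    print("%-32s ratio=%.4f  time saved=%.1f%%" % (name, r, saved_pct))
```

On these numbers the two write benchmarks take roughly 55% and 69% of their baseline time, while the read benchmarks are essentially unchanged.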

jreback added a commit that referenced this pull request Jan 28, 2014
PERF: fix JSON performance regression from 0.12 (GH5765)
@jreback jreback merged commit 54945de into pandas-dev:master Jan 28, 2014
@jreback
Contributor

jreback commented Jan 28, 2014

Looks good, tested fine on Windows. FYI, you can watch all builds (Linux, Windows, and SPARC too) here: http://scatterci.github.io/ScatterCI-Pandas/

@Komnomnomnom
Contributor Author

Awesome! That looks pretty sweet, thanks.

@Komnomnomnom Komnomnomnom deleted the json-0.13-slowdown branch January 28, 2014 12:45
Successfully merging this pull request may close these issues.

df.to_json() slower in 0.13.x vs 0.12.0