Description
Benchmarking shows that dotty's parser scales reasonably (near linearly) only up to two threads. Throughput improves little when more than two threads share the same context object.
See the exact numbers here:
one thread
----------
[info] Benchmark (filePath) Mode Cnt Score Error Units
[info] ParseBenchmark.parseTyper ../src/dotty/tools/dotc/typer/Typer.scala thrpt 60 159.786 ± 3.272 ops/s
two threads
-----------
[info] Benchmark (filePath) Mode Cnt Score Error Units
[info] ParseBenchmark.parseTyper ../src/dotty/tools/dotc/typer/Typer.scala thrpt 60 259.886 ± 5.944 ops/s
four threads
------------
[info] Benchmark (filePath) Mode Cnt Score Error Units
[info] ParseBenchmark.parseTyper ../src/dotty/tools/dotc/typer/Typer.scala thrpt 60 296.626 ± 14.163 ops/s
As you can see, throughput increases by only about 14% between two and four threads (259.886 to 296.626 ops/s).
From the numbers above, dotty's parser achieves roughly 300k LoC/s in single-threaded execution (159.786 ops/s * 1890 ≈ 302k, where 1890 is the number of lines of code in Typer.scala).
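The scaling and LoC/s figures can be rechecked from the raw JMH scores. The snippet below is only a small arithmetic sketch: the object and method names are made up for illustration, and the inputs are copied from the benchmark output and the line count quoted above.

```scala
object ParseThroughput {
  // Lines of code in Typer.scala, as stated in the text above
  val linesOfCode = 1890

  // (threads, JMH score in ops/s), copied from the benchmark output above
  val scores: Seq[(Int, Double)] = Seq(1 -> 159.786, 2 -> 259.886, 4 -> 296.626)

  // Speedup relative to the single-threaded score
  def speedup(score: Double): Double = score / scores.head._2

  // Fraction of ideal linear scaling actually achieved
  def efficiency(threads: Int, score: Double): Double = speedup(score) / threads

  // One benchmark op parses the whole file once, so ops/s * lines = LoC/s
  def locPerSecond(score: Double): Double = score * linesOfCode

  def main(args: Array[String]): Unit =
    for ((threads, score) <- scores)
      println(f"$threads%d thread(s): ${locPerSecond(score) / 1000}%.0fk LoC/s, " +
        f"speedup ${speedup(score)}%.2fx, efficiency ${efficiency(threads, score) * 100}%.0f%%")
}
```

With these inputs the speedup works out to roughly 1.63x on two threads and 1.86x on four, i.e. about 81% and 46% parallel efficiency.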
A quick look in a profiler suggests that the parsing threads contend on Names.termName.
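To illustrate the kind of contention a shared name table can cause, here is a minimal sketch. It is not dotty's Names code: NameTable, LockedTable, ConcurrentTable and ContentionSketch are hypothetical names, and the assumption that lookups go through a single global lock is mine, not something the profiler output confirms. The timing loop is a rough illustration, not a substitute for a JMH run.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import scala.collection.mutable

// Hypothetical stand-in for a shared name table; NOT dotty's actual implementation.
trait NameTable { def intern(s: String): String }

// Assumed contended design: every lookup takes one global lock,
// so threads calling intern at the same time are serialized.
final class LockedTable extends NameTable {
  private val names = mutable.HashMap.empty[String, String]
  def intern(s: String): String = synchronized { names.getOrElseUpdate(s, s) }
}

// Alternative built on ConcurrentHashMap, which does not use a single global lock.
final class ConcurrentTable extends NameTable {
  private val names = new ConcurrentHashMap[String, String]()
  def intern(s: String): String = {
    val prev = names.putIfAbsent(s, s)
    if (prev == null) s else prev
  }
}

object ContentionSketch {
  // Runs `threads` workers that each intern `perThread` identifiers;
  // returns the elapsed wall-clock time in milliseconds.
  def timeMs(table: NameTable, threads: Int, perThread: Int): Long = {
    val pool = Executors.newFixedThreadPool(threads)
    val start = System.nanoTime()
    for (_ <- 1 to threads) pool.execute(new Runnable {
      def run(): Unit = {
        var i = 0
        while (i < perThread) { table.intern("id" + (i % 1024)); i += 1 }
      }
    })
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit =
    for (t <- Seq(1, 2, 4)) {
      val locked = timeMs(new LockedTable, t, 1000000)
      val concurrent = timeMs(new ConcurrentTable, t, 1000000)
      println(s"threads=$t locked=${locked}ms concurrent=${concurrent}ms")
    }
}
```

If the locked variant's elapsed time grows with the thread count while the concurrent variant's stays closer to flat, that would be consistent with the profiler's hint, but confirming it for dotty requires measuring the real Names.termName.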
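Another intriguing observation is that dotty's parsing throughput degrades as the JVM keeps running. It peaks at about 475 ops/s around warmup iteration 9, then falls steadily to about 260 ops/s by the last measured iteration: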
[info] # Warmup Iteration 1: 102.741 ops/s
[info] # Warmup Iteration 2: 168.566 ops/s
[info] # Warmup Iteration 3: 209.503 ops/s
[info] # Warmup Iteration 4: 315.379 ops/s
[info] # Warmup Iteration 5: 439.984 ops/s
[info] # Warmup Iteration 6: 445.540 ops/s
[info] # Warmup Iteration 7: 447.955 ops/s
[info] # Warmup Iteration 8: 429.745 ops/s
[info] # Warmup Iteration 9: 474.859 ops/s
[info] # Warmup Iteration 10: 473.342 ops/s
[info] # Warmup Iteration 11: 439.681 ops/s
[info] # Warmup Iteration 12: 435.863 ops/s
[info] # Warmup Iteration 13: 393.850 ops/s
[info] # Warmup Iteration 14: 405.858 ops/s
[info] # Warmup Iteration 15: 414.373 ops/s
[info] # Warmup Iteration 16: 393.982 ops/s
[info] # Warmup Iteration 17: 388.191 ops/s
[info] # Warmup Iteration 18: 372.182 ops/s
[info] # Warmup Iteration 19: 367.328 ops/s
[info] # Warmup Iteration 20: 360.737 ops/s
[info] Iteration 1: 358.978 ops/s
[info] Iteration 2: 351.729 ops/s
[info] Iteration 3: 353.012 ops/s
[info] Iteration 4: 322.184 ops/s
[info] Iteration 5: 344.890 ops/s
[info] Iteration 6: 329.241 ops/s
[info] Iteration 7: 314.066 ops/s
[info] Iteration 8: 316.703 ops/s
[info] Iteration 9: 320.233 ops/s
[info] Iteration 10: 319.633 ops/s
[info] Iteration 11: 314.716 ops/s
[info] Iteration 12: 304.024 ops/s
[info] Iteration 13: 294.362 ops/s
[info] Iteration 14: 270.942 ops/s
[info] Iteration 15: 286.218 ops/s
[info] Iteration 16: 278.624 ops/s
[info] Iteration 17: 284.292 ops/s
[info] Iteration 18: 273.424 ops/s
[info] Iteration 19: 259.550 ops/s
[info] Iteration 20: 262.592 ops/s
I collected the numbers using JMH on a 15-inch MacBook Pro (2.7GHz Intel Core i7) running Java 8. The code is available at: https://github.com/lampepfl/dotty/compare/master...gkossakowski:parsing-perf?expand=1