Dotty's parser has scaling issues in parallel execution #1527

Closed
@gkossakowski

Benchmarking shows that dotty's parser scales reasonably well only up to two threads (about 1.6x the single-threaded throughput). Adding more threads that share the same context object barely improves throughput.

See the exact numbers here:

one thread
----------

[info] Benchmark                                                 (filePath)   Mode  Cnt    Score   Error  Units
[info] ParseBenchmark.parseTyper  ../src/dotty/tools/dotc/typer/Typer.scala  thrpt   60  159.786 ± 3.272  ops/s

two threads
-----------

[info] Benchmark                                                 (filePath)   Mode  Cnt    Score   Error  Units
[info] ParseBenchmark.parseTyper  ../src/dotty/tools/dotc/typer/Typer.scala  thrpt   60  259.886 ± 5.944  ops/s

four threads
------------

[info] Benchmark                                                 (filePath)   Mode  Cnt    Score    Error  Units
[info] ParseBenchmark.parseTyper  ../src/dotty/tools/dotc/typer/Typer.scala  thrpt   60  296.626 ± 14.163  ops/s

Going from two to four threads raises throughput by only about 14% (from 259.9 to 296.6 ops/s).

From the single-threaded numbers, dotty's parser processes roughly 300k lines of code per second (159 ops/s × 1890 lines in Typer.scala ≈ 300k LoC/s).
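The scaling figures can be checked with a few lines of Scala. This is only arithmetic on the JMH scores quoted above: speedup is the throughput at n threads divided by the single-threaded throughput, and parallel efficiency is speedup divided by n. The object name ScalingStats is made up for this example.

```scala
object ScalingStats {
  // Mean throughput (ops/s) from the JMH runs above, keyed by thread count.
  val throughput: Map[Int, Double] = Map(1 -> 159.786, 2 -> 259.886, 4 -> 296.626)

  // Lines of code in Typer.scala, as given in the issue.
  val typerLoc: Int = 1890

  // Throughput at n threads relative to one thread.
  def speedup(n: Int): Double = throughput(n) / throughput(1)

  // Fraction of ideal linear scaling achieved at n threads.
  def efficiency(n: Int): Double = speedup(n) / n

  // Single-threaded parsing rate in lines per second.
  def locPerSecond: Double = throughput(1) * typerLoc

  def main(args: Array[String]): Unit = {
    for (n <- List(1, 2, 4))
      println(f"$n%d thread(s): speedup ${speedup(n)}%.2fx, efficiency ${efficiency(n) * 100}%.0f%%")
    println(f"single-threaded parsing: ${locPerSecond / 1000}%.0fk LoC/s")
  }
}
```

Two threads give about 1.63x (81% efficiency), while four give only about 1.86x (46%). With the unrounded score, 159.786 ops/s × 1890 lines comes to roughly 302k LoC/s, consistent with the 300k estimate.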

A quick look with a profiler suggests that the parsing threads contend on Names.termName.
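The issue does not show how Names.termName is implemented. The following is only an illustration of why a shared name table can cap scaling. It assumes the table is a single hash map guarded by one lock, modeled by the hypothetical LockedNameTable below (this is not dotty's code). Under that assumption every identifier the scanner turns into a name takes the same lock, so extra parser threads mostly wait on each other. One common way to relieve this kind of contention is a concurrent map with a non-blocking read path. It is shown here as a swapped-in technique, not as the fix adopted for this issue. SimpleName, NameTable, LockedNameTable and ConcurrentNameTable are all hypothetical names.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical stand-in for a compiler name: names are interned, so equal
// strings must map to the very same instance (compared with `eq`).
final class SimpleName(val chars: String) {
  override def toString: String = chars
}

trait NameTable {
  def termName(s: String): SimpleName
}

// Hypothetical model of a lock-guarded table: every call, including lookups
// of names that already exist, acquires the same monitor, so concurrent
// parser threads queue up here.
final class LockedNameTable extends NameTable {
  private val table = mutable.HashMap.empty[String, SimpleName]

  def termName(s: String): SimpleName = synchronized {
    table.getOrElseUpdate(s, new SimpleName(s))
  }
}

// Alternative sketch: hits go through ConcurrentHashMap.get, which does not
// block; only a miss falls through to computeIfAbsent, which creates at most
// one SimpleName per key even when several threads race on the same string.
final class ConcurrentNameTable extends NameTable {
  private val table = new ConcurrentHashMap[String, SimpleName]()

  def termName(s: String): SimpleName = {
    val existing = table.get(s)
    if (existing ne null) existing
    else table.computeIfAbsent(s, (k: String) => new SimpleName(k))
  }
}
```

Both variants preserve the interning guarantee (one instance per distinct string). The concurrent one only removes the global lock from the lookup path. The issue does not establish whether a change like this would suit dotty's actual Names implementation.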

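Another intriguing observation is that dotty's parsing throughput degrades over the course of a benchmark run. It climbs to about 475 ops/s by warmup iteration 9, then declines through the remaining warmup iterations and keeps falling during measurement, ending near 263 ops/s: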

[info] # Warmup Iteration   1: 102.741 ops/s
[info] # Warmup Iteration   2: 168.566 ops/s
[info] # Warmup Iteration   3: 209.503 ops/s
[info] # Warmup Iteration   4: 315.379 ops/s
[info] # Warmup Iteration   5: 439.984 ops/s
[info] # Warmup Iteration   6: 445.540 ops/s
[info] # Warmup Iteration   7: 447.955 ops/s
[info] # Warmup Iteration   8: 429.745 ops/s
[info] # Warmup Iteration   9: 474.859 ops/s
[info] # Warmup Iteration  10: 473.342 ops/s
[info] # Warmup Iteration  11: 439.681 ops/s
[info] # Warmup Iteration  12: 435.863 ops/s
[info] # Warmup Iteration  13: 393.850 ops/s
[info] # Warmup Iteration  14: 405.858 ops/s
[info] # Warmup Iteration  15: 414.373 ops/s
[info] # Warmup Iteration  16: 393.982 ops/s
[info] # Warmup Iteration  17: 388.191 ops/s
[info] # Warmup Iteration  18: 372.182 ops/s
[info] # Warmup Iteration  19: 367.328 ops/s
[info] # Warmup Iteration  20: 360.737 ops/s
[info] Iteration   1: 358.978 ops/s
[info] Iteration   2: 351.729 ops/s
[info] Iteration   3: 353.012 ops/s
[info] Iteration   4: 322.184 ops/s
[info] Iteration   5: 344.890 ops/s
[info] Iteration   6: 329.241 ops/s
[info] Iteration   7: 314.066 ops/s
[info] Iteration   8: 316.703 ops/s
[info] Iteration   9: 320.233 ops/s
[info] Iteration  10: 319.633 ops/s
[info] Iteration  11: 314.716 ops/s
[info] Iteration  12: 304.024 ops/s
[info] Iteration  13: 294.362 ops/s
[info] Iteration  14: 270.942 ops/s
[info] Iteration  15: 286.218 ops/s
[info] Iteration  16: 278.624 ops/s
[info] Iteration  17: 284.292 ops/s
[info] Iteration  18: 273.424 ops/s
[info] Iteration  19: 259.550 ops/s
[info] Iteration  20: 262.592 ops/s
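The decline can be quantified by fitting a least-squares line to the 20 measurement iterations. This is a plain ordinary-least-squares slope over the scores copied from the log above. WarmupTrend is a made-up name, and the warmup peak (iteration 9) is hard-coded.

```scala
object WarmupTrend {
  // Measurement iterations 1..20 from the JMH log above (ops/s).
  val scores: Vector[Double] = Vector(
    358.978, 351.729, 353.012, 322.184, 344.890,
    329.241, 314.066, 316.703, 320.233, 319.633,
    314.716, 304.024, 294.362, 270.942, 286.218,
    278.624, 284.292, 273.424, 259.550, 262.592)

  // Ordinary least-squares slope of score against iteration number (1-based).
  def slope(ys: Seq[Double]): Double = {
    val xs = ys.indices.map(i => (i + 1).toDouble)
    val mx = xs.sum / xs.size
    val my = ys.sum / ys.size
    val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    sxy / sxx
  }

  def main(args: Array[String]): Unit = {
    val peak = 474.859 // best warmup iteration (9) in the log above
    println(f"trend: ${slope(scores)}%.2f ops/s per iteration")
    println(f"last iteration is ${(1 - scores.last / peak) * 100}%.0f%% below the warmup peak")
  }
}
```

On these scores the slope comes out at roughly -5 ops/s per iteration. The final iteration (262.592 ops/s) is about 45% below the 474.859 ops/s warmup peak.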

I collected the numbers with JMH on a 15-inch MacBook Pro (2.7 GHz Intel Core i7) running Java 8. The code is available at: https://github.com/lampepfl/dotty/compare/master...gkossakowski:parsing-perf?expand=1
