Description
SemanticDB is a fast index of program definitions in a workspace. It is used heavily by tools such as Metals, for example to search for any definition while editing a file.
The Scala 3 compiler processes trees to produce SemanticDB in the ExtractSemanticDB phase, which runs in sequence with the other phases. In its present iteration it is very slow, accounting for about 12% of total compilation time.
In theory, we should be able to compute SemanticDB away from the main pipeline. TASTy is an intermediate representation, also produced by the compiler, that stores all the information needed by SemanticDB: trees, positions, and symbol definitions. We propose to use TASTy Query to extract SemanticDB in a parallel thread, rather than traversing the compiler's own trees. We hope that this will benefit users through faster compilation times, and open up the possibility of higher throughput if control of extraction is delegated to the build tool.
We expect that this project may require changes to the TASTy format, e.g. if information necessary for SemanticDB is not included. One example of such missing information is end marker positions, which are necessary for correct "rename symbol" refactoring in IDEs.
A successful project will answer the following questions:
- Can SemanticDB be extracted in parallel without loss of information, compared to the control?
- What information must be added to TASTy to prevent loss of information, and can it be added optionally?
- Can a reduction in build times be observed, at least for single-module projects?
- What overhead is there in loading TASTy in parallel with TASTy Query? Is there a point at which this becomes impractical?
- Can the "time to produce TASTy" be reduced, compared to the control? (This has implications for pipelined compilation.)