Commit 5ab9e4d

New article: dotc's overall structure
1 parent a73a1d9 commit 5ab9e4d

1 file changed

+165
-0
lines changed

@@ -0,0 +1,165 @@
# Dotc's Overall Structure

The compiler code is found in package `dotty.tools`. It spans the
following three sub-packages:

    backend     Compiler backends (currently for JVM and JS)
    dotc        The main compiler
    io          Helper modules for file access and classpath handling.

The `dotc` package contains some main classes that can be run as separate
programs. The most important one is class `Main`. `Main` inherits from `Driver`, which
contains the highest-level functions for starting a compiler and processing some sources.
`Driver` in turn is based on two other high-level classes, `Compiler` and `Run`.

## Package Structure

Most functionality of `dotc` is implemented in subpackages of `dotc`. Here's a list of sub-packages
and their focus.

    ast                 Abstract syntax trees.
    config              Compiler configuration, settings, platform specific definitions.
    core                Core data structures and operations, with specific subpackages for:

                        core.classfile       Reading of Java classfiles into core data structures
                        core.tasty           Reading and writing of TASTY files to/from core data structures
                        core.unpickleScala2  Reading of Scala2 symbol information into core data structures

    parsing             Scanner and parser
    printing            Pretty-printing trees, types and other data
    repl                The interactive REPL
    reporting           Reporting of error messages, warnings and other info.
    rewrite             Helpers for rewriting Scala 2's constructs into dotty's.
    transform           Miniphases and helpers for tree transformations.
    typer               Type-checking and other frontend phases
    util                General purpose utility classes and modules.

## Contexts

`dotc` has almost no global state (the only significant bit of global state is the name table,
which is used to hash strings into unique names). Instead, all essential bits of information that
can vary over a compiler run are collected in a [Context](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Context.scala). Most methods in `dotc` take a Context value as an implicit parameter.

Contexts give a convenient way to customize values in some part of the
call-graph. To run, e.g., some compiler function `f` at a given
phase `phase`, we invoke `f` with an explicit context parameter, like
this:

    f(/*normal args*/)(ctx.withPhase(phase))

This assumes that `f` is defined in the way most compiler functions are:

    def f(/*normal parameters*/)(implicit ctx: Context) ...

Compiler code follows the convention that all implicit `Context`
parameters are named `ctx`. This is important to avoid implicit
ambiguities in the case where nested methods each contain a Context
parameter. The common name then ensures that the implicit parameters
properly shadow each other.

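The calling convention and the shadowing rule can be illustrated with a small self-contained model. This is only a sketch: the `Phase` and `Context` classes and the `ContextDemo` object below are simplified, hypothetical stand-ins, not dotc's real definitions. Only the names `ctx` and `withPhase` are taken from the text above.

```scala
// Toy stand-ins for illustration only; dotc's real Context carries far more state.
final case class Phase(name: String)

final case class Context(phase: Phase) {
  // Returns a copy of this context that differs only in its phase.
  def withPhase(p: Phase): Context = copy(phase = p)
}

object ContextDemo {
  // Defined the way most compiler functions are: normal parameters first,
  // then an implicit Context named `ctx`.
  def currentPhaseName()(implicit ctx: Context): String = ctx.phase.name

  def outer()(implicit ctx: Context): (String, String) = {
    // The nested `ctx` shadows the outer one because both share the same name,
    // so the implicit search inside `inner` sees exactly one candidate.
    def inner()(implicit ctx: Context): String = currentPhaseName()

    val atErasure = inner()(ctx.withPhase(Phase("erasure"))) // explicit context argument
    val atCurrent = inner()                                  // implicit context
    (atCurrent, atErasure)
  }

  val result: (String, String) = outer()(Context(Phase("typer")))
}
```

If the parameter of `inner` had a different name, both contexts would be visible as implicits of the same type inside `inner`, which is the kind of ambiguity the naming convention is meant to rule out.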
Sometimes we want to make sure that implicit contexts are not captured
in closures or other long-lived objects, be it because we want to
enforce that nested methods each get their own implicit context, or
because we want to avoid a space leak in the case where a closure can
survive several compiler runs. A typical case is a completer for a
symbol representing an external class, which produces the attributes
of the symbol on demand, and which might never be invoked. In that
case we follow the convention that any context parameter is explicit,
not implicit, so we can track where it is used, and that it has a name
different from `ctx`. Commonly used is `ictx` for "initialization
context".

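A minimal sketch of the `ictx` convention follows. All classes here (`Context`, `Symbol`, `CompleterDemo`) and the method `enterExternalClass` are hypothetical toy definitions invented for this illustration; they do not reflect how dotc actually models symbols or completers.

```scala
// Toy stand-ins for illustration only; none of these are dotc's real classes.
final case class Context(runId: Int)

final class Symbol(val name: String) {
  private var completer: Option[Context => String] = None
  private var cachedInfo: Option[String] = None

  def setCompleter(c: Context => String): Unit = {
    completer = Some(c)
  }

  // Runs the completer at most once, using the context that is current at that point.
  def info(implicit ctx: Context): String = cachedInfo.getOrElse {
    val computed = completer.map(_(ctx)).getOrElse("<no info>")
    cachedInfo = Some(computed)
    completer = None // drop the closure once it has done its job
    computed
  }
}

object CompleterDemo {
  // The initialization context is explicit and deliberately not called `ctx`:
  // every use of `ictx` is visible in the code, and it is never picked up implicitly.
  def enterExternalClass(name: String)(ictx: Context): Symbol = {
    val sym = new Symbol(name)
    // The completer receives the context of its eventual caller as a parameter
    // instead of closing over `ictx`.
    sym.setCompleter(ctx => s"$name completed in run ${ctx.runId}")
    sym
  }

  val sym: Symbol  = enterExternalClass("scala.Option")(Context(runId = 1))
  val info: String = sym.info(Context(runId = 2))
}
```

Because `ictx` is an ordinary parameter, a quick search for its name shows every place where the initialization context is used, and the closure stored in the symbol holds no reference to it.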
With these two conventions, it has turned out that the use of implicit
contexts as a dependency injection and bulk parameterization device
worked exceptionally well. There were not very many bugs related to
passing the wrong context by accident.

## Compiler Phases

Seen from a temporal perspective, the `dotc` compiler consists of a list of phases.
The current list of phases is specified in class `Compiler` as follows:

```scala
  def phases: List[List[Phase]] = List(
      List(new FrontEnd),              // Compiler frontend: scanner, parser, namer, typer
      List(new PostTyper),             // Additional checks and cleanups after type checking
      List(new Pickler),               // Generate TASTY info
      List(new FirstTransform,         // Some transformations to put trees into a canonical form
           new CheckReentrant),        // Internal use only: Check that compiled program has no data races involving global vars
      List(new RefChecks,              // Various checks mostly related to abstract members and overriding
           new CheckStatic,            // Check restrictions that apply to @static members
           new ElimRepeated,           // Rewrite vararg parameters and arguments
           new NormalizeFlags,         // Rewrite some definition flags
           new ExtensionMethods,       // Expand methods of value classes with extension methods
           new ExpandSAMs,             // Expand single abstract method closures to anonymous classes
           new TailRec,                // Rewrite tail recursion to loops
           new LiftTry,                // Put try expressions that might execute on non-empty stacks into their own methods
           new ClassOf),               // Expand `Predef.classOf` calls.
      List(new PatternMatcher,         // Compile pattern matches
           new ExplicitOuter,          // Add accessors to outer classes from nested ones.
           new ExplicitSelf,           // Make references to non-trivial self types explicit as casts
           new CrossCastAnd,           // Normalize selections involving intersection types.
           new Splitter),              // Expand selections involving union types into conditionals
      List(new VCInlineMethods,        // Inlines calls to value class methods
           new SeqLiterals,            // Express vararg arguments as arrays
           new InterceptedMethods,     // Special handling of `==`, `!=`, `getClass` methods
           new Getters,                // Replace non-private vals and vars with getter defs (fields are added later)
           new ElimByName,             // Expand by-name parameters and arguments
           new AugmentScala2Traits,    // Expand traits defined in Scala 2.11 to simulate old-style rewritings
           new ResolveSuper),          // Implement super accessors and add forwarders to trait methods
      List(new Erasure),               // Rewrite types to JVM model, erasing all type parameters, abstract types and refinements.
      List(new ElimErasedValueType,    // Expand erased value types to their underlying implementation types
           new VCElideAllocations,     // Peep-hole optimization to eliminate unnecessary value class allocations
           new Mixin,                  // Expand trait fields and trait initializers
           new LazyVals,               // Expand lazy vals
           new Memoize,                // Add private fields to getters and setters
           new LinkScala2ImplClasses,  // Forward calls to the implementation classes of traits defined by Scala 2.11
           new NonLocalReturns,        // Expand non-local returns
           new CapturedVars,           // Represent vars captured by closures as heap objects
           new Constructors,           // Collect initialization code in primary constructors
                                       // Note: constructors changes decls in transformTemplate, no InfoTransformers should be added after it
           new FunctionalInterfaces,   // Rewrites closures to implement @specialized types of Functions.
           new GetClass),              // Rewrites getClass calls on primitive types.
      List(new LambdaLift,             // Lifts out nested functions to class scope, storing free variables in environments
                                       // Note: in this mini-phase block scopes are incorrect. No phases that rely on scopes should be here
           new ElimStaticThis,         // Replace `this` references to static objects by global identifiers
           new Flatten,                // Lift all inner classes to package scope
           new RestoreScopes),         // Repair scopes rendered invalid by moving definitions in prior phases of the group
      List(new ExpandPrivate,          // Widen private definitions accessed from nested classes
           new CollectEntryPoints,     // Find classes with main methods
           new LabelDefs),             // Converts calls to labels to jumps
      List(new GenSJSIR),              // Generate .js code
      List(new GenBCode)               // Generate JVM bytecode
    )
```

Note that phases are grouped, so the `phases` value is a
`List[List[Phase]]`. The idea is that all phases in a group are
*fused* into a single tree traversal. That way, phases can be kept
small (most phases perform a single function) without requiring an
excessive number of tree traversals (which are costly, because they
generally have bad cache locality).

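The idea of fusing a group of phases can be sketched on a toy expression tree. Everything below (`Tree`, `MiniPhase`, `ElimDoubleNeg`, `FoldConstants`, `Fusion.runGroup`) is a hypothetical illustration written for this article; dotc's actual trees and miniphase machinery are considerably more involved.

```scala
// Toy stand-ins for illustration only; dotc's real trees and miniphases are much richer.
sealed trait Tree
final case class Num(value: Int) extends Tree
final case class Add(lhs: Tree, rhs: Tree) extends Tree
final case class Neg(arg: Tree) extends Tree

// A miniphase rewrites a single node whose children have already been transformed.
trait MiniPhase {
  def transformNode(tree: Tree): Tree
}

object ElimDoubleNeg extends MiniPhase {
  def transformNode(tree: Tree): Tree = tree match {
    case Neg(Neg(t)) => t
    case t           => t
  }
}

object FoldConstants extends MiniPhase {
  def transformNode(tree: Tree): Tree = tree match {
    case Add(Num(a), Num(b)) => Num(a + b)
    case Neg(Num(a))         => Num(-a)
    case t                   => t
  }
}

object Fusion {
  var nodesVisited = 0

  // One bottom-up traversal that applies every miniphase of the group at each node.
  def runGroup(group: List[MiniPhase])(tree: Tree): Tree = {
    nodesVisited += 1
    val withNewChildren = tree match {
      case Add(l, r) => Add(runGroup(group)(l), runGroup(group)(r))
      case Neg(a)    => Neg(runGroup(group)(a))
      case n: Num    => n
    }
    group.foldLeft(withNewChildren)((t, phase) => phase.transformNode(t))
  }

  val input: Tree  = Add(Neg(Neg(Num(1))), Num(2))
  val output: Tree = runGroup(List(ElimDoubleNeg, FoldConstants))(input)
}
```

In this sketch each node of the input is visited once, no matter how many miniphases the group contains. Running the two phases as separate traversals would walk the tree twice.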
Phases fall into four categories:

 - Frontend phases: `FrontEnd`, `PostTyper` and `Pickler`. `FrontEnd` parses the source programs and generates
   untyped abstract syntax trees, which are then typechecked and transformed into typed abstract syntax trees.
   `PostTyper` performs checks and cleanups that require a fully typed program. In particular, it

    - creates super accessors representing `super` calls in traits
    - creates implementations of synthetic (compiler-implemented) methods
    - avoids storing parameters passed unchanged from subclass to superclass in duplicate fields.

   Finally, `Pickler` serializes the typed syntax trees produced by the frontend as TASTY data structures.

 - High-level transformations: All phases from `FirstTransform` to `Erasure`. Most of these phases transform
   syntax trees, expanding high-level constructs to more primitive ones. The last phase in the group, `Erasure`,
   translates all types into types supported directly by the JVM. To do this, it performs another type checking
   pass, but using the rules of the JVM's type system instead of Scala's.

 - Low-level transformations: All phases from `ElimErasedValueType` to `LabelDefs`. These
   further transform trees until they are just a structured version of Java bytecode.

 - Code generators: These map the transformed trees to Java classfiles or JavaScript files.
