Commit d1ffa3e

Merge pull request #1172 from dotty-staging/compiler-docs
First of a series of compiler design documents
2 parents 6ae7051 + 98a69ca commit d1ffa3e

35 files changed: +345 −82 lines changed

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
# Dotc's Overall Structure

The compiler code is found in package [dotty.tools](https://github.com/lampepfl/dotty/tree/master/src/dotty/tools). It spans the
following three sub-packages:

    backend  Compiler backends (currently for JVM and JS)
    dotc     The main compiler
    io       Helper modules for file access and classpath handling.

The [dotc](https://github.com/lampepfl/dotty/tree/master/src/dotty/tools/dotc)
package contains some main classes that can be run as separate
programs. The most important one is class
[Main](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Main.scala).
`Main` inherits from
[Driver](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Driver.scala) which
contains the highest level functions for starting a compiler and processing some sources.
`Driver` in turn is based on two other high-level classes,
[Compiler](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Compiler.scala) and
[Run](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Run.scala).
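
Since `Main` is an ordinary program entry point, the same pipeline can also be started programmatically; a minimal sketch (the source file name is a placeholder, and the snippet simply delegates to `Main`'s command-line entry point):

```scala
// A minimal sketch: launch the compiler programmatically, exactly as the
// `dotc` command line would. "examples/Hello.scala" is a placeholder path.
object CompileSketch {
  def main(args: Array[String]): Unit =
    dotty.tools.dotc.Main.main(Array("examples/Hello.scala"))
}
```
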
## Package Structure

Most functionality of `dotc` is implemented in subpackages of `dotc`. Here's a list of sub-packages
and their focus.

    ast         Abstract syntax trees.
    config      Compiler configuration, settings, platform specific definitions.
    core        Core data structures and operations, with specific subpackages for:

     core.classfile       Reading of Java classfiles into core data structures
     core.tasty           Reading and writing of TASTY files to/from core data structures
     core.unpickleScala2  Reading of Scala2 symbol information into core data structures

    parsing     Scanner and parser
    printing    Pretty-printing trees, types and other data
    repl        The interactive REPL
    reporting   Reporting of error messages, warnings and other info.
    rewrite     Helpers for rewriting Scala 2's constructs into dotty's.
    transform   Miniphases and helpers for tree transformations.
    typer       Type-checking and other frontend phases
    util        General purpose utility classes and modules.

## Contexts

`dotc` has almost no global state (the only significant bit of global state is the name table,
which is used to hash strings into unique names). Instead, all essential bits of information that
can vary over a compiler run are collected in a
[Context](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Contexts.scala).
Most methods in `dotc` take a `Context` value as an implicit parameter.

Contexts give a convenient way to customize values in some part of the
call-graph. To run, e.g. some compiler function `f` at a given
phase `phase`, we invoke `f` with an explicit context parameter, like
this:

    f(/*normal args*/)(ctx.withPhase(phase))

This assumes that `f` is defined in the way most compiler functions are:

    def f(/*normal parameters*/)(implicit ctx: Context) ...

Compiler code follows the convention that all implicit `Context`
parameters are named `ctx`. This is important to avoid implicit
ambiguities in the case where nested methods each contain a `Context`
parameter. The common name then ensures that the implicit parameters
properly shadow each other.
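
For example, here is a hedged sketch of both conventions together (the names `ContextConventionSketch`, `typedBlock` and `typedStat` are invented for illustration; `Context`, `Phase` and `withPhase` come from the files linked above):

```scala
import dotty.tools.dotc.core.Contexts.Context
import dotty.tools.dotc.core.Phases.Phase

object ContextConventionSketch {
  // Both methods name their implicit parameter `ctx`, so inside `typedStat`
  // the inner `ctx` cleanly shadows the outer one and no ambiguity arises.
  def typedBlock(phase: Phase)(implicit ctx: Context): Unit = {
    def typedStat()(implicit ctx: Context): Unit = ()

    // Run the nested method at the given phase by passing a context explicitly.
    typedStat()(ctx.withPhase(phase))
  }
}
```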

Sometimes we want to make sure that implicit contexts are not captured
in closures or other long-lived objects, be it because we want to
enforce that nested methods each get their own implicit context, or
because we want to avoid a space leak in the case where a closure can
survive several compiler runs. A typical case is a completer for a
symbol representing an external class, which produces the attributes
of the symbol on demand, and which might never be invoked. In that
case we follow the convention that any context parameter is explicit,
not implicit, so we can track where it is used, and that it has a name
different from `ctx`. A commonly used name is `ictx`, for "initialization
context".

With these two conventions in place, it has turned out that implicit
contexts work amazingly well as a device for dependency injection and
bulk parameterization. There is of course always the danger that
an unexpected implicit will be passed, but in practice this has not turned out to
be much of a problem.

## Compiler Phases

Seen from a temporal perspective, the `dotc` compiler consists of a list of phases.
The current list of phases is specified in class [Compiler](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Compiler.scala) as follows:

```scala
def phases: List[List[Phase]] = List(
  List(new FrontEnd),           // Compiler frontend: scanner, parser, namer, typer
  List(new PostTyper),          // Additional checks and cleanups after type checking
  List(new Pickler),            // Generate TASTY info
  List(new FirstTransform,      // Some transformations to put trees into a canonical form
       new CheckReentrant),     // Internal use only: Check that compiled program has no data races involving global vars
  List(new RefChecks,           // Various checks mostly related to abstract members and overriding
       new CheckStatic,         // Check restrictions that apply to @static members
       new ElimRepeated,        // Rewrite vararg parameters and arguments
       new NormalizeFlags,      // Rewrite some definition flags
       new ExtensionMethods,    // Expand methods of value classes with extension methods
       new ExpandSAMs,          // Expand single abstract method closures to anonymous classes
       new TailRec,             // Rewrite tail recursion to loops
       new LiftTry,             // Put try expressions that might execute on non-empty stacks into their own methods
       new ClassOf),            // Expand `Predef.classOf` calls.
  List(new PatternMatcher,      // Compile pattern matches
       new ExplicitOuter,       // Add accessors to outer classes from nested ones.
       new ExplicitSelf,        // Make references to non-trivial self types explicit as casts
       new CrossCastAnd,        // Normalize selections involving intersection types.
       new Splitter),           // Expand selections involving union types into conditionals
  List(new VCInlineMethods,     // Inlines calls to value class methods
       new SeqLiterals,         // Express vararg arguments as arrays
       new InterceptedMethods,  // Special handling of `==`, `|=`, `getClass` methods
       new Getters,             // Replace non-private vals and vars with getter defs (fields are added later)
       new ElimByName,          // Expand by-name parameters and arguments
       new AugmentScala2Traits, // Expand traits defined in Scala 2.11 to simulate old-style rewritings
       new ResolveSuper),       // Implement super accessors and add forwarders to trait methods
  List(new Erasure),            // Rewrite types to JVM model, erasing all type parameters, abstract types and refinements.
  List(new ElimErasedValueType, // Expand erased value types to their underlying implementation types
       new VCElideAllocations,  // Peep-hole optimization to eliminate unnecessary value class allocations
       new Mixin,               // Expand trait fields and trait initializers
       new LazyVals,            // Expand lazy vals
       new Memoize,             // Add private fields to getters and setters
       new LinkScala2ImplClasses, // Forward calls to the implementation classes of traits defined by Scala 2.11
       new NonLocalReturns,     // Expand non-local returns
       new CapturedVars,        // Represent vars captured by closures as heap objects
       new Constructors,        // Collect initialization code in primary constructors
                                // Note: constructors changes decls in transformTemplate, no InfoTransformers should be added after it
       new FunctionalInterfaces,// Rewrites closures to implement @specialized types of Functions.
       new GetClass),           // Rewrites getClass calls on primitive types.
  List(new LambdaLift,          // Lifts out nested functions to class scope, storing free variables in environments
                                // Note: in this mini-phase block scopes are incorrect. No phases that rely on scopes should be here
       new ElimStaticThis,      // Replace `this` references to static objects by global identifiers
       new Flatten,             // Lift all inner classes to package scope
       new RestoreScopes),      // Repair scopes rendered invalid by moving definitions in prior phases of the group
  List(new ExpandPrivate,       // Widen private definitions accessed from nested classes
       new CollectEntryPoints,  // Find classes with main methods
       new LabelDefs),          // Converts calls to labels to jumps
  List(new GenSJSIR),           // Generate .js code
  List(new GenBCode)            // Generate JVM bytecode
)
```

Note that phases are grouped, so the `phases` method is of type
`List[List[Phase]]`. The idea is that all phases in a group are
*fused* into a single tree traversal. That way, phases can be kept
small (most phases perform a single function) without requiring an
excessive number of tree traversals (which are costly, because they
have generally bad cache locality).
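
As an intuition for what fusion buys, here is a toy model, not dotc's actual miniphase machinery, in which a single traversal applies every transform of a group to each node instead of walking the tree once per transform:

```scala
object FusionSketch {
  sealed trait Tree
  case class Leaf(n: Int) extends Tree
  case class Node(children: List[Tree]) extends Tree

  // Stand-in for a miniphase's per-node action.
  type MiniTransform = Tree => Tree

  // One traversal for the whole group: transform the children, then apply
  // every phase of the group to the current node before moving on.
  def fused(group: List[MiniTransform])(tree: Tree): Tree = {
    val withChildren: Tree = tree match {
      case Node(cs) => Node(cs.map(fused(group)))
      case leaf     => leaf
    }
    group.foldLeft(withChildren)((t, phase) => phase(t))
  }
}
```
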
Phases fall into four categories:

- Frontend phases: `FrontEnd`, `PostTyper` and `Pickler`. `FrontEnd` parses the source programs and generates
  untyped abstract syntax trees, which are then typechecked and transformed into typed abstract syntax trees.
  `PostTyper` performs checks and cleanups that require a fully typed program. In particular, it

    - creates super accessors representing `super` calls in traits
    - creates implementations of synthetic (compiler-implemented) methods
    - avoids storing parameters passed unchanged from subclass to superclass in duplicate fields.

  Finally, `Pickler` serializes the typed syntax trees produced by the frontend as TASTY data structures.

- High-level transformations: All phases from `FirstTransform` to `Erasure`. Most of these phases transform
  syntax trees, expanding high-level constructs to more primitive ones. The last phase in the group, `Erasure`,
  translates all types into types supported directly by the JVM. To do this, it performs another type-checking
  pass, but using the rules of the JVM's type system instead of Scala's.

- Low-level transformations: All phases from `ElimErasedValueType` to `LabelDefs`. These
  further transform trees until they are essentially a structured version of Java bytecode.

- Code generators: These map the transformed trees to Java classfiles or JavaScript files.

docs/dotc-internals/periods.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Dotc's concept of time

Conceptually, the `dotc` compiler's job is to maintain views of
various artifacts associated with source code at all points in time.
But what is *time* for `dotc`? In fact, it is a combination of
compiler runs and compiler phases.

The *hours* of the compiler's clocks are measured in compiler
[runs](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Run.scala). Every
run creates a new hour, which follows all the compiler runs (hours) that
happened before. `dotc` is designed to be used as an incremental
compiler that can support incremental builds, as well as interactions
in an IDE and a REPL. This means that new runs can occur quite
frequently. At the extreme, every keystroke in an editor or REPL can
potentially launch a new compiler run, so potentially an "hour" of
compiler time might take only a fraction of a second in real time.

The *minutes* of the compiler's clocks are measured in phases. At every
compiler run, the compiler cycles through a number of
[phases](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Phases.scala).
The list of phases is defined in class [Compiler](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/Compiler.scala).
There are currently about 60 phases per run, so the minutes/hours
analogy works out roughly. After every phase the view the compiler has
of the world changes: trees are transformed, types are gradually simplified
from Scala types to JVM types, definitions are rearranged, and so on.

Many pieces of information in the compiler are time-dependent. For
instance, a Scala symbol representing a definition has a type, but
that type will usually change as one goes from the higher-level Scala
view of things to the lower-level JVM view. There are different ways
to deal with this. Many compilers change the type of a symbol
destructively according to the "current phase". Another, more
functional approach might be to have different symbols representing
the same definition at different phases, with each symbol carrying a
different immutable type. `dotc` employs yet another scheme, which is
inspired by functional reactive programming (FRP): Symbols carry not a
single type, but a function from compiler phase to type. So the type
of a symbol is a time-indexed function, where time ranges over
compiler phases.
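
As a rough illustration of the idea, here is a self-contained toy model (it is not dotc's real symbol machinery):

```scala
// A toy model of a time-indexed type, not dotc's actual data structures.
object TimeIndexedSketch {
  type PhaseId = Int

  sealed trait Type
  case object ScalaLevelType extends Type // e.g. a type that still has type parameters
  case object ErasedJvmType  extends Type // e.g. the corresponding erased JVM type

  // The symbol stores a function from phase to type instead of a single type.
  class Sym(typeAt: PhaseId => Type) {
    def infoAt(phase: PhaseId): Type = typeAt(phase)
  }

  // A symbol whose type is Scala-level before erasure and JVM-level afterwards;
  // the phase id 20 is an arbitrary placeholder for the erasure phase.
  val erasurePhase = 20
  val sym = new Sym(phase => if (phase < erasurePhase) ScalaLevelType else ErasedJvmType)
}
```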

Typically, the definition of a symbol or other quantity remains stable
for a number of phases. This leads us to the concept of a
[period](https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Periods.scala).
Conceptually, a period is an interval of some given phases in a given
compiler run. Periods are conceptually represented by three pieces of
information:

 - the ID of the current run,
 - the ID of the phase starting the period,
 - the number of phases in the period.

All three pieces of information are encoded in a value class over a 32-bit integer.
Here's the API for class `Period`:

```scala
class Period(val code: Int) extends AnyVal {
  def runId: RunId          // The run identifier of this period.
  def firstPhaseId: PhaseId // The first phase of this period
  def lastPhaseId: PhaseId  // The last phase of this period
  def phaseId: PhaseId      // The phase identifier of this single-phase period.

  def containsPhaseId(id: PhaseId): Boolean
  def contains(that: Period): Boolean
  def overlaps(that: Period): Boolean

  def & (that: Period): Period
  def | (that: Period): Period
}
```

We can access the parts of a period using `runId`, `firstPhaseId`,
`lastPhaseId`, or using `phaseId` for periods consisting only of a
single phase. They return `RunId` or `PhaseId` values, which are
aliases of `Int`. `containsPhaseId`, `contains` and `overlaps` test
whether a period contains a phase or a period as a sub-interval, or
whether the interval overlaps with another period. Finally, `&` and
`|` produce the intersection and the union of two period intervals
(the union operation `|` takes as `runId` the `runId` of its left
operand, as periods spanning different `runId`s cannot be constructed).
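
To make the encoding remark above concrete, here is a toy packing of the three fields into a single `Int` inside a value class; the bit widths and layout are invented for illustration, and dotc's actual encoding in `Periods.scala` may differ:

```scala
// A toy packing of (runId, firstPhaseId, lastPhaseId) into one Int; the bit
// widths and layout here are made up and are not dotc's real encoding.
object PackedPeriodSketch {
  class Packed(val code: Int) extends AnyVal {
    def runId: Int        = code >>> 12                 // upper bits: run id
    def lastPhaseId: Int  = (code >>> 6) & 0x3f         // middle 6 bits: last phase
    def firstPhaseId: Int = lastPhaseId - (code & 0x3f) // lower 6 bits: phase count - 1
  }

  def pack(runId: Int, firstPhaseId: Int, lastPhaseId: Int): Packed =
    new Packed((runId << 12) | (lastPhaseId << 6) | (lastPhaseId - firstPhaseId))
}
```
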
Periods are constructed using two `apply` methods:

```scala
object Period {

  /** The single-phase period consisting of given run id and phase id */
  def apply(rid: RunId, pid: PhaseId): Period

  /** The period consisting of given run id, and lo/hi phase ids */
  def apply(rid: RunId, loPid: PhaseId, hiPid: PhaseId): Period
}
```

As a sentinel value there's `Nowhere`, a period that is empty.
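
A small usage sketch (the run and phase ids are arbitrary placeholders; the import assumes `Period` is reachable via `dotty.tools.dotc.core.Periods`, the file linked above):

```scala
import dotty.tools.dotc.core.Periods._

object PeriodUsageSketch {
  val wholeRun   = Period(3, 1, 60) // phases 1..60 of run 3
  val singleStep = Period(3, 17)    // just phase 17 of run 3

  val contained = wholeRun.contains(singleStep) // true: phase 17 lies within 1..60
  val meet      = wholeRun & singleStep         // the single-phase period at phase 17
  val join      = wholeRun | singleStep         // equals wholeRun: union within one run
}
```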

src/dotty/tools/dotc/Bench.scala

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@ package dotc
 import core.Contexts.Context
 import reporting.Reporter

+/** A main class for running compiler benchmarks. Can instantiate a given
+ * number of compilers and run each (sequentially) a given number of times
+ * on the same sources.
+ */
 object Bench extends Driver {

   @sharable private var numRuns = 1
