Skip to content

Commit 1730e1b

Browse files
committed
new blog post
1 parent 31ebf30 commit 1730e1b

File tree

1 file changed

+153
-0
lines changed

1 file changed

+153
-0
lines changed
Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
---
2+
layout: blog
3+
post-type: blog
4+
by: Martin Odersky
5+
title: The Essence of Scala
6+
---
7+
8+
What do you get if you boil Scala on a slow flame and wait until all
9+
incidental features evaporate and only the most concentrated essence
10+
remains? After doing this for 8 years we believe we have the answer:
11+
it's DOT, the calculus of dependent object types, that underlies Scala.
12+
13+
A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be
14+
presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016),
15+
an event celebrating Phil Wadler's 60th birthday. There's also a prior
16+
[technical report](http://arxiv.org/abs/1510.05216v1) by Tiark Rompf
17+
and Nada Amin describing a slightly different version of the
18+
calculus. Each version describes a proof of type soundness that has
19+
been machine-checked for correctness.
20+
21+
## The DOT calculus
22+
23+
A calculus is a kind of mini-language that is small enough to be
24+
studied formally. Translated to Scala notation, the language covered
25+
by DOT is described by the following abstract grammar:
26+
27+
Value v = (x: T) => t Function
28+
new { x: T => ds } Object
29+
30+
Definition d = def a = t Method definition
31+
type A = T Type
32+
33+
Term t = v Value
34+
x Variable
35+
t1(t2) Application
36+
t.a Selection
37+
{ val x = t1; t2 } Local definition
38+
39+
Type T = Any Top type
40+
Nothing Bottom type
41+
x.A Selection
42+
(x: T1) => T2 Function
43+
{ def a: T } Method declaration
44+
{ type T >: T1 <: T2 } Type declaration
45+
T1 & T2 Intersection
46+
{ x => T } Recursio
47+
48+
The grammar uses several kinds of names:
49+
50+
x for (immutable) variables
51+
a for (parameterless) methods
52+
A for types
53+
54+
The full calculus adds to this syntax formal _typing rules_ that assign
55+
types `T` to terms `t` and formal _evaluation rules_ that describe how a
56+
program is evaluated. The main difference between the Wadlerfest paper
57+
and the prior technical report lies in the form of these evaluation
58+
rules. The technical report uses a "big step" semantics, i.e. it
59+
defines an abstract of interpreter that explains how terms map to
60+
values. The Wadlerfest paper uses a more common "small step"
61+
semantics, i.e. it explains evaluation as step-wise rewritings of
62+
terms. To do this, it needs to restrict terms to be in [A-normal
63+
form](https://en.wikipedia.org/wiki/A-normal_form), that is, every
64+
intermediate term is abstracted out in a local definition.
65+
66+
The following _type soundness_ property was shown in each case with a
67+
mechanized proof:
68+
69+
> If a term `t` has type `T`, and the evaluation of `t` terminates, then
70+
the result of the evaluation will be a value `v` of type `T`.
71+
72+
## Difficulties
73+
74+
Formulating the precise soundness theorem and proving it was unexpectedly hard,
75+
because it uncovered a problem complex that had not been
76+
studied in depth before. In DOT - as well as in many programming languages -
77+
you can have conflicting definitions. For instance you might have an abstract
78+
type declaration in a base class with two conflicting aliases in subclasses:
79+
80+
trait Base { type A }
81+
trait Sub1 { type A = String }
82+
trait Sub2 { type A = Int }
83+
84+
Now, if you combine `Sub1` and `Sub2` in one class, you get a conflict, since the type `A` is supposed to be equal to both `String` and `Int`. If you do
85+
not detect the conflict and assume the equalities at face value you
86+
get `String = A = Int`, hence by transitivity `String = Int`! Once you
87+
are that far, you can of course engineer all sorts of situations where
88+
a program will typecheck but cause a wrong execution at runtime. In
89+
other words, type soundness is violated.
90+
91+
Now, the problem is that one cannot always detect these
92+
inconsistencies, at least not by a local analysis that does not need
93+
to look at the whole program. What's worse, once you have an
94+
inconsistent set of definitions you can use these definitions to
95+
"prove" their own consistency - much like an mathematical theory that
96+
assumes `true = false` can "prove" every proposition including its own
97+
correctness.
98+
99+
The crucial reason why type soundness still holds is this: If one
100+
compares `T` with an alias, one does so always relative to some _path_
101+
`x` that refers to the object containing `T`. So it's really `x.T =
102+
Int`. Now, we can show that during evaluation every such path refers
103+
to some object that was created with a `new`, and that, furthermore,
104+
every such object has consistent type definitions. The tricky bit is
105+
to carefully distinguish between the full typing rules, which allow
106+
inconsistencies, and the typing rules arising from runtime values,
107+
which do not.
108+
109+
## Why is This Important?
110+
111+
There are at least four reasons why insights obtained in the DOT
112+
project are important.
113+
114+
1. They give us a well-founded explanation of _nominal typing_.
115+
Nominal typing means that a type is distinguished from others
116+
simply by having a different name.
117+
For instance, given two trait definitions
118+
119+
trait A extends AnyRef { def f: Int }
120+
trait B extends AnyRef { def f: Int }
121+
122+
we consider `A` and `B` to be different types, even though both
123+
traits have the same parents and both define the same members.
124+
The opposite of
125+
nominal typing is structural typing, which treats types
126+
that have the same structure as being the same. Most programming
127+
languages are at least in part nominal whereas most formal type systems,
128+
including DOT, are structural. But the abstract types in DOT
129+
provide a way to express nominal types such as classes and traits.
130+
The Wadlerfest paper contains examples that show how
131+
one can express classes for standard types such as `Boolean` and `List` in DOT.
132+
133+
2. They give us a stable basis on which we can study richer languages
134+
that resemble Scala more closely. For instance, we can encode
135+
type parameters as type members of objects in DOT. This encoding
136+
can give us a better understanding of the interactions of
137+
subtyping and generics. It can explain why variance rules
138+
are the way they are and what the precise typing rules for
139+
wildcard parameters `[_ <: T]`, `[_ >: T]` should be.
140+
141+
3. DOT also provides a blueprint for Scala compilation. The new Scala
142+
compiler _dotty_ has internal data structures that closely resemble DOT.
143+
In particular, type parameters are immediately mapped to type members,
144+
in the way we propose to encode them also in the calculus.
145+
146+
4. Finally, the proof principles explored in the DOT work give us guidelines
147+
to assess and treat other possible soundness issues. We now know much
148+
better what conditions must be fulfilled to ensure type soundness.
149+
This lets us put other constructs of the Scala language to the test,
150+
either to increase our confidence that they are indeed sound, or
151+
to show that they are unsound. In my next blog I will
152+
present some of the things we have discovered by that exercise.
153+

0 commit comments

Comments
 (0)