|
| 1 | +--- |
| 2 | +title: The Essence of Scala |
| 3 | +author: Martin Odersky |
| 4 | +authorImg: /images/martin.jpg |
| 5 | +--- |
| 6 | + |
| 7 | +What do you get if you boil Scala on a slow flame and wait until all |
| 8 | +incidental features evaporate and only the most concentrated essence |
| 9 | +remains? After doing this for 8 years we believe we have the answer: |
| 10 | +it's DOT, the calculus of dependent object types, that underlies Scala. |
| 11 | + |
| 12 | +A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be |
| 13 | +presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016), |
| 14 | +an event celebrating Phil Wadler's 60th birthday. There's also a prior |
| 15 | +technical report ([From F to DOT](http://arxiv.org/abs/1510.05216)) |
| 16 | +by Tiark Rompf and Nada Amin describing a slightly different version |
| 17 | +of the calculus. Each paper describes a proof of type soundness that |
| 18 | +has been machine-checked for correctness. |
| 19 | + |
| 20 | +## The DOT calculus |
| 21 | + |
| 22 | +A calculus is a kind of mini-language that is small enough to be |
| 23 | +studied formally. Translated to Scala notation, the language covered |
| 24 | +by DOT is described by the following abstract grammar: |
| 25 | + |
| 26 | + Value v = (x: T) => t Function |
| 27 | + new { x: T => ds } Object |
| 28 | + |
| 29 | + Definition d = def a = t Method definition |
| 30 | + type A = T Type |
| 31 | + |
| 32 | + Term t = v Value |
| 33 | + x Variable |
| 34 | + t1(t2) Application |
| 35 | + t.a Selection |
| 36 | + { val x = t1; t2 } Local definition |
| 37 | + |
| 38 | + Type T = Any Top type |
| 39 | + Nothing Bottom type |
| 40 | + x.A Selection |
| 41 | + (x: T1) => T2 Function |
| 42 | + { def a: T } Method declaration |
| 43 | + { type T >: T1 <: T2 } Type declaration |
| 44 | + T1 & T2 Intersection |
| 45 | + { x => T } Recursion |
| 46 | + |
| 47 | +The grammar uses several kinds of names: |
| 48 | + |
| 49 | + x for (immutable) variables |
| 50 | + a for (parameterless) methods |
| 51 | + A for types |
| 52 | + |
| 53 | +The full calculus adds to this syntax formal _typing rules_ that |
| 54 | +assign types `T` to terms `t` and formal _evaluation rules_ that |
| 55 | +describe how a program is evaluated. The following _type soundness_ |
| 56 | +property was shown with a mechanized, (i.e. machine-checked) proof: |
| 57 | + |
| 58 | +> If a term `t` has type `T`, and the evaluation of `t` terminates, then |
| 59 | + the result of the evaluation will be a value `v` of type `T`. |
| 60 | + |
| 61 | +## Difficulties |
| 62 | + |
| 63 | +Formulating the precise soundness theorem and proving it was unexpectedly hard, |
| 64 | +because it uncovered some technical challenges that had not been |
| 65 | +studied in depth before. In DOT - as well as in many programming languages - |
| 66 | +you can have conflicting definitions. For instance you might have an abstract |
| 67 | +type declaration in a base class with two conflicting aliases in subclasses: |
| 68 | + |
| 69 | + trait Base { type A } |
| 70 | + trait Sub1 extends Base { type A = String } |
| 71 | + trait Sub2 extends Base { type A = Int } |
| 72 | + trait Bad extends Sub1 with Sub2 |
| 73 | + |
| 74 | +Now, if you combine `Sub1` and `Sub2` in trait `Bad` you get a conflict, |
| 75 | +since the type `A` is supposed to be equal to both `String` and `Int`. If you do |
| 76 | +not detect the conflict and assume the equalities at face value you |
| 77 | +get `String = A = Int`, hence by transitivity `String = Int`! Once you |
| 78 | +are that far, you can of course engineer all sorts of situations where |
| 79 | +a program will typecheck but cause a wrong execution at runtime. In |
| 80 | +other words, type soundness is violated. |
| 81 | + |
| 82 | +Now, the problem is that one cannot always detect these |
| 83 | +inconsistencies, at least not by a local analysis that does not need |
| 84 | +to look at the whole program. What's worse, once you have an |
| 85 | +inconsistent set of definitions you can use these definitions to |
| 86 | +"prove" their own consistency - much like a mathematical theory that |
| 87 | +assumes `true = false` can "prove" every proposition including its own |
| 88 | +correctness. |
| 89 | + |
| 90 | +The crucial reason why type soundness still holds is this: If one |
| 91 | +compares `T` with an alias, one does so always relative to some _path_ |
| 92 | +`x` that refers to the object containing `T`. So it's really `x.T = |
| 93 | +Int`. Now, we can show that during evaluation every such path refers |
| 94 | +to some object that was created with a `new`, and that, furthermore, |
| 95 | +every such object has consistent type definitions. The tricky bit is |
| 96 | +to carefully distinguish between the full typing rules, which allow |
| 97 | +inconsistencies, and the typing rules arising from runtime values, |
| 98 | +which do not. |
| 99 | + |
| 100 | +## Why is This Important? |
| 101 | + |
| 102 | +There are at least four reasons why insights obtained in the DOT |
| 103 | +project are important. |
| 104 | + |
| 105 | + 1. They give us a well-founded explanation of _nominal typing_. |
| 106 | + Nominal typing means that a type is distinguished from others |
| 107 | + simply by having a different name. |
| 108 | + For instance, given two trait definitions |
| 109 | + |
| 110 | + trait A extends AnyRef { def f: Int } |
| 111 | + trait B extends AnyRef { def f: Int } |
| 112 | + |
| 113 | + we consider `A` and `B` to be different types, even though both |
| 114 | + traits have the same parents and both define the same members. |
| 115 | + The opposite of |
| 116 | + nominal typing is structural typing, which treats types |
| 117 | + that have the same structure as being the same. Most programming |
| 118 | + languages are at least in part nominal whereas most formal type systems, |
| 119 | + including DOT, are structural. But the abstract types in DOT |
| 120 | + provide a way to express nominal types such as classes and traits. |
| 121 | + The Wadlerfest paper contains examples that show how |
| 122 | + one can express classes for standard types such as `Boolean` and `List` in DOT. |
| 123 | + |
| 124 | + 2. They give us a stable basis on which we can study richer languages |
| 125 | + that resemble Scala more closely. For instance, we can encode |
| 126 | + type parameters as type members of objects in DOT. This encoding |
| 127 | + can give us a better understanding of the interactions of |
| 128 | + subtyping and generics. It can explain why variance rules |
| 129 | + are the way they are and what the precise typing rules for |
| 130 | + wildcard parameters `[_ <: T]`, `[_ >: T]` should be. |
| 131 | + |
| 132 | + 3. DOT also provides a blueprint for Scala compilation. The new Scala |
| 133 | + compiler _dotty_ has internal data structures that closely resemble DOT. |
| 134 | + In particular, type parameters are immediately mapped to type members, |
| 135 | + in the way we propose to encode them also in the calculus. |
| 136 | + |
| 137 | + 4. Finally, the proof principles explored in the DOT work give us guidelines |
| 138 | + to assess and treat other possible soundness issues. We now know much |
| 139 | + better what conditions must be fulfilled to ensure type soundness. |
| 140 | + This lets us put other constructs of the Scala language to the test, |
| 141 | + either to increase our confidence that they are indeed sound, or |
| 142 | + to show that they are unsound. In my next blog I will |
| 143 | + present some of the issues we have discovered through that exercise. |
| 144 | + |
0 commit comments