new blog post

odersky · odersky · commit 1730e1b8e71b · 2016-02-03T13:13:43.000+01:00
diff --git a/blog/_posts/2016-02-03-essence-of-scala.md b/blog/_posts/2016-02-03-essence-of-scala.md
@@ -0,0 +1,153 @@
+---
+layout: blog
+post-type: blog
+by: Martin Odersky
+title: The Essence of Scala
+---
+
+What do you get if you boil Scala on a slow flame and wait until all
+incidental features evaporate and only the most concentrated essence
+remains? After doing this for 8 years we believe we have the answer:
+it's DOT, the calculus of dependent object types, that underlies Scala.
+
+A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be
+presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016),
+an event celebrating Phil Wadler's 60th birthday. There's also a prior
+[technical report](http://arxiv.org/abs/1510.05216v1) by Tiark Rompf
+and Nada Amin describing a slightly different version of the
+calculus. Each version describes a proof of type soundness that has
+been machine-checked for correctness.
+
+## The DOT calculus
+
+A calculus is a kind of mini-language that is small enough to be
+studied formally. Translated to Scala notation, the language covered
+by DOT is described by the following abstract grammar:
+
+    Value       v  =  (x: T) => t            Function
+                      new { x: T => ds }     Object
+
+    Definition  d  =  def a = t              Method definition
+                      type A = T             Type
+
+    Term        t  =  v                      Value
+                      x                      Variable
+                      t1(t2)                 Application
+                      t.a                    Selection
+                      { val x = t1; t2 }     Local definition
+
+    Type        T  =  Any                    Top type
+                      Nothing                Bottom type
+                      x.A                    Selection
+                      (x: T1) => T2          Function
+                      { def a: T }           Method declaration
+                      { type T >: T1 <: T2 } Type declaration
+                      T1 & T2                Intersection
+                      { x => T }             Recursio
+
+The grammar uses several kinds of names:
+
+    x      for (immutable) variables
+    a      for (parameterless) methods
+    A      for types
+
+The full calculus adds to this syntax formal _typing rules_ that assign
+types `T` to terms `t` and formal _evaluation rules_ that describe how a
+program is evaluated. The main difference between the Wadlerfest paper
+and the prior technical report lies in the form of these evaluation
+rules. The technical report uses a "big step" semantics, i.e. it
+defines an abstract of interpreter that explains how terms map to
+values. The Wadlerfest paper uses a more common "small step"
+semantics, i.e. it explains evaluation as step-wise rewritings of
+terms. To do this, it needs to restrict terms to be in [A-normal
+form](https://en.wikipedia.org/wiki/A-normal_form), that is, every
+intermediate term is abstracted out in a local definition.
+
+The following _type soundness_ property was shown in each case with a
+mechanized proof:
+
+> If a term `t` has type `T`, and the evaluation of `t` terminates, then
+  the result of the evaluation will be a value `v` of type `T`.
+
+## Difficulties
+
+Formulating the precise soundness theorem and proving it was unexpectedly hard,
+because it uncovered a problem complex that had not been
+studied in depth before. In DOT - as well as in many programming languages -
+you can have conflicting definitions. For instance you might have an abstract
+type declaration in a base class with two conflicting aliases in subclasses:
+
+     trait Base { type A }
+     trait Sub1 { type A = String }
+     trait Sub2 { type A = Int }
+
+Now, if you combine `Sub1` and `Sub2` in one class, you get a conflict, since the type `A` is supposed to be equal to both `String` and `Int`. If you do
+not detect the conflict and assume the equalities at face value you
+get `String = A = Int`, hence by transitivity `String = Int`! Once you
+are that far, you can of course engineer all sorts of situations where
+a program will typecheck but cause a wrong execution at runtime. In
+other words, type soundness is violated.
+
+Now, the problem is that one cannot always detect these
+inconsistencies, at least not by a local analysis that does not need
+to look at the whole program. What's worse, once you have an
+inconsistent set of definitions you can use these definitions to
+"prove" their own consistency - much like an mathematical theory that
+assumes `true = false` can "prove" every proposition including its own
+correctness.
+
+The crucial reason why type soundness still holds is this: If one
+compares `T` with an alias, one does so always relative to some _path_
+`x` that refers to the object containing `T`.  So it's really `x.T =
+Int`. Now, we can show that during evaluation every such path refers
+to some object that was created with a `new`, and that, furthermore,
+every such object has consistent type definitions. The tricky bit is
+to carefully distinguish between the full typing rules, which allow
+inconsistencies, and the typing rules arising from runtime values,
+which do not.
+
+## Why is This Important?
+
+There are at least four reasons why insights obtained in the DOT
+project are important.
+
+ 1. They give us a well-founded explanation of _nominal typing_.
+    Nominal typing means that a type is distinguished from others
+    simply by having a different name.
+    For instance, given two trait definitions
+
+          trait A extends AnyRef { def f: Int }
+          trait B extends AnyRef { def f: Int }
+
+    we consider `A` and `B` to be different types, even though both
+    traits have the same parents and both define the same members.
+    The opposite of
+    nominal typing is structural typing, which treats types
+    that have the same structure as being the same. Most programming
+    languages are at least in part nominal whereas most formal type systems,
+    including DOT, are structural. But the abstract types in DOT
+    provide a way to express nominal types such as classes and traits.
+    The Wadlerfest paper contains examples that show how
+    one can express classes for standard types such as `Boolean` and `List` in DOT.
+
+ 2. They give us a stable basis on which we can study richer languages
+    that resemble Scala more closely. For instance, we can encode
+    type parameters as type members of objects in DOT. This encoding
+    can give us a better understanding of the interactions of
+    subtyping and generics. It can explain why variance rules
+    are the way they are and what the precise typing rules for
+    wildcard parameters `[_ <: T]`, `[_ >: T]` should be.
+
+ 3. DOT also provides a blueprint for Scala compilation. The new Scala
+    compiler _dotty_ has internal data structures that closely resemble DOT.
+    In particular, type parameters are immediately mapped to type members,
+    in the way we propose to encode them also in the calculus.
+
+ 4. Finally, the proof principles explored in the DOT work give us guidelines
+    to assess and treat other possible soundness issues. We now know much
+    better what conditions must be fulfilled to ensure type soundness.
+    This lets us put other constructs of the Scala language to the test,
+    either to increase our confidence that they are indeed sound, or
+    to show that they are unsound. In my next blog I will
+    present some of the things we have discovered by that exercise.
+