From 1730e1b8e71b3d89aa81e6116b007488cc8b4172 Mon Sep 17 00:00:00 2001 From: Martin Odersky Date: Wed, 3 Feb 2016 13:13:43 +0100 Subject: [PATCH 1/5] new blog post --- blog/_posts/2016-02-03-essence-of-scala.md | 153 +++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 blog/_posts/2016-02-03-essence-of-scala.md diff --git a/blog/_posts/2016-02-03-essence-of-scala.md b/blog/_posts/2016-02-03-essence-of-scala.md new file mode 100644 index 000000000..457e9c1a6 --- /dev/null +++ b/blog/_posts/2016-02-03-essence-of-scala.md @@ -0,0 +1,153 @@ +--- +layout: blog +post-type: blog +by: Martin Odersky +title: The Essence of Scala +--- + +What do you get if you boil Scala on a slow flame and wait until all +incidental features evaporate and only the most concentrated essence +remains? After doing this for 8 years we believe we have the answer: +it's DOT, the calculus of dependent object types, that underlies Scala. + +A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be +presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016), +an event celebrating Phil Wadler's 60th birthday. There's also a prior +[technical report](http://arxiv.org/abs/1510.05216v1) by Tiark Rompf +and Nada Amin describing a slightly different version of the +calculus. Each version describes a proof of type soundness that has +been machine-checked for correctness. + +## The DOT calculus + +A calculus is a kind of mini-language that is small enough to be +studied formally. Translated to Scala notation, the language covered +by DOT is described by the following abstract grammar: + + Value v = (x: T) => t Function + new { x: T => ds } Object + + Definition d = def a = t Method definition + type A = T Type + + Term t = v Value + x Variable + t1(t2) Application + t.a Selection + { val x = t1; t2 } Local definition + + Type T = Any Top type + Nothing Bottom type + x.A Selection + (x: T1) => T2 Function + { def a: T } Method declaration + { type T >: T1 <: T2 } Type declaration + T1 & T2 Intersection + { x => T } Recursio + +The grammar uses several kinds of names: + + x for (immutable) variables + a for (parameterless) methods + A for types + +The full calculus adds to this syntax formal _typing rules_ that assign +types `T` to terms `t` and formal _evaluation rules_ that describe how a +program is evaluated. The main difference between the Wadlerfest paper +and the prior technical report lies in the form of these evaluation +rules. The technical report uses a "big step" semantics, i.e. it +defines an abstract of interpreter that explains how terms map to +values. The Wadlerfest paper uses a more common "small step" +semantics, i.e. it explains evaluation as step-wise rewritings of +terms. To do this, it needs to restrict terms to be in [A-normal +form](https://en.wikipedia.org/wiki/A-normal_form), that is, every +intermediate term is abstracted out in a local definition. + +The following _type soundness_ property was shown in each case with a +mechanized proof: + +> If a term `t` has type `T`, and the evaluation of `t` terminates, then + the result of the evaluation will be a value `v` of type `T`. + +## Difficulties + +Formulating the precise soundness theorem and proving it was unexpectedly hard, +because it uncovered a problem complex that had not been +studied in depth before. In DOT - as well as in many programming languages - +you can have conflicting definitions. For instance you might have an abstract +type declaration in a base class with two conflicting aliases in subclasses: + + trait Base { type A } + trait Sub1 { type A = String } + trait Sub2 { type A = Int } + +Now, if you combine `Sub1` and `Sub2` in one class, you get a conflict, since the type `A` is supposed to be equal to both `String` and `Int`. If you do +not detect the conflict and assume the equalities at face value you +get `String = A = Int`, hence by transitivity `String = Int`! Once you +are that far, you can of course engineer all sorts of situations where +a program will typecheck but cause a wrong execution at runtime. In +other words, type soundness is violated. + +Now, the problem is that one cannot always detect these +inconsistencies, at least not by a local analysis that does not need +to look at the whole program. What's worse, once you have an +inconsistent set of definitions you can use these definitions to +"prove" their own consistency - much like an mathematical theory that +assumes `true = false` can "prove" every proposition including its own +correctness. + +The crucial reason why type soundness still holds is this: If one +compares `T` with an alias, one does so always relative to some _path_ +`x` that refers to the object containing `T`. So it's really `x.T = +Int`. Now, we can show that during evaluation every such path refers +to some object that was created with a `new`, and that, furthermore, +every such object has consistent type definitions. The tricky bit is +to carefully distinguish between the full typing rules, which allow +inconsistencies, and the typing rules arising from runtime values, +which do not. + +## Why is This Important? + +There are at least four reasons why insights obtained in the DOT +project are important. + + 1. They give us a well-founded explanation of _nominal typing_. + Nominal typing means that a type is distinguished from others + simply by having a different name. + For instance, given two trait definitions + + trait A extends AnyRef { def f: Int } + trait B extends AnyRef { def f: Int } + + we consider `A` and `B` to be different types, even though both + traits have the same parents and both define the same members. + The opposite of + nominal typing is structural typing, which treats types + that have the same structure as being the same. Most programming + languages are at least in part nominal whereas most formal type systems, + including DOT, are structural. But the abstract types in DOT + provide a way to express nominal types such as classes and traits. + The Wadlerfest paper contains examples that show how + one can express classes for standard types such as `Boolean` and `List` in DOT. + + 2. They give us a stable basis on which we can study richer languages + that resemble Scala more closely. For instance, we can encode + type parameters as type members of objects in DOT. This encoding + can give us a better understanding of the interactions of + subtyping and generics. It can explain why variance rules + are the way they are and what the precise typing rules for + wildcard parameters `[_ <: T]`, `[_ >: T]` should be. + + 3. DOT also provides a blueprint for Scala compilation. The new Scala + compiler _dotty_ has internal data structures that closely resemble DOT. + In particular, type parameters are immediately mapped to type members, + in the way we propose to encode them also in the calculus. + + 4. Finally, the proof principles explored in the DOT work give us guidelines + to assess and treat other possible soundness issues. We now know much + better what conditions must be fulfilled to ensure type soundness. + This lets us put other constructs of the Scala language to the test, + either to increase our confidence that they are indeed sound, or + to show that they are unsound. In my next blog I will + present some of the things we have discovered by that exercise. + From 8fc8b4f3e3a064999b3af199b586d3cf8247561d Mon Sep 17 00:00:00 2001 From: Martin Odersky Date: Wed, 3 Feb 2016 13:15:49 +0100 Subject: [PATCH 2/5] new blog post --- blog/_posts/2016-02-03-essence-of-scala.md | 153 +++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 blog/_posts/2016-02-03-essence-of-scala.md diff --git a/blog/_posts/2016-02-03-essence-of-scala.md b/blog/_posts/2016-02-03-essence-of-scala.md new file mode 100644 index 000000000..457e9c1a6 --- /dev/null +++ b/blog/_posts/2016-02-03-essence-of-scala.md @@ -0,0 +1,153 @@ +--- +layout: blog +post-type: blog +by: Martin Odersky +title: The Essence of Scala +--- + +What do you get if you boil Scala on a slow flame and wait until all +incidental features evaporate and only the most concentrated essence +remains? After doing this for 8 years we believe we have the answer: +it's DOT, the calculus of dependent object types, that underlies Scala. + +A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be +presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016), +an event celebrating Phil Wadler's 60th birthday. There's also a prior +[technical report](http://arxiv.org/abs/1510.05216v1) by Tiark Rompf +and Nada Amin describing a slightly different version of the +calculus. Each version describes a proof of type soundness that has +been machine-checked for correctness. + +## The DOT calculus + +A calculus is a kind of mini-language that is small enough to be +studied formally. Translated to Scala notation, the language covered +by DOT is described by the following abstract grammar: + + Value v = (x: T) => t Function + new { x: T => ds } Object + + Definition d = def a = t Method definition + type A = T Type + + Term t = v Value + x Variable + t1(t2) Application + t.a Selection + { val x = t1; t2 } Local definition + + Type T = Any Top type + Nothing Bottom type + x.A Selection + (x: T1) => T2 Function + { def a: T } Method declaration + { type T >: T1 <: T2 } Type declaration + T1 & T2 Intersection + { x => T } Recursio + +The grammar uses several kinds of names: + + x for (immutable) variables + a for (parameterless) methods + A for types + +The full calculus adds to this syntax formal _typing rules_ that assign +types `T` to terms `t` and formal _evaluation rules_ that describe how a +program is evaluated. The main difference between the Wadlerfest paper +and the prior technical report lies in the form of these evaluation +rules. The technical report uses a "big step" semantics, i.e. it +defines an abstract of interpreter that explains how terms map to +values. The Wadlerfest paper uses a more common "small step" +semantics, i.e. it explains evaluation as step-wise rewritings of +terms. To do this, it needs to restrict terms to be in [A-normal +form](https://en.wikipedia.org/wiki/A-normal_form), that is, every +intermediate term is abstracted out in a local definition. + +The following _type soundness_ property was shown in each case with a +mechanized proof: + +> If a term `t` has type `T`, and the evaluation of `t` terminates, then + the result of the evaluation will be a value `v` of type `T`. + +## Difficulties + +Formulating the precise soundness theorem and proving it was unexpectedly hard, +because it uncovered a problem complex that had not been +studied in depth before. In DOT - as well as in many programming languages - +you can have conflicting definitions. For instance you might have an abstract +type declaration in a base class with two conflicting aliases in subclasses: + + trait Base { type A } + trait Sub1 { type A = String } + trait Sub2 { type A = Int } + +Now, if you combine `Sub1` and `Sub2` in one class, you get a conflict, since the type `A` is supposed to be equal to both `String` and `Int`. If you do +not detect the conflict and assume the equalities at face value you +get `String = A = Int`, hence by transitivity `String = Int`! Once you +are that far, you can of course engineer all sorts of situations where +a program will typecheck but cause a wrong execution at runtime. In +other words, type soundness is violated. + +Now, the problem is that one cannot always detect these +inconsistencies, at least not by a local analysis that does not need +to look at the whole program. What's worse, once you have an +inconsistent set of definitions you can use these definitions to +"prove" their own consistency - much like an mathematical theory that +assumes `true = false` can "prove" every proposition including its own +correctness. + +The crucial reason why type soundness still holds is this: If one +compares `T` with an alias, one does so always relative to some _path_ +`x` that refers to the object containing `T`. So it's really `x.T = +Int`. Now, we can show that during evaluation every such path refers +to some object that was created with a `new`, and that, furthermore, +every such object has consistent type definitions. The tricky bit is +to carefully distinguish between the full typing rules, which allow +inconsistencies, and the typing rules arising from runtime values, +which do not. + +## Why is This Important? + +There are at least four reasons why insights obtained in the DOT +project are important. + + 1. They give us a well-founded explanation of _nominal typing_. + Nominal typing means that a type is distinguished from others + simply by having a different name. + For instance, given two trait definitions + + trait A extends AnyRef { def f: Int } + trait B extends AnyRef { def f: Int } + + we consider `A` and `B` to be different types, even though both + traits have the same parents and both define the same members. + The opposite of + nominal typing is structural typing, which treats types + that have the same structure as being the same. Most programming + languages are at least in part nominal whereas most formal type systems, + including DOT, are structural. But the abstract types in DOT + provide a way to express nominal types such as classes and traits. + The Wadlerfest paper contains examples that show how + one can express classes for standard types such as `Boolean` and `List` in DOT. + + 2. They give us a stable basis on which we can study richer languages + that resemble Scala more closely. For instance, we can encode + type parameters as type members of objects in DOT. This encoding + can give us a better understanding of the interactions of + subtyping and generics. It can explain why variance rules + are the way they are and what the precise typing rules for + wildcard parameters `[_ <: T]`, `[_ >: T]` should be. + + 3. DOT also provides a blueprint for Scala compilation. The new Scala + compiler _dotty_ has internal data structures that closely resemble DOT. + In particular, type parameters are immediately mapped to type members, + in the way we propose to encode them also in the calculus. + + 4. Finally, the proof principles explored in the DOT work give us guidelines + to assess and treat other possible soundness issues. We now know much + better what conditions must be fulfilled to ensure type soundness. + This lets us put other constructs of the Scala language to the test, + either to increase our confidence that they are indeed sound, or + to show that they are unsound. In my next blog I will + present some of the things we have discovered by that exercise. + From c2dc1bc97412e5723f40994d7b15d61bd4a78dcd Mon Sep 17 00:00:00 2001 From: Martin Odersky Date: Wed, 3 Feb 2016 13:17:46 +0100 Subject: [PATCH 3/5] fix typo --- blog/_posts/2016-02-03-essence-of-scala.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/_posts/2016-02-03-essence-of-scala.md b/blog/_posts/2016-02-03-essence-of-scala.md index 457e9c1a6..7ed07f90b 100644 --- a/blog/_posts/2016-02-03-essence-of-scala.md +++ b/blog/_posts/2016-02-03-essence-of-scala.md @@ -43,7 +43,7 @@ by DOT is described by the following abstract grammar: { def a: T } Method declaration { type T >: T1 <: T2 } Type declaration T1 & T2 Intersection - { x => T } Recursio + { x => T } Recursion The grammar uses several kinds of names: From 05d4d0fb7fe03d5a7ee25bc29e1b888037766693 Mon Sep 17 00:00:00 2001 From: Martin Odersky Date: Wed, 3 Feb 2016 22:09:20 +0100 Subject: [PATCH 4/5] adress reviewers comments --- blog/_posts/2016-02-03-essence-of-scala.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/blog/_posts/2016-02-03-essence-of-scala.md b/blog/_posts/2016-02-03-essence-of-scala.md index 537c32768..ecf989fe7 100644 --- a/blog/_posts/2016-02-03-essence-of-scala.md +++ b/blog/_posts/2016-02-03-essence-of-scala.md @@ -13,10 +13,10 @@ it's DOT, the calculus of dependent object types, that underlies Scala. A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016), an event celebrating Phil Wadler's 60th birthday. There's also a prior -[technical report](http://arxiv.org/abs/1510.05216v1) by Tiark Rompf -and Nada Amin describing a slightly different version of the -calculus. Each version describes a proof of type soundness that has -been machine-checked for correctness. +technical report (["From F to DOT"](http://arxiv.org/abs/1510.05216)) +by Tiark Rompf and Nada Amin describing a slightly different version +of the calculus. Each paper describes a proof of type soundness that +has been machine-checked for correctness. ## The DOT calculus @@ -64,7 +64,7 @@ form](https://en.wikipedia.org/wiki/A-normal_form), that is, every intermediate term is abstracted out in a local definition. The following _type soundness_ property was shown in each case with a -mechanized proof: +mechanized, (i.e. machine-checked) proof: > If a term `t` has type `T`, and the evaluation of `t` terminates, then the result of the evaluation will be a value `v` of type `T`. @@ -72,7 +72,7 @@ mechanized proof: ## Difficulties Formulating the precise soundness theorem and proving it was unexpectedly hard, -because it uncovered a problem complex that had not been +because it uncovered some technical challenges that had not been studied in depth before. In DOT - as well as in many programming languages - you can have conflicting definitions. For instance you might have an abstract type declaration in a base class with two conflicting aliases in subclasses: @@ -80,8 +80,10 @@ type declaration in a base class with two conflicting aliases in subclasses: trait Base { type A } trait Sub1 extends Base { type A = String } trait Sub2 extends Base { type A = Int } + trait Bad extends Sub1 with Sub2 -Now, if you combine `Sub1` and `Sub2` in one class, you get a conflict, since the type `A` is supposed to be equal to both `String` and `Int`. If you do +Now, if you combine `Sub1` and `Sub2` in trait `Bad` you get a conflict, +since the type `A` is supposed to be equal to both `String` and `Int`. If you do not detect the conflict and assume the equalities at face value you get `String = A = Int`, hence by transitivity `String = Int`! Once you are that far, you can of course engineer all sorts of situations where @@ -149,5 +151,5 @@ project are important. This lets us put other constructs of the Scala language to the test, either to increase our confidence that they are indeed sound, or to show that they are unsound. In my next blog I will - present some of the issues we have discovered by that exercise. + present some of the issues we have discovered by that exercise. From a5a87bc5cfe3d8c3810046d06e48ff01cc6c12ab Mon Sep 17 00:00:00 2001 From: Martin Odersky Date: Thu, 4 Feb 2016 17:28:18 +0100 Subject: [PATCH 5/5] drop big-step/small-step --- blog/_posts/2016-02-03-essence-of-scala.md | 22 ++++++---------------- 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/blog/_posts/2016-02-03-essence-of-scala.md b/blog/_posts/2016-02-03-essence-of-scala.md index ecf989fe7..b974f5646 100644 --- a/blog/_posts/2016-02-03-essence-of-scala.md +++ b/blog/_posts/2016-02-03-essence-of-scala.md @@ -13,7 +13,7 @@ it's DOT, the calculus of dependent object types, that underlies Scala. A [paper on DOT](http://infoscience.epfl.ch/record/215280) will be presented in April at [Wadlerfest](http://events.inf.ed.ac.uk/wf2016), an event celebrating Phil Wadler's 60th birthday. There's also a prior -technical report (["From F to DOT"](http://arxiv.org/abs/1510.05216)) +technical report ([From F to DOT](http://arxiv.org/abs/1510.05216)) by Tiark Rompf and Nada Amin describing a slightly different version of the calculus. Each paper describes a proof of type soundness that has been machine-checked for correctness. @@ -51,20 +51,10 @@ The grammar uses several kinds of names: a for (parameterless) methods A for types -The full calculus adds to this syntax formal _typing rules_ that assign -types `T` to terms `t` and formal _evaluation rules_ that describe how a -program is evaluated. The main difference between the Wadlerfest paper -and the prior technical report lies in the form of these evaluation -rules. The technical report uses a "big step" semantics, i.e. it -defines an interpreter that explains how terms map to -values. The Wadlerfest paper uses a more common "small step" -semantics, i.e. it explains evaluation as step-wise rewritings of -terms. To do this, it needs to restrict terms to be in [A-normal -form](https://en.wikipedia.org/wiki/A-normal_form), that is, every -intermediate term is abstracted out in a local definition. - -The following _type soundness_ property was shown in each case with a -mechanized, (i.e. machine-checked) proof: +The full calculus adds to this syntax formal _typing rules_ that +assign types `T` to terms `t` and formal _evaluation rules_ that +describe how a program is evaluated. The following _type soundness_ +property was shown with a mechanized, (i.e. machine-checked) proof: > If a term `t` has type `T`, and the evaluation of `t` terminates, then the result of the evaluation will be a value `v` of type `T`. @@ -151,5 +141,5 @@ project are important. This lets us put other constructs of the Scala language to the test, either to increase our confidence that they are indeed sound, or to show that they are unsound. In my next blog I will - present some of the issues we have discovered by that exercise. + present some of the issues we have discovered through that exercise.