Description
Working with opaque types, which are based on our existing GADT machinery, I see more and more problems with our existing approach:
This one is known:
- GADT maps are different from constraints, so no unfication is possible between variables defined in one and variables defined in the other.
But these have not been voiced before:
-
Knowledge about GADT bounds has spread far into the core algorithms of the compiler. It's not just subtyping, by now we consult the GADT bounds also
- when computing members of a type
- when computing base types
- when following the underlying type of a type proxy
It's not clear at all whether this is enough, or whether we need to consult GADT maps in more places.
-
GADT maps pose enormous problems for caching. Every update to a GADT map invalidates all caches that contain values whose computation relied on the previous value (or relied on the fact that there was no GADT entry yet). But there is no way to trace these caches. The alternative of not caching any info that could be affected by GADT bounds is unappealing as well, because it looks very hard to figure out what might be affected later on.
So far, we have not seen much fail empirically, because GADTs were not really fully implemented so very little code relied on them. But if we want that to change, we have to take these problems seriously.
An alternative approach
Here's a different implementation scheme which avoids the cited problems (whether it will create new ones remains to be seen).
-
When type checking a case in a match, determine all type parameters with term owners that are referenced from the widened scrutinee type. Call these the GADT parameters. (This step is the same as in the current scheme)
-
Determine the support set of the GADT parameters. This is the smallest set S of term-owned symbols such that:
- every GADT parameter is in S.
- if a symbol x is owned by a term and has a type info T that refers to a symbol in S, then x is in S (for both term and type symbols x).
- Define a map M over the support set as follows:
-
Every GADT parameter in S is mapped to a type variable whose
TypeParamRef
origin has the bounds of the original symbol mapped recursively by M. -
Every other symbol s in S is mapped to a fresh symbol of the same kind as s, whose info is the original info of s mapped recursively by M.
Contexts have a new field
gadtMap
that contains the current map.
-
A match case gets typechecked in a new context, where a new instance of M as computed in the last step is added to the
gadtMap
field and all TypeVars in M s image are added to the constraint. -
When typechecking an identifier in
typedIdent
, map the computed type (always aTermRef
orTypeRef
) using the context's GADT map.[Aside: The mapping would be avoided if we could replace old scopes with new scopes in the context. But this means duplicating potentially a stack of context objects, because GADT parameters might be bound in outer contexts. This looks tricky.]
-
Typecheck pattern against scrutinee as usual. This might further constrain images of GADT parameters in the constraint. After that is done,
- instantiate all GADT TypeVars in M to fresh abstract type symbols, with the bounds copied over and mapped recursively.
- instantiate all pattern bound TypeVars as usual.
- Typecheck the guard and RHS of the case as usual. At the end, check that the type of RHS
conforms to the expected type of the match after mapping the latter with M. If the type of the RHS does not conform to expected type without applying M, insert a cast.