Description
Because quite a few people suggested this and @jasny started implementation work on it, I want to discuss this possible alternative implementation approach.
First, to make it clear what I'm talking about:
With reified generics, there is a single class entry representating a parameterized type, and each class reference stores a pointer to the class entry, as well as a list of type arguments. If you have Pair<A, B>
there will be one class entry for Pair
, which is referenced by Pair<int, int>
, Pair<int, string>
etc.
With monomorphized generics, we instead generate a new class entry for each combination of type arguments. If you have Pair<A, B>
, then Pair<int, int>
and Pair<int, string>
will generate two separate class entries starting from a common template for Pair
.
In languages like C++, one major advantage of monomorphization is that code can be aggressively optimized by specializing it to specific types. While the same is theoretically possible in PHP as well, the optimization potential is many orders of magnitude smaller. The major disadvantage of monomorphization is that it increases code size. Generating an excessively large amount of code with templates, resulting in long compile-times and large binaries is a well known problem in C++.
For PHP, I believe the main appeal of monomorphizing is twofold: First, it may be faster, because generic types can be fully resolved and do not need to be looked up. Second, it seems to simplify the implementation. The reason is that we can keep everything using zend_class_entry*
.
I think that things are not quite as simple as that though. In particular, it is not possible to arrange those generic class entries into an accurate inheritance hierarchy as soon as variant type parameters come into play:
class Traversable<out T> { ... }
class A {}
class B extends A {}
// Traversable<B> instanceof Traversable<A> must be true.
Here, the monomorphized class entry for Traversable<B>
must be a child of Traversable<A>
. More problematically, Traversable<X>
must be a subtype of Traversable<X|anything_else_here>
, and there is a potentially infinite number of such types. Of course, this also forms a lattice, not a simple chain. As such, subtype checking (which is involved everywhere from instanceof, over type checks to inheritance checks) will not be able to work on a simple "class entries only" model anyway.
The other consideration is memory usage, and caching. Copying the entire class including opcodes for each type combination would skyrocket memory usage. Even if we don't copy opcodes (which means that references to generic parameters in them are not early bound), just copying all the method etc entries would also need a significant amount of memory.
Additionally, it is not clear how those monomorphized class entries would be cached. Creating them for each type combination and resolving all types in them should have some pretty hefty overhead. However, as the monomorphized class entry would depend on runtime state (such as the parent class), it would not be possible to place it into opcache SHM. This would only be possible as part of preloading, where having these kinds of dependencies is legal. But we can't make preloading a requirement to make this at least somewhat efficient.