Skip to content

Implementation notes #45

Open
Open
@nikic

Description

@nikic

I've done some initial protoyping for (reified) generics in nikic/php-src#3. I want to discuss some implementation aspects here. Some additional reading on relevant issues:

The prototype focussed on making something minimal work, which is "purely abstract" generics involving simple inheritance:

abstract class WithParam<T> {
    public function method(T $param) {
        var_dump($param);
    }
}
class WithInt extends WithParam<int> {}

function acceptsWithParamInt(WithParam<int> $param) {
    $param->method(42);
}

acceptsWithParamInt(new WithInt);

Note that only the abstract class has unbound type parameters. The concrete class doesn't, so there are no generic parameters on the object instantiation.

Generic arguments are stored using the same structure as union types:

typedef struct {
	uint32_t num_types;
	zend_type types[1];
} zend_type_list;

A big part of implementing generics is replacing existing zend_class_entry* pointers with a structure that holds both zend_class_entry* and the list of generic type arguments:

typedef struct _zend_class_reference {
	zend_class_entry *ce;
	zend_type_list args;
} zend_class_reference;

As most class references are expected to be to classes without generic arguments, we apply the following optimization for that case: Each class entry is allocated with an additional header, that effectively looks like this:

struct real_class_entry {
    zend_class_entry *ce; /* = &self.normal_class_entry */
    uitn32_t num_types; /* = 0 */
    zend_class_entry normal_class_entry;
};

This allows cheaply converting any class entry into a class reference without generic arguments. Additionally, accesses to the class entry through such a reference should be rather cheap, because the header will be in the same cache line as start of the class entry.

Unfortunately, there are quite a few cases where we don't work on class entries, but on string names of classes. Important examples are unresolved type declarations, as well as pre-inheritance parent/interface references and class references inside opcodes. The structure for this type of reference is principally the same, just replacing a class entry with a name:

typedef struct _zend_name_reference {
	zend_string *name;
	zend_type_list args;
} zend_name_reference;

However, in this case we do not have such a convenient way to represent name references without arguments. After all, we do not want to add an additional header to each interned string.

Instead, we define an additional "packed name reference" (abbreviated PNR in macros and function names):

typedef uintptr_t zend_packed_name_reference;

This is an align-tagged union between a simple zend_string* (no arguments) or zend_name_reference (arguments).

A quick grep tells me that php-src has 1500 references to zend_class_entry, and probably quite a few more uses than that through members, and a large part of these need migration to zend_class_reference. I only migrated parent references and type declarations in my prototype.

Apart from storing generic type arguments on class references, we also need to deal with references to generic type parameters, such as in public function method(T $param). Each generic parameter is assigned an parameter ID, which is stored as a new zend_type kind.

We now need to consider how we can access the actual bound type argument efficiently. Consider the following example:

abstract class A<T1> {
    public function method_a(T1 $param) {}
}
abstract class B<T2> extends A<int> {
    public function method_b(T2 $param) {}
}
abstract class C<T3> extends B<string> {
    public function method_c(T3 $param) {}
}
class D extends C<array> {}

Pre-inheritance, the generic parameter ID of T1, T2 and T3 is 0, as they are each the first generic parameter of their respective class. During inheritance, we shift the generic parameter ID by the number of bound generic parameters on the parent. This means that T1 has ID 0, T2 has ID 1 and T3 has ID 2. Class D would store the following array of bound generic arguments:

bound_generic_args[0]: int
bound_generic_args[1]: string
bound_generic_args[2]: array

This includes the necessary elements for parent and grandparent classes as well.

While this scheme is sufficient for the purpose of simple abstract inheritance, as implemented in the protoype, we should also consider how we can extend it to other situations.

First, for interfaces we do not have to worry about resolution of generic type parameters at runtime, as interfaces are only checked during inheritance.

For traits, generic parameter IDs cannot be shifted in this manner, as there may be multiple traits. Instead, we would probably actually perform some form of monomorphization for this particular case, where we copy the trait methods and replace type parameters.

One larger problem is what to do with generic parameter references in opcodes, for example for $x instanceof T. As opcodes are immutable during inheritance, we cannot shift the parameter in this case. I don't have an idea on how to make this really efficient. I believe this will involve looking up the correct offset to use at runtime, for the particular declaring class and the particular calling class.

Finally, this only considers generic arguments bound in the class declaration. What about free generic arguments bound on instantiation? These would be stored in the class reference on the object. A generic parameter lookup would have to picked one or the other using something like:

uint32_t num_bound_generic_args = ce_ref->ce->num_bound_generic_args;
if (param_id < num_bound_generic_args) {
    type = &ce_ref->ce->bound_generic_args[param_id];
} else {
    type = &ce_ref->args.types[param_id - num_bound_generic_args];
}

The alternative is to copy the class arguments in the object class reference as well. This might be a non-trivial increase in memory usage though, if deep hierarchies are involved.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions