|
| 1 | +#################### |
| 2 | + Reference counting |
| 3 | +#################### |
| 4 | + |
| 5 | +In languages like C, when you need memory for storing data for an indefinite period of time or in a |
| 6 | +large amount, you call ``malloc`` and ``free`` to acquire and release blocks of memory of some size. |
| 7 | +This sounds simple on the surface but turns out to be quite tricky, mainly because the data must not |
| 8 | +be freed for as long as it is used anywhere in the program. Sometimes, this makes it unclear who is |
| 9 | +responsible for freeing the memory, and when to do so. Failure to handle this correctly may result |
| 10 | +in use-after-free, double-free, and memory leaks. |
| 11 | + |
| 12 | +Very rarely do you have to think about memory management when working in PHP. The engine takes care |
| 13 | +of this for you by tracking which values are no longer needed. It does this by assigning a reference |
| 14 | +count to each value, often abbreviated as refcount or RC. Whenever a reference to a value is passed |
| 15 | +to somebody else, its reference count is increased to indicate the value is now used by another |
| 16 | +party. When the party no longer needs the value, it is responsible for decreasing the reference |
| 17 | +count. Once the reference count reaches zero, we know the value is no longer needed anywhere in the |
| 18 | +program, and that it may be freed. |
| 19 | + |
| 20 | +.. code:: php |
| 21 | +
|
| 22 | + $a = new stdClass; // RC 1 |
| 23 | + $b = $a; // RC 2 |
| 24 | + unset($a); // RC 1 |
| 25 | + unset($b); // RC 0, free |
| 26 | +
|
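The same sequence can be modelled in a few lines of standalone C. This is a minimal sketch, not engine code: ``demo_value``, ``demo_new``, ``demo_addref``, ``demo_release`` and ``demo_last_release_freed`` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference counted value; not an engine type. */
typedef struct {
    uint32_t refcount;
    int payload;
} demo_value;

/* Set to 1 when the most recent demo_release() freed its argument. */
static int demo_last_release_freed = 0;

/* Allocate a value that starts with a single owner (RC 1). */
static demo_value *demo_new(int payload) {
    demo_value *v = malloc(sizeof(*v));
    if (v == NULL) {
        abort();
    }
    v->refcount = 1;
    v->payload = payload;
    return v;
}

/* A new party starts using the value. */
static demo_value *demo_addref(demo_value *v) {
    v->refcount++;
    return v;
}

/* A party is done with the value; the last one to release it frees it. */
static void demo_release(demo_value *v) {
    demo_last_release_freed = 0;
    if (--v->refcount == 0) {
        free(v);
        demo_last_release_freed = 1;
    }
}
```

Calling ``demo_new``, then ``demo_addref``, then ``demo_release`` twice mirrors the four PHP statements above: the memory is only returned to the allocator by the second release.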
| 27 | +Reference counting is needed for types that store auxiliary data, which are the following: |
| 28 | + |
| 29 | +- Strings |
| 30 | +- Arrays |
| 31 | +- Objects |
| 32 | +- References |
| 33 | +- Resources |
| 34 | + |
| 35 | +These are either reference types (objects, references and resources) or they are large types that |
| 36 | +don't fit in a single ``zend_value`` directly (strings, arrays). Simpler types either don't |
| 37 | +store a value at all (``null``, ``false``, ``true``) or their value is small enough to fit directly |
| 38 | +in ``zend_value`` and is copied when passed somewhere else (``int``, ``float``). |
| 39 | + |
| 40 | +All of the reference counted types share a common initial struct sequence. |
| 41 | + |
| 42 | +.. code:: c |
| 43 | +
|
| 44 | + typedef struct _zend_refcounted_h { |
| 45 | + uint32_t refcount; /* reference counter 32-bit */ |
| 46 | + union { |
| 47 | + uint32_t type_info; |
| 48 | + } u; |
| 49 | + } zend_refcounted_h; |
| 50 | +
|
| 51 | + struct _zend_string { |
| 52 | + zend_refcounted_h gc; |
| 53 | + // ... |
| 54 | + }; |
| 55 | +
|
| 56 | + struct _zend_array { |
| 57 | + zend_refcounted_h gc; |
| 58 | + // ... |
| 59 | + }; |
| 60 | +
|
| 61 | +This explains the ``zval.value.counted`` union member in the ``zval`` struct we saw in the ``zval`` |
| 62 | +chapter. It refers to the initial ``gc`` field of any reference counted type, in case we don't care |
| 63 | +about which concrete type we're dealing with. |
| 64 | + |
| 65 | +The ``zend_refcounted_h`` struct is simple. It contains the reference count, and a ``type_info`` |
| 66 | +field that repeats some of the type information that is also stored in the ``zval``, for situations |
| 67 | +where we're not dealing with a ``zval`` directly. It also stores some additional fields, described |
| 68 | +under `GC flags`_. |
| 69 | + |
| 70 | +******** |
| 71 | + Macros |
| 72 | +******** |
| 73 | + |
| 74 | +As with ``zval``, ``zend_refcounted_h`` members should not be accessed directly. Instead, you should |
| 75 | +use the provided macros. There are macros that work with reference counted types directly, prefixed |
| 76 | +with ``GC_``, and macros that work on ``zval`` values, usually prefixed with ``Z_``. Unfortunately, |
| 77 | +naming is not always consistent. |
| 78 | + |
| 79 | +.. list-table:: ``zval`` macros |
| 80 | + :header-rows: 1 |
| 81 | + |
| 82 | + - - Macro |
| 83 | + - Non-RC [#non-rc]_ |
| 84 | + - Description |
| 85 | + |
| 86 | + - - ``Z_REFCOUNT[_P]`` |
| 87 | + - No |
| 88 | + - Returns the reference count. |
| 89 | + |
| 90 | + - - ``Z_ADDREF[_P]`` |
| 91 | + - No |
| 92 | + - Increases the reference count. |
| 93 | + |
| 94 | + - - ``Z_TRY_ADDREF[_P]`` |
| 95 | + - Yes |
| 96 | + - Increases the reference count. May be called on any ``zval``. |
| 97 | + |
| 98 | + - - ``zval_ptr_dtor`` |
| 99 | + - Yes |
| 100 | + - Decreases the reference count and frees the value if the reference count reaches zero. |
| 101 | + |
| 102 | + - - ``Z_DELREF[_P]`` |
| 103 | + - No |
| 104 | + - Decreases the reference count. Note that this will not actually free the value if the |
| 105 | + reference count reaches zero. You should usually use ``zval_ptr_dtor`` instead. |
| 106 | + |
| 107 | + - - ``Z_TRY_DELREF[_P]`` |
| 108 | + - Yes |
| 109 | + - Decreases the reference count. Note that this will not actually free the value if the |
| 110 | + reference count reaches zero. You should usually use ``zval_ptr_dtor`` instead. |
| 111 | + |
| 112 | +.. [#non-rc] |
| 113 | +
|
| 114 | + Whether the macro works with non-reference counted types. If it does, the operation is usually a |
| 115 | + no-op. If it does not, using the macro on these values is undefined behavior. |
| 116 | +
|
| 117 | +.. list-table:: ``zend_refcounted_h`` macros |
| 118 | + :header-rows: 1 |
| 119 | + |
| 120 | + - - Macro |
| 121 | + - Immutable [#immutable]_ |
| 122 | + - Description |
| 123 | + |
| 124 | + - - ``GC_REFCOUNT[_P]`` |
| 125 | + - Yes |
| 126 | + - Returns the reference count. |
| 127 | + |
| 128 | + - - ``GC_ADDREF[_P]`` |
| 129 | + - No |
| 130 | + - Increases the reference count. |
| 131 | + |
| 132 | + - - ``GC_TRY_ADDREF[_P]`` |
| 133 | + - Yes |
| 134 | + - Increases the reference count. |
| 135 | + |
| 136 | + - - ``GC_DTOR[_P]`` |
| 137 | + - Yes |
| 138 | + - Decreases the reference count and frees the value if the reference count reaches zero. |
| 139 | + |
| 140 | + - - ``GC_DELREF[_P]`` |
| 141 | + - No |
| 142 | + - Decreases the reference count. Note that this will not actually free the value if the |
| 143 | + reference count reaches zero. You should usually use ``GC_DTOR[_P]`` instead. |
| 144 | + |
| 145 | + - - ``GC_TRY_DELREF[_P]`` |
| 146 | + - Yes |
| 147 | + - Decreases the reference count. Note that this will not actually free the value if the |
| 148 | + reference count reaches zero. You should usually use ``GC_DTOR[_P]`` instead. |
| 149 | + |
| 150 | +.. [#immutable] |
| 151 | +
|
| 152 | + Whether the macro works with immutable types, described under `Immutable reference counted types`_. |
| 153 | +
|
| 154 | +*********************************** |
| 155 | + Immutable reference counted types |
| 156 | +*********************************** |
| 157 | + |
| 158 | +Sometimes, even a reference counted type is not reference counted. When PHP runs in a multi-process |
| 159 | +or multi-threaded environment with opcache enabled, it shares some common values between processes |
| 160 | +or threads to reduce memory consumption. As you may know, sharing memory between processes or |
| 161 | +threads can be tricky and requires special care when modifying values. In particular, modification |
| 162 | +usually requires exclusive access to the memory so that the other processes or threads wait until |
| 163 | +the update has completed. In this case, this synchronization is avoided by making the value |
| 164 | +immutable and never modifying the reference count. Such values will receive the ``GC_IMMUTABLE`` |
| 165 | +flag in their ``gc->u.type_info`` field. |
| 166 | + |
| 167 | +Some macros like ``GC_TRY_ADDREF`` will guard against immutable values. You should not use other |
| 168 | +macros, like ``GC_ADDREF``, on immutable values. This will result in undefined behavior, because the macro |
| 169 | +will not check whether the value is immutable before performing the reference count modifications. |
| 170 | +You may execute PHP with the ``-d opcache.protect_memory=1`` flag to mark the shared memory as |
| 171 | +read-only and trigger a hardware exception if the code accidentally attempts to modify it. |
| 172 | + |
| 173 | +***************** |
| 174 | + Cycle collector |
| 175 | +***************** |
| 176 | + |
| 177 | +Sometimes, reference counting is not enough. Consider the following example: |
| 178 | + |
| 179 | +.. code:: php |
| 180 | +
|
| 181 | + $a = new stdClass; |
| 182 | + $b = new stdClass; |
| 183 | + $a->b = $b; |
| 184 | + $b->a = $a; |
| 185 | + unset($a); |
| 186 | + unset($b); |
| 187 | +
|
| 188 | +When this code finishes, the reference counts of the two objects formerly held by ``$a`` and ``$b`` will still be 1, as they |
| 189 | +reference each other. This is called a reference cycle. |
| 190 | + |
| 191 | +PHP implements a cycle collector that detects cycles and frees values that are only reachable |
| 192 | +through their own references. The cycle collector records values that may be involved in a |
| 193 | +cycle in a buffer, and runs when this buffer becomes full. It is also possible to invoke it explicitly by calling |
| 194 | +the ``gc_collect_cycles()`` function. The cycle collector design is described in the `Cycle |
| 195 | +collector <todo>`_ chapter. |
| 196 | + |
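The arithmetic behind the cycle example can be traced with a small standalone model. It is only a sketch under simplifying assumptions: ``cyc_obj`` has a single hypothetical property slot, and ``cyc_set_prop`` and ``cyc_unset`` are illustrative helpers rather than engine functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct cyc_obj {
    uint32_t refcount;
    struct cyc_obj *prop; /* single property slot holding another object */
} cyc_obj;

/* Store target in owner's property slot; the slot counts as one reference. */
static void cyc_set_prop(cyc_obj *owner, cyc_obj *target) {
    owner->prop = target;
    target->refcount++;
}

/* Models unset() of a variable: drop one reference and report whether the
 * count reached zero. */
static int cyc_unset(cyc_obj *o) {
    return --o->refcount == 0;
}
```

Starting both objects at a count of 1, linking them to each other and then dropping the two variable references leaves each count at 1, so plain reference counting never frees either object.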
| 197 | +********** |
| 198 | + GC flags |
| 199 | +********** |
| 200 | + |
| 201 | +.. code:: c |
| 202 | +
|
| 203 | + /* zval_gc_flags(zval.value->gc.u.type_info) (common flags) */ |
| 204 | + #define GC_NOT_COLLECTABLE (1<<4) |
| 205 | + #define GC_PROTECTED (1<<5) /* used for recursion detection */ |
| 206 | + #define GC_IMMUTABLE (1<<6) /* can't be changed in place */ |
| 207 | + #define GC_PERSISTENT (1<<7) /* allocated using malloc */ |
| 208 | + #define GC_PERSISTENT_LOCAL (1<<8) /* persistent, but thread-local */ |
| 209 | +
|
| 210 | +The ``GC_NOT_COLLECTABLE`` flag indicates that the value cannot be involved in a reference cycle. |
| 211 | +This allows for a fast way to detect values that don't need to be added to the cycle collector |
| 212 | +buffer. Only arrays and objects may actually be involved in reference cycles. |
| 213 | + |
| 214 | +The ``GC_PROTECTED`` flag is used to protect against recursion in various internal functions. For |
| 215 | +example, ``var_dump`` recursively prints the contents of values, and marks visited values with the |
| 216 | +``GC_PROTECTED`` flag. If the value is recursive, it prevents the same value from being visited |
| 217 | +again. |
| 218 | + |
| 219 | +``GC_IMMUTABLE`` has been discussed in `Immutable reference counted types`_. |
| 220 | + |
| 221 | +The ``GC_PERSISTENT`` flag indicates that the value was allocated using ``malloc``, instead of PHP's |
| 222 | +own allocator. Usually, such values are alive for the entire lifetime of the process, instead of |
| 223 | +being freed at the end of the request. See the `Zend allocator <todo>`_ chapter for more |
| 224 | +information. |
| 225 | + |
| 226 | +The ``GC_PERSISTENT_LOCAL`` flag indicates that a ``GC_PERSISTENT`` value is only accessible in one |
| 227 | +thread, and is thus still safe to modify. This flag is only used in debug builds to satisfy an |
| 228 | +``assert``. |