Allow some array functions to operate in-place using a peep-hole optimization
This patch essentially implements phpGH-9881 (and more).
Patterns like `$x = array_function($x, ...)` seem to be quite common in
user code. Therefore it seems to make sense to create a "peep-hole
optimization" to prevent copies of the array.
This patch optimizes the `$x = array_function($x, ...)` pattern and the
`$x = array_function(temporary, ...)` pattern for these functions:
array_merge, array_unique, array_replace, array_diff and array_intersect.
With these limitations:
- array_{diff,intersect} only do the temporary optimization because the
comparison may throw.
- array_{merge,replace} only optimize CVs in the non-recursive case because
the recursive version may throw.
- array_unique is only optimized for SORT_REGULAR and SORT_NUMERIC,
because string conversion may throw.
- array_merge optimization works only if the array is packed and is
without holes.
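As a rough illustration of the last limitation, the precondition can be
modelled on a toy array type. This is only a sketch: `toy_array` and both
functions are hypothetical names, and the fields merely mimic the idea of
"slots used" versus "live elements"; they are not the engine's actual
structures. The restriction is presumably there because array_merge
renumbers integer keys, and a packed array without holes already has the
keys 0..n-1, so appending in place gives the same result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the few hash table properties that matter here.
 * Field names are illustrative, not the engine's actual layout. */
typedef struct {
    bool     packed;        /* values stored as a plain list with keys 0..n-1 */
    uint32_t num_used;      /* slots handed out so far, including deleted ones */
    uint32_t num_elements;  /* live elements */
} toy_array;

/* A hole is a used slot that no longer holds a live element,
 * so "no holes" means every used slot is still live. */
bool toy_is_without_holes(const toy_array *a)
{
    return a->num_used == a->num_elements;
}

/* Precondition for the in-place array_merge path in this toy model. */
bool toy_merge_in_place_allowed(const toy_array *a)
{
    return a->packed && toy_is_without_holes(a);
}
```

A packed array with three used slots and three live elements passes the
check; the same array after an unset() in the middle (four used slots,
three live elements) does not.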
It works by checking whether the array function call is immediately
followed by an assignment that overwrites the input. In that case we can
do the operation in-place instead of copying the array. Note that this is
limited to CVs (compiled variables) at the moment, and can't handle more
complex scenarios like assignments to array elements or object properties.
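The check described above can be sketched on a toy instruction stream.
Everything here is an assumption made for illustration: `toy_opcode`,
`toy_op` and `toy_can_modify_in_place` are hypothetical names and do not
correspond to the VM's real opcodes or to
prepare_in_place_array_modify_if_possible() itself. The sketch only shows
the shape of the idea: look at the instruction right after the call and
see whether it assigns to the same CV that was passed in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified opcodes; not the Zend VM's definitions. */
typedef enum {
    TOY_SEND_CV,    /* pass CV slot `cv` as an argument */
    TOY_DO_CALL,    /* perform the function call */
    TOY_ASSIGN_CV,  /* assign the call result to CV slot `cv` */
    TOY_OTHER       /* anything else */
} toy_opcode;

typedef struct {
    toy_opcode code;
    int        cv;  /* CV slot for SEND_CV / ASSIGN_CV, -1 when unused */
} toy_op;

/* Returns true when ops[call_idx] is a call whose result is immediately
 * assigned back to the CV that was passed as the input array (arg_cv). */
bool toy_can_modify_in_place(const toy_op *ops, size_t n_ops,
                             size_t call_idx, int arg_cv)
{
    if (call_idx >= n_ops || ops[call_idx].code != TOY_DO_CALL) {
        return false;
    }
    if (call_idx + 1 >= n_ops) {
        return false;  /* nothing follows the call */
    }
    const toy_op *next = &ops[call_idx + 1];
    return next->code == TOY_ASSIGN_CV && next->cv == arg_cv;
}
```

In this model `$x = f($x)` becomes SEND_CV 0, DO_CALL, ASSIGN_CV 0 and
passes the check, while `$y = f($x)` ends in ASSIGN_CV 1 and does not, so
the input would still have to be copied.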
The current approach is a bit ugly though: it looks at the VM
instructions from within a function to check if the optimization is
possible, which is a bit odd.
I considered extending opcache as an alternative, but I believe this would
require adding a whole bunch of machinery for only a few users.
Looking at the assembly of prepare_in_place_array_modify_if_possible()
it looks pretty light-weight, about 95 bytes / 29 instructions on my
x86-64 Linux laptop.
** Safety **
There are some array functions which take some sort of copy of the input
array into a temporary C array for sorting.
(e.g. array_unique, array_diff, and array_intersect do this).
Since we no longer take a copy in all cases, we must check if it's
possible that a value is accessed that was already destroyed.
For array_unique: cmpdata will never be removed so that will never reach
refcount 0. And when something is removed, it is the previous value of
cmpdata, not the one used later. So this seems okay.
For array_diff: it loops over the array from left to right and only accesses
a destroyed pointer when behaviour==DIFF_NORMAL. But DIFF_NORMAL only happens
when a user callback is set, which is when we don't use the optimization.
So this is safe too.
For array_intersect: a previous pointer (ptr[0] - 1) is accessed.
But this can't be a destroyed value because the pointer is first moved forward.
** Results **
Using this benchmark script
https://gist.github.com/nielsdos/ae5a2dddc53c61749ae31c908aa78e98
I get:
=== array_merge $a = array_merge($a, ...) ===
before 1.9615 sec
after 0.0015 sec
=== array_merge temporary ===
before 0.0282 sec
after 0.0109 sec
=== array_unique $a = array_unique($a, ...) ===
before 0.3698 sec
after 0.3281 sec
=== array_unique temporary ===
before 0.3814 sec
after 0.3422 sec
=== array_replace $a = array_replace($a, ...) ===
before 0.0148 sec
after 0.0024 sec
=== array_replace temporary ===
before 0.0273 sec
after 0.0129 sec
=== array_intersect temporary
(no significant improvement because dominated by sorting) ===
before 8.9734 sec
after 8.7202 sec
=== array_diff temporary ===
before 0.5510 sec
after 0.5365 sec