Skip to content

Commit 7759f7f

Browse files
committed
Allow some array functions to operate in-place using a peep-hole optimization
This patch essentially implements phpGH-9881 (and more). Patterns like that seem to be quite common in user code. Therefore it seems to make sense to make to create a "peep-hole optimization" to prevent copies of the array. This patch optimizes the `$x = array_function($x, ...)` pattern and the `$x = array_function(temporary, ...)` pattern for these functions: array_merge, array_unique, array_replace, array_diff and array_intersect. With these limitations: - array_{diff,intersect} only do the temporary optimization because the comparison may throw. - array_replace only optimizes CVs for non-recursive case because the recursive version may throw. - Only SORT_REGULAR and SORT_NUMERIC optimization for array_unique, because string conversion may throw. - array_merge optimization works only if the array is packed and is without holes. It works by checking if the array function is immediately followed by an assignment which overrides the input. In that case we can do the operation in-place instead of copying the array. Note that this is limited to CV's at the moment, and can't handle more complex scenarios like array or object assignments. The current approach is a bit ugly though: it looks at the VM instructions from within a function to check if the optimization is possible, which is a bit odd. I considered extending opcache as an alternative, but I believe this would require adding a whole bunch of machinery for only a few users. Looking at the assembly of prepare_in_place_array_modify_if_possible() it looks pretty light-weight, about 95 bytes / 29 instructions on my x86-64 Linux laptop. ** Safety ** There are some array functions which take some sort of copy of the input array into a temporary C array for sorting. (e.g. array_unique, array_diff, and array_intersect do this). Since we no longer take a copy in all cases, we must check if it's possible that a value is accessed that was already destroyed. For array_unique: cmpdata will never be removed so that will never reach refcount 0. And when something is removed, it is the previous value of cmpdata, not the one user later. So this seems okay. For array_diff: it loops over the array from left to right and only accesses a destroyed pointer when behaviour==DIFF_NORMAL. But DIFF_NORMAL only happens when a user callback is set, which is when we don't use the optimization. So this is safe too. For array_intersect: a previous pointer (ptr[0] - 1) is accessed. But this can't be a destroyed value because the pointer is first moved forward. ** Results ** Using this benchmark script https://gist.github.com/nielsdos/ae5a2dddc53c61749ae31c908aa78e98 I get: === array_merge $a = array_merge($a, ...) === before 1.9615 sec after 0.0015 sec === array_merge temporary === before 0.0282 sec after 0.0109 sec === array_unique $a = array_unique($a, ...) === before 0.3698 sec after 0.3281 sec === array_unique temporary === before 0.3814 sec after 0.3422 sec === array_replace $a = array_replace($a, ...) === before 0.0148 sec after 0.0024 sec === array_replace temporary === before 0.0273 sec after 0.0129 sec === array_intersect temporary (no significant improvement because dominated by sorting) === before 8.9734 sec after 8.7202 sec === array_diff temporary === before 0.5510 sec after 0.5365 sec
1 parent 7de83e2 commit 7759f7f

File tree

1 file changed

+111
-22
lines changed

1 file changed

+111
-22
lines changed

ext/standard/array.c

Lines changed: 111 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -3825,6 +3825,59 @@ PHPAPI int php_array_replace_recursive(HashTable *dest, HashTable *src) /* {{{ *
38253825
}
38263826
/* }}} */
38273827

3828+
/* Returns true if it's possible to do an in-place array modification, preventing a costly copy.
3829+
* It also modifies the CV (if applicable) to prevent freeing it upon assigning.
3830+
* If this returns true you need to add a ref at the end of the modification for the return value.
3831+
* safe_for_CV indicates whether the optimization may be applied to CVs. This should be false
3832+
* if the operation may throw for example, because then the in-place modification is visible to
3833+
* the exception catcher. */
3834+
static bool prepare_in_place_array_modify_if_possible(const zend_execute_data *execute_data, const zval *arg, bool safe_for_CV)
3835+
{
3836+
/* 2 refs: the CV and the argument; or 1 ref for a temporary passed as argument */
3837+
uint32_t refcount = Z_REFCOUNT_P(arg);
3838+
if (refcount != 1 && !(refcount == 2 && safe_for_CV)) {
3839+
return false;
3840+
}
3841+
/* Immutable or persistent => no modification allowed */
3842+
if (GC_FLAGS(Z_ARRVAL_P(arg)) & (IS_ARRAY_IMMUTABLE | IS_ARRAY_PERSISTENT)) {
3843+
return false;
3844+
}
3845+
3846+
if (refcount == 2) {
3847+
const zend_op *call_opline = execute_data->prev_execute_data->opline;
3848+
const zend_op *next_opline = call_opline + 1;
3849+
3850+
/* Must be an assignment from the result of the call to a CV */
3851+
if (next_opline->opcode != ZEND_ASSIGN || next_opline->op1_type != IS_CV || next_opline->op2.var != call_opline->result.var) {
3852+
return false;
3853+
}
3854+
3855+
/* Must be an assignment to the same array as the input */
3856+
zval *var = ZEND_CALL_VAR(execute_data->prev_execute_data, next_opline->op1.var);
3857+
if (Z_TYPE_P(var) != IS_ARRAY || Z_ARRVAL_P(arg) != Z_ARRVAL_P(var)) {
3858+
return false;
3859+
}
3860+
3861+
/* Must set the CV to NULL so we don't destroy the array on assignment */
3862+
ZVAL_NULL(var);
3863+
/* Make RC 1 such that the array may be modified */
3864+
GC_DELREF(Z_ARRVAL_P(arg));
3865+
}
3866+
3867+
return true;
3868+
}
3869+
3870+
static bool set_return_value_array_dup_or_in_place(const zend_execute_data *execute_data, const zval *arg, zval *return_value, bool safe_for_CV)
3871+
{
3872+
if (prepare_in_place_array_modify_if_possible(execute_data, arg, safe_for_CV)) {
3873+
RETVAL_ARR(Z_ARRVAL_P(arg));
3874+
return true;
3875+
} else {
3876+
RETVAL_ARR(zend_array_dup(Z_ARRVAL_P(arg)));
3877+
return false;
3878+
}
3879+
}
3880+
38283881
static zend_always_inline void php_array_replace_wrapper(INTERNAL_FUNCTION_PARAMETERS, int recursive) /* {{{ */
38293882
{
38303883
zval *args = NULL;
@@ -3846,10 +3899,11 @@ static zend_always_inline void php_array_replace_wrapper(INTERNAL_FUNCTION_PARAM
38463899
}
38473900
}
38483901

3849-
/* copy first array */
3902+
/* copy first array if necessary */
38503903
arg = args;
3851-
dest = zend_array_dup(Z_ARRVAL_P(arg));
3852-
ZVAL_ARR(return_value, dest);
3904+
bool in_place = set_return_value_array_dup_or_in_place(execute_data, arg, return_value, !recursive);
3905+
dest = Z_ARRVAL_P(return_value);
3906+
38533907
if (recursive) {
38543908
for (i = 1; i < argc; i++) {
38553909
arg = args + i;
@@ -3861,6 +3915,10 @@ static zend_always_inline void php_array_replace_wrapper(INTERNAL_FUNCTION_PARAM
38613915
zend_hash_merge(dest, Z_ARRVAL_P(arg), zval_add_ref, 1);
38623916
}
38633917
}
3918+
3919+
if (in_place) {
3920+
GC_ADDREF(dest);
3921+
}
38643922
}
38653923
/* }}} */
38663924

@@ -3925,22 +3983,34 @@ static zend_always_inline void php_array_merge_wrapper(INTERNAL_FUNCTION_PARAMET
39253983

39263984
arg = args;
39273985
src = Z_ARRVAL_P(arg);
3928-
/* copy first array */
3929-
array_init_size(return_value, count);
3930-
dest = Z_ARRVAL_P(return_value);
3986+
bool in_place = false;
3987+
/* copy first array if necessary */
39313988
if (HT_IS_PACKED(src)) {
3932-
zend_hash_real_init_packed(dest);
3933-
ZEND_HASH_FILL_PACKED(dest) {
3934-
ZEND_HASH_PACKED_FOREACH_VAL(src, src_entry) {
3935-
if (UNEXPECTED(Z_ISREF_P(src_entry) &&
3936-
Z_REFCOUNT_P(src_entry) == 1)) {
3937-
src_entry = Z_REFVAL_P(src_entry);
3938-
}
3939-
Z_TRY_ADDREF_P(src_entry);
3940-
ZEND_HASH_FILL_ADD(src_entry);
3941-
} ZEND_HASH_FOREACH_END();
3942-
} ZEND_HASH_FILL_END();
3989+
/* Note: If it has holes, it might get sequentialized */
3990+
if (HT_IS_WITHOUT_HOLES(src) && prepare_in_place_array_modify_if_possible(execute_data, arg, true)) {
3991+
in_place = true;
3992+
dest = src;
3993+
ZVAL_ARR(return_value, dest);
3994+
} else {
3995+
array_init_size(return_value, count);
3996+
dest = Z_ARRVAL_P(return_value);
3997+
3998+
zend_hash_real_init_packed(dest);
3999+
ZEND_HASH_FILL_PACKED(dest) {
4000+
ZEND_HASH_PACKED_FOREACH_VAL(src, src_entry) {
4001+
if (UNEXPECTED(Z_ISREF_P(src_entry) &&
4002+
Z_REFCOUNT_P(src_entry) == 1)) {
4003+
src_entry = Z_REFVAL_P(src_entry);
4004+
}
4005+
Z_TRY_ADDREF_P(src_entry);
4006+
ZEND_HASH_FILL_ADD(src_entry);
4007+
} ZEND_HASH_FOREACH_END();
4008+
} ZEND_HASH_FILL_END();
4009+
}
39434010
} else {
4011+
array_init_size(return_value, count);
4012+
dest = Z_ARRVAL_P(return_value);
4013+
39444014
zend_string *string_key;
39454015
zend_hash_real_init_mixed(dest);
39464016
ZEND_HASH_MAP_FOREACH_STR_KEY_VAL(src, string_key, src_entry) {
@@ -3967,6 +4037,10 @@ static zend_always_inline void php_array_merge_wrapper(INTERNAL_FUNCTION_PARAMET
39674037
php_array_merge(dest, Z_ARRVAL_P(arg));
39684038
}
39694039
}
4040+
4041+
if (in_place) {
4042+
GC_ADDREF(src);
4043+
}
39704044
}
39714045
/* }}} */
39724046

@@ -4574,7 +4648,8 @@ PHP_FUNCTION(array_unique)
45744648

45754649
cmp = php_get_data_compare_func_unstable(sort_type, 0);
45764650

4577-
RETVAL_ARR(zend_array_dup(Z_ARRVAL_P(array)));
4651+
/* Everything but SORT_REGULAR or SORT_NUMERIC may throw, and is therefore unsafe for the optimization */
4652+
bool in_place = set_return_value_array_dup_or_in_place(execute_data, array, return_value, sort_type == PHP_SORT_REGULAR || sort_type == PHP_SORT_NUMERIC);
45784653

45794654
/* create and sort array with pointers to the target_hash buckets */
45804655
arTmp = pemalloc((Z_ARRVAL_P(array)->nNumOfElements + 1) * sizeof(struct bucketindex), GC_FLAGS(Z_ARRVAL_P(array)) & IS_ARRAY_PERSISTENT);
@@ -4620,6 +4695,10 @@ PHP_FUNCTION(array_unique)
46204695
}
46214696
}
46224697
pefree(arTmp, GC_FLAGS(Z_ARRVAL_P(array)) & IS_ARRAY_PERSISTENT);
4698+
4699+
if (in_place) {
4700+
GC_ADDREF(Z_ARRVAL_P(array));
4701+
}
46234702
}
46244703
/* }}} */
46254704

@@ -4744,6 +4823,7 @@ static void php_array_intersect(INTERNAL_FUNCTION_PARAMETERS, int behavior, int
47444823
zend_fcall_info *fci_key = NULL, *fci_data;
47454824
zend_fcall_info_cache *fci_key_cache = NULL, *fci_data_cache;
47464825
PHP_ARRAY_CMP_FUNC_VARS;
4826+
bool in_place = false;
47474827

47484828
bucket_compare_func_t intersect_key_compare_func;
47494829
bucket_compare_func_t intersect_data_compare_func;
@@ -4870,8 +4950,8 @@ static void php_array_intersect(INTERNAL_FUNCTION_PARAMETERS, int behavior, int
48704950
}
48714951
}
48724952

4873-
/* copy the argument array */
4874-
RETVAL_ARR(zend_array_dup(Z_ARRVAL(args[0])));
4953+
/* copy the argument array if necessary */
4954+
in_place = set_return_value_array_dup_or_in_place(execute_data, &args[0], return_value, false);
48754955

48764956
/* go through the lists and look for common values */
48774957
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
@@ -4982,6 +5062,10 @@ static void php_array_intersect(INTERNAL_FUNCTION_PARAMETERS, int behavior, int
49825062

49835063
efree(ptrs);
49845064
efree(lists);
5065+
5066+
if (in_place) {
5067+
GC_ADDREF(Z_ARRVAL_P(return_value));
5068+
}
49855069
}
49865070
/* }}} */
49875071

@@ -5129,6 +5213,7 @@ static void php_array_diff(INTERNAL_FUNCTION_PARAMETERS, int behavior, int data_
51295213
zend_fcall_info *fci_key = NULL, *fci_data;
51305214
zend_fcall_info_cache *fci_key_cache = NULL, *fci_data_cache;
51315215
PHP_ARRAY_CMP_FUNC_VARS;
5216+
bool in_place = false;
51325217

51335218
bucket_compare_func_t diff_key_compare_func;
51345219
bucket_compare_func_t diff_data_compare_func;
@@ -5255,8 +5340,8 @@ static void php_array_diff(INTERNAL_FUNCTION_PARAMETERS, int behavior, int data_
52555340
}
52565341
}
52575342

5258-
/* copy the argument array */
5259-
RETVAL_ARR(zend_array_dup(Z_ARRVAL(args[0])));
5343+
/* copy the argument array if necessary */
5344+
in_place = set_return_value_array_dup_or_in_place(execute_data, &args[0], return_value, false);
52605345

52615346
/* go through the lists and look for values of ptr[0] that are not in the others */
52625347
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
@@ -5365,6 +5450,10 @@ static void php_array_diff(INTERNAL_FUNCTION_PARAMETERS, int behavior, int data_
53655450

53665451
efree(ptrs);
53675452
efree(lists);
5453+
5454+
if (in_place) {
5455+
GC_ADDREF(Z_ARRVAL_P(return_value));
5456+
}
53685457
}
53695458
/* }}} */
53705459

0 commit comments

Comments
 (0)