Skip to content

Implement implicit move optimisation #11166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

nielsdos
Copy link
Member

Introduction

This implements a more generic version of the in-place modification I
first tried in #11060. We decided to limit that PR to RC1
optimisations of some array functions only, and not do the in-place
$variable optimisation in that way because of issues.

This patch overcomes those issues and builds on the previous patch.
With this patch, any internal function that supports RC1 optimisations
automatically gets the optimisation for in-place variable modifications.
Contrary to the previous approach, this is compatible with exceptions.
Furthermore, this approach also allows userland functions to benefit
from this optimisation.

e.g. the following code will not take a copy of the array with this
patch, whereas previously it would due to the copy-on-write
characteristic of arrays:

function foo($array) {
  $array[1] = 1;
}
function bar() {
  $array = ...;
  $array = foo($array);
}

Right now the impact on the benchmark suite isn't that high. The reason
is that only a handful of functions within PHP optimise for RC1 cases, and
the array sizes for those cases are fairly small.
When more support for these cases are added, the benefit from this patch
will increase.

I've added a micro benchmark for array operations that shows the effect
of this optimisation.

Implementation

The optimiser already tracks which SSA variables have a value that
doesn't matter with the no_val field. By changing ZEND_SEND_VAR to
redefine op1, we automatically know if the variable will ever be used
again without being overwritten by looking at the no_val field.
If the no_val field is set, the variable may hold a string/array and the
refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to
indicate that it may avoid a copy. The flag is stored in extended_value.

There are two new VM type spec handlers for this that check for the
flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX.

Limitations

  • The optimisation isn't performed on arguments passed to other
    functions. This is because the optimisation would be externally
    visible through backtraces, which is undesirable.
    Unfortunately, this is also the case where a lot of optimisation
    opportunity lies. Nonetheless, even with this limitation it seems like
    the optimisation can help a lot.
  • The optimisation does not apply to functions using indirect variable
    access (e.g. variable-variables, compact()) and vararg functions.

Benchmark results

Zend/array_micro_bench.php without this optimisation:

empty_loop         0.000
array_merge        3.405    3.405
array_merge temp   2.796    2.796
array_merge nest   2.720    2.719
array_unique       2.014    2.014
user_func          0.039    0.039
------------------------
Total             10.975

Zend/array_micro_bench.php with this optimisation:

empty_loop         0.000
array_merge        0.002    0.002
array_merge temp   1.181    1.181
array_merge nest   2.710    2.710
array_unique       1.973    1.973
user_func          0.024    0.024
------------------------
Total              5.891

As you can see there is virtually no difference for the array_merge nest case. This is expected: the optimisation cannot be done because the original array is still required for the addition after the array_merge function is executed.
However, for the other cases there is a notable difference. The performance of array_unique is dominated by sorting, so we don't see a very big win there, but still a win. For user_func we see a small win, but the total time is not a lot to begin with because the array is small.

I can see a nice benefit from this optimisation in data-heavy processes (e.g. ETL processes).

Acknowledgements

This optimisation is inspired by C++'s & Rust's "move semantics", hence the name: implicitly adding moves.

During the development of this patch I had some conversations about the implementation with @iluuu1994 , so I would like to thank him as well.

## Introduction

This implements a more generic version of the in-place modification I
first tried in phpGH-11060. We decided to limit that PR to RC1
optimisations of some array functions only, and not do the in-place
$variable optimisation in that way because of issues.

This patch overcomes those issues and builds on the previous one.
With this patch, any internal function that supports RC1 optimisations
automatically gets the optimisation for in-place variable modifications.
Contrary to the previous approach, this is compatible with exceptions.
Furthermore, this approach also allows userland functions to benefit
from this optimisation.

e.g. the following code will not take a copy of the array with this
patch, whereas previously it would due to the copy-on-write
characteristic of arrays:
```
function foo($array) {
  $array[1] = 1;
}
function bar() {
  $array = ...;
  $array = foo($array);
}
```

Right now the impact on the benchmark suite isn't that high. The reason
is that only a handful of functions within PHP optimise for RC1 cases, and
the array sizes for those cases are fairly small.
When more support for these cases are added, the benefit from this patch
will increase.

I've added a micro benchmark for array operations that shows the effect
of this optimisation.

## Implementation

The optimiser already tracks which SSA variables have a value that
doesn't matter with the no_val field. By changing ZEND_SEND_VAR to
redefine op1, we automatically know if the variable will ever be used
again without being overwritten by looking at the no_val field.
If the no_val field is set, the variable may hold a string/array and the
refcount may be 1, we set a flag on the ZEND_SEND_VAR(_EX) opline to
indicate that it may avoid a copy. The flag is stored in extended_value.

There are two new VM type spec handlers for this that check for the
flag: one for ZEND_SEND_VAR and one for ZEND_SEND_VAR_EX.

## Limitations

* The optimisation isn't performed on arguments passed to other
  functions. This is because the optimisation would be externally
  visible through backtraces, which is undesirable.
  Unfortunately, this is also the case where a lot of optimisation
  opportunity lies. Nonetheless, even with this limitation it seems like
  the optimisation can help a lot.
* The optimisation does not apply to functions using indirect variable
  access (e.g. variable-variables, compact()) and vararg functions.
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a mesh with ZEND_SEND_VAR_EX support. It's supported in some places but not in the others. It seems you never create op1_def SSA var for it, but then expect it in zend_dfa_optimize_send_copies() and ZEND_SEND_VAR_EX_SIMPLE_EXT.

I see a very slight improvement on Symfony and PHP-Parser, but more serious degradation on Wordpress (it's small anyway - 0.2% measured with callgrind)

Comment on lines +1053 to +1058
SKIP_IF_TOP(op1);

if (opline->opcode == ZEND_SEND_VAR) {
SET_RESULT(op1, op1);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add support for ZEND_SEND_VAR_EX?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. That's somewhat separate from this PR, so do you want me to do it here or in another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. SCCP didn't support ZEND_SEND_VAR_EX before. So this is not a problem.

@@ -1680,3 +1684,13 @@ void zend_ssa_rename_var_uses(zend_ssa *ssa, int old, int new, bool update_types
old_var->phi_use_chain = NULL;
}
/* }}} */

void zend_ssa_replace_op1_def_op1_use(zend_ssa *ssa, zend_ssa_op *ssa_op)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is not clear. May be it should be zend_ssa_replace_op1_def_by_op1_use. May be this should be a more general function.

zend_op *op = call->arg_info[i].opline;
zend_ssa_op *this_ssa_op = &ssa->ops[op - op_array->opcodes];

if (op->opcode == ZEND_SEND_VAR && this_ssa_op->op1_def >= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose ZEND_SEND_VAR_EX is missed, because it can't be a part of a call to a "known" target.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, compile-time-calls cannot be done with ZEND_SEND_VAR_EX. I can add a comment to explain why this is though.

Copy link
Member

@dstogov dstogov May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a comment to explain why this is though.

No problem. I just liked to be sure.

Comment on lines 2349 to 2354
/* Removing the def in try_remove_definition() may reduce optimisation opportunities.
* We want to keep the no_val definition until we actually replace it with a constant. */
if (opline->opcode == ZEND_SEND_VAR && ssa_op->op1_use == i && ssa_op->op1_def >= 0) {
zend_ssa_replace_op1_def_op1_use(ssa, ssa_op);
opline->extended_value = 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done for SEND_VAR_EX as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, I can't do it. This is because SEND_VAR_EX can pass a variable as a reference. Then if we replace the def with the use then the variable at the callsite will become a ref. See for example Zend/tests/method_argument_binding.phpt

@dstogov
Copy link
Member

dstogov commented May 2, 2023

I see a very slight improvement on Symfony and PHP-Parser, but more serious degradation on Wordpress (it's small anyway - 0.2% measured with callgrind)

This degradation comes from the extra work in Optimizer. So it's not critical.

@nielsdos
Copy link
Member Author

nielsdos commented May 2, 2023

Thanks for the review!

I see a very slight improvement on Symfony and PHP-Parser, but more serious degradation on Wordpress (it's small anyway - 0.2% measured with callgrind)

This degradation comes from the extra work in Optimizer. So it's not critical.

This makes senes to me.
FYI the CI does benchmarking too, and WordPress is included. The CI benchmarking excludes the compilation and optimisation time. There it shows a slight improvement for WordPress.

It seems you never create op1_def SSA var for it, but then expect it in zend_dfa_optimize_send_copies() and ZEND_SEND_VAR_EX_SIMPLE_EXT.

This is already done in the current code :

case ZEND_SEND_VAR_EX:
case ZEND_SEND_FUNC_ARG:
case ZEND_SEND_REF:
case ZEND_SEND_UNPACK:
case ZEND_FE_RESET_RW:
case ZEND_MAKE_REF:
case ZEND_PRE_INC_OBJ:
case ZEND_PRE_DEC_OBJ:
case ZEND_POST_INC_OBJ:
case ZEND_POST_DEC_OBJ:
case ZEND_UNSET_DIM:
case ZEND_UNSET_OBJ:
case ZEND_FETCH_DIM_W:
case ZEND_FETCH_DIM_RW:
case ZEND_FETCH_DIM_FUNC_ARG:
case ZEND_FETCH_DIM_UNSET:
case ZEND_FETCH_LIST_W:
if (opline->op1_type == IS_CV) {
goto add_op1_def;
}
break;

Am I missing something?

int ssa_cv = ssa_op->op1_use;

/* Argument move must not be observable in backtraces */
if (ssa->vars[ssa_cv].var < op_array->num_args) {
Copy link
Contributor

@mvorisek mvorisek May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much is this a problem - without it, will it corrupt something? Backtraces does normally create a CoW copy [1], so removing this condition can improve performance.

[1] https://3v4l.org/nbiOF/rfc#vgit.master_jit and https://3v4l.org/gSHDb/rfc#vgit.master

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not cause memory corruption, but causes a semantic change.

Let's say you remove the condition, then the following can happen:
If you have $a = array_merge($a, ...); where $a is an argument, and the call causes an exception, then $a will be NULL in the backtrace because its value has been moved into the function call argument.

@dstogov
Copy link
Member

dstogov commented May 2, 2023

Am I missing something?

No. I missed that. Sorry.

@nielsdos
Copy link
Member Author

nielsdos commented May 2, 2023

I addressed the review comments.
I left the SCCP SET_RESULT(op1, op1) suggestion for SEND_VAR_EX for another time (#11166 (comment)).

I put one more additional comment here: #11166 (comment)

@dstogov
Copy link
Member

dstogov commented May 2, 2023

I afraid, this patch may bring many problems.
For example the following code changes the output.

<?php
class Bomb {
    function __destruct() {
        var_dump(2);
    }
}
function foo($array) {
  $array[1] = 1;
}
function bar() {
  $array = [new Bomb];
  foo($array);
  var_dump(1);
  $array = null;
}
bar();

@nielsdos
Copy link
Member Author

nielsdos commented May 2, 2023

I see, the destructors execute earlier. In fact I remember Ilija mentioning something like this but I forgot about it.
I don't see a way around that issue :/. In general we don't know when it may be destroyed and when not.
And while we could check at compile time if the type is "array of long" etc, this only works within one function, so the remaining benefit would be negligible...

@iluuu1994
Copy link
Member

@dstogov Do you know of a particular use-case where this would lead to issues? Delaying can also be caused by internal cycles, like storing a non-static closure inside a property, so relying on this behavior already seems a bit risky. I do admit that the performance win is a bit smaller than we have hoped...

@dstogov
Copy link
Member

dstogov commented May 3, 2023

@iluuu1994 I didn't try to dig for other problems, but I expect them.

  1. We "unfairly" change the value of potentially visible CV.
  2. Destruction order is changed (this may cause behaviour changes as I demonstrated above)
  3. Type inference is changed to reflect this "unfair" value (this may cause issues in JIT)
  4. SSA construction is changed for SEND_VAR[_EX] (this may cause issues in Optimizer)

@iluuu1994
Copy link
Member

We "unfairly" change the value of potentially visible CV.

This shouldn't be a concern, as this optimization isn't performed on "main" (due to globals or inherited scope), or functions utilizing functions that access/modify local scope (e.g. extract).

Destruction order is changed (this may cause behaviour changes as I demonstrated above)

I see. My question was more whether there are real cases where destructor order. I understand this is hard to answer definitively.

As for the JIT and optimizer issues, I don't have a well enough picture to argue on those. 🙂

@bwoebi
Copy link
Member

bwoebi commented May 8, 2023

The destructor ordering is the only fundamental issue I could think of when I read this patch a week ago.
All these sorts of move semantics probably just fundamentally need redefinition of liveness guarantees.

I think the move semantics are valuable enough to justify rethinking these. Just like a C compiler will move values off the stack if they're not going to be used anymore. E.g. C# has the fixed statement, which has for effect that the variable will never be destroyed within that function (or scope the fixed applies to). (Yes, that's entering RFC territory, but TBF it might not be that a bad thing to explicitly indicate locations where within a function you rely on destructors not executing early.)

This PR can probably not merged as is due to the destructor behaviour change, so @nielsdos I'll tag the PR as requires RFC, unless you can come up with a magical way not exposing that issue. :-D

@morrisonlevi
Copy link
Contributor

We don't make guarantees about destructor order anyway, do we? I think this optimization could be done in PHP 9.

@nielsdos
Copy link
Member Author

We don't make guarantees about destructor order anyway, do we? I think this optimization could be done in PHP 9.

No idea. But Hyrum's law may apply. I learned that this could be a significant break to Wikimedia as they have a class to defer work that relies on deterministic destruction

@bwoebi
Copy link
Member

bwoebi commented Oct 21, 2024

@nielsdos I suppose, if we got type tracking in arrays, we could trivially exclude arrays with objects&resources...
Anyway, move optimizations are only relevant for functions actually altering the input value. So currently these functions would anyway do a copy. So, we likely could afford the O(n) check for "contains object (with destructor) or array" - and if yes, set a flag to avoid repeated checks. I would assume that this would still be a win. ... But it would involve cooperation of the individual functions (e.g. #[ProbablyMutatesArg]) instead of being a generic approach.

We technically could land this optimization for strings already today though (or for arrays where we might statically determine to not contain objects), even if not including arrays yet, it might be making an impact already (e.g. str_replace with equally sized search and replace, or strtoupper/lower etc.).

@nielsdos
Copy link
Member Author

Even for strings, there's another subtle issue though.
This works by defining a new SSA variable for a CV so we can define a new type for the CV. This breaks a possible optimization for the JIT: when a CV's type stays invariant we never need type stores and can definitely keep the variable in a register, but this interferes with what this PR does.

@bwoebi
Copy link
Member

bwoebi commented Oct 21, 2024

@nielsdos You might assign it the empty string (i.e. zend_empty_string) upon move, which will never be freed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants