Skip to content

Optimize observers #13649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jun 15, 2024
Merged

Optimize observers #13649

merged 22 commits into from
Jun 15, 2024

Conversation

bwoebi
Copy link
Member

@bwoebi bwoebi commented Mar 9, 2024

Inline the lookup whether a function is observed at all. This strategy is also used for FRAMELESS calls. If the frameless call is observed, we instead allocate a call frame and push the arguments, to call the the function afterwards. Doing so is still a performance benefit as opposed to executing individual INIT_FCALL+SEND_VAL ops. Thus, even if the frameless call turns out to be observed, the call overhead is slightly lower than before. If the internal function is not observed at all, the unavoidable overhead is fetching the FLF zend_function pointer and the run-time cache needs to be inspected.

As part of this work, it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode. This is a bit unusual in comparison to all other ZEND_OP_DATA usages, but seems to not pose problems overall.

There is also a small issue resolved: trampolines would always use the ZEND_CALL_TRAMPOLINE_SPEC_OBSERVER function due to zend_observer_fcall_op_array_extension being set to -1 too late.

The same inlining of operations has been applied to JIT as well.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand what the benefit of moving result to OP_DATA is. Can you elaborate on this point?

I also didn't fully understand why ZEND_OBSERVER_NONE_OBSERVED is needed. From what I understand, zend_observer_remove_handler moves the tail to -1, so if the first handler is unset, no handler should be installed. Tbh the observer.{h,c} code is quite hard to read, but I didn't spend too much time on it.

Did you run some benchmarks? Can you notice a difference for frameless functions, in particular?

return *handler == ZEND_OBSERVER_NONE_OBSERVED;
}

static zend_always_inline bool zend_observer_fcall_has_no_observers(zend_execute_data *execute_data, bool allow_generator, zend_observer_fcall_begin_handler **handler) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Avoid double negation? E.g. !zend_observer_fcall_has_no_observers.

Copy link
Member Author

@bwoebi bwoebi Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming problem here is that it's not a binary state, but "no observers", "some observers" and "observers undetermined", and I want to distinguish "no observers" vs "some or undetermined observers". I'm not sure how to better name that.

@bwoebi
Copy link
Member Author

bwoebi commented Mar 9, 2024

I did not understand what the benefit of moving result to OP_DATA is. Can you elaborate on this point?

Hm, have I been unclear in the PR description? "it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode" - DO_ICALL cannot jump an extra opline (skipping OP_DATA that is) at its end, and accesses result on its opline.

I also didn't fully understand why ZEND_OBSERVER_NONE_OBSERVED is needed

Simply an optimization to avoid reading both start and end handlers when none is set, i.e. saves a read in the hot path of checking whether a function is being observed.

@bwoebi
Copy link
Member Author

bwoebi commented Mar 10, 2024

Regarding rough benchmarking numbers, without debug, without opcache (wall time; v: valgrind instructions), with this script:

Baseline of empty loop: 1 sec (v: 1,007,435,926)

(Using 20000000 iterations for valgrind)

$s = 1; for ($i = 0; $i < 500000000; ++$i) { $a = min($s, 2); }

master, without observers: 2.8 sec (v: 2,727,435,251)
master, with observers: 5.7 sec (v: 5,527,657,794)
this PR, with observers: 2.9 sec (v: 3,047,665,900)

So, yes, the frameless functions are much faster to call.

Now, the same code with pow instead of min (which is not using FLF currently):

master, without observers: 4.9 sec (v: 4,687,435,657)
master, with observers: 6.5 sec (v: 6,387,657,790)
this PR, with observers: 5.9 sec (v: 5,587,666,190)

The generic improvement of inlining the runtime cache check (and just checking one value instead of two) is quite significant on its own already, but using frameless calls just completely blows it out of the water.

I'm also amazed that apparently for frameless calls it uses only 0.1 sec (3% of time) more even though it needs 10% more valgrind instructions.

@bwoebi bwoebi force-pushed the optimize-observers branch 2 times, most recently from 0dd6584 to ee5d6ed Compare March 10, 2024 20:14
@iluuu1994
Copy link
Member

it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode

I missed this point. Unfortunately, I don't have a better idea without significant complications that would nullify the performance benefit, or duplicate the DO_ICALL handler.

Simply an optimization to avoid reading both start and end handlers when none is set, i.e. saves a read in the hot path of checking whether a function is being observed.

Thanks for the explanation. 🙂 The naming could probably be improved could be improved. Or at least add a short comment to ZEND_OBSERVER_NONE_OBSERVED ("The function has neither begin nor end observers"). I assumed it had to do with holes in the list, but the code indicated that this can't happen.

@bwoebi bwoebi force-pushed the optimize-observers branch from dfbe08a to 72d298a Compare March 10, 2024 21:56
Comment on lines -31 to -33
#define ZEND_OBSERVABLE_FN(function) \
(ZEND_MAP_PTR(function->common.run_time_cache) && !(function->common.fn_flags & ZEND_ACC_CALL_VIA_TRAMPOLINE))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this check effectively move to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to zend_observer_fcall_has_no_observers in zend_observer.h

@bwoebi bwoebi force-pushed the optimize-observers branch 7 times, most recently from 2b3c7ec to f623714 Compare March 20, 2024 00:54
@bwoebi bwoebi marked this pull request as ready for review March 20, 2024 00:54
@bwoebi bwoebi requested review from dstogov and bukka as code owners March 20, 2024 00:54
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should be committed. I don't like the fact that the engine started to depend on observer API more and more.

I didn't review the zend_jit_ir.c part yet. Can it be implemented using JIT helpers (C function) to make the most dangerous part of the patch less? I don't think bloating of that JIT code may make any positive effect. I expect this patch should slow-down the real life apps.

As an optimization patch it requires benchmark results to prove its profitability.

@iluuu1994 could you please also take a look.

Comment on lines -4545 to +4578
if (ZEND_OBSERVER_ENABLED || zend_execute_internal) {
if (zend_execute_internal) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had agreed to fallback using ICALL instead of FRAMELESS in case of OBSERVER.

The actual speed-up of FRAMELESS calls wasn't very significant and the increase of the complexity already made questions. I expect, any observer makes much more significant slowdown and I don't see reasons in attempt
to win 0.1% of performance when we already losing 10%.

This patch looks much more complex...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the proper benchmarking homework now - frameless makes about 1% in wall time for wordpress, with observers.
And yes, we were losing 5% (wall time). Now we're losing 2.5%.
I would agree on 0.1% speedup not making sense for 10% loss. But that's not the case here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I remember frameless calls gave ~0.8% win on symfony demo amd wordpress (let it be 1%).
How the observer might lose 5% in the back direction? It should be the same 1%.

Copy link
Member Author

@bwoebi bwoebi Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.

I was just saying that:

I don't see reasons in attempt to win 0.1% of performance when we already losing 10%.

is exaggerated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.

OK. This sounds more fair. Framiteless functions give 1% speedup, but only without observer. This proposal attempts to give the same 1% speedup when observed.

I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED + zend_observer_handler_is_unobserved).
Does this patch makes real improvement without JIT? (Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%).

I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?

I'll try to analyse your benchmark results more careful...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%.

The instr difference on this patch (with frameless) vs current master is 6-7%. The perf difference is 2.5%.
The instr difference on this patch with frameless vs this patch without frameless is 2%. The perf difference is 1%.

For both JIT and non-JIT.

I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?

The generated code is essentially if (needs observing) { push frame (without arg copy); call FLF handler; pop frame } else { call FLF handler; }. So it's not quite falling back to ICALL in JIT.

I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED + zend_observer_handler_is_unobserved).

The ZEND_OBSERVER_ENABLED part is specialized away. Only zend_observer_handler_is_unobserved is checked, and that one is inlined - i.e. the full check for FLF functions is just if (*ZEND_OBSERVER_DATA(ZEND_FLF_FUNC(opline)) == ZEND_OBSERVER_NONE_OBSERVED).

which is a very low-cost check, comparatively to ICALL. We can assume the run time cache for FLF functions being set and the func not be a generator or trampoline, saving a fetch and two comparisons too.

Yes, if we had the same handling than we had for FCALL, then the speedup would not quite be the same.

Comment on lines 1125 to 1129
case ZEND_FRAMELESS_ICALL_3:
// Copy result to OP_DATA to ensure dispatch in observer fallback can be aligned with the last opcode of the call
opline[1].result.var = opline->result.var;
opline[1].result_type = opline->result_type;
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why this is necessary. For me this looks like a hack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the fallback to ZEND_DO_ICALL (when we VM_DISPATCH) to work, the opcode after opline must be the next actual opline after and not OP_DATA. Hence ZEND_DO_ICALL must be dispatched on the OP_DATA opcode for ZEND_FRAMELESS_ICALL_3. And thus result must be on the OP_DATA opcode as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to avoid this hack.
It may confuse some of Optimizer pass (e.g. optimize_temp_vars_5.c) that don't expect repeatable definition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why this hack is removed in undo_pass_two. I.e. for optimizer purposes it doesn't know about that hack. But it works fine with VM :-)

Comment on lines 9605 to 9632
ZVAL_NULL(EX_VAR(opline->result.var));
zval *result = EX_VAR(opline->result.var);
ZVAL_NULL(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this really change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing, in ZEND_FRAMELESS_ICALL_0 (actually the line below should also use result). For the other frameless opcodes, it was about not refetching from opline->result after the unobserved check.

if (ZEND_OBSERVER_ENABLED) {
zend_function *fbc = ZEND_FLF_FUNC(opline);
if (UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(fbc)) == false)) {
zend_execute_data *call = _zend_vm_stack_push_call_frame_ex(zend_vm_calc_used_stack(0, fbc), ZEND_CALL_NESTED_FUNCTION, fbc, 0, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we avoid call to zend_vm_calc_used_stack(0, fbc). The size of the stack should be known statically.
May be I'm wrong... I don't remember when we got zend_internal_function->T and what does it mean...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the temporary the observers use to store the prev_observed_frame.
For FLF functions we can probably hardcode it to (ZEND_CALL_FRAME_SLOT + X + 1) * sizeof(zval), at least for now given that we don't have a mechanism to dynamically allocate temps for arbitrary functions.

This is outside the hot path of not-observed functions, so I don't consider this as critical, but can change it.

Comment on lines 9634 to 9693
if (EG(exception)) {
if (UNEXPECTED(EG(exception) != NULL)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your patch is already huge enough. Mixing the real changes with unrelated code refactoring is annoying.

@bwoebi
Copy link
Member Author

bwoebi commented Mar 25, 2024

Some valgrind benchmark.php numbers:

This PR (without observers, virtually the same then baseline without observers):

{
    "Zend\/bench.php": {
        "instructions": "2230512408"
    },
    "Zend\/bench.php JIT": {
        "instructions": "608925423"
    },
    "Symfony Demo 2.2.3": {
        "instructions": "98897712"
    },
    "Symfony Demo 2.2.3 JIT": {
        "instructions": "100896997"
    },
    "Wordpress 6.2": {
        "instructions": "265831175"
    },
    "Wordpress 6.2 JIT": {
        "instructions": "224894366"
    }
}

This PR with -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0':

{
    "Zend\/bench.php": {
        "instructions": "2385706119"
    },
    "Zend\/bench.php JIT": {
        "instructions": "656269578"
    },
    "Symfony Demo 2.2.3": {
        "instructions": "103402360"
    },
    "Symfony Demo 2.2.3 JIT": {
        "instructions": "104049744"
    },
    "Wordpress 6.2": {
        "instructions": "281125030"
    },
    "Wordpress 6.2 JIT": {
        "instructions": "231714316"
    }
}

Note that for Zend/bench.php, it adds 155M instrs to have observer enabled at all. For the JIT variant it adds only 48M instrs.
Similarly for Symfony demo it adds 5.6M instrs to have observer enabled. With JIT it adds only 3.2M instrs.

So having extra JIT support halves the overhead for symfony.

Having frameless+observer support in JIT has virtually no improvement for symfony. Having frameless+observer outside of JIT shows about 0.8% improvement on the Symfony Demo (104191895 instructions when frameless is disabled).
For wordpress having frameless disabled is 287232509 instrs; with JIT it's 235009436 instrs. I.e. 2-4% of the improvement comes from frameless.

For comparison the baseline numbers, with observer enabled:

{
    "Zend\/bench.php": {
        "instructions": "2514733906"
    },
    "Zend\/bench.php JIT": {
        "instructions": "805609640"
    },
    "Symfony Demo 2.2.3": {
        "instructions": "106351366"
    },
    "Symfony Demo 2.2.3 JIT": {
        "instructions": "105622352"
    },
    "Wordpress 6.2": {
        "instructions": "297501230"
    },
    "Wordpress 6.2 JIT": {
        "instructions": "246740281"
    }
}

I.e. compared to baseline Symfony is 3% faster. JIT is 1.5% faster.
Wordpress is 6% faster, JIT is 7% faster.
On the bench.php it's 5% faster; bench JIT is 8% faster.

The relative overhead of observer itself on bench.php JIT is only 5M (6%) instrs instead of 20M (25%) instrs now.
On Wordpress (no JIT) it's 15M (6%) instrs instead of 32M (12%) instrs now. With JIT we have 7M (3%) instead of 22M (10%) instrs,
With Symfony (no JIT) the relative gain is less, but still, dropping from 8% to 5%.

Overall it feels worth the bit of extra complexity - bench.php gets significant improvement, wordpress has also quite significant improvement (a 10% overhead is more than halved); Symfony gets a moderate improvement. In terms of wall time it was like nearly ~1% improvement for me on Symfony JIT. Wordpress JIT got a solid 2% faster in wall time (and up to 10% in cpu time according to time!). bench.php is 7% faster in wall time.

@bwoebi bwoebi force-pushed the optimize-observers branch 5 times, most recently from 07ad243 to 343d450 Compare April 2, 2024 16:39
@bwoebi bwoebi force-pushed the optimize-observers branch 4 times, most recently from 84aa8b8 to 628478e Compare May 25, 2024 21:25
@bwoebi bwoebi force-pushed the optimize-observers branch from 628478e to 04f144b Compare May 25, 2024 21:49
Comment on lines 1861 to 1866
zend_function *func = ZEND_FLF_FUNC(opline);
/* Assume frameless functions are not recursive with themselves.
* This condition may be true when observers are enabled. */
if (last_call && last_call->func == func) {
goto not_frameless_call;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand what are you doing here, and how this is related to observer optimization.
If this is a bug fix, please explain and commit this separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a frame (the observed call) on top of the FRAMELESS opcode now. If I don't do this, it will try to generate the frame for the FRAMLESS frame on top of that, i.e. observed frameless calls would appear twice in stacktraces now.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becomes a bit better.

  • The VM part is almost good (see comments).
  • The JIT part does a lot of unnecessary thing. It should be simplified. Obsolete changes should be removed (zend_jit_do_fcall)
  • Observer API changes should be explained (documented).

Comment on lines 36 to 42
typedef void (*zend_observer_fcall_begin_handler)(zend_execute_data *execute_data);
typedef void (*zend_observer_fcall_end_handler)(zend_execute_data *execute_data, zval *retval);

typedef struct _zend_observer_fcall_handlers {
zend_observer_fcall_begin_handler begin;
zend_observer_fcall_end_handler end;
} zend_observer_fcall_handlers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to move these declaations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I can move these back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do it and try to minimize the patch in the other places as well.

Comment on lines 9674 to 9675
if (UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(fbc)) == false)) {
ZEND_VM_DISPATCH_TO_HELPER(zend_frameless_observed_call, args, 1, fbc, fbc, op1, arg1, op2, NULL, op3, NULL, result, result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing args doesn't make any sense. It shoukd be cheaper to check opline->opcode.
Instead of passing "args1", "args2", "args3" and "result", I would fetch them in zend_frameless_observed_call().
This way ZEND_FRAMELESS_ICALL_* handlers and zend_frameless_observed_call() will become completely independent.

Then you might be able to reuse zend_frameless_observed_call() from JIT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :-)

Comment on lines +4549 to +4552
ir_END_list(data->ir_end_inputs);
ir_IF_TRUE(data->if_unobserved);
ir_END_list(data->ir_end_inputs);
ir_MERGE_list(data->ir_end_inputs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you put the empty TRUE part of unbserved condition here. You might put it right after condition...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frameless handling uses jit_observer_fcall_is_unobserved_start, and uses the ir_IF_TRUE branch itself and does not call the end function here.

Comment on lines 4527 to 4530
// JIT: if (ZEND_USER_CODE(func->type)) {
ir_ref tmp = ir_LOAD_U8(ir_ADD_OFFSET(func_ref, offsetof(zend_function, type)));
ir_ref if_internal_func = ir_IF(ir_AND_U8(tmp, ir_CONST_U8(1)));
ir_IF_TRUE(if_internal_func);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to check for internal/user functions?
You optimizes frameless functions - they are internal by design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code gets called by the generic and the frameless optimization.

Comment on lines 17062 to 17083
uint32_t call_num_args = ZEND_FLF_NUM_ARGS(opline->opcode);

// push call frame
ir_ref call_ref = ir_CALL_4(IR_ADDR, ir_CONST_FC_FUNC(zend_jit_observed_frameless_helper_push),
jit_FP(jit),
ir_CONST_U32(zend_vm_calc_used_stack(call_num_args, fbc)),
ir_CONST_ADDR(fbc),
ir_CONST_U32(call_num_args));

// push all args
switch (call_num_args) {
case 3: jit_ZVAL_COPY(jit, ZEND_ADDR_REF_ZVAL(ir_ADD_OFFSET(call_ref, EX_NUM_TO_VAR(2))), MAY_BE_ANY & ~MAY_BE_REF, ZEND_ADDR_REF_ZVAL(op1_data_ref), op1_data_info, 1); ZEND_FALLTHROUGH;
case 2: jit_ZVAL_COPY(jit, ZEND_ADDR_REF_ZVAL(ir_ADD_OFFSET(call_ref, EX_NUM_TO_VAR(1))), MAY_BE_ANY & ~MAY_BE_REF, ZEND_ADDR_REF_ZVAL(op2_ref), op2_info, 1); ZEND_FALLTHROUGH;
case 1: jit_ZVAL_COPY(jit, ZEND_ADDR_REF_ZVAL(ir_ADD_OFFSET(call_ref, EX_NUM_TO_VAR(0))), MAY_BE_ANY & ~MAY_BE_REF, ZEND_ADDR_REF_ZVAL(op1_ref), op1_info, 1);
}

// call and free args
ir_CALL_4(IR_VOID, ir_CONST_FC_FUNC(zend_jit_observed_frameless_helper_call),
call_ref,
ir_CONST_FC_FUNC(fbc->internal_function.handler),
observer_handler,
res_ref);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you split the logic into two helpers. Do all the things in a single helper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I don't understand the JIT well enough. I don't know what state the JIT is in with respect to the VM.
Like, will the temporaries be actually set or something? Can I just fetch as if it were in VM? Like do EX_VAR() on any operand and the value is in correct state?
I thought it weren't unless we'd do deoptimization/exit from VM.
At least I don't feel confident doing it, but the way this currently is works very obviously to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I managed to unify it and seems to work.

Comment on lines -10050 to +10150
if (ZEND_OBSERVER_ENABLED) {
ir_CALL_1(IR_VOID, ir_CONST_FC_FUNC(zend_observer_fcall_begin), rx);
bool may_have_observer = ZEND_OBSERVER_ENABLED && (!func || (func->common.fn_flags & (ZEND_ACC_CALL_VIA_TRAMPOLINE | ZEND_ACC_GENERATOR)) == 0);
if (may_have_observer) {
ir_ref observer_handler;
struct jit_observer_fcall_is_unobserved_data unobserved_data = jit_observer_fcall_is_unobserved_start(jit, func, &observer_handler, rx, func_ref ? func_ref : ir_LOAD_A(jit_CALL(rx, func)));
jit_observer_fcall_begin(jit, rx, observer_handler);
jit_observer_fcall_is_unobserved_end(jit, &unobserved_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what do you do in zend_jit_push_call_frame() and zend_jit_do_fcall() now.
These changes are not needed for the frameless/observer support anymore.
Please remove them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have forgotten that there are two parts to this PR: optimizing observers in general and enabling frameless calls with observers active. See e.g. #13649 (comment) where I discuss the impact of only generic optimizations (like this one here) and frameless support included.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. for normal, non-frameless fcalls, it's also much cheaper to JIT the runtime cache lookup than doing it in non-specialized functions.

@iluuu1994 iluuu1994 self-requested a review May 27, 2024 10:30
@bwoebi bwoebi force-pushed the optimize-observers branch from c31c346 to 71b16ee Compare June 1, 2024 18:17
test@test.test added 2 commits June 1, 2024 22:32
@bwoebi bwoebi force-pushed the optimize-observers branch from eef56d3 to 6b1d905 Compare June 1, 2024 21:02
@bwoebi bwoebi force-pushed the optimize-observers branch from 6b1d905 to 14bb1c1 Compare June 1, 2024 21:18
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked only VM and JIT changes.
They looks almost good.
Please fix the problem with "Undefined variable" warning and make an additional cleanup pass.

I hope, this is going to be merged on the next iteration.

Comment on lines 9629 to 9637
zval *arg1 = GET_OP1_ZVAL_PTR_DEREF(BP_VAR_R);
if (EG(exception)) {
if (UNEXPECTED(EG(exception) != NULL)) {
FREE_OP1();
HANDLE_EXCEPTION();
}
function(EX_VAR(opline->result.var), arg1);

#if !ZEND_VM_SPEC || ZEND_OBSERVER_ENABLED
if (ZEND_OBSERVER_ENABLED && UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(ZEND_FLF_FUNC(opline))) == false)) {
zend_frameless_observed_call(execute_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of undefined CV in op1, the "Undefined variable" warning is going to be emitted twice.
Once by GET_OP1_ZVAL_PTR_DEREF() and then in zend_frameless_observed_call().

GET_OP1_ZVAL_PTR_DEREF(BP_VAR_R) should be moved into the not-observed path.

Copy link
Member Author

@bwoebi bwoebi Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it won't. zend_frameless_observed_call uses zend_get_zval_ptr, which doesn't emit undef warnings.
But good point though, because IS_UNDEF needs to be checked and converted to IS_NULL.

@@ -8178,10 +8274,11 @@ static int zend_jit_push_call_frame(zend_jit_ctx *jit, const zend_op *opline, co
ir_ref used_stack_ref = IR_UNUSED;
bool stack_check = 1;
ir_ref rx, ref, top, if_enough_stack, cold_path = IR_UNUSED;
uint32_t num_args = ZEND_OP_IS_FRAMELESS_ICALL(opline->opcode) ? ZEND_FLF_NUM_ARGS(opline->opcode) : opline->extended_value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Where zend_jit_push_call_frame() is used for frameless now?

Please, verify all the changes in zend_jit_push_call_frame() and zend_jit_do_fcall(). Some of them are the reminders from your previous attempts and may be removed now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need it, sorry for the leftover.

Comment on lines 17117 to +17123
if (op1_info & MAY_BE_REF) {
op1_ref = jit_ZVAL_DEREF_ref(jit, op1_ref);
}

ir_ref skip_observer = IR_UNUSED;
if (ZEND_OBSERVER_ENABLED) {
skip_observer = jit_frameless_observer(jit, opline);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem as in VM. Undefined warning is going to be emitted twice.

Comment on lines 36 to 42
typedef void (*zend_observer_fcall_begin_handler)(zend_execute_data *execute_data);
typedef void (*zend_observer_fcall_end_handler)(zend_execute_data *execute_data, zval *retval);

typedef struct _zend_observer_fcall_handlers {
zend_observer_fcall_begin_handler begin;
zend_observer_fcall_end_handler end;
} zend_observer_fcall_handlers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do it and try to minimize the patch in the other places as well.

Comment on lines 17181 to 17185
/* Set OP1 to UNDEF in case FREE_OP2() throws. */
// TODO: I believe this is only necessary when jit_ir_needs_dtor is true for either op
if ((opline->op1_type & (IS_VAR|IS_TMP_VAR)) != 0 && (opline->op2_type & (IS_VAR|IS_TMP_VAR)) != 0) {
jit_set_Z_TYPE_INFO(jit, op1_addr, IS_UNDEF);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't completely understand your comment about jit_ir_needs_dtor. We don't have such variable or function.
I agree, these IS_UNDEF assignments may be optimized out if we take into account types inferred for next operands and prove that their destruction can't cause exceptions.

@bwoebi please remove this comment

@iluuu1994 please think about the proposed optimization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it - I don't even recall the exact details when I wrote that.

@bwoebi bwoebi force-pushed the optimize-observers branch from 96c9c12 to 7b989c6 Compare June 6, 2024 12:43
Comment on lines 1545 to 1548
if (Z_ISUNDEF_P(zv)) {
ZVAL_NULL(zv);
}
ZVAL_COPY_DEREF(ZEND_CALL_VAR_NUM(call, arg), zv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is wrong!
The original CV must not be touched. it should be kept IS_UNDEF.
Only ZEND_CALL_VAR_NUM(call, arg) should be set to IS_NULL.
Something like this:

zval *dst = ZEND_CALL_VAR_NUM(call, arg);
if (Z_ISUNDEF_P(zv)) {
	ZVAL_NULL(dst);
} else {
	ZVAL_COPY_DEREF(dst, zv);
}

@bwoebi bwoebi force-pushed the optimize-observers branch from 7b989c6 to 2f69d7c Compare June 6, 2024 16:14
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VM and JIT parts look more or less good now.
I don't object against this, but I still don't like these observer related changes.

@iluuu1994 please take a final look.

@iluuu1994
Copy link
Member

The VM changes also look good to me. I did not look at the JIT, or the observer changes in detail.

@bwoebi bwoebi merged commit 6a2c531 into php:master Jun 15, 2024
11 checks passed
@bwoebi bwoebi deleted the optimize-observers branch June 15, 2024 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants