-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Optimize observers #13649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize observers #13649
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not understand what the benefit of moving result
to OP_DATA
is. Can you elaborate on this point?
I also didn't fully understand why ZEND_OBSERVER_NONE_OBSERVED
is needed. From what I understand, zend_observer_remove_handler
moves the tail to -1, so if the first handler is unset, no handler should be installed. Tbh the observer.{h,c} code is quite hard to read, but I didn't spend too much time on it.
Did you run some benchmarks? Can you notice a difference for frameless functions, in particular?
return *handler == ZEND_OBSERVER_NONE_OBSERVED; | ||
} | ||
|
||
static zend_always_inline bool zend_observer_fcall_has_no_observers(zend_execute_data *execute_data, bool allow_generator, zend_observer_fcall_begin_handler **handler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Avoid double negation? E.g. !zend_observer_fcall_has_no_observers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming problem here is that it's not a binary state, but "no observers", "some observers" and "observers undetermined", and I want to distinguish "no observers" vs "some or undetermined observers". I'm not sure how to better name that.
Hm, have I been unclear in the PR description? "it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode" - DO_ICALL cannot jump an extra opline (skipping OP_DATA that is) at its end, and accesses result on its opline.
Simply an optimization to avoid reading both start and end handlers when none is set, i.e. saves a read in the hot path of checking whether a function is being observed. |
Regarding rough benchmarking numbers, without debug, without opcache (wall time; v: valgrind instructions), with this script: Baseline of empty loop: 1 sec (v: 1,007,435,926) (Using 20000000 iterations for valgrind)
master, without observers: 2.8 sec (v: 2,727,435,251) So, yes, the frameless functions are much faster to call. Now, the same code with pow instead of min (which is not using FLF currently): master, without observers: 4.9 sec (v: 4,687,435,657) The generic improvement of inlining the runtime cache check (and just checking one value instead of two) is quite significant on its own already, but using frameless calls just completely blows it out of the water. I'm also amazed that apparently for frameless calls it uses only 0.1 sec (3% of time) more even though it needs 10% more valgrind instructions. |
0dd6584
to
ee5d6ed
Compare
I missed this point. Unfortunately, I don't have a better idea without significant complications that would nullify the performance benefit, or duplicate the DO_ICALL handler.
Thanks for the explanation. 🙂 The naming could probably be improved could be improved. Or at least add a short comment to |
dfbe08a
to
72d298a
Compare
#define ZEND_OBSERVABLE_FN(function) \ | ||
(ZEND_MAP_PTR(function->common.run_time_cache) && !(function->common.fn_flags & ZEND_ACC_CALL_VIA_TRAMPOLINE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this check effectively move to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to zend_observer_fcall_has_no_observers in zend_observer.h
2b3c7ec
to
f623714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this should be committed. I don't like the fact that the engine started to depend on observer API more and more.
I didn't review the zend_jit_ir.c
part yet. Can it be implemented using JIT helpers (C function) to make the most dangerous part of the patch less? I don't think bloating of that JIT code may make any positive effect. I expect this patch should slow-down the real life apps.
As an optimization patch it requires benchmark results to prove its profitability.
@iluuu1994 could you please also take a look.
if (ZEND_OBSERVER_ENABLED || zend_execute_internal) { | ||
if (zend_execute_internal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had agreed to fallback using ICALL instead of FRAMELESS in case of OBSERVER.
The actual speed-up of FRAMELESS calls wasn't very significant and the increase of the complexity already made questions. I expect, any observer makes much more significant slowdown and I don't see reasons in attempt
to win 0.1% of performance when we already losing 10%.
This patch looks much more complex...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done the proper benchmarking homework now - frameless makes about 1% in wall time for wordpress, with observers.
And yes, we were losing 5% (wall time). Now we're losing 2.5%.
I would agree on 0.1% speedup not making sense for 10% loss. But that's not the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I remember frameless calls gave ~0.8% win on symfony demo amd wordpress (let it be 1%).
How the observer might lose 5% in the back direction? It should be the same 1%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.
I was just saying that:
I don't see reasons in attempt to win 0.1% of performance when we already losing 10%.
is exaggerated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.
OK. This sounds more fair. Framiteless functions give 1% speedup, but only without observer. This proposal attempts to give the same 1% speedup when observed.
I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED
+ zend_observer_handler_is_unobserved
).
Does this patch makes real improvement without JIT? (Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%).
I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?
I'll try to analyse your benchmark results more careful...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%.
The instr difference on this patch (with frameless) vs current master is 6-7%. The perf difference is 2.5%.
The instr difference on this patch with frameless vs this patch without frameless is 2%. The perf difference is 1%.
For both JIT and non-JIT.
I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?
The generated code is essentially if (needs observing) { push frame (without arg copy); call FLF handler; pop frame } else { call FLF handler; }
. So it's not quite falling back to ICALL in JIT.
I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED + zend_observer_handler_is_unobserved).
The ZEND_OBSERVER_ENABLED part is specialized away. Only zend_observer_handler_is_unobserved is checked, and that one is inlined - i.e. the full check for FLF functions is just if (*ZEND_OBSERVER_DATA(ZEND_FLF_FUNC(opline)) == ZEND_OBSERVER_NONE_OBSERVED)
.
which is a very low-cost check, comparatively to ICALL. We can assume the run time cache for FLF functions being set and the func not be a generator or trampoline, saving a fetch and two comparisons too.
Yes, if we had the same handling than we had for FCALL, then the speedup would not quite be the same.
Zend/zend_opcode.c
Outdated
case ZEND_FRAMELESS_ICALL_3: | ||
// Copy result to OP_DATA to ensure dispatch in observer fallback can be aligned with the last opcode of the call | ||
opline[1].result.var = opline->result.var; | ||
opline[1].result_type = opline->result_type; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why this is necessary. For me this looks like a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the fallback to ZEND_DO_ICALL (when we VM_DISPATCH) to work, the opcode after opline
must be the next actual opline after and not OP_DATA. Hence ZEND_DO_ICALL must be dispatched on the OP_DATA opcode for ZEND_FRAMELESS_ICALL_3. And thus result must be on the OP_DATA opcode as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to avoid this hack.
It may confuse some of Optimizer pass (e.g. optimize_temp_vars_5.c) that don't expect repeatable definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why this hack is removed in undo_pass_two. I.e. for optimizer purposes it doesn't know about that hack. But it works fine with VM :-)
Zend/zend_vm_def.h
Outdated
ZVAL_NULL(EX_VAR(opline->result.var)); | ||
zval *result = EX_VAR(opline->result.var); | ||
ZVAL_NULL(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this really change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing, in ZEND_FRAMELESS_ICALL_0 (actually the line below should also use result
). For the other frameless opcodes, it was about not refetching from opline->result after the unobserved check.
Zend/zend_vm_def.h
Outdated
if (ZEND_OBSERVER_ENABLED) { | ||
zend_function *fbc = ZEND_FLF_FUNC(opline); | ||
if (UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(fbc)) == false)) { | ||
zend_execute_data *call = _zend_vm_stack_push_call_frame_ex(zend_vm_calc_used_stack(0, fbc), ZEND_CALL_NESTED_FUNCTION, fbc, 0, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we avoid call to zend_vm_calc_used_stack(0, fbc)
. The size of the stack should be known statically.
May be I'm wrong... I don't remember when we got zend_internal_function->T
and what does it mean...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the temporary the observers use to store the prev_observed_frame.
For FLF functions we can probably hardcode it to (ZEND_CALL_FRAME_SLOT + X + 1) * sizeof(zval)
, at least for now given that we don't have a mechanism to dynamically allocate temps for arbitrary functions.
This is outside the hot path of not-observed functions, so I don't consider this as critical, but can change it.
Zend/zend_vm_def.h
Outdated
if (EG(exception)) { | ||
if (UNEXPECTED(EG(exception) != NULL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your patch is already huge enough. Mixing the real changes with unrelated code refactoring is annoying.
Some valgrind benchmark.php numbers: This PR (without observers, virtually the same then baseline without observers):
This PR with
Note that for Zend/bench.php, it adds 155M instrs to have observer enabled at all. For the JIT variant it adds only 48M instrs. So having extra JIT support halves the overhead for symfony. Having frameless+observer support in JIT has virtually no improvement for symfony. Having frameless+observer outside of JIT shows about 0.8% improvement on the Symfony Demo (104191895 instructions when frameless is disabled). For comparison the baseline numbers, with observer enabled:
I.e. compared to baseline Symfony is 3% faster. JIT is 1.5% faster. The relative overhead of observer itself on bench.php JIT is only 5M (6%) instrs instead of 20M (25%) instrs now. Overall it feels worth the bit of extra complexity - bench.php gets significant improvement, wordpress has also quite significant improvement (a 10% overhead is more than halved); Symfony gets a moderate improvement. In terms of wall time it was like nearly ~1% improvement for me on Symfony JIT. Wordpress JIT got a solid 2% faster in wall time (and up to 10% in cpu time according to |
07ad243
to
343d450
Compare
84aa8b8
to
628478e
Compare
628478e
to
04f144b
Compare
Zend/zend_builtin_functions.c
Outdated
zend_function *func = ZEND_FLF_FUNC(opline); | ||
/* Assume frameless functions are not recursive with themselves. | ||
* This condition may be true when observers are enabled. */ | ||
if (last_call && last_call->func == func) { | ||
goto not_frameless_call; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand what are you doing here, and how this is related to observer optimization.
If this is a bug fix, please explain and commit this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a frame (the observed call) on top of the FRAMELESS opcode now. If I don't do this, it will try to generate the frame for the FRAMLESS frame on top of that, i.e. observed frameless calls would appear twice in stacktraces now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This becomes a bit better.
- The VM part is almost good (see comments).
- The JIT part does a lot of unnecessary thing. It should be simplified. Obsolete changes should be removed (
zend_jit_do_fcall
) - Observer API changes should be explained (documented).
Zend/zend_observer.h
Outdated
typedef void (*zend_observer_fcall_begin_handler)(zend_execute_data *execute_data); | ||
typedef void (*zend_observer_fcall_end_handler)(zend_execute_data *execute_data, zval *retval); | ||
|
||
typedef struct _zend_observer_fcall_handlers { | ||
zend_observer_fcall_begin_handler begin; | ||
zend_observer_fcall_end_handler end; | ||
} zend_observer_fcall_handlers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to move these declaations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I can move these back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, do it and try to minimize the patch in the other places as well.
Zend/zend_vm_def.h
Outdated
if (UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(fbc)) == false)) { | ||
ZEND_VM_DISPATCH_TO_HELPER(zend_frameless_observed_call, args, 1, fbc, fbc, op1, arg1, op2, NULL, op3, NULL, result, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing args
doesn't make any sense. It shoukd be cheaper to check opline->opcode
.
Instead of passing "args1", "args2", "args3" and "result", I would fetch them in zend_frameless_observed_call()
.
This way ZEND_FRAMELESS_ICALL_*
handlers and zend_frameless_observed_call()
will become completely independent.
Then you might be able to reuse zend_frameless_observed_call()
from JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :-)
ir_END_list(data->ir_end_inputs); | ||
ir_IF_TRUE(data->if_unobserved); | ||
ir_END_list(data->ir_end_inputs); | ||
ir_MERGE_list(data->ir_end_inputs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you put the empty TRUE part of unbserved condition here. You might put it right after condition...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frameless handling uses jit_observer_fcall_is_unobserved_start, and uses the ir_IF_TRUE branch itself and does not call the end function here.
ext/opcache/jit/zend_jit_ir.c
Outdated
// JIT: if (ZEND_USER_CODE(func->type)) { | ||
ir_ref tmp = ir_LOAD_U8(ir_ADD_OFFSET(func_ref, offsetof(zend_function, type))); | ||
ir_ref if_internal_func = ir_IF(ir_AND_U8(tmp, ir_CONST_U8(1))); | ||
ir_IF_TRUE(if_internal_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to check for internal/user functions?
You optimizes frameless functions - they are internal by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code gets called by the generic and the frameless optimization.
ext/opcache/jit/zend_jit_ir.c
Outdated
uint32_t call_num_args = ZEND_FLF_NUM_ARGS(opline->opcode); | ||
|
||
// push call frame | ||
ir_ref call_ref = ir_CALL_4(IR_ADDR, ir_CONST_FC_FUNC(zend_jit_observed_frameless_helper_push), | ||
jit_FP(jit), | ||
ir_CONST_U32(zend_vm_calc_used_stack(call_num_args, fbc)), | ||
ir_CONST_ADDR(fbc), | ||
ir_CONST_U32(call_num_args)); | ||
|
||
// push all args | ||
switch (call_num_args) { | ||
case 3: jit_ZVAL_COPY(jit, ZEND_ADDR_REF_ZVAL(ir_ADD_OFFSET(call_ref, EX_NUM_TO_VAR(2))), MAY_BE_ANY & ~MAY_BE_REF, ZEND_ADDR_REF_ZVAL(op1_data_ref), op1_data_info, 1); ZEND_FALLTHROUGH; | ||
case 2: jit_ZVAL_COPY(jit, ZEND_ADDR_REF_ZVAL(ir_ADD_OFFSET(call_ref, EX_NUM_TO_VAR(1))), MAY_BE_ANY & ~MAY_BE_REF, ZEND_ADDR_REF_ZVAL(op2_ref), op2_info, 1); ZEND_FALLTHROUGH; | ||
case 1: jit_ZVAL_COPY(jit, ZEND_ADDR_REF_ZVAL(ir_ADD_OFFSET(call_ref, EX_NUM_TO_VAR(0))), MAY_BE_ANY & ~MAY_BE_REF, ZEND_ADDR_REF_ZVAL(op1_ref), op1_info, 1); | ||
} | ||
|
||
// call and free args | ||
ir_CALL_4(IR_VOID, ir_CONST_FC_FUNC(zend_jit_observed_frameless_helper_call), | ||
call_ref, | ||
ir_CONST_FC_FUNC(fbc->internal_function.handler), | ||
observer_handler, | ||
res_ref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you split the logic into two helpers. Do all the things in a single helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I don't understand the JIT well enough. I don't know what state the JIT is in with respect to the VM.
Like, will the temporaries be actually set or something? Can I just fetch as if it were in VM? Like do EX_VAR() on any operand and the value is in correct state?
I thought it weren't unless we'd do deoptimization/exit from VM.
At least I don't feel confident doing it, but the way this currently is works very obviously to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I managed to unify it and seems to work.
if (ZEND_OBSERVER_ENABLED) { | ||
ir_CALL_1(IR_VOID, ir_CONST_FC_FUNC(zend_observer_fcall_begin), rx); | ||
bool may_have_observer = ZEND_OBSERVER_ENABLED && (!func || (func->common.fn_flags & (ZEND_ACC_CALL_VIA_TRAMPOLINE | ZEND_ACC_GENERATOR)) == 0); | ||
if (may_have_observer) { | ||
ir_ref observer_handler; | ||
struct jit_observer_fcall_is_unobserved_data unobserved_data = jit_observer_fcall_is_unobserved_start(jit, func, &observer_handler, rx, func_ref ? func_ref : ir_LOAD_A(jit_CALL(rx, func))); | ||
jit_observer_fcall_begin(jit, rx, observer_handler); | ||
jit_observer_fcall_is_unobserved_end(jit, &unobserved_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what do you do in zend_jit_push_call_frame()
and zend_jit_do_fcall()
now.
These changes are not needed for the frameless/observer support anymore.
Please remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might have forgotten that there are two parts to this PR: optimizing observers in general and enabling frameless calls with observers active. See e.g. #13649 (comment) where I discuss the impact of only generic optimizations (like this one here) and frameless support included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. for normal, non-frameless fcalls, it's also much cheaper to JIT the runtime cache lookup than doing it in non-specialized functions.
c31c346
to
71b16ee
Compare
This reverts commit 97cc381.
eef56d3
to
6b1d905
Compare
6b1d905
to
14bb1c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked only VM and JIT changes.
They looks almost good.
Please fix the problem with "Undefined variable" warning and make an additional cleanup pass.
I hope, this is going to be merged on the next iteration.
Zend/zend_vm_def.h
Outdated
zval *arg1 = GET_OP1_ZVAL_PTR_DEREF(BP_VAR_R); | ||
if (EG(exception)) { | ||
if (UNEXPECTED(EG(exception) != NULL)) { | ||
FREE_OP1(); | ||
HANDLE_EXCEPTION(); | ||
} | ||
function(EX_VAR(opline->result.var), arg1); | ||
|
||
#if !ZEND_VM_SPEC || ZEND_OBSERVER_ENABLED | ||
if (ZEND_OBSERVER_ENABLED && UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(ZEND_FLF_FUNC(opline))) == false)) { | ||
zend_frameless_observed_call(execute_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of undefined CV in op1, the "Undefined variable" warning is going to be emitted twice.
Once by GET_OP1_ZVAL_PTR_DEREF()
and then in zend_frameless_observed_call()
.
GET_OP1_ZVAL_PTR_DEREF(BP_VAR_R)
should be moved into the not-observed path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it won't. zend_frameless_observed_call
uses zend_get_zval_ptr
, which doesn't emit undef warnings.
But good point though, because IS_UNDEF needs to be checked and converted to IS_NULL.
ext/opcache/jit/zend_jit_ir.c
Outdated
@@ -8178,10 +8274,11 @@ static int zend_jit_push_call_frame(zend_jit_ctx *jit, const zend_op *opline, co | |||
ir_ref used_stack_ref = IR_UNUSED; | |||
bool stack_check = 1; | |||
ir_ref rx, ref, top, if_enough_stack, cold_path = IR_UNUSED; | |||
uint32_t num_args = ZEND_OP_IS_FRAMELESS_ICALL(opline->opcode) ? ZEND_FLF_NUM_ARGS(opline->opcode) : opline->extended_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? Where zend_jit_push_call_frame()
is used for frameless now?
Please, verify all the changes in zend_jit_push_call_frame()
and zend_jit_do_fcall()
. Some of them are the reminders from your previous attempts and may be removed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need it, sorry for the leftover.
if (op1_info & MAY_BE_REF) { | ||
op1_ref = jit_ZVAL_DEREF_ref(jit, op1_ref); | ||
} | ||
|
||
ir_ref skip_observer = IR_UNUSED; | ||
if (ZEND_OBSERVER_ENABLED) { | ||
skip_observer = jit_frameless_observer(jit, opline); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same problem as in VM. Undefined warning is going to be emitted twice.
Zend/zend_observer.h
Outdated
typedef void (*zend_observer_fcall_begin_handler)(zend_execute_data *execute_data); | ||
typedef void (*zend_observer_fcall_end_handler)(zend_execute_data *execute_data, zval *retval); | ||
|
||
typedef struct _zend_observer_fcall_handlers { | ||
zend_observer_fcall_begin_handler begin; | ||
zend_observer_fcall_end_handler end; | ||
} zend_observer_fcall_handlers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, do it and try to minimize the patch in the other places as well.
ext/opcache/jit/zend_jit_ir.c
Outdated
/* Set OP1 to UNDEF in case FREE_OP2() throws. */ | ||
// TODO: I believe this is only necessary when jit_ir_needs_dtor is true for either op | ||
if ((opline->op1_type & (IS_VAR|IS_TMP_VAR)) != 0 && (opline->op2_type & (IS_VAR|IS_TMP_VAR)) != 0) { | ||
jit_set_Z_TYPE_INFO(jit, op1_addr, IS_UNDEF); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't completely understand your comment about jit_ir_needs_dtor
. We don't have such variable or function.
I agree, these IS_UNDEF assignments may be optimized out if we take into account types inferred for next operands and prove that their destruction can't cause exceptions.
@bwoebi please remove this comment
@iluuu1994 please think about the proposed optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove it - I don't even recall the exact details when I wrote that.
96c9c12
to
7b989c6
Compare
Zend/zend_execute.c
Outdated
if (Z_ISUNDEF_P(zv)) { | ||
ZVAL_NULL(zv); | ||
} | ||
ZVAL_COPY_DEREF(ZEND_CALL_VAR_NUM(call, arg), zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is wrong!
The original CV must not be touched. it should be kept IS_UNDEF.
Only ZEND_CALL_VAR_NUM(call, arg) should be set to IS_NULL.
Something like this:
zval *dst = ZEND_CALL_VAR_NUM(call, arg);
if (Z_ISUNDEF_P(zv)) {
ZVAL_NULL(dst);
} else {
ZVAL_COPY_DEREF(dst, zv);
}
7b989c6
to
2f69d7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VM and JIT parts look more or less good now.
I don't object against this, but I still don't like these observer related changes.
@iluuu1994 please take a final look.
The VM changes also look good to me. I did not look at the JIT, or the observer changes in detail. |
Inline the lookup whether a function is observed at all. This strategy is also used for FRAMELESS calls. If the frameless call is observed, we instead allocate a call frame and push the arguments, to call the the function afterwards. Doing so is still a performance benefit as opposed to executing individual INIT_FCALL+SEND_VAL ops. Thus, even if the frameless call turns out to be observed, the call overhead is slightly lower than before. If the internal function is not observed at all, the unavoidable overhead is fetching the FLF zend_function pointer and the run-time cache needs to be inspected.
As part of this work, it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode. This is a bit unusual in comparison to all other ZEND_OP_DATA usages, but seems to not pose problems overall.
There is also a small issue resolved: trampolines would always use the ZEND_CALL_TRAMPOLINE_SPEC_OBSERVER function due to zend_observer_fcall_op_array_extension being set to -1 too late.
The same inlining of operations has been applied to JIT as well.