JIT: Add IBT support #8636

Merged: 1 commit, May 31, 2022
2 changes: 2 additions & 0 deletions ext/opcache/jit/dynasm/dasm_x86.lua
@@ -1147,6 +1147,8 @@ local map_op = {
rep_0 = "F3",
repe_0 = "F3",
repz_0 = "F3",
endbr32_0 = "F30F1EFB",
endbr64_0 = "F30F1EFA",
-- F4: *hlt
cmc_0 = "F5",
-- F6: test... mb,i; div... mb
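
The two new `map_op` entries give DynASM the raw encodings of the instructions: `F3 0F 1E FB` for `endbr32` and `F3 0F 1E FA` for `endbr64`. As a minimal sketch of what those bytes look like at the start of emitted code, the helper below checks whether a buffer begins with either encoding. The names `starts_with_endbr` and `starts_with_endbr_demo` are illustrative assumptions and do not appear in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encodings copied from the map_op entries in the hunk above. */
static const unsigned char ENDBR64_BYTES[4] = { 0xF3, 0x0F, 0x1E, 0xFA };
static const unsigned char ENDBR32_BYTES[4] = { 0xF3, 0x0F, 0x1E, 0xFB };

/* Hypothetical helper (not part of the PR): returns 64 if buf starts with
 * endbr64, 32 if it starts with endbr32, and 0 otherwise. */
int starts_with_endbr(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 4) {
        return 0;
    }
    if (memcmp(buf, ENDBR64_BYTES, 4) == 0) {
        return 64;
    }
    if (memcmp(buf, ENDBR32_BYTES, 4) == 0) {
        return 32;
    }
    return 0;
}

/* Usage: a buffer holding endbr64 followed by a one-byte ret (C3). */
void starts_with_endbr_demo(void)
{
    const unsigned char code[] = { 0xF3, 0x0F, 0x1E, 0xFA, 0xC3 };
    assert(starts_with_endbr(code, sizeof code) == 64);
    /* Shifted by one byte, the prefix no longer matches. */
    assert(starts_with_endbr(code + 1, sizeof code - 1) == 0);
}
```

A check like this could be pointed at the first bytes of a JIT-generated function entry to confirm that the `ENDBR` macro took effect in a given build.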
13 changes: 13 additions & 0 deletions ext/opcache/jit/zend_jit_x86.dasc
@@ -1623,6 +1623,16 @@ static size_t tsrm_tls_offset;
|| }
|.endmacro

|.macro ENDBR
||#if defined (__CET__) && (__CET__ & 1) != 0
| .if X64
| endbr64
| .else
| endbr32
| .endif
||#endif
|.endmacro
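
The `||#if` guard in the `ENDBR` macro is an ordinary C preprocessor test on `__CET__`. As a minimal sketch, assuming GCC's convention that bit 0 of `__CET__` signals indirect branch tracking and bit 1 signals shadow stack, the same check can be written in plain C. The helper names `built_with_ibt` and `built_with_shstk` and the `CET_*_BIT` macros are illustrative and are not part of the PR.

```c
#include <assert.h>

/* Assumed bit layout of __CET__ (GCC convention):
 * bit 0 = indirect branch tracking (IBT), bit 1 = shadow stack. */
#define CET_IBT_BIT   1
#define CET_SHSTK_BIT 2

/* Hypothetical helper: 1 if this translation unit was compiled with
 * IBT enabled (e.g. -fcf-protection=branch or =full), else 0. */
int built_with_ibt(void)
{
#if defined(__CET__) && (__CET__ & CET_IBT_BIT) != 0
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if compiled with shadow-stack protection, else 0. */
int built_with_shstk(void)
{
#if defined(__CET__) && (__CET__ & CET_SHSTK_BIT) != 0
    return 1;
#else
    return 0;
#endif
}
```

Because the decision is made by the preprocessor, a build without `-fcf-protection` compiles the `ENDBR` macro to nothing, so the JIT emits exactly the same stubs as before the change.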

Comment on lines +1626 to +1635
Member

Can we enable/disable generation of endbr at run time, e.g. through a new bit in the opcache.jit directive?

Member

I'm not sure if this is really necessary...
Can you share some CET vs. non-CET benchmark results? (at least Zend/bench.php)

Contributor Author

My concern is that enabling or disabling endbr at run time may cause inconsistency, i.e., JIT-compiled code has endbr while GCC-compiled code does not. Our current approach mirrors GCC, which enables or disables endbr generation at compile time.
GCC:
When compiled with gcc -fcf-protection=full (default) or gcc -fcf-protection=branch, the macro __CET__ is defined and endbr is inserted. GCC also emits a property named IBT into the ELF header.
Execution flow:
Once ld.so sees the IBT property in the ELF header, it issues an arch_prctl() syscall to the kernel to enable the hardware (CPU) CET engine.
Benchmark results:
I will run and share Google PKB benchmark results next week. Below are quick Zend/bench.php results:

  • CET. Binary has endbr and HW/CPU CET on.
$ ./sapi/cli/php Zend/bench.php
simple             0.026
simplecall         0.011
simpleucall        0.031
simpleudcall       0.026
mandel             0.098
mandel2            0.087
ackermann(7)       0.018
ary(50000)         0.003
ary2(50000)        0.002
ary3(2000)         0.033
fibo(30)           0.056
hash1(50000)       0.006
hash2(500)         0.007
heapsort(20000)    0.022
matrix(20)         0.021
nestedloop(12)     0.022
sieve(30)          0.012
strcat(200000)     0.004
------------------------
Total              0.485
  • Non-CET. Binary has no endbr and HW/CPU CET off
$ ./sapi/cli/php Zend/bench.php
simple             0.025
simplecall         0.011
simpleucall        0.030
simpleudcall       0.025
mandel             0.098
mandel2            0.090
ackermann(7)       0.020
ary(50000)         0.003
ary2(50000)        0.003
ary3(2000)         0.031
fibo(30)           0.056
hash1(50000)       0.006
hash2(500)         0.007
heapsort(20000)    0.022
matrix(20)         0.021
nestedloop(12)     0.022
sieve(30)          0.012
strcat(200000)     0.004
------------------------
Total              0.486

BTW, when the hardware (CPU) CET engine is off, endbr is treated as a nop. In that case, even if the application contains endbr instructions, the performance impact is negligible.
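
To put the two Zend/bench.php totals quoted above (0.485 with CET, 0.486 without) in proportion, the relative difference can be worked out as below. This is only an arithmetic sketch over the single runs reported in this thread. The function names are illustrative and the result says nothing about run-to-run variance.

```c
#include <assert.h>

/* Hypothetical helper: change from baseline to measured,
 * expressed as a percentage of baseline. */
double rel_diff_percent(double baseline, double measured)
{
    return (measured - baseline) / baseline * 100.0;
}

/* Totals copied from the two Zend/bench.php runs quoted in the thread. */
void bench_totals_demo(void)
{
    const double non_cet_total = 0.486;
    const double cet_total = 0.485;
    double d = rel_diff_percent(non_cet_total, cet_total);
    /* The totals differ by roughly 0.2% of the non-CET figure. */
    assert(d > -0.21 && d < -0.20);
}
```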

static bool reuse_ip = 0;
static bool delayed_call_chain = 0;
static uint32_t delayed_call_level = 0;
@@ -2292,6 +2302,7 @@ static int zend_jit_hybrid_hot_code_stub(dasm_State **Dst)
*/
static int zend_jit_hybrid_hot_counter_stub(dasm_State **Dst, uint32_t cost)
{
| ENDBR
| mov r0, EX->func
| mov r1, aword [r0 + offsetof(zend_op_array, reserved[zend_func_info_rid])]
| mov r2, aword [r1 + offsetof(zend_jit_op_array_hot_extension, counter)]
@@ -2362,6 +2373,7 @@ static int zend_jit_hybrid_hot_trace_stub(dasm_State **Dst)

static int zend_jit_hybrid_trace_counter_stub(dasm_State **Dst, uint32_t cost)
{
| ENDBR
| mov r0, EX->func
| mov r1, aword [r0 + offsetof(zend_op_array, reserved[zend_func_info_rid])]
| mov r1, aword [r1 + offsetof(zend_jit_op_array_trace_extension, offset)]
@@ -3049,6 +3061,7 @@ static int zend_jit_align_func(dasm_State **Dst)

static int zend_jit_prologue(dasm_State **Dst)
{
| ENDBR
if (zend_jit_vm_kind == ZEND_VM_KIND_HYBRID) {
| SUB_HYBRID_SPAD
} else if (GCC_GLOBAL_REGS) {