Jit arm64 #6982

dstogov · 2021-05-14T09:35:40Z

This is almost complete implementation of ARM64 back-end for PHP/JIT.

The JIT provides similar to x86[_64] speed improvement on Kunpeng 920 CPU (8-Cores, ARMv8, 2.60 GHz).
The following table shows execution time in seconds.

	jit=0	jit=function	jit=tracing
bench.php	0.387	0.152	0.130
micro_bench.php	2.278	1.521	1.273
Mandelbrot	0.074	0.033	0.014

In the current state all *.phpt tests are passed, symfony_demo is able to run with both functional and tracing JIT.

There are still some problems with long jumps patching in tracing JIT (it's limited to 1 MB for conditional and 128MB for unconditional branches). We are going to solve this soon.

Anyway, this is ready for review and seems good enough to be merged into master.

SUMMARY We implemented a prototype of PHP JIT/arm64. Briefly speaking, 1. build system Changes to the build system are made so that PHP JIT can be successfully built and run on ARM-based machine. Major change lies in file zend_jit_arm64.dasc, where the handler for each opcode is generated into machine code. Note that this file is just copied from zend_jit_x86.dasc and the *unimplemented* parts are substitued with 'brk' instruction for future work. 2. registers AArch64 registers are defined in file zend_jit_arm64.h. From our perspectives, the register usage is quite different from the x86 implementation due to the different ABI, number of registers and addressing modes. We had many confusions on this part, and will discuss it in details in the final section. 3. opcodes Several opcodes are partially supported, including INIT_FCALL, DO_UCALL, DO_ICALL, RETURN, ADD, PRE_INC, JMP, QM_ASSIGN, etc. Hence, simple use scenarios such as user function call, loops, addition with integer and floating point numbers can be supported. 18 micro test cases are added under 'ext/opcache/tests/jit/arm64/'. Note that majority of these test cases are design for functional JIT, and cases 'hot_func_*.phpt' and 'loop_002.phpt' can trigger tracing JIT. 4. test Our local test environment is an ARM-based server with Ubuntu 20.04 and GCC-10. Note that both HYBRID and CALL VM modes are supported. We suggest running the JIT test cases using the following command. Out of all 130 test cases, 66 cases can be passed currently. ``` $ make test TESTS='-d opcache.jit=1203 ext/opcache/tests/jit/' ``` DETAILS 1. I-cache flush Instruction cache must be flushed for the JIT-ed code on AArch64. See macro JIT_CACHE_FLUSH in file 'zend_jit_internal.h'. 2. Disassembler Add initialization and jump target parse operations for AArch64 backed. See the updates in file 'zend_jit_disasm.c'. 3. redzone Enable redzone for AArch64. See the update in zend_vm_opcodes.h. Redzone is designated to prevent 'vm_stack_data' from being optimized out by compilers. It's worth noting that this 16-byte redzone might be reused as temporary use(treated as extra stack space) for HYBRID mode. 4. stack space reservation The definitions of HYBRID_SPAD, SPAD and NR_SPAD are a bit tricky for x86/64. In AArch64, HYBRID_SPAD and SPAD are both defined as 16. These 16 bytes are pre-allocated for tempoerary usage along the exuection of JIT-ed code. Take line 4185 in file zend_jit_arm64.dasc as an example. NR_SPAD is defined as 48, out of which 32 bytes to save FP/IP/LR registers. Note that we choose to always reserve HYBRID_SPAD bytes in HYBRID mode, no matter whether redzone is used or not, for the sake of safety. 5. stack alignment In AArch64 the stack pointer should be 16-byte aligned. Since shadow stack is used for JIT, it's easy to guarantee the stack alignment, via simply moving SP with an offset like 16 or a multiple of 16. That's why NR_SPAD is defined as 48 and we use 32 of them to save FP/IP/LR registers which only occupies 24 bytes. 6. global registers x27 and x28 are reserved as global registers. See the updates in file zend_jit_vm_helpers.c 7. function prologue for CALL mode Two callee-saved registers x27 and x28 should saved in function zend_jit_prologue() in file zend_jit_arm64.dasc. Besides the LR, i.e. x30, should also be saved since runtime C helper functions(such as zend_jit_find_func_helper) might be invoked along the execution of JIT-ed code. 8. regset Minor changes are done to regset operations particularly for AArch64. See the updates in file zend_jit_internal.h. REGISTER USAGE In this section, we will first talk about our understanding on register usage and then demonstrate our design. 1. Register usage for HYBRID/CALL modes Registers are used similarly between HYBRID mode and CALL mode. One difference is how FP and IP are saved. In HYBRID mode, they are assigned to global registers, while in CALL mode they are saved/restored on the VM stack explicitly in prologue/epilogue. The other difference is that LR register should also be saved/restored in CALL mode since JIT-ed code are invoked as normal functions. 2. Register usage for functional/tracing JIT The way registers are used differs a lot between functional JIT and tracing JIT. For functional JIT, runtime C code (e.g. helper functions) would be invoked along the execution of JIT-ed code. As the operands for *most* opcodes are accessed via the stack slot, i.e. FP + offset. Hence there is no need to save/restore local(caller-saved) registers before/after invoking runtime C code. Exception lies in Phi node and registers might be allocated for these nodes. Currently I don't fully understand the reason, why registers are allocated for Phi functions, because I suppose for different versions of SSA variables at the Phi function, their postions on the stack slot should be identical(in other words, access via the stack slot is enough and there is no need to allocate registers). For tracing JIT, runtime information are recorded for traces(before the JIT compilation), and the data types and control flows are concrete as well. Hence it's would be faster to conduct operations and computations via registers rather than stack slots(as functional JIT does) for these collected hot paths. Besides, runtime C code can be invoked for tracing JIT, however this only happends for deoptimization and all registers are saved to stack in advance. 3. Candidates for register allocator 1) opcode candidates Function zend_jit_opline_supports_reg() determines the candidate opcodes which can use CPU registers. 2) register candidates Registers in set "ZEND_REGSET_FP + ZEND_REGSET_GP - ZEND_REGSET_FIXED - ZEND_REGSET_PRESERVED" are available for register allocator. Note that registers from ZEND_REGSET_FIXED are reserved for special purpose, such as the stack pointer, and they are excluded from register allocation process. Note that registers from ZEND_REGSET_PRESERVED are callee-saved based on the ABI and it's safe to not use them either. 4. Temporary registers Temporary registers are needed by some opcodes to save intermediate computation results. 1) Functions zend_jit_get_def_scratch_regset() and zend_jit_get_scratch_regset() return which registers might be clobbered by some opcodes. Hence register allocator would spill these scratch registers if necessary when encountering these opcodes. 2) Macro ZEND_REGSET_LOW_PRIORITY denotes a set of registers which would be allocated with low priority, and these registers can be used as temporary usage to avoid conflicts to its best. 5. Compared to the x86 implementation, in JIT/arm64 1) Called-saved FP registers are included into ZEND_REGSET_PRESERVED for AArch64. 2) We follow the logic of function zend_jit_opline_supports_reg(). 3) We reserve 4 GPRs and 2 FPRs out from register allocator and use them as temporary registers in particular. Note that these 6 registers are included in set ZEND_REGSET_FIXED. Since they are reserved, may-clobbered registers can be removed for most opcodes except for function calls. Besides, low-priority registers are defined as empty since all candidate registers are of the same priority. See the updates in function zend_jit_get_scratch_regset() and macro ZEND_REGSET_LOW_PRIORITY. 6. Why we reserve registers for temporary usage? 1) Addressing mode in AArch64 needs more temporary registers. The addressing mode is different from x86 and tempory registers might be *always* needed for most opcodes. For instance, an immediate must be first moved into one register before storing into memory in AArch64, whereas in x86 this immediate can be stored directly. 2) There are more registers in AArch64. Compared to the solution in JIT/x86(that is, temporary registers are reserved on demand, i.e. different registers for different opcodes under different conditions), our solution seems a coarse-granularity and brute-force solution, and the execution performance might be downgraded to some extent since the number of candidate registers used for allocation becomes less. We suppose the performance loss might be acceptable since there are more registers in AArch64. 3) Based on my understanding, scratch registers defined in x86 are excluded from candidates for register allocator with *low possibility*, and it can still allocate these registers. Special handling should be conducted, such as checking 'reg != ZREG_R0'. Hence, as we see it, it's simpler to reserve some temporary registers exclusively. See the updates in function zend_jit_math_long_long() for instance. TMP1 can be used directly without checking. Co-Developed-by: Nick Gasson <Nick.Gasson@arm.com>

1. one **hybrid** solution of register usage After the discussion with Dmitry, we may want to propose one hybrid solution of register usage. 1) Following the x86 implementation, we define REG0/1/2 to be the scratch registers. Clever tricks are utilized in x86 implementation for better register allocation. Note that we define REG0/1/2 as x8/9/10. One reason is that R0 and FCARG1 should be distinguished. 2) Temporary registers are also reserved(i.e. they are excluded from the candidates of register allocator), and they would be used due to the different addressing modes in AArch64. 2. update the 'make clean' target. 3. remove the unnecessary AArch64 related macros in zend_jit_internal.h. [ci skip] Change-Id: I627157b88b2344530d705751eb7f73a223ed83e5 CustomizedGitHooks: yes

Reference is involved in this test case, i.e. "$ref2 = & $ref1;". 1. Fix one bug in zend_do_fcall(). For each stack slot, the type information gets initialized during the call frame allocation phase. Opcode ZEND_ASSIGN_REF is associated to this statement. It's worth noting that PHP JIT doesn't apply to this opcode actually. That means the original handler(i.e. interpreter version) will be invoked at runtime. Note that this mode works for a number of opcodes, not only ZEND_ASSIGN_REF. In the execution of original handler, the runtime type information of $ref2 is accessed and this bug is triggered. 2. Support macros GET_Z_PTR and ZVAL_DEREF. 3. Cover new paths in function zend_jit_simple_assign() and macro ZVAL_COPY_CONST.

Following the previous patch, we continue to support failed JIT test cases involving reference. In assign_010.phpt, major changes are done to support the assignment "$a = $b" where "$b" is a reference. Honestly speaking, I didn't fully understand the syntax here but rather to translate the x86 implementation into AArch64. Besides, test case assign_011.phpt would pass as well with this patch.

Support the case where arguments might be reference. Besides, another two test cases, assign_019.phpt and assign_032.phpt, would pass as well with this patch.

This patch is trivial, supporting the comparion with constant values, i.e. "$i < 2" in this test case.

Support assginment with undefined variable, and a warning would be emitted. Besides, test case assign_023.phpt would pass as well with this patch.

Major changes are made to support statement "$a[0] = $unref", where opcode ASSIGN_DIM is involved. Besides, one bug in macro GC_DELREF is fixed. The reference count would be further checked after decreasing in macro ZVAL_PTR_DTOR, hence, instruction "subs" should be used to set the flags. After fixing this bug, external function zend_jit_array_free() is used as the dtor for the array "$a".

Major changes are: 1. Support opcode FETCH_DIM_W for "$arr[0][0] = $ref;" in the loop. See the updates in function zend_jit_fetch_dim(). 2. Spill the registers and store the values into memory. See the updates in function zend_jit_spill_store(). This is done for Phi function. 3. Invoke function zend_array_destory() as dtor for arrays. This is done by zend_jit_free_cv() when leaving the function foo().

For statement "$a = new stdClass;", opcode NEW is used and JIT would invoke the original handler at runtime. Our major changes are made to support statements "$a->a=1;" and "$a->b=2;" where opcode ASSIGN_OBJ are used.

This test case covers one new path in macro TRY_ADDREF, touching macro GC_ADDREF for the first time.

There are 6 user function calls in this test cases. The first 3 functions, i.e. foo(), foo1() and foo2(), can be supported already. In this patch, we mainly focus on foo3(). Note that based on my test, once foo3() gets supported, the remaining functions foo4() and foo5() can pass as well. Regarding function foo3(), we mainly focus on statement "$array = new ArrayObject();", and the following two opcodes are involved. 0009 V2 = NEW 0 string("ArrayObject") 0010 DO_FCALL Accordingly, functions zend_jit_handler(), zend_jit_cond_jmp() and zend_jit_do_fcall() are invoked to generate the machine code. See the handling process for case ZEND_NEW at file zend_jit.c. Hence, major changes in this patch are made to support this statement. Note that the updates at line 4840 in function zend_jit_do_fcall() are made to support the later internal function call, i.e. var_dump(). Note that another test "noval_001.phpt" would pass with this patch as well.

For function Foo(), the original handlers would be invoked for the first two statements. And the third statement "$a = 42", where ASSIGN opcode is involved, covers the cold code in function zend_jit_assign_to_variable(). For function $main(), statement "var_dump(Foo::$prop);" covers a new path in function zend_ jit_send_val() for SEND_VAL opcode. Besides, another 2 test cases, i.e. fetch_dim_r_003.phpt and fetch_dim_r_004.phpt, would pass as well with this patch.

This patch mainly supports the opcode FETCH_OBJ_R for statement "$a->result = "okey";".

1. For statement "echo $a->test()", opcode INIT_METHOD_CALL is involved. The updates in function zend_jit_init_method_call() and zend_jit_push_call_frame() are made to support it. 2. The updates in function zend_jit_leave_func() are made to support the RETURN opcode used in functions $closure and $test. 3. The updates in function zend_jit_assign_to_variable() are used to support statement "$x = $arr". 4. The updates in function zend_jit_fetch_dimension_address_inner() and zend_jit_simple_assign() are made to support statement "$x['a'] = $closure()", where opcode ASSIGN_DIM is involved.

1. For statement "$a->change($a = array("a" => range(1, 5)));", the following opcodes will be generated: 0002 ASSIGN CV0($a) V1 0003 INIT_METHOD_CALL 1 CV0($a) string("change") 0004 INIT_NS_FCALL_BY_NAME 2 string("A\range") 0005 SEND_VAL_EX int(1) 1 0006 SEND_VAL_EX int(5) 2 0007 V1 = DO_FCALL_BY_NAME The updates in function zend_jit_init_fcall(), zend_jit_send_val() and zend_jit_do_fcall() are made to support INIT_NS_FCALL_BY_NAME, SEND_VAL_EX and DO_FCALL_BY_NAME respectively. 2. For method $change(), opcode RECV is used to obtain the argument: 0000 #1.CV0($config) [rc1, rcn, array of [any, ref]] = RECV 1 Accordingly the updates in functions zend_jit_recv() and zend_jit_verify_arg_type() are made. 3. For statement "array_keys($config["a"])", the following opcodes will be generated: 0001 INIT_NS_FCALL_BY_NAME 1 string("A\array_keys") 0002 CHECK_FUNC_ARG 1 0003 #3.V1 [ref, rc1, rcn, any] = FETCH_DIM_FUNC_ARG #1.CV0($config) ... -> #2.CV0($config) [rc1, rcn, ... 0004 SEND_FUNC_ARG #3.V1 [ref, rc1, rcn, any] 1 0005 #4.V1 [ref, rc1, rcn, any] = DO_FCALL_BY_NAME CHECK_FUNC_ARG and SEND_FUNC_ARG are not supported before. See the updates in functions zend_jit_check_func_arg() and zend_jit_send_var(). Besides, a new path is covered in macro OBJ_RELEASE when leaving.

The opcodes for function $foo are: 0001 INIT_FCALL 1 96 string("var_dump") 0002 #2.T1 [null, long] = FETCH_DIM_R array(...) #1.CV0($n) [...] 0003 SEND_VAL #2.T1 [null, long] 1 0004 DO_ICALL 0005 RETURN null Opcode FETCH_DIM_R is not touched before, and the updates in function zend_jit_fetch_dim_read() are made to support it. As different types of arguments are used for $foo, several cases in function zend_jit_fetch_dimension_address_inner() are covered as well. Besides, opcode DO_ICALL can reach one site of cold code in function zend_jit_do_fcall().

Opcode FETCH_DIM_RW is not touched before and the udpates in function zend_jit_fetch_dim() and zend_jit_fetch_dimension_address_inner() are made to support it. Besides, one new path is covered in function zend_jit_return() when leaving.

Opcode ASSIGN_OBJ is generated for statement "$x->a = 1;" and one new path in function zend_jit_assign_obj() is covered. Note that function zend_jit_assign_to_variable_call() is invoked along this new path. Besides, helper function zend_objects_store_del() is used as the dtor for objects.

One new path is covered inside function zend_jit_fetch_obj() due to the use of FETCH_OBJ_R opcode. Note that function zend_jit_zval_copy_deref() is invoked along this new path. Updates in function zend_jit_free() are made to support FREE opcode. Stub function zend_jit_leave_function_stub() is touched for the first time.

Opcode ASSIGN_OBJ_OP is used for statement "$x->a += 2;". The updates in function zend_jit_assign_obj_op() are made to support this opcode.

…ode-generation (1204/1205)

Range checks are needed before encoding them into AArch64 instructions as immediates.

Instruction is misused. 'dword', i.e. 32 bits, are loaded from memory. Hence, 'ldr' should be used rather than 'ldrh'.

This test case is a big one. Major changes are: 1. statement "foo($obj->a)" One new path is covered in function zend_jit_fetch_obj() for the involved FETCH_OBJ_W opcode. See the update around label 5. Opcode SEND_REF is used. The updates in function zend_jit_send_ref() are made to support it. Note that macro FREE_OP is executed for the first time. Temproray registers are passed since they are used inside. As a result, its use sites are updated accordingly. 2. statement "$a = array()" in $foo2 One new path in function zend_jit_assign_to_variable() is covered. 3. statements involving variable $d in $bar One new path in function zend_jit_fetch_obj() is covered. See the updates around label 7. Note that in macro EMALLOC, condition ZEND_DEBUG can be covered by DEBUG build, i.e. "./configure --enable-debug".

Comparison between LONG and DOUBLE is (partially) supported in a similar way to comparison between two LONG values. See the updates in function zend_jit_cmp(). Key difference lies in handling NaN. 1. Instruction 'fcmp' is used to substitue 'ucomisd' in x86 implementation. Both of them raise invalid operation exception only when either source operand is an SNaN.[1][2] 2. Parity flag is used in x86 to check whether either operand is NaN.[3] I think this is QNaN case. As for AArch64, we use instruction 'bvs'.[4] It's worthing noting that condition codes have different meanings for floating-point comparions(e.g. 'fcmp')[4] compared to the general-purpose comparisons(e.g. 'cmp').[5] For instance, 'b.hs' after 'fcmp' can check not only the cases "greater than, equal to" but also the case "unordered"(that is NaN). We may simply treat it as a combination of 'jae' and 'jp' in x86. 3. Instruction 'SETcc' is used in x86 for the case of ">=" or ">". Note that flag "swap" is set in implementation, and it falls into cases ZEND_IS_SMALLER or ZEND_IS_SMALLER_OR_EQUAL. We can use 'cset' in AArch64. However, it's weird that the NaN check is missing in x86. I suppose it might be a bug. Take the case ">=" as an example. The two operands can be either DOUBLE + LONG or DOUBLE + DOUBLE. See the relevant code where flag "swap" is set(i.e. function zend_jit_cmp_double_long() and function zend_jit_cmp_double_double()). For the case "NaN >= 1.0", the expected result should be FALSE, however, JIT/x86 would produce TRUE due to the following "setae al". Unfortunately I haven't constructed one test case to trigger that. In our implementation, we choose to follow the case of "<" or "<=", and I believe our implementation is safe anyway.. 4. Temporary FP register is also needed and we reserve v16. See the updates in file zend_jit_arm64.h. 5. Macro SET_ZVAL_TYPE_INFO_FROM_REG is misused in function zend_jit_zval_copy_deref(). The second argument should be 32-bit long and we fix it. Note that simple test cases involving NaN are tested locally. I believe it would get deeper testing by cmp_003.phpt(we will support it later). [1] https://developer.arm.com/documentation/dui0204/f/vector-floating-point-programming/vfp-instructions/fcmp?lang=en [2] https://www.felixcloutier.com/x86/ucomisd [3] https://en.wikipedia.org/wiki/Parity_flag [4] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp [5] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes

'smart_branch_opcode' JMPZ is used in this test case. Similar to the previous patch, I still didn't get why NaN check is missing for the cases ">" and ">=". In our implementation, we add such checks.

The following opcodes would be generated for $foo: 0000 #2.CV0($test) [bool] RANGE[0..1] = RECV 1 0001 #3.CV1($x) [long] RANGE[MIN..MAX] = RECV 2 0002 JMPZ #2.CV0($test) [bool] RANGE[0..1] BB4 0003 #4.T2 [bool] ... = IS_SMALLER_OR_EQUAL int(1) #3.CV1($x) ... 0004 JMP BB5 ... The updates in function zend_jit_verify_arg_type() are made to support RECV opcode. The updates in function zend_jit_bool_jmpznz() are made to support JMPZ opcode. New path is covered in functions zend_jit_cmp() and zend_jit_cmp_long_long() for IS_SMALLER_OR_EQUAL opcode.

This test case is a big one. This patch mainly handles smart_branch_opcode cases in function zend_jit_cmp_double_common(). Note that I failed to construct test cases to verify whether the missing NaN check in x86 is buggy or not. One TODO is left to remind us when the relevant code is touched.

Stack frame description is not accurate, so backtraces that involved JIT-ed code may be brocken. Disassemble and breakpoints on JIT-ed code work fine.

…ug=0x001) and with (opcache.jit_debug=0x401) for both ARM and x86.

…P2 and TMP3.

… be used by deoptimizer.

…m long jumps.

This makes symfony_demo app working with -d opcacge.jit=1205. (21 MB of JIT-ed code).

…case. This fixes ext/opcache/tests/jit/inc_021.phpt with tracing JIT.

build with -d opcache.jit=1254 -d opcache.jit_hot_loop=1 -d opcache.jit_hot_func=1 -d opcache.jit_hot_return=1 -d opcache.jit_hot_side_exit=1)

nikic

I think this looks okay at a high level -- I haven't looked at the arm64.dasc implementation in detail.

A general concern I have is that people working on x86 will not be able to easily make the ARM JIT work with their changes. Probably, it would fall to someone else to update it. Though it may help to include some instructions on how to setup cross-compilation and testing for ARM somewhere, as you already figured out how to do that...

ext/opcache/jit/zend_jit_internal.h

ext/opcache/jit/zend_jit.c

ext/opcache/jit/zend_jit_x86.dasc

ext/opcache/jit/zend_jit_gdb.c

ext/opcache/jit/zend_jit_arm64.dasc

nikic · 2021-05-18T08:35:04Z

ext/opcache/jit/zend_jit_arm64.dasc

+|.macro LOAD_ADDR, reg, addr
+|	// 48-bit virtual address
+||	if (((uintptr_t)(addr)) == 0) {


Suggested change

|.macro LOAD_ADDR, reg, addr

| // 48-bit virtual address

|| if (((uintptr_t)(addr)) == 0) {

|.macro LOAD_ADDR, reg, ptr

| // 48-bit virtual address

| uintptr_t addr = (uintptr_t) ptr;

|| if (addr == 0) {

etc. As this is repeated so many times, I think this would be nicer.

I think it would also be good to assert here that the top 16 bit are zero, as the code assumes this. (It looks like newer ARM versions have extended address space from 2^48 to 2^52.)

@shqking what do you suggest?

Thanks for spotting that. Yes. I agree that one assertion should be added here. I will update it soon.
52-bit virtual address is supported in newer ARMv8.2-LVA (See https://opensource.com/article/20/12/52-bit-arm64-kernel). However to the best of my knowledge, there are few such chips in the market.
Java JDK currently only supports 48-bit virtual address, and I suppose putting an assertion might be one good option for now and we may want to support 52-bit address when needed in the future.

But I don't think it would be a good solution to introduce a new variable inside the macro, i.e. uintptr_t addr = (uintptr_t) ptr;, which would lead to redefinition of addr for consecutive uses of macro LOAD_ADDR.

From my view, it's a problem to conduct the type conversion inside the definition of macro, or at all the callers of this macro. @dstogov May I have your opinion here?

@shqking I afraid, assertion may conflict with "address tagging technology" (the high byte of the address may be ignored).

Introducing new variable would require wrapping macro into do { ... } while (0) block, or even C function. I prefer to think about this later.

Yes. Agree. The higher bits might be used by some security features, e.g., MTE, PAC.

nikic · 2021-05-18T08:37:25Z

ext/opcache/jit/zend_jit_arm64.dasc

+|.endmacro
+
+|.macro LOAD_64BIT_VAL, reg, val
+||	if (((uint64_t)(val)) == 0) {


Same here, I would add uint64_t uval = (uint64_t) val to reduce noise.

@nikic What noise do you mean? I don't see any warnings.
I agree with your suggestion, but I would prefer to change this after merging into master.

((( Lisp noise :) But yes, no particular need to address before merging.

ext/opcache/jit/zend_jit_arm64.dasc

dstogov · 2021-05-18T10:47:48Z

I think this looks okay at a high level -- I haven't looked at the arm64.dasc implementation in detail.

A general concern I have is that people working on x86 will not be able to easily make the ARM JIT work with their changes. Probably, it would fall to someone else to update it. Though it may help to include some instructions on how to setup cross-compilation and testing for ARM somewhere, as you already figured out how to do that...

I'll publish an instruction at wiki.php.net

dstogov · 2021-05-18T12:34:19Z

Rebased and merged into master.

dstogov · 2021-05-18T14:37:39Z

@nikic https://wiki.php.net/internals/aarch64-cross-compile

nikic · 2021-05-18T17:36:55Z

It looks like there are some compiler warnings related to format strings: https://travis-ci.com/github/php/php-src/jobs/506179707#L1953

dstogov · 2021-05-18T18:15:33Z

Thanks for pointing. This is because we use libcapstone for disassemble on AArch64 (it also provides better output for x86). It seems, I forgot to test AArch64 without capstone and libudis86 doesn't support ARM of course. I'll try to fix this in an hour.

dstogov · 2021-05-19T10:28:50Z

@nikic ARM64 build without libcapstone should be fixed. I'm not sure how to trigger testing on travis-ci.com

sebpop · 2021-05-19T16:57:15Z

I'm not sure how to trigger testing on travis-ci.com

My colleague @janaknat can help set up travis-ci.com testing on Graviton2 arm64.

janaknat · 2021-05-19T22:25:54Z

I've created a PR to test with Graviton2 on Travis CI: #7016

shqking and others added 30 commits May 14, 2021 11:04

Support failed JIT test case: assign_012.phpt

653de5d

Support the case where arguments might be reference. Besides, another two test cases, assign_019.phpt and assign_032.phpt, would pass as well with this patch.

Support failed JIT test case: assign_027.phpt

d7cfa32

This patch is trivial, supporting the comparion with constant values, i.e. "$i < 2" in this test case.

Support failed JIT test case: assign_024.phpt

ba9bc3a

Support assginment with undefined variable, and a warning would be emitted. Besides, test case assign_023.phpt would pass as well with this patch.

Support failed JIT test case: assign_026.phpt

d8d7932

For statement "$a = new stdClass;", opcode NEW is used and JIT would invoke the original handler at runtime. Our major changes are made to support statements "$a->a=1;" and "$a->b=2;" where opcode ASSIGN_OBJ are used.

Support failed JIT test case: assign_dim_op_001.phpt

de21c29

This test case covers one new path in macro TRY_ADDREF, touching macro GC_ADDREF for the first time.

Support failed JIT test case: assign_036.phpt

5ac51a5

This patch mainly supports the opcode FETCH_OBJ_R for statement "$a->result = "okey";".

Support failed JIT test case: fetch_obj_003.phpt

501a114

Opcode ASSIGN_OBJ_OP is used for statement "$x->a += 2;". The updates in function zend_jit_assign_obj_op() are made to support this opcode.

Replace --SKIPIF-- by --EXTENSIONS--

ecd7b53

JIT/AArch64: INIT_FCALL and DO_FCALL support for optimized function c…

9ac405d

…ode-generation (1204/1205)

Add necessary assertions on range for INIT_FCALL and DO_FCALL

a7fba63

Range checks are needed before encoding them into AArch64 instructions as immediates.

Fix one bug in macro IF_GC_MAY_NOT_LEAK

de06cba

Instruction is misused. 'dword', i.e. 32 bits, are loaded from memory. Hence, 'ldr' should be used rather than 'ldrh'.

Support failed JIT test case: cmp_002.phpt

10ee8b0

'smart_branch_opcode' JMPZ is used in this test case. Similar to the previous patch, I still didn't get why NaN check is missing for the cases ">" and ">=". In our implementation, we add such checks.

dstogov added 13 commits May 14, 2021 11:04

Fixed incorrect range check (missed sign bit)

287d996

Reenable PROFITABILITY_CHECKS

272edad

Implemented AArch64 support for GDB/JIT interface.

dbd0bfc

Stack frame description is not accurate, so backtraces that involved JIT-ed code may be brocken. Disassemble and breakpoints on JIT-ed code work fine.

Fixed JIT memory usage debug info (opcache.jit_debug=0x200)

8ab1f6b

Allow to print JIT assemble without binary addresses (opcache.jit_deb…

f6df3b0

…ug=0x001) and with (opcache.jit_debug=0x401) for both ARM and x86.

Remove unused TMP4. Use intra-precedure call scratch registers for TM…

b61697e

…P2 and TMP3.

Use better code for trace exits. Don't store CPU registers that can't…

416b3bb

… be used by deoptimizer.

Use symbolic labels for BL and ADR instructions

60436b3

Fixed ZTS build.

dc89a2d

DynASM/ARM64: Add abiulity to plug-in generation of veneers to perfor…

9aa463c

…m long jumps.

Generate veneers to perform long jumps.

303ec3c

This makes symfony_demo app working with -d opcacge.jit=1205. (21 MB of JIT-ed code).

JIT/AArch64: disable register allocation for expected ++/-- overflow …

dcbf25f

…case. This fixes ext/opcache/tests/jit/inc_021.phpt with tracing JIT.

Fixed format specifier

cd443ec

dstogov requested a review from nikic May 17, 2021 07:40

dstogov added 3 commits May 17, 2021 10:52

Fix "store to misaligned address" ASAN warnings

835280b

Fixed JIT failure after overflow detection (bench.php failure in 32-bit

0ebf7cc

build with -d opcache.jit=1254 -d opcache.jit_hot_loop=1 -d opcache.jit_hot_func=1 -d opcache.jit_hot_return=1 -d opcache.jit_hot_side_exit=1)

Fixed possible failure when repair after overflow detection

8f6bea4

nikic reviewed May 18, 2021

View reviewed changes

dstogov added 2 commits May 18, 2021 12:41

Remove unnecessary #ifdef

e1e5aed

Fixed zend_long_is_power_of_two/zend_long_floor_log2 mess

7210c31

dstogov closed this May 18, 2021

shqking mentioned this pull request Jul 28, 2021

JIT is not enabled in ARM image KEINOS/Dockerfile_of_PHP8-JIT#24

Open

Jit arm64 #6982

Jit arm64 #6982

Uh oh!

Conversation

dstogov commented May 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikic May 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dstogov commented May 18, 2021

Uh oh!

dstogov commented May 18, 2021

Uh oh!

dstogov commented May 18, 2021

Uh oh!

nikic commented May 18, 2021

Uh oh!

dstogov commented May 18, 2021

Uh oh!

dstogov commented May 19, 2021

Uh oh!

sebpop commented May 19, 2021

Uh oh!

janaknat commented May 19, 2021

Uh oh!

Uh oh!

dstogov commented May 14, 2021 •

edited

Loading

nikic May 18, 2021 •

edited

Loading