Skip to content

Fix GH-18136: tracing JIT floating point register clobbering on Windows and ARM64 #18352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions Zend/asm/save_xmm_x86_64_ms_masm.asm
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
.code

; ZEND_API void execute_ex(zend_execute_data *ex)
PUBLIC execute_ex

EXTERN execute_ex_real:PROC

; Assembly wrapper around the real execute_ex function, so that we can
; save the preserved registers when re-entering the VM from JIT code.
; See GH-18136.
execute_ex PROC EXPORT FRAME
; 10 floating points numbers
; 32 bytes shadow space
; 8 bytes to align after the return address
sub rsp, 8*10 + 32 + 8
.allocstack 8*10 + 32 + 8
.endprolog
movsd qword ptr [rsp + 32 + 8*0], xmm6
movsd qword ptr [rsp + 32 + 8*1], xmm7
movsd qword ptr [rsp + 32 + 8*2], xmm8
movsd qword ptr [rsp + 32 + 8*3], xmm9
movsd qword ptr [rsp + 32 + 8*4], xmm10
movsd qword ptr [rsp + 32 + 8*5], xmm11
movsd qword ptr [rsp + 32 + 8*6], xmm12
movsd qword ptr [rsp + 32 + 8*7], xmm13
movsd qword ptr [rsp + 32 + 8*8], xmm14
movsd qword ptr [rsp + 32 + 8*9], xmm15
call execute_ex_real
movsd xmm6, qword ptr [rsp + 32 + 8*0]
movsd xmm7, qword ptr [rsp + 32 + 8*1]
movsd xmm8, qword ptr [rsp + 32 + 8*2]
movsd xmm9, qword ptr [rsp + 32 + 8*3]
movsd xmm10, qword ptr [rsp + 32 + 8*4]
movsd xmm11, qword ptr [rsp + 32 + 8*5]
movsd xmm12, qword ptr [rsp + 32 + 8*6]
movsd xmm13, qword ptr [rsp + 32 + 8*7]
movsd xmm14, qword ptr [rsp + 32 + 8*8]
movsd xmm15, qword ptr [rsp + 32 + 8*9]
add rsp, 8*10 + 32 + 8
ret
execute_ex ENDP

END
9 changes: 9 additions & 0 deletions Zend/zend_vm_execute.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions Zend/zend_vm_execute.skl
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,19 @@
# pragma GCC optimize("no-gcse")
# pragma GCC optimize("no-ivopts")
#endif
#ifdef _WIN64
/* See save_xmm_x86_64_ms_masm.asm */
void {%EXECUTOR_NAME%}_ex_real(zend_execute_data *ex)
#else
ZEND_API void {%EXECUTOR_NAME%}_ex(zend_execute_data *ex)
#endif
{
DCL_OPLINE

#if defined(__GNUC__) && defined(__aarch64__)
__asm__ __volatile__ (""::: "v8","v9","v10","v11","v12","v13","v14","v15");
#endif

Comment on lines +17 to +20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this for HYBRID VM?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do. The reason we don't save registers in HYBRID VM JIT'ed code is that execute_ex does it in HYBRID_JIT_GUARD() before calling handlers, but this doesn't save these registers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If I remember correctly, the test also originally failed on circleci which is Linux+arm64+gcc so should use hybrid VM.

{%HELPER_VARS%}

{%INTERNAL_LABELS%}
Expand Down
35 changes: 35 additions & 0 deletions ext/opcache/tests/jit/gh18136.phpt
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
--TEST--
GH-18136 (tracing JIT floating point register clobbering on Windows and ARM64)
--EXTENSIONS--
opcache
--INI--
opcache.jit=tracing
opcache.jit_buffer_size=64M
opcache.jit_hot_func=4
opcache.jit_hot_loop=4
--FILE--
<?php
namespace Foo;

function diff($point1, $point2)
{
$a = deg2rad($point1); // Prefixing these with \ also makes the issue go away
$b = deg2rad($point2);
return $a - $b;
}

function getRawDistance()
{
$distance = 0;
for ($p = 0; $p < 200; $p++) {
// Needs to be a namespaced call_user_func call to reproduce the issue (i.e. adding \ at front makes the issue go away)
$distance += call_user_func('Foo\diff', 0, $p);
}

return $distance;
}

var_dump(getRawDistance());
?>
--EXPECT--
float(-347.3205211468715)
11 changes: 10 additions & 1 deletion win32/build/config.w32
Original file line number Diff line number Diff line change
Expand Up @@ -267,14 +267,23 @@ if (TARGET_ARCH == 'arm64') {
DEFINE('FIBER_ASM_FLAGS', '/DBOOST_CONTEXT_EXPORT=EXPORT /nologo /c /Fo');
}

ADD_FLAG('ASM_OBJS', '$(BUILD_DIR)\\Zend\\jump_' + FIBER_ASM_ABI + '.obj $(BUILD_DIR)\\Zend\\make_' + FIBER_ASM_ABI + '.obj');
var all_asm_objs = '$(BUILD_DIR)\\Zend\\jump_' + FIBER_ASM_ABI + '.obj $(BUILD_DIR)\\Zend\\make_' + FIBER_ASM_ABI + '.obj';
if (TARGET_ARCH == 'x64') {
all_asm_objs += ' $(BUILD_DIR)\\Zend\\save_xmm_x86_64_ms_masm.obj';
}
ADD_FLAG('ASM_OBJS', all_asm_objs);

MFO.WriteLine('$(BUILD_DIR)\\Zend\\jump_' + FIBER_ASM_ABI + '.obj: Zend\\asm\\jump_' + FIBER_ASM_ABI + '.asm');
MFO.WriteLine('\t$(PHP_ASSEMBLER) $(FIBER_ASM_FLAGS) $(BUILD_DIR)\\Zend\\jump_$(FIBER_ASM_ABI).obj Zend\\asm\\jump_$(FIBER_ASM_ABI).asm');

MFO.WriteLine('$(BUILD_DIR)\\Zend\\make_' + FIBER_ASM_ABI + '.obj: Zend\\asm\\make_' + FIBER_ASM_ABI + '.asm');
MFO.WriteLine('\t$(PHP_ASSEMBLER) $(FIBER_ASM_FLAGS) $(BUILD_DIR)\\Zend\\make_$(FIBER_ASM_ABI).obj Zend\\asm\\make_$(FIBER_ASM_ABI).asm');

if (TARGET_ARCH == 'x64') {
MFO.WriteLine('$(BUILD_DIR)\\Zend\\save_xmm_x86_64_ms_masm.obj: Zend\\asm\\save_xmm_x86_64_ms_masm.asm');
MFO.WriteLine('\t$(PHP_ASSEMBLER) $(FIBER_ASM_FLAGS) $(BUILD_DIR)\\Zend\\save_xmm_x86_64_ms_masm.obj Zend\\asm\\save_xmm_x86_64_ms_masm.asm');
}

ADD_FLAG("CFLAGS_BD_ZEND", "/D ZEND_ENABLE_STATIC_TSRMLS_CACHE=1");
if (VS_TOOLSET && VCVERS >= 1914) {
ADD_FLAG("CFLAGS_BD_ZEND", "/d2FuncCache1");
Expand Down
Loading