JIT tracing - thundering herd causes lockup after opcache reset #11609

Closed
@maaarghk

Description

We've spotted some bad behaviour around the locking of the shared opcache memory on an 8.2 site (php-fpm). Unfortunately we can't reproduce it locally, only in production, so it's going to be difficult to get much more info than this - we've just left opcache disabled for now to avoid downtime in prod. I realise there has already been a big flamewar about whether or not the JIT code is actually going to see any fixes, and maybe we should just disable it, but I don't see any reason not to report a known issue in the meantime (it may just be that a PR needs to be reverted).

After calling opcache_reset after a deploy, we occasionally end up in a situation where all php-fpm worker processes get stuck in a loop of locking and unlocking the SHM. The strace output looks like this:

20:53:12 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.008294>
20:53:12 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.000064>
20:53:12 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.681358>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.000054>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.050460>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.000421>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.012670>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.000067>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.037775>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.000056>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.004498>
20:53:13 fcntl(3</tmp/.ZendSem.Le8Hvi (deleted)>, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0 <0.000085>
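
For what it's worth, the syscall pattern above is easy to model outside of PHP. Here is a standalone sketch (nothing to do with the opcache code itself, just N processes fighting over one fcntl write lock) that produces the same F_SETLKW/F_SETLK convoy under strace -f once the worker count gets high:

/* herd.c - standalone model of the fcntl() convoy seen in the strace above.
 * Not PHP code; just N processes fighting over one write lock. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS    100
#define ITERATIONS 1000

static void lock_region(int fd, short type, int cmd)
{
	struct flock fl = { .l_type = type, .l_whence = SEEK_SET, .l_start = 0, .l_len = 1 };
	if (fcntl(fd, cmd, &fl) == -1) {
		perror("fcntl");
		exit(1);
	}
}

int main(void)
{
	char tmpl[] = "/tmp/.herd.XXXXXX";
	int fd = mkstemp(tmpl);
	if (fd == -1) { perror("mkstemp"); return 1; }
	unlink(tmpl); /* mirrors the "(deleted)" ZendSem file in the strace */

	for (int w = 0; w < WORKERS; w++) {
		if (fork() == 0) {
			for (int i = 0; i < ITERATIONS; i++) {
				lock_region(fd, F_WRLCK, F_SETLKW); /* blocking acquire */
				/* critical section would go here */
				lock_region(fd, F_UNLCK, F_SETLK);  /* release */
			}
			_exit(0);
		}
	}
	while (wait(NULL) > 0) ;
	return 0;
}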

After this occurred, I asked our hosting company to SIGABRT all the php-fpm worker processes to get core dumps. These aren't debug builds, but we do have symbols, and I've been able to draw some conclusions.

99% of the processes have a backtrace like this:

(gdb) bt
#0  0x00007f0f7a90a974 in fcntl64 () from libc.so.6
#1  0x00007f0f79451129 in zend_shared_alloc_lock () from opcache.so
#2  0x00007f0f79458a98 in zend_jit_stop_persistent_op_array () from opcache.so
#3  0x00007f0f794df510 in zend_jit_trace_hot_root.part () from opcache.so
#4  0x000055fb59200647 in ?? () < note - seems to be inside jit memory map
#5  0x00007f0f6f117b30 in ?? ()
#6  0x000055fb4d269898 in ?? () < note - seems to be inside opcache memory map

and one process has a backtrace like this:

#0  0x00007f0f7a90f510 in ?? () from libc.so.6
#1  0x00007f0f7a90a944 in fcntl64 () from libc.so.6
#2  0x00007f0f7945119e in zend_shared_alloc_unlock () from opcache.so
#3  0x00007f0f794df5f0 in zend_jit_trace_hot_root.part () from opcache.so
#4  0x000055fb59200647 in ?? ()
#5  0x0000000000000001 in ?? ()
#6  0x000055fb53bed058 in ?? ()

I've taken a look at the code involved and this does seem slightly wild:

within zend_jit_trace_hot_root:

	if (ZEND_JIT_TRACE_NUM >= JIT_G(max_root_traces)) {
		stop = ZEND_JIT_TRACE_STOP_TOO_MANY_TRACES;
		zend_jit_stop_counter_handlers();
		goto abort;
	}

zend_jit_stop_counter_handlers is a for loop in a for loop:

	for (uint32_t i = 0; i < ZCSG(hash).max_num_entries; i++) {
		zend_accel_hash_entry *cache_entry;
		for (cache_entry = ZCSG(hash).hash_table[i]; cache_entry; cache_entry = cache_entry->next) {
			zend_persistent_script *script;
			if (cache_entry->indirect) continue;
			script = (zend_persistent_script *)cache_entry->data;
			zend_jit_stop_persistent_script(script);
		}
	}

I'm certainly no expert (i.e. I've got no idea whatsoever) regarding the contents of ZCSG(hash), but if the outer loop runs over max_num_entries AND cache_entry->next is used to traverse a list, could this walk end up super-linear, with a cache_entry being visited more than once if it is also pointed to by the next of some earlier cache_entry?
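
For reference - and this is a generic sketch of a chained hash table, not the verified layout of ZCSG(hash) - the usual convention is that hash_table[i] holds the head of bucket i and ->next only chains entries within that bucket, in which case each entry is visited exactly once and the walk is linear in the total number of entries:

/* Generic chained-hash walk (my own illustration, not the actual ZCSG layout). */
#include <stdio.h>

struct entry {
	const char *key;
	void *data;
	struct entry *next; /* next entry in the same bucket */
};

struct hash {
	unsigned max_num_entries;  /* number of buckets */
	struct entry **hash_table; /* bucket heads */
};

static void walk(struct hash *h, void (*cb)(void *data))
{
	for (unsigned i = 0; i < h->max_num_entries; i++) {
		for (struct entry *e = h->hash_table[i]; e; e = e->next) {
			cb(e->data);
		}
	}
}

static void print_key(void *data) { printf("%s\n", (const char *)data); }

int main(void)
{
	struct entry b = { "b", (void *)"b", NULL };
	struct entry a = { "a", (void *)"a", &b };   /* a and b share bucket 0 */
	struct entry *buckets[2] = { &a, NULL };
	struct hash h = { 2, buckets };
	walk(&h, print_key); /* prints a then b, each exactly once */
	return 0;
}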

Anyway, the main issue: within this loop, zend_jit_stop_persistent_script is called -

static void zend_jit_stop_persistent_script(zend_persistent_script *script)
{
	zend_class_entry *ce;
	zend_op_array *op_array;

	zend_jit_stop_persistent_op_array(&script->script.main_op_array);

	ZEND_HASH_FOREACH_PTR(&script->script.function_table, op_array) {
		zend_jit_stop_persistent_op_array(op_array);
	} ZEND_HASH_FOREACH_END();

	ZEND_HASH_FOREACH_PTR(&script->script.class_table, ce) {
		ZEND_HASH_FOREACH_PTR(&ce->function_table, op_array) {
			if (op_array->type == ZEND_USER_FUNCTION) {
				zend_jit_stop_persistent_op_array(op_array);
			}
		} ZEND_HASH_FOREACH_END();
	} ZEND_HASH_FOREACH_END();
}

So here we have the main op array being stopped, followed by every op array in the function table, followed by every user-function op array in each class's method table. zend_jit_stop_persistent_op_array more or less wraps zend_jit_stop_hot_trace_counters, which locks the SHM, switches the handler pointers back from the JIT to the VM, and then unlocks the SHM.
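
To make the locking pattern concrete, here is a toy model of what that per-op-array work appears to boil down to (my own stand-in types and names, not the actual PHP source): take the SHM lock, flip every opline handler back to the VM handler, release the lock.

/* Toy model of the per-op-array work described above. The types and the lock
 * stubs are my own stand-ins, not PHP internals; the point is only that the
 * shared-memory lock is taken and released once per op array, from inside the
 * nested loops shown earlier. */
#include <stdint.h>
#include <stdio.h>

typedef void (*opcode_handler_t)(void);

typedef struct {
	opcode_handler_t handler;      /* currently installed handler (JIT counting stub) */
	opcode_handler_t orig_handler; /* original VM handler to restore */
} toy_opline;

typedef struct {
	uint32_t last;        /* number of oplines */
	toy_opline *opcodes;
} toy_op_array;

/* stand-ins for zend_shared_alloc_lock()/unlock(), which issue the fcntl() calls */
static void toy_shared_alloc_lock(void)   { puts("F_SETLKW (blocking write lock)"); }
static void toy_shared_alloc_unlock(void) { puts("F_SETLK  (unlock)"); }

static void toy_stop_hot_trace_counters(toy_op_array *op_array)
{
	toy_shared_alloc_lock();
	for (uint32_t i = 0; i < op_array->last; i++) {
		/* flip the handler back from the JIT counting stub to the VM handler */
		op_array->opcodes[i].handler = op_array->opcodes[i].orig_handler;
	}
	toy_shared_alloc_unlock();
}

static void vm_handler(void) {}
static void jit_counter_stub(void) {}

int main(void)
{
	toy_opline oplines[3] = {
		{ jit_counter_stub, vm_handler },
		{ jit_counter_stub, vm_handler },
		{ jit_counter_stub, vm_handler },
	};
	toy_op_array op_array = { 3, oplines };
	toy_stop_hot_trace_counters(&op_array); /* one lock/unlock pair per op array */
	return 0;
}

Multiply that lock/unlock pair by every op array in every cached script, and then by every worker doing the same walk, and you get the convoy from the strace.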

On our box, with 100 or so worker processes each having entered this up-to-quadruply-nested loop, and the innermost level of each one locking and unlocking the SHM once per op array - potentially over files with many functions and methods - the end result is absolute chaos. The walk quickly ends up taking minutes to complete as the number of queued F_SETLKW calls reaches the thousands. In the short strace snippet above you can already see a 0.68s wait, for example. Imagine that per function.
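
A rough back-of-the-envelope calculation shows why this stretches into minutes. The worker count is real; the op-array count is a guess about our codebase; the average wait is an assumption bracketed by what the strace shows:

/* Back-of-the-envelope for the lock convoy. Inputs are assumptions about our
 * workload, except the wait times, which bracket the strace output above. */
#include <stdio.h>

int main(void)
{
	int workers             = 100;   /* php-fpm workers all doing the same walk */
	int op_arrays_per_walk  = 2000;  /* scripts * (main + functions + methods), a guess */
	double avg_wait_seconds = 0.05;  /* strace shows 0.004s .. 0.68s per F_SETLKW */

	/* each worker takes the lock once per op array, and every acquisition has
	 * to wait behind the other workers doing the same thing */
	long total_acquisitions = (long)workers * op_arrays_per_walk;
	double per_worker_walk  = op_arrays_per_walk * avg_wait_seconds;

	printf("total lock acquisitions: %ld\n", total_acquisitions);
	printf("time for one worker to finish its walk: ~%.0f seconds\n", per_worker_walk);
	return 0;
}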

Notably, we only seemed to trigger this when we flushed opcache, so possibly we're seeing some kind of thundering herd effect here. I can see the ZEND_JIT_TRACE_NUM >= JIT_G(max_root_traces) condition being checked; maybe there is a situation where several workers add traces to the JIT table at around the same time, fill it up, and then each of them tries to reverse out the work on its own? I'm really not familiar with the JIT even as a concept, so I'm just guessing, based on the fact that we were at max worker processes and under load whenever we saw this issue. A toy model of the race I have in mind is sketched below.
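
This model is entirely hypothetical - I haven't verified that this is how the real code behaves - but it shows the shape of what I'm guessing at: if every worker that trips the max_root_traces check runs the full stop-counters walk itself, they all pile onto the same lock at the same time instead of one of them doing the work once. (Threads here stand in for the php-fpm worker processes.)

/* Hypothetical model of the suspected race: every thread that sees the trace
 * table as full runs the whole "stop counters" walk itself, so the expensive
 * walk happens once per worker instead of once in total. Toy code only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define WORKERS         8
#define MAX_ROOT_TRACES 100

static atomic_int trace_num = MAX_ROOT_TRACES;  /* table already full after warm-up */
static pthread_mutex_t shm_lock = PTHREAD_MUTEX_INITIALIZER;

static void stop_all_counter_handlers(int worker)
{
	/* stands in for the nested walk over every cached script and op array,
	 * taking the SHM lock once per op array */
	for (int op_array = 0; op_array < 1000; op_array++) {
		pthread_mutex_lock(&shm_lock);
		pthread_mutex_unlock(&shm_lock);
	}
	printf("worker %d finished its own full walk\n", worker);
}

static void *hot_root(void *arg)
{
	int worker = (int)(long)arg;
	if (atomic_load(&trace_num) >= MAX_ROOT_TRACES) {
		/* every worker takes this branch - nothing in this toy records that
		 * another worker has already stopped the counters */
		stop_all_counter_handlers(worker);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[WORKERS];
	for (long i = 0; i < WORKERS; i++)
		pthread_create(&t[i], NULL, hot_root, (void *)i);
	for (int i = 0; i < WORKERS; i++)
		pthread_join(t[i], NULL);
	return 0;
}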

This nested foreach was introduced in #9343, and an alternative solution was proposed in #9964.

@dstogov - the first was accepted but the second was not, with the comment on the second that it might cause "too many locks". Maybe it's worth reconsidering both of the above together with this issue report to find a better solution, assuming that improvements and bug fixes are currently permitted on the JIT code.

PS: I don't believe this is a duplicate of any other issue that I can find; unlike many other reports, we do not see segfaults, just persistent lock waits.

PHP Version

8.2.5 - 7 / amd64 / linux / fpm sapi
