-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix signal leak in max execution timers #13027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fc6cc79
to
37d7061
Compare
9de65b1
to
cae3ece
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks mainly good to me, sorry for my multiple review comments instead of reviewing in one pass.
Just what I'm not sure about is whether the signal needs to be cleared before deleting the timer.
It might also warrant a comment in the code that you're doing the signal clearing because the signals stay in the pending state even if the timer is deleted.
I'll defer the final review to Arnaud though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why the signals are never delivered. Can you clarify?
Zend/zend_max_execution_timer.c
Outdated
@@ -41,6 +43,14 @@ ZEND_API void zend_max_execution_timer_init(void) /* {{{ */ | |||
sev.sigev_signo = SIGRTMIN; | |||
sev.sigev_notify_thread_id = (pid_t) syscall(SYS_gettid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_max_execution_timer_init
is called from pcntl_fork
. With this change, a new timer will no longer be created. However, it seems syscall(SYS_gettid)
will no longer match. This might be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a test for this case to ensure we are not breaking it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out how to write a phpt test for this case. I added some logging and a simple script to see what was going on though:
<?php
echo "about to fork\n";
$pid = pcntl_fork();
if(!$pid) {
echo sprintf("child process: %s \n", getmypid());
sleep(1);
} else {
echo sprintf("parent process: %s\n", getmypid());
sleep(2);
}
And it produces the following output:
$ php -dmax_execution_time=10 test.php
Timer 0 created on thread 54661
Setting timer 0 on thread 54661 (10 seconds)...
about to fork
Timer 0 created on thread 54662
parent process: 54661
child process: 54662
Setting timer 0 on thread 54662 (0 seconds)...
Deleting timer 0 on thread 54662...
Setting timer 0 on thread 54661 (0 seconds)...
Deleting timer 0 on thread 54661...
So it appears this might just be defensive programming, since it seems a fork creates its timers (and maybe doesn't inherit the parent's settings?).
I can also see it handles max_execution_time correctly as well:
<?php
echo "about to fork\n";
$pid = pcntl_fork();
if(!$pid) {
ini_set('max_execution_time', 1);
echo sprintf("child process: %s \n", getmypid());
sleep(3);
} else {
ini_set('max_execution_time', 4);
echo sprintf("parent process: %s\n", getmypid());
sleep(5);
}
which gives the following output:
$ php -dmax_execution_time=10 test.php
Timer 0 created on thread 54667
Setting timer 0 on thread 54667 (10 seconds)...
about to fork
Setting timer 0 on thread 54667 (0 seconds)...
Setting timer 0 on thread 54667 (4 seconds)...
parent process: 54667
Timer 0 created on thread 54668
Setting timer 0 on thread 54668 (0 seconds)...
Setting timer 0 on thread 54668 (1 seconds)...
child process: 54668
Setting timer 0 on thread 54668 (0 seconds)...
PHP Fatal error: Maximum execution time of 1 second exceeded in /usr/local/src/php/test.php on line 5
Fatal error: Maximum execution time of 1 second exceeded in /usr/local/src/php/test.php on line 5
Setting timer 0 on thread 54668 (0 seconds)...
Deleting timer 0 on thread 54668...
Setting timer 0 on thread 54667 (0 seconds)...
PHP Fatal error: Maximum execution time of 4 seconds exceeded in /usr/local/src/php/test.php on line 14
Fatal error: Maximum execution time of 4 seconds exceeded in /usr/local/src/php/test.php on line 14
Setting timer 0 on thread 54667 (0 seconds)...
Deleting timer 0 on thread 54667...
I'd love to write a test for this, but I'll have to dive into how phpt works to understand it. However, it looks like there may not be an issue here for pcntl_fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think switching to EG(pid) == pid
implicitly fixed this. 👍 This could be tested by forking, making the child timeout, and monitoring the chlid from the parent. But I'm not sure a test is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand the issue, we leak one timer per thread because we create one during ts_resource (one per thread) and an other one during php_request_startup (once per request), and the former is never deleted.
However there are two things I don't get: What is generating the signals ? I would assume it's the leaked timers, but they are never armed, so they should never generate signals. Also, as Ilija noted, it's not clear why the signals are not handled.
Zend/zend_max_execution_timer.c
Outdated
ign_sigrtmin.sa_flags = 0; | ||
sigemptyset(&ign_sigrtmin.sa_mask); | ||
|
||
if(sigaction(SIGRTMIN, &ign_sigrtmin, NULL) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this will reset the signal handler for the whole process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by that? If the shutdown function is called, the process should be done with whatever PHP script it was working on. If the process wishes to start handling another PHP script, it will call zend_max_execution_timer_init
again. Either way, the pending signals must be ignored to clear them, otherwise they will sit in SIGQ forever, even if another handler is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently zend_max_execution_timer_init()
and zend_max_execution_timer_shutdown()
are called in php_request_start()
and php_request_shutdown()
respectively, so they are indeed called once per request. The signal handler is setup in zend_set_timeout_ex()
, at least once per request.
However, when zend_max_execution_timer_shutdown()
is called, there may be other requests executing in other threads of the same process. Resetting the signal handler here will affect them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that now. I don't think it is an issue though because in zend_max_execution_timer_init()
we also reset the signal handler if shutdown has been previously called. Now that you've pointed this out, I think that was part of the reasoning for #12539, allowing timers to be reused, which would probably be out of scope here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to sigaction(sev.sigev_signo, NULL, &EG(oldact));
in zend_max_execution_timer_init()
is not resetting the handler, it's just copying the existing handler to &EG(oldact)
. So by resetting handlers in zend_max_execution_timer_shutdown()
, we are introducing a new issue.
We can not reset signals handlers at the end of each request, since this affects the whole process, including requests running in other threads. So I'm not sure how we can fix this issue if the only solution is to reset the signal handler.
However, I would like to challenge our understanding of the issue. Right now we are assuming that we leak signals when calling timer_delete()
while some signals are pending. I would expect this to be a very rare occurrence, and it's unlikely that we are leaking the amount of signal shown in php/frankenphp#440, so I don't think we have found the actual cause of the issue.
signal(7)
describes when signals are handled:
Whenever there is a transition from kernel-mode to user-mode execution (e.g., on return from a system call or scheduling of a thread onto the CPU), the kernel checks whether there is a pending unblocked signal for which the process has established a signal handler.
For a signal to leak (assuming that timer_delete()
causes the leak), we would need a signal to be queued before a timer_delete()
call, and that no context switch happened between these two events, which is unlikely.
What seems more likely is that the signal is masked, which is why it may be queued.
An other possibility is that the signal queue is filled with nested queues created by timer_create()
. timer_create()
appears to allocate a signal queue upon invocation. This counts towards RLIMIT_SIGPENDING
, and the function will return EAGAIN
(Resource temporarily unavailable, the error you are getting) if RLIMIT_SIGPENDING
is exceeded. SigQ in /proc/pid/status reports RLIMIT_SIGPENDING usage rather than the actual number of pending signals. So it's possible that the issue is only caused by the leak of one timer per thread.
Does this change alone (without the signal handler reset) fixes the issue for you?
How many different threads had executed a PHP script when SigQ reached 50k?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I would like to challenge our understanding of the issue.
Good call here. I did, in-fact, fix the leak through SIG_IGN
before discovering that we were calling zend_max_execution_timer_init
twice. I actually didn't validate whether the SIG_IGN
was still required after fixing that. 🤦
Does this change alone (without the signal handler reset) fixes the issue for you?
It does fix the issue. I've updated the PR.
@arnaud-lb I've updated the PR description with more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you!
There are conflicts with 8.2, could you rebase? I will merge soon after that.
4f92fcc
to
95c812b
Compare
ts_resource() and php_request_startup() both eventually call zend_max_execution_timer_init(), which didn't have a guard to prevent recreating signal handlers, thus resulting in leaking signals. This adds a guard to prevent the leak.
95c812b
to
e946b90
Compare
rebased and squashed the branch @arnaud-lb. |
* PHP-8.2: [ci skip] NEWS Fix timer leak (#13027)
* PHP-8.3: [ci skip] NEWS [ci skip] NEWS Fix timer leak (#13027)
This is a fix for php/frankenphp#440:
(1) is especially important in ZTS builds because
ts_resource()
andphp_request_startup()
both eventually callzend_max_execution_timer_init()
, which can result in a leak of signals.