Fix signal leak in max execution timers #13027

Merged
merged 1 commit into php:PHP-8.2 from fix/timers on Jan 5, 2024

Conversation

@withinboredom withinboredom (Member) commented Dec 25, 2023

This is a fix for php/frankenphp#440:

  1. Ensures only a single timer/handler is registered per thread (only applicable to ZTS builds).

(1) is especially important in ZTS builds because ts_resource() and php_request_startup() both eventually call zend_max_execution_timer_init(), which can result in a leak of signals.
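
To make the guard concrete, it amounts to roughly the following in zend_max_execution_timer_init(). This is an illustrative sketch rather than the literal patch; the globals (EG(pid), EG(max_execution_timer_timer)) and the clock choice follow the discussion below and may differ in detail from what was merged:

ZEND_API void zend_max_execution_timer_init(void) /* {{{ */
{
    pid_t pid = getpid();

    /* A timer/handler already exists for this process (e.g. ts_resource()
     * ran before php_request_startup()); creating a second one is what
     * leaked. After fork(), getpid() no longer matches EG(pid), so the
     * child still creates its own timer. */
    if (EG(pid) == pid) {
        return;
    }

    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD_ID;
    sev.sigev_signo = SIGRTMIN;
    sev.sigev_notify_thread_id = (pid_t) syscall(SYS_gettid);

    if (timer_create(CLOCK_BOOTTIME, &sev, &EG(max_execution_timer_timer)) != 0) {
        return;
    }

    EG(pid) = pid;
}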

@devnexen devnexen requested a review from arnaud-lb December 26, 2023 11:35
@nielsdos nielsdos (Member) left a comment

This looks mostly good to me; sorry for leaving multiple review comments instead of reviewing in one pass.

The one thing I'm not sure about is whether the signal needs to be cleared before deleting the timer.
It might also warrant a comment in the code explaining that the signal is cleared because pending signals stay pending even after the timer is deleted.
I'll defer the final review to Arnaud though.
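
For reference, one way to clear a pending SIGRTMIN before timer_delete() is to collect it with sigtimedwait() while the signal is blocked. The sketch below is illustrative only; drain_and_delete() is a hypothetical helper, not code from this PR:

#include <signal.h>
#include <time.h>

/* Hypothetical helper showing the "clear before delete" idea under
 * discussion; not the approach taken in this PR. */
static void drain_and_delete(timer_t timer)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);

    /* Block SIGRTMIN so an already-queued instance is collected here
     * instead of being delivered to the handler. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    struct timespec zero = {0, 0};
    while (sigtimedwait(&set, NULL, &zero) == SIGRTMIN) {
        /* discard queued instances */
    }

    timer_delete(timer);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
}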

@iluuu1994 iluuu1994 (Member) left a comment

I'm not sure I understand why the signals are never delivered. Can you clarify?

@@ -41,6 +43,14 @@ ZEND_API void zend_max_execution_timer_init(void) /* {{{ */
sev.sigev_signo = SIGRTMIN;
sev.sigev_notify_thread_id = (pid_t) syscall(SYS_gettid);
A Member commented:

zend_max_execution_timer_init is called from pcntl_fork. With this change, a new timer will no longer be created. However, it seems the stored syscall(SYS_gettid) result will no longer match the child's thread id. This might be a problem.

@withinboredom (Member, Author) commented:

I'll create a test for this case to ensure we are not breaking it.

@withinboredom (Member, Author) commented:

I couldn't figure out how to write a phpt test for this case. I added some logging and a simple script to see what was going on though:

<?php

echo "about to fork\n";
$pid = pcntl_fork();
if(!$pid) {
    echo sprintf("child process: %s \n", getmypid());
    sleep(1);
} else {
    echo sprintf("parent process: %s\n", getmypid());
    sleep(2);
}

And it produces the following output:

$ php -dmax_execution_time=10 test.php
Timer 0 created on thread 54661
Setting timer 0 on thread 54661 (10 seconds)...
about to fork
Timer 0 created on thread 54662
parent process: 54661
child process: 54662
Setting timer 0 on thread 54662 (0 seconds)...
Deleting timer 0 on thread 54662...
Setting timer 0 on thread 54661 (0 seconds)...
Deleting timer 0 on thread 54661...

So it appears this might just be defensive programming, since the forked child creates its own timers and, per timer_create(2), POSIX timers are not inherited across fork(), so it does not inherit the parent's settings.

I can also see that it handles max_execution_time correctly:

<?php

echo "about to fork\n";
$pid = pcntl_fork();
if(!$pid) {
    ini_set('max_execution_time', 1);
    echo sprintf("child process: %s \n", getmypid());
    sleep(3);
} else {
    ini_set('max_execution_time', 4);
    echo sprintf("parent process: %s\n", getmypid());
    sleep(5);
}

which gives the following output:

$ php -dmax_execution_time=10 test.php
Timer 0 created on thread 54667
Setting timer 0 on thread 54667 (10 seconds)...
about to fork
Setting timer 0 on thread 54667 (0 seconds)...
Setting timer 0 on thread 54667 (4 seconds)...
parent process: 54667
Timer 0 created on thread 54668
Setting timer 0 on thread 54668 (0 seconds)...
Setting timer 0 on thread 54668 (1 seconds)...
child process: 54668
Setting timer 0 on thread 54668 (0 seconds)...
PHP Fatal error:  Maximum execution time of 1 second exceeded in /usr/local/src/php/test.php on line 5

Fatal error: Maximum execution time of 1 second exceeded in /usr/local/src/php/test.php on line 5
Setting timer 0 on thread 54668 (0 seconds)...
Deleting timer 0 on thread 54668...
Setting timer 0 on thread 54667 (0 seconds)...
PHP Fatal error:  Maximum execution time of 4 seconds exceeded in /usr/local/src/php/test.php on line 14

Fatal error: Maximum execution time of 4 seconds exceeded in /usr/local/src/php/test.php on line 14
Setting timer 0 on thread 54667 (0 seconds)...
Deleting timer 0 on thread 54667...

I'd love to write a test for this, but I'll have to dive into how phpt works to understand it. However, it looks like there may not be an issue here for pcntl_fork.

A Member commented:

I think switching to EG(pid) == pid implicitly fixed this. 👍 This could be tested by forking, making the child time out, and monitoring the child from the parent. But I'm not sure a test is worth it.

@arnaud-lb arnaud-lb (Member) left a comment

As I understand the issue, we leak one timer per thread because we create one during ts_resource() (once per thread) and another one during php_request_startup() (once per request), and the former is never deleted.

However, there are two things I don't get: what is generating the signals? I would assume it's the leaked timers, but they are never armed, so they should never generate signals. Also, as Ilija noted, it's not clear why the signals are not handled.

ign_sigrtmin.sa_flags = 0;
sigemptyset(&ign_sigrtmin.sa_mask);

if(sigaction(SIGRTMIN, &ign_sigrtmin, NULL) < 0) {
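
(The hunk above is only a fragment; presumably the full shutdown-time reset looked roughly like the following, with SIG_IGN installed as the handler to discard pending SIGRTMIN. This is a reconstruction for readability, not the exact diff.)

struct sigaction ign_sigrtmin;

ign_sigrtmin.sa_handler = SIG_IGN;   /* discard any pending SIGRTMIN */
ign_sigrtmin.sa_flags = 0;
sigemptyset(&ign_sigrtmin.sa_mask);

if (sigaction(SIGRTMIN, &ign_sigrtmin, NULL) < 0) {
    /* the error is non-fatal here; the timer is being shut down anyway */
}
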
A Member commented:

Unfortunately this will reset the signal handler for the whole process.

@withinboredom (Member, Author) commented:

What do you mean by that? If the shutdown function is called, the process should be done with whatever PHP script it was working on. If the process wishes to start handling another PHP script, it will call zend_max_execution_timer_init again. Either way, the pending signals must be ignored to clear them, otherwise they will sit in the pending signal queue (SigQ) forever, even if another handler is set.

A Member commented:

Currently zend_max_execution_timer_init() and zend_max_execution_timer_shutdown() are called in php_request_startup() and php_request_shutdown() respectively, so they are indeed called once per request. The signal handler is set up in zend_set_timeout_ex(), at least once per request.

However, when zend_max_execution_timer_shutdown() is called, there may be other requests executing in other threads of the same process. Resetting the signal handler here will affect them.
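
To illustrate: signal dispositions installed with sigaction() are shared by the whole process, while only signal masks (pthread_sigmask()) are per-thread, so a reset made from one thread is observed by every other thread. A standalone sketch, not php-src code:

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void on_sigrtmin(int sig) { (void)sig; write(1, "handled\n", 8); }

static void *worker(void *arg)
{
    (void)arg;
    raise(SIGRTMIN);   /* delivered to this thread: prints "handled" */
    sleep(1);          /* meanwhile the main thread resets the handler */
    raise(SIGRTMIN);   /* now ignored, although this thread never touched
                          sigaction() itself */
    return NULL;
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_sigrtmin;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGRTMIN, &sa, NULL);   /* disposition shared by all threads */

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    usleep(100 * 1000);
    sa.sa_handler = SIG_IGN;
    sigaction(SIGRTMIN, &sa, NULL);   /* resets the handler process-wide */

    pthread_join(t, NULL);
    return 0;
}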

@withinboredom (Member, Author) commented:

Ah, I see that now. I don't think it is an issue though because in zend_max_execution_timer_init() we also reset the signal handler if shutdown has been previously called. Now that you've pointed this out, I think that was part of the reasoning for #12539, allowing timers to be reused, which would probably be out of scope here.

@arnaud-lb arnaud-lb (Member) commented Jan 5, 2024

The call to sigaction(sev.sigev_signo, NULL, &EG(oldact)); in zend_max_execution_timer_init() is not resetting the handler, it's just copying the existing handler to &EG(oldact). So by resetting handlers in zend_max_execution_timer_shutdown(), we are introducing a new issue.

We cannot reset signal handlers at the end of each request, since this affects the whole process, including requests running in other threads. So I'm not sure how we can fix this issue if the only solution is to reset the signal handler.

However, I would like to challenge our understanding of the issue. Right now we are assuming that we leak signals when calling timer_delete() while some signals are pending. I would expect this to be a very rare occurrence, and it's unlikely that we are leaking the number of signals shown in php/frankenphp#440, so I don't think we have found the actual cause of the issue.

signal(7) describes when signals are handled:

Whenever there is a transition from kernel-mode to user-mode execution (e.g., on return from a system call or scheduling of a thread onto the CPU), the kernel checks whether there is a pending unblocked signal for which the process has established a signal handler.

For a signal to leak (assuming that timer_delete() causes the leak), we would need a signal to be queued before a timer_delete() call, and that no context switch happened between these two events, which is unlikely.

What seems more likely is that the signal is masked, which is why it may be queued.

Another possibility is that the signal queue is filled by the queued-signal entries that timer_create() pre-allocates: timer_create() appears to allocate a signal queue entry upon invocation. This counts towards RLIMIT_SIGPENDING, and the function will return EAGAIN (Resource temporarily unavailable, the error you are getting) if RLIMIT_SIGPENDING is exceeded. SigQ in /proc/pid/status reports RLIMIT_SIGPENDING usage rather than the actual number of pending signals. So it's possible that the issue is caused solely by the leak of one timer per thread.
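
If that hypothesis holds, it can be reproduced outside of PHP: each timer_create() with signal notification reserves a queued-signal slot that shows up in SigQ even though no signal is ever pending. A standalone sketch under that assumption (the file name sigq_demo.c is made up; build with cc sigq_demo.c -o sigq_demo -lrt):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static void print_sigq(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    while (f && fgets(line, sizeof line, f)) {
        if (strncmp(line, "SigQ:", 5) == 0) {
            fputs(line, stdout);   /* "SigQ: <used>/<RLIMIT_SIGPENDING>" */
        }
    }
    if (f) fclose(f);
}

int main(void)
{
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;

    print_sigq();   /* before: usage should be near zero */

    for (int i = 0; i < 1000; i++) {
        timer_t t;
        if (timer_create(CLOCK_MONOTONIC, &sev, &t) != 0) {
            perror("timer_create");   /* EAGAIN once the limit is exceeded */
            break;
        }
        /* intentionally never timer_delete(t): this mimics the leak */
    }

    print_sigq();   /* after: usage should have grown by one per timer */
    return 0;
}

If SigQ grows here, the leaked per-thread timers alone would explain the numbers reported in php/frankenphp#440 without any signal ever being delivered.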

Does this change alone (without the signal handler reset) fix the issue for you?

How many different threads had executed a PHP script when SigQ reached 50k?

@withinboredom (Member, Author) commented:

However, I would like to challenge our understanding of the issue.

Good call here. I did, in fact, fix the leak through SIG_IGN before discovering that we were calling zend_max_execution_timer_init twice. I actually didn't validate whether the SIG_IGN was still required after fixing that. 🤦

Does this change alone (without the signal handler reset) fixes the issue for you?

It does fix the issue. I've updated the PR.

@withinboredom (Member, Author) commented:
@arnaud-lb I've updated the PR description with more details.

@arnaud-lb arnaud-lb (Member) left a comment

Looks good to me, thank you!

There are conflicts with 8.2, could you rebase? I will merge soon after that.

ts_resource() and php_request_startup() both eventually call zend_max_execution_timer_init(), which didn't have a guard to prevent recreating signal handlers, thus resulting in leaking signals. This adds a guard to prevent the leak.
@withinboredom (Member, Author) commented:
Rebased and squashed the branch, @arnaud-lb.

@arnaud-lb arnaud-lb merged commit 6342f73 into php:PHP-8.2 Jan 5, 2024
arnaud-lb added a commit that referenced this pull request Jan 5, 2024
* PHP-8.2:
  [ci skip] NEWS
  Fix timer leak (#13027)
arnaud-lb added a commit that referenced this pull request Jan 5, 2024
* PHP-8.3:
  [ci skip] NEWS
  [ci skip] NEWS
  Fix timer leak (#13027)
@withinboredom withinboredom deleted the fix/timers branch January 5, 2024 18:51
@dunglas dunglas mentioned this pull request Sep 25, 2024