Skip to content

Allow fiber switching in destructors #13460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

arnaud-lb
Copy link
Member

@arnaud-lb arnaud-lb commented Feb 21, 2024

See #11389:

Right now, when trying to suspend the current fiber in a destructor, the engine throws "Cannot switch fibers in current execution context".
[...]
Not being able to [switch fibers] in destructors kinda reintroduces the “What color is your function?” problem, which fibers were designed to avoid.

This PR allows fiber switching in destructors.

I assume that the reason Fiber for which switching is currently not allowed in destructors is that it could suspend GC runs indefinitely and preclude any future GC run.

This PR avoids this issue by implementing the second part of the solution proposed by @trowski here: The GC calls destructors in a Fiber, such that they can suspend without suspending the GC itself. If the Fiber is suspended, a new one is started to execute the remaining destructors. They are still executed during GC (when GC_G(gc_active) is 1) as they used to, so the changes are minimal (the GC algorithm is not affected). One difference is that they may not terminate before the GC ends. Moving destructor calls after GC (when other GC runs can execute) doesn't appear to be necessary and would add complexity (the GC buffer would be accessed concurrently).

One caveat is that the engine can not guarantee that suspended destructors eventually terminate. In particular, the shutdown sequence will destroy all live objects (including fibers) in an unspecified order. This forcefully terminates any suspended destructor, executing only their finally blocks.

The PR does not allow Fiber switching in the shutdown sequence yet. (Edit: updated PR to allow switching during shutdown.) It may be pointless since nothing can be relied on at this point: The sequence may have destroyed all other objects and fibers.

We don't need to call destructors in a Fiber if the GC itself was invoked in the main context, because destructors can not suspend the GC in this case. So when there is no EG(active_fiber), we can call destructors directly.

Synthetic benchmarks show no significant performance regressions when comparing this branch to master.

cc @nicolas-grekas @trowski

@bwoebi
Copy link
Member

bwoebi commented Feb 21, 2024

The PR does not allows Fiber switching in the shutdown sequence yet. It may be pointless since nothing can be relied on at this point: The sequence may have destroyed all other objects and fibers.

It may have called their destructors, but the objects are still having their data. Also shutdown tasks like "send a http call and await it" are still possible to do - just with fiber switching.

Being in shutdown just means that the destructor itself must not suspend, but it may start other fibers and I don't see why that would be forbidden.

@arnaud-lb
Copy link
Member Author

It may have called their destructors, but the objects are still having their data

What I had in mind is that these objects may not be fully functional at this point because they may have closed critical resources or transitioned to some invalid state. In particular, the destructor of Fibers terminates them and they can not be resumed after that. The following program throws because the Fiber is destroyed before C::__destruct is called:

class C {
    public static $c;
    public function __construct(public Fiber $f) {}
    public function __destruct() {
        $this->f->resume();
    }
}

$f = new Fiber(function () {
    Fiber::suspend();
});
$f->start();

C::$c = new C($f);

Being in shutdown just means that the destructor itself must not suspend, but it may start other fibers and I don't see why that would be forbidden.

Agreed. Also I didn't realize that the shutdown sequence always happens in the main "fiber", so we can safely unblock switching without risking suspension of the shutdown sequence.

I've updated the PR to allow switching during shutdown.

@arnaud-lb arnaud-lb changed the title [WIP] Allow fiber switching in destructors Allow fiber switching in destructors Feb 23, 2024
@nicolas-grekas
Copy link
Contributor

Looks like this works!
I rebased nicolas-grekas/symfony#43 and I have one test case ("testMaxDuration") that enters an infinite loop I need to investigate. The other ones are green \o/

@arnaud-lb arnaud-lb marked this pull request as ready for review March 6, 2024 14:04
@nicolas-grekas
Copy link
Contributor

All good on my side with this PR. I figured out the infinite loop, it was on my side.

:shipit:

@iluuu1994
Copy link
Member

iluuu1994 commented Mar 6, 2024

I believe suspending fibers in destructors might make memory corruption at a distance more probable. We've had many issues in the past where destructors or error handlers would lead to segfaults if they would modify memory currently referenced through indirect operands. For example:

Zend/tests/in-de-crement/oss-fuzz-61469_binop_dynamic_property_unset_error_handler.phpt:

class C {
    function errorHandle() {
        unset($this->a);
    }
}
$c = new C;
set_error_handler([$c,'errorHandle']);
$c->a += 5;
var_dump($c->a);

(The same issue goes for __destruct, I just couldn't find a better example)

Such errors are unlikely to happen in real code, as destructors are usually short and don't rely on global values. For the error to occur, the handler specifically needs to modify values that the current opcode handler is relying on.

However, when suspending a fiber, suddenly all code that is executed until the fiber is resumed is running "in the destructor". For example, if $foo->bars[0] = $bar replaces a object in the array with a destructor that suspends the fiber, and the destructor is called before the replacement happens, any code that adds to $foo->bars may cause the array to grow and the indirect pointer in the original handler to become invalid.

Many of those were fixed by delaying the destructor call. Maybe all of them have been resolved. I'm just pointing this out because they may be hard to diagnose.

@dstogov
Copy link
Member

dstogov commented Mar 11, 2024

I would prefer not to do this without a big reason.

__destructors are special functions and I think it's OK to prohibit them switching the context.

In GC destructors are called before the memory is actually freed, switching to another fiber at this moment will lead to additional memory consumption grow. In any case, destructors just can't be 100% compatible with GC their selves.

@bwoebi
Copy link
Member

bwoebi commented Mar 11, 2024

The big reason are surprising fatal errors in production code because something deep down a stacktrace within a destructor happened to do a small I/O task which naturally causes a fiber switch.
I.e. it's much more about stability to avoid issues than just the ability to directly fibers in destructors.

@dstogov
Copy link
Member

dstogov commented Mar 11, 2024

The big reason are surprising fatal errors in production code because something deep down a stacktrace within a destructor happened to do a small I/O task which naturally causes a fiber switch. I.e. it's much more about stability to avoid issues than just the ability to directly fibers in destructors.

I see and understand your intention (it makes sense).
For now I can't understand all the details of the patch and its subsequences.

In case of transfer in destructor during GC, the gc stays in the active state and the collected garbage is going to be freed only when control is transferred back and all the destructors are finished. Right?

@arnaud-lb
Copy link
Member Author

In case of transfer in destructor during GC, the gc stays in the active state and the collected garbage is going to be freed only when control is transferred back and all the destructors are finished. Right?

If a destructor transfers to an other Fiber via start() or resume(), this is what will happen. This is no different than a destructor calling an other function, in this context.

If a destructor suspends, it actually transfers the control to the GC (gc_call_destructors_in_fiber). In this case we proceed to execute other destructors, and garbage can be freed as soon as all destructors have either terminated or suspended. Objects with destructors may be collected during the next run, as before.

@dstogov
Copy link
Member

dstogov commented Mar 11, 2024

In case of transfer in destructor during GC, the gc stays in the active state and the collected garbage is going to be freed only when control is transferred back and all the destructors are finished. Right?

If a destructor transfers to an other Fiber via start() or resume(), this is what will happen. This is no different than a destructor calling an other function, in this context.

It's clear when the function returns buck, but unclear when the fibers would transfer the control.

If a destructor suspends, it actually transfers the control to the GC (gc_call_destructors_in_fiber). In this case we proceed to execute other destructors, and garbage can be freed as soon as all destructors have either terminated or suspended. Objects with destructors may be collected during the next run, as before.

Do you mean should_rerun_gc logic?

@arnaud-lb
Copy link
Member Author

In case of transfer in destructor during GC, the gc stays in the active state and the collected garbage is going to be freed only when control is transferred back and all the destructors are finished. Right?

If a destructor transfers to an other Fiber via start() or resume(), this is what will happen. This is no different than a destructor calling an other function, in this context.

It's clear when the function returns buck, but unclear when the fibers would transfer the control.

If a destructor suspends, it immediately transfers the control back to the GC. That's because this PR runs destructors in a separate fiber.

Before this PR, this is what would happen in case a destructor suspends:

Initial fiber:

gc_collect_cycle()
  GC_G(active)=1
  for each obj in buffer:
    obj->handler->dtor_obj() // May suspend indefinitely, GC is still active and can not terminate
  GC_G(active)=0

After this PR, this is what would happen:

Initial fiber:

gc_collect_cycle()
  GC_G(active)=1
  gc_call_destructors_in_fiber()
    start:
      start_fiber(gc_call_destructors) // Transfers to Dtor Fiber
      // If Dtor Fiber suspends, control is transferred here:      
      if buffer has pending dtors: 
        goto start
  GC_G(active)=0

Dtor Fiber:

gc_call_destructors()
  for each obj in buffer:
     ADDREF(obj)
     obj->handler->dtor_obj() // May suspend. This transfers to Initial Fiber. This Dtor Fiber is now owned by the application.
     DELREF(obj)
     if dtor_obj did suspend and just resumed:
         return

When obj->handler->dtor_obj suspends above, the control is transferred to the GC. The GC can continue executing other dtors, and can then free garbage. Objects which suspended during dtor_obj are not collectable due to their refcount, but will become collectable again as soon as their fiber terminates.

It's up to the application to ensure that suspended fibers eventually resume. To handle the case where an application doesn't reference a fiber anymore, I've updated the branch to ensure that the GC doesn't hold on suspended dtor fibers, so that they can be collected in case the application doesn't reference them.

If a destructor suspends, it actually transfers the control to the GC (gc_call_destructors_in_fiber). In this case we proceed to execute other destructors, and garbage can be freed as soon as all destructors have either terminated or suspended. Objects with destructors may be collected during the next run, as before.

Do you mean should_rerun_gc logic?

Yes. The should_rerun_gc logic is unaffected by this patch, except that objects whose destructor have suspended may not be collected during the re-run (like resurrected objects) until either their fiber terminates or is collected.

@nicolas-grekas
Copy link
Contributor

nicolas-grekas commented Apr 8, 2024

Hello, is there any chance this PR could be merged for 8.4? Without it, we can't really enjoy using fibers as seamlessly as they were supposed to work (no blue/green functions). I would be happy to use fibers more in e.g. Symfony but this issue with destructors is preventing me from doing so.

@arnaud-lb
Copy link
Member Author

@dstogov do you have objections about this change after my latest comment?

@dstogov
Copy link
Member

dstogov commented Jun 5, 2024

@dstogov do you have objections about this change after my latest comment?

@arnaud-lb I can't follow your design decisions (what is expected and what is not) through analysing the code changes and reading comments. It would be great to have some design document. I mainly afraid about the GC changes.

The following example demonstrates that switching Fiber in a destructor prevents execution of the rest of the destructor code.

Is this intended? For me this looks wrong.

<?php

class Data {
	public $n;
    public function __construct($n) {
    	$this->n = $n;
    }
    public function dump() {
        echo "dump(" . $this->n . ")\n";
    }
    public function __destruct() {
        echo "dtor(" . $this->n . ")\n";
        $this->n = null;
    }
}

class Foo {
    public $next;
    public $data;
    public function __destruct() {
    	$this->data->dump();
        Fiber::suspend();
    	$this->data->dump(); // this one is never called
    }
}

$fiber = new Fiber(function () {
    call_user_func(function () {
        $a = new Foo;
        $b = new Foo;
        $a->next = $b;
        $b->next = $a;
        $a->data = new Data("a");
        $b->data = new Data("b");
    });

    var_dump(gc_collect_cycles());
});

$fiber->start();
var_dump(gc_collect_cycles());
?>

@arnaud-lb
Copy link
Member Author

arnaud-lb commented Jun 6, 2024

@dstogov I wrote this: https://docs.google.com/document/d/15xNb_X7gF3B_X-7jv1hXWeFWkW2aROmT_WBHQrKKPLM/edit. I've tried to clarify what I could, but I'm not sure what to add. Let me know if you want me to clarify something or to expand some part.

The following example demonstrates that switching Fiber in a destructor prevents execution of the rest of the destructor code.

This is intended, and here is why I think this is not an issue:

From the GC point of view:

The object's refcount is increased before executing the dtor, and will only be decreased when the dtor returns (not when it suspends, because DELREF() does not execute until it returns) :

php-src/Zend/zend_gc.c

Lines 1834 to 1837 in f15cd75

GC_ADDREF(obj);
obj->handlers->dtor_obj(obj);
GC_TRACE_REF(obj, "returned from destructor");
GC_DELREF(obj);

As a result, unless the dtor resumes and returns during the current GC run, the object will be considered resurrected by the GC, and will not be freed in this run.

The object remains in the buffer and will be scanned again during next run. I use gc_check_possible_root() on the obj when the dtor resumes, in case it resumed after more than one run.

This is safe, as it's not different from an object resurrecting itself in a destructor by creating more references to itself.

From the application point of view:

If a suspended dtor is not resumed until the application terminates, the rest of the dtor may never be executed. The enclosing Fiber will be gracefully terminated, and only finally blocks will be executed.

It is the responsibility of the application to ensure that suspended Fibers are eventually resumed.

In the typical use-case, Fibers are managed by an event loop in a library like Amphp, and the event loop will not return until all Fibers are terminated, so this should not be an issue, and the burden is not on the user.

I added a pseudo event loop to your example: https://gist.github.com/arnaud-lb/db60218c3333913d66f19f3d7c599720. In this case the destructors are allowed to return.

@dstogov
Copy link
Member

dstogov commented Jun 6, 2024

The following example demonstrates that switching Fiber in a destructor prevents execution of the rest of the destructor code.

This is intended, and here is why I think this is not an issue:

May be I don't understand something. Let an object destructor calls some "asynchronous" IO routine that calls Fiber::suspend(), assuming it is suspending the application Fiber where it was started in and that it's going to continue the IO, when data is ready and fiber is resumed, but instead the GC Fiber is suspended, GC destroys this object, IO can't be resumed and the rest of the destructor code is never executed. Now let GC is called at non-deterministic moment and some destructors executed partially.

Adding $fiber->resume() at the end of the example doesn't resume the suspended destructor(s).

@dstogov
Copy link
Member

dstogov commented Jun 6, 2024

See the example:

<?php
class Foo {
    public $next;
    public function __destruct() {
   	  try {
        echo "1\n";
        Fiber::suspend();
        echo "2\n";
      } finally {
        echo "3\n";
      }	
    }
}

class Bar {
    public $next;
}

/* This works */

$fiber = new Fiber(function () {
    call_user_func(function () {
        $a = new Foo;
    });
});

$fiber->start();
$fiber->resume();
echo "4\n\n";

/* This doesn't work */

$fiber = new Fiber(function () {
    call_user_func(function () {
        $a = new Foo;
        $b = new Bar;
        $a->next = $b;
        $b->next = $a;
    });
    gc_collect_cycles();
});

$fiber->start();
$fiber->resume();
echo "4\n";
?>

The output:

1
2
3
4

1
3
PHP Fatal error:  Uncaught FiberError: Cannot resume a fiber that is not suspended in /home/dmitry/php/php-master/CGI-DEBUG-64/gc2.php:44
Stack trace:
#0 /home/dmitry/php/php-master/CGI-DEBUG-64/gc2.php(44): Fiber->resume()
#1 {main}
  thrown in /home/dmitry/php/php-master/CGI-DEBUG-64/gc2.php on line 44

The only difference between (1) and (2) that the GC was involved.

@trowski
Copy link
Member

trowski commented Jun 6, 2024

@dstogov Your second example fails because the fiber referenced by $fiber did not suspend. Rather the fiber created by the GC suspended. This trivial example has nothing to resume that fiber, but Revolt, AMPHP, or ReactPHP would have scheduled some event to resume the fiber. These frameworks do not care specifically where a fiber was created, but manage scheduling of suspending fibers.

If the destructor is not resumed, the unresumed fiber is destroyed and cleaned up at shutdown like any other unresumed fiber at shutdown. This is not a unique or unexpected behavior.

I'd like to get this merged for 8.4 if possible. The inability to suspend in destructors has been a long-standing issue for users of Fibers, and now is a blocker for wider adoption in frameworks like Symfony.

@dstogov
Copy link
Member

dstogov commented Jun 6, 2024

Your second example fails because the fiber referenced by $fiber did not suspend. Rather the fiber created by the GC suspended.

In both examples an instance of the class Foo is destroyed in a Fiber. In first case because of reference-counting, in second case by GC. The behaviour is completely different. The attempt to resume the destructor work at first case and not at the second. I can't see this as a right behavior.

@trowski
Copy link
Member

trowski commented Jun 6, 2024

In order for the behavior to be identical, we'd need to run every destructor within a fiber, so there would be a fiber switch incurred on every destructor call. I doubt this is an acceptable overhead. I think the behavior difference here may not be necessarily intuitive at first, but is expected when using fibers.

The attempt to resume the destructor work at first case and not at the second.

The suspend call in the second example is not within the fiber created in the call to gc_collect_cycles() (the one referenced by $fiber), the suspend call is within the fiber created by the GC, which is why the fiber in the second example is not suspending. This is not uncommon when using fibers to create another fiber within a fiber which may suspend but does not suspend the fiber which created it.

@arnaud-lb
Copy link
Member Author

@dstogov

May be I don't understand something. Let an object destructor calls some "asynchronous" IO routine that calls Fiber::suspend(), assuming it is suspending the application Fiber where it was started in and that it's going to continue the IO, when data is ready and fiber is resumed, but instead the GC Fiber is suspended, GC destroys this object, IO can't be resumed and the rest of the destructor code is never executed. Now let GC is called at non-deterministic moment and some destructors executed partially.

I think that an async IO routine would not make assumptions about the current Fiber, and would use Fiber::getCurrent() instead, so that it can work with many concurrent fibers. It would also keep a reference to the Fiber so it can resume it later:

function io_read($fd) {
    $fiber = Fiber::getCurrent();
    EventLoop::onReadable($fd, fn () => $fiber->resume());
    Fiber::suspend();
    return fread($fd);
}

function __destruct() {
    echo io_read($fd), "\n";
}

When using fibers like this, the proposed change is safe, as the Fiber from which __destruct is called does not matter. I believe this is what AmpPHP and related libraries do.

@dstogov
Copy link
Member

dstogov commented Jun 10, 2024

@arnaud-lb OK. I see that this works for the desired use-case.

I don't support this (I expect related troubles), but I won't object against this any more.
Let others approve this, or go through RFC vote process.

Copy link
Member

@trowski trowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After asking for feedback in the slack chat a couple times, there were no objections to merging this PR.

@arnaud-lb arnaud-lb closed this in 3c56af9 Jul 2, 2024
arnaud-lb added a commit that referenced this pull request Jul 2, 2024
fabpot added a commit to symfony/symfony that referenced this pull request Aug 19, 2024
…olas-grekas)

This PR was merged into the 7.2 branch.

Discussion
----------

[HttpClient] Add support for amphp/http-client v5

| Q             | A
| ------------- | ---
| Branch?       | 7.2
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Issues        | #52008
| License       | MIT

This PR adds support for amphp/http-client version 5 as a transport for our HttpClient component.

It started as a draft at nicolas-grekas#43, which helped spot that PHP is missing the capability to suspend fibers in destructors. This was reported as php/php-src#11389 and is being fixed at php/php-src#13460.

Since the fix for php-src is going to land on PHP 8.4, using amphp/http-client version 5 will require php >= 8.4.

The implementation duplicates the one we have for v4 with the needed changes to use the v5 API. The difference are not big in size of code, but they're very low level (generators vs fibers). That would be quite useless to factor IMHO.

Commits
-------

a5fb61c [HttpClient] Add support for amphp/http-client v5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants