Skip to content

Refactor Fiber internal implementation #7085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

twose
Copy link
Member

@twose twose commented Jun 1, 2021

No BC breaks at the user level.

I tried to divide them into multiple PRs but I failed, changes depend on each other to a certain extent.

  1. Introduced main fiber. It's one of the most important parts of this PR. Any C stack can be abstracted as fiber, the main fiber in fibers is like the main thread in threads.

  2. Put the fiber object and its context on its own C stack. This is the same trick as the zend_vm_stack does. It also works on the main fiber, we can put the main fiber on the main PHP VM stack without emalloc.

  3. Add zend_fiber_jump and merge code of resume/suspend, this is the biggest problem in the previous implementation. The same code appears repeatedly in various places before. I believe the logic of fiber switching is clearer now.

(expected output of observer_fiber_05.phpt looks incorrect, we have 2 fibers but there are 3 destroying && destroyed, it's correct after refactor).

  1. Change the order of executor_globals members, so that we can switch context through a memcpy instead of multiple assignments, this simplifies the code, and for better performance.

  2. boost context provided the transfer structure to let users can transfer ptr data between fibers, so we can transfer zval ptr to fiber and use ZVAL_COPY on the peer side without an extra field to store zval. But we still need a zval to store the return value of fiber because we provided getReturn

  3. Remove the exception field from zend_fiber, we can use one thread-safe global var instead.

  4. About the ASan part (af29059 ), the code makes me confused, I believe the correct/clearer example is https://github.com/boostorg/context/blob/master/include/boost/context/continuation_ucontext.hpp

  5. Rename value to data, as boost called it transfer.data I believe it's more clear to named it data than value

  6. Rename caller to from, as boost called it transfer.from (also in boost.fiber), which means where the switch came from.

  7. Nested fibers are maintained by linked list structure, we usually use prev to represent the previous one, I named it previous here.

  8. We only need four statuses, INIT/SUSPENDED/RUNNING/DEAD (same with Lua coroutine), and it's better to make it be an enum, any other internal status information is expressed in fiber->flags

  9. Remove zend_fiber_context APIs, they are meaningless

Also, since we do not want to break the RFC, there are many optimizations or enhancements that can not be implemented for now. It is still the beginning.

twose added 2 commits June 1, 2021 16:09
Also align the macro names and generally macro definitions are always at the top of the file.
@trowski
Copy link
Member

trowski commented Jun 1, 2021

Without giving this a thorough examination because I don't have time today, there needs to be a way to know if you are within {main}. This means either Fiber::this() needs to return null or another method, such as Fiber::inMain(), needs to be added. Otherwise this is a BC break at the user level.

Can you explain the motivation of treating {main} like a fiber? I'm not immediately seeing the advantage of it.

While the zend_fiber_context API may not seem immediately useful to you, there may be a reason for an extension to make use of it. Otherwise, it was nice to have these separate functions even if we ultimately remove them from the API. At a glance, there's so much going on in some of these functions it's much harder to follow than before. That's why I broke operations like stack alloc/free into separate functions.

@@ -7,7 +7,7 @@ $value = Fiber::suspend(1);

?>
--EXPECTF--
Fatal error: Uncaught FiberError: Cannot suspend outside of a fiber in %ssuspend-outside-fiber.php:%d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this particular instance the old description is more obvious than "nowhere to go".

@trowski
Copy link
Member

trowski commented Jun 1, 2021

Quick note: The observer test output is a bit dubious because the inner fiber is destroyed while destroying the outer fiber, so the switch back to the outer fiber prints the destroying/destroyed pair twice. I do like having separate status and flag fields so the observer could differentiate between these states as you did, so +1 to that portion.

@bwoebi
Copy link
Member

bwoebi commented Jun 1, 2021

I think a definite advantage of treating main like a fiber is just defensive programming - there's no need to NULL check, you always jump from a fiber to another fiber, not from a fiber to null. I prefer this approach as well. It just probably should expose something to distinguish there.

@trowski
Copy link
Member

trowski commented Jun 1, 2021

@bwoebi A disadvantage is that Fiber::suspend() is not valid in {main} even though it's a fiber. As-is, this PR breaks code written using the current fiber API.

@twose
Copy link
Member Author

twose commented Jun 2, 2021

What many people have not recognized is that the reason why the main fiber can not suspend is not it is main fiber, but because it has not previous one, that's why I changed the error message to "Fiber has nowhere to go".
Therefore, the conditions for judging whether fiber can suspend is not current_fiber == main_fiber, it is fiber->previous != NULL.
I believe that $fiber->getPrevious() !== null works for you too. So we may need a new API Fiber->getPrevious().

Can you explain the motivation of treating {main} like a fiber? I'm not immediately seeing the advantage of it.

I have explained a part of the reasons in another thread, as @bwoebi said. Let main fiber be NULL will causes disaster (I am talking from my experience), and we have to check it everywhere. But this is not the biggest advantage of it.
I said the reason why the main fiber can not suspend is it has no previous one, but what will happen if we have one? After we register a fiber as the scheduler, we can swap it with the main fiber in the linked list, and the scheduler becomes the root node. Now the fiber system is almost completely consistent with the thread system in our operating system. And the most obvious benefit is that we need not new a Fiber to run tests anymore, it's very useful in PHPUnit (or PHPT). After we implement syscall hook, we can reuse all of the tests without modifying one line as we did in swow-bcreports, it runs all PHP tests with Swow extension to detect any behavior changes.

Of course that if you still want to treat main differently, provide Fiber::main() is better for me. We can use Fiber::this() === Fiber::main() instead of Fiber::inMain() (for these, I think Fiber::getCurrent() and Fiber::getMain() are better for me too.)

@twose
Copy link
Member Author

twose commented Jun 2, 2021

While the zend_fiber_context API may not seem immediately useful to you, there may be a reason for an extension to make use of it.

For example? It's definitely no for me. Imagine if other extensions implement their own fiber and his fiber has a different memory struct with zend_fiber, how could we make all of them work at the same time? If we could, we need not make any changes to Swoole.
And if one extension wants to implement a C context switch manager, it can do it on its own easily just like Swoole has its own coroutine implementation. The reason why we need fiber APIs is that we want our fibers can also work in ZendVM with zend_fiber. zend_fiber_context is not something that binds with Zend, and it is not the responsibility of Zend to provide the APIs of C context management I think.
And if you want fibers can run the C function without the ZendVM stack, it's ok and it was already in my plan because we also need it, but not in this PR. I believe that it can cover almost all requirements for other extensions in this scope after we support it.

@@ -622,6 +622,11 @@ ZEND_METHOD(Fiber, this)
{
ZEND_PARSE_PARAMETERS_NONE();

/* Follow RFC for now */
if (EG(current_fiber) == EG(main_fiber)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it seems ridiculous to me, but it's the fast way to solve the BC break here...

@trowski
Copy link
Member

trowski commented Jun 2, 2021

Let main fiber be NULL will causes disaster (I am talking from my experience), and we have to check it everywhere.

Having written a lot of fiber code, "have to check it everywhere" is not the case using the proper abstraction if you want to treat main as though it were a fiber.

And if you want fibers can run the C function without the ZendVM stack, it's ok and it was already in my plan because we also need it, but not in this PR.

So you do not think a C-stack switching API necessary but you plan to add it back in another PR? I'm confused. Fine if it isn't part of the Zend API, but IMO scattering the content of those functions into others is not helping readability.


This PR is changing the existing asymmetric fiber model to a symmetric model where fibers are not modeled as a stack, but rather independent "threads" where each must determine the next fiber to resume (which is often the prior fiber) rather than always suspending to the prior context. I think if we went this direction, we would want the user API to reflect this model and remove Fiber::suspend() and add other methods such as Fiber->getPrevious().

I really like a couple of the ideas you have here, particularly the way you separated fiber status from flags, returned bool from zend_fiber_resume and similar, and swapping EG values using memcpy, but I'm not sure on some of the others such as moving the first VM stack page into the mmap'd memory. I'd like to see these individual concepts introduced as separate PRs.

@mvorisek
Copy link
Contributor

mvorisek commented Jun 2, 2021

I have a two questions about fibers:

  1. what is the performance, how much time does one fiber create/switch consumes?
  2. any plan to implement isolated context fiber, eg. still single thread but with separated userspace memory? It would allow using multiple versions of the same classes, functions, which currently need proc_open/fork which is slow and often disabled for security reasons.

@twose
Copy link
Member Author

twose commented Jun 2, 2021

Having written a lot of fiber code, "have to check it everywhere" is not the case using the proper abstraction if you want to treat main as though it were a fiber.

Sorry for my poor English... I mean we have to check it everywhere if we have not main fiber abstraction.

So you do not think a C-stack switching API necessary but you plan to add it back in another PR? I'm confused. Fine if it isn't part of the Zend API, but IMO scattering the content of those functions into others is not helping readability.

I mean we can run the pure C function with zend_fiber directly (and we don't need to switch ZendVM stack for pure C fiber),
so C-stack switching API is unnecessary. Maybe I can show you how to do this soon in a new PR?

This PR is changing the existing asymmetric fiber model to a symmetric model

I do not think so, I just simplified the code logic.
Control flow with symmetric coroutines instead is not stack-like. A coroutine can always yield freely to any other coroutine and is not restricted to return to its caller.
But even now, we still can not do that, for example, Fiber::this()->resume(Fiber::this()->getPrevious()->getPrevious()); is not allowed.

but I'm not sure on some of the others such as moving the first VM stack page into the mmap'd memory. I'd like to see these individual concepts introduced as separate PRs.

Thank you for your approval. But can we confirm all of these here, it could reduce my workload, we still have a lot to do... unless you are totally opposed to other changes.

@twose
Copy link
Member Author

twose commented Jun 2, 2021

Hi, @mvorisek

  1. what is the performance, how much time does one fiber create/switch consumes?

I think this has included in the RFC... I think some of us still have limited knowledge of it although it has discussed and landed.
Fiber's context is much smaller than process, it can switch millions of times per second.
I prefer to take the magnitude of time rather than a specific time as an index of its performance.
Generally speaking, system calls may take microseconds, but it only takes nanoseconds to switch the fiber. So we can use fiber and multiplexing IO to make a process run in user mode as much as possible and reduce syscalls. Fiber greatly reduces the overhead of context switching between processes/threads, that's why golang has such performance.
Fork a process/thread may even take milliseconds sometimes, so expensive, but creating a fiber is as fast as alloc a piece of memory.

  1. any plan to implement isolated context fiber, eg. still single thread but with separated userspace memory? It would allow using multiple versions of the same classes, functions, which currently need proc_open/fork which is slow and often disabled for security reasons.

No. Technologies like Swoole are more like nodejs in golang style. It cannot solve the performance problem of CPU-intensive computing, but it can solve the problem of intensive IO.
Modern web frameworks have already provided isolation of global resources. Fibers can handle request/response object separately instead of using global vars like $_REQUEST, so we can speed up our application 100x faster in Swoole-like server without a lot of refactoring.

@trowski
Copy link
Member

trowski commented Jun 2, 2021

Sorry for my poor English... I mean we have to check it everywhere if we have not main fiber abstraction.

We seem to be still misunderstanding each other, sorry. I was alluding to a scheduler running in a fiber that switches between {main} and a set of independent fibers, where such an abstraction handles the logic of switching.

My future intention is to run the scheduler (event-loop) within {main} and then run the user script within a fiber. This is part of why I hesitate to alter {main} to be a fiber now, as I have not given any thought to what repercussions this may have with this future goal.

I mean we can run the pure C function with zend_fiber directly (and we don't need to switch ZendVM stack for pure C fiber),
so C-stack switching API is unnecessary. Maybe I can show you how to do this soon in a new PR?

Yes, an extension can use zend_fiber for this, and maybe is better anyway. I'm fine with removing the functions from the API, but again eliminating them has moved the responsibilities to other functions, increasing their complexity and reducing readability.

This PR is changing the existing asymmetric fiber model to a symmetric model

I do not think so, I just simplified the code logic.
Control flow with symmetric coroutines instead is not stack-like. A coroutine can always yield freely to any other coroutine and is not restricted to return to its caller.
But even now, we still can not do that, for example, Fiber::this()->resume(Fiber::this()->getPrevious()->getPrevious()); is not allowed.

The internal API is now symmetric, that's why {main} had to be a fiber in this model, as a fiber cannot suspend to main, it must resume main. The user API is asymmetric. This divergence bothers me somewhat.

Thank you for your approval. But can we confirm all of these here, it could reduce my workload, we still have a lot to do... unless you are totally opposed to other changes.

There are parts to which I'm opposed. I'm unsure about putting a single fiber object onto the VM stack only for the main fiber to avoid a single alloc. IMO, the complexity does not justify any perceived performance gain.

I think using memcpy for the EGs is interesting, but I'd like know how much performance is gained over a few assignments. This design again increases complexity and makes it more difficult for a newcomer to determine what the code is doing.

The PR is noisy due to superfluous changes, such as rearranging functions and renaming parameters because you prefer "data" over "value." This makes some changes more difficult to notice (such as I just noticed you're now using ZEND_FIBER_GUARD_PAGES as a bool instead of a count of guard pages).

I'd prefer these changes to come as separate PRs so we can better understand all the changes that are being made. If you don't have the time, I can break some of this down and tag you as a co-author.

@twose
Copy link
Member Author

twose commented Jun 2, 2021

The design I advocate is the content that has been verified in Swoole.
And I think it allows us to make ZendVM work as a micro operating system, make the fiber more like a real thread but lighter. It can help us implement technologies similar to golang and reach the final goal: speed up PHP 100x faster with minimal cost.

About ZEND_FIBER_GUARD_PAGES, I forgot to talk about that. Do we have any reason to protect multiple memory pages? Its purpose is to prevent stack overflows, in my understanding, protecting multiple pages will only waste memory. Could you explain the reason for doing this?

I think using memcpy for the EGs is interesting, but I'd like to know how much performance is gained over a few assignments. This design again increases complexity and makes it more difficult for a newcomer to determine what the code is doing.

Performance is not important, because they are always on some magnitude of time.
I do not think it's difficult for C developers who have begun to study the implementation of fiber.

The PR is noisy due to superfluous changes.

Diff does look terrible for me too... we have ensured that all tests pass and are consistent with the original behavior, I think it may be better if we re-read it instead of looking at the diff. I believe the new code has clearer logic, otherwise refactoring is meaningless...

I'd prefer these changes to come as separate PRs so we can better understand all the changes that are being made.

It's OK for me. I just want to speed up. I always work on open-source projects :)
we still have a lot to do => we still have a lot to do on fiber.
it’s just a bit cumbersome for me to reorganize ideas and re-explain them in English. (It would be great if someone can understand my thoughts and help me with writing RFCs ^^)
I may open new PRs as soon as possible, but I will not close this PR temporarily, I still hope we can merge it directly.

@twose
Copy link
Member Author

twose commented Jun 2, 2021

The internal API is now symmetric, that's why {main} had to be a fiber in this model, as a fiber cannot suspend to main, it must resume main. The user API is asymmetric. This divergence bothers me somewhat.

I still do not think so.
In the original implementation, you just stored the context of the main elsewhere (on the stack) without abstraction, and therefore the logic of switching becomes more complicated.
We can remove the main fiber here (I have done this at the user level), we just need a clearer place to store the context of the main. As you saw, nothing changed, all tests passed and we break nothing. Only the logic is simpler here.
The refactored fiber still does not violate the definition of asymmetric coroutine, the content in the link clarifies the differences between symmetry and asymmetry well.
Correct me if I am wrong.

@trowski
Copy link
Member

trowski commented Jun 3, 2021

Do we have any reason to protect multiple memory pages?

The protected page only triggers an error if memory is accessed. If a large chunk of memory were allocated on the stack it could exceed the size of a single page. I don't see this as a problem in php-src where 1 page should be more than adequate, though I can see the potential for users of a particular extension to use multiple guard pages.

The refactored fiber still does not violate the definition of asymmetric coroutine.

Internally it does, this PR uses a symmetric model as outlined in the link you provided. As mentioned in that link, asymmetric coroutines are the main abstraction provided by Boost.Coroutine2. I feel there is some wisdom in this and hesitate to deviate from that model. Swoole also appears to use asymmetric cotroutines both in its internal and user APIs.

It’s just a bit cumbersome for me to reorganize ideas and re-explain them in English. (It would be great if someone can understand my thoughts and help me with writing RFCs ^^)

I should have time tomorrow and this weekend to start splitting this up into smaller PRs. I'll definitely tag you as co-author on any commits. I'm happy to collaborate on future changes. Language has been a barrier in the past so I'm happy to do what I can to help.

@twose
Copy link
Member Author

twose commented Jun 3, 2021

If a large chunk of memory were allocated on the stack it could exceed the size of a single page.

I think what we need to guard against is recursive calls rather than big memory on the stack, and I did not see the code for multi-page memory protection in other coroutine projects. Maybe you can specify the reference of this idea, it is new to me.
The cost of protecting multiple pages is far greater than the benefit, and we will never know how much is appropriate. Perhaps the most common case is to alloc a large buffer on the stack, although it may push the pointer far beyond the stack boundary, we always write from the beginning, don’t we?

About symmetric and asymmetric, maybe you are entangled in these concepts, I only focused on its actual behaviour, I think these concepts are defined based on the behaviour of specified technological. I don't quite understand what you are worried about . And Swoole has never stated what kind of model its' coroutine is, it do not care.

I should have time tomorrow and this weekend to start splitting this up into smaller PRs.

You may have misunderstood what I meant, writing code is fine for me, I work on open source projects all the time. we still have a lot to do on fiber did not mean that I have no time, I just want to speed up the process of these changes, especially after you have already understood all of these changes ^^

@twose
Copy link
Member Author

twose commented Jun 4, 2021

Trowski : The changes got rather ugly. There were a couple of good ideas and bug fixes baked in there, but a lot of it is opinionated changes or simply a different implementation for the sake of being different.

I hope you can directly comment on my PR here instead of the chat room and I can't speak there.
I think my changes are for reasons, and many of them are conducive to subsequent optimization or expansion.
Projects like Swoole will never benefit from zend_fiber, you just provide APIs that you think are useful. In fact, you refuse to support anything meaningful to Swoole, even at the kernel level. I did not change any user-level behavior, which has no effect on amphp, unless you want to migrate more content of amphp to the PHP core.
You referred to the implementation of this project (https://github.com/kelunik/task/), this is why there are many places in the original code that you don't even understand, and you even never questioned it before you merged it into PHP. This makes me worry about your ability and the future of fiber.
The things that are different only for the sake of difference are what you propose, so many implementations referenced in your RFC such as Ruby, boost.context, Lua, all of them use yield as the method name. But you just use suspend. You named transfer.data as value instead of data. They protect one page but you expect more and it's not limited to these. Why? I do not know.
You always try to prove that we are symmetric and you are asymmetric which I do not think so, you have a language advantage so you can always make others believe what you say, this is indeed my shortcoming, I can't explain these well.
If we can understand each other, your code is ugly to me too, that's why I made so many code changes.
Why we always think Swoole is right? In China, PHP is the favorite programming language of startups, but after they have enough users, migration to golang has become an inevitable trend, only Swoole reversed this trend. Things like amphp won’t save PHP at all, you provide so many different things, we can not use cURL, PDO, mysqli, phpredis, blocking php stream, and all libraries built on them, you completely split the ecology. Why don't we migrate to nodejs/golang? They perform better and even have async file IO, async DNS, DTLS, quic and so on. We may never support these things. We just came from HTTP/1.0 to HTTP/1.1 and encountered problems 🥴. I do repeat my views, but I also think that some of you have never thought about it seriously.
I am very worried that after you tried to dominate the fiber and asynchronous development of the PHP core, PHP will forever lose its competitiveness in the asynchronous areas.

@twose
Copy link
Member Author

twose commented Jun 4, 2021

And It's interesting. Although I made it clear that I am willing to complete these tasks, you still committed part of the code I wrote, and you are so eager. I do not care whether I am co-author, it would be great if you can copy every thought of mine and commit them, then we can make zend_fiber play with Swoole. But you probably wouldn't do it. So, let zend_fiber and amphp have fun.

@twose twose closed this Jun 4, 2021
@twose twose deleted the fiber-refactor branch June 4, 2021 04:37
@thinker-gao
Copy link

The asynchronous programming of php should be more like go's coroutine, single-threaded coroutine is the best, obviously fiber is still on the way to pit, maybe we should learn from the experience of swoole and swow

@php php deleted a comment from yydick Jun 4, 2021
@trowski
Copy link
Member

trowski commented Jun 4, 2021

@twose I apologize for upsetting you, that was not my intention. I know you're capable of breaking up this PR, but I was unsure if you intended to do so, as I and others asked you to do that here and in other discussions. I too want to keep things moving forward, so I committed a portion of the changes as they weren't controversial and had clear advantages. There are other parts of this PR I would like to commit, but not all of them, which is why I wanted to break things into multiple PRs. I hope you'll still open more PRs focused on smaller changes at a time. I do want to work with you in the future, but please understand that we may have disagreements. The language you used feels like an attack on my motivations, and isn't the type of conversation that makes me want to continue trying to put the work in to address your concerns.

@twose
Copy link
Member Author

twose commented Jun 4, 2021

  1. What you say in the chat room is not friendly (at least, after translation, did I misunderstand?).
  2. I never tried to change for the sake of change, and I am also trying to reduce the noise.
  3. I have made some contributions that were not related to Swoole in PHP for years, this is my way of trying to fit in with the PHP core community. But I always feel that some of you define me as people from other camps. What you do makes me feel that you need me to do something but you do not need me. And some of you tried to avoid me by your game rules.
  4. What I mean is that maybe I can persuade you to change your mind so I won’t close this PR for the time being (not forever), but it doesn’t mean I don’t want to split the PR. Actually, I am trying to. And I'm also trying to figure all of the prolems out with you.

@trowski
Copy link
Member

trowski commented Jun 4, 2021

  1. Ugly was not the best word as this may have translated harsher than I intended. Noisy and complex is more in-line with what I meant.
  2. Changing synonymous variable names falls into this category.
  3. Association of people with other projects is common, it's not happening to just you, I promise. 😄
  4. I misunderstood then, but I do prefer the simpler approach in the other PR and plan to merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants