Skip to content

Implement INIT_FCALL IS_PTR specialization #13518

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented Feb 26, 2024

Alternative to GH-13445.

@dstogov Let me know if you prefer this approach.

@iluuu1994
Copy link
Member Author

I also added support for user functions just now, it wasn't very hard. After this, the optimizer may switch to IS_PTR directly if the right function is discovered later.

Methods may be a bit more tricky, since they may be loaded out of shm, and then persisted again with inheritance cache. The pointers will need to be corrected in both cases.

@iluuu1994
Copy link
Member Author

I did some more benchmarking, and the user function spec does not seem profitable. That's likely because we need to check whether the runtime cache is initialized.

Details

repeat 50 { taskset --cpu-list 0 sapi/cgi/php-cgi -d opcache.enable=1 -T10,100 benchmark/repos/wordpress-6.2/index.php > /dev/null }

Neither:
3.762726 + 3.768535 + 3.771041 + 3.793699 + 3.805442 + 3.815612 + 3.837482 + 3.838954 + 3.860690 + 3.865533 + 3.866347 + 3.871582 + 3.872912 + 3.895969 + 3.896959 + 3.911085 + 3.923052 + 3.927516 + 3.935412 + 3.939109 + 3.939473 + 3.947130 + 3.950068 + 3.958032 + 3.958504 + 3.961674 + 3.964286 + 3.967680 + 3.967994 + 3.970991 + 3.972464 + 3.974816 + 3.984622 + 3.989775 + 3.995528 + 4.007374 + 4.007675 + 4.007903 + 4.013771 + 4.014148 + 4.014702 + 4.019993 + 4.022132 + 4.026598 + 4.039633 + 4.043646 + 4.080778 + 4.082001 + 4.111558 + 4.113607
Mean: 3.94536426

User:
3.773289 + 3.786341 + 3.805858 + 3.807011 + 3.827118 + 3.845064 + 3.847427 + 3.865596 + 3.871730 + 3.880164 + 3.883483 + 3.901640 + 3.907380 + 3.908480 + 3.912329 + 3.921006 + 3.921376 + 3.922029 + 3.923276 + 3.930067 + 3.931526 + 3.932408 + 3.932424 + 3.936956 + 3.940837 + 3.943710 + 3.946612 + 3.957370 + 3.962609 + 3.965309 + 3.968274 + 3.974549 + 3.978358 + 3.979959 + 3.981013 + 3.990681 + 3.991379 + 3.992361 + 3.999986 + 4.000727 + 4.003596 + 4.010576 + 4.019616 + 4.022690 + 4.026571 + 4.036277 + 4.046699 + 4.057316 + 4.065969 + 4.121929
Mean: 3.94317902

Internal:
3.481124 + 3.714152 + 3.745230 + 3.761990 + 3.764559 + 3.784365 + 3.791994 + 3.801843 + 3.812558 + 3.813365 + 3.818387 + 3.836053 + 3.840914 + 3.847869 + 3.850975 + 3.852203 + 3.855373 + 3.858980 + 3.869898 + 3.882274 + 3.888404 + 3.893990 + 3.917226 + 3.920044 + 3.920878 + 3.921135 + 3.927452 + 3.939740 + 3.940730 + 3.944614 + 3.946942 + 3.956156 + 3.956342 + 3.959493 + 3.963957 + 3.968221 + 3.971289 + 3.978916 + 3.982685 + 3.983679 + 3.990739 + 3.997215 + 4.017127 + 4.019084 + 4.027841 + 4.035953 + 4.050415 + 4.064969 + 4.083900 + 4.092450
Mean: 3.90491384

Both:
3.547845 + 3.699903 + 3.721320 + 3.776205 + 3.796750 + 3.804656 + 3.809363 + 3.813012 + 3.813258 + 3.814128 + 3.834106 + 3.847160 + 3.851038 + 3.851900 + 3.855668 + 3.858115 + 3.859298 + 3.860511 + 3.861609 + 3.868758 + 3.890588 + 3.894022 + 3.901472 + 3.902939 + 3.903822 + 3.906937 + 3.907018 + 3.907107 + 3.909948 + 3.917986 + 3.927443 + 3.929114 + 3.930116 + 3.933649 + 3.940895 + 3.942536 + 3.944273 + 3.948556 + 3.962118 + 3.962158 + 3.981574 + 3.989901 + 3.991792 + 4.000190 + 4.009081 + 4.011751 + 4.015655 + 4.017385 + 4.043893 + 4.058971
Mean: 3.89454986

I think we may have more luck with methods, because they perform a polymorphic cache slot check, along with a staticness check. Both of these can be avoided, while we still need a runtime cache init check for the callee.

@iluuu1994
Copy link
Member Author

iluuu1994 commented Mar 3, 2024

@dstogov Ok, please have a look at this alternative implementation. It adjusts zend_vm_gen.php so that some specialized handlers are used even without opcache. zend_vm_get_opcode_handler_spec_ex, which picks specialized handlers, is inlined using either the actual type info, or with MAY_BE_ANY|.... This will get rid of most of these conditions at compile time when the information isn't actually available.

I'm not sure if this approach is better. What it did make trivial is adding another specialized handler for internal calls that drops init_func_run_time_cache. With the other approach, I would not add this optimization. Instead, I would either completely drop user functions, or only allow them for functions with zend_op_array.cache_size == 0. The majority of the improvement came from internal functions in my benchmarks.

Also, I benchmarked this for method calls, but unfortunately they weren't really profitable.

@dstogov
Copy link
Member

dstogov commented Mar 4, 2024

I'm not sure if this approach is better. What it did make trivial is adding another specialized handler for internal calls that drops init_func_run_time_cache. With the other approach, I would not add this optimization. Instead, I would either completely drop user functions, or only allow them for functions with zend_op_array.cache_size == 0. The majority of the improvement came from internal functions in my benchmarks.

Also, I benchmarked this for method calls, but unfortunately they weren't really profitable.

I'm not sure if it makes sense to implement this idea for internal functions only.
The patch is not very complex, and removing support for user functions would make it even simpler (remove weird xlat_ptrs), but anyway it misses filecache support and makes more fragmentation for Windows.

@iluuu1994
Copy link
Member Author

Ok. So do you think the approach with zend_vm_gen.php is ok? I didn't benchmark without opcache, I can do that to make sure there's no regression. I would expect almost all checks to be elided.

@dstogov
Copy link
Member

dstogov commented Mar 4, 2024

Ok. So do you think the approach with zend_vm_gen.php is ok?

I don't see problems with zend_vm_gen.php modification.

I didn't benchmark without opcache, I can do that to make sure there's no regression. I would expect almost all checks to be elided.

I wouldn't care about +/-1% difference without opcache.

@iluuu1994
Copy link
Member Author

weird xlat_ptrs

Do you have a suggestion on how to improve this? I kept the name generic in case this is ever used for something else (e.g. class pointers). I can make the name more specific too if it helps with clarity.

@iluuu1994
Copy link
Member Author

Results w/o opcache: ≈ +0.6%. That seems acceptable.

Details

repeat 50 { taskset --cpu-list 0 php-cgi -d opcache.enable=0 -T50 benchmark/repos/symfony-demo-2.2.3/public/index.php > /dev/null }

Before:
5.179320 + 5.180669 + 5.181782 + 5.182677 + 5.182747 + 5.182867 + 5.183959 + 5.184205 + 5.184520 + 5.184898 + 5.184955 + 5.185163 + 5.185524 + 5.185919 + 5.186071 + 5.186154 + 5.186170 + 5.186516 + 5.186654 + 5.186699 + 5.186725 + 5.186754 + 5.186824 + 5.186845 + 5.187235 + 5.187375 + 5.187570 + 5.187602 + 5.187783 + 5.187903 + 5.188775 + 5.188818 + 5.189023 + 5.189539 + 5.189575 + 5.190156 + 5.190673 + 5.191029 + 5.191423 + 5.191448 + 5.193231 + 5.194073 + 5.194680 + 5.194789 + 5.196229 + 5.198124 + 5.199429 + 5.201007 + 5.202640 + 5.223372
Mean: 5.18916236

After:
5.182583 + 5.182629 + 5.182842 + 5.183502 + 5.183816 + 5.183834 + 5.184461 + 5.184504 + 5.184643 + 5.184873 + 5.185626 + 5.185970 + 5.186039 + 5.186207 + 5.187356 + 5.187569 + 5.187717 + 5.187831 + 5.188137 + 5.188225 + 5.188314 + 5.188420 + 5.188450 + 5.188572 + 5.188796 + 5.188857 + 5.189181 + 5.189486 + 5.189542 + 5.189784 + 5.189879 + 5.189961 + 5.190795 + 5.191420 + 5.191594 + 5.191917 + 5.192271 + 5.192301 + 5.192666 + 5.193100 + 5.193195 + 5.193430 + 5.196551 + 5.196732 + 5.197515 + 5.197951 + 5.201427 + 5.206127 + 5.208891 + 5.304555
Mean: 5.19220088

@dstogov
Copy link
Member

dstogov commented Mar 5, 2024

weird xlat_ptrs

Do you have a suggestion on how to improve this? I kept the name generic in case this is ever used for something else (e.g. class pointers). I can make the name more specific too if it helps with clarity.

In case of internal functions only, we don't need these arrays at all.

@dstogov
Copy link
Member

dstogov commented Mar 5, 2024

As I already said, I'm not sure if this INIT_FCALL optimization makes sense for internal functions only.

We already thought about similar optimization long time ago, but decided that it doesn't cost the complication and fragmentation.

~0.8% percent improvement on real life apps with internal functions only looked promising, but as you discovered this idea doesn't work for user functions and methods.

@iluuu1994
Copy link
Member Author

Closed in favor of #13634.

@iluuu1994 iluuu1994 closed this Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants