JIT buffer relocation and 2~3% PHP performance gain #8618
@dstogov, do you think this is worth pursuing? (It can't work on Windows, but maybe on other systems.)
I read that blog. Unfortunately, this PR is just a very basic PoC. Currently opcache tries to allocate SHM in the low 2GB using MAP_32BIT. Probably, we may improve this approach by searching for the best candidate for SHM analysing
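For readers unfamiliar with the flag mentioned above, here is a minimal sketch of a `MAP_32BIT` allocation. It is not opcache's actual code: `map_low_2gb` and `fits_below_2gb` are hypothetical helper names, and the flag is assumed to be available only on Linux x86-64, so the helper returns NULL everywhere else.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel for anonymous memory in the low 2 GB.
 * MAP_32BIT is Linux/x86-64 specific; on other targets this returns NULL. */
static void *map_low_2gb(size_t size)
{
#if defined(MAP_32BIT) && defined(__x86_64__)
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)size;
    return NULL;
#endif
}

/* Hypothetical helper: true if [p, p + size) lies entirely below 2 GB. */
static int fits_below_2gb(const void *p, size_t size)
{
    return (uint64_t)(uintptr_t)p + size <= ((uint64_t)1 << 31);
}
```

A caller would check `fits_below_2gb(p, size)` on the returned pointer. Note that a low-2GB mapping is only close to the PHP text segment if the binary itself is loaded at a low address, which is not the case for a typical position-independent executable.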
@dstogov Thanks, Dmitry, for the comments and the good hint toward a better approach. I can work out a more workable patch. However, if you have the bandwidth to develop a quick, mergeable patch, feel free to go ahead of me. The reason: I am quite new to PHP and still ramping up on the PHP source code, so development would probably take me a few months, and I would need to consult you experts from time to time.
@dstogov @cmb69 @ramsey I just pushed the proposed patch, which is ready for review and merge.
This is a JIT buffer relocation inspired by this blog post: https://v8.dev/blog/short-builtin-calls

For 64-bit applications, branch prediction performance can be negatively impacted when the target of a branch is more than 4 GB away from the branch. We try to allocate the opcache/JIT buffer within 4 GB of the PHP text segment by calling mmap() with a calculated preferred memory address while creating segments.

In our benchmark, we found that the PHP interpreter achieved a 2~3% performance gain and much better branching performance with both 2 MB huge pages and ordinary 4 KB pages.

Signed-off-by: Su, Tao <tao.su@intel.com>
Signed-off-by: Wang, Xue <xue1.wang@intel.com>
Tested-by: Wang, Xue <xue1.wang@intel.com>
Reviewed-by: Chen, Hu <hu1.chen@intel.com>
Reviewed-by: You, Lizhen <Lizhen.You@intel.com>
@dstogov @cmb69 @ramsey @arnaud-lb A brand-new patch has been uploaded and has passed all CI checks. Ready for review.
Our benchmark with the latest patch shows a steady performance gain: 1) +2.6% with 4 KB pages, and 2) +3.0% with huge pages.
Thank you @stkeke. This was nice to review. This looks good to me apart from a few questions and nitpicks. (I want to see Dmitry's review as well.) For information, how do you run the benchmarks?
Thanks for the careful review and for the catches and questions. They are valuable for code quality and maintainability. I have briefly answered a few of them and will give you more updates next Monday (Beijing time).
1) fix bugs caught by arnaud-lb and by ourselves
2) unify coding style and convert tabs to spaces
3) remove unnecessary function declarations
4) eliminate duplicated code
5) clarify code with more comments

Arnaud-lb's comments: #8618

Signed-off-by: Su, Tao <tao.su@intel.com>
Reviewed-by: Wang, Xue xue1.wang@intel.com
Tested-by: Wang, Xue xue1.wang@intel.com
@arnaud-lb I created a new patch which includes all the corrections and updates from your comments, plus fixes for minor issues we found ourselves. There are no big changes to the program logic. As for our benchmark, we maintain a PHP benchmark framework based on WordPress/MediaWiki with our best-known PHP configurations. The performance indicator is TPS (transactions per second). Some of the benchmarks are already open source here: https://github.com/intel/iodlr/tree/master/containers/wordpress/; more are on the way this year, and we are speeding up. Sorry for not being able to provide detailed performance data here; so far, we have not received approval from our legal department.
In my tests
@dstogov Thanks for the information. We have also written a simple C test program and verified that mmap()'ing some memory immediately following [heap] does not block heap growth. So we can now confidently search both BEFORE and AFTER the PHP .text segment without worrying about the heap. We will update our patch and enhance the search soon...
…P-relative calls and jumps This implementation is based on php#8618
The same idea is implemented via 17aa81a
This is a JIT buffer relocation inspired by this blog
https://v8.dev/blog/short-builtin-calls
We try to allocate opcache/JIT buffer just prior to PHP .text
segments through mmap() with a calculated preferred memory address
while creating segments.
In our benchmark, we found the PHP interpreter achieved a 2-3% performance
gain and much better branching performance with both 2MB huge pages and
ordinary 4KB pages.
Signed-off-by: Su, Tao tao.su@intel.com
Tested-by: Wang, Xue xue1.wang@intel.com
Reviewed-by: Chen, Hu hu1.chen@intel.com
Reviewed-by: You, Lizhen Lizhen.You@intel.com