Skip to content

Fix the BOLT problem with GCC10. #8495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

pingzhaozz
Copy link

BOLT method can bring PHP 3% and even more performance improvement. While BOLT requires CFLAGS ‘-fno-reorder-blocks-and-partition’ from GCC9. More information can be found at following link:
https://github.com/facebookincubator/BOLT
facebookarchive/BOLT#288

In GCC7, ‘reorder-blocks-and-partition’ is disable by default. So there’s no problem by default with GCC7.
In GCC10, ‘reorder-blocks-and-partition’ is enabled by default. We need set CFLAGS ‘-fno-reorder-blocks-and-partition’ when using BOLT. But there’s still a bug in code which ignore the CFLAGS with ‘GCC push_options’ like below example:

#if (ZEND_VM_KIND != ZEND_VM_KIND_CALL) && (ZEND_GCC_VERSION >= 4000) && !defined(__clang__)
# pragma GCC push_options
# pragma GCC optimize("no-gcse")
# pragma GCC optimize("no-ivopts")
#endif
ZEND_API void execute_ex(zend_execute_data *ex)

This patch is to fix this issue.

BOLT requires adding GCC option "-fno-reorder-blocks-and-partition".
In some codes, it will be omitted because "GCC push_options".
This patch is to handle this case correctly.
@arnaud-lb
Copy link
Member

Thank you for this pull request.

This would need a benchmark comparing the speed before, after-without-bold, and after-with-bolt so that this could be merged.

https://github.com/kocsismate/php-version-benchmarks could be used for that.

@@ -2934,6 +2934,7 @@ enum XXH_VECTOR_TYPE /* fake enum */ {
&& defined(__OPTIMIZE__) && !defined(__OPTIMIZE_SIZE__) /* respect -O0 and -Os */
# pragma GCC push_options
# pragma GCC optimize("-O2")
# pragma GCC optimize("no-reorder-blocks-and-partition")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of these changes actually needed, or only this one (because it overrides with -O2)? If it's only this one that's really necessary, I think we can just drop the GCC optimize usage here entirely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Seems -O2 is not necessary here, we should remove this logic.

@pingzhaozz
Copy link
Author

pingzhaozz commented May 12, 2022

Thank you for this pull request.

This would need a benchmark comparing the speed before, after-without-bold, and after-with-bolt so that this could be merged.

https://github.com/kocsismate/php-version-benchmarks could be used for that.

Thanks for the comment. I tried to run the php-version-benchmarks with PHP8.1, but got 2 problems:

  1. Container test use debian:buster which seems not support GCC 10.
  2. With host mode test, all the results look very similar. I suppose the load is not high enough to see bolt benefit.

Total 0.629029
Elapsed time: 9.438301 sec

Could you tell me how to run php-version-benchmarks with more loadings? From the README I only find run command : ./benchmark.sh run local with test 4_bench.php

@kocsismate
Copy link
Member

I'll update the benchmark to use GCC 10 both in Docker and on AWS. Then I'll benchmark this branch on AWS against the latest master commit your branch is based on (pingzhaozz@829f297).

@kocsismate
Copy link
Member

I was not able to observe any change when "no-reorder-blocks-and-partition" is set. Can it be related to the fact that I ran the benchmark on ARM?

The results are available at https://gist.github.com/kocsismate/58853e9ab0faaebb6112287d880170ea

I suppose the load is not high enough to see bolt benefit.

Why load is important here? The aim of my benchmark suite is to reveal performance changes even if they are subtle, but this is not suitable for load testing.

@pingzhaozz
Copy link
Author

Add no-reorder-blocks-and-partition is for BOLT building, while BOLT is more complex than that. Your result perhaps shows after-without-bolt.

The detail BOLT step is as below:
https://github.com/facebookincubator/BOLT/tree/main/bolt

Load test is to find out the hot codes which to do the BOLT later.

@pingzhaozz
Copy link
Author

Is there any comments from review?

@devnexen devnexen requested a review from arnaud-lb June 28, 2022 18:21
@arnaud-lb
Copy link
Member

@pingzhaozz as per #8495 (comment), does it work if you remove all the pragma GCC optimize("no-reorder-blocks-and-partition"), and pragma GCC optimize("-O2") ?

@pingzhaozz
Copy link
Author

pingzhaozz commented Jul 1, 2022

@pingzhaozz as per #8495 (comment), does it work if you remove all the pragma GCC optimize("no-reorder-blocks-and-partition"), and pragma GCC optimize("-O2") ?

@arnaud-lb
For BOLT case, BOLT only works if either remove 'pragma GCC push_options' in the code or changes like this PR.
For non-BOLT case, I don't see impact of the change.

@arnaud-lb
Copy link
Member

But does it work if you don't add pragma GCC optimize("no-reorder-blocks-and-partition") in the code, and just remove the pragma GCC optimize("-O2") ?

@pingzhaozz
Copy link
Author

But does it work if you don't add pragma GCC optimize("no-reorder-blocks-and-partition") in the code, and just remove the pragma GCC optimize("-O2") ?

It's not. BOLT requires GCC option -fno-reorder-blocks-and-partition.

@@ -4,6 +4,7 @@
# pragma GCC push_options
# pragma GCC optimize("no-gcse")
# pragma GCC optimize("no-ivopts")
# pragma GCC optimize("no-reorder-blocks-and-partition")
Copy link
Member

@dstogov dstogov Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag -fno-reorder-blocks-and-partition should be passed to CC through CFLAGS in Makefile.

  • Using this line here won't disable this optimization for other files.
  • This disables useful optimization for usual PHP builds

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to change PHP build system to optionally add -fno-reorder-blocks-and-partition flag. It's also possible to pass it through CC or CFLAGS environment variables without any changes.

@pingzhaozz
Copy link
Author

pingzhaozz commented Oct 9, 2023

It's possible to change PHP build system to optionally add -fno-reorder-blocks-and-partition flag. It's also possible to pass it through CC or CFLAGS environment variables without any changes.

Because of this line:

# pragma GCC push_options

-fno-reorder-blocks-and-partition flag won't work for some functions in building. This error in BOLT is hard to root cause. So I manually added the flag back as this PR.

@dstogov
Copy link
Member

dstogov commented Oct 9, 2023

# pragma GCC push_options

This line shouldn't enable reorder-blocks-and-partition optimization if it was disabled by -fno-reorder-blocks-and-partition flag. I wander how these lines way affect BOLT (I understand how this optimization may make troubles). Can you explain? Did you say this is a "error in BOLT"? I didn't understand.

Anyway, we can't disable optimizations and make PHP slower to make BOLT work.

In worst case, this new pragma GCC optimize("no-reorder-blocks-and-partition") might be wrapped with #ifdef HAVE_BOLT and -DHAVE_BOLT CFLAGS added.

@pingzhaozz
Copy link
Author

pingzhaozz commented Oct 9, 2023

Let me go back to the beginning what happened when using BOLT. BOLT requires CFLAGS with -fno-reorder-blocks-and-partition from GCC9(because GCC9 enables -freorder-blocks-and-partition by default). Here's the original discussion: facebookarchive/BOLT#288

When I tried to use BOLT in PHP, I met the problem that some functions(such as those changed in this PR) doesn't work because of

# pragma GCC push_options

which would overwrite the CFLAGS like -fno-reorder-blocks-and-partition so that BOLT would fail with PHP since GCC9.

I agree that pragma GCC optimize("no-reorder-blocks-and-partition") might be wrapped with #ifdef HAVE_BOLT and -DHAVE_BOLT CFLAGS added. I'll update the PR if you agreed it's a proper way to solve this issue.

@dstogov
Copy link
Member

dstogov commented Oct 11, 2023

Here's the original discussion: facebookarchive/BOLT#288

This doesn't talk anything about pragmas.

I still think that something is wrong...
Are you sure you need adding pragma GCC optimize("no-reorder-blocks-and-partition") in all these places?

It should be necessary only in ext/hash/xxhash/xxhash.h

@pingzhaozz
Copy link
Author

Here's the original discussion: facebookarchive/BOLT#288

This doesn't talk anything about pragmas.

I still think that something is wrong... Are you sure you need adding pragma GCC optimize("no-reorder-blocks-and-partition") in all these places?

It should be necessary only in ext/hash/xxhash/xxhash.h

The link is the discussion about BOLT requires -fno-reorder-blocks-and-partition CFLAGS history. It explains where is this CFLAGS requirement from.

Since BOLT detects the hot code and move them, it depends on your workloads which functions will be hot. So it could be only xxhash.h in your case and like crc32.c in other cases.

@dstogov
Copy link
Member

dstogov commented Oct 16, 2023

In ext/hash/xxhash/xxhash.h you can see -O2 that may enable reorder-blocks-and-partition and I can understand the addition of -fno-reorder-blocks-and-partition. crc32 and Zend/zend_vm_execute.skl don't enable reorder-blocks-and-partition, so I don't understand why you disable it.

…e it's for BOLT only.

You can use EXTRA_CFLAGS="-DHAVE_BOLT ..." make -j to enable this.

Signed-off-by: Ping Zhao <ping.zhao@intel.com>
@pingzhaozz
Copy link
Author

In ext/hash/xxhash/xxhash.h you can see -O2 that may enable reorder-blocks-and-partition and I can understand the addition of -fno-reorder-blocks-and-partition. crc32 and Zend/zend_vm_execute.skl don't enable reorder-blocks-and-partition, so I don't understand why you disable it.

Other files will use default GCC option settings after GCC push_options , and for GCC9 reorder-blocks-and-partition will be enabled by default.(not by -O2) So we also need to manually disable it.

I updated code with #ifdef HAVE_BOLT, so that user can use EXTRA_CFLAGS="-DHAVE_BOLT ..." make -j to enable it or not.

@dstogov
Copy link
Member

dstogov commented Oct 16, 2023

Other files will use default GCC option settings after GCC push_options , and for GCC9 reorder-blocks-and-partition will be enabled by default.(not by -O2) So we also need to manually disable it.

GCC push_options should not reset GCC options to default. It should only remember the current setting to restore them on #pragma GCC pop_options.

I updated code with #ifdef HAVE_BOLT, so that user can use EXTRA_CFLAGS="-DHAVE_BOLT ..." make -j to enable it or not.

Did you test it? There is a merge conflict...

I've tried to test BOLT...

  • I built main from github main branch.
  • configured PHP muster branch with CC='gcc -fno-reorder-blocks-and-partition' CXX='g++ -fno-reorder-blocks-and-partition'
  • Added --emit-relocs PHP Makefile EXTRA_LDFLAGS_PROGRAM = -Wl,--emit-relocs ...
  • built PHP (without this patch yet)
  • run the load perf record -e cycles:u -j any,u -o perf.data -- sapi/cgi/php-cgi -d opcache.jit=0 -T1000 /var/www/html/bench/wordpress/index.php > /dev/null
  • instrumented with bolt llvm-bolt sapi/cgi/php-cgi -instrument -o sapi/cgi/php-cgi.bolt I didn't see any reorder-blocks-and-partition related warnings
  • tried to run sapi/cgi/php-cgi.bolt -d opcache.jit=0 -T1000 /var/www/html/bench/wordpress/index.php > /dev/null

Segmentation fault (core dumped)

  • now I applied your patch, rebuilt PHP with -DHAVE_BOLT, rerun perf and instrumentation. The result is the same:
$ gdb --args sapi/cgi/php-cgi.bolt -d opcache.jit=0 -T1000 /var/www/html/bench/wordpress/index.php
(gdb) r
Program received signal SIGSEGV, Segmentation fault.
0x000000000264f706 in sapi_register_post_entry ()
(gdb) bt
#0  0x000000000264f706 in sapi_register_post_entry ()
#1  0x000000000264f8fc in sapi_register_post_entries ()
#2  0x0000000002655544 in php_setup_sapi_content_types ()
#3  0x0000000001e44287 in main ()

What did I do wrong?
The patch seems don't fix the bolt build.

@pingzhaozz
Copy link
Author

pingzhaozz commented Oct 17, 2023

GCC push_options should not reset GCC options to default. It should only remember the current setting to restore them on #pragma GCC pop_options.

GCC push_options will push the CFLAGS -fno-reorder-blocks-and-partition until restore it on #pragma GCC pop_options. The problem happened between push and pop, where CFLAGS is reset to 'default'(no CFLAGS setting working, like -fno-reorder-blocks-and-partition)

run the load perf record -e cycles:u -j any,u -o perf.data -- sapi/cgi/php-cgi -d opcache.jit=0 -T1000 /var/www/html/bench/wordpress/index.php > /dev/null

I think this workload is too small for perf to record enough data. I record perf data at least for 30s with php-fpm. This is my build cmd for reference:

$EXTRA_CFLAGS="-DHAVE_BOLT -g -fno-reorder-blocks-and-partition"  LDFLAGS="-Wl,--emit-relocs,-znow"  make -j
$perf record -e cycles:u -j any,u -a -o perf-php-fpm.fdata
$llvm-bolt  ./php-fpm  -o ./php-fpm.bolt  -data=/workspace/perf-php-fpm.fdata  -reorder-blocks=cache+  -reorder-functions=hfsort+  -split-functions=3  -split-all-cold  -split-eh  -dyno-stats

I checked both PHP 7.4.29 which I found this issue and PHP 8.1.4 which I submitted this PR, both works after change. I didn't find any tag for 8.2, could you tell me how to pull a 8.2 taged version?

@pingzhaozz
Copy link
Author

A quick check on PHP 8.2.6 is ok, but fail with 8.2.10 which use different GCC/GLIBC version that isn't compatible with my environment.

@dstogov
Copy link
Member

dstogov commented Oct 18, 2023

I understood my mistake. I run bolt -instrument...

Now I followed your build instructions and it seems work out of the box (without this patch).
BOLT makes about 4% speedup on the trained Wordpress request(s).

I'm not sure if your patch is necessary now.
May be the problem is related to GCC or BOLT versions. I use GCC 12.3.1 and BOLT built from the main LLVM branch.

What kind of error(s) do you see without the patch?

@pingzhaozz
Copy link
Author

The BOLTed binary will crash.
If GCC/BOLT fix the -fno-reorder-blocks-and-partition problem, we don't need this patch anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants