-
Notifications
You must be signed in to change notification settings - Fork 7.9k
More performance increases for mbstring functions on UTF-8 text #10313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I must say I was hoping for better than a 3x speedup here... sigh. |
Can you please fix CI errors? Does this need new unit tests or the existing ones cover it? |
A 3x speedup looks rather considerable to me. :) |
Let me do that. This change is well covered by (extensive) existing unit tests. |
Yeah, it is... it would have been nice to get 10x or better, but it's still decent. @cmb69 @nikic @Girgias @dstogov Do we have anything in PHP like what the Linux kernel does, whereby it does a CPUID on startup, checks what ISA extensions the host CPU supports, and patches in accelerated versions of some functions based on the result? If not, are you interested in doing something like that? It makes it possible for packagers to create binaries with broad support for older CPUs, while still taking advantage of ISA extensions on newer CPUs. |
See https://github.com/php/php-src/blob/master/Zend/zend_cpuinfo.h. (The obvious drawback of using this is that it's unlikely that all variations are tested in CI.) |
This is clearly out of my depth here as I have no idea what any of these SSE2 instruction do, so I will trust the test suite and everything works |
The intention here is that any experienced programmer should be able to understand what is going on by reading the code comments, without having to consult the Intel intrinsics documentation. Are there some places where I could make the comments clearer? |
I don't think that the concern is whether the comments are understandable. The nature of code comments is that they often lie. Not on purpose but because code comments describe what the author intended to do, not necessarily what the code actually does. It would be good if an SSE expert could verify that the functions you used actually do what the code comments say they should do. |
Cool, thanks! It looks like there are just a couple users of this feature, including Base64 encoding/decoding, CRC32, and a couple of built-in string functions. Just noticed something really interesting here... Lines 3878 to 3931 in abc41c2
Although this accelerated function is only used if the host CPU supports SSE4.2... there are no SSE4.2 instructions there! Everything is just SSE2, which (as @nielsdos kindly educated me the other day) is automatically supported by all x86-64 CPUs. |
Off topic, but this is an interesting read: Convert a String In C++ To Upper Case |
Yeah, unfortunately a lack of developer bandwidth.
I think you are right! Would be great if we could actually test that, but it may be very hard to find a machine which only supports SSE2, but no other SIMD extensions nowadays. |
You can test that statically on GCC at compile time by passing in |
Nice suggestion, I may try that. |
I used both the compiler trick and Intel's Intrinsics Guide and can confirm that that stripslashes function indeed only relies on SSE2 and not SSE4.x |
…UTF-8 The new SSE2-based implementation of mb_check_encoding for UTF-8 is about 10% faster for 0-5 byte strings, more than 3 times faster for ~100-byte strings, and just under 4 times faster for ~10,000-byte strings. I believe it may be possible to make this function much faster again. Some possible directions for further performance optimization include: • If other ISA extensions like AVX or AVX-512 are available, use a similar algorithm, but process text in blocks of 32 or 64 bytes (instead of 16 bytes). • If other SIMD ISA extensions are available, use the greater variety of available instructions to make some of the checks tighter. • Even if only SSE/SSE2 are available, find clever ways to squeeze instructions out of the hot path. This would probably require a lot of perusing instruction mauals and thinking hard about which SIMD instructions could be used to perform the same checks with fewer instructions. • Find a better algorithm, possibly one where more checks could be combined (just as the current algorithm combines the checks for certain overlong code units and reserved codepoints).
On longer strings, this gives a small speed boost of 10% or less.
I just tried fuzzing this accelerated UTF-8 validation function to look for cases where its output differs from the unaccelerated one... and guess what, the fuzzer found a bug. After fixing the bug, I was able to run the fuzzer for an extended period of time without finding any other bugs. |
I have added a regression test for the bug which was found by the fuzzer. Since there are some code changes, I will leave this for a bit longer to see if anyone wants to review again, but if not, I will land on |
Looks like no more comments are forthcoming... Landed on Many thanks to all who commented. I'm not sure if I will clean up the issue with |
Alex Dowad noticed[1] that the SIMD stripslashes implementation actually only depended on SSE2, and not on SSE4.2 instructions. Remove the checking for SSE4.2 and only check for SSE2. This also greatly simplifies the supporting code. [1] php#10313 (comment)
Alex Dowad noticed[1] that the SIMD stripslashes implementation actually only depended on SSE2, and not on SSE4.2 instructions. Remove the checking for SSE4.2 and only check for SSE2. This also greatly simplifies the supporting code. [1] php#10313 (comment)
Alex Dowad noticed[1] that the SIMD stripslashes implementation actually only depended on SSE2, and not on SSE4.2 instructions. Remove the checking for SSE4.2 and only check for SSE2. This also greatly simplifies the supporting code. [1] #10313 (comment)
FYA @cmb69 @Girgias @nikic @kamil-tekiela @youkidearitai