Improve performance of mbfl_name2encoding() by using perfect hashing #12707

nielsdos · 2023-11-17T16:44:49Z

mbfl_name2encoding() uses a linear loop through the encodings, comparing the name one by one, which is very slow. In the benchmark [1], looking up the name alone takes about 50% of the run time.
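For illustration, a linear name lookup of the kind described above might look like the following simplified sketch. The `encoding_t` struct, the `encodings` table (a small illustrative subset of names, not mbstring's actual list), and the `name_eq_ci()` and `linear_lookup()` helpers are hypothetical. The real mbfl_name2encoding() works on mbfl_encoding structures and may also check MIME names and aliases. The sketch only shows the cost: a miss, or a name near the end of the list, costs one case-insensitive comparison per entry.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified encoding descriptor (not mbfl_encoding). */
typedef struct {
    const char *name;
} encoding_t;

/* Illustrative subset of names, not mbstring's real encoding list. */
static const encoding_t encodings[] = {
    {"ASCII"}, {"UTF-8"}, {"UTF-16"}, {"ISO-8859-1"},
    {"Windows-1252"}, {"SJIS"}, {"EUC-JP"},
};
#define NUM_ENCODINGS (sizeof(encodings) / sizeof(encodings[0]))

/* ASCII case-insensitive equality, like strcasecmp(a, b) == 0. */
static int name_eq_ci(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Linear scan: up to NUM_ENCODINGS comparisons per call.
 * *comparisons receives the number of comparisons performed. */
static const encoding_t *linear_lookup(const char *name, size_t *comparisons)
{
    *comparisons = 0;
    for (size_t i = 0; i < NUM_ENCODINGS; i++) {
        (*comparisons)++;
        if (name_eq_ci(encodings[i].name, name)) {
            return &encodings[i];
        }
    }
    return NULL;
}
```

With this layout, looking up "EUC-JP" or an unknown name walks the whole table, so the cost grows with the number of supported encodings.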

By using perfect hashing instead, we no longer have to loop over the list, and only a single string comparison is needed per lookup. The perfect hashing table is generated using GNU gperf and then amended manually, both to fit in with mbstring and to reduce its cache footprint.
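A minimal sketch of the perfect-hashing idea, assuming a tiny hand-picked key set rather than mbstring's real list. The hash below (string length plus the lowercased first and last characters, modulo 16) happens to send each of these six names to a distinct slot. A lookup therefore computes one hash, does a length bounds check, and performs at most one case-insensitive string comparison. This is not gperf's generated output; gperf derives a comparable function automatically by choosing association values for selected character positions across the full key set. The names `slots`, `name_hash()`, `name_eq_ci()` and `perfect_lookup()` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16
#define MIN_LEN 4   /* shortest key: "sjis" */
#define MAX_LEN 10  /* longest key: "iso-8859-1" */

/* Each key sits in the slot given by name_hash(); no two keys collide. */
static const char *const slots[TABLE_SIZE] = {
    [1] = "utf-16",     /* 6 + 'u' + '6' = 177, 177 % 16 = 1   */
    [2] = "utf-8",      /* 5 + 'u' + '8' = 178, 178 % 16 = 2   */
    [4] = "iso-8859-1", /* 10 + 'i' + '1' = 164, 164 % 16 = 4  */
    [10] = "sjis",      /* 4 + 's' + 's' = 234, 234 % 16 = 10  */
    [11] = "euc-jp",    /* 6 + 'e' + 'p' = 219, 219 % 16 = 11  */
    [15] = "ascii",     /* 5 + 'a' + 'i' = 207, 207 % 16 = 15  */
};

/* ASCII case-insensitive equality, like strcasecmp(a, b) == 0. */
static int name_eq_ci(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Length plus lowercased first and last characters; requires len >= 1. */
static unsigned name_hash(const char *name, size_t len)
{
    unsigned first = (unsigned)tolower((unsigned char)name[0]);
    unsigned last = (unsigned)tolower((unsigned char)name[len - 1]);
    return (unsigned)((len + first + last) % TABLE_SIZE);
}

/* Returns the stored key, or NULL if name is not one of the keys. */
static const char *perfect_lookup(const char *name)
{
    size_t len = strlen(name);
    if (len < MIN_LEN || len > MAX_LEN) {
        return NULL; /* cheap rejection before hashing */
    }
    const char *candidate = slots[name_hash(name, len)];
    /* At most one string comparison, regardless of the number of keys. */
    if (candidate != NULL && name_eq_ci(candidate, name)) {
        return candidate;
    }
    return NULL;
}
```

A non-key that lands on an occupied slot, such as "bscih" (which hashes to the slot of "ascii"), is rejected by that single comparison, so correctness does not depend on the hash alone.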

Previously the linked benchmark took 2.39s on my system, now it takes 1.50s. This is a nice improvement that can be felt when processing large amounts of data.

[1] #12684 (comment)

alexdowad · 2023-11-17T17:51:59Z

Nice job!!

I haven't looked at this code for a while, but I seem to recall that it cached the last looked-up encoding and checked if the next requested one was the same using something like strcmp. Or did it just use a pointer equality comparison on the string (so the optimization would only work if the encoding name was interned)? Don't remember. Or perhaps I am thinking about something completely different?

nielsdos · 2023-11-17T18:38:33Z

Thanks for the review, Alex!
The caching seems to be for calls to php_mb_get_encoding, and it only caches the last used encoding.
I'm not sure if the benchmark actually calls php_mb_get_encoding, but even if it does, it is alternating between different encodings so it doesn't benefit from the caching.
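The single-entry cache discussed above can be modeled roughly as follows. This is a hypothetical sketch, not the actual php_mb_get_encoding() code: `one_entry_cache`, `cached_lookup()` and `fake_lookup()` are illustrative names. The sketch compares names with strcmp(); a pointer-equality check, as Alex wondered about, would only hit when the very same (for example interned) string is passed again. It shows why a workload that alternates between encodings never hits a cache that remembers only the last name.

```c
#include <string.h>

/* Hypothetical one-entry cache: remembers only the most recent name. */
typedef struct {
    char last_name[32];
    int last_id;
    int valid;
    unsigned hits;
    unsigned misses;
} one_entry_cache;

typedef int (*lookup_fn)(const char *name);

static int cached_lookup(one_entry_cache *c, const char *name, lookup_fn slow)
{
    /* Hit only if this name equals the immediately preceding one. */
    if (c->valid && strcmp(c->last_name, name) == 0) {
        c->hits++;
        return c->last_id;
    }
    c->misses++;
    int id = slow(name);
    /* Overwrite the single entry; names too long for the buffer are not cached. */
    if (strlen(name) < sizeof c->last_name) {
        strcpy(c->last_name, name);
        c->last_id = id;
        c->valid = 1;
    } else {
        c->valid = 0;
    }
    return id;
}

/* Stand-in for the real (slow) lookup; returns an arbitrary id. */
static int fake_lookup(const char *name)
{
    return (int)strlen(name);
}
```

Alternating "UTF-8", "SJIS", "UTF-8", "SJIS" produces four misses and no hits; only a repeated name (for example "SJIS" twice in a row) hits. For such workloads the underlying lookup itself has to be fast.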

nielsdos requested a review from alexdowad as a code owner November 17, 2023 16:44

github-actions bot added the Extension: mbstring label Nov 17, 2023

nielsdos mentioned this pull request Nov 17, 2023 in PHP8.1 Slow processing speed of string operations #12684 (Closed)

alexdowad approved these changes Nov 17, 2023

nielsdos merged commit 7658220 into php:master Nov 17, 2023