Improve performance of mbfl_name2encoding() by using perfect hashing #12707
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
mbfl_name2encoding() uses a linear loop through the encodings, comparing the name one by one, which is very slow. For the benchmark [1] just looking up the name takes about 50% of run-time.
By using perfect hashing instead, we no longer have to loop over the list, and the number of string comparisons is reduced to just a single one. The perfect hashing table is generated using GNU gperf and amended manually to fit in with mbstring and manually changed to reduce the cache size.
Previously the linked benchmark took 2.39s on my system, now it takes 1.50s. This is a nice improvement that can be felt when processing large amounts of data.
[1] #12684 (comment)