-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Proposal: Optimize creation of empty arrays in json_decode #4861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Use the shared empty array from ZVAL_EMPTY_ARRAY For code that created an 10 arrays of 100000 empty arrays (has the same result with `$assoc=true` and `{}`) - This is the worst-case comparison, but I'd expect 0-length arrays to be fairly common in regular data for json_decode - The parser implementation was using function pointers so that third party extension developers could reuse the json parser for their own data structures, etc. (I think). This PR is meant to let those third party extensions continue working without changes. Before this patch: In 0.126 seconds: added 97.99 MiB After this patch: In 0.096 seconds: added 41.99 MiB ```php <?php $json = '[' . str_repeat('[],', 100000) . "null]"; $start_memory = memory_get_usage(); $start_time = microtime(true); $result = []; for ($i = 0; $i < 10; $i++) { $result[] = json_decode($json); } $end_memory = memory_get_usage(); $end_time = microtime(true); // Before this patch: In 0.126 seconds: added 97.99 MiB // After this patch: In 0.096 seconds: added 41.99 MiB printf("In %.3f seconds: added %.2f MiB\n", $end_time - $start_time, ($end_memory - $start_memory)/1000000); // For objects $json = '[' . str_repeat('{},', 100000) . "null]"; $start_memory = memory_get_usage(); $start_time = microtime(true); for ($i = 0; $i < 10; $i++) { $result[] = json_decode($json, true); } $end_memory = memory_get_usage(); $end_time = microtime(true); // Before this patch: In 0.126 seconds: added 97.99 MiB // After this patch: In 0.096 seconds: added 41.99 MiB printf("In %.3f seconds: added %.2f MiB (objects decoded as arrays) \n", $end_time - $start_time, ($end_memory - $start_memory)/1000000); ```
Maybe @bukka can point us towards where these parser handlers are used outside of php-src? I think given that they exist, we should probably be adding |
I saw https://github.com/bukka/php-jsond - but that's not it, since the latest versions of json are what use the code from jsond. https://github.com/bukka/php-jsond/blob/master/php_jsond_parser.h
I don't see anything else on github for it: https://github.com/search?q=php_json_parser_methods+-filename%3Ajson_parser.tab.c+-filename%3Ajson_parser.y++-filename%3Aphp_json_parser.h&type=Code (could possibly be a repo that google doesn't index, or closed source, or nothing at all at the moment) |
The parser methods were added for https://github.com/php/pecl-database-mysql_xdevapi . Not sure if and how it is used in there though. Maybe @johannes or @marinesovitch can give more info. |
It can still be used by some 3rd party code though as it's now exposed API so I would agree to add create_empty_array / create_empty_object. |
It seems like the reference to php_json_parser_init_ex was removed in php/pecl-database-mysql_xdevapi@f73b37f#diff-278d5d76d931c6a89df0c984fc137809 Adding a separate array_create_empty() would potentially break code that only implemented overrode array_create() when switching to php 8.0 and be unintuitive - instead of the custom behavior on empty arrays, it would use the default php implementation. E.g. the example took the default methods and added a single helper to update an object. A hypothetical example that overrid array_create and array_update instead would break.
|
If there are no known users of this extension point, then we should drop it in master. These kinds of fine-grained extension points are rather problematic and pretty much the reason why we're afraid to do any changes in the mysqlnd codebase, which is full of this. If there's no good reason to have them, it's best not to have them and save us the second-guessing of what people might be doing with them. |
I'd agree with this point - a major php version seems like it should be acceptable for dropping an API, and this API had a low adoption rate.
Actual/hypothetical use cases could just run on the arrays/objects returned by json_decode, or even duplicate the implementation and replace the default methods with their own hooks. Additionally, dropping indirection might have performance benefits from allowing inlining, better branch prediction, etc. (haven't benchmarked that) |
Well there was a good use case for it in past and there might be another one in the future. There might be 3rd party extensions that are not public so I don't want to really drop it to add it again in the future. The maintenance cost is minimal and I don't really mind it as a maintainer of this ext. So unless someone has got a rewrite of the parser where this would be really limiting, then I don't see reason to drop it. Also extension API can be broken with minor version so there is no problem in terms of BC once someone comes with rewrite like this. If you want to optimize inlining then you can add conditions similar to this one which will just cost a single branch which should have quite minimal impact. |
I'm against adding create_empty_array and create_empty_object - This PR offers performance benefits without requiring third-party code changes.
The introduction of create_empty_array and create_empty_object would potentially cause subtle bugs in existing code similar to the linked snippet (e.g. if they overrid create_array to return ArrayObject, any future maintainers of that code would be surprised when With the PR-as-is, any existing code (previously) targeting PHP 7 would continue to work without modification. I'd consider optimizing the common use case (PHP json_decode) the higher priority, and this does that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also go in 7.4 in this form.
Yeah that's fine - this PR is good as it is and I agree that it can go to 7.4 if RM doesn't have any objection ofc. |
I confirm that currently we neither use, nor need the function: php_json_parser_init_ex. |
In my opinion, this is too late. So, this should be master only. |
Use the shared empty array from ZVAL_EMPTY_ARRAY
For code that created an 10 arrays of 100000 empty arrays
(has the same result with
$assoc=true
and{}
)This is the worst-case comparison, but I'd expect 0-length arrays to be fairly common in regular data for json_decode
The parser implementation was using function pointers so that third party extension developers could reuse the json parser for their own custom data structures (I think).
My PR is intended to let those third party extensions continue working without modification.
Before this patch: In 0.126 seconds: added 97.99 MiB
After this patch: In 0.096 seconds: added 41.99 MiB