Skip to content

Proposal: Optimize creation of empty arrays in json_decode #4861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

Use the shared empty array from ZVAL_EMPTY_ARRAY

For code that created an 10 arrays of 100000 empty arrays
(has the same result with $assoc=true and {})

  • This is the worst-case comparison, but I'd expect 0-length arrays to be fairly common in regular data for json_decode

  • The parser implementation was using function pointers so that third party extension developers could reuse the json parser for their own custom data structures (I think).

    My PR is intended to let those third party extensions continue working without modification.

Before this patch: In 0.126 seconds: added 97.99 MiB
After this patch: In 0.096 seconds: added 41.99 MiB

<?php
$json = '[' . str_repeat('[],', 100000) . "null]";
$start_memory = memory_get_usage();
$start_time = microtime(true);
$result = [];
for ($i = 0; $i < 10; $i++) {
    $result[] = json_decode($json);
}
$end_memory = memory_get_usage();
$end_time = microtime(true);
// Before this patch: In 0.126 seconds: added 97.99 MiB
// After this patch:  In 0.096 seconds: added 41.99 MiB
printf("In %.3f seconds: added %.2f MiB\n", $end_time - $start_time, ($end_memory - $start_memory)/1000000);

// For objects
$json = '[' . str_repeat('{},', 100000) . "null]";
$start_memory = memory_get_usage();
$start_time = microtime(true);
for ($i = 0; $i < 10; $i++) {
    $result[] = json_decode($json, true);
}
$end_memory = memory_get_usage();
$end_time = microtime(true);
// Before this patch: In 0.126 seconds: added 97.99 MiB
// After this patch:  In 0.096 seconds: added 41.99 MiB
printf("In %.3f seconds: added %.2f MiB (objects decoded as arrays) \n", $end_time - $start_time, ($end_memory - $start_memory)/1000000);

Use the shared empty array from ZVAL_EMPTY_ARRAY

For code that created an 10 arrays of 100000 empty arrays
(has the same result with `$assoc=true` and `{}`)

- This is the worst-case comparison, but I'd expect 0-length arrays to be fairly
  common in regular data for json_decode
- The parser implementation was using function pointers so that third party
  extension developers could reuse the json parser for their own
  data structures, etc. (I think).

  This PR is meant to let those third party extensions continue working
  without changes.

Before this patch: In 0.126 seconds: added 97.99 MiB
After this patch:  In 0.096 seconds: added 41.99 MiB

```php
<?php
$json = '[' . str_repeat('[],', 100000) . "null]";
$start_memory = memory_get_usage();
$start_time = microtime(true);
$result = [];
for ($i = 0; $i < 10; $i++) {
    $result[] = json_decode($json);
}
$end_memory = memory_get_usage();
$end_time = microtime(true);
// Before this patch: In 0.126 seconds: added 97.99 MiB
// After this patch:  In 0.096 seconds: added 41.99 MiB
printf("In %.3f seconds: added %.2f MiB\n", $end_time - $start_time, ($end_memory - $start_memory)/1000000);

// For objects
$json = '[' . str_repeat('{},', 100000) . "null]";
$start_memory = memory_get_usage();
$start_time = microtime(true);
for ($i = 0; $i < 10; $i++) {
    $result[] = json_decode($json, true);
}
$end_memory = memory_get_usage();
$end_time = microtime(true);
// Before this patch: In 0.126 seconds: added 97.99 MiB
// After this patch:  In 0.096 seconds: added 41.99 MiB
printf("In %.3f seconds: added %.2f MiB (objects decoded as arrays) \n", $end_time - $start_time, ($end_memory - $start_memory)/1000000);
```
@nikic
Copy link
Member

nikic commented Oct 26, 2019

Maybe @bukka can point us towards where these parser handlers are used outside of php-src?

I think given that they exist, we should probably be adding create_empty_array() and create_empty_object() here.

@TysonAndre
Copy link
Contributor Author

Maybe bukka can point us towards where these parser handlers are used outside of php-src?

I saw https://github.com/bukka/php-jsond - but that's not it, since the latest versions of json are what use the code from jsond. https://github.com/bukka/php-jsond/blob/master/php_jsond_parser.h

  • Actually, was jsond the extension the current version of ext/json is based on?

I don't see anything else on github for it: https://github.com/search?q=php_json_parser_methods+-filename%3Ajson_parser.tab.c+-filename%3Ajson_parser.y++-filename%3Aphp_json_parser.h&type=Code (could possibly be a repo that google doesn't index, or closed source, or nothing at all at the moment)

@bukka
Copy link
Member

bukka commented Oct 27, 2019

The parser methods were added for https://github.com/php/pecl-database-mysql_xdevapi . Not sure if and how it is used in there though. Maybe @johannes or @marinesovitch can give more info.

@bukka
Copy link
Member

bukka commented Oct 27, 2019

It can still be used by some 3rd party code though as it's now exposed API so I would agree to add create_empty_array / create_empty_object.

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Oct 27, 2019

It seems like the reference to php_json_parser_init_ex was removed in php/pecl-database-mysql_xdevapi@f73b37f#diff-278d5d76d931c6a89df0c984fc137809

Adding a separate array_create_empty() would potentially break code that only implemented overrode array_create() when switching to php 8.0 and be unintuitive - instead of the custom behavior on empty arrays, it would use the default php implementation.

E.g. the example took the default methods and added a single helper to update an object. A hypothetical example that overrid array_create and array_update instead would break.

	php_json_parser_init(&parser.parser, &return_value, (char *) json.s, json.l, options, depth);
	own_methods = parser.parser.methods;
	own_methods.object_update = xmysqlnd_json_parser_object_update;

	php_json_parser_init_ex(&parser.parser, &return_value, (char *) json.s, json.l, options, depth, &own_methods);
	status->found = FALSE;

@nikic
Copy link
Member

nikic commented Oct 27, 2019

If there are no known users of this extension point, then we should drop it in master. These kinds of fine-grained extension points are rather problematic and pretty much the reason why we're afraid to do any changes in the mysqlnd codebase, which is full of this. If there's no good reason to have them, it's best not to have them and save us the second-guessing of what people might be doing with them.

@TysonAndre
Copy link
Contributor Author

If there are no known users of this extension point, then we should drop it in master.

I'd agree with this point - a major php version seems like it should be acceptable for dropping an API, and this API had a low adoption rate.

  • I went with this approach because I didn't know the process involved in dropping an API.

Actual/hypothetical use cases could just run on the arrays/objects returned by json_decode, or even duplicate the implementation and replace the default methods with their own hooks.

Additionally, dropping indirection might have performance benefits from allowing inlining, better branch prediction, etc. (haven't benchmarked that)

@bukka
Copy link
Member

bukka commented Oct 27, 2019

Well there was a good use case for it in past and there might be another one in the future. There might be 3rd party extensions that are not public so I don't want to really drop it to add it again in the future. The maintenance cost is minimal and I don't really mind it as a maintainer of this ext. So unless someone has got a rewrite of the parser where this would be really limiting, then I don't see reason to drop it. Also extension API can be broken with minor version so there is no problem in terms of BC once someone comes with rewrite like this.

If you want to optimize inlining then you can add conditions similar to this one which will just cost a single branch which should have quite minimal impact.

@TysonAndre
Copy link
Contributor Author

I'm against adding create_empty_array and create_empty_object - This PR offers performance benefits without requiring third-party code changes.

It can still be used by some 3rd party code though as it's now exposed API so I would agree to add create_empty_array / create_empty_object.

The introduction of create_empty_array and create_empty_object would potentially cause subtle bugs in existing code similar to the linked snippet (e.g. if they overrid create_array to return ArrayObject, any future maintainers of that code would be surprised when "[]" resulted in a regular array).

With the PR-as-is, any existing code (previously) targeting PHP 7 would continue to work without modification.

I'd consider optimizing the common use case (PHP json_decode) the higher priority, and this does that.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can also go in 7.4 in this form.

@bukka
Copy link
Member

bukka commented Oct 28, 2019

Yeah that's fine - this PR is good as it is and I agree that it can go to 7.4 if RM doesn't have any objection ofc.

@marinesovitch
Copy link
Contributor

The parser methods were added for https://github.com/php/pecl-database-mysql_xdevapi . Not sure if and how it is used in there though. Maybe @johannes or @marinesovitch can give more info.
[...]
It seems like the reference to php_json_parser_init_ex was removed in php/pecl-database-mysql_xdevapi@f73b37f#diff-278d5d76d931c6a89df0c984fc137809
[...]

I confirm that currently we neither use, nor need the function: php_json_parser_init_ex.

@php-pulls php-pulls closed this in 447f07c Oct 30, 2019
@TysonAndre
Copy link
Contributor Author

Yeah that's fine - this PR is good as it is and I agree that it can go to 7.4 if RM doesn't have any objection ofc.

cc @derickr @petk - thoughts on including this in php 7.4.0RC6?

@derickr
Copy link
Member

derickr commented Nov 4, 2019

In my opinion, this is too late. So, this should be master only.

@TysonAndre TysonAndre deleted the json_decode-optimize branch November 16, 2019 00:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants