-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Reduce memory used by token_get_all() #4753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Around a quarter of all strings in array tokens would have a string that's one character long (e.g. ` `, `\`, `1`) For parsing a large number of php files, The memory increase dropped from 378374248 to 369535688 (2.5%)
It may be worthwhile to locally intern strings used during one token_get_all run, because it's likely that quite a few strings are going to appear many times. This will have some performance cost, as we'll have to look up the string in a hashtable, but it may be worthwhile. |
Though probably string size just isn't that much of a factor right now anyway, because memory usage is dominated by the array size. We should really have a mode that returns objects instead, which use much less memory (#2430). |
I think adding it to Php-ast as ast\Token might make it immediately available to all PHP versions to users/apps focused on performance, but that’s less broadly accessible long-term. |
I'd expect the performance cost to be noticeable (Off-topic: If an application needed to reduce memory, they could use Igbinary has a data structure similar to what you would want to do that. In igbinary, deduplicating strings (in addition to strings used as class names) caused a noticeable hit to the amount of time taken to generate a result (i.e. serialized data). This was acceptable because unserialization was more common and reducing storage is also important. https://github.com/igbinary/igbinary
I haven't profiled it with a C profiler, but my best guesses are:
I tried to optimize the original implementation in https://github.com/igbinary/igbinary/blob/9e7e8d6/src/php7/hash_si.c#L78-L130 (haven't tried other approaches such as cuckoo hashing, or existing hash sets, and would be interested if anyone knew better options). |
Another possibility would be to have a per-token cache of a single string. For a lot of tokens there is only a single possible value (like (Though again, I'm not sure if this is really worthwhile without objects.) |
On the note of reducing memory for preg_match, token_get_all, etc. Have there been any proposals for a built-in tuple type (i.e. fixed-length array), like in python (fixed-length, optionally immutable at the top level)? TOKEN_AS_TUPLE, PREG_MATCH_AS_TUPLE, etc. could be used
This is along the lines of what php-ds has planned for 2.0, but I haven't looked at their implementation. https://github.com/php-ds/ext-ds/blob/2.0/DRAFT.php#L521-L535 |
While we're speculating... we could also optimize the array implementation for the tuple case. In particular, when packed arrays are used, there is no strong reason to require a power of two array size. We could start off with an exact number of elements, but would then have to perform a pow2 check on resize. As resize is relatively rare, it may be worthwhile. For this case (array with three elements) we'd save |
Oh, good. the minimum of a capacity for 8 was to optimize just hash tables, not regular packed arrays. I wasn't sure if that would have been needed for packed arrays as well (e.g. if benchmarking found that reallocs were slower than using extra memory.) |
@nikic - I created TysonAndre#5 to test if this is viable, whether it'd cause problems for extensions, etc. (This currently still rounds up to the nearest power of 2, though, but decreases the minimum size for non-packed arrays from 8 to 2. I haven't gotten around to removing the rounding) |
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
Around a quarter of all tokens in files would have a string that's one
,
character long (e.g.
\
,1
)For parsing a large number of php files,
The memory increase dropped from 378374248 to 369535688 (2.5%) (other projects probably have less single-byte tokens)
Script used: