-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[Proposal] Reduce the minimum size for packed arrays from 8 to 2 #4783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Generally this doesn't necessarily have to be limited to packaged arrays. Associative arrays don't require a pow2 data array either, only the hash has to be pow2. |
That makes sense. That optimization might have a large memory usage benefit in Zend/zend_ast.c for obvious associative arrays (e.g. convert the first time a non-string is seen), when opcache isn't used. The stored array representation should be optimized more in the values stored in opcache once the restriction is lifted (My PR currently forces the hash size to be 8 when persisting arrays to opcache, even if there's only one element): https://github.com/TysonAndre/php-src/blob/816b69eea0448f622eea98aab493949474020d49/ext/opcache/zend_persist.c#L118-L153
|
Do any of the maintainers have suggestions for benchmarks to use?
|
816b69e
to
abbf793
Compare
abbf793
to
407d804
Compare
407d804
to
b28a496
Compare
Continue to require that associative arrays have a minimum size of 8 and are powers of 8. This can reduce the memory usage of some applications by ~16% in applications that heavily use small arrays. Stop requiring that packed arrays have a size that is a power of 2. There's no reason to for packed arrays, since php isn't doing any hash lookups with bit operations. Add a helper to create an associative array with a minimum size. This may help with clarity/avoiding unnecessary size recalculations. (performance impact may be negligible) https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html gives some background information on the `HashTable` implementation on master. This is a followup to php#4753 (comment) (I didn't see any other PRs mentioning HT_MIN_SIZE)
b28a496
to
751f0ef
Compare
It seems like this might make very well make workloads slower, e.g. loading the index page of an uninitialized laravel app in php 8.0 with and without this patch.
The example I'm using this with (Phan) isn't a typical workload - it creates a lot of temporary arrays (in addition to creating a lot of dynamic properties), and keeps references to many of those permanent arrays for a long time One hypothesis: It's possible the cost of branch mispredicts (for the decision to raise array capacity) might outweigh the savings from reduced memory usage in typical code that use short-lived arrays that weren't cached in opcache. There, arrays would be resized more often.
For example, running the index page of a fresh laravel app with a placeholder index.php (that loaded vendor/autoload.php but wasn't set up), through the development php server with opcache enabled, it got slower
Using this command, with a valid file_cache folder that existed
Before:
After this patch, loading index.php is consistently a few percent slower, both in
(Other web frameworks were not tested) It might be possible to fix whatever the issue's cause is by ensuring (in generated opcodes) that arrays are only allocated with a smaller capacity than 8 if opcache can infer that it probably wouldn't grow beyond that capacity within the function body, but the general case of that might fall into the "sufficiently smart compiler" category of things one might want to implement but be impractical (with copy on write, etc)
|
Closing this because it can make performance worse in some cases, and this proposal has been open for a while. Limiting this to a subset of places where arrays are used is doable and might eventually help performance/memory usage, but doesn't seem to help typical use cases on web servers, where non-cached arrays are typically short-lived and can be used in a wide variety of ways (see previous comment) |
I revisited this after #7491 was merged. The updated changes are https://github.com/php/php-src/compare/master...TysonAndre:hash-minsize-pr?expand=1
The web server with opcache enabled (using CLI and file cache to simulate that for now) seems more promising. @kocsismate mentions we have automatic performance benchmarking scripts that can run in AWS to provide a more reproducible check
(0.62s was the fastest time without this change, 0.58s with this change, running in a loop and saving output to a file. The file output was the same)
(using setup from https://laravel.com/docs/8.x to create the demo app) |
I run the benchmark against the commit on which your branch is based, and I mainly see a slight slowdown: https://gist.github.com/kocsismate/331a682049942a01e7d6ce1482eb918e |
@@ -394,7 +394,8 @@ struct _zend_array { | |||
#define HT_INVALID_IDX ((uint32_t) -1) | |||
|
|||
#define HT_MIN_MASK ((uint32_t) -2) | |||
#define HT_MIN_SIZE 8 | |||
#define HT_MIN_SIZE 2 | |||
#define HT_MIN_SIZE_UNPACKED 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNPACKED
is a bad name. We already use HASH_MAP
name it iterators API.
Also the meaning of HT_MIN_SIZE
is not clear now. Should it be renamed into HT_MIN_SIZE_PACKED
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't commit this yet.
The initial array size was tuned edges ago and then revisited during PHP-7 development.
It's selected as a compromise between memory consumption and number of reallocations.
When we reduce the initial size we might need a few more reallocations later.
Is this PR is mainly targeted to represent callabels?
Note that constant arrays are already shrunk by opcache and they are properly resized when copied on write.
I think, it might be possible to use more compact representations for some specific cases e.g. zend_new_pair()
but I don't like to touch the defaults. This always will make something better and something worse.
#define array_init(arg) ZVAL_ARR((arg), zend_new_array(0)) | ||
#define array_init_size(arg, size) ZVAL_ARR((arg), zend_new_array(size)) | ||
#define array_init(arg) ZVAL_ARR((arg), zend_new_array(0)) | ||
#define array_init_size(arg, size) ZVAL_ARR((arg), zend_new_array(size)) | ||
#define array_init_assoc(arg) ZVAL_ARR((arg), zend_new_array_assoc(0)) | ||
#define array_init_assoc_size(arg, size) ZVAL_ARR((arg), zend_new_array_assoc(size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashTable may be in 3 states - packed
, map
and uninitialized
. array_init()
creates HashTable in unitilaized
state and it may become packed
or map
depending on the following zend_hash_real_init_*
or update operations.
array_init_assoc()
creates HashTable in unitilaized
state as well, despite of the name.
if (size < 64) { \ | ||
ZEND_ASSERT(size == 16 || size == 32); \ | ||
_mm_storeu_si128((__m128i*)p, xmm0); \ | ||
if (size >= 32) { \ | ||
_mm_storeu_si128((__m128i*)(p+16), xmm0); \ | ||
} \ | ||
break; \ | ||
} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How this is related to packed array size? This code is for assoc/maps only.
Continue to require that associative arrays have a minimum
size of 8 and are powers of 2.
~~This can reduce the memory usage of some applications by ~16%
in applications that heavily use small arrays.~~ (outdated with memory usage reduction in php 8.2)
./phan --print-memory-usage-summary
) went from 444MB/576MB to 387MB/475MB (end memory/max memory) with this patch. (This is a static analyzer for PHP which heavily uses small arrays)Stop requiring that packed arrays have a size that is a power of 2.
There's no reason to for packed arrays,
since php isn't doing any hash lookups with bit operations.
Add a helper to create an associative array with a minimum size.
This may help with clarity/avoiding unnecessary size recalculations.
(performance impact may be negligible)
ext/
) as an example of how existing codewould avoid recomputing the initial associative array size.
https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html
gives some background information on the
HashTable
implementation onmaster.
This is a followup to discussion in
#4753 (comment)
(I didn't see any other PRs mentioning HT_MIN_SIZE)
$x = []
, etc.) was left with a capacity of 8. This could also be dropped to 2 as a followup if it was beneficial overall