Skip to content

[Proposal] Reduce the minimum size for packed arrays from 8 to 2 #4783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Oct 5, 2019

Continue to require that associative arrays have a minimum
size of 8 and are powers of 2.

~~This can reduce the memory usage of some applications by ~16%
in applications that heavily use small arrays.~~ (outdated with memory usage reduction in php 8.2)

  • For example, Phan's memory usage for self-analysis (./phan --print-memory-usage-summary) went from 444MB/576MB to 387MB/475MB (end memory/max memory) with this patch. (This is a static analyzer for PHP which heavily uses small arrays)
  • e.g. the memory usage of token_get_all will drop because that function returns a lot of arrays of size 3

Stop requiring that packed arrays have a size that is a power of 2.
There's no reason to for packed arrays,
since php isn't doing any hash lookups with bit operations.

  • A followup PR could remove that restriction for the bucket data of non-packed arrays

Add a helper to create an associative array with a minimum size.
This may help with clarity/avoiding unnecessary size recalculations.
(performance impact may be negligible)

  • Use this helper in a few places (e.g. ext/) as an example of how existing code
    would avoid recomputing the initial associative array size.

https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html
gives some background information on the HashTable implementation on
master.

This is a followup to discussion in
#4753 (comment)
(I didn't see any other PRs mentioning HT_MIN_SIZE)


  • Are there any good benchmarks you'd recommend for testing this change? (representative of typical usage in large applications / frameworks)
  • Are there microbenchmarks that would be recommended for this? What's the impact of this change in a standard hardware setup?
  • A possible followup is to also allow a min size of 1 for packed arrays. I don't know what benchmarks are used to decide on defaults for data structures in php-src, so I'm leaving it at 2 for now.
  • The empty array (used in ZVAL_EMPTY_ARRAY, $x = [], etc.) was left with a capacity of 8. This could also be dropped to 2 as a followup if it was beneficial overall

@KalleZ KalleZ requested a review from dstogov October 5, 2019 21:53
@nikic
Copy link
Member

nikic commented Oct 6, 2019

Generally this doesn't necessarily have to be limited to packaged arrays. Associative arrays don't require a pow2 data array either, only the hash has to be pow2.

@TysonAndre
Copy link
Contributor Author

Generally this doesn't necessarily have to be limited to packaged arrays. Associative arrays don't require a pow2 data array either, only the hash has to be pow2.

That makes sense. That optimization might have a large memory usage benefit in Zend/zend_ast.c for obvious associative arrays (e.g. convert the first time a non-string is seen), when opcache isn't used.

The stored array representation should be optimized more in the values stored in opcache once the restriction is lifted (My PR currently forces the hash size to be 8 when persisting arrays to opcache, even if there's only one element): https://github.com/TysonAndre/php-src/blob/816b69eea0448f622eea98aab493949474020d49/ext/opcache/zend_persist.c#L118-L153

  • I've run into various issues getting this to work (e.g. alignment assertions, size modulo assertions), so I'd want to keep array representation change PRs small to make bisecting easier for third-party extensions, PHP applications, etc. They might have their own helpers similar to zend_new_pair, or hit edge cases (e.g. code paths only hit with opcache)
  • I'd want to know how this would be benchmarked before adding more changes, in case additional optimizations somehow turn out to be a performance regression.

@TysonAndre
Copy link
Contributor Author

I'd want to know how this would be benchmarked before adding more changes, in case additional optimizations somehow turn out to be a performance regression.

Do any of the maintainers have suggestions for benchmarks to use?

  • I checked the php-internals mailing list - it seems like benchmarks were run regularly for intel but stopped in 2017 or so - I don't see anything since then.
  • Many of the third-party benchmarks are vague on the steps - They mention what they set up, and a few provide docker images for the application being benchmarked, but don't provide reproducible steps for running the benchmark. (e.g. third-party for wordpress, etc.)

Continue to require that associative arrays have a minimum
size of 8 and are powers of 8.

This can reduce the memory usage of some applications by ~16%
in applications that heavily use small arrays.

Stop requiring that packed arrays have a size that is a power of 2.
There's no reason to for packed arrays,
since php isn't doing any hash lookups with bit operations.

Add a helper to create an associative array with a minimum size.
This may help with clarity/avoiding unnecessary size recalculations.
(performance impact may be negligible)

https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html
gives some background information on the `HashTable` implementation on
master.

This is a followup to
php#4753 (comment)
(I didn't see any other PRs mentioning HT_MIN_SIZE)
@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jul 19, 2020

It seems like this might make very well make workloads slower, e.g. loading the index page of an uninitialized laravel app in php 8.0 with and without this patch.

  • As I've mentioned before, there aren't any authoritative benchmarks that php is targeting, so this is somewhat of a shot in the dark.

The example I'm using this with (Phan) isn't a typical workload - it creates a lot of temporary arrays (in addition to creating a lot of dynamic properties), and keeps references to many of those permanent arrays for a long time

One hypothesis: It's possible the cost of branch mispredicts (for the decision to raise array capacity) might outweigh the savings from reduced memory usage in typical code that use short-lived arrays that weren't cached in opcache. There, arrays would be resized more often.

// Assume there's additional control flow when building the array, and string or numeric offsets are added to the array
$x = [$i];
$x[] = 2;
$x[] = 'third'; // branch mispredict with this patch, but none without this patch
process($x);
// Assume no references to $x are kept

For example, running the index page of a fresh laravel app with a placeholder index.php (that loaded vendor/autoload.php but wasn't set up), through the development php server with opcache enabled, it got slower

 ~/.config/composer/vendor/bin/laravel new laravel

Using this command, with a valid file_cache folder that existed

for i in {0..7}; do sudo cpufreq-set --freq 2.5GHz; done
taskset 1 perf stat ~/php-8.0.0-dev-xsize-install/bin/php -d zend_extension=opcache.so -d opcache.file_cache=/path/to/opcache --no-php-ini -d opcache.enable_cli=1  index.php 2>&1|tee xsize_perf.txt

Before:

 Performance counter stats for '/path/to/php-8.0.0-dev-8size-install/bin/php -d zend_extension=opcache.so -d opcache.file_cache=/path/to/opcache --no-php-ini -d opcache.enable_cli=1 index.php':

         62.232029      task-clock (msec)         #    0.995 CPUs utilized          
                 1      context-switches          #    0.016 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
             5,504      page-faults               #    0.088 M/sec                  
       155,199,373      cycles                    #    2.494 GHz                    
       196,942,054      instructions              #    1.27  insn per cycle         
        26,907,119      branches                  #  432.368 M/sec                  
           647,386      branch-misses             #    2.41% of all branches        

       0.062570130 seconds time elapsed

After this patch, loading index.php is consistently a few percent slower, both in php -S and a CLI invocation.

 Performance counter stats for '/path/to/php-8.0.0-dev-minsize-install/bin/php -d zend_extension=opcache.so -d opcache.file_cache=/path/to/opcache --no-php-ini -d opcache.enable_cli=1 index.php':

         62.396356      task-clock (msec)         #    0.995 CPUs utilized          
                 2      context-switches          #    0.032 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
             5,497      page-faults               #    0.088 M/sec                  
       155,598,486      cycles                    #    2.494 GHz                    
       197,018,739      instructions              #    1.27  insn per cycle         
        26,916,351      branches                  #  431.377 M/sec                  
           648,043      branch-misses             #    2.41% of all branches        

       0.062729067 seconds time elapsed

(Other web frameworks were not tested)


It might be possible to fix whatever the issue's cause is by ensuring (in generated opcodes) that arrays are only allocated with a smaller capacity than 8 if opcache can infer that it probably wouldn't grow beyond that capacity within the function body, but the general case of that might fall into the "sufficiently smart compiler" category of things one might want to implement but be impractical (with copy on write, etc)

  • Maybe if it could be limited to narrow use cases such as return ['key' => $value];
  • I haven't looked into the actual branch mispredict cause
  • I'm not sure if my laptop is representative of CPUs used in typical web servers due to recent intel mitigations, etc

@TysonAndre
Copy link
Contributor Author

Closing this because it can make performance worse in some cases, and this proposal has been open for a while.

Limiting this to a subset of places where arrays are used is doable and might eventually help performance/memory usage, but doesn't seem to help typical use cases on web servers, where non-cached arrays are typically short-lived and can be used in a wide variety of ways (see previous comment)

@TysonAndre TysonAndre closed this Jul 29, 2020
@TysonAndre
Copy link
Contributor Author

TysonAndre commented Nov 26, 2021

I revisited this after #7491 was merged. The updated changes are https://github.com/php/php-src/compare/master...TysonAndre:hash-minsize-pr?expand=1

  • Max memory usage of phan decreased from 606MB to 592MB in php 8.2 (Phan's number of checks and amount of code to self-analyzed increased), with the decrease partly thanks to already being low due to memory improvements for packed arrays. 2.5% seems negligible even given the heavy use of small arrays in that application.

  • Performance seems a tiny bit better (vs php 8.2 without this change) with opcache.file_cache=1, but is around 3% worse without this change.

    I haven't tested this with a web server - hopefully those would see a similar benefit.

  • Getting similar performance on Zend/bench.php for ary3 required using the size of 8 for array_init - I don't know why, exactly, though. It doesn't seem like reading from an array repeatedly would be affected by this at all, and the resulting array's capacity would be the same. An issue with the way I benchmarked? CPU instruction caches? CPU L1/L2 caches?


The web server with opcache enabled (using CLI and file cache to simulate that for now) seems more promising. @kocsismate mentions we have automatic performance benchmarking scripts that can run in AWS to provide a more reproducible check

  • My earlier steps from 2019 for laravel look like they're missing opcache.enable=1 and may have been done wrong

time taskset 1 ~/php-8.2.0-minsize-optimized-install/bin/php example-app/public/index.php > minsize.txt seems to now consistently take less time than without this change (with opcache.file_cache, opcache.enable=1, opcache.enable_cli=1, zend_extension=opcache, and the same set of installed extensions)

(0.62s was the fastest time without this change, 0.58s with this change, running in a loop and saving output to a file. The file output was the same)

  • I think the setup was the same?

(using setup from https://laravel.com/docs/8.x to create the demo app)

@kocsismate
Copy link
Member

I run the benchmark against the commit on which your branch is based, and I mainly see a slight slowdown:

https://gist.github.com/kocsismate/331a682049942a01e7d6ce1482eb918e

@@ -394,7 +394,8 @@ struct _zend_array {
#define HT_INVALID_IDX ((uint32_t) -1)

#define HT_MIN_MASK ((uint32_t) -2)
#define HT_MIN_SIZE 8
#define HT_MIN_SIZE 2
#define HT_MIN_SIZE_UNPACKED 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNPACKED is a bad name. We already use HASH_MAP name it iterators API.
Also the meaning of HT_MIN_SIZE is not clear now. Should it be renamed into HT_MIN_SIZE_PACKED?

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't commit this yet.

The initial array size was tuned edges ago and then revisited during PHP-7 development.
It's selected as a compromise between memory consumption and number of reallocations.
When we reduce the initial size we might need a few more reallocations later.

Is this PR is mainly targeted to represent callabels?

Note that constant arrays are already shrunk by opcache and they are properly resized when copied on write.
I think, it might be possible to use more compact representations for some specific cases e.g. zend_new_pair() but I don't like to touch the defaults. This always will make something better and something worse.

Comment on lines -419 to +422
#define array_init(arg) ZVAL_ARR((arg), zend_new_array(0))
#define array_init_size(arg, size) ZVAL_ARR((arg), zend_new_array(size))
#define array_init(arg) ZVAL_ARR((arg), zend_new_array(0))
#define array_init_size(arg, size) ZVAL_ARR((arg), zend_new_array(size))
#define array_init_assoc(arg) ZVAL_ARR((arg), zend_new_array_assoc(0))
#define array_init_assoc_size(arg, size) ZVAL_ARR((arg), zend_new_array_assoc(size))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashTable may be in 3 states - packed, map and uninitialized. array_init() creates HashTable in unitilaized state and it may become packed or map depending on the following zend_hash_real_init_* or update operations.

array_init_assoc() creates HashTable in unitilaized state as well, despite of the name.

Comment on lines +443 to +450
if (size < 64) { \
ZEND_ASSERT(size == 16 || size == 32); \
_mm_storeu_si128((__m128i*)p, xmm0); \
if (size >= 32) { \
_mm_storeu_si128((__m128i*)(p+16), xmm0); \
} \
break; \
} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this is related to packed array size? This code is for assoc/maps only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants