random: Optimize Randomizer::getBytes() #15228

TimWolla · 2024-08-04T15:29:05Z

This is my attempt at optimizing Randomizer::getBytes(), building on the work done by @SakiTakamachi in #14891, but modifying it so much that the spirit of Saki's version is no longer recognizable. Thus I'm filing this as a separate PR.

This patch greatly improves the performance for the common case of using a 64-bit engine and requesting a length that is a multiple of 8.

It does so by providing a fast path that just memcpy()s the returned uint64_t directly into the output buffer, byteswapping it first on big-endian architectures. Because the copy has a fixed size of 8 bytes, the compiler is expected to replace the memcpy() call with a plain 64-bit store.

The existing byte-wise copying logic was mostly left alone. It only received an optimization of the shifting and masking that was previously applied to Randomizer::getBytesFromString() in 1fc2ddc.
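For readers who want to see the idea in isolation, here is a minimal, self-contained sketch of such a fast path. It is not the code from ext/random/randomizer.c: `demo_engine`, `demo_engine_next()`, `demo_bswap64()`, `demo_is_big_endian()` and `demo_fill_fast()` are hypothetical names, the "engine" is a plain counter rather than a real RNG, and a runtime endianness probe stands in for the compile-time `WORDS_BIGENDIAN` check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit engine. It is a counter, not a real RNG:
 * for small k, call k (starting at 0) returns the byte values 8k+1 .. 8k+8,
 * with the smallest value in the least significant byte. */
typedef struct {
	uint64_t counter;
} demo_engine;

static uint64_t demo_engine_next(demo_engine *e)
{
	uint64_t k = e->counter;
	e->counter++;
	return UINT64_C(0x0807060504030201) + UINT64_C(0x0808080808080808) * k;
}

/* Portable 64-bit byte swap, only needed on big-endian hosts. */
static uint64_t demo_bswap64(uint64_t v)
{
	v = ((v & UINT64_C(0x00FF00FF00FF00FF)) << 8) | ((v >> 8) & UINT64_C(0x00FF00FF00FF00FF));
	v = ((v & UINT64_C(0x0000FFFF0000FFFF)) << 16) | ((v >> 16) & UINT64_C(0x0000FFFF0000FFFF));
	return (v << 32) | (v >> 32);
}

/* Runtime probe used here instead of a configure-time macro (an assumption
 * made to keep the sketch portable). */
static int demo_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;
	memcpy(&first, &probe, 1);
	return first == 0;
}

/* Fast path only: writes as many whole 8-byte results as fit into `length`
 * and returns the number of bytes written. Any remaining 1-7 bytes are left
 * for a separate byte-wise loop. */
static size_t demo_fill_fast(demo_engine *e, unsigned char *out, size_t length)
{
	size_t total = 0;

	assert(out != NULL || length == 0);

	while (length - total >= 8) {
		uint64_t r = demo_engine_next(e);
		if (demo_is_big_endian()) {
			/* Keep the output identical across architectures. */
			r = demo_bswap64(r);
		}
		/* Fixed-size copy: compilers typically turn this into one store. */
		memcpy(out + total, &r, 8);
		total += 8;
	}

	return total;
}
```

With a zero-initialized `demo_engine` and a 20-byte buffer, `demo_fill_fast()` performs two engine calls, writes the byte values 1 through 16, and returns 16, leaving the last 4 bytes untouched.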

Benchmarks:

The baseline commit is d6a75e1. I've rebased both Saki's and my branch onto that commit.

gcc is configured as: ./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm
clang is configured as: ./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm --enable-werror CC=clang-16 CXX=clang++-16

Summary: Results are unexpectedly dependent on the engine chosen, even when both engines are 64-bit engines. They are also heavily dependent on the compiler. There is no clear winner. My version appears to be better for 32-bit engines, whereas Saki's version appears to be better for short lengths that are not a multiple of 8. For the important case of 16 bytes they are equal, and for 32 bytes they are equal with gcc; with clang it depends on the engine.

For Mt19937 and 1024 bytes. My version is the fastest for both clang and gcc.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new Mt19937(0));

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(1024);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     522.7 ms ±   5.9 ms    [User: 520.1 ms, System: 2.4 ms]
  Range (min … max):   515.1 ms … 530.8 ms    10 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     665.6 ms ±  10.5 ms    [User: 663.2 ms, System: 2.1 ms]
  Range (min … max):   658.6 ms … 693.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     630.6 ms ±   2.9 ms    [User: 627.5 ms, System: 2.8 ms]
  Range (min … max):   627.2 ms … 635.9 ms    10 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     624.5 ms ±   4.0 ms    [User: 620.9 ms, System: 3.3 ms]
  Range (min … max):   621.8 ms … 632.5 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     512.9 ms ±   4.6 ms    [User: 510.9 ms, System: 1.7 ms]
  Range (min … max):   506.0 ms … 522.2 ms    10 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     571.0 ms ±   1.6 ms    [User: 568.6 ms, System: 2.4 ms]
  Range (min … max):   567.6 ms … 572.9 ms    10 runs
 
Summary
  /tmp/php/gcc-tim test.php ran
    1.02 ± 0.01 times faster than /tmp/php/gcc-baseline test.php
    1.11 ± 0.01 times faster than /tmp/php/clang-tim test.php
    1.22 ± 0.01 times faster than /tmp/php/clang-saki test.php
    1.23 ± 0.01 times faster than /tmp/php/gcc-saki test.php
    1.30 ± 0.02 times faster than /tmp/php/clang-baseline test.php

For PcgOneseq128XslRr64 and 1024 bytes. Saki's and my version are equal for both clang and gcc.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(1024);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     349.1 ms ±   3.5 ms    [User: 346.0 ms, System: 3.1 ms]
  Range (min … max):   346.2 ms … 357.3 ms    10 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     471.3 ms ±   8.6 ms    [User: 468.5 ms, System: 2.7 ms]
  Range (min … max):   465.9 ms … 492.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     171.1 ms ±   0.7 ms    [User: 168.4 ms, System: 2.8 ms]
  Range (min … max):   170.1 ms … 173.5 ms    17 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     137.6 ms ±   4.4 ms    [User: 135.2 ms, System: 2.4 ms]
  Range (min … max):   134.4 ms … 155.6 ms    21 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     170.6 ms ±   0.5 ms    [User: 168.8 ms, System: 1.9 ms]
  Range (min … max):   169.8 ms … 171.5 ms    17 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     139.0 ms ±   1.7 ms    [User: 136.5 ms, System: 2.4 ms]
  Range (min … max):   136.7 ms … 142.9 ms    20 runs
 
Summary
  /tmp/php/clang-saki test.php ran
    1.01 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.24 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.24 ± 0.04 times faster than /tmp/php/gcc-saki test.php
    2.54 ± 0.09 times faster than /tmp/php/gcc-baseline test.php
    3.43 ± 0.13 times faster than /tmp/php/clang-baseline test.php

For Secure and 16 bytes. All of them are equal.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer();

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(16);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     257.7 ms ±   1.5 ms    [User: 54.4 ms, System: 203.1 ms]
  Range (min … max):   256.8 ms … 261.9 ms    11 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     264.7 ms ±   6.4 ms    [User: 56.6 ms, System: 208.0 ms]
  Range (min … max):   258.7 ms … 275.4 ms    11 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     255.1 ms ±   4.1 ms    [User: 51.5 ms, System: 203.6 ms]
  Range (min … max):   253.0 ms … 267.4 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     257.0 ms ±   2.4 ms    [User: 49.3 ms, System: 207.5 ms]
  Range (min … max):   254.9 ms … 261.7 ms    11 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     257.2 ms ±   8.1 ms    [User: 49.3 ms, System: 207.6 ms]
  Range (min … max):   253.5 ms … 281.0 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     257.5 ms ±   6.4 ms    [User: 51.2 ms, System: 205.9 ms]
  Range (min … max):   254.0 ms … 275.7 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.01 ± 0.02 times faster than /tmp/php/clang-saki test.php
    1.01 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.01 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.01 ± 0.02 times faster than /tmp/php/gcc-baseline test.php
    1.04 ± 0.03 times faster than /tmp/php/clang-baseline test.php

For Pcg + 16 bytes. Saki's version is faster for gcc; for clang they are equal.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(16);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     119.8 ms ±   1.6 ms    [User: 117.4 ms, System: 2.1 ms]
  Range (min … max):   117.7 ms … 123.8 ms    24 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     163.2 ms ±   3.9 ms    [User: 160.9 ms, System: 2.0 ms]
  Range (min … max):   158.5 ms … 177.0 ms    18 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):      96.7 ms ±   1.2 ms    [User: 94.5 ms, System: 2.1 ms]
  Range (min … max):    95.3 ms … 101.6 ms    30 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     116.5 ms ±   2.9 ms    [User: 114.5 ms, System: 1.9 ms]
  Range (min … max):   113.3 ms … 124.5 ms    24 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     101.1 ms ±   6.1 ms    [User: 99.2 ms, System: 1.8 ms]
  Range (min … max):    97.3 ms … 121.8 ms    29 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     117.2 ms ±   2.0 ms    [User: 115.5 ms, System: 1.5 ms]
  Range (min … max):   114.1 ms … 124.7 ms    23 runs
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.05 ± 0.06 times faster than /tmp/php/gcc-tim test.php
    1.20 ± 0.03 times faster than /tmp/php/clang-saki test.php
    1.21 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.24 ± 0.02 times faster than /tmp/php/gcc-baseline test.php
    1.69 ± 0.05 times faster than /tmp/php/clang-baseline test.php

For Mt19937 + 16 bytes. The baseline version is faster than both Saki's and my version for gcc. For clang my version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new Mt19937(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(16);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     149.6 ms ±   2.7 ms    [User: 146.6 ms, System: 2.7 ms]
  Range (min … max):   146.6 ms … 155.9 ms    19 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     194.6 ms ±   0.9 ms    [User: 192.1 ms, System: 2.5 ms]
  Range (min … max):   193.4 ms … 197.2 ms    15 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     164.3 ms ±   1.3 ms    [User: 160.9 ms, System: 3.4 ms]
  Range (min … max):   163.0 ms … 168.0 ms    17 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     190.3 ms ±   2.9 ms    [User: 188.0 ms, System: 2.2 ms]
  Range (min … max):   187.4 ms … 199.3 ms    15 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     162.6 ms ±   5.6 ms    [User: 159.3 ms, System: 2.9 ms]
  Range (min … max):   158.6 ms … 183.0 ms    18 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     182.8 ms ±   2.7 ms    [User: 180.0 ms, System: 2.7 ms]
  Range (min … max):   180.1 ms … 191.1 ms    16 runs
 
Summary
  /tmp/php/gcc-baseline test.php ran
    1.09 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.10 ± 0.02 times faster than /tmp/php/gcc-saki test.php
    1.22 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.27 ± 0.03 times faster than /tmp/php/clang-saki test.php
    1.30 ± 0.02 times faster than /tmp/php/clang-baseline test.php

For Pcg + 20 bytes. Saki's version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(20);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     136.8 ms ±   1.4 ms    [User: 134.0 ms, System: 2.6 ms]
  Range (min … max):   134.5 ms … 139.3 ms    21 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     186.5 ms ±   4.5 ms    [User: 183.6 ms, System: 2.6 ms]
  Range (min … max):   181.4 ms … 197.3 ms    16 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     111.5 ms ±   4.4 ms    [User: 110.1 ms, System: 1.2 ms]
  Range (min … max):   108.0 ms … 124.6 ms    23 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     132.7 ms ±   5.8 ms    [User: 130.5 ms, System: 2.1 ms]
  Range (min … max):   127.2 ms … 146.9 ms    22 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     122.2 ms ±   3.4 ms    [User: 119.8 ms, System: 2.2 ms]
  Range (min … max):   118.5 ms … 133.2 ms    24 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     140.9 ms ±   3.5 ms    [User: 138.7 ms, System: 2.1 ms]
  Range (min … max):   136.7 ms … 151.3 ms    21 runs
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.10 ± 0.05 times faster than /tmp/php/gcc-tim test.php
    1.19 ± 0.07 times faster than /tmp/php/clang-saki test.php
    1.23 ± 0.05 times faster than /tmp/php/gcc-baseline test.php
    1.26 ± 0.06 times faster than /tmp/php/clang-tim test.php
    1.67 ± 0.08 times faster than /tmp/php/clang-baseline test.php

For Pcg + 32 bytes. For gcc they are equal; for clang Saki's version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(32);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     167.2 ms ±   3.5 ms    [User: 164.9 ms, System: 2.1 ms]
  Range (min … max):   164.7 ms … 180.1 ms    17 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     236.3 ms ±   4.4 ms    [User: 232.3 ms, System: 3.8 ms]
  Range (min … max):   232.6 ms … 248.2 ms    12 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     119.8 ms ±   6.0 ms    [User: 118.1 ms, System: 1.5 ms]
  Range (min … max):   115.0 ms … 140.3 ms    22 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     130.3 ms ±   4.5 ms    [User: 127.8 ms, System: 2.1 ms]
  Range (min … max):   126.6 ms … 147.7 ms    22 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     118.3 ms ±   4.4 ms    [User: 116.0 ms, System: 2.1 ms]
  Range (min … max):   115.4 ms … 139.0 ms    25 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     135.9 ms ±   6.1 ms    [User: 133.9 ms, System: 1.8 ms]
  Range (min … max):   131.1 ms … 152.8 ms    22 runs
 
Summary
  /tmp/php/gcc-tim test.php ran
    1.01 ± 0.06 times faster than /tmp/php/gcc-saki test.php
    1.10 ± 0.06 times faster than /tmp/php/clang-saki test.php
    1.15 ± 0.07 times faster than /tmp/php/clang-tim test.php
    1.41 ± 0.06 times faster than /tmp/php/gcc-baseline test.php
    2.00 ± 0.08 times faster than /tmp/php/clang-baseline test.php

For Xoshiro256** + 32 bytes. For gcc they are equal; for clang my version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

$r = new Randomizer(new Xoshiro256StarStar(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(32);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     166.5 ms ±   1.3 ms    [User: 164.7 ms, System: 1.7 ms]
  Range (min … max):   164.3 ms … 169.1 ms    17 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     229.5 ms ±   2.2 ms    [User: 227.1 ms, System: 2.1 ms]
  Range (min … max):   226.5 ms … 232.3 ms    13 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     118.1 ms ±   0.9 ms    [User: 116.5 ms, System: 1.5 ms]
  Range (min … max):   116.8 ms … 120.6 ms    25 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     135.8 ms ±   2.6 ms    [User: 133.7 ms, System: 2.0 ms]
  Range (min … max):   132.3 ms … 144.9 ms    21 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     119.9 ms ±   4.9 ms    [User: 117.3 ms, System: 2.4 ms]
  Range (min … max):   117.6 ms … 142.2 ms    24 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     131.6 ms ±   3.1 ms    [User: 128.8 ms, System: 2.7 ms]
  Range (min … max):   129.3 ms … 144.7 ms    23 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.02 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.11 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.15 ± 0.02 times faster than /tmp/php/clang-saki test.php
    1.41 ± 0.02 times faster than /tmp/php/gcc-baseline test.php
    1.94 ± 0.02 times faster than /tmp/php/clang-baseline test.php

For Xoshiro + 16 bytes. They are equal.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

$r = new Randomizer(new Xoshiro256StarStar(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(16);
}

Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     119.9 ms ±   2.6 ms    [User: 118.2 ms, System: 1.7 ms]
  Range (min … max):   117.9 ms … 129.5 ms    23 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     158.1 ms ±   3.4 ms    [User: 156.2 ms, System: 1.7 ms]
  Range (min … max):   155.0 ms … 171.2 ms    19 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):      97.1 ms ±   1.4 ms    [User: 94.8 ms, System: 2.1 ms]
  Range (min … max):    95.3 ms … 102.7 ms    30 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     116.6 ms ±   2.7 ms    [User: 114.4 ms, System: 2.3 ms]
  Range (min … max):   113.2 ms … 124.9 ms    24 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):      98.6 ms ±   1.3 ms    [User: 96.0 ms, System: 2.6 ms]
  Range (min … max):    96.9 ms … 102.6 ms    29 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     115.8 ms ±   2.9 ms    [User: 113.3 ms, System: 2.5 ms]
  Range (min … max):   113.1 ms … 124.2 ms    25 runs
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.02 ± 0.02 times faster than /tmp/php/gcc-tim test.php
    1.19 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.20 ± 0.03 times faster than /tmp/php/clang-saki test.php
    1.24 ± 0.03 times faster than /tmp/php/gcc-baseline test.php
    1.63 ± 0.04 times faster than /tmp/php/clang-baseline test.php

For Mt19937 + 32 bytes. The baseline slightly beats my version for gcc. For clang my version is the fastest.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

$r = new Randomizer(new Mt19937(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(32);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     232.3 ms ±   8.5 ms    [User: 228.8 ms, System: 3.3 ms]
  Range (min … max):   225.6 ms … 255.8 ms    13 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     303.6 ms ±   9.8 ms    [User: 301.4 ms, System: 2.0 ms]
  Range (min … max):   294.0 ms … 323.7 ms    10 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     271.4 ms ±  12.6 ms    [User: 268.3 ms, System: 2.3 ms]
  Range (min … max):   259.9 ms … 303.4 ms    11 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     282.8 ms ±   2.7 ms    [User: 280.9 ms, System: 1.7 ms]
  Range (min … max):   280.8 ms … 288.1 ms    10 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     238.3 ms ±   3.0 ms    [User: 235.5 ms, System: 2.6 ms]
  Range (min … max):   235.7 ms … 247.2 ms    12 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (247.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     268.1 ms ±   7.7 ms    [User: 266.0 ms, System: 1.6 ms]
  Range (min … max):   262.3 ms … 286.1 ms    10 runs
 
Summary
  /tmp/php/gcc-baseline test.php ran
    1.03 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.15 ± 0.05 times faster than /tmp/php/clang-tim test.php
    1.17 ± 0.07 times faster than /tmp/php/gcc-saki test.php
    1.22 ± 0.05 times faster than /tmp/php/clang-saki test.php
    1.31 ± 0.06 times faster than /tmp/php/clang-baseline test.php

Closes #14891

This patch greatly improves the performance for the common case of using a 64-bit engine and requesting a length that is a multiple of 8.

It does so by providing a fast path that will just `memcpy()` (which will be optimized out) the returned uint64_t directly into the output buffer, byteswapping it for big endian architectures.

The existing byte-wise copying logic was mostly left alone. It only received an optimization of the shifting and masking that was previously applied to `Randomizer::getBytesFromString()` in 1fc2ddc.

Co-authored-by: Saki Takamachi <saki@php.net>

Girgias · 2024-08-04T15:40:07Z

ext/random/randomizer.c

+		}
+
+#ifdef WORDS_BIGENDIAN
+		uint64_t swapped = ZEND_BYTES_SWAP64(result.result);
+		memcpy(ZSTR_VAL(retval) + total_size, &swapped, 8);
+#else
+		memcpy(ZSTR_VAL(retval) + total_size, &result.result, 8);
+#endif
+		total_size += 8;
+	}
+


I am confused, is it intended for this to run the second while loop again? Or should it skip it to return the value directly?

TimWolla

It will enter the second loop if the requested length is not a multiple of 8, to handle the remaining 1-7 bytes.
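To make the remainder case concrete: for a 20-byte request and a 64-bit engine, the first loop covers 16 bytes and 20 % 8 = 4 bytes are left for the second loop. The following is a minimal sketch of such a byte-wise tail copy using shifting and masking. `demo_fill_tail()` is a hypothetical helper written for illustration under the assumption of a 64-bit result; it is not the loop from ext/random/randomizer.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the lowest `remaining` bytes (1-7) of one
 * 64-bit result into `out`, least significant byte first. */
static void demo_fill_tail(uint64_t result, unsigned char *out, size_t remaining)
{
	assert(out != NULL);
	assert(remaining >= 1 && remaining <= 7);

	for (size_t i = 0; i < remaining; i++) {
		out[i] = (unsigned char)(result & 0xff); /* mask off the low byte */
		result >>= 8;                             /* shift the next byte down */
	}
}
```

Calling `demo_fill_tail(UINT64_C(0x1817161514131211), tail, 4)` stores 0x11, 0x12, 0x13, 0x14 and discards the upper four bytes of the result. When the requested length is a multiple of 8, nothing remains and a tail step like this is not needed at all.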

SakiTakamachi

The use of goto is smart.
LGTM!

zeriyoshi

LGTM, Thank you!

TimWolla requested review from Girgias and SakiTakamachi August 4, 2024 15:29

TimWolla requested a review from zeriyoshi as a code owner August 4, 2024 15:29

github-actions bot added the Extension: random label Aug 4, 2024

TimWolla mentioned this pull request Aug 4, 2024

ext/random: Optimized getBytes loop processing #14891

Closed

Girgias reviewed Aug 4, 2024

SakiTakamachi approved these changes Aug 5, 2024

zeriyoshi approved these changes Aug 5, 2024

Girgias approved these changes Aug 5, 2024

TimWolla merged commit 31e2d2b into php:master Aug 5, 2024
9 of 11 checks passed

TimWolla deleted the random-getBytes-optimize branch August 5, 2024 17:12
