random: Optimize Randomizer::getBytes() #15228

Merged on Aug 5, 2024 (1 commit)

TimWolla (Member) commented Aug 4, 2024

This is my attempt at optimizing Randomizer::getBytes(), building on the work done by @SakiTakamachi in #14891, but modifying it so much that the spirit of Saki's version is no longer recognizable. Thus I'm filing this as a separate PR.


This patch greatly improves the performance for the common case of using a 64-bit engine and requesting a length that is a multiple of 8.

It does so by providing a fast path that simply memcpy()s the returned uint64_t directly into the output buffer (the compiler will optimize the call itself away), byte-swapping the value first on big-endian architectures.
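
As a rough illustration of the fast path described above, the following C sketch copies each 64-bit engine output into the buffer with memcpy() and swaps the bytes first on big-endian hosts. Everything named `demo_*` is hypothetical and not php-src API; the generator is a SplitMix64-style stand-in, and endianness is probed at run time here, whereas the patch selects the branch at compile time via `WORDS_BIGENDIAN`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit engine (SplitMix64-style step). */
typedef struct { uint64_t state; } demo_engine;

static uint64_t demo_next(demo_engine *e) {
    uint64_t z = (e->state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Portable 64-bit byte swap (assumed helper, stands in for ZEND_BYTES_SWAP64). */
static uint64_t demo_bswap64(uint64_t v) {
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) | ((v >> 8) & 0x00ff00ff00ff00ffULL);
    v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
    return (v << 32) | (v >> 32);
}

/* Run-time endianness probe; the patch uses a compile-time macro instead. */
static int demo_is_big_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Store v so that its least significant byte always comes first in dst. */
static void demo_store_le64(unsigned char *dst, uint64_t v) {
    if (demo_is_big_endian()) {
        v = demo_bswap64(v);
    }
    memcpy(dst, &v, 8); /* fixed-size copy, typically compiled to a single store */
}

/* Fast path only: len must be a multiple of 8. */
static void demo_fill_fast(demo_engine *e, unsigned char *out, size_t len) {
    assert(len % 8 == 0);
    for (size_t i = 0; i < len; i += 8) {
        demo_store_le64(out + i, demo_next(e));
    }
}
```

Swapping on big-endian hosts keeps the byte sequence for a given seed identical across architectures, which matches the output order of a byte-wise copy that emits the least significant byte first.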

The existing byte-wise copying logic was mostly left alone. It only received an optimization of the shifting and masking that was previously applied to Randomizer::getBytesFromString() in 1fc2ddc.
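
The exact form of the shifting and masking change is not spelled out here, so the following C sketch only shows one plausible reading: instead of recomputing a shift amount for every byte, the value is masked for its low byte and then shifted down by 8. Both functions and their `demo_*` names are hypothetical; `width` is the number of bytes the engine produced and `room` the space left in the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant A: derive the shift from the byte index on every iteration. */
static size_t demo_emit_indexed(uint64_t value, size_t width,
                                unsigned char *out, size_t room) {
    assert(width <= 8);
    size_t written = 0;
    for (size_t i = 0; i < width && written < room; i++) {
        out[written++] = (unsigned char)((value >> (i * 8)) & 0xff);
    }
    return written;
}

/* Variant B: take the low byte, then shift the value down by one byte. */
static size_t demo_emit_shifting(uint64_t value, size_t width,
                                 unsigned char *out, size_t room) {
    assert(width <= 8);
    size_t written = 0;
    for (size_t i = 0; i < width && written < room; i++) {
        out[written++] = (unsigned char)(value & 0xff);
        value >>= 8;
    }
    return written;
}
```

Both variants emit the same bytes, least significant first; they differ only in how much arithmetic each iteration performs.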


Benchmarks:

The baseline commit is d6a75e1. I've rebased both Saki's and my branch onto that commit.

gcc is configured as: ./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm
clang is configured as: ./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm --enable-werror CC=clang-16 CXX=clang++-16

Summary: Results are unexpectedly dependent on the engine chosen, even if both engines are 64-bit engines. They are also heavily dependent on the compiler. There is no clear winner. My version appears to be better for 32-bit engines, whereas Saki's version appears to be better for short lengths that are not a multiple of 8. For the important case of 16 bytes they are equal, and for 32 bytes they are equal with gcc; for clang it depends on the engine.

For Mt19937 and 1024 bytes: my version is the fastest for both clang and gcc.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new Mt19937(0));

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(1024);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     522.7 ms ±   5.9 ms    [User: 520.1 ms, System: 2.4 ms]
  Range (min … max):   515.1 ms … 530.8 ms    10 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     665.6 ms ±  10.5 ms    [User: 663.2 ms, System: 2.1 ms]
  Range (min … max):   658.6 ms … 693.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     630.6 ms ±   2.9 ms    [User: 627.5 ms, System: 2.8 ms]
  Range (min … max):   627.2 ms … 635.9 ms    10 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     624.5 ms ±   4.0 ms    [User: 620.9 ms, System: 3.3 ms]
  Range (min … max):   621.8 ms … 632.5 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     512.9 ms ±   4.6 ms    [User: 510.9 ms, System: 1.7 ms]
  Range (min … max):   506.0 ms … 522.2 ms    10 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     571.0 ms ±   1.6 ms    [User: 568.6 ms, System: 2.4 ms]
  Range (min … max):   567.6 ms … 572.9 ms    10 runs
 
Summary
  /tmp/php/gcc-tim test.php ran
    1.02 ± 0.01 times faster than /tmp/php/gcc-baseline test.php
    1.11 ± 0.01 times faster than /tmp/php/clang-tim test.php
    1.22 ± 0.01 times faster than /tmp/php/clang-saki test.php
    1.23 ± 0.01 times faster than /tmp/php/gcc-saki test.php
    1.30 ± 0.02 times faster than /tmp/php/clang-baseline test.php

For PcgOneseq128XslRr64 and 1024 bytes: Saki's and my version are equal for both clang and gcc.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(1024);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     349.1 ms ±   3.5 ms    [User: 346.0 ms, System: 3.1 ms]
  Range (min … max):   346.2 ms … 357.3 ms    10 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     471.3 ms ±   8.6 ms    [User: 468.5 ms, System: 2.7 ms]
  Range (min … max):   465.9 ms … 492.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     171.1 ms ±   0.7 ms    [User: 168.4 ms, System: 2.8 ms]
  Range (min … max):   170.1 ms … 173.5 ms    17 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     137.6 ms ±   4.4 ms    [User: 135.2 ms, System: 2.4 ms]
  Range (min … max):   134.4 ms … 155.6 ms    21 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     170.6 ms ±   0.5 ms    [User: 168.8 ms, System: 1.9 ms]
  Range (min … max):   169.8 ms … 171.5 ms    17 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     139.0 ms ±   1.7 ms    [User: 136.5 ms, System: 2.4 ms]
  Range (min … max):   136.7 ms … 142.9 ms    20 runs
 
Summary
  /tmp/php/clang-saki test.php ran
    1.01 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.24 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.24 ± 0.04 times faster than /tmp/php/gcc-saki test.php
    2.54 ± 0.09 times faster than /tmp/php/gcc-baseline test.php
    3.43 ± 0.13 times faster than /tmp/php/clang-baseline test.php

For Secure and 16 bytes: all of them are equal.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer();

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(16);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     257.7 ms ±   1.5 ms    [User: 54.4 ms, System: 203.1 ms]
  Range (min … max):   256.8 ms … 261.9 ms    11 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     264.7 ms ±   6.4 ms    [User: 56.6 ms, System: 208.0 ms]
  Range (min … max):   258.7 ms … 275.4 ms    11 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     255.1 ms ±   4.1 ms    [User: 51.5 ms, System: 203.6 ms]
  Range (min … max):   253.0 ms … 267.4 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     257.0 ms ±   2.4 ms    [User: 49.3 ms, System: 207.5 ms]
  Range (min … max):   254.9 ms … 261.7 ms    11 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     257.2 ms ±   8.1 ms    [User: 49.3 ms, System: 207.6 ms]
  Range (min … max):   253.5 ms … 281.0 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     257.5 ms ±   6.4 ms    [User: 51.2 ms, System: 205.9 ms]
  Range (min … max):   254.0 ms … 275.7 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.01 ± 0.02 times faster than /tmp/php/clang-saki test.php
    1.01 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.01 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.01 ± 0.02 times faster than /tmp/php/gcc-baseline test.php
    1.04 ± 0.03 times faster than /tmp/php/clang-baseline test.php

For Pcg + 16 bytes: Saki's version is faster for gcc; for clang they are equal.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(16);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     119.8 ms ±   1.6 ms    [User: 117.4 ms, System: 2.1 ms]
  Range (min … max):   117.7 ms … 123.8 ms    24 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     163.2 ms ±   3.9 ms    [User: 160.9 ms, System: 2.0 ms]
  Range (min … max):   158.5 ms … 177.0 ms    18 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):      96.7 ms ±   1.2 ms    [User: 94.5 ms, System: 2.1 ms]
  Range (min … max):    95.3 ms … 101.6 ms    30 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     116.5 ms ±   2.9 ms    [User: 114.5 ms, System: 1.9 ms]
  Range (min … max):   113.3 ms … 124.5 ms    24 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     101.1 ms ±   6.1 ms    [User: 99.2 ms, System: 1.8 ms]
  Range (min … max):    97.3 ms … 121.8 ms    29 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     117.2 ms ±   2.0 ms    [User: 115.5 ms, System: 1.5 ms]
  Range (min … max):   114.1 ms … 124.7 ms    23 runs
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.05 ± 0.06 times faster than /tmp/php/gcc-tim test.php
    1.20 ± 0.03 times faster than /tmp/php/clang-saki test.php
    1.21 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.24 ± 0.02 times faster than /tmp/php/gcc-baseline test.php
    1.69 ± 0.05 times faster than /tmp/php/clang-baseline test.php

For Mt19937 + 16 bytes: the baseline version is faster than both Saki's and my version for gcc. For clang my version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new Mt19937(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(16);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     149.6 ms ±   2.7 ms    [User: 146.6 ms, System: 2.7 ms]
  Range (min … max):   146.6 ms … 155.9 ms    19 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     194.6 ms ±   0.9 ms    [User: 192.1 ms, System: 2.5 ms]
  Range (min … max):   193.4 ms … 197.2 ms    15 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     164.3 ms ±   1.3 ms    [User: 160.9 ms, System: 3.4 ms]
  Range (min … max):   163.0 ms … 168.0 ms    17 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     190.3 ms ±   2.9 ms    [User: 188.0 ms, System: 2.2 ms]
  Range (min … max):   187.4 ms … 199.3 ms    15 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     162.6 ms ±   5.6 ms    [User: 159.3 ms, System: 2.9 ms]
  Range (min … max):   158.6 ms … 183.0 ms    18 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     182.8 ms ±   2.7 ms    [User: 180.0 ms, System: 2.7 ms]
  Range (min … max):   180.1 ms … 191.1 ms    16 runs
 
Summary
  /tmp/php/gcc-baseline test.php ran
    1.09 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.10 ± 0.02 times faster than /tmp/php/gcc-saki test.php
    1.22 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.27 ± 0.03 times faster than /tmp/php/clang-saki test.php
    1.30 ± 0.02 times faster than /tmp/php/clang-baseline test.php

For Pcg + 20 bytes: Saki's version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(20);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     136.8 ms ±   1.4 ms    [User: 134.0 ms, System: 2.6 ms]
  Range (min … max):   134.5 ms … 139.3 ms    21 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     186.5 ms ±   4.5 ms    [User: 183.6 ms, System: 2.6 ms]
  Range (min … max):   181.4 ms … 197.3 ms    16 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     111.5 ms ±   4.4 ms    [User: 110.1 ms, System: 1.2 ms]
  Range (min … max):   108.0 ms … 124.6 ms    23 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     132.7 ms ±   5.8 ms    [User: 130.5 ms, System: 2.1 ms]
  Range (min … max):   127.2 ms … 146.9 ms    22 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     122.2 ms ±   3.4 ms    [User: 119.8 ms, System: 2.2 ms]
  Range (min … max):   118.5 ms … 133.2 ms    24 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     140.9 ms ±   3.5 ms    [User: 138.7 ms, System: 2.1 ms]
  Range (min … max):   136.7 ms … 151.3 ms    21 runs
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.10 ± 0.05 times faster than /tmp/php/gcc-tim test.php
    1.19 ± 0.07 times faster than /tmp/php/clang-saki test.php
    1.23 ± 0.05 times faster than /tmp/php/gcc-baseline test.php
    1.26 ± 0.06 times faster than /tmp/php/clang-tim test.php
    1.67 ± 0.08 times faster than /tmp/php/clang-baseline test.php

For Pcg + 32 bytes: for gcc they are equal; for clang Saki's version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Randomizer;

$r = new Randomizer(new PcgOneseq128XslRr64(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(32);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     167.2 ms ±   3.5 ms    [User: 164.9 ms, System: 2.1 ms]
  Range (min … max):   164.7 ms … 180.1 ms    17 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     236.3 ms ±   4.4 ms    [User: 232.3 ms, System: 3.8 ms]
  Range (min … max):   232.6 ms … 248.2 ms    12 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     119.8 ms ±   6.0 ms    [User: 118.1 ms, System: 1.5 ms]
  Range (min … max):   115.0 ms … 140.3 ms    22 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     130.3 ms ±   4.5 ms    [User: 127.8 ms, System: 2.1 ms]
  Range (min … max):   126.6 ms … 147.7 ms    22 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     118.3 ms ±   4.4 ms    [User: 116.0 ms, System: 2.1 ms]
  Range (min … max):   115.4 ms … 139.0 ms    25 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     135.9 ms ±   6.1 ms    [User: 133.9 ms, System: 1.8 ms]
  Range (min … max):   131.1 ms … 152.8 ms    22 runs
 
Summary
  /tmp/php/gcc-tim test.php ran
    1.01 ± 0.06 times faster than /tmp/php/gcc-saki test.php
    1.10 ± 0.06 times faster than /tmp/php/clang-saki test.php
    1.15 ± 0.07 times faster than /tmp/php/clang-tim test.php
    1.41 ± 0.06 times faster than /tmp/php/gcc-baseline test.php
    2.00 ± 0.08 times faster than /tmp/php/clang-baseline test.php

For Xoshiro256** + 32 bytes: for gcc they are equal; for clang my version is faster.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

$r = new Randomizer(new Xoshiro256StarStar(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(32);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     166.5 ms ±   1.3 ms    [User: 164.7 ms, System: 1.7 ms]
  Range (min … max):   164.3 ms … 169.1 ms    17 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     229.5 ms ±   2.2 ms    [User: 227.1 ms, System: 2.1 ms]
  Range (min … max):   226.5 ms … 232.3 ms    13 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     118.1 ms ±   0.9 ms    [User: 116.5 ms, System: 1.5 ms]
  Range (min … max):   116.8 ms … 120.6 ms    25 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     135.8 ms ±   2.6 ms    [User: 133.7 ms, System: 2.0 ms]
  Range (min … max):   132.3 ms … 144.9 ms    21 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     119.9 ms ±   4.9 ms    [User: 117.3 ms, System: 2.4 ms]
  Range (min … max):   117.6 ms … 142.2 ms    24 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     131.6 ms ±   3.1 ms    [User: 128.8 ms, System: 2.7 ms]
  Range (min … max):   129.3 ms … 144.7 ms    23 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.02 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.11 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.15 ± 0.02 times faster than /tmp/php/clang-saki test.php
    1.41 ± 0.02 times faster than /tmp/php/gcc-baseline test.php
    1.94 ± 0.02 times faster than /tmp/php/clang-baseline test.php

For Xoshiro + 16 bytes: they are equal.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

$r = new Randomizer(new Xoshiro256StarStar(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(16);
}

Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     119.9 ms ±   2.6 ms    [User: 118.2 ms, System: 1.7 ms]
  Range (min … max):   117.9 ms … 129.5 ms    23 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     158.1 ms ±   3.4 ms    [User: 156.2 ms, System: 1.7 ms]
  Range (min … max):   155.0 ms … 171.2 ms    19 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):      97.1 ms ±   1.4 ms    [User: 94.8 ms, System: 2.1 ms]
  Range (min … max):    95.3 ms … 102.7 ms    30 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     116.6 ms ±   2.7 ms    [User: 114.4 ms, System: 2.3 ms]
  Range (min … max):   113.2 ms … 124.9 ms    24 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):      98.6 ms ±   1.3 ms    [User: 96.0 ms, System: 2.6 ms]
  Range (min … max):    96.9 ms … 102.6 ms    29 runs
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     115.8 ms ±   2.9 ms    [User: 113.3 ms, System: 2.5 ms]
  Range (min … max):   113.1 ms … 124.2 ms    25 runs
 
Summary
  /tmp/php/gcc-saki test.php ran
    1.02 ± 0.02 times faster than /tmp/php/gcc-tim test.php
    1.19 ± 0.03 times faster than /tmp/php/clang-tim test.php
    1.20 ± 0.03 times faster than /tmp/php/clang-saki test.php
    1.24 ± 0.03 times faster than /tmp/php/gcc-baseline test.php
    1.63 ± 0.04 times faster than /tmp/php/clang-baseline test.php

For Mt19937 + 32 bytes: the baseline slightly beats my version for gcc. For clang my version is the fastest.

<?php

use Random\Engine\Mt19937;
use Random\Engine\PcgOneseq128XslRr64;
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

$r = new Randomizer(new Mt19937(0));

for ($i = 0; $i < 5000000; $i++) {
    $r->getBytes(32);
}

$ hyperfine -L compiler gcc,clang -L binary baseline,saki,tim '/tmp/php/{compiler}-{binary} test.php'
Benchmark 1: /tmp/php/gcc-baseline test.php
  Time (mean ± σ):     232.3 ms ±   8.5 ms    [User: 228.8 ms, System: 3.3 ms]
  Range (min … max):   225.6 ms … 255.8 ms    13 runs
 
Benchmark 2: /tmp/php/clang-baseline test.php
  Time (mean ± σ):     303.6 ms ±   9.8 ms    [User: 301.4 ms, System: 2.0 ms]
  Range (min … max):   294.0 ms … 323.7 ms    10 runs
 
Benchmark 3: /tmp/php/gcc-saki test.php
  Time (mean ± σ):     271.4 ms ±  12.6 ms    [User: 268.3 ms, System: 2.3 ms]
  Range (min … max):   259.9 ms … 303.4 ms    11 runs
 
Benchmark 4: /tmp/php/clang-saki test.php
  Time (mean ± σ):     282.8 ms ±   2.7 ms    [User: 280.9 ms, System: 1.7 ms]
  Range (min … max):   280.8 ms … 288.1 ms    10 runs
 
Benchmark 5: /tmp/php/gcc-tim test.php
  Time (mean ± σ):     238.3 ms ±   3.0 ms    [User: 235.5 ms, System: 2.6 ms]
  Range (min … max):   235.7 ms … 247.2 ms    12 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (247.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 6: /tmp/php/clang-tim test.php
  Time (mean ± σ):     268.1 ms ±   7.7 ms    [User: 266.0 ms, System: 1.6 ms]
  Range (min … max):   262.3 ms … 286.1 ms    10 runs
 
Summary
  /tmp/php/gcc-baseline test.php ran
    1.03 ± 0.04 times faster than /tmp/php/gcc-tim test.php
    1.15 ± 0.05 times faster than /tmp/php/clang-tim test.php
    1.17 ± 0.07 times faster than /tmp/php/gcc-saki test.php
    1.22 ± 0.05 times faster than /tmp/php/clang-saki test.php
    1.31 ± 0.06 times faster than /tmp/php/clang-baseline test.php

Closes #14891

Co-authored-by: Saki Takamachi <saki@php.net>
Comment on lines +315 to +325
        }

#ifdef WORDS_BIGENDIAN
        uint64_t swapped = ZEND_BYTES_SWAP64(result.result);
        memcpy(ZSTR_VAL(retval) + total_size, &swapped, 8);
#else
        memcpy(ZSTR_VAL(retval) + total_size, &result.result, 8);
#endif
        total_size += 8;
    }

Member

I am confused, is it intended for this to run the second while loop again? Or should it skip it to return the value directly?

Member Author

It will enter the second loop if the requested length is not a multiple of 8 to handle the remaining 1-7 bytes.
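
The control flow described in this reply can be sketched roughly as follows: a first loop consumes whole 8-byte chunks, and a second loop runs only when 1 to 7 bytes are still missing. This is a simplified C illustration, not the merged code: all `demo_*` names are hypothetical, the engine is a toy whose output bytes simply count upwards, and plain loops are used without any `goto`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy engine: starting from 0x0706050403020100 it yields bytes 0, 1, 2, ...
 * (the pattern only holds for the first 256 bytes, before carries occur). */
typedef struct { uint64_t state; } demo_engine;

static uint64_t demo_next(demo_engine *e) {
    uint64_t value = e->state;
    e->state += 0x0808080808080808ULL;
    return value;
}

/* Portable little-endian store; the patch uses memcpy() plus a byte swap
 * on big-endian builds instead. */
static void demo_store_le64(unsigned char *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        dst[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

static void demo_get_bytes(demo_engine *e, unsigned char *out, size_t length) {
    assert(out != NULL || length == 0);
    size_t total = 0;

    /* First loop: whole 8-byte chunks. */
    while (length - total >= 8) {
        demo_store_le64(out + total, demo_next(e));
        total += 8;
    }

    /* Second loop: the remaining 1-7 bytes, skipped entirely when the
     * requested length is a multiple of 8. */
    if (total < length) {
        uint64_t value = demo_next(e);
        while (total < length) {
            out[total++] = (unsigned char)(value & 0xff);
            value >>= 8;
        }
    }
}
```

Requesting 20 bytes therefore draws three values from the engine (two full chunks plus one for the 4-byte tail), while requesting 16 bytes draws exactly two.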

SakiTakamachi (Member) left a comment

The use of goto is smart.
LGTM!

zeriyoshi (Contributor) left a comment

LGTM, Thank you!

TimWolla merged commit 31e2d2b into php:master on Aug 5, 2024
9 of 11 checks passed
TimWolla deleted the random-getBytes-optimize branch on August 5, 2024 17:12