Skip to content

ext/random: Optimized getBytes loop processing #14891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

SakiTakamachi
Copy link
Member

@SakiTakamachi SakiTakamachi commented Jul 9, 2024

All measurement results are from the latest commit.

Benchmark codes

Since the description is too long, <?php and use are omitted.

Using PcgOneseq128XslRr64

0.php:

$r = new Randomizer(new PcgOneseq128XslRr64());

for ($i = 0; $i < 10000000; $i++) {
    $r->getBytes(1);
}

1.php:

$r = new Randomizer(new PcgOneseq128XslRr64());

for ($i = 0; $i < 10000000; $i++) {
    $r->getBytes(16);
}

2.php

$r = new Randomizer(new PcgOneseq128XslRr64());

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(1024);
}

Omit constructor arguments

n0.php:

$r = new Randomizer();

for ($i = 0; $i < 1000000; $i++) {
    $r->getBytes(1);
}

n1.php:

$r = new Randomizer();

for ($i = 0; $i < 500000; $i++) {
    $r->getBytes(16);
}

n2.php:

$r = new Randomizer();

for ($i = 0; $i < 5000; $i++) {
    $r->getBytes(1024);
}

Using PcgOneseq128XslRr64

before

# hyperfine "php /mount/random/0.php" --warmup 10
Benchmark 1: php /mount/random/0.php
  Time (mean ± σ):     278.7 ms ±   2.5 ms    [User: 274.0 ms, System: 4.0 ms]
  Range (min … max):   275.0 ms … 283.1 ms    10 runs
 
# hyperfine "php /mount/random/1.php" --warmup 10
Benchmark 1: php /mount/random/1.php
  Time (mean ± σ):     425.6 ms ±  12.5 ms    [User: 422.3 ms, System: 2.5 ms]
  Range (min … max):   413.2 ms … 457.5 ms    10 runs
 
# hyperfine "php /mount/random/2.php" --warmup 10
Benchmark 1: php /mount/random/2.php
  Time (mean ± σ):     561.9 ms ±   5.2 ms    [User: 558.2 ms, System: 3.0 ms]
  Range (min … max):   556.5 ms … 572.7 ms    10 runs

after Optimized bit shifting

# hyperfine "php /mount/random/0.php" --warmup 10
Benchmark 1: php /mount/random/0.php
  Time (mean ± σ):     267.0 ms ±  13.4 ms    [User: 263.3 ms, System: 3.0 ms]
  Range (min … max):   257.9 ms … 302.5 ms    11 runs
 
# hyperfine "php /mount/random/1.php" --warmup 10
Benchmark 1: php /mount/random/1.php
  Time (mean ± σ):     409.1 ms ±  33.2 ms    [User: 405.3 ms, System: 3.1 ms]
  Range (min … max):   385.3 ms … 476.4 ms    10 runs
 
# hyperfine "php /mount/random/2.php" --warmup 10
Benchmark 1: php /mount/random/2.php
  Time (mean ± σ):     499.8 ms ±   7.8 ms    [User: 496.3 ms, System: 2.9 ms]
  Range (min … max):   491.3 ms … 515.6 ms    10 runs

after Using memcpy

# hyperfine "php /mount/random/0.php" --warmup 10
Benchmark 1: php /mount/random/0.php
  Time (mean ± σ):     295.5 ms ±   9.3 ms    [User: 290.9 ms, System: 3.9 ms]
  Range (min … max):   287.6 ms … 318.5 ms    10 runs
 
# hyperfine "php /mount/random/1.php" --warmup 10
Benchmark 1: php /mount/random/1.php
  Time (mean ± σ):     327.9 ms ±   3.8 ms    [User: 323.6 ms, System: 3.5 ms]
  Range (min … max):   323.0 ms … 332.2 ms    10 runs
 
# hyperfine "php /mount/random/2.php" --warmup 10
Benchmark 1: php /mount/random/2.php
  Time (mean ± σ):     260.5 ms ±   3.0 ms    [User: 256.2 ms, System: 3.7 ms]
  Range (min … max):   255.9 ms … 265.3 ms    11 runs

after Loop optimization

# hyperfine "php /mount/random/0.php" --warmup 10
Benchmark 1: php /mount/random/0.php
  Time (mean ± σ):     268.8 ms ±   2.5 ms    [User: 265.3 ms, System: 2.7 ms]
  Range (min … max):   264.9 ms … 273.9 ms    11 runs

# hyperfine "php /mount/random/1.php" --warmup 10
Benchmark 1: php /mount/random/1.php
  Time (mean ± σ):     323.7 ms ±  13.5 ms    [User: 320.0 ms, System: 3.0 ms]
  Range (min … max):   314.9 ms … 354.8 ms    10 runs
  
# hyperfine "php /mount/random/2.php" --warmup 10
Benchmark 1: php /mount/random/2.php
  Time (mean ± σ):     265.2 ms ±   3.6 ms    [User: 260.7 ms, System: 3.7 ms]
  Range (min … max):   260.3 ms … 273.6 ms    11 runs

Omit constructor arguments

before

# hyperfine "php /mount/random/n0.php" --warmup 10
Benchmark 1: php /mount/random/n0.php
  Time (mean ± σ):     478.8 ms ±   5.0 ms    [User: 177.2 ms, System: 300.8 ms]
  Range (min … max):   474.9 ms … 489.1 ms    10 runs

# hyperfine "php /mount/random/n1.php" --warmup 10
Benchmark 1: php /mount/random/n1.php
  Time (mean ± σ):     475.6 ms ±   6.7 ms    [User: 172.3 ms, System: 302.6 ms]
  Range (min … max):   470.9 ms … 493.4 ms    10 runs

# hyperfine "php /mount/random/n2.php" --warmup 10
Benchmark 1: php /mount/random/n2.php
  Time (mean ± σ):     297.1 ms ±   4.4 ms    [User: 101.6 ms, System: 194.7 ms]
  Range (min … max):   292.2 ms … 307.0 ms    10 runs

after

# hyperfine "php /mount/random/n0.php" --warmup 10
Benchmark 1: php /mount/random/n0.php
  Time (mean ± σ):     478.0 ms ±   4.5 ms    [User: 175.8 ms, System: 301.4 ms]
  Range (min … max):   469.3 ms … 483.8 ms    10 runs
  
# hyperfine "php /mount/random/n1.php" --warmup 10
Benchmark 1: php /mount/random/n1.php
  Time (mean ± σ):     473.6 ms ±  12.5 ms    [User: 171.3 ms, System: 301.5 ms]
  Range (min … max):   463.2 ms … 501.3 ms    10 runs

# hyperfine "php /mount/random/n2.php" --warmup 10
Benchmark 1: php /mount/random/n2.php
  Time (mean ± σ):     297.5 ms ±   9.0 ms    [User: 95.5 ms, System: 201.3 ms]
  Range (min … max):   286.2 ms … 317.8 ms    10 runs

rptr = bulk_convert_generated_result_to_char_64(rptr, result.result, &to_read);
} else if (EXPECTED(result.size == sizeof(uint32_t))){
rptr = bulk_convert_generated_result_to_char_32(rptr, result.result, &to_read);
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a possibility that this else will be executed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It can happen with userland engines that return single bytes:

final class MyEngine implements \Random\Engine {
  public function generate(): string { return 'x'; }
}

$r = new \Random\Randomizer(new MyEngine());

var_dump($r->getBytes(10));

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Jul 9, 2024

CI is red, I'll fix

(edit) done

@SakiTakamachi SakiTakamachi force-pushed the refactor_randomizer branch 2 times, most recently from fbb5635 to 49fa32e Compare July 9, 2024 16:18
Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I also noticed before that getBytes() is slower than it could be.

I'm not too happy about the additional complexity introduced here and I would prefer a solution where the compiler itself detects that this effectively is a memcpy() for little-endian architectures, while the C code stays high level and readable.

I'll try to experiment myself when I have some time to spare, given this is a self-contained change that does not change the behavior, we could still ship this with the PHP 8.4 Betas.

@SakiTakamachi
Copy link
Member Author

@TimWolla
Thanks for your review.

I'm not too happy about the additional complexity introduced here and I would prefer a solution where the compiler itself detects that this effectively is a memcpy() for little-endian architectures, while the C code stays high level and readable.

I could probably simplify the code a bit more, but it would still be somewhat complicated.

I'm going to simplify it a bit more so you can check back later to see if it's satisfactory to you :)

@SakiTakamachi SakiTakamachi force-pushed the refactor_randomizer branch from efbe1d1 to 278a1d2 Compare July 9, 2024 23:11
@SakiTakamachi
Copy link
Member Author

@TimWolla
I've simplified it as much as possible.

Copy link
Contributor

@zeriyoshi zeriyoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for the good idea and implementation!

@SakiTakamachi
Copy link
Member Author

Thank you, I'll wait for Tim's opinion :)

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also fnd the added complexity to be slightly worrying considering how difficult randomness can be

Comment on lines 217 to 242
/* Bytes swap */
#ifdef _MSC_VER
# include <stdlib.h>
# define RANDOM_BSWAP64(u) _byteswap_uint64(u)
#else
# ifdef __GNUC__
# define RANDOM_BSWAP64(u) __builtin_bswap64(u)
# elif defined(__has_builtin)
# if __has_builtin(__builtin_bswap64)
# define RANDOM_BSWAP64(u) __builtin_bswap64(u)
# endif // __has_builtin(__builtin_bswap64)
# endif // __GNUC__
#endif // _MSC_VER
#ifndef RANDOM_BSWAP64
static inline uint64_t RANDOM_BSWAP64(uint64_t u)
{
return (((u & 0xff00000000000000ULL) >> 56)
| ((u & 0x00ff000000000000ULL) >> 40)
| ((u & 0x0000ff0000000000ULL) >> 24)
| ((u & 0x000000ff00000000ULL) >> 8)
| ((u & 0x00000000ff000000ULL) << 8)
| ((u & 0x0000000000ff0000ULL) << 24)
| ((u & 0x000000000000ff00ULL) << 40)
| ((u & 0x00000000000000ffULL) << 56));
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that if this lands, we would be implementing this sort of logic in different extensions, does it make sense to "upstream" this into Zend via the zend_portability.h header?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that is probably wiser. I'll create a PR for that first.

@SakiTakamachi
Copy link
Member Author

As Tim says, this can be merged even in beta, so I'm in no rush to merge this.

It can be thoroughly verified before being merged.

@SakiTakamachi
Copy link
Member Author

Of course, if there is a more clever solution I'll be happy to close this PR.

@SakiTakamachi
Copy link
Member Author

Split the changes into multiple commits to make it easier to see the changes.
The measurement results for each change will be described in the explanation.
(The code has changed slightly.)

@TimWolla
Copy link
Member

TimWolla commented Aug 4, 2024

I have filed a competing PR at #15228.

@SakiTakamachi
Copy link
Member Author

Tim's changes seem better so I'm going to close my PR :)

@SakiTakamachi SakiTakamachi deleted the refactor_randomizer branch August 5, 2024 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants