-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[Random] fix undefined behaviour part2 #9085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5753cbe
to
d5ff87e
Compare
Why is macOS CI being cancelled...? |
This is a scheduled macOS-10.15 brownout. The macOS-10.15 environment is deprecated and will be removed on August 30th, 2022. For more details, see actions/runner-images#5583 Oops my macOS (12.4 AArch64) is successfull tests. I will try the macOS update for GH Actions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in behavior might or might not matter, I don't know much about the logic of this function.
ext/random/random.c
Outdated
uint32_t count = 0; | ||
|
||
result = 0; | ||
total_size = 0; | ||
do { | ||
r = algo->generate(status); | ||
result = (result << (8 * status->last_generated_size)) | r; | ||
shift_size = (8 * status->last_generated_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that my previous comment was wrong, it looks like gcc and clang both treat x << y
as x << (y % 32)
, so this is a behavior change. See:
https://godbolt.org/z/W1xn847Eh
This would be the equivalent here:
shift_size = (8 * status->last_generated_size) % (8 * sizeof(result));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to say the same 😄 https://www.ideone.com/ijVb65
So I think you should revert the changes, then just change status->last_generated_size
to (status->last_generated_size % sizeof(result))
in the shift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! It's so complicated 😅
fix again: 2ebbefc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I hadn't seen Ilija's main comment:
The change in behavior might or might not matter, I don't know much about the logic of this function.
Indeed, maybe here you would prefer to handle
result << (8 * status->last_generated_size)
not as
result << (8 * (status->last_generated_size % sizeof(result)))
but rather as
status->last_generated_size >= sizeof(result) ? 0 : result << (8 * status->last_generated_size)
? https://www.ideone.com/Oy1PqS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional thought: whatever the solution, it should be applied to both rand_range32
and rand_range64
.
I see that #9088 does, and also tackles an endianness issue 🙂
This reverts commit d5ff87e.
There's a much simpler fix for this: If the Unless the engine returns differently sized integers, this case can only happen in the very first iteration of the I'll propose a fix after dinner, as it was me who messed this up 😃 |
Couldn't that happen e.g. with this? final class VaryingSizeEngine implements \Random\Engine {
private bool $first = true;
public function generate(): string {
if ($this->first) {
$size = 1;
$this->first = false;
} else {
$size = 4;
}
return random_bytes($size);
}
} |
Yes, exactly. This is something to keep in mind, but not something to optimize for. |
The previous shifting logic is problematic for two reasons: 1. It invokes undefined behavior when the `->last_generated_size` is at least as large as the target integer in `result`, because the shift is larger than the target integer. This was reported in phpGH-9083. 2. It expands the returned bytes in a big-endian fashion: Earlier bytes are shifting into the most-significant position. As all the other logic in the random extension treats byte-strings as little-endian numbers this is inconsistent. By fixing the second issue, we can implicitly fix the first one: Instead of shifting the existing bits by the number of "newly added" bits, we shift the newly added bits by the number of existing bits. As we stop requesting new bits once the total_size reached the size of the target integer we can be sure to never invoke undefined behavior during shifting. The get_int_user.phpt test was adjusted to verify the little-endian behavior. It generates a single byte per call and we expect the first byte generated to appear at the start of the resulting number. see phpGH-9056 for a previous fix in the same area. Fixes phpGH-9083 which reports the undefined behavior. Resolves phpGH-9085 which was an alternative attempt to fix phpGH-9083.
My proposal is in #9088. |
The previous shifting logic is problematic for two reasons: 1. It invokes undefined behavior when the `->last_generated_size` is at least as large as the target integer in `result`, because the shift is larger than the target integer. This was reported in GH-9083. 2. It expands the returned bytes in a big-endian fashion: Earlier bytes are shifting into the most-significant position. As all the other logic in the random extension treats byte-strings as little-endian numbers this is inconsistent. By fixing the second issue, we can implicitly fix the first one: Instead of shifting the existing bits by the number of "newly added" bits, we shift the newly added bits by the number of existing bits. As we stop requesting new bits once the total_size reached the size of the target integer we can be sure to never invoke undefined behavior during shifting. The get_int_user.phpt test was adjusted to verify the little-endian behavior. It generates a single byte per call and we expect the first byte generated to appear at the start of the resulting number. see GH-9056 for a previous fix in the same area. Fixes GH-9083 which reports the undefined behavior. Resolves GH-9085 which was an alternative attempt to fix GH-9083.
In fact, I don't know if this is the best performance-wise. Any better ideas are solicited. For now, the current bugs will be fixed.
related #9056
closes GH-9083