Skip to content

[Random] fix undefined behaviour part2 #9085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

zeriyoshi
Copy link
Contributor

In fact, I don't know if this is the best performance-wise. Any better ideas are solicited. For now, the current bugs will be fixed.

related #9056
closes GH-9083

@zeriyoshi
Copy link
Contributor Author

Why is macOS CI being cancelled...?

@zeriyoshi
Copy link
Contributor Author

zeriyoshi commented Jul 21, 2022

Why is macOS CI being cancelled...?

This is a scheduled macOS-10.15 brownout. The macOS-10.15 environment is deprecated and will be removed on August 30th, 2022. For more details, see actions/runner-images#5583

Oops my macOS (12.4 AArch64) is successfull tests.

I will try the macOS update for GH Actions

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in behavior might or might not matter, I don't know much about the logic of this function.

uint32_t count = 0;

result = 0;
total_size = 0;
do {
r = algo->generate(status);
result = (result << (8 * status->last_generated_size)) | r;
shift_size = (8 * status->last_generated_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that my previous comment was wrong, it looks like gcc and clang both treat x << y as x << (y % 32), so this is a behavior change. See:

https://godbolt.org/z/W1xn847Eh

This would be the equivalent here:

shift_size = (8 * status->last_generated_size) % (8 * sizeof(result));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to say the same 😄 https://www.ideone.com/ijVb65
So I think you should revert the changes, then just change status->last_generated_size to (status->last_generated_size % sizeof(result)) in the shift?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! It's so complicated 😅
fix again: 2ebbefc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I hadn't seen Ilija's main comment:

The change in behavior might or might not matter, I don't know much about the logic of this function.

Indeed, maybe here you would prefer to handle
result << (8 * status->last_generated_size) not as
result << (8 * (status->last_generated_size % sizeof(result))) but rather as
status->last_generated_size >= sizeof(result) ? 0 : result << (8 * status->last_generated_size)? https://www.ideone.com/Oy1PqS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional thought: whatever the solution, it should be applied to both rand_range32 and rand_range64.
I see that #9088 does, and also tackles an endianness issue 🙂

@TimWolla
Copy link
Member

There's a much simpler fix for this:

If the ->last_generated_size is large enough to cause undefined behavior during shifting, then this means that the generated integer is sufficient to completely fill the range.

Unless the engine returns differently sized integers, this case can only happen in the very first iteration of the do-while loop, while result still is 0 and also the loop will exit afterwards.

I'll propose a fix after dinner, as it was me who messed this up 😃

@guilliamxavier
Copy link
Contributor

guilliamxavier commented Jul 21, 2022

Unless the engine returns differently sized integers

Couldn't that happen e.g. with this?

final class VaryingSizeEngine implements \Random\Engine {
    private bool $first = true;
    public function generate(): string {
        if ($this->first) {
            $size = 1;
            $this->first = false;
        } else {
            $size = 4;
        }
        return random_bytes($size);
    }
}

@TimWolla
Copy link
Member

Couldn't that happen e.g. with this?

Yes, exactly. This is something to keep in mind, but not something to optimize for.

TimWolla added a commit to TimWolla/php-src that referenced this pull request Jul 21, 2022
The previous shifting logic is problematic for two reasons:

1. It invokes undefined behavior when the `->last_generated_size` is at least
as large as the target integer in `result`, because the shift is larger than
the target integer. This was reported in phpGH-9083.

2. It expands the returned bytes in a big-endian fashion: Earlier bytes are
shifting into the most-significant position. As all the other logic in the
random extension treats byte-strings as little-endian numbers this is
inconsistent.

By fixing the second issue, we can implicitly fix the first one: Instead of
shifting the existing bits by the number of "newly added" bits, we shift the
newly added bits by the number of existing bits. As we stop requesting new bits
once the total_size reached the size of the target integer we can be sure to
never invoke undefined behavior during shifting.

The get_int_user.phpt test was adjusted to verify the little-endian behavior.
It generates a single byte per call and we expect the first byte generated to
appear at the start of the resulting number.

see phpGH-9056 for a previous fix in the same area.
Fixes phpGH-9083 which reports the undefined behavior.
Resolves phpGH-9085 which was an alternative attempt to fix phpGH-9083.
@TimWolla
Copy link
Member

My proposal is in #9088.

@zeriyoshi zeriyoshi closed this Jul 22, 2022
Girgias pushed a commit that referenced this pull request Jul 22, 2022
The previous shifting logic is problematic for two reasons:

1. It invokes undefined behavior when the `->last_generated_size` is at least
as large as the target integer in `result`, because the shift is larger than
the target integer. This was reported in GH-9083.

2. It expands the returned bytes in a big-endian fashion: Earlier bytes are
shifting into the most-significant position. As all the other logic in the
random extension treats byte-strings as little-endian numbers this is
inconsistent.

By fixing the second issue, we can implicitly fix the first one: Instead of
shifting the existing bits by the number of "newly added" bits, we shift the
newly added bits by the number of existing bits. As we stop requesting new bits
once the total_size reached the size of the target integer we can be sure to
never invoke undefined behavior during shifting.

The get_int_user.phpt test was adjusted to verify the little-endian behavior.
It generates a single byte per call and we expect the first byte generated to
appear at the start of the resulting number.

see GH-9056 for a previous fix in the same area.
Fixes GH-9083 which reports the undefined behavior.
Resolves GH-9085 which was an alternative attempt to fix GH-9083.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random extension random.c:95:20: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
4 participants