Description
Apologies if this has already been reported.
Let's say I have some code that looks like this (this is a simplified version of some code a friend was writing):
    pub fn fast(mut ret: u64) -> u64 {
        let mask = (1 << 38) - 1;
        for _ in 0..100_000 {
            let mut speed = 0.0;
            let mut z: f64 = speed;
            speed += 0.200000001;
            for _ in 2..14 {
                z += speed;
                if (z.to_bits() >> 8) & mask == 0 {
                    if z % 0.0625 < 1e-13 {
                        println!("{}", z % 0.0625);
                        ret += 1;
                    }
                }
            }
        }
        eprintln!("ret: {ret}");
        ret
    }
I might be tempted to collapse the if-statement in the middle, since it shouldn't change anything - in fact, clippy will even recommend that I change it to this:
    pub fn slow(mut ret: u64) -> u64 {
        let mask = (1 << 38) - 1;
        for _ in 0..100_000 {
            let mut speed = 0.0;
            let mut z: f64 = speed;
            speed += 0.200000001;
            for _ in 2..14 {
                z += speed;
                if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
        eprintln!("ret: {ret}");
        ret
    }
However, when I benchmark the two against each other using criterion (on 1.69.0), I get:
    ➜ cargo bench 2> out.txt
    running 0 tests
    test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
    running 0 tests
    test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
    slow time: [7.5115 ms 7.5313 ms 7.5583 ms]
    Found 4 outliers among 100 measurements (4.00%)
      2 (2.00%) high mild
      2 (2.00%) high severe
    fast time: [577.02 µs 578.91 µs 581.29 µs]
    Found 5 outliers among 100 measurements (5.00%)
      1 (1.00%) high mild
      4 (4.00%) high severe
For some reason, collapsing the if branch leads to a massive performance regression: the collapsed version is roughly 13x slower (about 7.53 ms versus 579 µs). This is also surprising because, from my testing where I set z = 0, the if branch should never run. Putting the two bits of code on Godbolt also seems to show a difference in the generated assembly (fast, slow).
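One way to check the claim that the branch never runs is to count how often each condition holds. The following is only a sketch: `count_hits` is a hypothetical helper that is not in the original code, and it drops both print statements so that only the conditions are exercised. Note that every outer iteration resets `speed` and `z`, so each one repeats the same 12 values of `z`.

```rust
/// Hypothetical helper (not part of the original code): returns how many
/// times the bit-mask test alone holds, and how many times both
/// conditions hold, over the same loops as `fast` and `slow`.
pub fn count_hits() -> (u64, u64) {
    let mask: u64 = (1 << 38) - 1;
    let mut mask_hits = 0u64;
    let mut full_hits = 0u64;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            // First condition: bits 8..46 of the f64 representation are zero.
            if (z.to_bits() >> 8) & mask == 0 {
                mask_hits += 1;
                // Second condition: z is within 1e-13 above a multiple of 0.0625.
                if z % 0.0625 < 1e-13 {
                    full_hits += 1;
                }
            }
        }
    }
    (mask_hits, full_hits)
}

fn main() {
    let (mask_hits, full_hits) = count_hits();
    println!("mask hits: {mask_hits}, full hits: {full_hits}");
}
```

If `full_hits` comes out as 0, the `println!` inside the branch is dead at runtime for this input, and any timing difference would have to come from how the code is compiled rather than from the work it does.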
Furthermore, from some testing, commenting out either the eprintln or the println in both functions results in them having similar performance.
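For anyone who wants to try this without criterion, here is a rough std-only timing sketch. It is not the benchmark from the repo: `time_it` is a hypothetical helper, the iteration count is arbitrary, and plain `Instant` timing is far noisier than criterion, so treat the numbers as indicative only. Because everything sits in one file, the optimizer may also make different inlining decisions than in the original setup.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

pub fn fast(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            // Nested form: the second test is only reached if the first holds.
            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}

pub fn slow(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            // Collapsed form: `&&` short-circuits, so the semantics are unchanged.
            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}

/// Hypothetical helper: average wall-clock time per call over `iters` calls.
/// `black_box` keeps the optimizer from discarding the input or the result.
fn time_it(iters: u32, f: impl Fn(u64) -> u64) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(0)));
    }
    start.elapsed() / iters
}

fn main() {
    let iters = 100;
    println!("fast: {:?}", time_it(iters, fast));
    println!("slow: {:?}", time_it(iters, slow));
}
```

Build with optimizations (for example `cargo run --release 2> out.txt`, mirroring the stderr redirect used above), then comment out the `eprintln!` or the `println!` in both functions and rerun to compare.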
I've set up a repo with the code and benchmark: https://github.com/ClementTsang/collapse_if_slowdown