Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

Open
@ClementTsang


Apologies if this has already been reported.


Let's say I have some code that looks like this (this is a simplified version of some code a friend was writing):

pub fn fast(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

I might be tempted to collapse the nested if-statements in the middle, since doing so shouldn't change the behavior; in fact, clippy (via its collapsible_if lint) will even recommend that I change it to this:

pub fn slow(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}
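As a sanity check that the two forms really are logically equivalent, here is a minimal sketch that pulls the two conditions out into helpers and compares the nested and collapsed predicates on the values the inner loop visits. The helper names (`bits_check`, `rem_check`, `nested`, `collapsed`, `inner_loop_values`) are made up for illustration and are not part of the original code.

```rust
/// First (cheap) check from the issue: bits 8 through 45 of the raw f64
/// representation are all zero.
fn bits_check(z: f64) -> bool {
    let mask: u64 = (1 << 38) - 1;
    (z.to_bits() >> 8) & mask == 0
}

/// Second check from the issue: the remainder modulo 0.0625 is tiny.
fn rem_check(z: f64) -> bool {
    z % 0.0625 < 1e-13
}

/// Nested form, mirroring the structure of `fast`.
fn nested(z: f64) -> bool {
    if bits_check(z) {
        if rem_check(z) {
            return true;
        }
    }
    false
}

/// Collapsed form, mirroring the structure of `slow`.
fn collapsed(z: f64) -> bool {
    bits_check(z) && rem_check(z)
}

/// The twelve values of `z` that one pass of the inner loop visits
/// (`z` and `speed` are reset on every outer iteration).
fn inner_loop_values() -> Vec<f64> {
    let speed = 0.200000001_f64;
    let mut z = 0.0_f64;
    (2..14)
        .map(|_| {
            z += speed;
            z
        })
        .collect()
}
```

Because `&&` short-circuits, `nested(z)` and `collapsed(z)` should agree for every `z`, so any difference between `fast` and `slow` is in the generated code rather than in the results.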

However, when I benchmark the two against each other using criterion (on Rust 1.69.0), I get:

➜ cargo bench 2> out.txt

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

slow                    time:   [7.5115 ms 7.5313 ms 7.5583 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

fast                    time:   [577.02 µs 578.91 µs 581.29 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
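For anyone who wants a rough cross-check without pulling in criterion, the following is a std-only timing sketch. It is not what produced the numbers above: `mean_time` is a hypothetical helper, and unlike criterion it does no warm-up or outlier analysis, so treat its output as a coarse indication only.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of criterion): calls `f` `iters` times and
/// returns the mean wall-clock time per call.
fn mean_time<F: FnMut() -> u64>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the call's result.
        black_box(f());
    }
    start.elapsed() / iters
}
```

With `fast` and `slow` in scope, something like `mean_time(100, || fast(black_box(0)))` and `mean_time(100, || slow(black_box(0)))` could then be compared, ideally in a release build.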

For some reason, collapsing the nested if-statements leads to a roughly 13x slowdown (about 7.53 ms versus about 579 µs)! This is all the more surprising because, from my testing, where I set z = 0, the body of the if should never run. Putting the two versions on Godbolt also shows that the generated assembly differs (fast, slow).
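The claim that the body never runs can be checked in isolation. The sketch below replays one pass of the inner loop and counts how many of the twelve `z` values pass the first (bit-mask) condition; `cheap_check_hits` is a name made up for this illustration.

```rust
/// Counts how many of the inner loop's twelve `z` values pass the first
/// condition, `(z.to_bits() >> 8) & mask == 0`.
fn cheap_check_hits() -> usize {
    let mask: u64 = (1 << 38) - 1;
    let speed = 0.200000001_f64;
    let mut z = 0.0_f64;
    let mut hits = 0;

    for _ in 2..14 {
        z += speed;
        if (z.to_bits() >> 8) & mask == 0 {
            hits += 1;
        }
    }
    hits
}
```

If this returns 0, as the report's observation suggests, then in both versions neither the `println!` nor the `ret += 1` is ever reached, and the collapsed version's `z % 0.0625` is never evaluated either, since `&&` short-circuits.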

Furthermore, in my testing, commenting out either the eprintln! or the println! in both functions makes the two perform similarly.

A repo with my exact setup, including the code and benchmark, is here: https://github.com/ClementTsang/collapse_if_slowdown

Metadata


Labels

C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
