Description
Code
I tried this code:
pub fn fna(x: [u8; 1024]) -> u8 {
    x.into_iter().max().unwrap()
}

pub fn fnb(x: [u8; 1024]) -> u8 {
    *x.iter().max().unwrap()
}
I expected to see this happen:
- Both functions should optimize away the unwrap
- (Bonus) Both functions should inline the respective iterator functions and auto-vectorize
Instead, this happened:
- fna has no unwrap in the assembly output, but fnb does.
- (Bonus) Both inline their respective iterators (according to the MIR output on the playground). fna auto-vectorizes, but fnb doesn't.
fnb generates a byte-wise max implementation. Interestingly, it keeps some information about the array length around, which influences the codegen. A byte-wise max implementation for [u8; 1024] needs 1023 comparisons (since the first value of the array is the initial max result), and 1023 is divisible by 3, so the compiler emits a loop that runs 341 times, doing 3 max computations per iteration. If I change the length of the array to 1025, it does 4 max computations per iteration.
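To make the arithmetic concrete, here is a rough source-level model of the loop shape described above. It is only a sketch: the function name max_unrolled_by_3 and the iteration counter are hypothetical and added for illustration, and this is not the compiler's actual output.

```rust
/// Hypothetical scalar model of the described loop shape (not real codegen).
/// Returns the maximum and the number of loop iterations executed.
pub fn max_unrolled_by_3(x: &[u8; 1024]) -> (u8, usize) {
    // The first element seeds the running maximum, leaving 1023 comparisons.
    let mut best = x[0];
    let mut iterations = 0;

    // 1023 = 3 * 341, so the remaining elements split into exact groups of 3.
    let chunks = x[1..].chunks_exact(3);
    debug_assert!(chunks.remainder().is_empty());

    for chunk in chunks {
        // Three byte-wise max computations per loop iteration.
        best = best.max(chunk[0]);
        best = best.max(chunk[1]);
        best = best.max(chunk[2]);
        iterations += 1;
    }
    (best, iterations)
}
```

Calling it on any [u8; 1024] should give the same maximum as x.iter().max() and an iteration count of 341.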
Even though the generated code relies on the number of array entries, it still doesn't elide the Option::unwrap check for the max result. In Rust 1.45-1.64, fnb also generates the 3-byte-at-a-time loop, but does remove the Option::unwrap check.
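For comparison, the check is semantically redundant because the array length is a non-zero constant, so max() can never return None. The sketch below expresses the same computation without ever constructing an Option. The name fnb_fold is hypothetical, and no claim is made about the assembly it produces on any compiler version.

```rust
/// Hypothetical variant of fnb that never goes through Option.
pub fn fnb_fold(x: [u8; 1024]) -> u8 {
    // Seeding the fold with x[0] is valid because the length (1024) is a
    // non-zero compile-time constant, so the index is always in bounds.
    x.iter().copied().fold(x[0], u8::max)
}
```

For u8 values this returns the same result as fnb, since ties between equal bytes are indistinguishable.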
Version it worked on
It most recently worked on: Rust 1.64. (For context: Rust 1.65 added LLVM 15.)
It (initially?) started working in 1.45 (that version added LLVM 10).
Version with regression
Any version since Rust 1.65.
Since I thought this behavior was unexpected, I ran a poll on Mastodon before looking further into it. At the moment, about half of the respondents thought, like me, that both should be optimized (at least the unwrap part):