Open
Description
VecDeque
's iterator seems to never allow llvm to vectorize loops, this could be improved.
Even an enum where it uses the slice iterator for its contiguous case, and a fallback iterator, would allow vectorization for the contiguous case.
The following simple benchmark shows the massive difference between the deque iterator and using its two slice parts. The results are the same whether the deque is discontinuous or not. The summation in sum_deque_2
is an order of magnitude faster, because llvm can autovectorize it.
/* Results
test sum_deque ... bench: 1,301 ns/iter (+/- 111)
test sum_deque_2 ... bench: 71 ns/iter (+/- 2)
*/
#![feature(test)]
extern crate test;
use test::Bencher;
use test::black_box;
use std::collections::VecDeque;
#[bench]
fn sum_deque(b: &mut Bencher) {
let mut dq: VecDeque<_> = (0..1024).collect();
/*
for _ in 0..128 {
let elt = dq.pop_back().unwrap();
dq.push_front(elt);
}
*/
let dq = black_box(dq);
b.iter(|| {
let mut sum = 0;
for elt in &dq {
sum += *elt;
}
sum
})
}
#[bench]
fn sum_deque_2(b: &mut Bencher) {
let mut dq: VecDeque<_> = (0..1024).collect();
/*
for _ in 0..128 {
let elt = dq.pop_back().unwrap();
dq.push_front(elt);
}
*/
let dq = black_box(dq);
b.iter(|| {
let mut sum = 0;
let (a, b) = dq.as_slices();
for elt in a {
sum += *elt;
}
for elt in b {
sum += *elt;
}
sum
})
}