Skip to content

Vec's swap_remove is needlessly slow #52150

Closed
@orlp

Description

@orlp

Currently Vec's swap_remove has the following implementation:

pub fn swap_remove(&mut self, index: usize) -> T {
    let length = self.len();
    self.swap(index, length - 1);
    self.pop().unwrap()
}

This is needlessly slow. The swap does a bounds check, and then pop does
another bounds check that never fails. Furthermore, there is an actual swap
right before the pop - this results in a lot useless moves that don't (seem to)
get optimized out.

It's possible to write a safe implementation that only does one bounds check and
uses two moves with a little unsafe code:

pub fn swap_remove(&mut self, index: usize) -> T {    
    unsafe {
        // Singular bounds check on index ensures safety.
        let hole: *mut T = &mut self[index];
        let value = ptr::read(hole);

        // Since the bounds check on index succeeded we know back >= 0.
        let back = self.len() - 1;
        ptr::copy(self.get_unchecked(back), hole, 1);
        self.set_len(back);
        value        
    }
}

I found it quite hard to benchmark this, but the following benchmark showed on
my machine the new implementation to be faster:

#![feature(test)]
extern crate test;

pub fn swap_remove_opt<T>(v: &mut Vec<T>, idx: usize) -> T {    
    unsafe {
        // Singular bounds check on idx ensures safety.
        let hole: *mut T = &mut v[idx];
        let value = std::ptr::read(hole);
        let back = v.len() - 1;
        std::ptr::copy(v.get_unchecked(back), hole, 1);
        v.set_len(back);
        value        
    }
}

// Tiny LCG RNG.
fn rng_next(s: u64) -> u64 {
    6364136223846793005 * s  + 1
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;
    
    #[bench]
    fn bench_1000_overhead(b: &mut Bencher) {
        let mut rng = 43;
        let v: Vec<usize> = (0..1024).collect();
        b.iter(|| {
            let v = v.clone();
            for _ in 0..512 {
                rng = rng_next(rng);
                let idx = (rng >> 32) % 512;
                test::black_box(idx as usize);
            }
            v
        })
    }

    #[bench]
    fn bench_vec_1000_swap_remove(b: &mut Bencher) {
        let mut rng = 43;
        let v: Vec<usize> = (0..1024).collect();
        b.iter(|| {
            let mut v = v.clone();
            for _ in 0..512 {
                rng = rng_next(rng);
                let idx = (rng >> 32) % 512;
                test::black_box(v.swap_remove(idx as usize));
            }
            v
        })
    }

    #[bench]
    fn bench_vec_1000_swap_remove_opt(b: &mut Bencher) {
        let mut rng = 43;
        let v: Vec<usize> = (0..1024).collect();
        b.iter(|| {
            let mut v = v.clone();
            for _ in 0..512 {
                rng = rng_next(rng);
                let idx = (rng >> 32) % 512;
                test::black_box(swap_remove_opt(&mut v, idx as usize));
            }
            v
        })
    }
}
test tests::bench_1000_overhead            ... bench:       1,049 ns/iter (+/- 16)
test tests::bench_vec_1000_swap_remove     ... bench:       1,306 ns/iter (+/- 122)
test tests::bench_vec_1000_swap_remove_opt ... bench:       1,193 ns/iter (+/- 104)

As I said, it's quite hard to benchmark - lots of overhead and a small difference to be observed. Regardless, running multiple times show consistently swap_remove_opt to be faster.

Although I can't run the above benchmark on stable due to test being unstable,
I conjecture that on stable Rust the difference is much bigger. A quick look at
the disassembly should show why: https://godbolt.org/g/fU4Edu

Nightly fares a lot better in the disassembly but as seen above still loses out in the benchmark.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-enhancementCategory: An issue proposing an enhancement or a PR with one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions