Manipulating slice through &mut parameter not optimized well

When passing a `&mut &[u8]` to a function that trims elements from the start/end of the slice, the code contains pointless length checks and stack traffic. Writing the function slightly differently, or inlining it, does produce the code I'd expect.

This is a minimal example ([playground](https://play.rust-lang.org/?gist=1f996941900e2fe9ce06&version=nightly) of all following snippets):

``` rust
pub fn trim_in_place(a: &mut &[u8]) {
    while a.first() == Some(&42) {
        *a = &a[1..];
    }
}
```

I expect this to compile to a tight loop that simply decrements the slice length and increments the data pointer in registers, until the length reaches 0 or a byte value of 42 is encountered.

Instead, the loop writes to the stack on every iteration and checks _twice_ whether the length is zero, trying to panic for the second check (presumably comes from the `a[1..]` slicing).

``` assembly
.LBB0_5:
    movzx   edx, byte ptr [rax - 1]
    cmp edx, 42
    jne .LBB0_8 # exit function
    cmp rcx, -1
    je  .LBB0_9 # panic, never taken
    mov qword ptr [rdi], rax
    mov qword ptr [rdi + 8], rcx
    dec rcx
    inc rax
    cmp rcx, -1
    jne .LBB0_5
```

The `.LBB0_9` branch cannot ever be taken, and is indeed optimized out when writing the function in differently. The stack traffic also disappears, though that probably has an entirely different cause. This variant compiles to better code:

``` rust
pub fn trim_functionally(mut a: &[u8]) -> &[u8] {
    while a.first() == Some(&42) {
        a = &a[1..];
    }
    a
}
```

It does all the loop calculations in registers. Also, marking the first function as `inline(always)` and defining the second function in terms of it (`{ trim_in_place(&mut a); a }`) optimizes to the same code.

``` assembly
.LBB1_2:
    movzx   eax, byte ptr [rdi]
    cmp eax, 42
    jne .LBB1_3 # exit function
    inc rdi
    dec rsi
    jne .LBB1_2
```

Both variants check if the slice is empty before entering the loop and return immediately in that case.
# Background

I encountered this problem in the context of my dec2flt work, in a larger and more complicated function that trims two slices on both ends, and the slices are stored in a struct. Applying `inline(always)` to that function gives a small but measurable performance improvement across the board, ~~but increases code size by 10 KiB~~ (Edit: I just didn't measure properly. Size is about the same.). I haven't checked the assembly code (far too much code) but I suspect this means this problem, and solution, carries over to real code.
# Meta

Reproduces on nightly playground. Stable is worse for all variants, presumably because it doesn't have #26411 (yay, progress!). My local `rustc` has the same problem. The exact commit is meaningless because it only exists in my `rust.git` fork. It's based `f9f580936d81ed527115ae86375f69bb77723b1c`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Manipulating slice through &mut parameter not optimized well #27130

Background

Meta

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Manipulating slice through &mut parameter not optimized well #27130

Description

Background

Meta

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions