Seemingly inefficient code generated to forward a parameter to a function #22891

Closed
@rprichard

The generated code for passing arguments larger than a machine word looks inefficient.

Test case:

#[inline(never)]
pub fn bar(x: &str) { println!("{}", x) }
pub fn foo(x: &str) { bar(x); bar(x); }

On x86_64-unknown-linux-gnu, compiling with rustc test.rs -O -C no-stack-check --crate-type dylib --emit asm, I see this code for foo:

    .section    .text._ZN3foo20hb6f131ac36a30532PaaE,"ax",@progbits
    .globl  _ZN3foo20hb6f131ac36a30532PaaE
    .align  16, 0x90
    .type   _ZN3foo20hb6f131ac36a30532PaaE,@function
_ZN3foo20hb6f131ac36a30532PaaE:
    .cfi_startproc
    pushq   %rbx
.Ltmp4:
    .cfi_def_cfa_offset 16
    subq    $16, %rsp
.Ltmp5:
    .cfi_def_cfa_offset 32
.Ltmp6:
    .cfi_offset %rbx, -16
    movq    %rdi, %rbx
    movups  (%rbx), %xmm0
    movaps  %xmm0, (%rsp)
    leaq    (%rsp), %rdi
    callq   _ZN3bar20hf21270c370b3427feaaE@PLT
    movups  (%rbx), %xmm0
    movaps  %xmm0, (%rsp)
    leaq    (%rsp), %rdi
    callq   _ZN3bar20hf21270c370b3427feaaE@PLT
    addq    $16, %rsp
    popq    %rbx
    retq
.Ltmp7:
    .size   _ZN3foo20hb6f131ac36a30532PaaE, .Ltmp7-_ZN3foo20hb6f131ac36a30532PaaE
    .cfi_endproc

foo receives the address of the &str in %rdi. Before each call, it copies the 16-byte &str to the stack (reusing the same slot at (%rsp)), then passes the address of that slot to bar.
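As a minimal sketch of why 16 bytes are copied: a &str is a "fat" pointer made of a data pointer and a length, so on current compilers it occupies two machine words, while a &&str is a single word. The helper names `words` and `parts` below are hypothetical and exist only for this illustration.

```rust
use std::mem::size_of;

/// Number of machine words occupied by a value of type `T`.
/// (Hypothetical helper, for illustration only.)
pub fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

/// Splits a `&str` into the two words it is made of:
/// the address of the first byte and the length in bytes.
pub fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

On x86_64, `words::<&str>()` is 2 (16 bytes, matching the movups/movaps pair in the listing) and `words::<&&str>()` is 1.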

Could foo forward the address of the &str along without making stack copies?

If I remove one of the bar calls from foo, the remaining call ought to become a tail call, but it doesn't. Tail call optimization does occur if I replace the &str types with &&str.
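For reference, the two single-call variants being compared might look like the sketch below. The `_fat` and `_thin` suffixes are hypothetical names added here to keep both versions in one file. Whether a given rustc emits a tail call for either one depends on the compiler version, so the comments describe only what this report observed.

```rust
// &str argument (two words). In this report, foo_fat still copies the
// argument to the stack and does not tail-call bar_fat.
#[inline(never)]
pub fn bar_fat(x: &str) { println!("{}", x) }
pub fn foo_fat(x: &str) { bar_fat(x) }

// &&str argument (one word). In this report, foo_thin does become a
// tail call to bar_thin.
#[inline(never)]
pub fn bar_thin(x: &&str) { println!("{}", x) }
pub fn foo_thin(x: &&str) { bar_thin(x) }
```

Both can be compiled with the same command line as the test case and compared in the emitted assembly.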

The calling convention for passing &str (and other arguments larger than a machine word?) seems to be:

  1. Make a copy of the argument on the stack.
  2. Pass the address of the copy in the conventional manner (in a register or on the stack).
  3. The callee may modify the copy.

i.e., we seem to be passing values both by value and by reference.
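The three steps above can be modeled in source-level Rust. This is only an illustration of the convention as I understand it, not what the compiler literally does, and `callee_may_clobber` and `caller_with_copies` are hypothetical names.

```rust
/// Step 3: the callee receives the address of a copy and is allowed to
/// overwrite it. Here it clears the copy after reading its length.
pub fn callee_may_clobber(arg: &mut &str) -> usize {
    let n = arg.len();
    *arg = ""; // scribble on the caller-provided copy
    n
}

/// Steps 1 and 2: the caller makes a fresh copy before every call and
/// passes the address of that copy.
pub fn caller_with_copies(x: &&str) -> usize {
    let mut copy = *x; // step 1: copy the two-word value
    let a = callee_may_clobber(&mut copy); // step 2: pass its address
    copy = *x; // the first call may have modified the slot, so copy again
    let b = callee_may_clobber(&mut copy);
    a + b
}
```

Because the callee may modify its copy, the caller cannot assume the slot still holds the argument after the first call, which is why the assembly reloads it from (%rbx) before the second call.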

With the current convention, I think we could get smaller code by eliding some of the copies. If the copies were instead immutable, I think we could elide more copies.
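A sketch of what an immutable convention would allow, again modeled at the source level with hypothetical names: if the callee can only read the pointed-to value, the caller can forward the address it was given to every call, with no stack copies at all.

```rust
/// The callee only reads through the address it is given.
pub fn callee_read_only(arg: &&str) -> usize {
    arg.len()
}

/// The caller forwards the incoming address unchanged to both calls;
/// no copy is needed because nothing can modify the pointee.
pub fn caller_forwarding(x: &&str) -> usize {
    callee_read_only(x) + callee_read_only(x)
}
```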

Compiler version:

rustc 1.0.0-nightly (b47aebe3f 2015-02-26) (built 2015-02-27)
binary: rustc
commit-hash: b47aebe3fc2da06c760fd8ea19f84cbc41d34831
commit-date: 2015-02-26
build-date: 2015-02-27
host: x86_64-unknown-linux-gnu
release: 1.0.0-nightly

Labels: A-codegen (Area: Code generation)