Rustc fails to inline trivial functions #37538

Closed
@ruuda

I have read in multiple places that it counts as a bug when rustc generates worse code than a C++ compiler would for an equivalent C++ program. So here we go:

Summary

Rustc fails to inline trivial functions that compile down to just a few instructions, to the point where calling convention overhead is much worse than the actual function itself.

Steps to reproduce

I tried to write a minimal example at play.rust-lang.org, but every short example I could come up with does not suffer from this issue. So instead I am going to link the project that caused me to discover this issue:

git clone https://github.com/ruuda/claxon
cd claxon
git checkout 2b18a49
cargo build --release --example decode
objdump -Cd target/release/examples/decode | less
# Now search for rice_to_signed or shift_left.

Actual and expected behavior

I’ll outline some of the disassembly below:

# Code for `if shift >= 8 { 0 } else { x << shift }`.
000000000000d3a0 <claxon::input::shift_left::h07bb472717a335da>:
    d3a0:       89 f1                   mov    %esi,%ecx
    d3a2:       80 e1 07                and    $0x7,%cl
    d3a5:       40 d2 e7                shl    %cl,%dil
    d3a8:       48 83 fe 07             cmp    $0x7,%rsi
    d3ac:       76 02                   jbe    d3b0 <claxon::input::shift_left::h07bb472717a335da+0x10>
    d3ae:       31 ff                   xor    %edi,%edi
    d3b0:       89 f8                   mov    %edi,%eax
    d3b2:       c3                      retq   
    d3b3:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
    d3ba:       00 00 00 
    d3bd:       0f 1f 00                nopl   (%rax)

# Code for `if x & 1 == 1 { - 1 - x / 2 } else { x / 2 }`.
# It did optimize the branch into two shifts and an xor though!
000000000000d7b0 <claxon::subframe::rice_to_signed::haed2067302c41014>:
    d7b0:       48 89 f8                mov    %rdi,%rax
    d7b3:       48 c1 e8 3f             shr    $0x3f,%rax
    d7b7:       48 01 f8                add    %rdi,%rax
    d7ba:       48 d1 f8                sar    %rax
    d7bd:       48 c1 e7 3f             shl    $0x3f,%rdi
    d7c1:       48 c1 ff 3f             sar    $0x3f,%rdi
    d7c5:       48 31 c7                xor    %rax,%rdi
    d7c8:       48 89 f8                mov    %rdi,%rax
    d7cb:       c3                      retq   
    d7cc:       0f 1f 40 00             nopl   0x0(%rax)

Note that this is not dead code: there are calls to these functions in very hot loops.

79e8:  e8 c3 5d 00 00     callq  d7b0 <claxon::subframe::rice_to_signed::haed2067302c41014>

66a8:  e8 f3 6c 00 00     callq  d3a0 <claxon::input::shift_left::h07bb472717a335da>

I would expect that functions like these would be inlined automatically, but they were not. Note that all of this code is in the same crate.

I encountered about a dozen of these during profiling, where very small functions like the ones above were showing up as hotspots. I’ve been able to speed up my program by as much as 30% just by placing a few #[inline(always)] attributes.
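For illustration, here is a minimal sketch of what the workaround looks like for the two functions shown above. The bodies are taken from the comments in the disassembly; the parameter and return types are assumptions inferred from the registers used (`%dil` and `%rsi` for `shift_left`, a signed 64-bit division for `rice_to_signed`) and may not match the real claxon signatures.

```rust
// Sketch only: signatures are guessed from the disassembly, not copied
// from the claxon source.

/// Shift an 8-bit value left, yielding 0 when the shift amount would
/// otherwise be out of range for the type.
#[inline(always)]
pub fn shift_left(x: u8, shift: usize) -> u8 {
    if shift >= 8 { 0 } else { x << shift }
}

/// Map an interleaved value to a signed one: even inputs map to x / 2,
/// odd inputs map to -1 - x / 2.
#[inline(always)]
pub fn rice_to_signed(x: i64) -> i64 {
    if x & 1 == 1 { -1 - x / 2 } else { x / 2 }
}
```

With the attribute in place, the `callq` instructions in the hot loops should disappear from the objdump output, since the bodies get expanded at the call sites.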

There are also simple getters like Block::len which are not inlined, but these are called from the example program, which is a different crate, so I think that is working as intended.
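For the cross-crate case, a plain `#[inline]` hint is what makes a non-generic function body available for inlining in downstream crates. A rough sketch, assuming a hypothetical `Block` layout (the `samples` field and the `new` constructor are made up for this example and are not claxon's actual definition):

```rust
// Hypothetical stand-in for the library type; the field is an assumption.
pub struct Block {
    samples: Vec<i32>,
}

impl Block {
    /// Hypothetical constructor, only here to make the sketch usable.
    pub fn new(samples: Vec<i32>) -> Block {
        Block { samples }
    }

    /// The `#[inline]` hint lets callers in other crates inline this getter.
    #[inline]
    pub fn len(&self) -> usize {
        self.samples.len()
    }
}
```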

Meta

rustc 1.14.0-nightly (3210fd5c2 2016-10-05)
binary: rustc
commit-hash: 3210fd5c20ffc6da420eb00e60bdc8704577fd3b
commit-date: 2016-10-05
host: x86_64-unknown-linux-gnu
release: 1.14.0-nightly

Labels

C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)