Bounds checking should be eliminated in obvious case #30112

Closed
@Geal

Hi,

Following the benchmarks comparing the nom and chomp parser combinator libraries, I investigated the performance difference and found something interesting: for nearly equivalent code, rustc generates a lot more machine code for nom than for chomp, and it keeps bounds checks in places where the parsers have already made sure that no out-of-range access can happen.

For the comparison, here are some code examples, in nom:

named!(message_header_value, chain!(
        take_while1!(is_horizontal_space) ~
  data: take_while1!(not_line_ending)     ~
  line_ending,
  || data));

in chomp:

fn message_header_line(i: Input<u8>) -> U8Result<&[u8]> {
    parse!{i;
                   take_while1(is_horizontal_space);
        let line = take_till(is_end_of_line);
                   end_of_line();

        ret line
    }   
}

(We could change take_while1!(not_line_ending) to take_till!(is_end_of_line) and it would produce the same code.)

Once the macros are processed, it gives the following code in nom: https://gist.github.com/Geal/fa3740cf45530d123023

chomp uses the same approach, with iterators, in its versions of take_while1 and take_till: https://github.com/m4rw3r/chomp/blob/master/src/parsers.rs#L208-L253
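To make the iterator-based approach concrete, here is a minimal sketch of a take_while1-style function over a byte slice. It is not the actual code from nom or chomp: the names take_while1_sketch and is_horizontal_space_sketch, and the Option return type, are assumptions made for illustration only.

```rust
/// Hypothetical helper (not nom's or chomp's API): returns the longest
/// non-empty prefix whose bytes all satisfy `pred`, plus the remaining input.
/// Returns None when the first byte does not match or the input is empty.
fn take_while1_sketch<F>(input: &[u8], pred: F) -> Option<(&[u8], &[u8])>
where
    F: Fn(u8) -> bool,
{
    // Walk the slice with an iterator instead of indexing it by hand.
    // If every byte matches, the prefix is the whole input.
    let end = input
        .iter()
        .position(|&b| !pred(b))
        .unwrap_or(input.len());

    if end == 0 {
        return None;
    }

    // `end` is at most `input.len()` by construction, so this split cannot panic.
    Some(input.split_at(end))
}

/// Assumed definition of horizontal space for this sketch: space or tab.
fn is_horizontal_space_sketch(b: u8) -> bool {
    b == b' ' || b == b'\t'
}
```

Calling take_while1_sketch(b" \tvalue", is_horizontal_space_sketch) yields the leading whitespace as the first slice and the rest of the input as the second. The only length-dependent operation left is the single split_at call, whose argument comes straight from the iterator.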

Now, the interesting thing is the assembly generated by rustc (1.5.0-dev (ea2dabf 2015-10-21), but yesterday's version has the same issues). Here is the nom version: http://dev.unhandledexpression.com/nom_http.pdf
And the chomp version: http://dev.unhandledexpression.com/chomp_http.pdf

We can see that nom's code is a lot more complex:

  • large blocks of code calling nom's Err destructor (this is expected, but I'd like to improve that as well)
  • 4 bounds checks are still present, while they do not appear in chomp's version
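As a rough illustration of the kind of bounds check in question (this is not the code from the gist, and both function names are hypothetical), compare an index-based scan with an iterator-based one. In the indexed version, each input[i] is guarded by an explicit i < input.len() test, so the implicit check on the index is logically redundant. Whether the optimizer actually removes it depends on the rustc and LLVM versions and on the surrounding code.

```rust
/// Assumed helper for this sketch: space or tab.
fn is_space_sketch(b: u8) -> bool {
    b == b' ' || b == b'\t'
}

/// Index-based scan: returns the first byte that is not horizontal space.
/// Every `input[i]` carries an implicit bounds check, even though the
/// preceding `i < input.len()` test already guarantees it is in range.
fn first_non_space_indexed(input: &[u8]) -> Option<u8> {
    let mut i = 0;
    while i < input.len() && is_space_sketch(input[i]) {
        i += 1;
    }
    if i < input.len() {
        Some(input[i])
    } else {
        None
    }
}

/// Iterator-based scan with the same result: there is no indexing
/// expression in the source, so there is no index check to eliminate.
fn first_non_space_iter(input: &[u8]) -> Option<u8> {
    input.iter().copied().find(|&b| !is_space_sketch(b))
}
```

Both functions return the same value for any input. Comparing their generated assembly (for example with rustc --emit asm -O) is one way to see whether the redundant checks in the indexed version survive optimization.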

I would like to know if there is a way to improve code generation. If the issue is in rustc, I can provide as many test cases as you need. If it is in nom, I'm open to any ideas ;)

Labels:

  • A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
  • C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
  • C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
  • I-slow (Issue: Problems and improvements with respect to performance of generated code.)
  • T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)