Hi,
Following the benchmarks comparing the nom and chomp parser combinator libraries, I investigated the performance difference and found something interesting: for nearly equivalent code, rustc generates a lot more code for nom than for chomp, and it keeps bounds checks in places where the parsers already guarantee that no out-of-bounds access can happen.
For comparison, here is the parser in nom:
```rust
named!(message_header_value, chain!(
    take_while1!(is_horizontal_space) ~
    data: take_while1!(not_line_ending) ~
    line_ending,
    || data));
```
And in chomp:

```rust
fn message_header_line(i: Input<u8>) -> U8Result<&[u8]> {
    parse!{i;
        take_while1(is_horizontal_space);
        let line = take_till(is_end_of_line);
        end_of_line();
        ret line
    }
}
```
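To make the comparison concrete, here is a rough hand-written version of what both parsers do, in plain Rust with no combinator library. It is only a sketch under stated assumptions: the predicate definitions, the hypothetical function name `header_line_by_hand`, and the choice to return `None` both on a parse error and when no line ending has been seen yet (where nom and chomp would instead report incomplete input) are illustrative, not code from either library.

```rust
// Hypothetical predicates, assumed to behave like the ones used above.
fn is_horizontal_space(c: u8) -> bool {
    c == b' ' || c == b'\t'
}

fn is_end_of_line(c: u8) -> bool {
    c == b'\r' || c == b'\n'
}

/// Hand-written sketch of the header-line parser (hypothetical name).
/// Returns `Some((remaining_input, header_value))`, or `None` on failure
/// or when no line ending has been seen yet.
fn header_line_by_hand(i: &[u8]) -> Option<(&[u8], &[u8])> {
    // take_while1(is_horizontal_space): at least one space or tab.
    let spaces = i.iter().take_while(|&&c| is_horizontal_space(c)).count();
    if spaces == 0 {
        return None;
    }
    let rest = &i[spaces..];

    // take_till(is_end_of_line): the value itself may be empty.
    let len = rest.iter().position(|&c| is_end_of_line(c))?;
    let (value, rest) = rest.split_at(len);

    // end_of_line: accept "\r\n" or "\n"; a lone '\r' is rejected here.
    let rest = if rest.starts_with(b"\r\n") {
        &rest[2..]
    } else if rest.starts_with(b"\n") {
        &rest[1..]
    } else {
        return None;
    };
    Some((rest, value))
}
```

Writing it out by hand like this gives a baseline to compare the assembly of both libraries against.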
(We could replace `take_while1!(not_line_ending)` with `take_till!(is_end_of_line)` and it would produce the same code.)
Once the macros are expanded, the nom version gives the following code: https://gist.github.com/Geal/fa3740cf45530d123023
chomp takes the same approach, using iterators, in its versions of `take_while1` and `take_till`: https://github.com/m4rw3r/chomp/blob/master/src/parsers.rs#L208-L253
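The iterator-based style can be sketched as follows. These `take_while1` and `take_till` functions are simplified, hypothetical stand-ins, not chomp's actual implementation (that is in the linked source): they work on a plain `&[u8]`, return `(matched, rest)` on success, and use `None` for both "no match" and "need more input". The slice is cut once with `split_at` at the index found by `position`; whether the optimizer then removes the `mid <= len` assertion inside `split_at` is not something this sketch establishes.

```rust
/// Hypothetical iterator-based `take_while1`: the longest prefix whose bytes
/// all satisfy `f`, which must be at least one byte long.
fn take_while1<F: Fn(u8) -> bool>(i: &[u8], f: F) -> Option<(&[u8], &[u8])> {
    // Index of the first byte that fails the predicate, or the whole input.
    let n = i.iter().position(|&c| !f(c)).unwrap_or(i.len());
    if n == 0 {
        None
    } else {
        // One split at an index that `position` already proved is in range.
        Some(i.split_at(n))
    }
}

/// Hypothetical iterator-based `take_till`: everything before the first byte
/// satisfying `f` (possibly empty); `None` if no such byte exists yet.
fn take_till<F: Fn(u8) -> bool>(i: &[u8], f: F) -> Option<(&[u8], &[u8])> {
    i.iter().position(|&c| f(c)).map(|n| i.split_at(n))
}
```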
Now, the interesting part is the assembly generated by rustc (1.5.0-dev (ea2dabf 2015-10-21); yesterday's version has the same issues). Here is the nom version: http://dev.unhandledexpression.com/nom_http.pdf
And the chomp version: http://dev.unhandledexpression.com/chomp_http.pdf
We can see that nom's code is a lot more complex:
- large blocks of code calling the destructor of nom's `Err` type (this is expected, but I'd like to improve it as well)
- 4 bounds checks are still present, while they do not appear in the chomp version
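One plausible source of the destructor calls, assuming nom's `Err` owns heap data such as a `Box` (which the presence of a destructor suggests), is that any type containing a `Box` gets drop glue, and so does every `Result` wrapping it. The sketch below uses two hypothetical error types, not nom's, together with `std::mem::needs_drop` to show the difference: a `Copy` error type with no owned data needs no destructor at all.

```rust
use std::mem::needs_drop;

// Hypothetical error type that, like a chained parser error, can own a
// boxed inner error. The Box gives the enum drop glue.
#[allow(dead_code)]
enum BoxedErr {
    Code(u32),
    Node(u32, Box<BoxedErr>),
}

// Hypothetical flat error type: only a plain integer, so it can be Copy
// and dropping it does nothing.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum FlatErr {
    Code(u32),
}

/// Whether a parser result `Result<&[u8], E>` has drop glue for error type `E`.
fn result_needs_drop<E>() -> bool {
    needs_drop::<Result<&'static [u8], E>>()
}
```

A flat error loses the nested error context, so switching to one is a design trade-off rather than a free fix.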
I would like to know if there is a way to improve code generation. If the issue is in rustc, I can provide as many test cases as you need. If it is in nom, I'm open to any ideas ;)