Skip to content

Verify character class still non-empty after converting to byte class #304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

scooter-dangle
Copy link
Contributor

@scooter-dangle scooter-dangle commented Dec 8, 2016

For [^\x00-\xff], while it is still treated as a full Unicode character class, it is not empty. For instance would still be matched.

However, when CharClass::to_byte_class is called on it (as is done when using regex::bytes::Regex::new rather than regex::Regex::new), it is now empty, since it excludes all possible bytes.

This commit adds a test asserting that regex::bytes::Regex::new returns Err for this case (in accordance with #106) and adds an is_empty check to the result of calling CharClass::to_byte_class, which allows the test to pass.

Fixes #303

@scooter-dangle
Copy link
Contributor Author

I wasn't sure if you'd prefer moving the original class.is_empty() check down into the true branch of the if self.flags.unicode conditional to avoid making the call twice when we're parsing a byte class.

I didn't look into the to_byte_class code at all to see if it might rely at some point on the input not being empty.

For `[^\x00-\xff]`, while it is still treated as a full Unicode
character class, it is not empty. For instance `≥` would still be
matched.

However, when `CharClass::to_byte_class` is called on it (as is done
when using `regex::bytes::Regex::new` rather than `regex::Regex::new`),
it _is_ now empty, since it excludes all possible bytes.

This commit adds a test asserting that `regex::bytes::Regex::new`
returns `Err` for this case (in accordance with
rust-lang#106) and adds an
`is_empty` check to the result of calling `CharClass::to_byte_class`,
which allows the test to pass.
@scooter-dangle scooter-dangle force-pushed the panic-on-empty-byte-range-for-bytes-regex branch from 2d1a77d to b96e5cb Compare December 8, 2016 02:20
@BurntSushi BurntSushi merged commit af00721 into rust-lang:master Dec 29, 2016
@BurntSushi
Copy link
Member

I finally had a chance to actually dig into the problem you described and I agree with your analysis and the fix. Thank you so much!

I will add a test to regex-syntax for this as well.

@scooter-dangle
Copy link
Contributor Author

Thanks for looking into it and merging! (Also, thanks for being patient while I was verifying where/how we were seeing the issue.)

@scooter-dangle scooter-dangle deleted the panic-on-empty-byte-range-for-bytes-regex branch December 29, 2016 00:35
BurntSushi added a commit that referenced this pull request Dec 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants