Commit 2d1a77d

Verify character class still non-empty after converting to byte class
As long as `[^\x00-\xff]` is treated as a full Unicode character class, it is not empty: `≥`, for instance, would still be matched. However, when `CharClass::to_byte_class` is called on it (as is done when using `regex::bytes::Regex::new` rather than `regex::Regex::new`), it _is_ now empty, since it excludes all possible bytes. This commit adds a test asserting that `regex::bytes::Regex::new` returns an error for this case (in accordance with #106) and adds an `is_empty` check on the result of calling `CharClass::to_byte_class`, which allows the test to pass.
1 parent 54ae5b6 commit 2d1a77d

2 files changed: +17 -1 lines changed

regex-syntax/src/parser.rs

Lines changed: 11 additions & 1 deletion
@@ -596,7 +596,17 @@ impl Parser {
         Ok(Build::Expr(if self.flags.unicode {
             Expr::Class(class)
         } else {
-            Expr::ClassBytes(class.to_byte_class())
+            let byte_class = class.to_byte_class();
+
+            // If `class` was only non-empty due to multibyte characters, the
+            // corresponding byte class will now be empty.
+            //
+            // See https://github.com/rust-lang-nursery/regex/issues/303
+            if byte_class.is_empty() {
+                return Err(self.err(ErrorKind::EmptyClass));
+            }
+
+            Expr::ClassBytes(byte_class)
         }))
     }
 

tests/bytes.rs

Lines changed: 6 additions & 0 deletions
@@ -53,3 +53,9 @@ matiter!(invalidutf8_anchor3,
     r"^|ddp\xff\xffdddddlQd@\x80",
     R(b"\x8d#;\x1a\xa4s3\x05foobarX\\\x0f0t\xe4\x9b\xa4"),
     (0, 0));
+
+// See https://github.com/rust-lang-nursery/regex/issues/303
+#[test]
+fn negated_full_byte_range() {
+    assert!(::regex::bytes::Regex::new(r#"[^\x00-\xff]"#).is_err());
+}
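
To illustrate why the extra `is_empty` check is needed, here is a minimal, standard-library-only sketch of the idea. It is not the `regex-syntax` implementation: the `CharClass` and `ByteClass` types, their range representation, and the `build_byte_class` helper are hypothetical stand-ins that only mirror the names used in this commit. The sketch assumes that converting to a byte class simply drops everything above `0xFF`.

```rust
/// Largest Unicode scalar value.
const MAX_SCALAR: u32 = 0x10FFFF;

/// Hypothetical stand-in for a Unicode class: sorted, non-overlapping
/// inclusive ranges of code points (surrogates are ignored for brevity).
#[derive(Debug, Clone, PartialEq)]
struct CharClass(Vec<(u32, u32)>);

/// Hypothetical stand-in for a byte class: inclusive ranges of bytes.
#[derive(Debug, Clone, PartialEq)]
struct ByteClass(Vec<(u8, u8)>);

impl CharClass {
    /// Complement of the class within 0..=MAX_SCALAR.
    fn negate(&self) -> CharClass {
        let mut out = Vec::new();
        let mut next = 0u32;
        for &(lo, hi) in &self.0 {
            if lo > next {
                out.push((next, lo - 1));
            }
            next = hi + 1;
        }
        if next <= MAX_SCALAR {
            out.push((next, MAX_SCALAR));
        }
        CharClass(out)
    }

    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        self.0.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
    }

    fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Assumed conversion: keep only the part of each range within 0..=0xFF.
    fn to_byte_class(&self) -> ByteClass {
        let ranges = self
            .0
            .iter()
            .filter(|&&(lo, _)| lo <= 0xFF)
            .map(|&(lo, hi)| (lo as u8, hi.min(0xFF) as u8))
            .collect();
        ByteClass(ranges)
    }
}

impl ByteClass {
    fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}

/// Mirrors the shape of the check added in this commit: reject a byte
/// class that became empty only after the conversion.
fn build_byte_class(class: &CharClass) -> Result<ByteClass, String> {
    let byte_class = class.to_byte_class();
    if byte_class.is_empty() {
        return Err("empty class".to_string());
    }
    Ok(byte_class)
}
```

In this model, `CharClass(vec![(0x00, 0xFF)]).negate()` is non-empty and contains `'≥'`, yet `build_byte_class` returns `Err` for it, which is the situation the new test covers.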
