Closed
Description
The “dynamic” matching of CharClass
uses a binary search within char
ranges, which relies on the input being sorted. The input is indeed sorted in its “natural” order, but the comparison function in case-insensitive mode uses a different order.
This leads to incorrect results:
assert!(Regex::new(r"(?i)[a_]+$").unwrap().is_match("A_"));
The above fails, because _
in ASCII is between upper-case and lower-case letters. The comparison function maps the 'a'..'a'
range to its upper case 'A'..'A'
, which has a different order relative to '_'..'_'
.
Compare with e.g. the code below, which succeeds.
assert!(Regex::new(r"(?i)[a=]+$").unwrap().is_match("A="));
The comparison function has a FIXME
to move the case mapping outside of it and have the Vec
of ranges be already mapped. I assume this was intended for performance, but I think it’ll also fix this issue. I’ll try it.
Metadata
Metadata
Assignees
Labels
No labels