Skip to content

Binary search is used with incorrectly-sorted array #75

Closed
@SimonSapin

Description

@SimonSapin

The “dynamic” matching of CharClass uses a binary search within char ranges, which relies on the input being sorted. The input is indeed sorted in its “natural” order, but the comparison function in case-insensitive mode uses a different order.

This leads to incorrect results:

assert!(Regex::new(r"(?i)[a_]+$").unwrap().is_match("A_"));

The above fails, because _ in ASCII is between upper-case and lower-case letters. The comparison function maps the 'a'..'a' range to its upper case 'A'..'A', which has a different order relative to '_'..'_'.

Compare with e.g. the code below, which succeeds.

assert!(Regex::new(r"(?i)[a=]+$").unwrap().is_match("A="));

The comparison function has a FIXME to move the case mapping outside of it and have the Vec of ranges be already mapped. I assume this was intended for performance, but I think it’ll also fix this issue. I’ll try it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions