RegexSet match first [feature request] #259

Closed
@jayanderson

The common case (for me at least) is to expect only one of the regular expressions in the set to match. I assume is_match works by stopping at the first match it finds. It'd be nice to be able to know which regular expression that was. The (somewhat contrived) benchmark below shows that computing the full set with matches and then taking the first matching index is much slower than is_match (320 ns vs. 13 ns per iteration). Could a function that returns the index of the first match have the same performance as is_match? Let me know if I'm off in my assumptions or if something else is causing the slowdown.

```rust
// Requires a nightly toolchain, with `#![feature(test)]` and
// `extern crate test;` declared at the crate root.
#[cfg(test)]
mod tests {
    use regex::RegexSet;
    use test::Bencher;

    // Eleven anchored patterns; only `^a+` (index 5) matches "aaaaa".
    fn sample_set() -> RegexSet {
        RegexSet::new(&[
            r"^0+",
            r"^1+",
            r"^2+",
            r"^3+",
            r"^4+",
            r"^a+",
            r"^5+",
            r"^6+",
            r"^7+",
            r"^8+",
            r"^9+",
        ])
        .unwrap()
    }

    #[bench]
    fn is_match_regex_set(b: &mut Bencher) {
        let regexset = sample_set();
        // Only asks whether any pattern in the set matches.
        b.iter(|| regexset.is_match("aaaaa"));
    }

    #[bench]
    fn match_first_regex_set(b: &mut Bencher) {
        let regexset = sample_set();
        // Computes the full match set, then takes the lowest matching index.
        b.iter(|| regexset.matches("aaaaa").into_iter().next().unwrap());
    }
}
```

```text
running 2 tests
test tests::is_match_regex_set    ... bench:          13 ns/iter (+/- 1)
test tests::match_first_regex_set ... bench:         320 ns/iter (+/- 8)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
```
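
To make the requested semantics concrete, here is a minimal std-only sketch of the two operations being compared. It is a toy model, not the regex crate's implementation: `ToySet`, its `first_match` method, and the `checks` counter are hypothetical names invented for illustration, and each pattern of the form `^c+` is stood in for by a plain "text starts with `c`" test. A real RegexSet may evaluate its patterns together rather than one by one, so this only illustrates why stopping at the first hit can do less work than collecting every match.

```rust
use std::cell::Cell;

/// Toy stand-in for a pattern set (hypothetical, not the regex crate):
/// entry `c` models the anchored pattern `^c+` as "text starts with c".
struct ToySet {
    leading: Vec<char>,
    /// Counts how many individual pattern tests have been run.
    checks: Cell<usize>,
}

impl ToySet {
    fn new(leading: &[char]) -> Self {
        ToySet { leading: leading.to_vec(), checks: Cell::new(0) }
    }

    /// Tests a single pattern and records that the test happened.
    fn test(&self, idx: usize, text: &str) -> bool {
        self.checks.set(self.checks.get() + 1);
        text.starts_with(self.leading[idx])
    }

    /// Full match: tests every pattern and returns all matching indices.
    fn matches(&self, text: &str) -> Vec<usize> {
        (0..self.leading.len()).filter(|&i| self.test(i, text)).collect()
    }

    /// Requested operation: index of the first matching pattern,
    /// stopping as soon as one is found.
    fn first_match(&self, text: &str) -> Option<usize> {
        (0..self.leading.len()).find(|&i| self.test(i, text))
    }
}
```

With the eleven leading characters from the benchmark and the input "aaaaa", `first_match` in this model stops after the sixth pattern test, while `matches` runs all eleven before the first index can be read off.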
