implement regex engine for handling regexes like ^(<match at most one codepoint>){m,n}$ #802

Open
@demurgos

What version of regex are you using?

1.5.4

Describe the bug at a high level.

I use regexes to validate string inputs. The strings are usually fairly small and there are no issues. Today I wanted to accept any text as long as it contains at most 10000 Unicode code points. I expected the following regex to work: ^(?s:.){0,10000}$

This triggered a CompiledTooBig(10485760) error instead.
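
For context, the number in the error matches the regex crate's documented default size limit of 10 * (1 << 20) bytes (10 MiB). The sketch below only does that arithmetic. The names `DEFAULT_SIZE_LIMIT` and `avg_budget_per_repetition` are made up for illustration and are not part of the crate, and the per-repetition figure is a rough average budget, not a measurement of what the compiler actually emits.

```rust
// Illustrative constant mirroring the documented default of
// `RegexBuilder::size_limit`: 10 * (1 << 20) bytes. Not a crate item.
const DEFAULT_SIZE_LIMIT: usize = 10 * (1 << 20);

/// Hypothetical helper: the average number of bytes each repetition may use
/// before `repetitions` copies exceed `limit` (integer division).
fn avg_budget_per_repetition(limit: usize, repetitions: usize) -> usize {
    limit / repetitions
}

fn main() {
    // The value reported in CompiledTooBig(10485760).
    println!("limit in bytes: {}", DEFAULT_SIZE_LIMIT);
    // With 10000 repetitions, the limit is exceeded once each copy of the
    // sub-pattern needs more than roughly this many bytes on average.
    println!(
        "average budget per repetition: {} bytes",
        avg_budget_per_repetition(DEFAULT_SIZE_LIMIT, 10_000)
    );
}
```

Since compilation failed, the expanded program for 10000 copies of (?s:.) must need more than about 1 KiB per copy on average, which suggests the repetition is being unrolled rather than counted.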

What are the steps to reproduce the behavior?

Code:

fn main() {
    let re = regex::Regex::new(r"^(?s:.){0,10000}$").unwrap();
    dbg!(re);
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bf2062cc98fc1a2c2afe61c88ea0cc86

What is the actual behavior?

Output:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 3.51s
     Running `target/debug/playground`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CompiledTooBig(10485760)', src/main.rs:2:54
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

What is the expected behavior?

The regex should compile and be reasonably small. Instead, the memory requirements seem to grow very large as the maximum string length checked by the regex increases.
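
As a possible workaround until such a pattern compiles, the same length check can be sketched without a regex by counting `char`s directly. This is a minimal sketch, not something the regex crate provides: `has_at_most_code_points` is a hypothetical helper name, and it assumes the input is a `&str`, where each `char` is one Unicode scalar value (which is also what `(?s:.)` matches on `&str` input, newlines included).

```rust
/// Hypothetical helper (not part of the regex crate): returns true when
/// `input` contains at most `max` Unicode scalar values.
fn has_at_most_code_points(input: &str, max: usize) -> bool {
    // Look at no more than `max + 1` chars so that a very long input is
    // rejected without scanning all of it.
    input.chars().take(max.saturating_add(1)).count() <= max
}

fn main() {
    // Intended to accept the same inputs as ^(?s:.){0,10000}$.
    assert!(has_at_most_code_points("", 10_000));
    assert!(has_at_most_code_points(&"é".repeat(10_000), 10_000));
    assert!(!has_at_most_code_points(&"é".repeat(10_001), 10_000));
    // Newlines count as code points, as they do for `(?s:.)`.
    assert!(has_at_most_code_points("a\nb", 3));
    assert!(!has_at_most_code_points("a\nb", 2));
}
```

Counting `char`s rather than bytes matters here: "é" is two bytes in UTF-8 but one code point, so a check on `input.len()` would reject valid inputs.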
