regex-syntax Hir should merge alternation of Class & Literal into Class #1001

Open
@rynewang

What version of regex are you using?

0.7.2

Describe the bug at a high level.

When you join a HirKind::Class and a HirKind::Literal with Hir::alternation, I expect the result to be a single Class, which would reduce the number of layers in the syntax tree.

What are the steps to reproduce the behavior?

    use regex_syntax::hir::Hir;

    #[test]
    fn test_merge() {
        // A class of letters and digits, plus a one-character literal.
        let let_dig = regex_syntax::parse("[a-zA-Z0-9]").unwrap();
        let hyp = regex_syntax::parse("-").unwrap();
        // Join the two pieces as alternatives.
        let let_dig_hyp = Hir::alternation(vec![let_dig, hyp]);

        // Expected: (?:[0-9A-Za-z-])
        // Got: (?:[0-9A-Za-z]|\-)
        println!("{}", let_dig_hyp);
    }

What is the actual behavior?

The class and the literal are wrapped in a whole new HirKind::Alternation instead of being merged.

What is the expected behavior?

We already have the optimization that simplifies a|b|c into [abc], but I would like (?:(?:a|b)|c) to be simplified to [abc] as well.
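The requested merge can be sketched on a toy tree. This is only an illustration under stated assumptions: `Node`, `canonicalize`, and `alternation` are hypothetical names that model the idea with the standard library alone, and they are not regex-syntax's types or its actual implementation. The sketch flattens nested alternations and, when every branch matches exactly one character, folds the branches into a single class.

```rust
// Hypothetical stand-in for an Hir-like tree; not regex-syntax's real types.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    /// A single-character literal such as `-`.
    Literal(char),
    /// A character class as sorted, non-overlapping inclusive ranges.
    Class(Vec<(char, char)>),
    /// Any expression that is not a single-character matcher (e.g. `ab`).
    Other(&'static str),
    /// An alternation that could not be reduced to a class.
    Alternation(Vec<Node>),
}

/// Sort the ranges and merge those that overlap or are adjacent.
fn canonicalize(mut ranges: Vec<(char, char)>) -> Vec<(char, char)> {
    ranges.sort();
    let mut out: Vec<(char, char)> = Vec::new();
    for (lo, hi) in ranges {
        if let Some(last) = out.last_mut() {
            // `lo` starts inside or directly after the previous range.
            if (lo as u32) <= (last.1 as u32) + 1 {
                if hi > last.1 {
                    last.1 = hi;
                }
                continue;
            }
        }
        out.push((lo, hi));
    }
    out
}

/// Smart constructor: flattens one level of nesting, then merges
/// literal and class branches into one class when all branches qualify.
fn alternation(branches: Vec<Node>) -> Node {
    // (?:(?:a|b)|c) becomes a|b|c before any merging is attempted.
    let mut flat = Vec::new();
    for branch in branches {
        match branch {
            Node::Alternation(inner) => flat.extend(inner),
            other => flat.push(other),
        }
    }

    // Merging is only attempted when every branch is a one-character matcher.
    let mergeable = flat
        .iter()
        .all(|n| matches!(n, Node::Literal(_) | Node::Class(_)));
    if !mergeable {
        return Node::Alternation(flat);
    }

    let mut ranges = Vec::new();
    for node in &flat {
        match node {
            Node::Literal(c) => ranges.push((*c, *c)),
            Node::Class(rs) => ranges.extend(rs.iter().copied()),
            // Unreachable because of the `mergeable` check above.
            Node::Other(_) | Node::Alternation(_) => {}
        }
    }
    Node::Class(canonicalize(ranges))
}
```

In this model, joining the class `[0-9A-Za-z]` with the literal `-` yields one class with four ranges, and `(?:(?:a|b)|c)` collapses to the single range `a-c`. A branch such as `ab` blocks the merge, so the alternation is kept as-is.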

Context

I am writing a "composable regex" library that lets users combine pieces of regexes with |, +, *, and ? to make regexes more maintainable. While writing test cases, I realized that the output Hirs are not optimal.
