What version of regex are you using?
0.7.2
Describe the bug at a high level.
When a HirKind::Class and a HirKind::Literal are joined with Hir::alternation, I expect the result to be a single HirKind::Class, which would reduce the number of layers in the syntax tree.
What are the steps to reproduce the behavior?
use regex_syntax::hir::Hir;

#[test]
fn test_merge() {
    // Parse a character class and a single literal separately.
    let let_dig = regex_syntax::parse("[a-zA-Z0-9]").unwrap();
    let hyp = regex_syntax::parse("-").unwrap();
    // Join the two with an alternation.
    let let_dig_hyp = Hir::alternation(vec![let_dig, hyp]);
    // Expected: (?:[0-9A-Za-z-])
    // Got: (?:[0-9A-Za-z]|\-)
    println!("{}", let_dig_hyp);
}
What is the actual behavior?
The class and the literal are wrapped in a new HirKind::Alternation instead of being merged.
What is the expected behavior?
We already have the optimization of simplifying a|b|c into [abc], but I would like to see (?:(?:a|b)|c) also simplified to [abc].
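To make the request concrete, here is a minimal sketch of the simplification I have in mind. It uses a toy Node enum rather than regex_syntax's real Hir type, so every name in it (Node, class, flatten, alternation) is hypothetical and it says nothing about how the crate is implemented. It flattens nested alternations first, then merges the branches into one class only when every branch matches exactly one character, so match preference between branches cannot change.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a regex HIR. This is NOT regex_syntax's `Hir`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Node {
    /// A single literal character, e.g. `-`.
    Literal(char),
    /// A character class, stored as a sorted set of characters.
    Class(BTreeSet<char>),
    /// An alternation of sub-expressions.
    Alternation(Vec<Node>),
    /// Any other expression, kept opaque for this sketch.
    Other(String),
}

/// Convenience constructor: build a class from the characters of `s`.
fn class(s: &str) -> Node {
    Node::Class(s.chars().collect())
}

/// Recursively splice nested alternations into `out`,
/// so that (?:(?:a|b)|c) becomes the flat list a, b, c.
fn flatten(node: Node, out: &mut Vec<Node>) {
    match node {
        Node::Alternation(inner) => {
            for child in inner {
                flatten(child, out);
            }
        }
        other => out.push(other),
    }
}

/// Build an alternation, simplifying to a single class when possible.
fn alternation(branches: Vec<Node>) -> Node {
    let mut flat = Vec::new();
    for branch in branches {
        flatten(branch, &mut flat);
    }
    // A one-branch alternation is just that branch.
    if flat.len() == 1 {
        return flat.pop().unwrap();
    }
    // Only merge when every branch matches exactly one character.
    let all_single_char = !flat.is_empty()
        && flat
            .iter()
            .all(|n| matches!(n, Node::Literal(_) | Node::Class(_)));
    if !all_single_char {
        return Node::Alternation(flat);
    }
    let mut set = BTreeSet::new();
    for node in flat {
        match node {
            Node::Literal(c) => {
                set.insert(c);
            }
            Node::Class(chars) => set.extend(chars),
            _ => unreachable!("checked by all_single_char"),
        }
    }
    Node::Class(set)
}
```

With this sketch, alternation(vec![class("abc"), Node::Literal('-')]) yields one class containing a, b, c and -, and a nested alternation of single characters collapses the same way. Branches of any other kind are flattened but left as an alternation.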
Context
I am writing a "composable regex" library that lets users combine pieces of regexes with |, +, *, and ? to make regexes more maintainable. While writing test cases, I realized that the resulting Hirs are not optimal.