What version of regex are you using?
1.7.1
Describe the bug at a high level.
The syntax of `regex` seems to follow the ECMAScript spec. On checking, I noticed that three General Category names that work in other regex engines are not supported:

- `Surrogate`
- `Cs` (abbreviation for `Surrogate`)
- `Sc` (abbreviation for `Currency_Symbol`)

While the lack of support for `Surrogate` may be intentional, `\p{Sc}` should work: the unabbreviated `\p{Currency_Symbol}` does work, and so does `Sc` when prefixed with `gc=` or `General_Category=`:
| Regex | Compiles? |
|---|---|
| `\p{Currency_Symbol}` | yes |
| `\p{Sc}` | no |
| `\p{gc=Sc}` | yes |
| `\p{GC=Sc}` | yes |
| `\p{General_Category=Sc}` | yes |
This bug might be related to the `Script` property, whose abbreviation `sc` (as in `\p{sc=Greek}`) is the same as that of the `Currency_Symbol` General Category (`Sc`).
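To illustrate the `Script`/`sc` hypothesis, here is a small, self-contained Rust model of how a bare `\p{NAME}` might be resolved. It is a hypothetical sketch, not the `regex` crate's implementation: the two lookup tables hold only a few illustrative entries, and `normalize`, `lookup`, `resolve_names_first`, `resolve_categories_first` and `resolve_gc_value` are invented for this example. It shows that if property names were checked before General_Category values, the bare name `Sc` would be taken to mean the `Script` property (which needs a value, as in `sc=Greek`) and fail, while `Currency_Symbol` and `gc=Sc` would be unaffected, consistent with the table above.

```rust
// Hypothetical model of Unicode property-name resolution. The tables and
// functions are illustrative only; they are NOT the regex crate's code or data.

#[derive(Debug, PartialEq)]
enum Resolved {
    /// The name resolved to a General_Category value.
    Category(&'static str),
    /// The name resolved to a property that needs `name=value` syntax
    /// (e.g. `sc=Greek`), so a bare `\p{...}` cannot use it.
    NeedsValue(&'static str),
    /// The name is unknown.
    NotFound,
}

/// Property names and their aliases (tiny illustrative subset).
const PROPERTY_NAMES: &[(&str, &str)] = &[
    ("gc", "General_Category"),
    ("generalcategory", "General_Category"),
    ("sc", "Script"),
    ("script", "Script"),
];

/// General_Category values and their aliases (tiny illustrative subset).
const CATEGORY_VALUES: &[(&str, &str)] = &[
    ("sc", "Currency_Symbol"),
    ("currencysymbol", "Currency_Symbol"),
    ("lu", "Uppercase_Letter"),
    ("uppercaseletter", "Uppercase_Letter"),
];

/// Lowercase a name and drop '_', '-' and spaces before lookup.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|&c| c != '_' && c != '-' && c != ' ')
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Find the canonical name for an already-normalized key.
fn lookup(table: &[(&'static str, &'static str)], key: &str) -> Option<&'static str> {
    table.iter().find(|(alias, _)| *alias == key).map(|&(_, canon)| canon)
}

/// Order that would reproduce the reported failure: property names win,
/// so the bare name `Sc` becomes the `Script` property.
fn resolve_names_first(name: &str) -> Resolved {
    let key = normalize(name);
    if let Some(prop) = lookup(PROPERTY_NAMES, &key) {
        return Resolved::NeedsValue(prop);
    }
    match lookup(CATEGORY_VALUES, &key) {
        Some(gc) => Resolved::Category(gc),
        None => Resolved::NotFound,
    }
}

/// Alternative order: try General_Category values first for bare names.
fn resolve_categories_first(name: &str) -> Resolved {
    let key = normalize(name);
    if let Some(gc) = lookup(CATEGORY_VALUES, &key) {
        return Resolved::Category(gc);
    }
    match lookup(PROPERTY_NAMES, &key) {
        Some(prop) => Resolved::NeedsValue(prop),
        None => Resolved::NotFound,
    }
}

/// `\p{gc=VALUE}`: the property is explicit, so VALUE is only looked up
/// among General_Category values and the `sc` clash cannot occur.
fn resolve_gc_value(value: &str) -> Resolved {
    match lookup(CATEGORY_VALUES, &normalize(value)) {
        Some(gc) => Resolved::Category(gc),
        None => Resolved::NotFound,
    }
}

fn main() {
    for name in ["Sc", "Currency_Symbol"] {
        println!(
            "{}: names first = {:?}, categories first = {:?}",
            name,
            resolve_names_first(name),
            resolve_categories_first(name)
        );
    }
    println!("gc=Sc: {:?}", resolve_gc_value("Sc"));
}
```

In this model, `Sc` resolves to `NeedsValue("Script")` under the names-first order and to `Category("Currency_Symbol")` under the categories-first order, while `Currency_Symbol` and `gc=Sc` resolve to `Currency_Symbol` either way. Which order, if either, `regex` actually uses is not established by this report.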
I also noticed that Unicode property names are matched case-insensitively in `regex` (e.g. `\p{GC=Sc}` compiles), which violates the ECMAScript spec. But since changing this would be a breaking change, it's probably not worth it.
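For comparison, the sketch below contrasts ECMAScript-style exact matching of property names with loose matching. The helper names `strict_eq`, `loose_key` and `loose_eq` are hypothetical, and ignoring whitespace, `_` and `-` in addition to case is an assumption borrowed from a subset of Unicode's UAX #44 loose-matching rule (LM3), not something this report establishes about `regex`.

```rust
// Hypothetical helpers contrasting two ways of comparing property names.

/// ECMAScript-style: names must match exactly, so "greek" != "Greek".
fn strict_eq(a: &str, b: &str) -> bool {
    a == b
}

/// Loose key: lowercase and drop whitespace, '_' and '-'
/// (a subset of UAX #44 rule LM3; assumed for illustration).
fn loose_key(name: &str) -> String {
    name.chars()
        .filter(|&c| !c.is_whitespace() && c != '_' && c != '-')
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Loose comparison: two spellings match if their loose keys are equal.
/// Note that this does not resolve aliases such as `Sc` vs `Currency_Symbol`.
fn loose_eq(a: &str, b: &str) -> bool {
    loose_key(a) == loose_key(b)
}

fn main() {
    let pairs = [
        ("GC", "gc"),
        ("Greek", "greek"),
        ("Currency_Symbol", "currency-symbol"),
    ];
    for (a, b) in pairs {
        println!("{} vs {}: strict = {}, loose = {}", a, b, strict_eq(a, b), loose_eq(a, b));
    }
}
```

Under this kind of loose matching every pair above compares equal, whereas an exact comparison rejects all of them.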
What are the steps to reproduce the behavior?
Try to compile `\p{Sc}`, e.g. with `Regex::new(r"\p{Sc}")`.
What is the actual behavior?
error: regex parse error:
\p{Sc}
^^^^^^
error: Unicode property not found
What is the expected behavior?
`\p{Sc}` should compile, just like `\p{Currency_Symbol}` and `\p{gc=Sc}` do.