\p{Sc} fails to compile, but should be equal to \p{Currency_Symbol} #965

Closed
@Aloso

What version of regex are you using?

1.7.1

Describe the bug at a high level.

The Unicode property syntax of regex seems to follow the ECMAScript spec. Upon checking, I noticed that three General Category names that work in other regex engines are not supported:

  • Surrogate
  • Cs (abbreviation for Surrogate)
  • Sc (abbreviation for Currency_Symbol)

But while the lack of support for Surrogate may be intentional, \p{Sc} should work, and does work when not abbreviated. It also works when prefixed with gc= or General_Category=:

Regex                        Compiles?
\p{Currency_Symbol}          yes
\p{Sc}                       no
\p{gc=Sc}                    yes
\p{GC=Sc}                    yes
\p{General_Category=Sc}      yes

This bug might be related to the Script/sc property (e.g. \p{sc=Greek}), which happens to have the same abbreviation as the Currency_Symbol General Category.
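
To make the suspected collision concrete, here is a minimal sketch in Rust. It is not the regex crate's actual lookup code: the `Tables` struct, the two `resolve_*` functions and the tiny alias tables are all hypothetical and exist only for illustration. It assumes names are normalized case-insensitively and that a bare `\p{X}` is looked up in more than one table. If property names are consulted first, `Sc` normalizes to `sc`, hits the Script alias, and is rejected because no value was given. If General Category values are consulted first, it resolves to Currency_Symbol.

```rust
use std::collections::HashMap;

/// Loose name matching: ignore case, spaces, hyphens and underscores.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| !matches!(c, ' ' | '-' | '_'))
        .flat_map(|c| c.to_lowercase())
        .collect()
}

#[derive(Debug, PartialEq)]
enum Lookup {
    /// The name is a General_Category value (canonical name attached).
    GeneralCategory(&'static str),
    /// The name is a property such as Script, which needs `name=value`.
    PropertyNameWithoutValue(&'static str),
    NotFound,
}

/// Hypothetical alias tables holding only the entries needed here.
struct Tables {
    property_names: HashMap<String, &'static str>,
    general_categories: HashMap<String, &'static str>,
}

fn tables() -> Tables {
    let mut property_names = HashMap::new();
    for (alias, canonical) in [("sc", "Script"), ("Script", "Script")] {
        property_names.insert(normalize(alias), canonical);
    }
    let mut general_categories = HashMap::new();
    for (alias, canonical) in [
        ("Sc", "Currency_Symbol"),
        ("Currency_Symbol", "Currency_Symbol"),
    ] {
        general_categories.insert(normalize(alias), canonical);
    }
    Tables { property_names, general_categories }
}

/// Lookup order that would reproduce the reported failure:
/// property names shadow General_Category values.
fn resolve_names_first(t: &Tables, name: &str) -> Lookup {
    let key = normalize(name);
    if let Some(&p) = t.property_names.get(&key) {
        return Lookup::PropertyNameWithoutValue(p);
    }
    match t.general_categories.get(&key) {
        Some(&gc) => Lookup::GeneralCategory(gc),
        None => Lookup::NotFound,
    }
}

/// Lookup order that avoids the collision for a bare `\p{X}`:
/// General_Category values are tried before property names.
fn resolve_categories_first(t: &Tables, name: &str) -> Lookup {
    let key = normalize(name);
    if let Some(&gc) = t.general_categories.get(&key) {
        return Lookup::GeneralCategory(gc);
    }
    match t.property_names.get(&key) {
        Some(&p) => Lookup::PropertyNameWithoutValue(p),
        None => Lookup::NotFound,
    }
}
```

Under these assumptions, `resolve_names_first(&tables(), "Sc")` ends up at the Script property, while `Currency_Symbol` resolves fine in either order. That matches the pattern in the table above, but whether the crate's lookup really works this way is a guess.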

I also noticed that Unicode property names are case insensitive in regex, which deviates from the ECMAScript spec, where they are case sensitive. But since changing that would be a breaking change, it's probably not worth it.

What are the steps to reproduce the behavior?

Try to compile \p{Sc}

What is the actual behavior?

error: regex parse error:
    \p{Sc}
    ^^^^^^
error: Unicode property not found

What is the expected behavior?

\p{Sc} should compile and be equivalent to \p{Currency_Symbol}.
