Clarify (or change) extended/whitespace mode treatment of spaces in character classes #660

Closed
@mqudsi

I was bitten pretty hard (my fault!) by a subtle difference in eXtended mode's handling of spaces in character classes. I was expecting (albeit in a much more complicated context) (?x)[ ] to match a single space, as it does with PCRE2, but that does not seem to be the case (and doesn't seem to be documented?).

In PCRE2, (?x) allows insignificant whitespace everywhere except in character classes, where it is treated as a literal value (the same way a . is a literal value in a character class, I suppose). To have whitespace ignored inside character classes as well, eXtra eXtended mode can be used: (?xx)[one two] does not match a space:

> printf 'hello world' | pcre2grep '(?x)hello[ .]world'
hello world¶
> printf 'hello world' | pcre2grep '(?xx)hello[ .]world'
> # did not match

(You can also refer to https://www.regular-expressions.info/freespacing.html)
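
For comparison, here is a minimal sketch of the same check in Python, whose re.VERBOSE flag (as far as I can tell) follows the PCRE2 (?x) convention of keeping whitespace literal inside a character class. The helper name `matches` and the test strings are only for illustration and are not part of any library. The second pattern spells the space as \x20, which I'd assume sidesteps the question of how a given engine treats whitespace inside a class in extended mode.

```python
import re

# With the verbose flag, whitespace outside a character class is ignored,
# but Python keeps the space inside [ .] as a literal, like PCRE2's (?x).
verbose_class = re.compile(r"(?x) hello [ .] world")

# Spelling the space as a hex escape avoids relying on how class
# whitespace is treated in extended mode.
escaped_class = re.compile(r"(?x) hello [\x20.] world")


def matches(pattern, text):
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in ("hello world", "hello.world", "helloworld"):
        print(repr(text), matches(verbose_class, text), matches(escaped_class, text))
```

Both patterns should agree on every input here; the escaped form is just the spelling that does not depend on the engine's whitespace rules.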