@@ -26,14 +26,22 @@ We also intend to achieve at least Level 1 (**TODO: do we want to promise Level
26
26
27
27
We're proposing the following regular expression syntactic superset for Swift.
28
28
29
+
### Top-level regular expression
30
+
31
+
```
32
+
Regex -> GlobalMatchingOptionSequence? RegexNode
33
+
RegexNode -> '' | Alternation
34
+
```
35
+
36
+
A top-level regular expression may consist of a sequence of global matching options followed by a `RegexNode`, which is the recursive part of the grammar that may be nested within e.g. a group.
37
+
29
38
### Alternation
30
39
31
40
```
32
-
Regex -> '' | Alternation
33
41
Alternation -> Concatenation ('|' Concatenation)*
34
42
```
35
43
36
-
This is the operator with the lowest precedence in a regular expression, and checks if any of its branches match the input.
44
+
The `|` operator denotes what is formally called an alternation, or a choice between alternatives. Any number of alternatives may appear, including empty alternatives. This operator has the lowest precedence of all operators in a regex literal.
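As a rough illustration of the precedence rule, here is a minimal sketch. It assumes a Swift 5.7 or later toolchain where the standard library's `Regex` type is available and accepts this syntax; the helper name `wholeMatches` is hypothetical and not part of the proposal.

```swift
// Hypothetical helper: true when `pattern` matches the whole of `input`.
// Assumes the standard library `Regex` type (Swift 5.7 or later).
func wholeMatches(_ pattern: String, _ input: String) -> Bool {
    // An invalid pattern is treated as "no match" to keep the sketch short.
    guard let regex = try? Regex(pattern) else { return false }
    return input.wholeMatch(of: regex) != nil
}

// `|` binds more loosely than concatenation, so `ab|cd` is "ab" or "cd".
print(wholeMatches("ab|cd", "ab"))        // true
print(wholeMatches("ab|cd", "cd"))        // true
print(wholeMatches("ab|cd", "abd"))       // false

// A group is needed to alternate between just `b` and `c`.
print(wholeMatches("a(?:b|c)d", "abd"))   // true

// An empty alternative matches the empty string.
print(wholeMatches("ab|", ""))            // true
```

The third call is the interesting one: because alternation has the lowest precedence, `ab|cd` never means "a, then b or c, then d".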
Implicitly denoted by adjacent expressions, a concatenation matches against a sequence of regular expression patterns. This has a higher precedence than an alternation, so e.g `abc|def` matches against `abc` or `def`. The `ConcatComponent` token varies across engine, but at least matches some form of trivia, e.g comments, quoted sequences e.g `\Q...\E`, and a quantified expression.
53
+
Implicitly denoted by adjacent expressions, a concatenation matches against a sequence of regular expression patterns. This has a higher precedence than an alternation, so e.g. `abc|def` matches against `abc` or `def`. The `ConcatComponent` token varies across engines, but at least matches some form of trivia (e.g. comments), quoted sequences (e.g. `\Q...\E`), and a potentially quantified expression.
QuantOperand -> AbsentFunction | Atom | Conditional | CustomCharClass | Group
53
65
```
54
66
55
-
Specifies that the operand may be matched against a certain number of times.
67
+
A quantification consists of an operand optionally followed by a quantifier that specifies how many times it may be matched. An operand without a quantifier is matched once.
68
+
69
+
The quantifiers supported are:
70
+
71
+
- `?`: 0 or 1 matches
72
+
- `*`: 0 or more matches
73
+
- `+`: 1 or more matches
74
+
- `{n,m}`: Between `n` and `m` (inclusive) matches
75
+
- `{n,}`: `n` or more matches
76
+
- `{,m}`: Up to `m` matches
77
+
- `{n}`: Exactly `n` matches
78
+
79
+
A quantifier may optionally be followed by `?` or `+`, which apply certain semantics to the quantification. If neither is specified, by default the quantification happens eagerly, meaning that it will try to maximize the number of matches made. However, if `?` is specified, the number of matches will instead be minimized. If `+` is specified, eager matching occurs, but with the additional semantic that it may not be backtracked into to try a different number of matches.
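The quantifiers and the eager versus minimizing behaviour can be sketched as follows. This assumes a Swift 5.7 or later toolchain whose standard library `Regex` accepts each of these spellings, including `{,m}`; the helper `firstMatchText` is a hypothetical name used only for illustration, and the `+` suffix is not exercised here.

```swift
// Hypothetical helper: text of the first match of `pattern` in `input`,
// or nil when the pattern is invalid or does not match.
func firstMatchText(_ pattern: String, in input: String) -> String? {
    guard let regex = try? Regex(pattern),
          let match = input.firstMatch(of: regex) else { return nil }
    return String(input[match.range])
}

let input = "aaaa"

// Default (eager) quantification maximizes the number of matches.
print(firstMatchText("a?", in: input) ?? "nil")       // a
print(firstMatchText("a*", in: input) ?? "nil")       // aaaa
print(firstMatchText("a+", in: input) ?? "nil")       // aaaa
print(firstMatchText("a{2,3}", in: input) ?? "nil")   // aaa
print(firstMatchText("a{2,}", in: input) ?? "nil")    // aaaa
print(firstMatchText("a{,3}", in: input) ?? "nil")    // aaa
print(firstMatchText("a{2}", in: input) ?? "nil")     // aa

// A trailing `?` minimizes the number of matches instead.
print(firstMatchText("a+?", in: input) ?? "nil")      // a
print(firstMatchText("a{2,3}?", in: input) ?? "nil")  // aa
```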
80
+
81
+
### Atom
56
82
57
-
**TODO: Briefly mention each and what it means, noting that options can swap eager/reluctant. Might be a good time to introduce the eager/reluctant/possessive terminology**
83
+
```
84
+
Atom -> Anchor | EscapeSequence | BuiltinCharClass
85
+
```
86
+
87
+
Atoms are the smallest unit of regular expression syntax that cannot be split into smaller syntactic expressions.
Groups define a new scope within which a recursive regular expression pattern may occur. Groups have different semantics depending on how they are introduced: some may capture the nested match, some may match against the input without advancing, some may change the matching options set in the new scope, etc.
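A few of these group kinds can be sketched as follows, assuming a Swift 5.7 or later toolchain where the `#/.../#` regex literal is available and supports these constructs. The variable names are illustrative only.

```swift
let input = "name:value"

// Capturing group: the nested match is exposed as capture 1.
let capturing = input.firstMatch(of: #/(\w+):/#)
let whole = capturing.map { String($0.0) }          // "name:"
let captured = capturing.map { String($0.1) }       // "name"

// Non-capturing group: scopes a sub-pattern without producing a capture,
// so the only capture here is the second `\w+`.
let nonCapturing = input.firstMatch(of: #/(?:\w+):(\w+)/#)
let onlyCapture = nonCapturing.map { String($0.1) } // "value"

// Lookahead: checks that `:` follows, without advancing past it.
let lookahead = input.firstMatch(of: #/\w+(?=:)/#)
let beforeColon = lookahead.map { String($0.output) } // "name"

// Option-changing group: case-insensitivity applies only inside the group.
let insideScope = "ABCd".wholeMatch(of: #/(?i:abc)d/#) != nil   // true
let outsideScope = "ABCD".wholeMatch(of: #/(?i:abc)d/#) != nil  // false

print(whole ?? "nil", captured ?? "nil", onlyCapture ?? "nil",
      beforeColon ?? "nil", insideScope, outsideScope)
```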
74
107
75
108
**TODO: Something like "note that there are other things that may syntactically appear similarly to groups, but are their own constructs. See .... in-line options, backreferences, ... **
76
109
110
+
#### Lookahead and lookbehind
111
+
112
+
#### Script runs
113
+
77
114
#### Balancing groups
78
115
79
116
```
@@ -86,15 +123,39 @@ Introduced by .NET, balancing groups extend the `GroupNameBody` syntax to suppor
Trivia is consumed by the regular expression parser, but has no semantic meaning. Non-semantic whitespace may only occur when either of the extended syntax matching options `(?x)`, `(?xx)` is enabled.
[UTS#18][uts18] requires intersection and subtraction, and uses the operation spellings `&&` and `--` in its examples, though it doesn't mandate a particular spelling. In particular, conforming implementations could spell the subtraction `[[x]--[y]]` as `[[x]&&[^y]]`. UTS#18 also suggests a symmetric difference operator `~~`, and uses an explicit `||` operator in examples, though it doesn't require either operation.
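The equivalence between the two spellings of subtraction can be sketched as follows. This assumes a Swift 5.7 or later toolchain whose standard library `Regex` accepts the `--` and `&&` spellings inside custom character classes; the helper `firstRun` is a hypothetical name used only for this example.

```swift
// Hypothetical helper: text of the first match of `pattern` in `input`,
// or nil when the pattern is invalid or does not match.
func firstRun(_ pattern: String, in input: String) -> String? {
    guard let regex = try? Regex(pattern),
          let match = input.firstMatch(of: regex) else { return nil }
    return String(input[match.range])
}

let text = "audio"

// Subtraction: lowercase letters except vowels.
let subtracted = firstRun("[[a-z]--[aeiou]]+", in: text)

// The same set, spelled as an intersection with a negated class.
let intersected = firstRun("[[a-z]&&[^aeiou]]+", in: text)

// Plain intersection: lowercase letters that are also vowels.
let vowels = firstRun("[[a-z]&&[aeiou]]+", in: text)

print(subtracted ?? "nil", intersected ?? "nil", vowels ?? "nil")
```

Under these assumptions, both spellings of the subtraction should find the same first run of consonants in `text`.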