You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: Documentation/Evolution/RegexSyntax.md
+15-18Lines changed: 15 additions & 18 deletions
Original file line number
Diff line number
Diff line change
@@ -1,20 +1,14 @@
1
1
<!--
2
-
Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal.
2
+
Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. Additionally, this is the syntax accepted from a string used for run-time regex construction, so we're devoting an entire pitch/proposal to the topic of _regex syntax_, distinct from the result builder DSL or the choice of delimiters for literals.
3
3
-->
4
4
5
-
# Regex Literal Interior Syntax
5
+
# Regex Syntax
6
6
7
7
- Authors: Hamish Knight, Michael Ilseman
8
8
9
9
## Introduction
10
10
11
-
Regex literals declare a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. Formalizing regex literals in Swift requires:
12
-
13
-
- Choosing a delimiter (e.g. `#/.../#` or `re'...'`).
14
-
- Detailing the "interior syntax" accepted in between delimiters.
15
-
- Specifying actual types and relevant protocols for the literal.
16
-
17
-
We present a detailed and comprehensive treatment of regex literal interior syntax. The syntax we're proposing is large enough for its own dedicated discussion ahead of a full regex literal proposal.
11
+
A regex declares a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. Regexes can be created from a string at run time or from a literal at compile time. The contents of that run-time string, or the contents in-between the compile-time literal's delimiters, uses regex syntax. We present a detailed and comprehensive treatment of regex syntax.
18
12
19
13
This is part of a larger effort in supporting regex literals, which in turn is part of a larger effort towards better string processing using regex. See [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107), which tracks each relevant piece.
20
14
@@ -23,7 +17,7 @@ This is part of a larger effort in supporting regex literals, which in turn is p
23
17
24
18
Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.
25
19
26
-
The full string processing effort includes a literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, strong types, and a slew of regex-powered algorithms over strings.
20
+
The full string processing effort includes a regex type with strongly typed captures, the ability to create a regex from a string at runtime, a compile-time literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, and a slew of regex-powered algorithms over strings.
27
21
28
22
This proposal specifically hones in on the _familiarity_ aspect by providing a best-in-class treatment of familiar regex syntax.
29
23
@@ -42,11 +36,11 @@ We also support [UTS#18][uts18]'s full set of character class operators (to our
42
36
43
37
Note that there are minor syntactic incompatibilities and ambiguities involved in this approach. Each is addressed in the relevant sections below
44
38
45
-
Regex literal interior syntax will be part of Swift's source-compatibility story as well as its binary-compatibility story. Thus, we present a detailed and comprehensive design.
39
+
Regex syntax will be part of Swift's source-compatibility story as well as its binary-compatibility story. Thus, we present a detailed and comprehensive design.
46
40
47
41
## Detailed Design
48
42
49
-
We propose the following syntax for use inside Swift regex literals.
A regex literal may be prefixed with a sequence of [global matching options](#pcre-global-matching-options). A literal's contents can be empty or a sequence of alternatives separated by `|`.
76
+
A regex may be prefixed with a sequence of [global matching options](#pcre-global-matching-options). Its contents can be empty or a sequence of alternatives separated by `|`.
83
77
84
78
Alternatives are a series of expressions concatenated together. The concatentation ends with either a `|` denoting the end of the alternative or a `)` denoting the end of a recursively parsed group.
85
79
@@ -471,6 +465,8 @@ These options are specific to the Swift regex matching engine and control the se
471
465
-`u`: Unicode scalar matching.
472
466
-`b`: Byte matching.
473
467
468
+
Further details on these are TBD and outside the scope of this pitch.
469
+
474
470
### References
475
471
476
472
```
@@ -816,9 +812,10 @@ We are deferring runtime support for callouts from regex literals as future work
816
812
817
813
## Alternatives Considered
818
814
819
-
### Skip the literals
820
815
821
-
The top alternative is to just skip regex literals and only ship the result builder DSL. However, doing so would miss out on the familiarity benefits of existing regex syntax.
816
+
### Skip the syntax
817
+
818
+
The top alternative is to just skip regex syntax altogether by only shipping the result builder DSL and forbidding run-time regex construction from strings. However, doing so would miss out on the familiarity benefits of existing regex syntax. Additionally, without support for run-time strings containing regex syntax, important domains would be closed off from better string processing, such as command-line tools and user-input searches. This would land us in a confusing world where NSRegularExpression, even though it operates over a fundamentally different model of string than Swift's `String` and exhibits different behavior than Swift regexes, is still used for these purposes.
822
819
823
820
We consider our proposed direction to be more compelling, especially when coupled with refactoring actions to convert literals into regex DSLs.
824
821
@@ -830,11 +827,11 @@ We are prototyping an "experimental" Swift extended syntax, which is future work
830
827
831
828
### Support a minimal syntactic subset
832
829
833
-
Regex literal interior syntax will become part of Swift's source and binary-compatibility story, so a reasonable alternative is to support the absolute minimal syntactic subset available. However, we would need to ensure that such a minimal approach is extensible far into the future. Because syntax decisions can impact each other, we would want to consider the ramifications of this full syntactic superset ahead of time anyways.
830
+
Regex syntax will become part of Swift's source and binary-compatibility story, so a reasonable alternative is to support the absolute minimal syntactic subset available. However, we would need to ensure that such a minimal approach is extensible far into the future. Because syntax decisions can impact each other, we would want to consider the ramifications of this full syntactic superset ahead of time anyways.
834
831
835
-
Even though it is more work up-front, and creates a longer proposal, it is less risky to support the full intended syntax. The proposed superset maximizes the familiarity benefit of regex literals.
832
+
Even though it is more work up-front and creates a longer proposal, it is less risky to support the full intended syntax. The proposed superset maximizes the familiarity benefit of regex syntax.
836
833
837
-
Note that this proposal regards _syntactic_ support, and does not necessarily mean that everything that can be written will be supported by Swift's run-time in the initial release. Support for more obscure features may appear over time, see [MatchingEngine Capabilities and Roadmap](https://github.com/apple/swift-experimental-string-processing/issues/99) for status.
834
+
Note that this proposal regards _syntactic_ support, and does not necessarily mean that everything that can be written will be supported by Swift's runtime engine in the initial release. Support for more obscure features may appear over time, see [MatchingEngine Capabilities and Roadmap](https://github.com/apple/swift-experimental-string-processing/issues/99) for status.
0 commit comments