Commit 5c64747

Rework: syntax for both literals and run time
1 parent a206f99 commit 5c64747

1 file changed

+15
-18
lines changed

Documentation/Evolution/RegexSyntax.md

Lines changed: 15 additions & 18 deletions
@@ -1,20 +1,14 @@
11
<!--
2-
Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal.
2+
Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. Additionally, this is the syntax accepted from a string used for run-time regex construction, so we're devoting an entire pitch/proposal to the topic of _regex syntax_, distinct from the result builder DSL or the choice of delimiters for literals.
33
-->
44

5-
# Regex Literal Interior Syntax
5+
# Regex Syntax
66

77
- Authors: Hamish Knight, Michael Ilseman
88

99
## Introduction
1010

11-
Regex literals declare a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. Formalizing regex literals in Swift requires:
12-
13-
- Choosing a delimiter (e.g. `#/.../#` or `re'...'`).
14-
- Detailing the "interior syntax" accepted in between delimiters.
15-
- Specifying actual types and relevant protocols for the literal.
16-
17-
We present a detailed and comprehensive treatment of regex literal interior syntax. The syntax we're proposing is large enough for its own dedicated discussion ahead of a full regex literal proposal.
11+
A regex declares a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. Regexes can be created from a string at run time or from a literal at compile time. The contents of that run-time string, or the contents in-between the compile-time literal's delimiters, use regex syntax. We present a detailed and comprehensive treatment of regex syntax.
1812

1913
This is part of a larger effort in supporting regex literals, which in turn is part of a larger effort towards better string processing using regex. See [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107), which tracks each relevant piece.
2014

@@ -23,7 +17,7 @@ This is part of a larger effort in supporting regex literals, which in turn is p
2317

2418
Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.
2519

26-
The full string processing effort includes a literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, strong types, and a slew of regex-powered algorithms over strings.
20+
The full string processing effort includes a regex type with strongly typed captures, the ability to create a regex from a string at runtime, a compile-time literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, and a slew of regex-powered algorithms over strings.
2721

2822
This proposal specifically homes in on the _familiarity_ aspect by providing a best-in-class treatment of familiar regex syntax.
2923

@@ -42,11 +36,11 @@ We also support [UTS#18][uts18]'s full set of character class operators (to our
4236

4337
Note that there are minor syntactic incompatibilities and ambiguities involved in this approach. Each is addressed in the relevant sections below.
4438

45-
Regex literal interior syntax will be part of Swift's source-compatibility story as well as its binary-compatibility story. Thus, we present a detailed and comprehensive design.
39+
Regex syntax will be part of Swift's source-compatibility story as well as its binary-compatibility story. Thus, we present a detailed and comprehensive design.
4640

4741
## Detailed Design
4842

49-
We propose the following syntax for use inside Swift regex literals.
43+
We propose the following syntax for regex.
5044

5145
<details><summary>Grammar Notation</summary>
5246

@@ -79,7 +73,7 @@ Alternation -> Concatenation ('|' Concatenation)*
7973
Concatenation -> (!'|' !')' ConcatComponent)*
8074
```
8175

82-
A regex literal may be prefixed with a sequence of [global matching options](#pcre-global-matching-options). A literal's contents can be empty or a sequence of alternatives separated by `|`.
76+
A regex may be prefixed with a sequence of [global matching options](#pcre-global-matching-options). Its contents can be empty or a sequence of alternatives separated by `|`.
8377

8478
Alternatives are a series of expressions concatenated together. The concatenation ends with either a `|` denoting the end of the alternative or a `)` denoting the end of a recursively parsed group.
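As a minimal sketch of how the alternation and concatenation rules interact, the snippet below builds regexes from strings at run time. It assumes Swift 5.7 or later, where a throwing `Regex(_:)` initializer and `wholeMatch(of:)` are available; the pitch text itself does not specify this API. `matchesWhole` is a hypothetical helper introduced only for illustration.

```swift
// Assumption: Swift 5.7+, where `Regex(_:)` parses a pattern string at run time.
// `matchesWhole` is a hypothetical helper, not part of the pitch.
func matchesWhole(_ pattern: String, _ input: String) throws -> Bool {
    // Parsing can fail, so the initializer throws on invalid syntax.
    let regex = try Regex(pattern)
    // True only when the regex matches the entire input.
    return input.wholeMatch(of: regex) != nil
}

// `|` separates whole alternatives, so "ab|cd" means "ab" or "cd".
let matchesFirst = try matchesWhole("ab|cd", "ab")
let matchesMixed = try matchesWhole("ab|cd", "acd")

// Inside a group, `)` ends the nested alternation, and the
// concatenation continues with "c" after the group.
let matchesGroup = try matchesWhole("(a|b)c", "bc")

print(matchesFirst, matchesMixed, matchesGroup)
```

Because concatenation binds more tightly than `|`, the ungrouped pattern `ab|cd` does not accept `acd`, while the grouped pattern `(a|b)c` accepts `bc`.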
8579

@@ -471,6 +465,8 @@ These options are specific to the Swift regex matching engine and control the se
471465
- `u`: Unicode scalar matching.
472466
- `b`: Byte matching.
473467

468+
Further details on these are TBD and outside the scope of this pitch.
469+
474470
### References
475471

476472
```
@@ -816,9 +812,10 @@ We are deferring runtime support for callouts from regex literals as future work
816812

817813
## Alternatives Considered
818814

819-
### Skip the literals
820815

821-
The top alternative is to just skip regex literals and only ship the result builder DSL. However, doing so would miss out on the familiarity benefits of existing regex syntax.
816+
### Skip the syntax
817+
818+
The top alternative is to just skip regex syntax altogether by only shipping the result builder DSL and forbidding run-time regex construction from strings. However, doing so would miss out on the familiarity benefits of existing regex syntax. Additionally, without support for run-time strings containing regex syntax, important domains would be closed off from better string processing, such as command-line tools and user-input searches. This would land us in a confusing world where NSRegularExpression, even though it operates over a fundamentally different model of string than Swift's `String` and exhibits different behavior than Swift regexes, is still used for these purposes.
822819

823820
We consider our proposed direction to be more compelling, especially when coupled with refactoring actions to convert literals into regex DSLs.
824821

@@ -830,11 +827,11 @@ We are prototyping an "experimental" Swift extended syntax, which is future work
830827

831828
### Support a minimal syntactic subset
832829

833-
Regex literal interior syntax will become part of Swift's source and binary-compatibility story, so a reasonable alternative is to support the absolute minimal syntactic subset available. However, we would need to ensure that such a minimal approach is extensible far into the future. Because syntax decisions can impact each other, we would want to consider the ramifications of this full syntactic superset ahead of time anyways.
830+
Regex syntax will become part of Swift's source and binary-compatibility story, so a reasonable alternative is to support the absolute minimal syntactic subset available. However, we would need to ensure that such a minimal approach is extensible far into the future. Because syntax decisions can impact each other, we would want to consider the ramifications of this full syntactic superset ahead of time anyways.
834831

835-
Even though it is more work up-front, and creates a longer proposal, it is less risky to support the full intended syntax. The proposed superset maximizes the familiarity benefit of regex literals.
832+
Even though it is more work up-front and creates a longer proposal, it is less risky to support the full intended syntax. The proposed superset maximizes the familiarity benefit of regex syntax.
836833

837-
Note that this proposal regards _syntactic_ support, and does not necessarily mean that everything that can be written will be supported by Swift's run-time in the initial release. Support for more obscure features may appear over time, see [MatchingEngine Capabilities and Roadmap](https://github.com/apple/swift-experimental-string-processing/issues/99) for status.
834+
Note that this proposal regards _syntactic_ support, and does not necessarily mean that everything that can be written will be supported by Swift's runtime engine in the initial release. Support for more obscure features may appear over time, see [MatchingEngine Capabilities and Roadmap](https://github.com/apple/swift-experimental-string-processing/issues/99) for status.
838835

839836

840837
