Improved delimiter lexing #194
@swift-ci please test
enum Delimiter: Hashable, CaseIterable {
  case traditional
  case experimental
  case reSingleQuote
How about `rxSingleQuote` for experimental?
Ooh yes, thanks for reminding me. How does this look?
}
}

fileprivate struct DelimiterLexer {
In the end, we might want something that can feed `Source` (or even replace `Source`).
Yeah, it would be nice to unify on an implementation that scans and produces Unicode scalars for both.
To avoid confusion with more general regex lexical analysis.
Introduce a DelimiterLexer type to perform the lexing.
This matches the behavior of the C++ lexer for string literals.
Allow the C++ lexer to form a tok::regex_literal. This avoids generic fallback behavior, and better allows for things like code completion. The test case for this will be in the C++ repo.
If a single quote is encountered with a prefix of either `(?`, `(?(`, `\k`, `\g` or `(?C`, continue to scan ahead to a closing `'`. Such prefixes would not be valid endings for a regex literal anyway, and this lets us handle the single quote variant of their syntax. For the group name cases, further refine this skipping behavior by only skipping over characters that could possibly appear in that case. This improves diagnostic behavior by ensuring we don't go wandering off into Swift code.
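The basic form of this skipping heuristic can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: `closingQuoteOffset(in:)` and `quoteIntroducers` are hypothetical names, the input is assumed to be the text after the opening `re'`, and the group-name refinement (restricting which characters may be skipped) is left out.

```swift
/// Prefixes after which a `'` starts a single-quoted construct instead of
/// ending the literal. The list is taken from the description above.
let quoteIntroducers = ["(?(", "(?C", "(?", "\\k", "\\g"]

/// Hypothetical helper: returns the offset of the `'` that closes the
/// literal, or nil if no closing quote is found. `body` is assumed to be
/// everything after the opening `re'`.
func closingQuoteOffset(in body: String) -> Int? {
  let chars = Array(body)
  var i = 0
  while i < chars.count {
    // Anything other than a single quote is just literal content.
    guard chars[i] == "'" else { i += 1; continue }

    let preceding = String(chars[..<i])
    guard quoteIntroducers.contains(where: { preceding.hasSuffix($0) }) else {
      return i  // No special prefix: this quote ends the literal.
    }

    // The quote belongs to something like `(?'name'...)`, so scan ahead
    // to the quote that closes it and resume after that.
    guard let close = chars[(i + 1)...].firstIndex(of: "'") else {
      return nil
    }
    i = close + 1
  }
  return nil
}
```

For a body such as `(?'name'x)' rest`, the first two quotes are treated as part of the named group and the third one is reported as the end of the literal. A plain body such as `abc' + x` ends at its first quote.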
b1172a1 to 2325cef
@swift-ci please test
Refactor the delimiter lexing logic, and change the behavior such that:

- `rx'...'` for experimental syntax.

Additionally, implement a heuristic that allows skipping over single quotes in `re'...'` literals if the preceding characters are `(?`, `(?(`, `\k`, `\g` or `(?C`. These would not be valid literal endings anyway, and this allows us to support their single-quoted syntax.

This should be able to integrate without any C++ side changes, though I have some extra test cases I want to commit there whenever this gets integrated in.
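As a rough illustration of how the two single-quoted delimiters could be told apart, here is a small sketch. It assumes a reduced enum, `QuotedDelimiter`, that models only the `re'...'` and `rx'...'` cases; the type, its properties, and `leadingDelimiter(of:)` are hypothetical and do not mirror the PR's `Delimiter` or `DelimiterLexer` types.

```swift
/// Hypothetical reduction of a delimiter enum to the two single-quoted
/// forms mentioned in this PR. Other delimiter kinds are not modeled.
enum QuotedDelimiter: CaseIterable {
  case reSingleQuote  // re'...'
  case rxSingleQuote  // rx'...' (experimental syntax)

  /// The character sequence that opens a literal of this kind.
  var opening: String {
    switch self {
    case .reSingleQuote: return "re'"
    case .rxSingleQuote: return "rx'"
    }
  }

  /// Both forms close with a single quote.
  var closing: String { "'" }
}

/// Returns the delimiter whose opening sequence starts `source`, if any.
func leadingDelimiter(of source: String) -> QuotedDelimiter? {
  QuotedDelimiter.allCases.first { source.hasPrefix($0.opening) }
}
```

Calling `leadingDelimiter(of: "rx'a b'")` selects `.rxSingleQuote`, while input that does not start with either opening sequence yields `nil`, leaving the text to be lexed as ordinary Swift code.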