Improved delimiter lexing #194

Merged 8 commits on Mar 2, 2022

@hamishknight hamishknight commented Mar 2, 2022

Refactor the delimiter lexing logic, and change the behavior such that:

  • We now diagnose unprintable ASCII characters to match behavior with the C++ lexer.
  • We now allow the C++ lexer to better recover from a missing closing delimiter.
  • We now support rx'...' for experimental syntax.
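As a rough illustration of two of the items above (recognizing both `re'...'` and `rx'...'`, and diagnosing unprintable ASCII), here is a minimal sketch. Everything in it is hypothetical: `QuotedDelimiter`, `DelimiterScanError`, `scanQuotedLiteral`, and the definition of "unprintable" (ASCII control characters other than tab) are assumptions for illustration, not the PR's actual types or rules.

```swift
// Hypothetical sketch; names and rules are illustrative, not the PR's API.
enum QuotedDelimiter: CaseIterable {
  case reSingleQuote  // re'...'
  case rxSingleQuote  // rx'...' (experimental syntax)

  var opening: String {
    switch self {
    case .reSingleQuote: return "re'"
    case .rxSingleQuote: return "rx'"
    }
  }
}

enum DelimiterScanError: Error, Equatable {
  case unknownDelimiter
  case unterminated
  case unprintableASCII(UInt8)
}

/// Assumed definition: ASCII control characters other than tab, plus DEL.
func isUnprintableASCII(_ byte: UInt8) -> Bool {
  (byte < 0x20 && byte != 0x09) || byte == 0x7F
}

/// Scans a single-quoted literal and returns its delimiter kind and contents.
/// Escapes and nested quotes are deliberately not handled here.
func scanQuotedLiteral(
  _ input: String
) throws -> (delimiter: QuotedDelimiter, contents: String) {
  guard let delimiter = QuotedDelimiter.allCases.first(where: {
    input.hasPrefix($0.opening)
  }) else {
    throw DelimiterScanError.unknownDelimiter
  }
  var contents: [UInt8] = []
  for byte in input.utf8.dropFirst(delimiter.opening.utf8.count) {
    if byte == UInt8(ascii: "'") {
      return (delimiter, String(decoding: contents, as: UTF8.self))
    }
    if isUnprintableASCII(byte) {
      throw DelimiterScanError.unprintableASCII(byte)
    }
    contents.append(byte)
  }
  throw DelimiterScanError.unterminated
}
```

Throwing on the first unprintable byte keeps the sketch short; a real lexer would more likely record a diagnostic and keep scanning so it can still find the closing delimiter.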

Additionally, implement a heuristic that allows skipping over single quotes in re'...' literals if the preceding characters are (?, (?(, \k, \g or (?C. These would not be valid literal endings anyway, and this allows us to support their single-quoted syntax.
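The heuristic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `quoteSkipPrefixes` and `closingQuoteOffset(in:)` are hypothetical names, the input is the literal body after the opening `re'`, and backslash escapes are ignored.

```swift
/// Prefixes after which a `'` is treated as opening a quoted construct
/// rather than ending the literal (the list comes from the PR description).
let quoteSkipPrefixes = ["(?", "(?(", "\\k", "\\g", "(?C"]

/// Hypothetical helper: given the text that follows the opening `re'`,
/// returns the character offset of the quote that closes the literal,
/// or nil if no closing quote is found.
func closingQuoteOffset(in body: String) -> Int? {
  let chars = Array(body)
  var i = 0
  while i < chars.count {
    if chars[i] == "'" {
      let scanned = String(chars[..<i])
      if quoteSkipPrefixes.contains(where: { scanned.hasSuffix($0) }) {
        // This quote opens something like (?'name'...), so skip ahead
        // past the quote that closes the inner construct.
        guard let inner = chars[(i + 1)...].firstIndex(of: "'") else {
          return nil
        }
        i = inner + 1
        continue
      }
      return i
    }
    i += 1
  }
  return nil
}
```

For example, for the body of `re'(?'name'a)b'` the first quote follows `(?`, so it and its partner are skipped, and the final quote is reported as the literal's end.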

This should be able to integrate without any C++ side changes, though I have some extra test cases I want to commit there whenever this gets integrated in.

@hamishknight

@swift-ci please test

@hamishknight hamishknight requested a review from milseman March 2, 2022 15:23
enum Delimiter: Hashable, CaseIterable {
  case traditional
  case experimental
  case reSingleQuote
Member

How about rxSingleQuote for experimental?

@hamishknight hamishknight Mar 2, 2022

Ooh yes, thanks for reminding me. How does this look?

}
}

fileprivate struct DelimiterLexer {
Member

In the end, we might want something that can feed Source (or even replace Source)

Contributor Author

Yeah, it would be nice to unify on an implementation that scans and produces Unicode scalars for both.

To avoid confusion with more general regex lexical analysis.

Introduce a DelimiterLexer type to perform the lexing.

This matches the behavior of the C++ lexer for string literals.

Allow the C++ lexer to form a tok::regex_literal. This avoids generic fallback behavior, and better allows for things like code completion. The test case for this will be in the C++ repo.

If a single quote is encountered with a prefix of either `(?`, `(?(`, `\k`, `\g` or `(?C`, continue to scan ahead to a closing `'`. Such prefixes would not be valid endings for a regex literal anyway, and this lets us handle the single quote variant of their syntax.

For the group name cases, further refine this skipping behavior by only skipping over characters that could possibly appear in that case. This improves diagnostic behavior by ensuring we don't go wandering off into Swift code.
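
The group-name refinement can be illustrated with a small sketch. The exact set of characters the PR allows is not stated here, so `isAssumedGroupNameCharacter` (letters, digits, underscore) is an assumption, and `groupNameClosingQuote(in:from:)` is a hypothetical helper rather than the PR's code.

```swift
/// Assumption: a group name consists of letters, digits, and underscores.
func isAssumedGroupNameCharacter(_ c: Character) -> Bool {
  c == "_" || c.isLetter || c.isNumber
}

/// Hypothetical helper: starting just after a quote that opened a group
/// name, returns the offset of the quote that closes it. Returns nil as
/// soon as a character that cannot be part of a name is seen, so the scan
/// stops early instead of running on into unrelated source text.
func groupNameClosingQuote(in chars: [Character], from start: Int) -> Int? {
  var i = start
  while i < chars.count {
    if chars[i] == "'" { return i }
    guard isAssumedGroupNameCharacter(chars[i]) else { return nil }
    i += 1
  }
  return nil
}
```

With this restriction, input such as `x = 1` followed by more Swift code fails at the first space, so a stray `'` much later in the file is never mistaken for the end of the group name.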
@hamishknight

@swift-ci please test

@hamishknight hamishknight merged commit d191d8e into swiftlang:main Mar 2, 2022
@hamishknight hamishknight deleted the quoted-in-context branch March 2, 2022 17:42

2 participants