From a5f30565241f350982d44d98761fb60f5ca8441b Mon Sep 17 00:00:00 2001
From: Michael Ilseman
Date: Thu, 25 May 2023 06:24:44 -0600
Subject: [PATCH] Update swift main (#674)

* Atomically load the lowered program (#610)

Since we're atomically initializing the compiled program in `Regex.Program`, we need to pair that with an atomic load. Resolves #609.

* Add tests for line start/end word boundary diffs (#616)

The `default` and `simple` word boundaries have different behaviors at the start and end of strings/lines. These tests validate that we have the correct behavior implemented. Related to issue #613.

* Add tweaks for Android

* Fix documentation typo (#615)

* Fix abstract for Regex.dotMatchesNewlines(_:). (#614)

The old version looks like it was accidentally duplicated from anchorsMatchLineEndings(_:) just below it.

* Remove `RegexConsumer` and fix its dependencies (#617)

* Remove `RegexConsumer` and fix its dependencies

This eliminates the RegexConsumer type and rewrites its users to call through to other, existing functionality on Regex or in the Algorithms implementations. RegexConsumer doesn't take account of the dual subranges required for matching, so it can produce results that are inconsistent with matches(of:) and ranges(of:), which were rewritten earlier.

rdar://102841216

* Remove remaining from-end algorithm methods

This removes methods that are left over from when we were considering from-end algorithms. These aren't tested and may not have the correct semantics, so it's safer to remove them entirely.

* Improve StringProcessing and RegexBuilder documentation (#611)

This includes documentation improvements for core types/methods, RegexBuilder types along with their generated variadic initializers, and adds some curation. It also includes tests of the documentation code samples.

* Set availability for inverted character class test (#621)

This feature depends on running with a Swift 5.7 stdlib, and fails when that isn't available.

* Add type annotations in RegexBuilder tests

These changes work around a change to the way result builders are compiled that removes the ability for result builder closure outputs to affect the overload resolution elsewhere in an expression. Workarounds for rdar://104881395 and rdar://104645543

* Workaround for fileprivate array issue

A recent compiler change results in fileprivate arrays sometimes not keeping their buffers around long enough. This change avoids that issue by removing the fileprivate annotations from the affected type.

* Fix an issue where named character classes weren't getting converted in the result builder.

* Stop at end of search string in TwoWaySearcher (#631)

When searching for a substring that doesn't exist, it was possible for TwoWaySearcher to advance beyond the end of the search string, causing a crash. This change adds a `limitedBy:` parameter to that index movement, avoiding the invalid movement. Fixes rdar://105154010
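As a minimal illustration of the bounded index movement described above (standard-library API only; this snippet is not part of the patch):

    let haystack = "abc"

    // Unchecked movement past the end traps at runtime:
    // let bad = haystack.index(haystack.startIndex, offsetBy: 10)

    // Movement limited by `endIndex` returns nil instead, which a searcher
    // can treat as "no match" rather than crashing:
    let clamped = haystack.index(
      haystack.startIndex, offsetBy: 10, limitedBy: haystack.endIndex)
    print(clamped == nil)  // true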
* Correct misspelling in DSL renderer (#627)

vertial -> vertical

rdar://104602317

* Fix output type mismatch with RegexBuilder (#626)

Some regex literals (and presumably other `Regex` instances) lose their output type information when used in a RegexBuilder closure due to the way the concatenating builder calls are overloaded. In particular, any output type with labeled tuples or where the sum of tuple components in the accumulated and new output types is greater than 10 will be ignored. Regex internals don't make this distinction, however, so there ends up being a mismatch between what a `Regex.Match` instance tries to produce and the output type of the outermost regex.

For example, this code results in a crash, because `regex` is a `Regex<Substring>` but the match tries to produce a `(Substring, number: Substring)`:

    let regex = Regex {
      ZeroOrMore(.whitespace)
      /:(?<number>\d+):/
      ZeroOrMore(.whitespace)
    }
    let match = try regex.wholeMatch(in: " :21: ")
    print(match!.output)

To fix this, we add a new `ignoreCapturesInTypedOutput` DSLTree node to mark situations where the output type is discarded. This status is propagated through the capture list into the match's storage, which lets us produce the correct output type. Note that we can't just drop the capture groups when building the compiled program because (1) different parts of the regex might reference the capture group and (2) all capture groups are available if a developer converts the output to `AnyRegexOutput`.

    let anyOutput = AnyRegexOutput(match)
    // anyOutput[1] == "21"
    // anyOutput["number"] == Optional("21")

Fixes #625. rdar://104823356

Note: Linux seems to crash on different tests when the two customTest overloads have `internal` visibility or are called. Switching one of the functions to be generic over a RegexComponent works around the issue.

* Revert "Merge pull request #628 from apple/result_builder_changes_workaround"

This reverts commit 7e059b7bd4e647bc884a71731fcfdc4fffda0700, reversing changes made to 3ca8b134e4c93e8c6d3f7b060b0aefcf06b92039.

* Use `some` syntax in variadics

This supports a type checker fix after the change in how result builder closure parameters are type-checked.

* Type checker workaround: adjust test

* Further refactor to work around type checker regression

* Align availability macro with OS versions (#641)

* Speed up general character class matching (#642)

Short-circuit Character.isASCII checks inside built-in character class matching. Also, make the benchmark try a few more times before giving up.

* Test for \s matching CRLF when scalar matching (#648)

* General ascii fast paths for character classes (#644)

General ASCII fast-paths for builtin character classes

* Remove the unsupported `anyScalar` case (#650)

We decided not to support the `anyScalar` character class, which would match a single Unicode scalar regardless of matching mode. However, its representation was still included in the various character class types in the regex engine, leading to unreachable code and unclear requirements when changing or adding new code.

This change removes that representation where possible. The `DSLTree.Atom.CharacterClass` enum is left unchanged, since it is marked `@_spi(RegexBuilder) public`. Any use of that enum case is handled with a `fatalError("Unsupported")`, and it isn't produced on any code path.

* Fix range-based quantification fast path (#653)

The fast path for quantification incorrectly discards the last save position when the quantification used up all possible trips, which is only possible with range-based quantifications (e.g. `{0,3}`). This bug shows up when a range-based quantifier matches the maximum - 1 repetitions of the preceding pattern. For example, the regex `/a{0,2}a/` should succeed as a full match of any of the strings "aa", "aaa", or "aaaa". However, the pattern fails to match "aaa", since the save point allowing a single "a" to match the first `a{0,2}` part of the regex is discarded.

This change only discards the last save position when advancing the quantifier fails due to a failure to match, not when it maxes out the number of trips.
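As a quick check of the fixed behavior (a sketch against the public Regex API, not part of the patch; the pattern string mirrors the `/a{0,2}a/` literal above):

    let regex = try! Regex("a{0,2}a")

    for subject in ["aa", "aaa", "aaaa"] {
      // Before the fix, "aaa" incorrectly failed to match as a whole;
      // with the fix, all three subjects are whole matches.
      let isWholeMatch = (try! regex.wholeMatch(in: subject)) != nil
      print(subject, isWholeMatch)  // true for all three
    }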
* Add in ASCII fast-path for anyNonNewline (#654)

* Avoid long expression type checks (#657)

These changes remove several seconds of type-checking time from the RegexBuilder test cases, bringing all expressions under 150ms (on the tested computer).

* Processor cleanup (#655)

Clean up and refactor the processor

* Simplify instruction fetching
* Refactor metrics out, and void their storage in release builds
* Put operations onto String

* Fix `firstRange(of:)` search (#656)

Calls to `ranges(of:)` and `firstRange(of:)` with a string parameter actually use two different string searching algorithms. `ranges(of:)` uses the "z-searcher" algorithm, while `firstRange(of:)` uses a two-way search. It's better to align on a single path for these searches; the z-searcher also has lower requirements, and the two-way search implementation has a correctness bug. This change therefore removes the two-way search algorithm and uses z-search for `firstRange(of:)`.

The correctness bug in `firstRange(of:)` appears only when searching for the second (or later) occurrence of a substring, which you have to be fairly deliberate about. In the example below, the substring at offsets `7..<12` is missed:

    let text = "ADACBADADACBADACB"
    //          =====  -----=====
    let pattern = "ADACB"
    let firstRange = text.firstRange(of: pattern)!
    // firstRange ~= 0..<5
    let secondRange = text[firstRange.upperBound...].firstRange(of: pattern)!
    // secondRange ~= 12..<17

This change also removes some unrelated, unused code in Split.swift, in addition to removing an (unused) usage of `TwoWaySearcher`.

rdar://92794248

* Bug fix and hot path for quantified `.` (#658)

Bug fix in newline hot path, and apply hot path to quantified dot

* Run scalar-semantic benchmark variants (#659)

Run scalar semantic benchmarks

* Refactor operations to be on String (#664)

Finish refactoring logic onto String

* Provide unique generic method parameter names (#669)

This is warned on in the 5.9 compiler and will be an error starting in Swift 6.
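A minimal sketch of the diagnostic in question (illustrative names, not code from this repository):

    func outer<T>(_ value: T) {
      // The inner `T` shadows the outer generic parameter. The 5.9 compiler
      // warns about this and Swift 6 rejects it, hence renaming such inner
      // parameters (e.g. to `U`) so they stay unique.
      func inner<T>(_ other: T) -> T { other }
      _ = inner(value)
    }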
* Enable quantification optimizations for scalar semantics (#671) * Quantified scalar semantic matching * Remove redundant test --------- Co-authored-by: Nate Cook Co-authored-by: Butta Co-authored-by: Ole Begemann Co-authored-by: Alex Martini Co-authored-by: Alejandro Alonso Co-authored-by: David Ewing Co-authored-by: Dave Ewing <96321608+DaveEwing@users.noreply.github.com> --- Sources/RegexBenchmark/Benchmark.swift | 92 ++--- Sources/RegexBenchmark/BenchmarkRunner.swift | 143 ++++++- .../Suite/CustomCharacterClasses.swift | 72 ++-- Sources/RegexBenchmark/Utils/Stats.swift | 4 +- .../Utility/TypeConstruction.swift | 12 +- .../Algorithms/Algorithms/FirstRange.swift | 7 +- .../Algorithms/Algorithms/Split.swift | 60 --- .../Algorithms/Searchers/TwoWaySearcher.swift | 197 --------- Sources/_StringProcessing/ByteCodeGen.swift | 19 +- Sources/_StringProcessing/CMakeLists.txt | 1 - .../Engine/Backtracking.swift | 17 +- .../Engine/InstPayload.swift | 49 ++- .../_StringProcessing/Engine/MEBuilder.swift | 20 +- .../_StringProcessing/Engine/MEBuiltins.swift | 81 +++- .../_StringProcessing/Engine/MEQuantify.swift | 70 +++- .../_StringProcessing/Engine/Metrics.swift | 67 +++- .../_StringProcessing/Engine/Processor.swift | 378 ++++++++++-------- .../_StringProcessing/Engine/Tracing.swift | 5 + Sources/_StringProcessing/Executor.swift | 4 +- .../Regex/AnyRegexOutput.swift | 4 +- Sources/_StringProcessing/Unicode/ASCII.swift | 34 +- .../Unicode/WordBreaking.swift | 4 + .../Utility/AsciiBitset.swift | 4 +- .../_StringProcessing/Utility/Protocols.swift | 10 - .../_StringProcessing/Utility/Traced.swift | 2 +- .../_CharacterClassModel.swift | 2 +- Tests/RegexBuilderTests/AlgorithmsTests.swift | 2 +- Tests/RegexBuilderTests/CustomTests.swift | 4 +- Tests/RegexBuilderTests/RegexDSLTests.swift | 10 +- Tests/RegexTests/AlgorithmsTests.swift | 11 +- Tests/RegexTests/MatchTests.swift | 80 ++++ 31 files changed, 823 insertions(+), 642 deletions(-) delete mode 100644 Sources/_StringProcessing/Algorithms/Searchers/TwoWaySearcher.swift diff --git a/Sources/RegexBenchmark/Benchmark.swift b/Sources/RegexBenchmark/Benchmark.swift index 2622639fb..f807a8e55 100644 --- a/Sources/RegexBenchmark/Benchmark.swift +++ b/Sources/RegexBenchmark/Benchmark.swift @@ -71,6 +71,13 @@ struct NSBenchmark: RegexBenchmark { enum NSMatchType { case allMatches case first + + init(_ type: Benchmark.MatchType) { + switch type { + case .whole, .first: self = .first + case .allMatches: self = .allMatches + } + } } func run() { @@ -126,7 +133,7 @@ struct CrossBenchmark { /// The base name of the benchmark var baseName: String - /// The string to compile in differnet engines + /// The string to compile in different engines var regex: String /// The text to search @@ -143,57 +150,32 @@ struct CrossBenchmark { /// Whether or not to do firstMatch as well or just allMatches var includeFirst: Bool = false + /// Whether to also run scalar-semantic mode + var alsoRunScalarSemantic: Bool = true + func register(_ runner: inout BenchmarkRunner) { - let swiftRegex = try! Regex(regex) - let nsRegex: NSRegularExpression if isWhole { - nsRegex = try! NSRegularExpression(pattern: "^" + regex + "$") + runner.registerCrossBenchmark( + nameBase: baseName, + input: input, + pattern: regex, + .whole, + alsoRunScalarSemantic: alsoRunScalarSemantic) } else { - nsRegex = try! 
NSRegularExpression(pattern: regex) - } + runner.registerCrossBenchmark( + nameBase: baseName, + input: input, + pattern: regex, + .allMatches, + alsoRunScalarSemantic: alsoRunScalarSemantic) - if isWhole { - runner.register( - Benchmark( - name: baseName + "Whole", - regex: swiftRegex, - pattern: regex, - type: .whole, - target: input)) - runner.register( - NSBenchmark( - name: baseName + "Whole" + CrossBenchmark.nsSuffix, - regex: nsRegex, - type: .first, - target: input)) - } else { - runner.register( - Benchmark( - name: baseName + "All", - regex: swiftRegex, - pattern: regex, - type: .allMatches, - target: input)) - runner.register( - NSBenchmark( - name: baseName + "All" + CrossBenchmark.nsSuffix, - regex: nsRegex, - type: .allMatches, - target: input)) if includeFirst || runner.includeFirstOverride { - runner.register( - Benchmark( - name: baseName + "First", - regex: swiftRegex, - pattern: regex, - type: .first, - target: input)) - runner.register( - NSBenchmark( - name: baseName + "First" + CrossBenchmark.nsSuffix, - regex: nsRegex, - type: .first, - target: input)) + runner.registerCrossBenchmark( + nameBase: baseName, + input: input, + pattern: regex, + .first, + alsoRunScalarSemantic: alsoRunScalarSemantic) } } } @@ -209,20 +191,16 @@ struct CrossInputListBenchmark { /// The list of strings to search var inputs: [String] + + /// Also run in scalar-semantic mode + var alsoRunScalarSemantic: Bool = true func register(_ runner: inout BenchmarkRunner) { - let swiftRegex = try! Regex(regex) - runner.register(InputListBenchmark( + runner.registerCrossBenchmark( name: baseName, - regex: swiftRegex, + inputList: inputs, pattern: regex, - targets: inputs - )) - runner.register(InputListNSBenchmark( - name: baseName + CrossBenchmark.nsSuffix, - regex: regex, - targets: inputs - )) + alsoRunScalarSemantic: alsoRunScalarSemantic) } } diff --git a/Sources/RegexBenchmark/BenchmarkRunner.swift b/Sources/RegexBenchmark/BenchmarkRunner.swift index 641c03224..b067b9679 100644 --- a/Sources/RegexBenchmark/BenchmarkRunner.swift +++ b/Sources/RegexBenchmark/BenchmarkRunner.swift @@ -4,6 +4,16 @@ import Foundation /// The number of times to re-run the benchmark if results are too varying private var rerunCount: Int { 3 } +extension Benchmark.MatchType { + fileprivate var nameSuffix: String { + switch self { + case .whole: return "_Whole" + case .first: return "_First" + case .allMatches: return "_All" + } + } +} + struct BenchmarkRunner { let suiteName: String var suite: [any RegexBenchmark] = [] @@ -16,12 +26,141 @@ struct BenchmarkRunner { // Forcibly include firstMatch benchmarks for all CrossBenchmarks let includeFirstOverride: Bool + + // Register a cross-benchmark + mutating func registerCrossBenchmark( + nameBase: String, + input: String, + pattern: String, + _ type: Benchmark.MatchType, + alsoRunScalarSemantic: Bool = true + ) { + let swiftRegex = try! Regex(pattern) + let nsRegex: NSRegularExpression + if type == .whole { + nsRegex = try! NSRegularExpression(pattern: "^" + pattern + "$") + } else { + nsRegex = try! 
NSRegularExpression(pattern: pattern) + } + let nameSuffix = type.nameSuffix + + register( + Benchmark( + name: nameBase + nameSuffix, + regex: swiftRegex, + pattern: pattern, + type: type, + target: input)) + register( + NSBenchmark( + name: nameBase + nameSuffix + CrossBenchmark.nsSuffix, + regex: nsRegex, + type: .init(type), + target: input)) + + if alsoRunScalarSemantic { + register( + Benchmark( + name: nameBase + nameSuffix + "_Scalar", + regex: swiftRegex.matchingSemantics(.unicodeScalar), + pattern: pattern, + type: type, + target: input)) + register( + NSBenchmark( + name: nameBase + nameSuffix + "_Scalar" + CrossBenchmark.nsSuffix, + regex: nsRegex, + type: .init(type), + target: input)) + } + } + + // Register a cross-benchmark list + mutating func registerCrossBenchmark( + name: String, + inputList: [String], + pattern: String, + alsoRunScalarSemantic: Bool = true + ) { + let swiftRegex = try! Regex(pattern) + register(InputListBenchmark( + name: name, + regex: swiftRegex, + pattern: pattern, + targets: inputList + )) + register(InputListNSBenchmark( + name: name + CrossBenchmark.nsSuffix, + regex: pattern, + targets: inputList + )) + + if alsoRunScalarSemantic { + register(InputListBenchmark( + name: name + "_Scalar", + regex: swiftRegex.matchingSemantics(.unicodeScalar), + pattern: pattern, + targets: inputList + )) + register(InputListNSBenchmark( + name: name + "_Scalar" + CrossBenchmark.nsSuffix, + regex: pattern, + targets: inputList + )) + } + + } + + // Register a swift-only benchmark + mutating func register( + nameBase: String, + input: String, + pattern: String, + _ swiftRegex: Regex, + _ type: Benchmark.MatchType, + alsoRunScalarSemantic: Bool = true + ) { + let nameSuffix = type.nameSuffix + + register( + Benchmark( + name: nameBase + nameSuffix, + regex: swiftRegex, + pattern: pattern, + type: type, + target: input)) + + if alsoRunScalarSemantic { + register( + Benchmark( + name: nameBase + nameSuffix + "_Scalar", + regex: swiftRegex, + pattern: pattern, + type: type, + target: input)) + } + } - mutating func register(_ benchmark: some RegexBenchmark) { + private mutating func register(_ benchmark: NSBenchmark) { suite.append(benchmark) } - mutating func register(_ benchmark: some SwiftRegexBenchmark) { + private mutating func register(_ benchmark: Benchmark) { + var benchmark = benchmark + if enableTracing { + benchmark.enableTracing() + } + if enableMetrics { + benchmark.enableMetrics() + } + suite.append(benchmark) + } + + private mutating func register(_ benchmark: InputListNSBenchmark) { + suite.append(benchmark) + } + + private mutating func register(_ benchmark: InputListBenchmark) { var benchmark = benchmark if enableTracing { benchmark.enableTracing() diff --git a/Sources/RegexBenchmark/Suite/CustomCharacterClasses.swift b/Sources/RegexBenchmark/Suite/CustomCharacterClasses.swift index 61d7b197f..27b2b07b4 100644 --- a/Sources/RegexBenchmark/Suite/CustomCharacterClasses.swift +++ b/Sources/RegexBenchmark/Suite/CustomCharacterClasses.swift @@ -12,53 +12,55 @@ extension BenchmarkRunner { let input = Inputs.graphemeBreakData - register(Benchmark( - name: "BasicCCC", - regex: try! Regex(basic), + // TODO: Which of these can be cross-benchmarks? + + register( + nameBase: "BasicCCC", + input: input, pattern: basic, - type: .allMatches, - target: input)) + try! Regex(basic), + .allMatches) - register(Benchmark( - name: "BasicRangeCCC", - regex: try! 
Regex(basicRange), + register( + nameBase: "BasicRangeCCC", + input: input, pattern: basicRange, - type: .allMatches, - target: input)) + try! Regex(basicRange), + .allMatches) - register(Benchmark( - name: "CaseInsensitiveCCC", - regex: try! Regex(caseInsensitive), + register( + nameBase: "CaseInsensitiveCCC", + input: input, pattern: caseInsensitive, - type: .allMatches, - target: input)) + try! Regex(caseInsensitive), + .allMatches) - register(Benchmark( - name: "InvertedCCC", - regex: try! Regex(inverted), + register( + nameBase: "InvertedCCC", + input: input, pattern: inverted, - type: .allMatches, - target: input)) + try! Regex(inverted), + .allMatches) - register(Benchmark( - name: "SubtractionCCC", - regex: try! Regex(subtraction), + register( + nameBase: "SubtractionCCC", + input: input, pattern: subtraction, - type: .allMatches, - target: input)) + try! Regex(subtraction), + .allMatches) - register(Benchmark( - name: "IntersectionCCC", - regex: try! Regex(intersection), + register( + nameBase: "IntersectionCCC", + input: input, pattern: intersection, - type: .allMatches, - target: input)) + try! Regex(intersection), + .allMatches) - register(Benchmark( - name: "symDiffCCC", - regex: try! Regex(symmetricDifference), + register( + nameBase: "symDiffCCC", + input: input, pattern: symmetricDifference, - type: .allMatches, - target: input)) + try! Regex(symmetricDifference), + .allMatches) } } diff --git a/Sources/RegexBenchmark/Utils/Stats.swift b/Sources/RegexBenchmark/Utils/Stats.swift index 0cc9156a4..175826a0b 100644 --- a/Sources/RegexBenchmark/Utils/Stats.swift +++ b/Sources/RegexBenchmark/Utils/Stats.swift @@ -3,8 +3,8 @@ import Foundation enum Stats {} extension Stats { - // Maximum allowed standard deviation is 5% of the median runtime - static let maxAllowedStdev = 0.05 + // Maximum allowed standard deviation is 7.5% of the median runtime + static let maxAllowedStdev = 0.075 static func tTest(_ a: Measurement, _ b: Measurement) -> Bool { // Student's t-test diff --git a/Sources/_RegexParser/Utility/TypeConstruction.swift b/Sources/_RegexParser/Utility/TypeConstruction.swift index 4d1765e34..54a7c9263 100644 --- a/Sources/_RegexParser/Utility/TypeConstruction.swift +++ b/Sources/_RegexParser/Utility/TypeConstruction.swift @@ -107,15 +107,15 @@ public enum TypeConstruction { var currentElementAddressUnaligned = UnsafeMutableRawPointer(baseAddress) for element in elements { // Open existential on each element type. - func initializeElement(_ element: T) { + func initializeElement(_ element: U) { currentElementAddressUnaligned = - currentElementAddressUnaligned.roundedUp(toAlignmentOf: T.self) + currentElementAddressUnaligned.roundedUp(toAlignmentOf: U.self) currentElementAddressUnaligned.bindMemory( - to: T.self, capacity: MemoryLayout.size + to: U.self, capacity: MemoryLayout.size ).initialize(to: element) // Advance to the next element (unaligned). 
currentElementAddressUnaligned = - currentElementAddressUnaligned.advanced(by: MemoryLayout.size) + currentElementAddressUnaligned.advanced(by: MemoryLayout.size) } _openExistential(element, do: initializeElement) } @@ -175,8 +175,8 @@ extension MemoryLayout { if byteOffset == 0 { return 0 } var currentOffset = 0 for (index, type) in elementTypes.enumerated() { - func sizeAndAlignMask(_: T.Type) -> (Int, Int) { - (MemoryLayout.size, MemoryLayout.alignment - 1) + func sizeAndAlignMask(_: U.Type) -> (Int, Int) { + (MemoryLayout.size, MemoryLayout.alignment - 1) } // The ABI of an offset-based key path only stores the byte offset, so // this doesn't work if there's a 0-sized element, e.g. `Void`, diff --git a/Sources/_StringProcessing/Algorithms/Algorithms/FirstRange.swift b/Sources/_StringProcessing/Algorithms/Algorithms/FirstRange.swift index 484ff4648..d0fb8673d 100644 --- a/Sources/_StringProcessing/Algorithms/Algorithms/FirstRange.swift +++ b/Sources/_StringProcessing/Algorithms/Algorithms/FirstRange.swift @@ -50,11 +50,8 @@ extension BidirectionalCollection where Element: Comparable { public func firstRange( of other: C ) -> Range? where C.Element == Element { - let searcher = PatternOrEmpty( - searcher: TwoWaySearcher(pattern: Array(other))) - let slice = self[...] - var state = searcher.state(for: slice, in: startIndex..(pattern: Array(other), by: ==) + return searcher.search(self[...], in: startIndex.. Bool, - maxSplits: Int, - omittingEmptySubsequences: Bool - ) -> SplitCollection> { - split(by: PredicateConsumer(predicate: predicate), maxSplits: maxSplits, omittingEmptySubsequences: omittingEmptySubsequences) - } -} - -// MARK: Single element algorithms - -extension Collection where Element: Equatable { - func split( - by separator: Element, - maxSplits: Int, - omittingEmptySubsequences: Bool - ) -> SplitCollection> { - split(whereSeparator: { $0 == separator }, maxSplits: maxSplits, omittingEmptySubsequences: omittingEmptySubsequences) - } -} - // MARK: Fixed pattern algorithms extension Collection where Element: Equatable { @@ -180,41 +155,6 @@ extension Collection where Element: Equatable { } } -extension BidirectionalCollection where Element: Equatable { - // FIXME -// public func splitFromBack( -// separator: S -// ) -> ReversedSplitCollection> -// where S.Element == Element -// { -// splitFromBack(separator: ZSearcher(pattern: Array(separator), by: ==)) -// } -} - -extension BidirectionalCollection where Element: Comparable { - func split( - by separator: C, - maxSplits: Int, - omittingEmptySubsequences: Bool - ) -> SplitCollection>> - where C.Element == Element - { - split( - by: PatternOrEmpty(searcher: TwoWaySearcher(pattern: Array(separator))), - maxSplits: maxSplits, omittingEmptySubsequences: omittingEmptySubsequences) - } - - // FIXME -// public func splitFromBack( -// separator: S -// ) -> ReversedSplitCollection>> -// where S.Element == Element -// { -// splitFromBack(separator: PatternOrEmpty( -// searcher: TwoWaySearcher(pattern: Array(separator)))) -// } -} - // String split overload breakers // // These are underscored and marked as SPI so that the *actual* public overloads diff --git a/Sources/_StringProcessing/Algorithms/Searchers/TwoWaySearcher.swift b/Sources/_StringProcessing/Algorithms/Searchers/TwoWaySearcher.swift deleted file mode 100644 index 2428f89cd..000000000 --- a/Sources/_StringProcessing/Algorithms/Searchers/TwoWaySearcher.swift +++ /dev/null @@ -1,197 +0,0 @@ -//===----------------------------------------------------------------------===// -// 
-// This source file is part of the Swift.org open source project -// -// Copyright (c) 2021-2022 Apple Inc. and the Swift project authors -// Licensed under Apache License v2.0 with Runtime Library Exception -// -// See https://swift.org/LICENSE.txt for license information -// -//===----------------------------------------------------------------------===// - -extension BidirectionalCollection where Element: Equatable { - fileprivate func _ends(with suffix: C) -> Bool - where C.Element == Element - { - FixedPatternConsumer(pattern: suffix).consumingBack(self[...]) != nil - } - } - -struct TwoWaySearcher - where Searched.Element: Comparable -{ - // TODO: Be generic over the pattern? - let pattern: [Searched.Element] - let criticalIndex: Int - let period: Int - let periodIsExact: Bool - - init?(pattern: [Searched.Element]) { - guard !pattern.isEmpty else { return nil } - - let (criticalIndex, periodOfSecondPart) = pattern._criticalFactorization(<) - let periodIsExact = pattern[criticalIndex...] - .prefix(periodOfSecondPart) - ._ends(with: pattern[.. - ) -> State { - // FIXME: Is this 'limitedBy' requirement a sign of error? - let criticalIndex = searched.index( - range.lowerBound, offsetBy: criticalIndex, limitedBy: range.upperBound) - ?? range.upperBound - return State( - end: range.upperBound, - index: range.lowerBound, - criticalIndex: - criticalIndex, - memory: nil) - } - - func search( - _ searched: Searched, - _ state: inout State - ) -> Range? { - while state.criticalIndex != searched.endIndex { - if let end = _searchRight(searched, &state), - let start = _searchLeft(searched, &state, end) - { - state.index = end - // FIXME: Is this 'limitedBy' requirement a sign of error? - state.criticalIndex = searched.index( - end, offsetBy: criticalIndex, limitedBy: searched.endIndex) - ?? searched.endIndex - state.memory = nil - return start.. Searched.Index? { - let rStart: Int - var rIndex: Searched.Index - - if let memory = state.memory, memory.offset > criticalIndex { - rStart = memory.offset - rIndex = memory.index - } else { - rStart = criticalIndex - rIndex = state.criticalIndex - } - - for i in rStart.. Searched.Index? { - let lStart = min(state.memory?.offset ?? 0, criticalIndex) - var lIndex = state.criticalIndex - - for i in (lStart.. Bool - ) -> (index: Int, periodOfSecondPart: Int) { - let less = _maximalSuffix(isOrderedBefore) - let greater = _maximalSuffix({ isOrderedBefore($1, $0) }) - return less.index > greater.index ? less : greater - } - - func _maximalSuffix( - _ isOrderedBefore: (Element, Element) -> Bool - ) -> (index: Int, periodOfSecondPart: Int) { - var left = 0 - var right = 1 - var offset = 0 - var period = 1 - - while right + offset < count { - let a = self[right + offset] - let b = self[left + offset] - - if isOrderedBefore(a, b) { - // Suffix is smaller, period is entire prefix so far. - right += offset + 1 - offset = 0 - period = right - left - } else if isOrderedBefore(b, a) { - // Suffix is larger, start over from current location. - left = right - right += 1 - offset = 0 - period = 1 - } else { - // Advance through repetition of the current period. 
- offset += 1 - if offset + 1 == period { - right += offset + 1 - offset = 0 - } else { - offset += 1 - } - } - } - - return (left, period) - } -} diff --git a/Sources/_StringProcessing/ByteCodeGen.swift b/Sources/_StringProcessing/ByteCodeGen.swift index d4c91bd63..00ce0d5f6 100644 --- a/Sources/_StringProcessing/ByteCodeGen.swift +++ b/Sources/_StringProcessing/ByteCodeGen.swift @@ -23,7 +23,9 @@ extension Compiler { var hasEmittedFirstMatchableAtom = false private let compileOptions: _CompileOptions - fileprivate var optimizationsEnabled: Bool { !compileOptions.contains(.disableOptimizations) } + fileprivate var optimizationsEnabled: Bool { + !compileOptions.contains(.disableOptimizations) + } init( options: MatchingOptions, @@ -665,10 +667,10 @@ fileprivate extension Compiler.ByteCodeGen { _ minTrips: Int, _ extraTrips: Int? ) -> Bool { + let isScalarSemantics = options.semanticLevel == .unicodeScalar guard optimizationsEnabled && minTrips <= QuantifyPayload.maxStorableTrips && extraTrips ?? 0 <= QuantifyPayload.maxStorableTrips - && options.semanticLevel == .graphemeCluster && kind != .reluctant else { return false } @@ -678,7 +680,7 @@ fileprivate extension Compiler.ByteCodeGen { guard let bitset = ccc.asAsciiBitset(options) else { return false } - builder.buildQuantify(bitset: bitset, kind, minTrips, extraTrips) + builder.buildQuantify(bitset: bitset, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics) case .atom(let atom): switch atom { @@ -687,17 +689,17 @@ fileprivate extension Compiler.ByteCodeGen { guard let val = c._singleScalarAsciiValue else { return false } - builder.buildQuantify(asciiChar: val, kind, minTrips, extraTrips) + builder.buildQuantify(asciiChar: val, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics) case .any: builder.buildQuantifyAny( - matchesNewlines: true, kind, minTrips, extraTrips) + matchesNewlines: true, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics) case .anyNonNewline: builder.buildQuantifyAny( - matchesNewlines: false, kind, minTrips, extraTrips) + matchesNewlines: false, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics) case .dot: builder.buildQuantifyAny( - matchesNewlines: options.dotMatchesNewline, kind, minTrips, extraTrips) + matchesNewlines: options.dotMatchesNewline, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics) case .characterClass(let cc): // Custom character class that consumes a single grapheme @@ -706,7 +708,8 @@ fileprivate extension Compiler.ByteCodeGen { model: model, kind, minTrips, - extraTrips) + extraTrips, + isScalarSemantics: isScalarSemantics) default: return false } diff --git a/Sources/_StringProcessing/CMakeLists.txt b/Sources/_StringProcessing/CMakeLists.txt index ef4aeb6ef..f7ccd7ab9 100644 --- a/Sources/_StringProcessing/CMakeLists.txt +++ b/Sources/_StringProcessing/CMakeLists.txt @@ -23,7 +23,6 @@ add_library(_StringProcessing Algorithms/Searchers/NaivePatternSearcher.swift Algorithms/Searchers/PatternOrEmpty.swift Algorithms/Searchers/PredicateSearcher.swift - Algorithms/Searchers/TwoWaySearcher.swift Algorithms/Searchers/ZSearcher.swift Engine/Backtracking.swift Engine/Consume.swift diff --git a/Sources/_StringProcessing/Engine/Backtracking.swift b/Sources/_StringProcessing/Engine/Backtracking.swift index 3ebb060c9..48470ce91 100644 --- a/Sources/_StringProcessing/Engine/Backtracking.swift +++ b/Sources/_StringProcessing/Engine/Backtracking.swift @@ -16,6 +16,11 @@ extension Processor { // Quantifiers may store a range of positions to 
restore to var rangeStart: Position? var rangeEnd: Position? + + // FIXME: refactor, for now this field is only used for quantifier save + // points. We should try to separate out the concerns better. + var isScalarSemantics: Bool + // The end of the call stack, so we can slice it off // when failing inside a call. // @@ -68,7 +73,11 @@ extension Processor { rangeStart = nil rangeEnd = nil } else { - input.formIndex(before: &rangeEnd!) + if isScalarSemantics { + input.unicodeScalars.formIndex(before: &rangeEnd!) + } else { + input.formIndex(before: &rangeEnd!) + } } } } @@ -82,19 +91,23 @@ extension Processor { pos: addressOnly ? nil : currentPosition, rangeStart: nil, rangeEnd: nil, + isScalarSemantics: false, // FIXME: refactor away stackEnd: .init(callStack.count), captureEnds: storedCaptures, intRegisters: registers.ints, posRegisters: registers.positions) } - func startQuantifierSavePoint() -> SavePoint { + func startQuantifierSavePoint( + isScalarSemantics: Bool + ) -> SavePoint { // Restores to the instruction AFTER the current quantifier instruction SavePoint( pc: controller.pc + 1, pos: nil, rangeStart: nil, rangeEnd: nil, + isScalarSemantics: isScalarSemantics, stackEnd: .init(callStack.count), captureEnds: storedCaptures, intRegisters: registers.ints, diff --git a/Sources/_StringProcessing/Engine/InstPayload.swift b/Sources/_StringProcessing/Engine/InstPayload.swift index f6d5bfcc7..a0e849851 100644 --- a/Sources/_StringProcessing/Engine/InstPayload.swift +++ b/Sources/_StringProcessing/Engine/InstPayload.swift @@ -370,6 +370,10 @@ extension Instruction.Payload { } } +// TODO: Consider switching all quantification to a quantification +// instruction, where the general path has an instruction list (i.e. a +// slice of a list) + // MARK: Struct definitions struct QuantifyPayload: RawRepresentable { let rawValue: UInt64 @@ -380,9 +384,12 @@ struct QuantifyPayload: RawRepresentable { case builtin = 4 } + // TODO: figure out how to better organize this... + // Future work: optimize this layout -> payload type should be a fast switch // The top 8 bits are reserved for the opcode so we have 56 bits to work with - // b55-b38 - Unused + // b55-b39 - Unused + // b39-b38 - isScalarSemantics // b38-b35 - Payload type (one of 4 types, stored on 3 bits) // b35-b27 - minTrips (8 bit int) // b27-b18 - extraTrips (8 bit value, one bit for nil) @@ -393,6 +400,7 @@ struct QuantifyPayload: RawRepresentable { static var minTripsShift: UInt64 { 27 } static var typeShift: UInt64 { 35 } static var maxStorableTrips: UInt64 { (1 << 8) - 1 } + static var isScalarSemanticsBit: UInt64 { 1 &<< 38 } var quantKindMask: UInt64 { 3 } var extraTripsMask: UInt64 { 0x1FF } @@ -404,7 +412,8 @@ struct QuantifyPayload: RawRepresentable { _ kind: AST.Quantification.Kind, _ minTrips: Int, _ extraTrips: Int?, - _ type: PayloadType + _ type: PayloadType, + isScalarSemantics: Bool ) -> UInt64 { let kindVal: UInt64 switch kind { @@ -415,11 +424,14 @@ struct QuantifyPayload: RawRepresentable { case .possessive: kindVal = 2 } + // TODO: refactor / reimplement let extraTripsVal: UInt64 = extraTrips == nil ? 1 : UInt64(extraTrips!) << 1 - return (kindVal << QuantifyPayload.quantKindShift) + - (extraTripsVal << QuantifyPayload.extraTripsShift) + - (UInt64(minTrips) << QuantifyPayload.minTripsShift) + - (type.rawValue << QuantifyPayload.typeShift) + let scalarSemanticsBit = isScalarSemantics ? 
Self.isScalarSemanticsBit : 0 + return (kindVal << QuantifyPayload.quantKindShift) | + (extraTripsVal << QuantifyPayload.extraTripsShift) | + (UInt64(minTrips) << QuantifyPayload.minTripsShift) | + (type.rawValue << QuantifyPayload.typeShift) | + scalarSemanticsBit } init(rawValue: UInt64) { @@ -431,46 +443,49 @@ struct QuantifyPayload: RawRepresentable { bitset: AsciiBitsetRegister, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { assert(bitset.bits <= _payloadMask) self.rawValue = bitset.bits - + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .bitset) + + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .bitset, isScalarSemantics: isScalarSemantics) } init( asciiChar: UInt8, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { self.rawValue = UInt64(asciiChar) - + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .asciiChar) + + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .asciiChar, isScalarSemantics: isScalarSemantics) } init( matchesNewlines: Bool, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { self.rawValue = (matchesNewlines ? 1 : 0) - + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .any) + + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .any, isScalarSemantics: isScalarSemantics) } init( model: _CharacterClassModel, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { assert(model.cc.rawValue < 0xFF) - assert(model.matchLevel != .unicodeScalar) let packedModel = model.cc.rawValue + (model.isInverted ? 1 << 9 : 0) + (model.isStrictASCII ? 1 << 10 : 0) self.rawValue = packedModel - + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .builtin) + + QuantifyPayload.packInfoValues(kind, minTrips, extraTrips, .builtin, isScalarSemantics: isScalarSemantics) } var type: PayloadType { @@ -500,6 +515,10 @@ struct QuantifyPayload: RawRepresentable { } } + var isScalarSemantics: Bool { + rawValue & Self.isScalarSemanticsBit != 0 + } + var bitset: AsciiBitsetRegister { TypedInt(self.rawValue & payloadMask) } diff --git a/Sources/_StringProcessing/Engine/MEBuilder.swift b/Sources/_StringProcessing/Engine/MEBuilder.swift index 4b623fbda..93801aeec 100644 --- a/Sources/_StringProcessing/Engine/MEBuilder.swift +++ b/Sources/_StringProcessing/Engine/MEBuilder.swift @@ -222,44 +222,48 @@ extension MEProgram.Builder { bitset: DSLTree.CustomCharacterClass.AsciiBitset, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { instructions.append(.init( .quantify, - .init(quantify: .init(bitset: makeAsciiBitset(bitset), kind, minTrips, extraTrips)))) + .init(quantify: .init(bitset: makeAsciiBitset(bitset), kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics)))) } mutating func buildQuantify( asciiChar: UInt8, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? 
+ _ extraTrips: Int?, + isScalarSemantics: Bool ) { instructions.append(.init( .quantify, - .init(quantify: .init(asciiChar: asciiChar, kind, minTrips, extraTrips)))) + .init(quantify: .init(asciiChar: asciiChar, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics)))) } mutating func buildQuantifyAny( matchesNewlines: Bool, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { instructions.append(.init( .quantify, - .init(quantify: .init(matchesNewlines: matchesNewlines, kind, minTrips, extraTrips)))) + .init(quantify: .init(matchesNewlines: matchesNewlines, kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics)))) } mutating func buildQuantify( model: _CharacterClassModel, _ kind: AST.Quantification.Kind, _ minTrips: Int, - _ extraTrips: Int? + _ extraTrips: Int?, + isScalarSemantics: Bool ) { instructions.append(.init( .quantify, - .init(quantify: .init(model: model,kind, minTrips, extraTrips)))) + .init(quantify: .init(model: model,kind, minTrips, extraTrips, isScalarSemantics: isScalarSemantics)))) } mutating func buildAccept() { diff --git a/Sources/_StringProcessing/Engine/MEBuiltins.swift b/Sources/_StringProcessing/Engine/MEBuiltins.swift index a3d864165..5e446b472 100644 --- a/Sources/_StringProcessing/Engine/MEBuiltins.swift +++ b/Sources/_StringProcessing/Engine/MEBuiltins.swift @@ -15,7 +15,7 @@ extension Processor { isStrictASCII: Bool, isScalarSemantics: Bool ) -> Bool { - guard let next = input._matchBuiltinCC( + guard let next = input.matchBuiltinCC( cc, at: currentPosition, isInverted: isInverted, @@ -30,6 +30,7 @@ extension Processor { } func isAtStartOfLine(_ payload: AssertionPayload) -> Bool { + // TODO: needs benchmark coverage if currentPosition == subjectBounds.lowerBound { return true } switch payload.semanticLevel { case .graphemeCluster: @@ -40,6 +41,7 @@ extension Processor { } func isAtEndOfLine(_ payload: AssertionPayload) -> Bool { + // TODO: needs benchmark coverage if currentPosition == subjectBounds.upperBound { return true } switch payload.semanticLevel { case .graphemeCluster: @@ -50,6 +52,8 @@ extension Processor { } mutating func builtinAssert(by payload: AssertionPayload) throws -> Bool { + // TODO: needs benchmark coverage + // Future work: Optimize layout and dispatch switch payload.kind { case .startOfSubject: return currentPosition == subjectBounds.lowerBound @@ -59,10 +63,10 @@ extension Processor { switch payload.semanticLevel { case .graphemeCluster: return input.index(after: currentPosition) == subjectBounds.upperBound - && input[currentPosition].isNewline + && input[currentPosition].isNewline case .unicodeScalar: return input.unicodeScalars.index(after: currentPosition) == subjectBounds.upperBound - && input.unicodeScalars[currentPosition].isNewline + && input.unicodeScalars[currentPosition].isNewline } case .endOfSubject: return currentPosition == subjectBounds.upperBound @@ -115,12 +119,75 @@ extension Processor { } } -// MARK: Built-in character class matching +// MARK: Matching `.` +extension String { + // TODO: Should the below have a `limitedBy` parameter? + + func matchAnyNonNewline( + at currentPosition: String.Index, + isScalarSemantics: Bool + ) -> String.Index? 
{ + guard currentPosition < endIndex else { + return nil + } + if case .definite(let result) = _quickMatchAnyNonNewline( + at: currentPosition, + isScalarSemantics: isScalarSemantics + ) { + assert(result == _thoroughMatchAnyNonNewline( + at: currentPosition, + isScalarSemantics: isScalarSemantics)) + return result + } + return _thoroughMatchAnyNonNewline( + at: currentPosition, + isScalarSemantics: isScalarSemantics) + } + + @inline(__always) + private func _quickMatchAnyNonNewline( + at currentPosition: String.Index, + isScalarSemantics: Bool + ) -> QuickResult { + assert(currentPosition < endIndex) + guard let (asciiValue, next, isCRLF) = _quickASCIICharacter( + at: currentPosition + ) else { + return .unknown + } + switch asciiValue { + case (._lineFeed)...(._carriageReturn): + return .definite(nil) + default: + assert(!isCRLF) + return .definite(next) + } + } + + @inline(never) + private func _thoroughMatchAnyNonNewline( + at currentPosition: String.Index, + isScalarSemantics: Bool + ) -> String.Index? { + assert(currentPosition < endIndex) + if isScalarSemantics { + let scalar = unicodeScalars[currentPosition] + guard !scalar.isNewline else { return nil } + return unicodeScalars.index(after: currentPosition) + } + + let char = self[currentPosition] + guard !char.isNewline else { return nil } + return index(after: currentPosition) + } +} +// MARK: - Built-in character class matching extension String { + // TODO: Should the below have a `limitedBy` parameter? // Mentioned in ProgrammersManual.md, update docs if redesigned - func _matchBuiltinCC( + func matchBuiltinCC( _ cc: _CharacterClassModel.Representation, at currentPosition: String.Index, isInverted: Bool, @@ -155,7 +222,7 @@ extension String { // Mentioned in ProgrammersManual.md, update docs if redesigned @inline(__always) - func _quickMatchBuiltinCC( + private func _quickMatchBuiltinCC( _ cc: _CharacterClassModel.Representation, at currentPosition: String.Index, isInverted: Bool, @@ -173,7 +240,7 @@ extension String { // Mentioned in ProgrammersManual.md, update docs if redesigned @inline(never) - func _thoroughMatchBuiltinCC( + private func _thoroughMatchBuiltinCC( _ cc: _CharacterClassModel.Representation, at currentPosition: String.Index, isInverted: Bool, diff --git a/Sources/_StringProcessing/Engine/MEQuantify.swift b/Sources/_StringProcessing/Engine/MEQuantify.swift index fa68b8b76..1ff734ccd 100644 --- a/Sources/_StringProcessing/Engine/MEQuantify.swift +++ b/Sources/_StringProcessing/Engine/MEQuantify.swift @@ -1,26 +1,45 @@ extension Processor { func _doQuantifyMatch(_ payload: QuantifyPayload) -> Input.Index? { - var next: Input.Index? + let isScalarSemantics = payload.isScalarSemantics + switch payload.type { case .bitset: - next = _doMatchBitset(registers[payload.bitset]) + return input.matchBitset( + registers[payload.bitset], + at: currentPosition, + limitedBy: end, + isScalarSemantics: isScalarSemantics) case .asciiChar: - next = _doMatchScalar( - UnicodeScalar.init(_value: UInt32(payload.asciiChar)), true) + return input.matchScalar( + UnicodeScalar.init(_value: UInt32(payload.asciiChar)), + at: currentPosition, + limitedBy: end, + boundaryCheck: !isScalarSemantics, + isCaseInsensitive: false) case .builtin: + // FIXME: bounds check? endIndex or end? 
+ // We only emit .quantify if it consumes a single character - next = input._matchBuiltinCC( + return input.matchBuiltinCC( payload.builtin, at: currentPosition, isInverted: payload.builtinIsInverted, isStrictASCII: payload.builtinIsStrict, - isScalarSemantics: false) + isScalarSemantics: isScalarSemantics) case .any: - let matched = currentPosition != input.endIndex - && (!input[currentPosition].isNewline || payload.anyMatchesNewline) - next = matched ? input.index(after: currentPosition) : nil + // FIXME: endIndex or end? + guard currentPosition < input.endIndex else { return nil } + + if payload.anyMatchesNewline { + if isScalarSemantics { + return input.unicodeScalars.index(after: currentPosition) + } + return input.index(after: currentPosition) + } + + return input.matchAnyNonNewline( + at: currentPosition, isScalarSemantics: isScalarSemantics) } - return next } /// Generic quantify instruction interpreter @@ -29,8 +48,10 @@ extension Processor { mutating func runQuantify(_ payload: QuantifyPayload) -> Bool { var trips = 0 var extraTrips = payload.extraTrips - var savePoint = startQuantifierSavePoint() - + var savePoint = startQuantifierSavePoint( + isScalarSemantics: payload.isScalarSemantics + ) + while true { if trips >= payload.minTrips { if extraTrips == 0 { break } @@ -40,7 +61,14 @@ extension Processor { } } let next = _doQuantifyMatch(payload) - guard let idx = next else { break } + guard let idx = next else { + if !savePoint.rangeIsEmpty { + // The last save point has saved the current, non-matching position, + // so it's unneeded. + savePoint.shrinkRange(input) + } + break + } currentPosition = idx trips += 1 } @@ -50,12 +78,8 @@ extension Processor { return false } - if payload.quantKind == .eager && !savePoint.rangeIsEmpty { - // The last save point has saved the current position, so it's unneeded - savePoint.shrinkRange(input) - if !savePoint.rangeIsEmpty { - savePoints.append(savePoint) - } + if !savePoint.rangeIsEmpty { + savePoints.append(savePoint) } return true } @@ -65,7 +89,9 @@ extension Processor { assert(payload.quantKind == .eager && payload.minTrips == 0 && payload.extraTrips == nil) - var savePoint = startQuantifierSavePoint() + var savePoint = startQuantifierSavePoint( + isScalarSemantics: payload.isScalarSemantics + ) while true { savePoint.updateRange(newEnd: currentPosition) @@ -87,7 +113,9 @@ extension Processor { assert(payload.quantKind == .eager && payload.minTrips == 1 && payload.extraTrips == nil) - var savePoint = startQuantifierSavePoint() + var savePoint = startQuantifierSavePoint( + isScalarSemantics: payload.isScalarSemantics + ) while true { let next = _doQuantifyMatch(payload) guard let idx = next else { break } diff --git a/Sources/_StringProcessing/Engine/Metrics.swift b/Sources/_StringProcessing/Engine/Metrics.swift index 753c3c3d1..372a7e1b4 100644 --- a/Sources/_StringProcessing/Engine/Metrics.swift +++ b/Sources/_StringProcessing/Engine/Metrics.swift @@ -1,13 +1,71 @@ extension Processor { +#if PROCESSOR_MEASUREMENTS_ENABLED struct ProcessorMetrics { var instructionCounts: [Instruction.OpCode: Int] = [:] var backtracks: Int = 0 var resets: Int = 0 + var cycleCount: Int = 0 + + var isTracingEnabled: Bool = false + var shouldMeasureMetrics: Bool = false + + init(isTracingEnabled: Bool, shouldMeasureMetrics: Bool) { + self.isTracingEnabled = isTracingEnabled + self.shouldMeasureMetrics = shouldMeasureMetrics + } } - +#else + struct ProcessorMetrics { + var isTracingEnabled: Bool { false } + var shouldMeasureMetrics: Bool { false } + var 
cycleCount: Int { 0 } + + init(isTracingEnabled: Bool, shouldMeasureMetrics: Bool) { } + } +#endif +} + +extension Processor { + + mutating func startCycleMetrics() { +#if PROCESSOR_MEASUREMENTS_ENABLED + if metrics.cycleCount == 0 { + trace() + measureMetrics() + } +#endif + } + + mutating func endCycleMetrics() { +#if PROCESSOR_MEASUREMENTS_ENABLED + metrics.cycleCount += 1 + trace() + measureMetrics() + _checkInvariants() +#endif + } +} + +extension Processor.ProcessorMetrics { + + mutating func addReset() { +#if PROCESSOR_MEASUREMENTS_ENABLED + self.resets += 1 +#endif + } + + mutating func addBacktrack() { +#if PROCESSOR_MEASUREMENTS_ENABLED + self.backtracks += 1 +#endif + } +} + +extension Processor { +#if PROCESSOR_MEASUREMENTS_ENABLED func printMetrics() { print("===") - print("Total cycle count: \(cycleCount)") + print("Total cycle count: \(metrics.cycleCount)") print("Backtracks: \(metrics.backtracks)") print("Resets: \(metrics.resets)") print("Instructions:") @@ -21,7 +79,7 @@ extension Processor { } mutating func measure() { - let (opcode, _) = fetch().destructure + let (opcode, _) = fetch() if metrics.instructionCounts.keys.contains(opcode) { metrics.instructionCounts[opcode]! += 1 } else { @@ -30,8 +88,9 @@ extension Processor { } mutating func measureMetrics() { - if shouldMeasureMetrics { + if metrics.shouldMeasureMetrics { measure() } } +#endif } diff --git a/Sources/_StringProcessing/Engine/Processor.swift b/Sources/_StringProcessing/Engine/Processor.swift index 25cad5c8c..0350a37db 100644 --- a/Sources/_StringProcessing/Engine/Processor.swift +++ b/Sources/_StringProcessing/Engine/Processor.swift @@ -35,7 +35,7 @@ struct Processor { /// of the search. `input` can be a "supersequence" of the subject, while /// `input[subjectBounds]` is the logical entity that is being searched. let input: Input - + /// The bounds of the logical subject in `input`. /// /// `subjectBounds` represents the bounds of the string or substring that a @@ -46,7 +46,7 @@ struct Processor { /// `subjectBounds` is always equal to or a subrange of /// `input.startIndex.. - + /// The bounds within the subject for an individual search. /// /// `searchBounds` is equal to `subjectBounds` in some cases, but can be a @@ -62,7 +62,7 @@ struct Processor { let instructions: InstructionList // MARK: Resettable state - + /// The current search position while processing. /// /// `currentPosition` must always be in the range `subjectBounds` or equal @@ -81,16 +81,12 @@ struct Processor { var wordIndexCache: Set? = nil var wordIndexMaxIndex: String.Index? = nil - + var state: State = .inProgress var failureReason: Error? = nil - // MARK: Metrics, debugging, etc. 
- var cycleCount = 0 - var isTracingEnabled: Bool - let shouldMeasureMetrics: Bool - var metrics: ProcessorMetrics = ProcessorMetrics() + var metrics: ProcessorMetrics } extension Processor { @@ -116,14 +112,17 @@ extension Processor { self.subjectBounds = subjectBounds self.searchBounds = searchBounds self.matchMode = matchMode - self.isTracingEnabled = isTracingEnabled - self.shouldMeasureMetrics = shouldMeasureMetrics + + self.metrics = ProcessorMetrics( + isTracingEnabled: isTracingEnabled, + shouldMeasureMetrics: shouldMeasureMetrics) + self.currentPosition = searchBounds.lowerBound // Initialize registers with end of search bounds self.registers = Registers(program, searchBounds.upperBound) self.storedCaptures = Array( - repeating: .init(), count: program.registerInfo.captures) + repeating: .init(), count: program.registerInfo.captures) _checkInvariants() } @@ -144,8 +143,8 @@ extension Processor { self.state = .inProgress self.failureReason = nil - - if shouldMeasureMetrics { metrics.resets += 1 } + + metrics.addReset() _checkInvariants() } @@ -160,23 +159,22 @@ extension Processor { } extension Processor { + func fetch() -> (Instruction.OpCode, Instruction.Payload) { + instructions[controller.pc].destructure + } + var slice: Input.SubSequence { // TODO: Should we whole-scale switch to slices, or // does that depend on options for some anchors? input[searchBounds] } - // Advance in our input, without any checks or failure signalling - mutating func _uncheckedForcedConsumeOne() { - assert(currentPosition != end) - input.formIndex(after: ¤tPosition) - } - // Advance in our input // // Returns whether the advance succeeded. On failure, our // save point was restored mutating func consume(_ n: Distance) -> Bool { + // TODO: needs benchmark coverage guard let idx = input.index( currentPosition, offsetBy: n.rawValue, limitedBy: end ) else { @@ -186,9 +184,10 @@ extension Processor { currentPosition = idx return true } - + // Advances in unicode scalar view mutating func consumeScalar(_ n: Distance) -> Bool { + // TODO: needs benchmark coverage guard let idx = input.unicodeScalars.index( currentPosition, offsetBy: n.rawValue, limitedBy: end ) else { @@ -220,29 +219,29 @@ extension Processor { func load() -> Element? { currentPosition < end ? input[currentPosition] : nil } - func load(count: Int) -> Input.SubSequence? { - let slice = self.slice[currentPosition...].prefix(count) - guard slice.count == count else { return nil } - return slice - } + + // MARK: Match functions + // + // TODO: refactor these such that `cycle()` calls the corresponding String + // method directly, and all the step, signalFailure, and + // currentPosition logic is collected into a single place inside + // cycle(). // Match against the current input element. Returns whether // it succeeded vs signaling an error. - mutating func match(_ e: Element) -> Bool { - guard let cur = load(), cur == e else { - signalFailure() - return false - } - _uncheckedForcedConsumeOne() - return true - } - - mutating func matchCaseInsensitive(_ e: Element) -> Bool { - guard let cur = load(), cur.lowercased() == e.lowercased() else { + mutating func match( + _ e: Element, isCaseInsensitive: Bool + ) -> Bool { + guard let next = input.match( + e, + at: currentPosition, + limitedBy: end, + isCaseInsensitive: isCaseInsensitive + ) else { signalFailure() return false } - _uncheckedForcedConsumeOne() + currentPosition = next return true } @@ -250,81 +249,54 @@ extension Processor { // it succeeded vs signaling an error. 
mutating func matchSeq( _ seq: Substring, - isScalarMode: Bool + isScalarSemantics: Bool ) -> Bool { - if isScalarMode { - for s in seq.unicodeScalars { - guard matchScalar(s, boundaryCheck: false) else { return false } - } - return true - } - - for e in seq { - guard match(e) else { return false } - } - return true - } - - func loadScalar() -> Unicode.Scalar? { - currentPosition < end ? input.unicodeScalars[currentPosition] : nil - } - - func _doMatchScalar(_ s: Unicode.Scalar, _ boundaryCheck: Bool) -> Input.Index? { - if s == loadScalar(), - let idx = input.unicodeScalars.index( - currentPosition, - offsetBy: 1, - limitedBy: end), - (!boundaryCheck || input.isOnGraphemeClusterBoundary(idx)) { - return idx - } else { - return nil - } - } - - mutating func matchScalar(_ s: Unicode.Scalar, boundaryCheck: Bool) -> Bool { - guard let next = _doMatchScalar(s, boundaryCheck) else { + guard let next = input.matchSeq( + seq, + at: currentPosition, + limitedBy: end, + isScalarSemantics: isScalarSemantics + ) else { signalFailure() return false } + currentPosition = next return true } - mutating func matchScalarCaseInsensitive( + mutating func matchScalar( _ s: Unicode.Scalar, - boundaryCheck: Bool + boundaryCheck: Bool, + isCaseInsensitive: Bool ) -> Bool { - guard let curScalar = loadScalar(), - s.properties.lowercaseMapping == curScalar.properties.lowercaseMapping, - let idx = input.unicodeScalars.index( - currentPosition, - offsetBy: 1, - limitedBy: end), - (!boundaryCheck || input.isOnGraphemeClusterBoundary(idx)) - else { + guard let next = input.matchScalar( + s, + at: currentPosition, + limitedBy: end, + boundaryCheck: boundaryCheck, + isCaseInsensitive: isCaseInsensitive + ) else { signalFailure() return false } - currentPosition = idx + currentPosition = next return true } - func _doMatchBitset(_ bitset: DSLTree.CustomCharacterClass.AsciiBitset) -> Input.Index? 
{ - if let cur = load(), bitset.matches(char: cur) { - return input.index(after: currentPosition) - } else { - return nil - } - } - // If we have a bitset we know that the CharacterClass only matches against // ascii characters, so check if the current input element is ascii then // check if it is set in the bitset mutating func matchBitset( - _ bitset: DSLTree.CustomCharacterClass.AsciiBitset + _ bitset: DSLTree.CustomCharacterClass.AsciiBitset, + isScalarSemantics: Bool ) -> Bool { - guard let next = _doMatchBitset(bitset) else { + guard let next = input.matchBitset( + bitset, + at: currentPosition, + limitedBy: end, + isScalarSemantics: isScalarSemantics + ) else { signalFailure() return false } @@ -332,37 +304,18 @@ extension Processor { return true } - // Equivalent of matchBitset but emitted when in unicode scalar semantic mode - mutating func matchBitsetScalar( - _ bitset: DSLTree.CustomCharacterClass.AsciiBitset + // Matches the next character/scalar if it is not a newline + mutating func matchAnyNonNewline( + isScalarSemantics: Bool ) -> Bool { - guard let curScalar = loadScalar(), - bitset.matches(scalar: curScalar), - let idx = input.unicodeScalars.index(currentPosition, offsetBy: 1, limitedBy: end) else { - signalFailure() - return false - } - currentPosition = idx - return true - } - - // Matches the next character if it is not a newline - mutating func matchAnyNonNewline() -> Bool { - guard let c = load(), !c.isNewline else { - signalFailure() - return false - } - _uncheckedForcedConsumeOne() - return true - } - - // Matches the next scalar if it is not a newline - mutating func matchAnyNonNewlineScalar() -> Bool { - guard let s = loadScalar(), !s.isNewline else { + guard let next = input.matchAnyNonNewline( + at: currentPosition, + isScalarSemantics: isScalarSemantics + ) else { signalFailure() return false } - input.unicodeScalars.formIndex(after: ¤tPosition) + currentPosition = next return true } @@ -401,8 +354,8 @@ extension Processor { storedCaptures = capEnds registers.ints = intRegisters registers.positions = posRegisters - - if shouldMeasureMetrics { metrics.backtracks += 1 } + + metrics.addBacktrack() } mutating func abort(_ e: Error? = nil) { @@ -436,25 +389,15 @@ extension Processor { // TODO: What should we do here? 
fatalError("Invalid code: Tried to clear save points when empty") } - + mutating func cycle() { _checkInvariants() assert(state == .inProgress) -#if PROCESSOR_MEASUREMENTS_ENABLED - if cycleCount == 0 { - trace() - measureMetrics() - } - defer { - cycleCount += 1 - trace() - measureMetrics() - _checkInvariants() - } -#endif + startCycleMetrics() + defer { endCycleMetrics() } - let (opcode, payload) = fetch().destructure + let (opcode, payload) = fetch() switch opcode { case .invalid: fatalError("Invalid program") @@ -535,50 +478,30 @@ extension Processor { } } case .matchAnyNonNewline: - if payload.isScalar { - if matchAnyNonNewlineScalar() { - controller.step() - } - } else { - if matchAnyNonNewline() { - controller.step() - } + if matchAnyNonNewline(isScalarSemantics: payload.isScalar) { + controller.step() } case .match: let (isCaseInsensitive, reg) = payload.elementPayload - if isCaseInsensitive { - if matchCaseInsensitive(registers[reg]) { - controller.step() - } - } else { - if match(registers[reg]) { - controller.step() - } + if match(registers[reg], isCaseInsensitive: isCaseInsensitive) { + controller.step() } case .matchScalar: let (scalar, caseInsensitive, boundaryCheck) = payload.scalarPayload - if caseInsensitive { - if matchScalarCaseInsensitive(scalar, boundaryCheck: boundaryCheck) { - controller.step() - } - } else { - if matchScalar(scalar, boundaryCheck: boundaryCheck) { - controller.step() - } + if matchScalar( + scalar, + boundaryCheck: boundaryCheck, + isCaseInsensitive: caseInsensitive + ) { + controller.step() } case .matchBitset: let (isScalar, reg) = payload.bitsetPayload let bitset = registers[reg] - if isScalar { - if matchBitsetScalar(bitset) { - controller.step() - } - } else { - if matchBitset(bitset) { - controller.step() - } + if matchBitset(bitset, isScalarSemantics: isScalar) { + controller.step() } case .matchBuiltin: @@ -669,7 +592,7 @@ extension Processor { signalFailure() return } - if matchSeq(input[range], isScalarMode: isScalarMode) { + if matchSeq(input[range], isScalarSemantics: isScalarMode) { controller.step() } @@ -714,3 +637,122 @@ extension Processor { } } } + +// MARK: String matchers +// +// TODO: Refactor into separate file, formalize patterns + +extension String { + + func match( + _ char: Character, + at pos: Index, + limitedBy end: String.Index, + isCaseInsensitive: Bool + ) -> Index? { + // TODO: This can be greatly sped up with string internals + // TODO: This is also very much quick-check-able + assert(end <= endIndex) + + guard pos < end else { return nil } + + if isCaseInsensitive { + guard self[pos].lowercased() == char.lowercased() else { return nil } + } else { + guard self[pos] == char else { return nil } + } + + let idx = index(after: pos) + guard idx <= end else { return nil } + + return idx + } + + func matchSeq( + _ seq: Substring, + at pos: Index, + limitedBy end: Index, + isScalarSemantics: Bool + ) -> Index? 
{ + // TODO: This can be greatly sped up with string internals + // TODO: This is also very much quick-check-able + assert(end <= endIndex) + + var cur = pos + + if isScalarSemantics { + for e in seq.unicodeScalars { + guard cur < end, unicodeScalars[cur] == e else { return nil } + self.unicodeScalars.formIndex(after: &cur) + } + } else { + for e in seq { + guard cur < end, self[cur] == e else { return nil } + self.formIndex(after: &cur) + } + } + + guard cur <= end else { return nil } + return cur + } + + func matchScalar( + _ scalar: Unicode.Scalar, + at pos: Index, + limitedBy end: String.Index, + boundaryCheck: Bool, + isCaseInsensitive: Bool + ) -> Index? { + // TODO: extremely quick-check-able + // TODO: can be sped up with string internals + assert(end <= endIndex) + + guard pos < end else { return nil } + let curScalar = unicodeScalars[pos] + + if isCaseInsensitive { + guard curScalar.properties.lowercaseMapping == scalar.properties.lowercaseMapping + else { + return nil + } + } else { + guard curScalar == scalar else { return nil } + } + + let idx = unicodeScalars.index(after: pos) + guard idx <= end else { return nil } + + if boundaryCheck && !isOnGraphemeClusterBoundary(idx) { + return nil + } + + return idx + } + + func matchBitset( + _ bitset: DSLTree.CustomCharacterClass.AsciiBitset, + at pos: Index, + limitedBy end: Index, + isScalarSemantics: Bool + ) -> Index? { + // TODO: extremely quick-check-able + // TODO: can be sped up with string internals + assert(end <= endIndex) + + guard pos < end else { return nil } + + let idx: String.Index + if isScalarSemantics { + guard bitset.matches(unicodeScalars[pos]) else { return nil } + idx = unicodeScalars.index(after: pos) + } else { + guard bitset.matches(self[pos]) else { return nil } + idx = index(after: pos) + } + + guard idx <= end else { return nil } + return idx + } + + +} diff --git a/Sources/_StringProcessing/Engine/Tracing.swift b/Sources/_StringProcessing/Engine/Tracing.swift index 725319b00..b0ce67555 100644 --- a/Sources/_StringProcessing/Engine/Tracing.swift +++ b/Sources/_StringProcessing/Engine/Tracing.swift @@ -9,7 +9,12 @@ // //===----------------------------------------------------------------------===// + +// TODO: Remove this protocol (and/or reuse it for something like a FastProcessor) extension Processor: TracedProcessor { + var cycleCount: Int { metrics.cycleCount } + var isTracingEnabled: Bool { metrics.isTracingEnabled } + var isFailState: Bool { state == .fail } var isAcceptState: Bool { state == .accept } diff --git a/Sources/_StringProcessing/Executor.swift b/Sources/_StringProcessing/Executor.swift index 253858d1f..0453fcd80 100644 --- a/Sources/_StringProcessing/Executor.swift +++ b/Sources/_StringProcessing/Executor.swift @@ -31,7 +31,7 @@ struct Executor { subjectBounds: subjectBounds, searchBounds: searchBounds) #if PROCESSOR_MEASUREMENTS_ENABLED - defer { if cpu.shouldMeasureMetrics { cpu.printMetrics() } } + defer { if cpu.metrics.shouldMeasureMetrics { cpu.printMetrics() } } #endif var low = searchBounds.lowerBound let high = searchBounds.upperBound @@ -60,7 +60,7 @@ struct Executor { var cpu = engine.makeProcessor( input: input, bounds: subjectBounds, matchMode: mode) #if PROCESSOR_MEASUREMENTS_ENABLED - defer { if cpu.shouldMeasureMetrics { cpu.printMetrics() } } + defer { if cpu.metrics.shouldMeasureMetrics { cpu.printMetrics() } } #endif return try _match(input, from: subjectBounds.lowerBound, using: &cpu) } diff --git a/Sources/_StringProcessing/Regex/AnyRegexOutput.swift 
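The String matchers added above (match, matchSeq, matchScalar, matchBitset) all follow one shape: validate the element at `pos` without running past `end`, then hand back the index just past the consumed input, or nil so the caller can signalFailure(). A minimal standalone sketch of that shape, using only public String API (matchScalarSketch and its exact checks are illustrative, not the engine's helpers):

// Illustrative sketch only: mirrors the "return the next index or nil"
// style of the matchers above, using public API instead of engine internals.
func matchScalarSketch(
  _ scalar: Unicode.Scalar,
  in input: String,
  at pos: String.Index,
  limitedBy end: String.Index,
  boundaryCheck: Bool
) -> String.Index? {
  guard pos < end, input.unicodeScalars[pos] == scalar else { return nil }
  let next = input.unicodeScalars.index(after: pos)
  guard next <= end else { return nil }
  if boundaryCheck {
    // Grapheme-cluster semantics: only succeed when `next` lands on a
    // Character boundary (endIndex always qualifies).
    let onBoundary = next == input.endIndex || input.indices.contains(next)
    guard onBoundary else { return nil }
  }
  return next
}

// "e\u{301}" is a single Character; with boundaryCheck the match is
// rejected, without it (scalar semantics) the "e" scalar alone matches.
let s = "e\u{301}"
_ = matchScalarSketch("e", in: s, at: s.startIndex, limitedBy: s.endIndex, boundaryCheck: true)   // nil
_ = matchScalarSketch("e", in: s, at: s.startIndex, limitedBy: s.endIndex, boundaryCheck: false)  // index of U+0301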
b/Sources/_StringProcessing/Regex/AnyRegexOutput.swift index 243c1ba01..57bf06dae 100644 --- a/Sources/_StringProcessing/Regex/AnyRegexOutput.swift +++ b/Sources/_StringProcessing/Regex/AnyRegexOutput.swift @@ -283,7 +283,7 @@ extension Regex where Output == AnyRegexOutput { /// /// - Parameter regex: A regular expression to convert to use a dynamic /// capture list. - public init(_ regex: Regex) { + public init(_ regex: Regex) { self.init(node: regex.root) } } @@ -299,7 +299,7 @@ extension Regex.Match where Output == AnyRegexOutput { /// /// - Parameter match: A regular expression match to convert to a match with /// type-erased captures. - public init(_ match: Regex.Match) { + public init(_ match: Regex.Match) { self.init( anyRegexOutput: match.anyRegexOutput, range: match.range diff --git a/Sources/_StringProcessing/Unicode/ASCII.swift b/Sources/_StringProcessing/Unicode/ASCII.swift index 5150e18cc..de13e340a 100644 --- a/Sources/_StringProcessing/Unicode/ASCII.swift +++ b/Sources/_StringProcessing/Unicode/ASCII.swift @@ -9,26 +9,25 @@ // //===----------------------------------------------------------------------===// -private var _lineFeed: UInt8 { 0x0A } -private var _carriageReturn: UInt8 { 0x0D } -private var _lineTab: UInt8 { 0x0B } -private var _formFeed: UInt8 { 0x0C } -private var _space: UInt8 { 0x20 } -private var _tab: UInt8 { 0x09 } +extension UInt8 { + static var _lineFeed: UInt8 { 0x0A } + static var _carriageReturn: UInt8 { 0x0D } + static var _lineTab: UInt8 { 0x0B } + static var _formFeed: UInt8 { 0x0C } + static var _space: UInt8 { 0x20 } + static var _tab: UInt8 { 0x09 } + + static var _underscore: UInt8 { 0x5F } +} private var _0: UInt8 { 0x30 } private var _9: UInt8 { 0x39 } -private func _isASCIINumber(_ x: UInt8) -> Bool { - return (_0..._9).contains(x) -} private var _a: UInt8 { 0x61 } private var _z: UInt8 { 0x7A } private var _A: UInt8 { 0x41 } private var _Z: UInt8 { 0x5A } -private var _underscore: UInt8 { 0x5F } - extension UInt8 { var _isASCII: Bool { self < 0x80 } @@ -43,14 +42,14 @@ extension UInt8 { /// Assuming we're ASCII, whether we match `\h` var _asciiIsHorizontalWhitespace: Bool { assert(_isASCII) - return self == _space || self == _tab + return self == ._space || self == ._tab } /// Assuming we're ASCII, whether we match `\v` var _asciiIsVerticalWhitespace: Bool { assert(_isASCII) switch self { - case _lineFeed, _carriageReturn, _lineTab, _formFeed: + case ._lineFeed, ._carriageReturn, ._lineTab, ._formFeed: return true default: return false @@ -61,7 +60,7 @@ extension UInt8 { var _asciiIsWhitespace: Bool { assert(_isASCII) switch self { - case _space, _tab, _lineFeed, _lineTab, _formFeed, _carriageReturn: + case ._space, ._tab, ._lineFeed, ._lineTab, ._formFeed, ._carriageReturn: return true default: return false @@ -77,11 +76,13 @@ extension UInt8 { /// Assuming we're ASCII, whether we match `\w` var _asciiIsWord: Bool { assert(_isASCII) - return _asciiIsDigit || _asciiIsLetter || self == _underscore + return _asciiIsDigit || _asciiIsLetter || self == ._underscore } } extension String { + /// TODO: better to take isScalarSemantics parameter, we can return more results + /// and we can give the right `next` index, not requiring the caller to re-adjust it /// TODO: detailed description of nuanced semantics func _quickASCIICharacter( at idx: Index @@ -107,7 +108,7 @@ extension String { guard tail._isSub300StartingByte else { return nil } // Handle CR-LF: - if base == _carriageReturn && tail == _lineFeed { + if base == ._carriageReturn && tail == 
._lineFeed { utf8.formIndex(after: &next) guard next == endIndex || utf8[next]._isSub300StartingByte else { return nil @@ -165,5 +166,6 @@ extension String { return (next, asciiValue._asciiIsWord) } } + } diff --git a/Sources/_StringProcessing/Unicode/WordBreaking.swift b/Sources/_StringProcessing/Unicode/WordBreaking.swift index 50da079f6..f1f9573c1 100644 --- a/Sources/_StringProcessing/Unicode/WordBreaking.swift +++ b/Sources/_StringProcessing/Unicode/WordBreaking.swift @@ -12,6 +12,7 @@ @_spi(_Unicode) import Swift +// TODO: Sink onto String extension Processor { func atSimpleBoundary( _ usesAsciiWord: Bool, @@ -20,9 +21,11 @@ extension Processor { func matchesWord(at i: Input.Index) -> Bool { switch semanticLevel { case .graphemeCluster: + // TODO: needs benchmark coverage let c = input[i] return c.isWordCharacter && (c.isASCII || !usesAsciiWord) case .unicodeScalar: + // TODO: needs benchmark coverage let c = input.unicodeScalars[i] return (c.properties.isAlphabetic || c == "_") && (c.isASCII || !usesAsciiWord) } @@ -51,6 +54,7 @@ extension String { using cache: inout Set?, _ maxIndex: inout String.Index? ) -> Bool { + // TODO: needs benchmark coverage guard i != startIndex, i != endIndex else { return true } diff --git a/Sources/_StringProcessing/Utility/AsciiBitset.swift b/Sources/_StringProcessing/Utility/AsciiBitset.swift index ad3159820..e063447a0 100644 --- a/Sources/_StringProcessing/Utility/AsciiBitset.swift +++ b/Sources/_StringProcessing/Utility/AsciiBitset.swift @@ -57,7 +57,7 @@ extension DSLTree.CustomCharacterClass { } } - internal func matches(char: Character) -> Bool { + internal func matches(_ char: Character) -> Bool { let matched: Bool if let val = char._singleScalarAsciiValue { matched = matches(val) @@ -71,7 +71,7 @@ extension DSLTree.CustomCharacterClass { return matched } - internal func matches(scalar: Unicode.Scalar) -> Bool { + internal func matches(_ scalar: Unicode.Scalar) -> Bool { let matched: Bool if scalar.isASCII { let val = UInt8(ascii: scalar) diff --git a/Sources/_StringProcessing/Utility/Protocols.swift b/Sources/_StringProcessing/Utility/Protocols.swift index 7542a17dd..24ffbcf70 100644 --- a/Sources/_StringProcessing/Utility/Protocols.swift +++ b/Sources/_StringProcessing/Utility/Protocols.swift @@ -44,13 +44,3 @@ protocol ProcessorProtocol { var registers: Registers { get } } -extension ProcessorProtocol { - func fetch() -> Instruction { - instructions[currentPC] - } - - var callStack: Array { [] } -// var savePoints: Array { [] } - var registers: Array { [] } - -} diff --git a/Sources/_StringProcessing/Utility/Traced.swift b/Sources/_StringProcessing/Utility/Traced.swift index 112a601b1..198564fe1 100644 --- a/Sources/_StringProcessing/Utility/Traced.swift +++ b/Sources/_StringProcessing/Utility/Traced.swift @@ -13,7 +13,7 @@ // TODO: Place shared formatting and trace infrastructure here protocol Traced { - var isTracingEnabled: Bool { get set } + var isTracingEnabled: Bool { get } } protocol TracedProcessor: ProcessorProtocol, Traced { diff --git a/Sources/_StringProcessing/_CharacterClassModel.swift b/Sources/_StringProcessing/_CharacterClassModel.swift index cdee66ddb..c053b31e4 100644 --- a/Sources/_StringProcessing/_CharacterClassModel.swift +++ b/Sources/_StringProcessing/_CharacterClassModel.swift @@ -79,7 +79,7 @@ struct _CharacterClassModel: Hashable { let isScalarSemantics = matchLevel == .unicodeScalar - return input._matchBuiltinCC( + return input.matchBuiltinCC( cc, at: currentPosition, isInverted: isInverted, diff --git 
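Both semantic modes of matchBitset lean on the same AsciiBitset idea: a custom character class whose members are all ASCII can be lowered to 128 bits, so a membership test is a shift and a mask. A hypothetical, self-contained sketch of that layout (ASCIIBitsetSketch is illustrative and much simpler than the real DSLTree.CustomCharacterClass.AsciiBitset):

struct ASCIIBitsetSketch {
  var low: UInt64 = 0   // membership bits for bytes 0x00...0x3F
  var high: UInt64 = 0  // membership bits for bytes 0x40...0x7F

  mutating func insert(_ byte: UInt8) {
    precondition(byte < 0x80, "ASCII only")
    if byte < 0x40 {
      low |= 1 &<< UInt64(byte)
    } else {
      high |= 1 &<< UInt64(byte &- 0x40)
    }
  }

  func matches(_ byte: UInt8) -> Bool {
    guard byte < 0x80 else { return false }
    return byte < 0x40
      ? low & (1 &<< UInt64(byte)) != 0
      : high & (1 &<< UInt64(byte &- 0x40)) != 0
  }
}

var digits = ASCIIBitsetSketch()
for b in UInt8(ascii: "0")...UInt8(ascii: "9") { digits.insert(b) }
print(digits.matches(UInt8(ascii: "7")))  // true
print(digits.matches(UInt8(ascii: "a")))  // false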
a/Tests/RegexBuilderTests/AlgorithmsTests.swift b/Tests/RegexBuilderTests/AlgorithmsTests.swift index 115941070..b85395d19 100644 --- a/Tests/RegexBuilderTests/AlgorithmsTests.swift +++ b/Tests/RegexBuilderTests/AlgorithmsTests.swift @@ -16,7 +16,7 @@ import RegexBuilder @available(SwiftStdlib 5.7, *) class RegexConsumerTests: XCTestCase { func testMatches() { - let regex = Capture(OneOrMore(.digit)) { 2 * Int($0)! } + let regex = Capture<(Substring, Int)>(OneOrMore(.digit)) { 2 * Int($0)! } let str = "foo 160 bar 99 baz" XCTAssertEqual(str.matches(of: regex).map(\.output.1), [320, 198]) } diff --git a/Tests/RegexBuilderTests/CustomTests.swift b/Tests/RegexBuilderTests/CustomTests.swift index 26746d613..848ef4626 100644 --- a/Tests/RegexBuilderTests/CustomTests.swift +++ b/Tests/RegexBuilderTests/CustomTests.swift @@ -64,7 +64,7 @@ private struct IntParser: CustomConsumingRegexComponent { guard index != bounds.upperBound else { return nil } let r = Regex { - Capture(OneOrMore(.digit)) { Int($0) } + Capture<(Substring, Int?)>(OneOrMore(.digit)) { Int($0) } } guard let match = input[index..(Repeat(.digit, count: 2)) { Int($0) } } } diff --git a/Tests/RegexBuilderTests/RegexDSLTests.swift b/Tests/RegexBuilderTests/RegexDSLTests.swift index 5e85ad26c..19ac675dc 100644 --- a/Tests/RegexBuilderTests/RegexDSLTests.swift +++ b/Tests/RegexBuilderTests/RegexDSLTests.swift @@ -1091,7 +1091,7 @@ class RegexDSLTests: XCTestCase { OneOrMore("a") Capture { TryCapture("b", transform: { Int($0) }) - ZeroOrMore( + ZeroOrMore<(Substring, Double?)>( TryCapture("c", transform: { Double($0) }) ) Optionally("e") @@ -1542,12 +1542,12 @@ class RegexDSLTests: XCTestCase { in bounds: Range ) throws -> (upperBound: String.Index, output: SemanticVersion)? { let regex = Regex { - TryCapture(OneOrMore(.digit)) { Int($0) } + TryCapture<(Substring, Int)>(OneOrMore(.digit)) { Int($0) } "." - TryCapture(OneOrMore(.digit)) { Int($0) } + TryCapture<(Substring, Int)>(OneOrMore(.digit)) { Int($0) } Optionally { "." 
- TryCapture(OneOrMore(.digit)) { Int($0) } + TryCapture<(Substring, Int)>(OneOrMore(.digit)) { Int($0) } } Optionally { "-" @@ -1876,7 +1876,7 @@ extension RegexDSLTests { ":" regexWithTooManyCaptures ":" - TryCapture(OneOrMore(.word)) { Int($0) } + TryCapture<(Substring, Int)>(OneOrMore(.word)) { Int($0) } #/:(\d+):/# } XCTAssert(type(of: dslWithTooManyCaptures).self diff --git a/Tests/RegexTests/AlgorithmsTests.swift b/Tests/RegexTests/AlgorithmsTests.swift index d5419fe9c..60548a0a2 100644 --- a/Tests/RegexTests/AlgorithmsTests.swift +++ b/Tests/RegexTests/AlgorithmsTests.swift @@ -165,8 +165,12 @@ class AlgorithmTests: XCTestCase { let actualCol: [Range] = input.ranges(of: pattern)[...].map(input.offsets(of:)) XCTAssertEqual(actualCol, expected, file: file, line: line) - let firstRange = input.firstRange(of: pattern).map(input.offsets(of:)) - XCTAssertEqual(firstRange, expected.first, file: file, line: line) + let firstRange = input.firstRange(of: pattern) + XCTAssertEqual(firstRange.map(input.offsets(of:)), expected.first, file: file, line: line) + if let upperBound = firstRange?.upperBound, !pattern.isEmpty { + let secondRange = input[upperBound...].firstRange(of: pattern).map(input.offsets(of:)) + XCTAssertEqual(secondRange, expected.dropFirst().first, file: file, line: line) + } } expectRanges("", "", [0..<0]) @@ -176,6 +180,9 @@ class AlgorithmTests: XCTestCase { expectRanges("abcde", "bcd", [1..<4]) expectRanges("ababacabababa", "abababa", [6..<13]) expectRanges("ababacabababa", "aba", [0..<3, 6..<9, 10..<13]) + + // Test for rdar://92794248 + expectRanges("ADACBADADACBADACB", "ADACB", [0..<5, 7..<12, 12..<17]) } // rdar://105154010 diff --git a/Tests/RegexTests/MatchTests.swift b/Tests/RegexTests/MatchTests.swift index a1bf0e76f..3fc547e34 100644 --- a/Tests/RegexTests/MatchTests.swift +++ b/Tests/RegexTests/MatchTests.swift @@ -620,6 +620,50 @@ extension RegexTests { // TODO: After captures, easier to test these } + func testQuantificationScalarSemantics() { + // TODO: We want more thorough testing here, including "a{n,m}", "a?", etc. 
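+    // Note: "a\u{301}" is a single Character ("á") under the default
+    // grapheme-cluster semantics, so a bare "a" in the pattern cannot
+    // consume it; under .unicodeScalar semantics the "a" scalar still
+    // matches, leaving the combining scalar for the rest of the pattern.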
+ + firstMatchTest("a*", input: "aaa\u{301}", match: "aa") + firstMatchTest("a*", input: "aaa\u{301}", match: "aaa", semanticLevel: .unicodeScalar) + firstMatchTest("a+", input: "aaa\u{301}", match: "aa") + firstMatchTest("a+", input: "aaa\u{301}", match: "aaa", semanticLevel: .unicodeScalar) + firstMatchTest("a?", input: "a\u{301}", match: "") + firstMatchTest("a?", input: "a\u{301}", match: "a", semanticLevel: .unicodeScalar) + + firstMatchTest("[ab]*", input: "abab\u{301}", match: "aba") + firstMatchTest("[ab]*", input: "abab\u{301}", match: "abab", semanticLevel: .unicodeScalar) + firstMatchTest("[ab]+", input: "abab\u{301}", match: "aba") + firstMatchTest("[ab]+", input: "abab\u{301}", match: "abab", semanticLevel: .unicodeScalar) + firstMatchTest("[ab]?", input: "b\u{301}", match: "") + firstMatchTest("[ab]?", input: "b\u{301}", match: "b", semanticLevel: .unicodeScalar) + + firstMatchTest(#"\s*"#, input: " \u{301}", match: " \u{301}") + firstMatchTest(#"\s*"#, input: " \u{301}", match: " ", semanticLevel: .unicodeScalar) + firstMatchTest(#"\s+"#, input: " \u{301}", match: " \u{301}") + firstMatchTest(#"\s+"#, input: " \u{301}", match: " ", semanticLevel: .unicodeScalar) + firstMatchTest(#"\s?"#, input: " \u{301}", match: " \u{301}") + firstMatchTest(#"\s?"#, input: " \u{301}", match: " ", semanticLevel: .unicodeScalar) + + firstMatchTest(#".*?a"#, input: "xxa\u{301}xaZ", match: "xxa\u{301}xa") + firstMatchTest(#".*?a"#, input: "xxa\u{301}xaZ", match: "xxa", semanticLevel: .unicodeScalar) + firstMatchTest(#".+?a"#, input: "xxa\u{301}xaZ", match: "xxa\u{301}xa") + firstMatchTest(#".+?a"#, input: "xxa\u{301}xaZ", match: "xxa", semanticLevel: .unicodeScalar) + firstMatchTest(#".?a"#, input: "e\u{301}aZ", match: "e\u{301}a") + firstMatchTest(#".?a"#, input: "e\u{301}aZ", match: "\u{301}a", semanticLevel: .unicodeScalar) + + firstMatchTest(#".+\u{301}"#, input: "aa\u{301}Z", match: nil) + firstMatchTest(#".+\u{301}"#, input: "aa\u{301}Z", match: "aa\u{301}", semanticLevel: .unicodeScalar) + firstMatchTest(#".*\u{301}"#, input: "\u{301}Z", match: "\u{301}") + firstMatchTest(#".*\u{301}"#, input: "\u{301}Z", match: "\u{301}", semanticLevel: .unicodeScalar) + + firstMatchTest(#".?\u{301}"#, input: "aa\u{302}\u{301}Z", match: nil) + firstMatchTest(#".?\u{301}.?Z"#, input: "aa\u{302}\u{301}Z", match: "\u{302}\u{301}Z", semanticLevel: .unicodeScalar) + firstMatchTest(#".?.?\u{301}.?Z"#, input: "aa\u{302}\u{301}Z", match: "a\u{302}\u{301}Z", semanticLevel: .unicodeScalar) + + + // TODO: other test cases? + } + func testMatchCharacterClasses() { // Must have new stdlib for character class ranges and word boundaries. 
guard ensureNewStdlib() else { return } @@ -1891,6 +1935,11 @@ extension RegexTests { func testSingleLineMode() { firstMatchTest(#".+"#, input: "a\nb", match: "a") firstMatchTest(#"(?s:.+)"#, input: "a\nb", match: "a\nb") + + // We recognize LF, line tab, FF, and CR as newlines by default + firstMatchTest(#"."#, input: "\u{A}\u{B}\u{C}\u{D}\nb", match: "b") + firstMatchTest(#".+"#, input: "\u{A}\u{B}\u{C}\u{D}\nbb", match: "bb") + } func testMatchNewlines() { @@ -2574,4 +2623,35 @@ extension RegexTests { func testFuzzerArtifacts() throws { expectCompletion(regex: #"(b?)\1*"#, in: "a") } + + func testIssue640() throws { + // Original report from https://github.com/apple/swift-experimental-string-processing/issues/640 + let original = try Regex("[1-9][0-9]{0,2}(?:,?[0-9]{3})*") + XCTAssertNotNil("36,769".wholeMatch(of: original)) + XCTAssertNotNil("36769".wholeMatch(of: original)) + + // Simplified case + let simplified = try Regex("a{0,2}a") + XCTAssertNotNil("aaa".wholeMatch(of: simplified)) + + for max in 1...8 { + let patternEager = "a{0,\(max)}a" + let regexEager = try Regex(patternEager) + let patternReluctant = "a{0,\(max)}?a" + let regexReluctant = try Regex(patternReluctant) + for length in 1...(max + 1) { + let str = String(repeating: "a", count: length) + if str.wholeMatch(of: regexEager) == nil { + XCTFail("Didn't match '\(patternEager)' in '\(str)' (\(max),\(length)).") + } + if str.wholeMatch(of: regexReluctant) == nil { + XCTFail("Didn't match '\(patternReluctant)' in '\(str)' (\(max),\(length)).") + } + } + + let possessiveRegex = try Regex("a{0,\(max)}+a") + let str = String(repeating: "a", count: max + 1) + XCTAssertNotNil(str.wholeMatch(of: possessiveRegex)) + } + } }
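The grapheme-cluster versus Unicode-scalar behavior exercised in testQuantificationScalarSemantics above is also observable through public API. A minimal sketch, assuming a Swift 5.7+ standard library (extended regex literals, matchingSemantics(_:), and prefixMatch(of:)):

let subject = "aaa\u{301}"  // two plain "a"s followed by "á" ("a" + U+0301)

let byCharacter = #/a*/#                                   // default: grapheme clusters
let byScalar    = #/a*/#.matchingSemantics(.unicodeScalar)

print(subject.prefixMatch(of: byCharacter)?.output ?? "")  // "aa"
print(subject.prefixMatch(of: byScalar)?.output ?? "")     // "aaa"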