Drop arrays and unzip tuples for quantified captures. #162

Merged
rxwei merged 1 commit into swiftlang:main from the drop-array branch on Feb 15, 2022

@rxwei rxwei commented Feb 15, 2022

To bridge the typing gap between regex literals and regex builder, this patch introduces the following changes:

  1. Repeating quantifiers no longer produce an array capture. Instead, a quantifier produces an optional capture when its lower bound is `0`, and otherwise an atom capture, that is, the capture type left unwrapped. After matching, the capture holds the last occurrence in the match. A history-saving quantifier variant will be introduced later.
  2. Applying a quantifier to a tuple-capturing regex maps the quantification onto every tuple element, producing a tuple of quantified captures (unzipped).

let literal = /a((b)(b)+)*(c)+(d)?/
//           0 ^~~~~~~~~~~~~~~~~~
//            1 ^~~~~~~~
//             2 ^~~
//                3 ^~~
//                      4 ^~~
//                          5 ^~~
// => Regex<(Substring, Substring?, Substring?, Substring?, Substring, Substring?)>

// Equivalent regex using regex builder:
let dsl = Regex { // 0
  "a"
  zeroOrMore {
    capture { // 1
      capture { // 2
        "b"
      }
      oneOrMore {
        capture { // 3
          "b"
        }
      }
    }
  }
  oneOrMore {
    capture { // 4
      "c"
    }
  }
  optionally {
    capture { // 5
      "d"
    }
  }
}

// => Regex<(Substring, Substring?, Substring?, Substring?, Substring, Substring?)>
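The typing rule in the two points above can be sketched as a small standalone model. This is a toy illustration only, not the library's implementation: the `Pattern` enum and the `captureOptionality` and `outputType` functions are hypothetical names invented here. The model also assumes a capture nested under several zero-lower-bound quantifiers stays a single-level optional, which the example in this PR does not exercise.

```swift
// Toy model of the capture-typing rule; all names here are hypothetical.
indirect enum Pattern {
  case literal(String)
  case capture(Pattern)
  case sequence([Pattern])
  case quantified(lowerBound: Int, Pattern)
}

/// For each capture in `pattern`, in order of its opening parenthesis,
/// reports whether the capture is typed as optional.
func captureOptionality(_ pattern: Pattern) -> [Bool] {
  switch pattern {
  case .literal:
    return []
  case .capture(let inner):
    // A capture itself is non-optional; nested captures follow it.
    return [false] + captureOptionality(inner)
  case .sequence(let parts):
    return parts.flatMap { captureOptionality($0) }
  case .quantified(let lowerBound, let inner):
    // The quantifier is mapped onto every capture inside it ("unzipped").
    // Lower bound 0 makes each one optional; otherwise the types are kept.
    let captures = captureOptionality(inner)
    return lowerBound == 0 ? captures.map { _ in true } : captures
  }
}

/// Renders the output type, with the whole match as element 0.
func outputType(_ pattern: Pattern) -> String {
  let elements = ["Substring"]
    + captureOptionality(pattern).map { $0 ? "Substring?" : "Substring" }
  return "Regex<(" + elements.joined(separator: ", ") + ")>"
}

// Model of /a((b)(b)+)*(c)+(d)?/
let example = Pattern.sequence([
  .literal("a"),
  .quantified(lowerBound: 0, .capture(.sequence([
    .capture(.literal("b")),
    .quantified(lowerBound: 1, .capture(.literal("b"))),
  ]))),
  .quantified(lowerBound: 1, .capture(.literal("c"))),
  .quantified(lowerBound: 0, .capture(.literal("d"))),
])
print(outputType(example))
```

Under these assumptions the model yields the same tuple type as the comment in the example above.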

rxwei commented Feb 15, 2022

@swift-ci please test Linux

@rxwei rxwei requested a review from milseman February 15, 2022 12:31

milseman commented

Outside scope of this PR, but a capture span pretty-printer for the command-line tools would be helpful, so that it can support:

let literal = /a((b)+)*(c)+(d)?/
//           0 ^~~~~~~~~~~~~~~~
//            1 ^~~~~~
//             2 ^~~
//                   3 ^~~
//                       4 ^~~
// => Regex<(Substring, Substring?, Substring?, Substring, Substring?)>
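A minimal sketch of what such a span printer could look like, written as plain string scanning. The function `captureSpanLines` is a hypothetical helper invented for illustration and is not part of any existing tool. It assumes that every unescaped `(` not followed by `?` opens a capture group, and it does not handle parentheses inside character classes.

```swift
/// Hypothetical helper: one annotation line per capture group.
/// `pattern` is the regex body without delimiters; `column` is the source
/// column of the pattern's first character.
func captureSpanLines(pattern: String, column: Int) -> [String] {
  let chars = Array(pattern)
  // Span 0 covers the whole pattern; later spans are capture groups in
  // order of their opening parenthesis.
  var spans: [(start: Int, end: Int)] = [(start: 0, end: chars.count - 1)]
  // One entry per open "(": the span index, or nil for a non-capturing group.
  var openGroups: [Int?] = []
  var i = 0
  while i < chars.count {
    switch chars[i] {
    case "\\":
      i += 1  // skip the escaped character
    case "(":
      if i + 1 < chars.count && chars[i + 1] == "?" {
        openGroups.append(nil)
      } else {
        spans.append((start: i, end: i))
        openGroups.append(spans.count - 1)
      }
    case ")":
      if let top = openGroups.popLast(), let index = top {
        spans[index].end = i
      }
    default:
      break
    }
    i += 1
  }
  return spans.enumerated().map { (number, span) -> String in
    let label = "\(number) ^"
    // Place the caret directly under the first character of the span.
    let padding = max(0, column + span.start - 1 - label.count)
    let tildes = max(0, span.end - span.start)
    return "//" + String(repeating: " ", count: padding)
      + label + String(repeating: "~", count: tildes)
  }
}

// "let literal = /" is 15 characters, so the pattern starts at column 15.
let lines = captureSpanLines(pattern: "a((b)+)*(c)+(d)?", column: 15)
print("let literal = /a((b)+)*(c)+(d)?/")
lines.forEach { print($0) }
```

For this input the sketch reproduces the five annotation lines shown above, numbered in order of each group's opening parenthesis.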

@milseman milseman left a comment

Mostly LGTM, but I'm still unsure about `*` producing optional.

Also, while some of the capture tests have equivalent versions for `*` and `+`, I eventually stopped making copies for `+` since the behavior was the same. Now that it differs, we may want to reintroduce them.

On the other hand, with `Array` dropped, many things should start working, so we can still overhaul those tests.

rxwei commented Feb 15, 2022

@swift-ci please test Linux

rxwei commented Feb 15, 2022

> but I'm still unsure about `*` producing optional.

We discussed this in person and converged on producing an optional, since PCRE produces null on an empty match. Going to merge this to unblock future work. We can add more quantifier tests in a separate PR.
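As a rough illustration of the agreed behavior, here is how it might look from the caller's side. This sketch assumes the `Regex` API as later shipped in Swift 5.7 (the `#/.../#` extended literal delimiters and `wholeMatch(of:)`), not the package snapshot this PR was merged into, so names and availability may differ from the code under review.

```swift
// Assumes Swift 5.7 or later; on Apple platforms the runtime must also
// support the standard library's Regex type.
let regex = #/a((b)(b)+)*(c)+(d)?/#

guard let match = "abbbcc".wholeMatch(of: regex) else {
  fatalError("expected the sample input to match")
}

// Captures under `*` and `?` are optional; the capture under `+` is not.
let outer: Substring? = match.output.1
let lastC: Substring = match.output.4
let d: Substring? = match.output.5

print(outer ?? "nil")  // the `*` group matched once
print(lastC)           // last occurrence of (c)
print(d ?? "nil")      // (d)? did not participate in the match

// A quantified capture keeps only its last occurrence.
guard let digits = "123".wholeMatch(of: #/(\d)+/#) else {
  fatalError("expected the sample input to match")
}
let lastDigit: Substring = digits.output.1
print(lastDigit)
```

The explicit type annotations are there to make the optionality visible: if `(c)+` produced an optional capture, the `lastC` declaration would not compile.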

@rxwei rxwei merged commit b7a0196 into swiftlang:main Feb 15, 2022
@rxwei rxwei deleted the drop-array branch February 15, 2022 20:13