
proposal: regexp: add iterator forms of matching methods #61902

Open
@rsc

Description


We propose to add methods to regexp that allow iterating over matches instead of having to accumulate all the matches into a slice.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.

Regexp has a lot of methods that return slices of all matches (the “FindAll*” methods). Each should have an iterator equivalent that doesn’t build the slice. They can be named by removing the “Find” prefix. The docs would change as follows. (Plain text is unchanged; strikethrough is removed, bold is added):

There are ~~16~~ **24** methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:

~~Find(All)?(String)?(Submatch)?(Index)?~~
**(Find|All|FindAll)(String)?(Submatch)?(Index)?**

If 'All' is present, the routine matches successive non-overlapping matches of the entire expression.
**The ‘Find’ form returns the first match. The ‘All’ form returns an iterator over all matches.**
Empty matches abutting a preceding match are ignored.
~~The~~ **For ‘FindAll’, the** return value is a slice containing the successive return values of the corresponding non-’All’ non-‘Find’ routine. ~~These~~ **The ‘FindAll’** routines take an extra integer argument, ...

Instead of enumerating all eight methods here, let’s just show one example.
FindAllString currently reads:

// FindAllString is the 'All' version of FindString; it returns a slice of all
// successive matches of the expression, as defined by the 'All' description in
// the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllString(s string, n int) []string

This would change to become a pair of methods:

// FindAllString is the 'FindAll' version of FindString; it returns a slice of all
// successive matches of the expression, as defined by the 'FindAll' description in
// the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllString(s string, n int) []string

// AllString is the ‘All’ version of ‘FindString’; it returns an iterator over all
// successive matches of the expression, as defined by the ‘All’ description in
// the package comment.
func (re *Regexp) AllString(s string) iter.Seq[string]

The full list is:

// All is the ‘All’ version of ‘Find’: it returns an iterator over all ...
func (re *Regexp) All(b []byte) iter.Seq[[]byte]

// AllIndex is the ‘All’ version of ‘FindIndex’: it returns an iterator over all ...
func (re *Regexp) AllIndex(b []byte) iter.Seq[[]int]

// AllString is the ‘All’ version of ‘FindString’: it returns an iterator over all ...
func (re *Regexp) AllString(s string) iter.Seq[string]

// AllStringIndex is the ‘All’ version of ‘FindStringIndex’: it returns an iterator over all ...
func (re *Regexp) AllStringIndex(s string) iter.Seq[[]int]

// AllStringSubmatch is the ‘All’ version of ‘FindStringSubmatch’: it returns an iterator ...
func (re *Regexp) AllStringSubmatch(s string) iter.Seq[[]string]

// AllStringSubmatchIndex is the ‘All’ version of ‘FindStringSubmatchIndex’: it returns ...
func (re *Regexp) AllStringSubmatchIndex(s string) iter.Seq[[]int]

// AllSubmatch is the ‘All’ version of ‘FindSubmatch’: it returns an iterator over all ...
func (re *Regexp) AllSubmatch(b []byte) iter.Seq[[][]byte]

// AllSubmatchIndex is the ‘All’ version of ‘FindSubmatchIndex’: it returns an iterator ...
func (re *Regexp) AllSubmatchIndex(b []byte) iter.Seq[[]int]

There would also be a new SplitSeq method alongside regexp.Regexp.Split, completing the analogy with strings.Split and strings.SplitSeq.

// SplitSeq returns an iterator over substrings of s separated by the expression.
func (re *Regexp) SplitSeq(s string) iter.Seq[string]

Metadata


Assignees: No one assigned
Type: No type
Project status: Hold
Relationships: None yet
Development: No branches or pull requests
