Groups
Groups are used in a regex for one (or both) of two purposes:
- To group a number of elements together so a quantifier can be applied to the whole group.
- To "remember" part of the text matched by the regex so we can extract it later using the group() method of the java.util.regex.Matcher class.
Grouping for quantifiers is straightforward: a quantifier passed to the endGroup() method applies to the whole group. For example, this very simple regex matches a normal sentence (one or more words each followed by whitespace, then a final word and a full stop):
Pattern regex = new RegexBuilder()
.startGroup()
.wordCharacter(RegexQuantifier.oneOrMore())
.whitespace(RegexQuantifier.oneOrMore())
.endGroup(RegexQuantifier.oneOrMore())
.wordCharacter(RegexQuantifier.oneOrMore())
.text(".")
.buildRegex();
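To make the effect of the group quantifier concrete, here is a minimal sketch that uses the JDK's java.util.regex directly with a hand-written pattern. The pattern is assumed to be close to what the builder above produces; that is not confirmed by this page, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class QuantifiedGroupSketch {
    // Hand-written approximation of the builder example above (an assumption,
    // not the library's guaranteed output): one or more "word + whitespace"
    // groups, followed by a final word and a literal full stop.
    private static final Pattern SENTENCE = Pattern.compile("(\\w+\\s+)+\\w+\\.");

    // Returns true if the whole input has the shape of a simple sentence.
    public static boolean isSimpleSentence(String input) {
        return SENTENCE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSimpleSentence("This is a sentence."));
        System.out.println(isSimpleSentence("Word."));
    }
}
```

Because the quantifier applies to the whole group, the input needs at least one complete "word plus whitespace" unit before the final word, so a single word followed by a full stop does not match.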
Say we want to match a person's name (two consecutive words, each beginning with a capital letter) and then greet them by their first name. We could build a regex like this:
Pattern regex = new RegexBuilder()
.wordBoundary()
.startGroup()
.uppercaseLetter()
.lowercaseLetter(RegexQuantifier.oneOrMore())
.endGroup()
.whitespace()
.uppercaseLetter()
.lowercaseLetter(RegexQuantifier.oneOrMore())
.wordBoundary()
.buildRegex();
We can then extract the first name from a successful match like this:
Matcher matcher = regex.matcher(inputString);
if (matcher.find())
{
String firstName = matcher.group(1);
}
Note that group() indexes capturing groups from 1, not 0; group(0) returns the whole matched string.
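The difference between group(0) and group(1) can be seen in a small self-contained sketch. It uses a hand-written pattern with java.util.regex rather than RegexBuilder, on the assumption that uppercaseLetter() and lowercaseLetter() behave roughly like [A-Z] and [a-z]; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberedGroupSketch {
    // Hand-written approximation of the name regex above (assumption:
    // uppercaseLetter()/lowercaseLetter() behave like [A-Z]/[a-z]).
    private static final Pattern NAME =
            Pattern.compile("\\b([A-Z][a-z]+)\\s[A-Z][a-z]+\\b");

    // Returns {whole match, first name}, or null if no name is found.
    public static String[] findName(String input) {
        Matcher matcher = NAME.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // group(0) is the whole match; group(1) is the first capturing group.
        return new String[] { matcher.group(0), matcher.group(1) };
    }

    public static void main(String[] args) {
        String[] result = findName("Please welcome Ada Lovelace to the stage");
        if (result != null) {
            System.out.println("Whole match: " + result[0]);
            System.out.println("Hello, " + result[1] + "!");
        }
    }
}
```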
If you prefer to avoid numerical indices altogether, you can also define named groups, which are then indexed by name. Using named groups, our code would look like this:
Pattern regex = new RegexBuilder()
.wordBoundary()
.startNamedGroup("firstName")
.uppercaseLetter()
.lowercaseLetter(RegexQuantifier.oneOrMore())
.endGroup()
.whitespace()
.uppercaseLetter()
.lowercaseLetter(RegexQuantifier.oneOrMore())
.wordBoundary()
.buildRegex();
Matcher matcher = regex.matcher(inputString);
if (matcher.find())
{
String firstName = matcher.group("firstName");
}
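For comparison, a rough raw-regex version of the named-group example might look like the sketch below. The pattern is hand-written and only assumed to resemble what the builder generates; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupSketch {
    // (?<firstName>...) is java.util.regex syntax for a named capturing group.
    // The character classes are an assumption about what the builder emits.
    private static final Pattern NAME =
            Pattern.compile("\\b(?<firstName>[A-Z][a-z]+)\\s[A-Z][a-z]+\\b");

    // Builds a greeting from the first name, with a fallback when nothing matches.
    public static String greet(String input) {
        Matcher matcher = NAME.matcher(input);
        if (matcher.find()) {
            return "Hello, " + matcher.group("firstName") + "!";
        }
        return "Hello, stranger!";
    }

    public static void main(String[] args) {
        System.out.println(greet("I met Grace Hopper once"));
        System.out.println(greet("nobody in particular"));
    }
}
```

In java.util.regex a named group still receives a number, so matcher.group(1) would return the same text as matcher.group("firstName") here.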
As with raw regexes, RegexBuilder allows you to nest groups to arbitrary depth. Capturing groups are numbered in the order they are started, so Matcher.group(1) refers to the first started group, Matcher.group(2) to the second, and so on. For example:
Pattern regex = new RegexBuilder()
.wordBoundary()
.startGroup() // start of group 1
.startGroup() // start of group 2
.uppercaseLetter()
.endGroup() // end of group 2
.lowercaseLetter(RegexQuantifier.oneOrMore())
.endGroup(RegexQuantifier.oneOrMore()) // end of group 1
.wordBoundary()
.buildRegex();
Matcher matcher = regex.matcher("sorry Dave, I can't do that");
if (matcher.find())
{
String name = matcher.group(1); // "Dave"
String initial = matcher.group(2); // "D"
}
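The numbering rule for nested groups can be checked against a hand-written pattern using java.util.regex alone. This is a sketch under the assumption that the builder output resembles the pattern shown; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedGroupSketch {
    // Groups are numbered by the position of their opening parenthesis:
    // group 1 is the outer group, group 2 the nested one.
    // The character classes are an assumption about the builder's output.
    private static final Pattern WORD =
            Pattern.compile("\\b(([A-Z])[a-z]+)+\\b");

    // Returns {group 1, group 2} for the first match, or null if none.
    public static String[] nameAndInitial(String input) {
        Matcher matcher = WORD.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = nameAndInitial("sorry Dave, I can't do that");
        if (parts != null) {
            System.out.println("name = " + parts[0] + ", initial = " + parts[1]);
        }
    }
}
```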
Method | Description | Raw regex equivalent |
---|---|---|
`startGroup()` | Start a group which can be extracted later by calling `Matcher.group(int)`. | `(` |
`startNamedGroup(String name)` | Start a group which can be extracted later by calling `Matcher.group(String)`. | `(?<name>` |
`startNonCapturingGroup()` | Start a group which cannot be extracted later by calling `Matcher.group()`. This can be useful if you have more than one group in a regex and you don't want a group that's purely for quantifiers to disrupt the indices of your capturing groups. | `(?:` |
`endGroup()` | End the current group (the innermost group in the case of nested groups), optionally specifying a quantifier for the group. | `)` |
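To illustrate why a non-capturing group can be useful, the sketch below compares two hand-written patterns with java.util.regex: one where the quantified group captures, and one where it does not. The patterns, class name, and method names are illustrative assumptions, not output taken from RegexBuilder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NonCapturingGroupSketch {
    // The quantified group captures, so the last word ends up in group 2.
    private static final Pattern CAPTURING =
            Pattern.compile("(\\w+\\s+)+(\\w+)\\.");
    // The quantified group does not capture, so the last word is group 1.
    private static final Pattern NON_CAPTURING =
            Pattern.compile("(?:\\w+\\s+)+(\\w+)\\.");

    // Returns the last word of a simple sentence using the given group index,
    // or null if the sentence does not match.
    public static String lastWord(boolean capturing, String sentence) {
        Matcher matcher = (capturing ? CAPTURING : NON_CAPTURING).matcher(sentence);
        if (!matcher.matches()) {
            return null;
        }
        return matcher.group(capturing ? 2 : 1);
    }

    // Number of capturing groups in the chosen pattern.
    public static int groupCount(boolean capturing) {
        return (capturing ? CAPTURING : NON_CAPTURING).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount(true) + " vs " + groupCount(false));
        System.out.println(lastWord(false, "This is a sentence."));
    }
}
```

With the non-capturing variant, the only capturing group keeps index 1 regardless of how many quantifier-only groups precede it.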
RegexToolbox: Now you can be a hero without knowing regular expressions.