preg_match is inconsistent in how it handles unmatched capturing groups #17934

Open
@mikethea1

Description

The following code:

<?php
$pattern = '/A(?<a>a)|B(?<b>b)/';
preg_match($pattern, 'Aa', $matches);
echo json_encode($matches)."\n";
preg_match($pattern, 'Bb', $matches);
echo json_encode($matches)."\n";

Resulted in this output:

{"0":"Aa","a":"a","1":"a"}
{"0":"Bb","a":"","1":"","b":"b","2":"b"}

But I expected this output instead:

{"0":"Aa","a":"a","1":"a"}
{"0":"Bb","b":"b","2":"b"}

I would expect a capturing group to appear in $matches only if it actually participated in the match. Instead, in some cases an unmatched group on the left side of an alternation appears with an empty string as its value, while an unmatched group on the right side is omitted entirely.
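To make the asymmetry concrete, here is a minimal sketch that checks key presence directly. The first two calls reuse the pattern from this report. The last two patterns are my own illustrations, not from the original code, and they assume that what matters is whether the unmatched group comes before or after the last group that participated, rather than alternation as such.

```php
<?php
$pattern = '/A(?<a>a)|B(?<b>b)/';

// Right-hand group did not participate: its keys are omitted.
preg_match($pattern, 'Aa', $left);
var_dump(array_key_exists('b', $left)); // bool(false)

// Left-hand group did not participate: its keys exist, holding "".
preg_match($pattern, 'Bb', $right);
var_dump(array_key_exists('a', $right)); // bool(true)
var_dump($right['a']);                   // string(0) ""

// Illustrative patterns without alternation (assumption: position decides).
// Groups 1 and 2 are unmatched but precede the matched group 3.
preg_match('/(x)?(y)?(z)/', 'z', $leading);
var_dump(count($leading)); // int(4)

// Groups 2 and 3 are unmatched and follow the matched group 1.
preg_match('/(x)(y)?(z)?/', 'x', $trailing);
var_dump(count($trailing)); // int(2)
```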

This makes it hard to answer the question "Did group N get captured?": depending on the structure of the regex, "not captured" is reported sometimes as an empty string and sometimes as a missing key. It is even more confusing when the empty string is itself a possible capture for the group; in that case there is no way to tell what happened without PREG_OFFSET_CAPTURE, which reports an offset of -1 for the groups that did not match.
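For anyone who needs a dependable check in the meantime, the following is a possible workaround, not a fix for the inconsistency. It relies on the PREG_UNMATCHED_AS_NULL flag, under which unmatched groups are reported as null. The helper name group_was_captured and the variant pattern with a* are hypothetical, introduced only for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): reports whether $group took part
 * in the match. $group may be a group number or a group name.
 */
function group_was_captured(string $pattern, string $subject, $group): bool
{
    // With PREG_UNMATCHED_AS_NULL, a group that did not participate is null.
    if (preg_match($pattern, $subject, $m, PREG_UNMATCHED_AS_NULL) !== 1) {
        return false;
    }
    // isset() is false for both a missing key and a null value,
    // but true for a genuinely captured empty string.
    return isset($m[$group]);
}

// Variant of the reported pattern in which group "a" may capture "".
$pattern = '/A(?<a>a*)|B(?<b>b)/';

var_dump(group_was_captured($pattern, 'A', 'a'));  // bool(true), captured ""
var_dump(group_was_captured($pattern, 'A', 'b'));  // bool(false)
var_dump(group_was_captured($pattern, 'Bb', 'a')); // bool(false)
var_dump(group_was_captured($pattern, 'Bb', 'b')); // bool(true)

// The PREG_OFFSET_CAPTURE route mentioned above: offset -1 marks "no match".
preg_match($pattern, 'Bb', $withOffsets, PREG_OFFSET_CAPTURE);
var_dump($withOffsets['a'][1]); // int(-1)
```

The isset() check behaves the same whether an unmatched group is reported as null or left out of the array, so it does not depend on where the group sits in the pattern.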

PHP Version

PHP 8.2.12

Operating System

Windows 11
