
ext/pcre: Add "/r" modifier #13583

Merged 1 commit on Mar 5, 2024
4 changes: 4 additions & 0 deletions UPGRADING
@@ -210,6 +210,10 @@ PHP 8.4 UPGRADE NOTES
As a consequence, LoongArch JIT support has been added, spaces
are now allowed between braces in Perl-compatible items, and
variable-length lookbehind assertions are now supported.
. Added support for the "r" (PCRE2_EXTRA_CASELESS_RESTRICT) modifier, as well
as the (?r) mode modifier. When enabled along with the case-insensitive
modifier ("i"), the expression locks out mixing of ASCII and non-ASCII
characters.

- PDO:
. Added support for driver-specific subclasses.
3 changes: 3 additions & 0 deletions UPGRADING.INTERNALS
@@ -185,6 +185,9 @@ PHP 8.4 INTERNALS UPGRADE NOTES
When flags should be ignored, pass 0 to the flags argument.
- php_pcre_match_impl() and pcre_get_compiled_regex_cache_ex() now use
proper boolean argument types instead of integer types.
- pcre_get_compiled_regex_cache_ex() now provides an option to collect extra
options (from modifiers used in the expression, for example), and calls
pcre2_set_compile_extra_options() with those options.

========================
4. OpCode changes
4 changes: 4 additions & 0 deletions ext/pcre/php_pcre.c
@@ -592,6 +592,7 @@ PHPAPI pcre_cache_entry* pcre_get_compiled_regex_cache_ex(zend_string *regex, bo
#else
uint32_t coptions = 0;
#endif
uint32_t eoptions = PHP_PCRE_DEFAULT_EXTRA_COPTIONS;
PCRE2_UCHAR error[128];
PCRE2_SIZE erroffset;
int errnumber;
@@ -722,6 +723,7 @@ PHPAPI pcre_cache_entry* pcre_get_compiled_regex_cache_ex(zend_string *regex, bo
/* PCRE specific options */
case 'A': coptions |= PCRE2_ANCHORED; break;
case 'D': coptions |= PCRE2_DOLLAR_ENDONLY;break;
case 'r': eoptions |= PCRE2_EXTRA_CASELESS_RESTRICT; break;

Member commented:

This might need to be done conditionally; it breaks with --with-external-pcre against an older pcre2 version.

Member replied:

Good catch; do you prepare a PR or do I?

Member replied:

feel free :)

Member (author) replied:

@nielsdos did this in #13662, thank you!
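
For readers unfamiliar with the concern raised in this thread: one common way to keep a build working against older external headers is to guard the use of a newer macro with `#ifdef`. The sketch below illustrates that general pattern only. It is not taken from #13662 and may differ from what that PR does, and `sketch_apply_r_modifier()` is a hypothetical helper name. The file deliberately does not include pcre2.h, so compiled on its own it always takes the fallback branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the 'r' modifier could be applied, false if the headers in
 * use do not define PCRE2_EXTRA_CASELESS_RESTRICT (for example, an older
 * external PCRE2). Hypothetical helper, for illustration only. */
bool sketch_apply_r_modifier(uint32_t *eoptions)
{
#ifdef PCRE2_EXTRA_CASELESS_RESTRICT
    *eoptions |= PCRE2_EXTRA_CASELESS_RESTRICT;
    return true;
#else
    (void)eoptions; /* option unavailable: leave the mask untouched */
    return false;
#endif
}
```

A caller can then decide how to treat a `false` result, for instance by reporting the modifier as unsupported rather than failing to compile at build time.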

case 'S': /* Pass. */ break;
case 'X': /* Pass. */ break;
case 'U': coptions |= PCRE2_UNGREEDY; break;
@@ -776,6 +778,8 @@ PHPAPI pcre_cache_entry* pcre_get_compiled_regex_cache_ex(zend_string *regex, bo
}
pcre2_set_character_tables(cctx, tables);

pcre2_set_compile_extra_options(cctx, eoptions);

/* Compile pattern and display a warning if compilation failed. */
re = pcre2_compile((PCRE2_SPTR)pattern, pattern_len, coptions, &errnumber, &erroffset, cctx);

101 changes: 101 additions & 0 deletions ext/pcre/tests/preg_match_caseless_restrict.phpt
@@ -0,0 +1,101 @@
--TEST--
testing /r modifier in preg_* functions
--FILE--
<?php
echo "SK substitute matching" . PHP_EOL;
var_dump(preg_match('/AskZ/iur', 'AskZ')); // match
var_dump(preg_match('/AskZ/iur', 'aSKz')); // match
var_dump(preg_match('/AskZ/iur', "A\u{17f}kZ")); // no match
var_dump(preg_match('/AskZ/iur', "As\u{212a}Z")); // no match
var_dump(preg_match('/AskZ/iu', 'AskZ')); // match
var_dump(preg_match('/AskZ/iu', 'aSKz')); // match
var_dump(preg_match('/AskZ/iu', "A\u{17f}kZ")); // match
var_dump(preg_match('/AskZ/iu', "As\u{212a}Z")); // match

echo "K substitute matching" . PHP_EOL;
var_dump(preg_match('/k/iu', "\u{212A}"));
var_dump(preg_match('/k/iur', "\u{212A}"));

echo "non-ASCII in expressions" . PHP_EOL;
var_dump(preg_match('/A\x{17f}\x{212a}Z/iu', 'AskZ')); // match
var_dump(preg_match('/A\x{17f}\x{212a}Z/iur', 'AskZ')); // no match

echo "Character sets" . PHP_EOL;
var_dump(preg_match('/[AskZ]+/iur', 'AskZ')); // match
var_dump(preg_match('/[AskZ]+/iur', 'aSKz')); // match
var_dump(preg_match('/[AskZ]+/iur', "A\u{17f}kZ")); // match
var_dump(preg_match('/[AskZ]+/iur', "As\u{212a}Z")); // match
var_dump(preg_match('/[AskZ]+/iu', 'AskZ')); // match
var_dump(preg_match('/[AskZ]+/iu', 'aSKz')); // match
var_dump(preg_match('/[AskZ]+/iu', "A\u{17f}kZ")); // match
var_dump(preg_match('/[AskZ]+/iu', "As\u{212a}Z")); // match

echo "non-ASCII in character sets" . PHP_EOL;
var_dump(preg_match('/[\x{17f}\x{212a}]+/iur', 'AskZ')); // no match
var_dump(preg_match('/[\x{17f}\x{212a}]+/iu', 'AskZ')); // match

echo "Meta characters and negate character sets". PHP_EOL;
var_dump(preg_match('/[^s]+/iur', "A\u{17f}Z")); // match
var_dump(preg_match('/[^s]+/iu', "A\u{17f}Z")); // match
var_dump(preg_match('/[^s]+/iu', "A\u{17f}Z")); // match
var_dump(preg_match('/[^k]+/iur', "A\u{212a}Z")); // match
var_dump(preg_match('/[^k]+/iu', "A\u{212a}Z")); // match
var_dump(preg_match('/[^sk]+/iur', "A\u{17f}\u{212a}Z")); // match
var_dump(preg_match('/[^sk]+/iu', "A\u{17f}\u{212a}Z")); // match
var_dump(preg_match('/[^\x{17f}]+/iur', "AsSZ")); // match
var_dump(preg_match('/[^\x{17f}]+/iu', "AsSZ")); // match

echo "Modifier used within the expression" . PHP_EOL;
var_dump(preg_match('/s(?r)s(?-r)s(?r:s)s/iu', "\u{17f}S\u{17f}S\u{17f}")); // match
var_dump(preg_match('/s(?r)s(?-r)s(?r:s)s/iu', "\u{17f}\u{17f}\u{17f}S\u{17f}")); // no match
var_dump(preg_match('/s(?r)s(?-r)s(?r:s)s/iu', "\u{17f}S\u{17f}\u{17f}\u{17f}")); // no match
var_dump(preg_match('/k(?^i)k/iur', "K\u{212a}")); // match
var_dump(preg_match('/k(?^i)k/iur', "\u{212a}\u{212a}")); // no match

echo "Done";
?>
--EXPECT--
SK substitute matching
int(1)
int(1)
int(0)
int(0)
int(1)
int(1)
int(1)
int(1)
K substitute matching
int(1)
int(0)
non-ASCII in expressions
int(1)
int(0)
Character sets
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
non-ASCII in character sets
int(0)
int(1)
Meta characters and negate character sets
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
int(1)
Modifier used within the expression
int(1)
int(0)
int(0)
int(1)
int(0)
Done