-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix #77726: Allow null character in regex patterns #8114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In 8b3c1a3, this was disallowed to fix #55856, which was a security issue caused by the /e modifier. The fix that was made was the "Easier fix" as described in the original report. With this fix, pattern strings are no longer treated as null terminated, so null characters can be placed inside and matched against with regex patterns without security problems, so there is no longer a reason to give the error. Allowing this is consistent with the behaviour of many other languages, including JavaScript, and thanks to PCRE2[0], it does not require manually escaping null characters. Now that we can avoid the error here without the cost of escaping characters, there is really no need anymore to stray here from the conventional behaviour. Currently, null characters are still disallowed before the first delimiter and in the options section at the end of a regex string, but these error messages have been updated. [0] Since PCRE2, pattern strings no longer have to be null terminated, and raw null characters match as normal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! This looks like a sensible improvement to me, but I'd target the "master" branch only, (a) to avoid the minor BC break in a revision due to changing error messages, and (b) to give users time to check their regexs if they accept user supplied input (some code may reject "\0", but may not cater to NUL bytes directly).
Thanks for the feedback! I have retargeted the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This looks good to me (you may want to address my comments below; nothing serious, though).
ext/pcre/php_pcre.c
Outdated
php_error_docref(NULL, E_WARNING, delimiter == '\0' ? | ||
"Delimiter must not be a null character" : | ||
"Delimiter must not be alphanumeric or backslash"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that there is no real need for different error messages:
php_error_docref(NULL, E_WARNING, delimiter == '\0' ? | |
"Delimiter must not be a null character" : | |
"Delimiter must not be alphanumeric or backslash"); | |
php_error_docref(NULL, E_WARNING, "Delimiter must not be alphanumeric, backslash or null character"); |
(oh, and do we have precedence for "null character" – I prefer "NUL character", or just "NUL")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been thinking whether it might be better to simply treat NUL as whitespace here. That would avoid the need for changing or adding error messages, and instead the only change would be the removal of the messages related to the issue. Other functions, such as trim, treat it as whitespace: https://www.php.net/manual/en/function.trim.php.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it might be better security wise to not treat NUL bytes as whitespace here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I have updated the warning messages.
Avoid extra warning
@cmb69 I have also realised that there is another warning emitted here, which got missed out. It is meant to be emitted by the Line 2440 in 0f96e6a
|
This warning is only triggered if one of the strings passed in the array is a null pointer, which cannot happen. If it is null, the "empty regular expression" warning is printed later. Also, this does not have anything to do with the delimiter anyway.
@cmb69 The only way that other warning is triggered is if one of the strings passed in the patterns argument of I've removed the check, but I've added extra testing for the error cases of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, completely forgot about this. Look good. Thanks for the work, @tobilsk!
Many thanks! |
Link to #77726
In 8b3c1a3, this was disallowed to fix #55856, which was a security
issue caused by the /e modifier. The fix that was made was the
"Easier fix" as described in the original report.
With this fix, pattern strings are no longer treated as null terminated,
so null characters can be placed inside and matched against with regex
patterns without security problems, so there is no longer a reason to
give the error. Allowing this is consistent with the behaviour of many
other languages, including JavaScript, and thanks to PCRE2[0], it does
not require manually escaping null characters. Now that we can avoid the
error here without the cost of escaping characters, there is really no
need anymore to stray here from the conventional behaviour.
Currently, null characters are still disallowed before the first
delimiter and in the options section at the end of a regex string, but
these error messages have been updated. (This could be changed to
simply ignore them, like how whitespace is treated?)
[0] Since PCRE2, pattern strings no longer have to be null terminated,
and raw null characters match as normal.