Skip to content

Fix #77726: Allow null character in regex patterns #8114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

tobil4sk
Copy link
Contributor

@tobil4sk tobil4sk commented Feb 17, 2022

Link to #77726

In 8b3c1a3, this was disallowed to fix #55856, which was a security
issue caused by the /e modifier. The fix that was made was the
"Easier fix" as described in the original report.

With this fix, pattern strings are no longer treated as null terminated,
so null characters can be placed inside and matched against with regex
patterns without security problems, so there is no longer a reason to
give the error. Allowing this is consistent with the behaviour of many
other languages, including JavaScript, and thanks to PCRE2[0], it does
not require manually escaping null characters. Now that we can avoid the
error here without the cost of escaping characters, there is really no
need anymore to stray here from the conventional behaviour.

Currently, null characters are still disallowed before the first
delimiter and in the options section at the end of a regex string, but
these error messages have been updated. (This could be changed to
simply ignore them, like how whitespace is treated?)

[0] Since PCRE2, pattern strings no longer have to be null terminated,
and raw null characters match as normal.

In 8b3c1a3, this was disallowed to fix #55856, which was a security
issue caused by the /e modifier. The fix that was made was the
"Easier fix" as described in the original report.

With this fix, pattern strings are no longer treated as null terminated,
so null characters can be placed inside and matched against with regex
patterns without security problems, so there is no longer a reason to
give the error. Allowing this is consistent with the behaviour of many
other languages, including JavaScript, and thanks to PCRE2[0], it does
not require manually escaping null characters. Now that we can avoid the
error here without the cost of escaping characters, there is really no
need anymore to stray here from the conventional behaviour.

Currently, null characters are still disallowed before the first
delimiter and in the options section at the end of a regex string, but
these error messages have been updated.

[0] Since PCRE2, pattern strings no longer have to be null terminated,
and raw null characters match as normal.
Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! This looks like a sensible improvement to me, but I'd target the "master" branch only, (a) to avoid the minor BC break in a revision due to changing error messages, and (b) to give users time to check their regexs if they accept user supplied input (some code may reject "\0", but may not cater to NUL bytes directly).

@tobil4sk tobil4sk changed the base branch from PHP-8.0 to master February 22, 2022 17:08
@tobil4sk
Copy link
Contributor Author

Thanks for the feedback! I have retargeted the pull request.

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks good to me (you may want to address my comments below; nothing serious, though).

Comment on lines 669 to 671
php_error_docref(NULL, E_WARNING, delimiter == '\0' ?
"Delimiter must not be a null character" :
"Delimiter must not be alphanumeric or backslash");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that there is no real need for different error messages:

Suggested change
php_error_docref(NULL, E_WARNING, delimiter == '\0' ?
"Delimiter must not be a null character" :
"Delimiter must not be alphanumeric or backslash");
php_error_docref(NULL, E_WARNING, "Delimiter must not be alphanumeric, backslash or null character");

(oh, and do we have precedence for "null character" – I prefer "NUL character", or just "NUL")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking whether it might be better to simply treat NUL as whitespace here. That would avoid the need for changing or adding error messages, and instead the only change would be the removal of the messages related to the issue. Other functions, such as trim, treat it as whitespace: https://www.php.net/manual/en/function.trim.php.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it might be better security wise to not treat NUL bytes as whitespace here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I have updated the warning messages.

@tobil4sk
Copy link
Contributor Author

tobil4sk commented Mar 8, 2022

@cmb69 I have also realised that there is another warning emitted here, which got missed out. It is meant to be emitted by the preg_replace_callback_array function, however, this specific case is not covered by the tests and I'm not quite sure what the check is for, as using invalid delimiters seems to be handled by the other check anyway.

php_error_docref(NULL, E_WARNING, "Delimiter must not be alphanumeric or backslash");

This warning is only triggered if one of the strings passed in the array
is a null pointer, which cannot happen. If it is null, the
"empty regular expression" warning is printed later. Also, this does not
have anything to do with the delimiter anyway.
@tobil4sk
Copy link
Contributor Author

@cmb69 The only way that other warning is triggered is if one of the strings passed in the patterns argument of preg_replace_callback_array is a null pointer, which seems completely impossible. If it's null or an empty string then that case is already handled elsewhere. In any case, it's definitely unrelated to the delimiter. I've also looked at git blame and it's been there since the function was originally added, so it doesn't seem like it was added to fix any specific issue. preg_replace_callback doesn't have anything like this either.

I've removed the check, but I've added extra testing for the error cases of preg_replace_callback_array just to be extra safe.

@ramsey ramsey added this to the PHP 8.2 milestone May 25, 2022
Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, completely forgot about this. Look good. Thanks for the work, @tobilsk!

@cmb69 cmb69 closed this in 5bb3e23 Jun 17, 2022
@tobil4sk tobil4sk deleted the bug-77726 branch June 19, 2022 22:48
@tobil4sk
Copy link
Contributor Author

@cmb69 Hi, sorry to bother again, would it be possible to have my github username next to the line in the NEWS file? Thanks.

php-src/NEWS

Line 349 in d2b7d67

. Implemented FR #77726 (Allow null character in regex patterns). (cmb)

cmb69 added a commit that referenced this pull request Oct 7, 2022
@cmb69
Copy link
Member

cmb69 commented Oct 7, 2022

@tobil4sk, sorry, my bad, I shouldn't have been mentioned there. Thanks for the ping. I've just fixed that: 22db5aa

@tobil4sk
Copy link
Contributor Author

tobil4sk commented Oct 7, 2022

Many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants