-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Warning promotion / Consistent handling of NULL bytes in file paths (WIP) #6216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I will try to fix the test failures after getting your opinion on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach is sound as we want consistency in this regard. 👍
e26c8da
to
8c838b6
Compare
8c838b6
to
9926710
Compare
Any particular reason this is still marked as a draft? |
Not anymore. ;) |
@@ -1633,8 +1634,8 @@ static void _dom_document_schema_validate(INTERNAL_FUNCTION_PARAMETERS, int type | |||
switch (type) { | |||
case DOM_LOAD_FILE: | |||
if (CHECK_NULL_PATH(source, source_len)) { | |||
php_error_docref(NULL, E_WARNING, "Invalid Schema file source"); | |||
RETURN_FALSE; | |||
zend_argument_value_error(1, "must not contain any null bytes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all three cases, you can also convert the source_len == 0
case to throw must not be empty
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was already planning to target these as well. Since this PR is specifically about handling of NULL bytes I did not want to add them now.
I will add them in a separate commit, that will not hurt.
@@ -1734,8 +1735,8 @@ static void _dom_document_relaxNG_validate(INTERNAL_FUNCTION_PARAMETERS, int typ | |||
switch (type) { | |||
case DOM_LOAD_FILE: | |||
if (CHECK_NULL_PATH(source, source_len)) { | |||
php_error_docref(NULL, E_WARNING, "Invalid RelaxNG file source"); | |||
RETURN_FALSE; | |||
zend_argument_value_error(1, "must not contain any null bytes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the source_len == 0 case above this one is still left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I missed that one, thanks. While giving it a second look, I also noticed other invalid path checks that should throw IMHO. I added these in a separate commit.
Also, I added some unit tests to improve coverage.
Not all extensions consistently throw exceptions when the user passes a path name containing null bytes. Also, some extensions would throw a ValueError while others would throw a TypeError. Error messages also varied. Now a ValueError is thrown after all failed path checks, at least for as far as these occur in functions that are exposed to userland.
8275214
to
9df1e34
Compare
php_error_docref(NULL, E_WARNING, "Invalid RelaxNG file source"); | ||
RETURN_FALSE; | ||
zend_argument_value_error(1, "is not a valid path"); | ||
RETURN_THROWS(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand right that this triggers for too long paths? In that case, we decided that this case should not throw, and rather be treated at the same error level as a non-existing file (grep "exceeds the maximum allowed length").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not really clear on what other things this could be checking. Path length is one of them, but going down the expand_filepath and virtual_file_ex rabbit hole, there's probably more to it than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _dom_get_valid_file_path()
method also appears to be doing platform specific checks for invalid paths, such as Windows systems that do not allow any *
or ?
in paths. Hence the generic "is not a valid path" message.
My rationale for making this throw is that it is not fundamentally different from throwing on null bytes. I generalized it to "nonsensical path names must throw".
I could add a separate check for long paths and make those warnings. It does feel like a strange special case to me though. The question is: why is a path containing illegal characters more nonsensical than a path that is too long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add a separate check for long paths and make those warnings. It does feel like a strange special case to me though. The question is: why is a path containing illegal characters more nonsensical than a path that is too long?
It isn't. Neither a too long path, nor invalid characters in the path should throw. These are environment error conditions and do not necessarily indicate programmer error. (Stepping away from DOM: If the user provides a path on the CLI, the programmer should not be forced to validate it against some platform-specific requirements that are probably not even documented anywhere before passing it to fopen, in order to avoid an exception in some edge case.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only argument that can be made here is that null bytes shouldn't be throwing either, which is certainly a possibility. Null bytes are somewhat special in that they indicate active malicious intent, and in that they are implicitly excluded from certain input sources. E.g. if you accept a file name as a command line argument, it cannot contain null bytes in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. The RFC mainly covers promotion of programming errors, which has a rather large gray area surrounding it. I do agree that null bytes are unlikely to originate from programming errors. I will adjust the PR to make these generate a warning in stead. Likewise for long or otherwise invalid paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue regarding null bytes: As @Girgias pointed out, the p
and P
argument parsing formats also throw on null bytes. I'm not sure for how long this has been the case and what to do with it now.
Not all extensions consistently throw exceptions when the user passes
a path name containing null bytes. Also, some extensions would throw
a ValueError while others would throw a TypeError. Error messages
also varied.
Now a ValueError is thrown after all failed path checks, at least for
as far as these occur in functions that are exposed to userland.