Skip to content

Warning promotion / Consistent handling of NULL bytes in file paths (WIP) #6216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

dtakken
Copy link

@dtakken dtakken commented Sep 25, 2020

Not all extensions consistently throw exceptions when the user passes
a path name containing null bytes. Also, some extensions would throw
a ValueError while others would throw a TypeError. Error messages
also varied.

Now a ValueError is thrown after all failed path checks, at least for
as far as these occur in functions that are exposed to userland.

@dtakken
Copy link
Author

dtakken commented Sep 25, 2020

I will try to fix the test failures after getting your opinion on this.

@dtakken dtakken marked this pull request as draft September 25, 2020 14:43
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach is sound as we want consistency in this regard. 👍

@dtakken dtakken force-pushed the consistent_null_path branch from e26c8da to 8c838b6 Compare September 28, 2020 10:39
@dtakken dtakken force-pushed the consistent_null_path branch from 8c838b6 to 9926710 Compare September 28, 2020 11:09
@nikic
Copy link
Member

nikic commented Sep 28, 2020

Any particular reason this is still marked as a draft?

@dtakken dtakken marked this pull request as ready for review September 28, 2020 12:49
@dtakken
Copy link
Author

dtakken commented Sep 28, 2020

Any particular reason this is still marked as a draft?

Not anymore. ;)

@@ -1633,8 +1634,8 @@ static void _dom_document_schema_validate(INTERNAL_FUNCTION_PARAMETERS, int type
switch (type) {
case DOM_LOAD_FILE:
if (CHECK_NULL_PATH(source, source_len)) {
php_error_docref(NULL, E_WARNING, "Invalid Schema file source");
RETURN_FALSE;
zend_argument_value_error(1, "must not contain any null bytes");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all three cases, you can also convert the source_len == 0 case to throw must not be empty.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was already planning to target these as well. Since this PR is specifically about handling of NULL bytes I did not want to add them now.

I will add them in a separate commit, that will not hurt.

@@ -1734,8 +1735,8 @@ static void _dom_document_relaxNG_validate(INTERNAL_FUNCTION_PARAMETERS, int typ
switch (type) {
case DOM_LOAD_FILE:
if (CHECK_NULL_PATH(source, source_len)) {
php_error_docref(NULL, E_WARNING, "Invalid RelaxNG file source");
RETURN_FALSE;
zend_argument_value_error(1, "must not contain any null bytes");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the source_len == 0 case above this one is still left.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I missed that one, thanks. While giving it a second look, I also noticed other invalid path checks that should throw IMHO. I added these in a separate commit.

Also, I added some unit tests to improve coverage.

Dik Takken added 4 commits September 28, 2020 21:36
Not all extensions consistently throw exceptions when the user passes
a path name containing null bytes. Also, some extensions would throw
a ValueError while others would throw a TypeError. Error messages
also varied.

Now a ValueError is thrown after all failed path checks, at least for
as far as these occur in functions that are exposed to userland.
@dtakken dtakken force-pushed the consistent_null_path branch from 8275214 to 9df1e34 Compare September 28, 2020 19:43
php_error_docref(NULL, E_WARNING, "Invalid RelaxNG file source");
RETURN_FALSE;
zend_argument_value_error(1, "is not a valid path");
RETURN_THROWS();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand right that this triggers for too long paths? In that case, we decided that this case should not throw, and rather be treated at the same error level as a non-existing file (grep "exceeds the maximum allowed length").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not really clear on what other things this could be checking. Path length is one of them, but going down the expand_filepath and virtual_file_ex rabbit hole, there's probably more to it than that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _dom_get_valid_file_path() method also appears to be doing platform specific checks for invalid paths, such as Windows systems that do not allow any * or ? in paths. Hence the generic "is not a valid path" message.

My rationale for making this throw is that it is not fundamentally different from throwing on null bytes. I generalized it to "nonsensical path names must throw".

I could add a separate check for long paths and make those warnings. It does feel like a strange special case to me though. The question is: why is a path containing illegal characters more nonsensical than a path that is too long?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add a separate check for long paths and make those warnings. It does feel like a strange special case to me though. The question is: why is a path containing illegal characters more nonsensical than a path that is too long?

It isn't. Neither a too long path, nor invalid characters in the path should throw. These are environment error conditions and do not necessarily indicate programmer error. (Stepping away from DOM: If the user provides a path on the CLI, the programmer should not be forced to validate it against some platform-specific requirements that are probably not even documented anywhere before passing it to fopen, in order to avoid an exception in some edge case.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only argument that can be made here is that null bytes shouldn't be throwing either, which is certainly a possibility. Null bytes are somewhat special in that they indicate active malicious intent, and in that they are implicitly excluded from certain input sources. E.g. if you accept a file name as a command line argument, it cannot contain null bytes in the first place.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. The RFC mainly covers promotion of programming errors, which has a rather large gray area surrounding it. I do agree that null bytes are unlikely to originate from programming errors. I will adjust the PR to make these generate a warning in stead. Likewise for long or otherwise invalid paths.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue regarding null bytes: As @Girgias pointed out, the p and P argument parsing formats also throw on null bytes. I'm not sure for how long this has been the case and what to do with it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants