Fix #79423: copy command is limited to size of file it can copy #5319


Closed · wants to merge 3 commits

Conversation


@cmb69 cmb69 commented Mar 29, 2020

Passing NULL as lpFileSizeHigh to GetFileSize() gives wrong
results for files larger than 0xFFFFFFFF bytes. We fix this by using
GetFileSizeEx(), and proceed with MIN(file_size, SIZE_MAX).

@cmb69 cmb69 changed the base branch from master to PHP-7.3 March 29, 2020 12:57

cmb69 commented Mar 29, 2020

While I think this fix generally makes sense (or at least there should be an assertion regarding the size limit), we may want to catch this earlier:

/* For now, we impose an arbitrary limit to avoid
* runaway swapping when large files are passed through. */
if (length > 4 * 1024 * 1024) {
return NULL;
}

In this case length == 0, which could easily be added to the condition. However, that would eschew mmapping even for small files.

Also, I'm not sure about the test case, which runs for quite a while.

@cmb69 cmb69 added the Bug label Mar 30, 2020
# else
if (file_size.HighPart) {
/* file is too large; process as much as possible */
size = SIZE_MAX;
Member

Will this still copy the rest as a separate chunk, or produce a partial copy?

Member Author

That would copy only SIZE_MAX bytes, but copy() returns false in that case.

Member Author

What would happen on a 32bit POSIX system? Would stat() fail for 4GiB files?

Member Author

At least according to POSIX it should fail (likely already for >= 2 GiB), so I matched the implementation; that doesn't make a real difference in practice, though.


nikic commented Mar 30, 2020

While I think this fix makes generally sense, or at least there should be an assertion regarding the size limit, we may want to catch this earlier:

Note that this check is gone in PHP >= 7.4.

Generally, this code is flawed. We should not make the assumption that we can map the whole file at once. On 32-bit we are likely to run out of virtual memory even before reaching SIZE_MAX.

What we should be doing here is to perform the copy using multiple chunks that are large, but still significantly below SIZE_MAX (say 1G). That would of course require changes to the stream mmap interface, as we'd have to distinguish between whether the file was fully mapped or not.


cmb69 commented Mar 30, 2020

I agree that we should not attempt to mmap whole huge files here. However, where that fails, it's apparently not that bad; it's worse for systems where it succeeds, but causes swapping (I tried to copy a 4GB file on a machine which doesn't have 4GB free memory, and the x86 build was way faster than the x64 build). I think that this is a different issue from getting the wrong file size.

That would of course require changes to the stream mmap interface, as we'd have to distinguish between whether the file was fully mapped or not.

Sorry, I do not understand. Wouldn't we rather have to call php_stream_mmap_range() repeatedly to get the individual chunks?


nikic commented Mar 30, 2020

Sorry, I do not understand. Wouldn't we rather have to call php_stream_mmap_range() repeatedly to get the individual chunks?

What I had in mind here is that we're passing length=0 to let the mmap code determine the appropriate size. But I guess we could just as well pass 1G in as length, and the implementation would truncate this down to the actually available size, in which case we would determine whether to perform another mmap by checking whether the actual size matches the passed size.

I agree that we should not attempt to mmap whole huge files here. However, where that fails, it's apparently not that bad; it's worse for systems where it succeeds, but causes swapping (I tried to copy a 4GB file on a machine which doesn't have 4GB free memory, and the x86 build was way faster than the x64 build). I think that this is a different issue from getting the wrong file size.

That's pretty odd; is this a Windows-ism? I removed the size check on the assumption that doing a non-private mmap with physical file backing would not swap.


cmb69 commented Mar 30, 2020

But I guess we could just as well pass 1G in as length, and the implementation would truncate this down to the actually available size, in which case we would determine whether to perform another mmap by checking whether the actual size matches the passed size.

Yes, that would work. And at least _php_stream_copy_to_stream_ex likely already knows the size of the file.

That's pretty odd; is this a Windows-ism?

May very well be. Anyhow, that check would not have helped (because maxlen == 0).

Passing `NULL` as `lpFileSizeHigh` to `GetFileSize()` gives wrong
results for files larger than 0xFFFFFFFF bytes.  We fix this by using
`GetFileSizeEx()`, and proceed with `MIN(file_size, SIZE_MAX)`.

cmb69 commented Sep 4, 2020

Generally, this code is flawed. We should not make the assumption that we can map the whole file at once. On 32-bit we are likely to run out of virtual memory even before reaching SIZE_MAX.

I've finally managed to address this issue; in my opinion it is not related to bug 79423 (which this PR addresses), so I've filed it as https://bugs.php.net/80061, and submitted PR #6078.

The test is too resource-hungry for CI, and likely also for other test
environments, since it creates a 4GB file which is fully copied.
POSIX mandates that `stat()` should fail if the file size in bytes
cannot be correctly represented in the stat buffer's `st_size` member.
We match that behavior, except that we basically treat `st_size` as
unsigned here.

We also fix a temporary resource leak, introduced by a recent commit.

cmb69 commented Sep 22, 2020

Any objections to merging this?

@php-pulls php-pulls closed this in 4000780 Sep 22, 2020
@cmb69 cmb69 deleted the cmb/79423 branch September 22, 2020 11:33