-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix #79423: copy command is limited to size of file it can copy #5319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
While I think this fix makes generally sense, or at least there should be an assertion regarding the size limit, we may want to catch this earlier: Lines 32 to 36 in 4e330d3
In this case Also, I'm not sure about the test case, which runs for quite a while. |
main/streams/plain_wrapper.c
Outdated
# else | ||
if (file_size.HighPart) { | ||
/* file is too large; process as much as possible */ | ||
size = SIZE_MAX; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this still copy the rest as a separate chunk, or produce a partial copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would copy only SIZE_MAX
bytes, but copy() returns false in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen on a 32bit POSIX system? Would stat()
fail for 4GiB files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least according to POSIX it should fail (likely for >= 2 GiB already), so I matched the implementation; that doesn't make a real differrence in practise, though.
Note that this check is gone in PHP >= 7.4. Generally, this code is flawed. We should not make the assumption that we can map the whole file at once. On 32-bit we are likely to run out of virtual memory even before reaching SIZE_MAX. What we should be doing here is to perform the copy using multiple chunks that are large, but still significantly below SIZE_MAX (say 1G). That would of course require changes to the stream mmap interface, as we'd have to distinguish between whether the file was fully mapped or not. |
I agree that we should not attempt to mmap whole huge files here. However, where that fails, it's apparently not that bad; it's worse for systems where it succeeds, but causes swapping (I tried to copy a 4GB file on a machine which doesn't have 4GB free memory, and the x86 build was way faster than the x64 build). I think that this is a different issue from getting the wrong file size.
Sorry, I do not understand. Wouldn't we rather have to call |
What I had in mind here is that we're passing length=0 to let the mmap code determine the appropriate size. But I guess we could just as well pass 1G in as length, and the implementation would truncate this down to the actually available size, in which case we would determine whether to perform another mmap by checking whether the actual size matches the passed size.
That's pretty odd, is this a Windows-ism? I removed the size check on the assumption that doing a non-private mmap with physical file backing would not swap. |
Yes, that would work. And at least _php_stream_copy_to_stream_ex likely already knows the size of the file.
May very well be. Anyhow, that check would not have helped (because |
Passing `NULL` as `lpFileSizeHigh` to `GetFileSize()` gives wrong results for files larger than 0xFFFFFFFF bytes. We fix this by using `GetFileSizeEx()`, and proceed with `MIN(file_size, SIZE_MAX)`.
I've finally managed to address this issue; in my opinion it is not related to bug 79423 (which this PR addresses), so I've filed it as https://bugs.php.net/80061, and submitted PR #6078. |
The test is too resource hungry for CI, and also not unlikely for other test environments, since we're creating a 4GB file which is fully copied.
POSIX mandates that `stat()` should fail, if the file size in bytes cannot be correctly represented in the stat buffer's `st_size` member. We match that behavior, except that we basically treat `st_size` as unsigned here. We also fix a temporary resource leak, introduced by a recent commit.
Any objections to merge that? |
Passing
NULL
aslpFileSizeHigh
toGetFileSize()
gives wrongresults for files larger than 0xFFFFFFFF bytes. We fix this by using
GetFileSizeEx()
, and proceed withMIN(file_size, SIZE_MAX)
.