Skip to content

Fix #69279: Compressed ZIP Phar extractTo() creates garbage files #6599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Jan 12, 2021

When extracting compressed files from an uncompressed Phar, we must not
use the direct file pointer, but rather get an uncompressed file
pointer.

This also fixes #79912, which appears to be a duplicate of #69279.

When extracting compressed files from an uncompressed Phar, we must not
use the direct file pointer, but rather get an uncompressed file
pointer.

This also fixes #79912, which appears to be a duplicate of #69279.
@cmb69 cmb69 added the Bug label Jan 12, 2021
Copy link
Contributor

@afilina afilina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read both bug reports. I only have one small comment change suggestion, as it seemed like a forgotten copy-paste artifact. Otherwise, LGTM.

Co-authored-by: Anna Filina <afilina@gmail.com>
@cmb69
Copy link
Member Author

cmb69 commented Jan 18, 2021

@bishopb, what do you think about this?

@nikic
Copy link
Member

nikic commented Jan 18, 2021

To check my understanding of how these APIs work: phar_get_efp() can return either a compressed or uncompressed FP. Calling phar_open_entry_fp() will convert it into an uncompressed FP. Is that right?

@cmb69
Copy link
Member Author

cmb69 commented Jan 18, 2021

phar_get_efp() can return either a compressed or uncompressed FP. Calling phar_open_entry_fp() will convert it into an uncompressed FP. Is that right?

Yes.

@nikic
Copy link
Member

nikic commented Jan 18, 2021

Heh, that seems like ... not so great API design. Can we run into the reverse issue afterwards, where some code expects phar_get_efp() to return the compressed FP but will now get the uncompressed one?

@cmb69
Copy link
Member Author

cmb69 commented Jan 18, 2021

To verify this, I created a simple ZIP with a deflated entry bar.txt followed by a stored entry foo.txt. Extracting produced a correctly uncompressed bar.txt, but a garbage foo.txt. Took me a while to find out the foo.txt was extracted from a wrong file offset. The offset would also be wrong without this patch. I think I need to look into this issue first.

@cmb69 cmb69 marked this pull request as draft January 18, 2021 14:03
@cmb69
Copy link
Member Author

cmb69 commented Jan 18, 2021

The offset would also be wrong without this patch. I think I need to look into this issue first.

I found that is due to different extra field lengths in the local file header and the central directory header for (some?) ZIPs created by 7z or Info-ZIP's zip (which may be valid or not according to the specification). The offset of the file data in the archive is determined by the extra field length of the central directory header, so may not actually be correct. I don't know why the lengths differ, but unless someone files a bug report about that, I won't investigate further.

phar_get_efp() can return either a compressed or uncompressed FP. Calling phar_open_entry_fp() will convert it into an uncompressed FP. Is that right?

Not quite; I was wrong, previouly. Actually, at least for ZIPs Phar starts with an PHAR_FP (which handles uncompressed data). Calling phar_open_entry_fp() will create an additional PHAR_UFP (which uncompresses), if the entry has to be uncompressed. For further entries, phar_get_efp() returns the PHAR_FP again, and phar_open_entry_fp() will then return the already opened PHAR_UFP (if compression needs to be done).

I've added a test which extracts a deflated, a stored and a deflated entry.

@cmb69 cmb69 marked this pull request as ready for review January 18, 2021 17:12
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough investigation, this makes more sense to me now.

@php-pulls php-pulls closed this in 68f5289 Jan 19, 2021
@cmb69 cmb69 deleted the cmb/69279 branch January 19, 2021 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants