Skip to content

Fix #81243: Too much memory is allocated for preg_replace() #7231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Jul 12, 2021

If new_len > alloc_len, we had a serious issue anyway. The
comparison only makes some sense if new_len == alloc_len, but in that
case we don't need to re-allocate. Only if new_len == 0 we need to
allocate an empty zend_string. However, trimming a potentially over-
allocated string appears to be reasonable, so we drop the condition
altogether.

If `new_len > alloc_len`, we had a serious issue anyway.  The
comparison only makes some sense if `new_len == alloc_len`, but in that
case we don't need to *re*-allocate.  Only if `new_len == 0` we need to
allocate an empty zend_string.  However, trimming a potentially over-
allocated string appears to be reasonable, so we drop the condition
altogether.
@cmb69 cmb69 added the Bug label Jul 12, 2021
@cmb69
Copy link
Member Author

cmb69 commented Jul 12, 2021

Oh, I just noticed that php_pcre_replace_func_impl() has the same issue.

@nikic
Copy link
Member

nikic commented Jul 12, 2021

If new_len > alloc_len, we had a serious issue anyway. The
comparison only makes some sense if new_len == alloc_len, but in that
case we don't need to re-allocate. Only if new_len == 0 we need to
allocate an empty zend_string.

I don't really follow. Why would this only happen if new_len == alloc_len? This should happen whenever the last (non-replaced) part of the string becomes larger than the allocation.

However, trimming a potentially over-
allocated string appears to be reasonable, so we drop the condition
altogether.

Right. I wonder if we should leave some leeway though? It's not so important to shrink the string if it's already close enough in size. From a quick look I see size < alloc_size/2 and alloc_size-size > 16 being used as truncation criteria occasionally. Not sure how relevant this would be in practice though. Maybe just always reallocating is fine.

I also want to note that the reallocation scheme in preg_replace looks pretty broken. It does alloc_size = alloc_size + 2*new_len, which means that memory usage is approximately tripled on each reallocation, rather than the doubling that was probably intended. At the same time, this always builds up the allocation from zero, while it would probably be more reasonable to base it off the original string size.

@cmb69
Copy link
Member Author

cmb69 commented Jul 12, 2021

I don't really follow.

Sorry, that was plain nonsense.

From a quick look I see size < alloc_size/2 and alloc_size-size > 16 being used as truncation criteria occasionally.

I wouldn't be happy with the former, since that may still be quite wasteful. The latter seems to be okay, but might not be that relevant in practice.

It does alloc_size = alloc_size + 2*new_len, which means that memory usage is approximately tripled on each reallocation, rather than the doubling that was probably intended.

Right; that should be fixed.

At the same time, this always builds up the allocation from zero, while it would probably be more reasonable to base it off the original string size.

I'm not sure about that. What if the result is much shorter than the original string? It probably boils down to whether it's better to have more smaller allocations or to have fewer larger allocations. Maybe leave this for now?

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this for now.

@cmb69 cmb69 closed this in a6b4308 Jul 12, 2021
@cmb69 cmb69 deleted the cmb/81243 branch July 12, 2021 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants