Skip to content

Fix UAF issues with PCRE after request shutdown #15205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Aug 2, 2024

There are two related issues, each tested.

First problem:
What happens is that on the CLI SAPI we have a per-request pcre cache, and on there the request shutdown for the pcre module happens prior to the remaining live object destruction. So when the SPL object wants to clean up the regular expression object it gets a use-after-free.

Second problem:
Very similarly, the non-persistent resources are destroyed after request shutdown, so on the CLI SAPI the pcre request cache is already gone, but if a userspace stream references a regex in the pcre cache, this breaks.

Two things that come immediately to mind:

  • We could fix it by no longer treating the CLI SAPI special and just use the same lifecycle as the module. This simplifies the pcre module code a bit too. I wonder why we even have the separation in the first place. This was here in the first place to be able to store non-interned request strings in the table. The downside here is that we're using more the system allocator than Zend's allocator for cache entries.
  • We could modify the shutdown code to not remove regular expressions with a refcount>0 and modify php_pcre_pce_decref code such that it becomes php_pcre_pce_decref's job to clean up when the refcount becomes 0 during shutdown. However, this gets nasty quickly.

I chose the first solution here as it should be reliable and simple.

Closes GH-15064.

There are two related issues, each tested.

First problem:
What happens is that on the CLI SAPI we have a per-request pcre cache,
and on there the request shutdown for the pcre module happens prior to
the remaining live object destruction. So when the SPL object wants to
clean up the regular expression object it gets a use-after-free.

Second problem:
Very similarly, the non-persistent resources are destroyed after request
shutdown, so on the CLI SAPI the pcre request cache is already gone, but
if a userspace stream references a regex in the pcre cache, this breaks.

Two things that come immediately to mind:
  -  We could fix it by no longer treating the CLI SAPI special and just use
     the same lifecycle as the module. This simplifies the pcre module code
     a bit too. I wonder why we even have the separation in the first place.
     The downside here is that we're using more the system allocator
     than Zend's allocator for cache entries.
  -  We could modify the shutdown code to not remove regular expressions
     with a refcount>0 and modify php_pcre_pce_decref code such that it
     becomes php_pcre_pce_decref's job to clean up when the refcount
     becomes 0 during shutdown. However, this gets nasty quickly.

I chose the first solution here as it should be reliable and simple.

Closes phpGH-15064.
@nielsdos
Copy link
Member Author

nielsdos commented Aug 3, 2024

I made a script that fills the cache with request-allocated strings:

<?php
$f = fopen('in.php', 'w');
fwrite($f, '<?php'."\n");
for ($counter = 0; $counter < 10000; $counter++)
	fwrite($f, '\preg_match("/'.str_repeat("a", 100).$counter.'/", "subject");'."\n");
fclose($f);

Then I ran in.php with and without this patch (php_new is this PR):

Benchmark 1: ./sapi/cli/php in.php
  Time (mean ± σ):     124.7 ms ±   2.7 ms    [User: 115.5 ms, System: 8.6 ms]
  Range (min … max):   122.5 ms … 134.1 ms    24 runs
 
Benchmark 2: ./sapi/cli/php_new in.php
  Time (mean ± σ):     126.2 ms ±   2.1 ms    [User: 117.4 ms, System: 8.1 ms]
  Range (min … max):   124.7 ms … 134.5 ms    23 runs
 
Summary
  ./sapi/cli/php in.php ran
    1.01 ± 0.03 times faster than ./sapi/cli/php_new in.php

Callgrind numbers:
before this PR: 1,250,625,712
after this PR: 1,263,046,201
~0.98% difference

@iluuu1994
Copy link
Member

@nielsdos The benchmark output says you ran in.php, just making sure that's not by accident? Also, wouldn't the biggest difference here come from using non-literal strings? I.e. strings constructed at runtime in the out.php script? Because they would have to be reallocated as persistent strings.

@nielsdos
Copy link
Member Author

nielsdos commented Aug 3, 2024

@iluuu1994

The benchmark output says you ran in.php, just making sure that's not by accident?

I meant to write in.php, not out.php, in the description, but the measurements are right. It's late :-)

Also, wouldn't the biggest difference here come from using non-literal strings? I.e. strings constructed at runtime in the out.php script? Because they would have to be reallocated as persistent strings.

I'm running without opcache, so the strings are not persistent, and the check is against the IS_STR_PERMANENT. So we really are reallocating.

I previously tried to bench something like this:

<?php
for ($counter = 0; $counter < 10000; $counter++)
	\preg_match("/" . str_repeat("a", 100) . "$counter/", "subject");

which will certainly not be persistent even with opcache, but then I realised that the string concat overhead and for loop overhead will be counted too. So I wanted to test a scenario where I had no such overhead so that the percentage difference is the most interpretable, which is why I ended up with the test script I posted in my previous comment.

@iluuu1994
Copy link
Member

I'm running without opcache, so the strings are not persistent, and the check is against the IS_STR_PERMANENT. So we really are reallocating.

Ah ok, thanks for the clarification. The 1% for this very synthetic benchmark shouldn't enough of a reason to bother with a different solution. Optimally, we could allocate interned strings as persistent in CLI while counting them towards the memory limit, which would avoid the realloc. But that's not something to do here.

Comment on lines -507 to -509
if (PCRE_G(per_request_cache)) {
zend_hash_destroy(&PCRE_G(pcre_cache));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the description and what I've understood at so far... couldn't you re-initialize this to an empty table after the zend_hash_destroy? No more use-after-free, no other code changes necessary because it's still got a valid hash table in it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change described as-is would only work for the resource test. The spl test would still cause a uaf because it references the pcre cache entry directly. That would be solvable by always looking up the instance from the cache though.
However, even with the uaf out of the way, the issue still wouldn't be fixed as the pcre entry no longer exists, so then we end up with an exception on cli while the code works fine on non-cli SAPIs. We could fix that by allowing to re-add entries to the hash table, but that is all messy and complicated and opens the question of who needs to clean up the cache entries after re-adding them after they were removed in the first place.

@iluuu1994
Copy link
Member

The uaf is not on the hash table itself, but on the pcre_cache_entry. But it might indeed be possible to use the existing refcount in php_free_pcre_cache() instead of always assuming the memory can be freed?

@nielsdos
Copy link
Member Author

nielsdos commented Aug 6, 2024

The uaf is not on the hash table itself, but on the pcre_cache_entry. But it might indeed be possible to use the existing refcount in php_free_pcre_cache() instead of always assuming the memory can be freed?

Maybe, I investigated this a bit a month ago or so, so I don't remember the details but it got a bit messy to have both hash table destruction and refcount-based destruction because when the refcount reaches zero we shouldn't remove the cache entry otherwise the cache would almost always be empty. So it requires more changes than just that.

@nielsdos
Copy link
Member Author

nielsdos commented Aug 6, 2024

Another problem I just realized is that executing the matching may recreate the cached unmatched pair zvals, leaking request memory.

@nielsdos
Copy link
Member Author

@iluuu1994 I fixed the remaining leak issue and updated the test to check for this. May you please have a final look?

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct to me.

@nielsdos
Copy link
Member Author

Merged as ded8fb7. I'll let it sit in master and see to backport it (modulo struct changes) later when it has had time to sit and wait.

@nielsdos nielsdos closed this Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PCRE pattern cache crashes process in wrapper stream_close called after shutdown
3 participants