Skip to content

Fix bug #79836: Segfault in concat_function #10049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Dec 4, 2022

This fixes bug #79836 (bugs.php.net bugtracker).

Bug analysis

The following sequence of actions was happening in the bug report code which caused a null pointer dereference:

  1. debug_backtrace() returns an array
  2. The concatenation to $c will transform the array to a string via
    zval_get_string_func for op2 and output a warning.
    Note that zval op1 is of type string due to the first do-while
    sequence.
  3. The warning of an implicit "array to string conversion" triggers
    the ob_start callback to run. This code transform $c (==op1) to a long.
  4. The code below the 2 do-while sequences assume that both op1 and op2
    are strings, but this is no longer the case. A dereference of the
    string will therefore result in a null pointer dereference.

Test case reduction

I reduced the test case to the following:

ob_start ( function () use ( & $c ) { $c = 0 ;}, 1 );
$c .= [];
$c .= [];

I have also added a second test case for this bug which is about the same except that that one checks op2 instead of op1.

Finally, I also added a third test case to test a path in concat_function that wasn't tested before.

I gave credit via the Co-authored-by tag to the original bug reporter because I created my test code by reducing theirs. I hope I did this correctly :)

Solution

The solution used here is to work with the zend_string directly instead of with the ops.
The code is slightly more complicated now, and I wanted to make sure I did not introduce any performance regressions.
I wasn't too sure how to test this best, so I wrote a little micro-benchmark:

for ($i = 0; $i < 300_000; ++$i) {
  $tmp1 = 'abc' . 'def';
  $tmp2 = '' . 'def';
  $tmp3 = 'abc' . '';
  $tmp4 = 333 . 'aaa';
  $tmp5 = 'aaa' . 333;
  $tmp6 = $tmp1 . $tmp2 . $tmp3 . $tmp4 . $tmp5;
  $a = 'abc';
  $a .= $a;
}

I checked callgrind and it reports a tiny decrease in I refs:

  • I refs before this PR = 322,710,409
  • I refs after this PR = 320,892,922

The wall clock time is the same however (ran it 50 times and averaged it and the difference is negligible.

I also checked with running the artisan command from a Laravel project and the I refs also drop a tiny tiny amount, and the wall clock time is the same.

I therefore think there is no negative performance impact.

@nielsdos nielsdos changed the title Fix bug #79836 Fix bug #79836: Segfault in concat_function Dec 4, 2022
@nielsdos nielsdos force-pushed the fix-bug-79836 branch 2 times, most recently from 9d84e97 to 09e4545 Compare December 4, 2022 22:53
@nielsdos
Copy link
Member Author

nielsdos commented Dec 4, 2022

I had some trouble because the regression test produces the output Warning: Array to string conversion in Unknown on line 0, which I didn't get in the test output locally. I then tried setting error_reporting via a function call, but that didn't work. I noticed some other .phpt files used the INI section, so I'm trying that now.
Sorry for the noise.

@nielsdos
Copy link
Member Author

nielsdos commented Dec 5, 2022

There's a test failure in the tracing JIT for socket_import_stream: Test with multicasting [ext/sockets/tests/socket_import_stream-3.phpt], but I can't reproduce that locally. I'm also not seeing string concats in that test. If you have any information please let me know :)

Also now that I'm looking at this again, I believe there were actually 2 bugs: a type confusion and a UAF. I fixed the type confusion issue, but maybe not the user-after-free: If we are in the case `opX_string = Z_STR_P(opX);", and the string is not interned and dropped by a userland function, then I think we have a UAF. I'll look into this again in the evening.

@nielsdos nielsdos marked this pull request as draft December 5, 2022 20:55
@nielsdos nielsdos force-pushed the fix-bug-79836 branch 3 times, most recently from aa52ce3 to f3656c5 Compare December 7, 2022 19:01
@nielsdos nielsdos marked this pull request as ready for review December 7, 2022 20:18
@cmb69
Copy link
Member

cmb69 commented Dec 8, 2022

Thank you for the PR!

Since this is somewhat related to https://bugs.php.net/81705, I've checked that, and found that we're leaking memory with this patch when running the following script:

<?php

$arr = [0];
$my_var = str_repeat("a", 1);
set_error_handler(
    function() use(&$my_var) {
        echo("error\n");
        $my_var = 0x123;
    }
);
$my_var .= $GLOBALS["arr"];
var_dump($my_var);

Arguably, that's better than a segfault due to the type confusion, but maybe we can improve that; at least we need to make sure that there's no memory leak for working code.

@iluuu1994 iluuu1994 self-assigned this Dec 8, 2022
@nielsdos
Copy link
Member Author

nielsdos commented Dec 8, 2022

Hi Christoph, thank you for the review.
I think I know why the memleak is happening: I forgot to account for the EG(exception) case .
I will look at this in the evening and then fix it (hopefully), and also add that testcase you linked as a .phpt file.

--TEST--
Bug #79836 (Segfault in concat_function)
--INI--
error_reporting = E_ALL & ~E_WARNING
Copy link
Member

@iluuu1994 iluuu1994 Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be error_reporting = E_WARNING. Otherwise no warning is printed, thus the ob closure isn't called and $c isn't modified. But I think tests generally assume E_WARNING is enabled.


do {
if (UNEXPECTED(Z_TYPE_P(op1) != IS_STRING)) {
if (Z_ISREF_P(op1)) {
op1 = Z_REFVAL_P(op1);
if (Z_TYPE_P(op1) == IS_STRING) break;
if (Z_TYPE_P(op1) == IS_STRING) {
op1_string = zend_string_copy(Z_STR_P(op1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should avoid copying the string here and then only release the string when op1_string_from_func is set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this suggestion would introduce a use-after-free:

Suppose Z_STR_P(op1) is a non-interned string. It'll be assigned to op1_string.
If in the second do-while (for op2) we get to: zval_get_string_func(op2), then the userland function can decrease the refcount of op1_string, and if the only reference was from op1, we'd be accessing freed memory later down the line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I see a way to avoid the copy, by doing it in the second do-while.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old implementation essentially uses op1 in the happy path (when op1 is a string) and only uses op1_copy when new memory is allocated. There shouldn't be a use-after-free unless you actually free op1 (or the string is contains). It would be nice if we could avoid the needless copy here.

}
}
op1 = &op1_copy;
} else {
op1_string = zend_string_copy(Z_STR_P(op1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I don't think this should be copied.

Copy link
Member Author

@nielsdos nielsdos Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as in the above comment applies here I think.
Edit: I found a way to do it like you suggested.

}
}
op1 = &op1_copy;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd flip the condition (and UNEXPECTED)

ZVAL_UNDEF(&op1_copy);
ZVAL_UNDEF(&op2_copy);
zend_string *op1_string, *op2_string = NULL;
bool op1_string_from_func = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

op1_is_non_string?

}
} else if (UNEXPECTED(Z_STRLEN_P(op2) == 0)) {
if (EXPECTED(result != op1)) {
zend_string_release(op1_string);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this release should be handled at the end of the function, and only be executed when op1_string_from_func is set (as that would indicate it was allocated, maybe that variable needs a better name).

Copy link
Member Author

@nielsdos nielsdos Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not possible because of the use-after-free situation I described earlier I think.
I agree that the variable needs a better name though and I'll use your suggested name.
Edit: I found a way to do it like you suggested.

if (result == op1) {
/* As result == op1, op1_string is equal to Z_STR_P(result) and we thus we have a double reference.
* We should drop it before extending such that it can be extended in place in most cases. */
if (!op1_string_from_func) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both releases here should go away with proper memory management above.

Copy link
Member Author

@nielsdos nielsdos Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning applies as above comment I think.
Edit: I found a way to do it like you suggested.

@nielsdos
Copy link
Member Author

nielsdos commented Dec 8, 2022

Thanks for the review, I applied your suggestions and pushed the branch.
I notice some failures with the following diff for the newly added test bug79836.phpt, usually in the tracing JIT test but not the regular test:

001+ Warning: Array to string conversion in Unknown on line 0
002+ 
003+ Warning: Array to string conversion in Unknown on line 0

This seems to indicate ob_start failed somehow?
However, I cannot reproduce that on my own Linux machine.
Should the test be skipped if ob_start returns false or am I missing something?

@cmb69
Copy link
Member

cmb69 commented Dec 9, 2022

This seems to indicate ob_start failed somehow?

I rather guess that this conversion occurs during compilation (optimization). You probably need to enable OPcache (opcache.enable_cli etc.) to reproduce. You can check the opcodes (see https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#obtaining-opcode-dumps on how to obtain opcode dumps).

@nielsdos
Copy link
Member Author

nielsdos commented Dec 9, 2022

You are right. The literal compaction optimisation pass caused the array to string conversion warning at compile time. That caused the ob_start callback to also not trigger.
I tried a couple of different things to avoid this but settled on disabling that particular optimisation for the relevant tests.
There is one failing test in the last CI run: an oci_connect test, it timed out but I think it is unrelated.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes!

There is one failing test in the last CI run: an oci_connect test, it timed out but I think it is unrelated.

Yes, unfortunately we got quite a few flaky tests 🙁

if (result == op1) {
if (op1_is_non_string_or_copy) {
i_zval_ptr_dtor(result);
op1_is_non_string_or_copy = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how these values are used, maybe a better name would be free_op1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's indeed better. Actually I'd prefer free_op1_string to make it clear it's about the string, not about the op.

/* account for the case where result == op1 == op2 and the realloc is done */
if (op1_string == op2_string) {
op2_string = result_str;
op2_is_non_string_or_copy = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can never be true here, maybe we can verify this with an assert instead? If this were true we'd need to release it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually this might be true for constants in CVs.

$l = false;
$r = false;
$x = $l . $r;

Because a static empty string is returned for those. In this case we don't need a release, but I'm not sure if it's not more confusing not to add it. Maybe there's also another case I'm not thinking of.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, and I came to the conclusion that it is possible to get here using the following reasoning and piece of code.

Reasoning
This can only be true if line 1902 is executed.
That line requires op2_string to be NULL in order to be executed. That means that line 1870 did not execute. That means !(result == op1 && op1 == op2) which means result != op1 || op1 != op2
That means if we get to the branch starting at line 1948, result == op1 && op1 != op2 must hold.

Code

$global_non_interned_string = str_repeat("a", 3);

class Test {
        public function __toString() {
                global $global_non_interned_string;
                return $global_non_interned_string;
        }
}

$test1 = new Test;
$test2 = new Test;
$test1 .= $test2;

echo $test1 . "\n";

The current code has a memleak now because the reference to op2_string must be released before re-assigning. I'll fix this and add this test code as an additional test case.

Your example doesn't work because result != op1, which is a requirement to get in that branch.

@nielsdos
Copy link
Member Author

nielsdos commented Dec 9, 2022

Thanks for the review once again! Just another quick question: this PR will fix https://bugs.php.net/bug.php?id=81705 too (and maybe there are other related issues?). Do I need to change the commit name to also mention that issue, or what is usually done in this case?

nielsdos added 7 commits May 12, 2023 22:27
The following sequence of actions was happening which caused a null
pointer dereference:
1. debug_backtrace() returns an array
2. The concatenation to $c will transform the array to a string via
   `zval_get_string_func` for op2 and output a warning.
   Note that zval op1 is of type string due to the first do-while
   sequence.
3. The warning of an implicit "array to string conversion" triggers
   the ob_start callback to run. This code transform $c (==op1) to a long.
4. The code below the 2 do-while sequences assume that both op1 and op2
   are strings, but this is no longer the case. A dereference of the
   string will therefore result in a null pointer dereference.

The solution used here is to work with the zend_string directly instead
of with the ops.
Co-authored-by: changochen1@gmail.com
Co-authored-by: cmbecker69@gmx.de
Co-authored-by: yukik@risec.co.jp
This tests the path op1_string_from_func == true, result == op1,
op1 == op2 in the do-while, which wasn't tested before.
This tests for the case where result == op1, op1 != op2 and op1_string
and op2_string are non-interned strings that are equal.
@nielsdos nielsdos changed the base branch from PHP-8.1 to master May 13, 2023 15:02
@nielsdos
Copy link
Member Author

I did a revision of this after a long time. I changed the branch to master because:

  • it is not a critical bug
  • string concatenation is an important core functionality, so we must be careful changing this

According to benchmarks there is no regression (there's even a tiny improvement in Zend/bench.php).

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nielsdos nielsdos closed this in 727e26f May 16, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
free_op2_string may be set to false when op1 == op2, by re-using the same string
for both operands. In that case, the string should still be copied to result
because result is not actually a string.

Introduced by phpGH-10049
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
free_op2_string may be set to false when op1 == op2, by re-using the same string
for both operands. In that case, the string should still be copied to result
because result is not actually a string.

Introduced by phpGH-10049
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
free_op2_string may be set to false when the operands are not strings, and
`result == op1 == op2`, by re-using the same string for both operands. In that
case, the string should still be copied to result because result is not actually
a string. Also change the op1 branch to stay consistent.

Introduced by phpGH-10049
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
free_op2_string may be set to false when the operands are not strings, and
`result == op1 == op2`, by re-using the same string for both operands. In that
case, the string should still be copied to result because result is not actually
a string. Also change the op1 branch to stay consistent.

Introduced by phpGH-10049
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
iluuu1994 added a commit that referenced this pull request May 22, 2023
free_op2_string may be set to false when the operands are not strings, and
`result == op1 == op2`, by re-using the same string for both operands. In that
case, the string should still be copied to result because result is not actually
a string. Also change the op1 branch to stay consistent.

Introduced by GH-10049
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request May 22, 2023
iluuu1994 added a commit that referenced this pull request May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants