Skip to content

PHPC-1839 Fix accessing referenced strings #1223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 16, 2021
Merged

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented May 31, 2021

PHPC-1839
Link to evergreen patch: https://spruce.mongodb.com/version/60b4885c2fbabe7d9b6af541

This PR contains tests and a fix for the bug described in the original ticket.
This PR contains a test and a workaround for the issue described above. Our array helper, in this case php_array_zval_to_string, call zval_copy_ctor to create a copy of the provided zval and converts this copy to a string using convert_to_string. The *pfree output argument is set to false for non-interned strings, true for interned strings.

When dealing with strings this is fine, but for references this causes an issue. In the case of the original bug, the user uses php-di to create their client. Due to its implementation, the type map being stored is an array with references to non-interned strings (both with a refcount of 1).

When this reference is passed into php_array_zval_to_string, no special handling is given, so zval_copy_ctor ends up increasing the refcount of the value (in this case the reference to the non-interned string). We then call convert_to_string, which for references calls zend_unwrap_reference. In there, the refcount of the value is decreased and a copy of the referenced value (Z_REFVAL_P(op)) is created. This copy is then converted to a string, and php_array_zval_to_string will set *pfree to true as the string is not interned. When the value is freed, the original data ends up being corrupted, and running in debug mode also reveals a memory leak. The behaviour and resulting leak suggest that we free the wrong string, freeing the original string but leaving the copy in place. Since I'm not too familiar with the zval handling, I'd leave it to @jmikola or @sgolemon to confirm this, but the following test output highlights the problem:

Before:
array(2) refcount(2){
  ["root"]=>
  string(5) "array" refcount(1)
  ["document"]=>
  string(5) "array" refcount(1)
}
After:
array(2) refcount(2){
  ["root"]=>
  string(5) "��k��" refcount(2)
  ["document"]=>
  string(5) "��k��" refcount(2)
}

I'd expect the refcount of the root and document fields to be the same, but there's an additional reference to that somewhere and the data has been garbled.

In our case, handling the zval reference type ourselves in the array helper solves this problem. We grab the referenced zval and work on that, similar to how most other php functions handle references. This closes the memory leak and also frees the correct string.

@sgolemon I'd appreciate your input on the issue and the fix/workaround in the array helpers. Please let me know if I should contribute the patch to an upstream repo for the helper. As for the potential underlying PHP issue, I have yet to dive into the php source code and get its tests to run locally. I can then try to reproduce the behaviour in PHP to confirm, but I suspect that calling convert_to_string on such a zval will cause the same issue.

@alcaeus alcaeus requested review from jmikola and sgolemon May 31, 2021 06:51
@alcaeus alcaeus self-assigned this May 31, 2021
<?php
require_once __DIR__ . "/../utils/basic.inc";

function createTypemap()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a function here to reproduce the behaviour of php-di. Notably this produces a different zval than in test 002 (see below).

Before:
array(2) refcount(2){
["root"]=>
string(5) "array" refcount(1)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite us assigning a reference, this is apparently converted by PHP once the variable goes out of scope. I was able to confirm this on all PHP versions: https://3v4l.org/12fdR. For this reason, I decided to keep both tests to make sure we're not running into strange behaviour anymore; this took enough time to discover already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered that this is indeed intended: debug_zval_dump only shows the reference indicator (&) for references with a refCount > 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify if the value of "root" (and other keys) here are still references, and this is just a matter of the behavior of php/php-src@2eb980f deciding not to render them as such?

If so, perhaps note that in a comment on the first call to debug_zval_dump() in the test.


Based on my reading of Dereferencing and Unwrapping, the unwrapping in php_array_api.h should only affect the internal zval. I'd expect the reference in the $typeMap array to remain as-is.

@alcaeus alcaeus requested a review from jmikola June 2, 2021 10:46
Before:
array(2) refcount(2){
["root"]=>
string(5) "array" refcount(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify if the value of "root" (and other keys) here are still references, and this is just a matter of the behavior of php/php-src@2eb980f deciding not to render them as such?

If so, perhaps note that in a comment on the first call to debug_zval_dump() in the test.


Based on my reading of Dereferencing and Unwrapping, the unwrapping in php_array_api.h should only affect the internal zval. I'd expect the reference in the $typeMap array to remain as-is.

PHPC-1839: Referenced, out-of-scope, non-interned string in typeMap
--SKIPIF--
<?php require __DIR__ . "/../utils/basic-skipif.inc"; ?>
<?php skip_if_not_live(); ?>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, but do any of these tests actually need a live server? Could you test the same code path using MongoDB\BSON\toPHP() and a simple BSON document (e.g. 0x05 followed by four null bytes).

If you make that change, all of these test files can be moved to the tests/bson directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and moved. Thanks for the suggestion.

Before:
array(2) refcount(2){
["root"]=>
&string(5) "array" refcount(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does debug_zval_dump() display a reference with refcount(1) in light of php/php-src@2eb980f. Wouldn't the & be suppressed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I have no clue why PHP would show the correct values here. Maybe @sgolemon can provide some insight?

Before:
array(2) refcount(2){
["root"]=>
string(5) "array" refcount(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also a case where the values here are references, but debug_zval_dump) simply isn't displaying it as such?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this test corresponds with bug1839-001, except for the use of an interned string vs. non-interned.

@alcaeus alcaeus requested a review from jmikola June 15, 2021 12:14
Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Optionally rename the methods to the canonical forms. I could see that helping if someone ever does a case-sensitive search to find usages in the tests and misses these as-is.

@alcaeus alcaeus merged commit c68cdcd into mongodb:v1.9 Jun 16, 2021
@alcaeus alcaeus deleted the phpc-1839 branch June 16, 2021 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants