Skip to content

Make var_export/debug_zval_dump check for infinite recursion on the *object* #8046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Feb 5, 2022

Switch the check from the result of get_properties_for (the returned hash table of properties) to just checking for
infinite recursion on the object.

  • In order for a native datastructure to correctly implement
    *get_properties_for for var_export's cycle detection,
    it would need to return the exact same array every time prior to this PR.

    Prior to this commit, the requirements for cycle detection
    would prevent SplFixedArray or similar classes from returning a
    temporary array that:

    1. Wouldn't be affected by unexpected mutations from error handlers
    2. Could be garbage collected instead.

Note that SplFixedArray continues to need to return object->properties
until php 9.0, when dynamic properties are forbidden.
(even after php 9.0, because #[\AllowDynamicProperties] can be used in subclasses?)


Thankfully, those are the only remaining issues I'm aware of after the last fix https://github.com/php/php-src/pull/8105/files
(delayed __destruct of elements after a var_export call on a collection followed by change such as remove()/pop(), extra memory usage on internal data structures such as SPL after a var_export call, poor worst-case performance for cycle detection in var_export and related functions)

@TysonAndre TysonAndre requested review from dstogov and nikic February 5, 2022 23:58
@TysonAndre
Copy link
Contributor Author

Something I also noticed is that php_json_encode will call zend_std_get_properties through Z_OBJPROP_P, but the only thing it does with myht is to use it to protect against recursion.

Does it make sense to replace those with GC_PROTECT_RECURSION on the object (Z_OBJ_P), instead of the properties of the object (haven't tried this yet). For collections such as SplFixedArray which implement write_properties, it will:

  1. Allocate and populate the HashTable if obj->properties was previously null (also affects userland classes using default zend write_properties implementation, creating an associative array where there previously may have been none)

  2. In the case of SplFixedArray, which was recently changed to implement JsonSerializable to show up as [...] (or any collection that would implement jsonSerialize so that it could be JSON encoded as a list), the call to get_properties will call zend_hash_index_update to update all the elements in the obj->properties as a side effect, despite it not being needed.

    This doubles the memory usage of SplFixedArray in 8.2 the first time json_encode is called

  3. It puts limitations on how classes can implement get_properties - in order to prevent infinite recursion when json encoding they need to return the same array every time, e.g. obj->properties (or some other array with a consistent address, e.g. in the interned data for the object)

// Zend/zend_types.h
#define Z_OBJPROP(zval)				Z_OBJ_HT((zval))->get_properties(Z_OBJ(zval))
#define Z_OBJPROP_P(zval_p)			Z_OBJPROP(*(zval_p))

// ext/json/json_encoder.c
static int php_json_encode_serializable_object(smart_str *buf, zval *val, int options, php_json_encoder *encoder) /* {{{ */
{
	zend_class_entry *ce = Z_OBJCE_P(val);
	HashTable* myht = Z_OBJPROP_P(val);
	zval retval, fname;
	int return_code;

	if (myht && GC_IS_RECURSIVE(myht)) {
		encoder->error_code = PHP_JSON_ERROR_RECURSION;
		if (options & PHP_JSON_PARTIAL_OUTPUT_ON_ERROR) {
			smart_str_appendl(buf, "null", 4);
		}
		return FAILURE;
	}

@dstogov
Copy link
Member

dstogov commented Feb 7, 2022

I think the problem is in SplFixedArray again. The following patch should fix the problem. Please check and commit this, if you agree.

diff --git a/ext/spl/spl_fixedarray.c b/ext/spl/spl_fixedarray.c
index dac31fe8c3..8901c95694 100644
--- a/ext/spl/spl_fixedarray.c
+++ b/ext/spl/spl_fixedarray.c
@@ -210,6 +210,10 @@ static HashTable* spl_fixedarray_object_get_properties(zend_object *obj)
 	if (!spl_fixedarray_empty(&intern->array)) {
 		zend_long j = zend_hash_num_elements(ht);
 
+		if (GC_REFCOUNT(ht) > 1) {
+			intern->std.properties = zend_array_dup(ht);
+			GC_TRY_DELREF(ht);
+		}
 		for (zend_long i = 0; i < intern->array.size; i++) {
 			zend_hash_index_update(ht, i, &intern->array.elements[i]);
 			Z_TRY_ADDREF(intern->array.elements[i]);

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Feb 12, 2022

I think the problem is in SplFixedArray again. The following patch should fix the problem. Please check and commit this, if you agree.

Sorry for the late response, I'd missed that email/notification and I would have tested it earlier if I'd seen it.

It introduces new issues in #8079

@dstogov
Copy link
Member

dstogov commented Feb 17, 2022

Now this looks much clear to me.
I see we introduced the var_dump lock on object in 7.0.
Now, I would accept this, but I afraid this may change behaviour in some cases.

… the object

This test case previously failed with an assertion error in debug builds
(for updating an index of the array to set it to the same value it already had)

Then repeat the existing check on the return value of `get_properties_for`
in case there was a reason for doing it that way.

1. `HT_ASSERT_RC1(ht);` would fail for SplFixedArray and related
   datastructures.
2. In order for a native datastructure to correctly implement
   `*get_properties_for` for var_export's cycle detection,
   it would need to return the exact same array every time prior to this PR.

   This would prevent SplFixedArray or similar classes from returning a
   temporary array that:

   1. Wouldn't be affected by unexpected mutations from error handlers
   2. Could be garbage collected instead.

Note that SplFixedArray continues to need to return `object->properties`
until php 9.0, when dynamic properties are forbidden.

(GitHub wasn't updating the previous PR despite pushing the branch
to the right repo earlier, creating a new branch)

Closes phpGH-8044
@TysonAndre TysonAndre force-pushed the var_export-infinite-recursion-check-v2 branch from b8f7265 to e2b96e2 Compare August 27, 2022 19:51
@TysonAndre
Copy link
Contributor Author

Summarizing motivations for wanting this:

  1. If var_export/debug_zval_dump is called on a data structure, then all of that data structure's fields get added to the properties and a hash table is allocated. This grows the memory usage, so programs that happen to use var_export (e.g. checking if objects change after unserializing) would see worse memory usage with data structures that implement support for being inspected with var_export.

    The alternative (if trying to keep memory usage low) is to avoid exposing any information about what a data structure contains to var_export in a PECL, but that's dissatisfying if you're an end user trying to debug. e.g. https://pecl.php.net/ds does not expose information about members of data structures in var_export (I assume due to this and other problems that could be encountered in get_properties)

    Data structures in https://github.com/TysonAndre/pecl-teds (and PHP's spl) currently does expose that information to var_export for convenience, so these issues would affect both of those

    Moving the infinite recursion check to be on the object would allow changing the get_properties implementation and avoid the need to keep the array around - it could instead return a brand new array and avoid allocating and populating the properties hash table and leaving it around

  2. When var_export is called, all of the properties remain on the object's instance until the next time var_export is called. This would delay freeing objects that used to be contained in a data structure (e.g. after pop() was called) and calling __destruct until var_export gets called again or the object is finally freed or gc triggers. In rare use cases this would be a problem for https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization (e.g. closing network connections)

  3. In the worst case (e.g. an SplFixedArray of count 1000 where elements had 1000 cyclic references to SplFixedArray), this would need to update the entire property table of 1000 properties in the get_properties* handler first, 1000 times, just to report *RECURSION* when the gc check on the updated hash table reveals var_export is recursing. (taking quadratic time 1000*1000). This would be rare in practice)


Thankfully, those are the only remaining issues I'm aware of after the last fix https://github.com/php/php-src/pull/8105/files
(delayed __destruct of elements after a var_export call on a collection followed by change such as remove()/pop(), extra memory usage, poor worst-case performance for cycle detection in var_export and related functions)

@nikic
Copy link
Member

nikic commented Aug 28, 2022

Doing the recursion check on the object makes sense to me. However, shouldn't we be doing it only on the object? Why do we need it on both the object and the HT?

The previous commit checked for recursion on the object itself.
Normally, different objects shouldn't share hash tables.
@TysonAndre
Copy link
Contributor Author

TysonAndre commented Aug 28, 2022

Doing the recursion check on the object makes sense to me. However, shouldn't we be doing it only on the object? Why do we need it on both the object and the HT?

Mostly, it was a lack of clarity on why the check was done on the object properties in the first place rather than the object itself, and to continue printing *RECURSION* in any edge cases I'd overlooked (e.g. in PECLs I wasn't aware of), though for var_export/debug_zval_prop it would likely be safe to print the properties even if a finite number of objects shared hash tables.

  • I can't think of anything that actually does that, though, most classes should now use the default Z_OBJ_PROP implementation or at least use the object's own hash table, and even unusual classes such as spl ArrayObject will put the backing array (potentially from a different object - https://www.php.net/manual/en/arrayobject.construct.php) in intern->array rather than the object properties
  • Since no specific examples are brought up, it's more likely we don't need to preserve this

That, and to keep interactions with other recursion protection checks on property tables the same, in case there were any objections to that.

Now, I would accept this, but I afraid this may change behaviour in some cases.

@TysonAndre TysonAndre changed the title Make var_export/debug_zval_dump first check for infinite recursion on the object Make var_export/debug_zval_dump check for infinite recursion on the *object* Aug 30, 2022
@TysonAndre TysonAndre closed this Aug 30, 2022
@TysonAndre TysonAndre deleted the var_export-infinite-recursion-check-v2 branch August 30, 2022 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants