-
Notifications
You must be signed in to change notification settings - Fork 7.9k
serialize: Fixed handling of nested object references #11305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The removed if in var.c caused serialize to not handle object references correctly under certain circumstances. See tests/serialize/serialization_objects_019.phpt The bug was originally introduced in commit 6c5942f, and the problematic line was last modified in commit bb0b4eb. (Fixes oss-fuzz #44954) The testcase from bb0b4eb still passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks correct to me. This test makes it evident that an object can appear multiple times in the serialized data, even if the refcount is 1. This is because $r->a
and $r->b
point to the same underlying array, but serialize
treats them as separate (because those are the PHP semantics, the fact that they are shared is an optimization).
graph TD;
$r-->a;
$r-->b;
a-->array;
b-->array;
array-->$s;
You're right, this was likely an optimization to not add objects that aren't referenced anywhere else in &data->ht
, but as demonstrated this is not safe. Another way to solve this would be to also de-duplicate arrays in the serialized data but this would be a substantially bigger change, and incompatible with older PHP versions.
/cc @dstogov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good enough.
I don't see performance degradation on Symfony benchmarks.
In general, we might keep that optimization for everything else except for elements of arrays with refcount > 1, but I'm not sure if this costs the effort.
@dstogov That sounds reasonable. It would require accounting for recursive arrays, as you could add one more level of nesting to make the RC go back to 1: graph TD;
$r-->a;
$r-->b;
a-->array1_r2;
b-->array1_r2;
array1_r2-->array2_rc1;
array2_rc1-->$s;
If you like I can try that to see if it makes any difference. |
@iluuu1994 it may make sense to try this. That optimization might really make improvement. On the other hand, passing more arguments and additional checks may make degradation for the more common cases. |
@dstogov The tracking of RC1 arrays was straight-forward. The most common case is likely passing a flat array of objects to serialize. Unfortunately, this will still result in an array with RC2 if the array is stored in a CV, because Let me know what you think. Benchmark
<?php
class Foo {}
$objects = [];
for ($i = 0; $i < 10000; $i++) {
$objects[] = new Foo();
}
for ($i = 0; $i < 1000; $i++) {
serialize($objects);
}
?>
Patch
From 718dae4761de8a4d411f6466fca4b4b89c8572f4 Mon Sep 17 00:00:00 2001
From: Ilija Tovilo <ilija.tovilo@me.com>
Date: Tue, 30 May 2023 14:59:25 +0200
Subject: [PATCH] Skip serialize object recording when array paths are rc1
Additionally, handle the common case of passing a CV-array with RC1 to
serialize. This results in RC2 which we can safely treat as RC1 because no
additional internal references are possible.
---
.../serialize/serialization_objects_019.phpt | 33 +++++++---
ext/standard/var.c | 66 ++++++++++++++-----
2 files changed, 74 insertions(+), 25 deletions(-)
diff --git a/ext/standard/tests/serialize/serialization_objects_019.phpt b/ext/standard/tests/serialize/serialization_objects_019.phpt
index 505fc3d9af..916a899b17 100644
--- a/ext/standard/tests/serialize/serialization_objects_019.phpt
+++ b/ext/standard/tests/serialize/serialization_objects_019.phpt
@@ -2,16 +2,29 @@
Object serialization with references
--FILE--
<?php
-function gen() {
- $s = new stdClass;
- $r = new stdClass;
- $r->a = [$s];
- $r->b = $r->a;
- return $r;
+
+function rcn() {
+ $root = new stdClass;
+ $end = new stdClass;
+ $root->a = [$end];
+ $root->b = $root->a;
+ unset($end);
+ var_dump(serialize($root));
}
-var_dump(serialize(gen()));
-?>
---EXPECTF--
-string(78) "O:8:"stdClass":2:{s:1:"a";a:1:{i:0;O:8:"stdClass":0:{}}s:1:"b";a:1:{i:0;r:3;}}"
+function rcn_rc1() {
+ $root = new stdClass;
+ $end = new stdClass;
+ $root->a = [[$end]];
+ $root->b = $root->a;
+ unset($end);
+ var_dump(serialize($root));
+}
+rcn();
+rcn_rc1();
+
+?>
+--EXPECT--
+string(78) "O:8:"stdClass":2:{s:1:"a";a:1:{i:0;O:8:"stdClass":0:{}}s:1:"b";a:1:{i:0;r:3;}}"
+string(98) "O:8:"stdClass":2:{s:1:"a";a:1:{i:0;a:1:{i:0;O:8:"stdClass":0:{}}}s:1:"b";a:1:{i:0;a:1:{i:0;r:4;}}}"
diff --git a/ext/standard/var.c b/ext/standard/var.c
index 641e0275f5..08a9996472 100644
--- a/ext/standard/var.c
+++ b/ext/standard/var.c
@@ -652,9 +652,13 @@ PHP_FUNCTION(var_export)
}
/* }}} */
-static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash);
+static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool unique, bool subtract_rc);
-static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var) /* {{{ */
+/**
+ * @param bool unique Whether the element can only appear once in the serialized data. It can appear
+ * multiple times if it is embedded in an array with RC > 1.
+ */
+static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var, bool unique) /* {{{ */
{
zval *zv;
zend_ulong key;
@@ -666,6 +670,10 @@ static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var) /
/* pass */
} else if (Z_TYPE_P(var) != IS_OBJECT) {
return 0;
+ } else if (unique
+ && Z_REFCOUNT_P(var) == 1
+ && (Z_OBJ_P(var)->properties == NULL || GC_REFCOUNT(Z_OBJ_P(var)->properties) == 1)) {
+ return 0;
}
/* References to objects are treated as if the reference didn't exist */
@@ -921,7 +929,7 @@ static int php_var_serialize_get_sleep_props(
}
/* }}} */
-static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable *ht, uint32_t count, bool incomplete_class, php_serialize_data_t var_hash) /* {{{ */
+static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable *ht, uint32_t count, bool incomplete_class, php_serialize_data_t var_hash, bool unique) /* {{{ */
{
smart_str_append_unsigned(buf, count);
smart_str_appendl(buf, ":{", 2);
@@ -951,19 +959,19 @@ static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable
if (Z_TYPE_P(data) == IS_ARRAY) {
if (UNEXPECTED(Z_IS_RECURSIVE_P(data))
|| UNEXPECTED(Z_TYPE_P(struc) == IS_ARRAY && Z_ARR_P(data) == Z_ARR_P(struc))) {
- php_add_var_hash(var_hash, struc);
+ php_add_var_hash(var_hash, struc, unique);
smart_str_appendl(buf, "N;", 2);
} else {
if (Z_REFCOUNTED_P(data)) {
Z_PROTECT_RECURSION_P(data);
}
- php_var_serialize_intern(buf, data, var_hash);
+ php_var_serialize_intern(buf, data, var_hash, unique, false);
if (Z_REFCOUNTED_P(data)) {
Z_UNPROTECT_RECURSION_P(data);
}
}
} else {
- php_var_serialize_intern(buf, data, var_hash);
+ php_var_serialize_intern(buf, data, var_hash, unique, false);
}
} ZEND_HASH_FOREACH_END();
}
@@ -978,13 +986,13 @@ static void php_var_serialize_class(smart_str *buf, zval *struc, HashTable *ht,
if (php_var_serialize_get_sleep_props(&props, struc, ht) == SUCCESS) {
php_var_serialize_class_name(buf, struc);
php_var_serialize_nested_data(
- buf, struc, &props, zend_hash_num_elements(&props), /* incomplete_class */ 0, var_hash);
+ buf, struc, &props, zend_hash_num_elements(&props), /* incomplete_class */ 0, var_hash, GC_REFCOUNT(&props) == 1);
}
zend_hash_destroy(&props);
}
/* }}} */
-static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash) /* {{{ */
+static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool unique, bool subtract_rc) /* {{{ */
{
zend_long var_already;
HashTable *myht;
@@ -993,7 +1001,7 @@ static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_
return;
}
- if (var_hash && (var_already = php_add_var_hash(var_hash, struc))) {
+ if (var_hash && (var_already = php_add_var_hash(var_hash, struc, unique))) {
if (var_already == -1) {
/* Reference to an object that failed to serialize, replace with null. */
smart_str_appendl(buf, "N;", 2);
@@ -1102,7 +1110,7 @@ again:
if (Z_ISREF_P(data) && Z_REFCOUNT_P(data) == 1) {
data = Z_REFVAL_P(data);
}
- php_var_serialize_intern(buf, data, var_hash);
+ php_var_serialize_intern(buf, data, var_hash, Z_REFCOUNT_P(&retval) == 1, false);
} ZEND_HASH_FOREACH_END();
smart_str_appendc(buf, '}');
@@ -1224,7 +1232,7 @@ again:
prop = Z_REFVAL_P(prop);
}
- php_var_serialize_intern(buf, prop, var_hash);
+ php_var_serialize_intern(buf, prop, var_hash, true, false);
}
smart_str_appendc(buf, '}');
} else {
@@ -1239,7 +1247,7 @@ again:
if (count > 0 && incomplete_class) {
--count;
}
- php_var_serialize_nested_data(buf, struc, myht, count, incomplete_class, var_hash);
+ php_var_serialize_nested_data(buf, struc, myht, count, incomplete_class, var_hash, GC_REFCOUNT(myht) == 1);
zend_release_properties(myht);
return;
}
@@ -1247,7 +1255,7 @@ again:
smart_str_appendl(buf, "a:", 2);
myht = Z_ARRVAL_P(struc);
php_var_serialize_nested_data(
- buf, struc, myht, zend_array_count(myht), /* incomplete_class */ 0, var_hash);
+ buf, struc, myht, zend_array_count(myht), /* incomplete_class */ 0, var_hash, unique && (GC_REFCOUNT(myht) - (subtract_rc ? 1 : 0)) == 1);
return;
case IS_REFERENCE:
struc = Z_REFVAL_P(struc);
@@ -1261,11 +1269,17 @@ again:
PHPAPI void php_var_serialize(smart_str *buf, zval *struc, php_serialize_data_t *data) /* {{{ */
{
- php_var_serialize_intern(buf, struc, *data);
+ php_var_serialize_intern(buf, struc, *data, true, false);
smart_str_0(buf);
}
/* }}} */
+static void php_var_serialize_cv(smart_str *buf, zval *struc, php_serialize_data_t *data)
+{
+ php_var_serialize_intern(buf, struc, *data, true, true);
+ smart_str_0(buf);
+}
+
PHPAPI php_serialize_data_t php_var_serialize_init(void) {
struct php_serialize_data *d;
/* fprintf(stderr, "SERIALIZE_INIT == lock: %u, level: %u\n", BG(serialize_lock), BG(serialize).level); */
@@ -1306,8 +1320,30 @@ PHP_FUNCTION(serialize)
Z_PARAM_ZVAL(struc)
ZEND_PARSE_PARAMETERS_END();
+ bool data_is_cv = false;
+ if (Z_TYPE_P(struc) == IS_ARRAY
+ && !(GC_FLAGS(Z_ARRVAL_P(struc)) & IS_ARRAY_IMMUTABLE)
+ && EG(current_execute_data)
+ && EG(current_execute_data)->prev_execute_data) {
+ zend_execute_data *execute_data = EG(current_execute_data)->prev_execute_data;
+ if (execute_data->func && ZEND_USER_CODE(execute_data->func->type)) {
+ zend_op_array *func = &execute_data->func->op_array;
+ const zend_op *opline = execute_data->opline;
+ if (func->opcodes < opline) {
+ const zend_op *prev_opline = opline - 1;
+ if ((prev_opline->opcode == ZEND_SEND_VAR || prev_opline->opcode == ZEND_SEND_VAR_EX) && prev_opline->op1_type == IS_CV) {
+ data_is_cv = true;
+ }
+ }
+ }
+ }
+
PHP_VAR_SERIALIZE_INIT(var_hash);
- php_var_serialize(&buf, struc, &var_hash);
+ if (data_is_cv) {
+ php_var_serialize_cv(&buf, struc, &var_hash);
+ } else {
+ php_var_serialize(&buf, struc, &var_hash);
+ }
PHP_VAR_SERIALIZE_DESTROY(var_hash);
if (EG(exception)) {
--
2.40.1 |
PHP 8.1+ does not serialize object references correctly under certain circumstances.
See tests/serialize/serialization_objects_019.phpt
The bug was originally introduced in commit 6c5942f, and the problematic line was last modified in commit bb0b4eb. (Fixes oss-fuzz #44954)
The testcase from bb0b4eb still passes.
I am not super familiar with the php codebase, but the removed IF looks suspicious and fixes my issue. I initially planned to just open an issue, but then I wrote the testcase and figured I might as well just submit the (likely) fix.
Someone else should obviously triple-check this. Especially since the original change was due to optimizations.