Skip to content

Change __sleep serialization for uninitialized typed properties #5027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

Proposed fix for https://bugs.php.net/79002

This will change the behavior of php 7.4 applications that use __sleep
with uninitialized typed properties.

I think this is an improvement, because if serialize() returned a string,
I'd expect unserialize() to return equivalent data unless
wakeup/unserialize/__unserialize threw or misbehaved.

Previously, a TypeError would be thrown on unserialize if the property couldn't be null.
Additionally, if a typed property was nullable, then the unserialized data would be null instead of undefined if there was no default.

  • Aside: this does not check what the defaults of the properties are. But that doesn't happen even when there's no __sleep method (handling this seems out of scope).

PHP 7.0 is able to handle leading 0s in the number of properties of an object.
I haven't tested earlier versions.

Proposed fix for https://bugs.php.net/79002

**This will change the behavior of php 7.4 applications that use __sleep
with typed properties.**

I think this is an improvement, because if `serialize()` returned a string,
I'd expect unserialize() to return equivalent data unless
wakeup/unserialize/__unserialize threw or misbehaved.

Previously, a TypeError would be thrown on unserialize.

PHP 7.0 is able to handle leading 0s in the number of properties of an object.
I haven't tested earlier versions.
@nikic
Copy link
Member

nikic commented Jan 2, 2020

Aside: this does not check what the defaults of the properties are. But that doesn't happen even when there's no __sleep method (handling this seems out of scope).

Not sure what you mean here. How are defaults related to this?

num_elements--;
continue;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be special-casing typed properties here. Instead we should always omit the property if it doesn't exist, rather than setting it to a null value.

php-pulls pushed a commit that referenced this pull request Jan 2, 2020
Instead of populating a hashtable of property names and then
directly serializing.

This has the advantage of a) detecting duplicate properties more
precisely and b) gives us the ability to discard values without
rewriting the serialization string after the fact for GH-5027.
@nikic
Copy link
Member

nikic commented Jan 2, 2020

I really don't like the rewriting of the serialization string that is going on here. I've just landed some old __sleep() refactorings I had lying around, which create a proper HT of __sleep properties. Ideally, the patch should now be as simple as dropping this line:

zend_hash_add(ht, name, &EG(uninitialized_zval));

@TysonAndre
Copy link
Contributor Author

I've just landed some old __sleep() refactorings I had lying around, which create a proper HT of __sleep properties. Ideally, the patch should now be as simple as dropping this line:

My reason for going with this approach instead of always omitting the undefined property from __sleep was that the below example would behave differently in php 7.3 and 7.4 if a typeless property was unset() or never added in 7.4.

I'm not aware of any frameworks that deliberately rely on that and ignore the notice, but if it does get changed, I'd want it mentioned in upgrading notes.

  • It's more consistent with what's done without __sleep(), at least for declared properties.

Aside: Ideally, adding a U; for unset in declared property values (like N; for null) would be useful, but then php 7.3 wouldn't be able to read new values. Maybe limit support to unserializing and wait a few minor versions.

<?php
class HasProp {
    public $first = 'value';
    public $p = 2;
    public function __sleep() {
        return ['first', 'p'];
    }
}
$x = new HasProp();
unset($x->p);
var_dump($x);
var_dump($s = serialize($x));
/*
Before, the result of unserialize():
object(HasProp)#2 (2) {
  ["first"]=>
  string(5) "value"
  ["p"]=>
  NULL
}
After removing zend_hash_add(ht, name, &EG(uninitialized_zval));
object(HasProp)#2 (2) {
  ["first"]=>
  string(5) "value"
  ["p"]=>
  2
}
 */
var_dump(unserialize($s));

@TysonAndre
Copy link
Contributor Author

Not sure what you mean here. How are defaults related to this?

For the above example, it's related to the question of how it would handle public $p = 2; being returned from __sleep() when unset. vs public $p = null;/public $p

@nikic
Copy link
Member

nikic commented Jan 2, 2020

I understand the concern. However, I think that the correct behavior here is either "unset properties represented as null" or "unset properties not represented", but not something in the middle that depends on whether the property in question is typed.

The main issue I see with your example above is that $p gets the default value of 2 instead of being omitted entirely. That looks like a very serious bug in general unserialization to me, because it means that the object value is not actually preserved across a serialize+unserialize cycle: https://3v4l.org/60ARA We should be starting from an all-UNDEF state, not from a default value initialization, right?

@TysonAndre
Copy link
Contributor Author

We should be starting from an all-UNDEF state, not from a default value initialization, right?

That would break my expectations of how it works now, for code deployments that add brand new properties to existing objects, as well as for the properties that are omitted from __sleep().

Suppose that the application using class HasProp adds a new property, such as public $myIsUninitialized = true;, with or without __sleep. Currently, unserializing() old data would keep the default value of $myIsUninitialized.

(Or that $myIsUninitialized already exists, but is omitted from __sleep)

@TysonAndre
Copy link
Contributor Author

That looks like a very serious bug in general unserialization to me, because it means that the object value is not actually preserved across a serialize+unserialize cycle

I agree, but think adding U; support for declared properties to unserialize() in this version, and to serialize() in a subsequent major/minor version, would be a better approach to fixing that
(so that existing properties would remain unset new properties would have their intended defaults)

@nikic
Copy link
Member

nikic commented Jan 2, 2020

Suppose that the application using class HasProp adds a new property, such as public $myIsUninitialized = true;, with or without __sleep. Currently, unserializing() old data would keep the default value of $myIsUninitialized.

What if that property is getting initialized in the constructor rather than via a default value? I don't think unserialization can really work reliably if the class structure changed in the meantime.

@nikic
Copy link
Member

nikic commented Jan 2, 2020

I agree, but think adding U; support for declared properties to unserialize() in this version, and to serialize() in a subsequent major/minor version, would be a better approach to fixing that
(so that existing properties would remain unset new properties would have their intended defaults)

If addition of new properties is indeed a concern, then I agree that adding U; support would make sense. Not sure on migration strategy.

@TysonAndre
Copy link
Contributor Author

Not sure on migration strategy.

Making serialize($x, ['serialize_unset_properties_as_unset' => true]) emit U; for objects with/without __sleep() in 8.0 would be one approach (and add a note that it wouldn't be backwards compatible with older versions)

True could become the default in 9.0 or when all officially supported php versions support unserializing U;

@nikic
Copy link
Member

nikic commented Jan 2, 2020

Even if we go with U;, we still need to decide what to do here in the meantime... One more option we have is to actually make this throw the usual "accessing uninitialized typed property" Error exception. I think that might actually make the most sense, because we'd basically be mirroring the normal PHP behavior of accessing an unset prop -- either a notice (not typed) or Error (typed).

@TysonAndre
Copy link
Contributor Author

One more option we have is to actually make this throw the usual "accessing uninitialized typed property" Error exception. I think that might actually make the most sense, because we'd basically be mirroring the normal PHP behavior of accessing an unset prop -- either a notice (not typed) or Error (typed).

That would make writing a __sleep (in an application that had unset properties) harder (for users unfamiliar with the correct way to do it) because of the absence of an is_initialized helper (https://bugs.php.net/bug.php?id=78480).

  • Applications could use a pure php helper method using ReflectionProperty.

@nikic
Copy link
Member

nikic commented Jan 2, 2020

A priori I would assume that an attempt to reference an uninitialized property in __sleep() is an application bug, where some other piece of code did not properly initialize the objects (I mean, that's why those exceptions exist in the first place).

There can certainly be code that has explicitly uninitialized properties and quite intentionally wishes to preserve them, but that sounds like a rather exotic use-case to me.

I certainly would not expect an average __sleep() implementation to ever deal with uninitialized properties.

@nikic
Copy link
Member

nikic commented Jan 2, 2020

Another perspective on this is that __sleep

public function __sleep() {
    return ['x', 'y'];
}

is basically a legacy way to write __serialize

public function __serialize() {
    return ['x' => $this->x, 'y' => $this->y];
}

and if you write that, you'd get either a notice or an Error, depending on whether it's typed.

@TysonAndre
Copy link
Contributor Author

There can certainly be code that has explicitly uninitialized properties and quite intentionally wishes to preserve them, but that sounds like a rather exotic use-case to me.

I certainly would not expect an average __sleep() implementation to ever deal with uninitialized properties.

I'd agree that that's exotic and I don't expect to see this in code I use. This is more for bringing up any possible counter-arguments for adding a new way __sleep would throw - the counterarguments seem fairly weak.

My concerns would be other php maintainers objecting to a rare case starting throwing in 7.4.2, but I don't personally object to throwing for uninitialized typed properties instead of my PR.

@nikic
Copy link
Member

nikic commented Jan 3, 2020

I've opened #5050 for the "throw Error" variant of this. I'm not overly concerned about BC at this point, as this both a) already throws a notice and b) is very likely to throw an Error on unserialization, so it's not likely that this is going to break any code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants