Skip to content

Proposal: Add ImmutableKeyValueSequence (rewindable, immutable, memory-efficient, allows any key&repeating keys) #6655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Jan 31, 2021

  1. This eagerly evaluates any iterable,
    and stores a read only copy of the keys and values.
  2. This allows using any value as a key, and allows repeating keys.

For example, this can be used to store the exact result of keys and values
yielded by a Generator.

Longer-term, this may be useful in creating internal helper functions/methods such as
someprefix_map(iterable $iterable): ImmutableKeyValueSequence, someprefix_flip(iterable $iterable): ImmutableKeyValueSequence,

function my_generator() {
    yield '123' => 'numeric string is preserved';
    yield '123' => 'allows repeating keys';
    yield new stdClass() => 'false';
}
$iterable = new ImmutableKeyValueSequence(my_generator());
foreach ($iterable as $key1 => $value1) {
    foreach ($iterable as $key2 => $value2) { ... }
}
final class ImmutableKeyValueSequence implements IteratorAggregate, Countable, JsonSerializable
{
    public function __construct(iterable $iterator) {}
    public function getIterator(): InternalIterator {}
    public function count(): int {}
    public static function fromPairs(iterable $pairs): ImmutableKeyValueSequence {}
    public function toPairs(): array {}  // [[$k1, $v1], [$k2, $v2]]

    public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
    public function __unserialize(array $data): void {}
    public static function __set_state(array $array): ImmutableKeyValueSequence {}

    // useful for converting iterables back to values
    public function keys(): array {}  // [$k1, $k2, ...]
    public function values(): array {}  // [$v1, $v2, ...]
    // useful to efficiently get offsets at the middle/end of a long iterable
    public function keyAt(int $offset): mixed {}
    public function valueAt(int $offset): mixed {}

    public function jsonSerialize(): array;  // [[$k1, $v1], [$k2, $v2]]
    // dynamic properties are forbidden
}

NOTE: This should not be confused with the pre-existing class CachingIterator. CachingIterator is not rewindable if the underlying iterator is not rewindable (e.g. generators). CachingIterator->getCache() is an array, meaning keys are limited to integers/strings and cannot be repeated.

@TysonAndre TysonAndre changed the title Add RewindableKeyValueIterator Proposal: Add RewindableKeyValueIterator Jan 31, 2021
@morrisonlevi
Copy link
Contributor

morrisonlevi commented Jan 31, 2021

I only glanced at the implementation, which looks okay at a glance. A few suggestions:

  • Do the IteratorAggregate version. I think Spl\CachedIterable would be a good name. The aggregate can support more helpers and interfaces. It can support Countable and ArrayAccess where the keys are the ordering (not the underlying keys, since there can be repeating keys). It can support serialize/unserialize, whereas Iterators generally cannot (there isn't a meaningful way to do it, any serialization must be done at a higher level). Any helper methods like keys() or values() should go here, not on the Iterator.
  • The version of the implementation's constructor that handles Traversables should see if it implements Countable (first checking the .count_elements handler for performance) and if so use that as a hint for the initial array capacity.
  • The Spl\CachedIterable::getIterator() method can return the proposed class, though I would recommend Spl\CachedIterator as a name. Should still support Countable, but should support neither ArrayAccess nor serialize/unserialize.

This PR would complement the work I'm doing for PHP 8.1 with Spl\ForwardArrayIterator and Spl\ReverseArrayIterator.

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jan 31, 2021

Do the IteratorAggregate version. I think Spl\CachedIterable would be a good name. The aggregate can support more helpers and interfaces. It can support Countable and ArrayAccess where the keys are the ordering (not the underlying keys, since there can be repeating values). It can support serialize/unserialize, whereas Iterators generally cannot (there isn't a meaningful way to do it, any serialization must be done at a higher level). Any helper methods like keys() or values() should go here, not on the Iterator.

Still undecided about the name, CachedIterable or CopiedIterable or ClonedIterable is probably a better name than RewindIterable. The similarity to the name CachingIterator may cause confusion.

Definitely makes sense for the helper methods going on the IteratorAggregate instead.

It can support Countable and ArrayAccess where the keys are the ordering (not the underlying keys, since there can be repeating values).

I think ArrayAccess would lead to more bugs in application code for those expecting $cachedIt[$i] to find the first key occurrence of $i - adding keyAt(int $offset), valueAt(int $offset), keyIndex(mixed $key): int, valueIndex(mixed $value): int

The Spl\CachedIterable::getIterator() method can return the proposed class, though I would recommend Spl\CachedIterator as a name. Should still support Countable, but shouldn't support ArrayAccess nor serialize/unserialize.

I was planning on something similar

TysonAndre referenced this pull request Jan 31, 2021
(Usage of "current_observed_frame" varible looks unsafe to me).
Comment on lines 41 to 45
/** TODO: Does C guarantee that this has the same memory layout as an array of zvals? */
typedef struct _zval_pair {
zval key;
zval value;
} zval_pair;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general there may be padding bits between the members, though in practice I haven't found a case where two adjacent members of the same type have any padding bits between them. Do you use the layout for a specific optimization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's just easier to write/read code with zval_pair than entries[offset*2], entries[offset*2+1] - I doubt it's faster

@TysonAndre TysonAndre changed the title Proposal: Add RewindableKeyValueIterator Proposal: Add CachedIterable (rewindable, allows any key&repeating keys) Jan 31, 2021
@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch from 721126a to 2e6002a Compare February 1, 2021 15:10
@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch 3 times, most recently from 50ac0e1 to 114de5b Compare February 7, 2021 14:56
@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch 2 times, most recently from e2a6d35 to 67bfa99 Compare February 11, 2021 01:57
PHP_ADD_EXTENSION_DEP(spl, pcre, true)
PHP_ADD_EXTENSION_DEP(spl, standard, true)
PHP_ADD_EXTENSION_DEP(spl, json, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there specific reason for wanting this? I don't think existing structures are json serializable, so is there specific motivation for adding it and incurring this dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://wiki.php.net/rfc/always_enable_json passed in PHP 8.0.

Before that, adding a dependency on json would complicate package maintenance, and the fact that json could be compiled as a shared extension that was potentially disabled would mean that adding it was impractical (can't have an always-on extension depend on an extension that can be disabled)

Some applications may depend on the existing structures json behavior, e.g. I've used new ArrayObject($array) in the past to get associative arrays always encoded as {}, even empty ones

@@ -28,6 +29,7 @@
#define spl_ce_Serializable zend_ce_serializable
#define spl_ce_Countable zend_ce_countable
#define spl_ce_Stringable zend_ce_stringable
#define spl_ce_JsonSerializable php_json_serializable_ce
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this is never used. Maybe through macro magic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's the REGISTER_SPL_IMPLEMENTS(CachedIterable, JsonSerializable); magic.

Copy link
Contributor

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passed for me locally on Mac with -fsanitize=undefined,address being set.

@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch 2 times, most recently from b7a0060 to db8dfaf Compare June 13, 2021 14:55
@TysonAndre TysonAndre changed the title Proposal: Add CachedIterable (rewindable, allows any key&repeating keys) Proposal: Add ImmutableKeyValueSequence (rewindable, immutable, memory-efficient, allows any key&repeating keys) Jun 13, 2021
@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch from 414c1ef to 8861fac Compare June 13, 2021 16:40
@TysonAndre
Copy link
Contributor Author

==491575== Process terminating with default action of signal 11 (SIGSEGV)
==491575==  Access not within mapped region at address 0xC8
==491575==    at 0x7D8474: do_interface_implementation (zend_inheritance.c:1541)
==491575==    by 0x7517A2: zend_class_implements (zend_API.c:3103)
==491575==    by 0x62756B: register_class_ImmutableKeyValueSequence (spl_immutablekeyvaluesequence_arginfo.h:83)
==491575==    by 0x627AB1: zm_startup_spl_immutablekeyvaluesequence (spl_immutablekeyvaluesequence.c:871)
==491575==    by 0x607DF8: zm_startup_spl (php_spl.c:712)
==491575==    by 0x74C3EC: zend_startup_module_ex.part.0 (zend_API.c:2177)
==491575==    by 0x74C479: zend_startup_module_ex (zend_API.c:2189)
==491575==    by 0x74C479: zend_startup_module_zval (zend_API.c:2192)
==491575==    by 0x75D1B9: zend_hash_apply (zend_hash.c:1860)
==491575==    by 0x6E5EDD: php_module_startup (main.c:2238)
==491575==    by 0x828AB0: php_cli_startup (php_cli.c:409)
==491575==    by 0x35BC34: main (php_cli.c:1333)
==491575==  If you believe this happened as a result of a stack
==491575==  overflow in your program's main thread (unlikely but
==491575==  possible), you can try to increase the size of the
==491575==  main thread stack using the --main-stacksize= flag.

The CI failures would be fixed by properly declaring the dependency on json - #7117 does that for a different class

(figured this out by running travis/compile.sh instead and manually running valgrind sapi/cli/php)

@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch from 859307e to 2c988ba Compare June 13, 2021 17:37
1. This eagerly evaluates any iterable,
   and stores a read only copy of the keys and values.
2. This allows using any value as a key, and allows repeating keys.

For example, this can be used to store the exact result of keys and values
yielded by a generator.

Longer-term, this may be useful in creating internal helper functions/methods such as
`someprefix_map(iterable $iterable): ImmutableIterable`, `someprefix_flip(iterable $iterable): ImmutableIterable`,

```php
function my_generator() {
    yield '123' => 'numeric string is preserved';
    yield '123' => 'allows repeating keys';
    yield new stdClass() => 'false';
}
$iterable = new ImmutableIterable(my_generator());
foreach ($iterable as $key1 => $value1) {
    foreach ($iterable as $key2 => $value2) { ... }
}
```

```php
final class ImmutableIterable implements IteratorAggregate, Countable, JsonSerializable
{
    public function __construct(iterable $iterator) {}
    public function getIterator(): InternalIterator {}
    public function count(): int {}
    public static function fromPairs(iterable $pairs): ImmutableIterable;
    public function toPairs(): array;  // [[$k1, $v1], [$k2, $v2]]

    public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
    public function __unserialize(array $data): void {}

    // useful for converting iterables back to values
    public function keys(): array {}  // [$k1, $k2, ...]
    public function values(): array {}  // [$v1, $v2, ...]
    // useful to efficiently get offsets at the middle/end of a long iterable
    public function keyAt(int $offset): mixed {}
    public function valueAt(int $offset): mixed {}

    public function jsonSerialize(): array;  // [[$k1, $v1], [$k2, $v2]]
    // dynamic properties are forbidden
}
```

Rename to ImmutableIterable

Fix bugs, implement `__set_state`
@TysonAndre TysonAndre force-pushed the RewindableKeyValueIterator branch from 2c988ba to 541f3d4 Compare June 15, 2021 13:24
@TysonAndre TysonAndre closed this Jun 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants