-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Use more compact representation for packed arrays. #7491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@nikic I think about implementing additional, specialized versions of ZEND_HASH_FOREACH_* macros. e.g ZEND_PACKED_FOREACH_* + ZEND_TABLE_FOREACH_* . However, we need some better name than |
@bwoebi can you please check how it's difficult to implement packed arrays support in phpdbg |
Would it make sense to start getting the arPacked and arData through a macro to assert that a That way, the macro could be changed when ZEND_DEBUG was enabled to emit an assertion error if trying to access (or when trying to access arData on a packed array rather than a non-packed array) This would help as an optional tool in writing new PECL code in anything using php-src as a reference or that's difficult to do with existing macros), or in ensuring code continues to be correct (if the usage of helper methods changes to also include non-packed arrays, etc) (for tools like xdebug or performance monitors or serializers/data encoders to avoid/detect accidentally reading or writing the wrong memory) |
z = &ht->arData[idx].val; | ||
if (Z_TYPE_P(z) != IS_UNDEF) { | ||
break; | ||
if (HT_IS_PACKED(ht)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familiar with the history of this file: Is datefmt_format_object performance critical in any web frameworks? If it isn't, then this code is more verbose than it needed to be, even before the refactoring.
If it isn't, using the ZEND_HASH_FOREACH_VAL macro and incrementing idx for each element (and throwing if the final element count wasn't 2, or valid_format was false for any value) could be done instead as an example of how code could be written in the future.
(assigning to dataStyle for idx 0, assigning to timeStyle for idx 1)
Looks correct, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we shouldn't be using HT internals here -- I've refactored the code in 6e1d9f6.
if (HT_IS_PACKED(ht)) { | ||
while (1) { | ||
idx++; | ||
if (idx >= ht->nNumUsed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume zend_hash_move_forward_ex is used pretty frequently.
can *pos = idx;
and *pos = ht->nNumUsed
be combined? I assume the resulting assembly code would be smaller
The above statements idx < ht->nNumUsed
and idx++
make it look like idx and ht->nNumUsed will be the identical in this branch.
i.e.
if (idx >= ht->nNumUsed || Z_TYPE(ht->arPacked[idx]) != IS_UNDEF) {
*pos = idx;
ZEND_ASSERT(idx <= ht->nNumUsed);
return SUCCESS;
Same for the non-packed case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, ht->nNumUsed
may be changed between iterations (some elements may be added-removed). I can't be sure, if this optimization is safe for all cases. I prefer to keep the existing logic.
@@ -2531,6 +2861,10 @@ ZEND_API void ZEND_FASTCALL zend_hash_sort_ex(HashTable *ht, sort_func_t sort, b | |||
return; | |||
} | |||
|
|||
if (HT_IS_PACKED(ht)) { | |||
zend_hash_packed_to_hash(ht); // TODO: ??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it'd be possible to add an optimization for packed array when renumber
is true, sorting sizeof(zval)
and a new function along the lines of zend_hash_zval_with_extra_renum_swap
(and set Z_EXTRA(ht->arPacked[i]) = i;
to continue to make it a stable sort)
I assume the TODO: ???
refers to the fact the below HT_IS_PACKED calls are now impossible.
- When renumber is false, would it make sense to declare
const bool was_packed = HT_IS_PACKED(ht);
before callingzend_packed_to_hash
and use that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense.
Zend/zend_list.c
Outdated
Bucket *p = &ht->arData[i]; | ||
if (Z_TYPE(p->val) != IS_UNDEF) { | ||
zend_resource *res = Z_PTR(p->val); | ||
zval *p = HT_IS_PACKED(ht) ? &ht->arPacked[i] : &ht->arData[i].val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Reload ht->arData on each iteration, as it may be reallocated. */
I assume the only possible allocation is to grow the array, because resource ids can't be reused? Would be useful to document that implicit assumption if that was the case (to prevent people from copying this to places where that's wrong and size can shrink (and i
would be out of bounds)).
Would code like this that iterates over values in raw buckets may benefit from a macro when it isn't performance sensitive enough to use two different loops (also to reduce maintenance work), e.g. in xdebug, phpdbg, uopz/runkit7, serializers, etc.?
#define HT_GET_ZVAL_AT_RAW_OFFSET(ht, i) (HT_IS_PACKED((ht)) ? &(ht)->arPacked[i] : &(ht)->arData[i].val)
zval *p = HT_IS_PACKED(ht) ? &ht->arPacked[i] : &ht->arData[i].val; | |
zval *p = HT_GET_ZVAL_AT_RAW_OFFSET(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ TysonAndre In general, packed array may be converted into real hash with less number of elements.
Store just an array of zvals without keys. Because of smaller data set, this patch may show performance improvement on some apps and benchmarks that use packed arrays. (~1% on PHP-Parser) The patch has a number of disadvantages: - sapi/phpdbg needs special support for packed arrays (WATCH_ON_BUCKET). - ZEND_HASH_FOREACH_* macros became more expensive (bigger and slower), because they have to support both representations (hash and packed). CLI PHP became 37KB bigger just because of these macros. - zend_hash_sort_ex() may require converting packed arrays to hash. - zend_hash_minmax() prototype was changed to compare only values.
@@ -14233,7 +14233,11 @@ static int zend_jit_hash_jmp(dasm_State **Dst, const zend_op *opline, const zend | |||
} | |||
| LOAD_ADDR FCARG1a, jumptable | |||
| sub r0, aword [FCARG1a + offsetof(HashTable, arData)] | |||
| mov FCARG1a, (sizeof(Bucket) / sizeof(void*)) | |||
if (HT_IS_PACKED(jumptable)) { | |||
| mov FCARG1a, (sizeof(zval) / sizeof(void*)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, noticed while reading: On x64, sizeof(Bucket)=32 sizeof(zval)=16, and void*
is a power of two on both. Elsewhere, I see the optimization shl/shr used instead of dividing or multiplying by a power of 2, e.g. right above this.
(e.g. shr 2
for x64 bucket, shr 3
for x64 zval, shr 4
for x86 zval)
On x86, sizeof(zval) is also 16.
|.if X64
| mov r0, FCARG2a
| shl r0, 4
|.else
| imul r0, FCARG2a, sizeof(zval)
|.endif
So I think everything except 32-bit Buckets can be micro-optimized for a switch
by using shr
instead of idiv FCARG1a
below to save a mov and a few cycles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easier to read and write code (or for others to contribute, or to emit an assertion error if struct sizes changed) if there were macros for these magic numbers?
It may also allow avoiding the need to repeat ZEND_ASSERT(sizeof(Bucket) == 32);
and similar assertions for other structures if the macro itself was what contained the assertion.
For example,
MULTIPLY_REGISTER_BY_POWER_OF_TWO FCARG2a, divisor, dividend
MULTIPLY_REGISTER_BY_POWER_OF_TWO FCARG2a, sizeof(zval), 1
DIVIDE_REGISTER_BY_POWER_OF_TWO sizeof(zval), sizeof(void*)
(with ZEND_ASSERT(divisor % dividend == 0); ZEND_ASSERT(IS_POW2(divisor / dividend))
in the macro)
(maybe omit the smaller_value)
The function zend_hash_check_size
links to BitScanReverse
and __builtin_clz
macros for efficiently doing that division - I'm not sure which operations common C compilers can optimize out at runtime, though __builtin_constant_p
may help omit the assertions if anything actually can't be
e.g. I mean something along the lines of #define fast_get_log2(size) __builtin_constant_p(size) ? ((size) == 1 ? 0 : (size == 2 ? 1) ? (size == 4 ? 2) : ordinary_get_log2(size)) : ordinary_get_log2(size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried replacing multiplication by shift, but unfortunately this didn't make any speed improvement even on bench.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried replacing multiplication by shift, but unfortunately this didn't make any speed improvement even on bench.php
Should bench.php be updated to include at least one benchmarks of match or switch statements, or maybe another benchmark that's used for less common operations?
It looks like this code is in the function static int zend_jit_hash_jmp
, where the only caller is static int zend_jit_switch()
, which is used for match
expressions and switch
statements
case ZEND_SWITCH_LONG:
case ZEND_SWITCH_STRING:
case ZEND_MATCH:
if (!zend_jit_switch(&dasm_state, opline, op_array, ssa, NULL, NULL)) {
goto jit_failure;
}
goto done;
» grep switch Zend/bench.php
returned zero results. Same for match
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better not to change bench.php.
Micro benchmarks for switch/match may go into micro_bench.php
@nikic bump, eyes on this one please when you have some time :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks reasonable to me, though the amount of code duplication is unfortunate. I'm okay with doing this if you think the performance/memory benefit is worthwhile.
z = &ht->arData[idx].val; | ||
if (Z_TYPE_P(z) != IS_UNDEF) { | ||
break; | ||
if (HT_IS_PACKED(ht)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we shouldn't be using HT internals here -- I've refactored the code in 6e1d9f6.
tmp_repl = &repl_ht->arData[repl_idx].val; | ||
if (Z_TYPE_P(tmp_repl) != IS_UNDEF) { | ||
break; | ||
if (HT_IS_PACKED(repl_ht)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a recurring pattern. We could add a macro for it (or maybe it should use zend_hash_get_current_data/zend_hash_move_forward?)
* master: (378 commits) Removed debug code. JIT: Fixed use after free introduced in 92ad90a Inline "helper" macros Use zend_result as return value for DBA handler JIT: Avoid ZEND_CALL_RELEASE_THIS checks JIT_G(current_frame)->stacj is always true. JIT: Avoid dead type store [ci skip] Fix NEWS Fix #76167: mbstring may use pointer from some previous request Tracing JIT: Fixed possible endless loop when escape from ZEND_CALL_TOP frame Remove tests for obsolete Oracle DB version Remove test for obsolete Oracle DB version Fix tests for method camel case change Fix tests for method camel case change DBA should not convert elements in-place if the key param is an array Fix DBA on MacOS (php#7611) Refactor DBA Drop confusing ac local variable for ZEND_NUM_ARGS() Inline DBA_GET2 macro Inline DBA_ID_PARS macro ...
@nikic could you please take a loop once again. ZEND_HASH_FOREACH_* macros were separared into general ZEND_ARRAY_FOREACH_, packed ZEND_PACKED_FOREACH_ and hash specific ZEND_HASH_FOREACH_. May be this should be changed to keep ZEND_HASH_FOREACH_ semantic. Other improvement ideas are welcome. |
…EMENT, ZEND_ARRAY_PREV_ELEMENT macros
* master: Tracing JIT: Fixed reference counting when escape because of IS_UNDEF element (test) Tracing JIT: Fixed reference counting when escape because of IS_UNDEF element Fixed incorrect assumption about reference counting JIT: Fixed numeric string index handling JIT: Fixed register allocation in case of integer overflow Handle FETCH_DIM_R after FETCH_DIM_FUNC_ARG in inference Fix range inference hang Fix finally exception chaining on recursion Fix scdf loop var free check for phi vars Deprecate use of mbstring to convert text to Base64/QPrint/HTML entities/etc Fix self-assign evaluation order for ASSIGN_DIM_OP Accept null and 0 for microseconds argument in stream_select() Update libmysqlclient job to MySQL 8.0.27 Fix a method name `ZipArchive::clearError` typo Configuring for 8.1.0RC6 [ci skip] Fix typo (Okt → Oct) Fix bug #81026 (PHP-FPM oob R/W in root process leading to priv escalation)
…ht) ? &ht->arPacked[i] : &ht->arData[i].val"
@dstogov I've been recently making myself familiar with phpdbg watchpoints again - it's not so difficult, but it requires a couple changes to ensure we're using WATCH_ON_ZVAL instead of WATCH_ON_BUCKET for the packed values. Otherwise it's just 1-2 small changes to ambiguate between arData and arPacked access. |
@nikic I think, this is ready to be merged. Could you please take a final look. Other improvement ideas are welcome. |
Great! I didn't understand all the details of watchpints implementation. Could you take care about phpdbg, when this merged? |
Yes, I will. I also have a couple stability fixes and a watchpoint callback functaionality ready … will push this soon and then take care of this as I find time. |
On topic of tracking the changes: when a watched address is removed (or the key of a Bucket altered), the watchpoint will be recreated with information about the parent container and its name within it. (see queue_for_recreation functions etc.) In this specific case I'll trigger the recreation attempt when efree()'ing the arPacked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to avoid packed->hash conversion in zend_hash_sort if renumbering, but we can leave that for later.
Zend/zend_API.c
Outdated
@@ -1558,7 +1558,7 @@ ZEND_API void object_properties_load(zend_object *object, HashTable *properties) | |||
zend_long h; | |||
zend_property_info *property_info; | |||
|
|||
ZEND_HASH_FOREACH_KEY_VAL(properties, h, key, prop) { | |||
ZEND_HASH_MAP_FOREACH_KEY_VAL(properties, h, key, prop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may not be true, e.g. the usage at
Line 1558 in 4239ec5
object_properties_load(&hash->std, Z_ARRVAL_P(members_zv)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: I see this was undone so no concerns, agree it may be packed
can probably be a packed array depending on what gets passed to unserialize()?
/* {{{ */
PHP_METHOD(SplDoublyLinkedList, __unserialize) {
spl_dllist_object *intern = Z_SPLDLLIST_P(ZEND_THIS);
HashTable *data;
zval *flags_zv, *storage_zv, *members_zv, *elem;
if (zend_parse_parameters(ZEND_NUM_ARGS(), "h", &data) == FAILURE) {
RETURN_THROWS();
}
flags_zv = zend_hash_index_find(data, 0);
storage_zv = zend_hash_index_find(data, 1);
members_zv = zend_hash_index_find(data, 2);
if (!flags_zv || !storage_zv || !members_zv ||
Z_TYPE_P(flags_zv) != IS_LONG || Z_TYPE_P(storage_zv) != IS_ARRAY ||
Z_TYPE_P(members_zv) != IS_ARRAY) {
zend_throw_exception(spl_ce_UnexpectedValueException,
"Incomplete or ill-typed serialization data", 0);
RETURN_THROWS();
}
intern->flags = (int) Z_LVAL_P(flags_zv);
ZEND_HASH_FOREACH_VAL(Z_ARRVAL_P(storage_zv), elem) {
spl_ptr_llist_push(intern->llist, elem);
} ZEND_HASH_FOREACH_END();
object_properties_load(&intern->std, Z_ARRVAL_P(members_zv));
} /* }}} */
Also, userland code can directly call functions such as $splDoubleLinkedList->__unserialize($data)
with any array value as $data[2]
getting passed as properties
to object_properties_load
.
- A separate question is whether that should be detected and forbidden if the collection is already initialized/constructed. The
Deque
RFC implementation forbids that, for example
And third party serializers may not have the same restrictions on arrays in the future, even if they currently do (igbinary currently does convert non-empty packed arrays to hashes while unserializing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ TysonAndre this chunk is reverted. I didn't understand, do you still see some problem here?
@@ -823,7 +823,7 @@ int make_http_soap_request(zval *this_ptr, | |||
/* Send cookies along with request */ | |||
cookies = Z_CLIENT_COOKIES_P(this_ptr); | |||
ZEND_ASSERT(Z_TYPE_P(cookies) == IS_ARRAY); | |||
if (zend_hash_num_elements(Z_ARRVAL_P(cookies)) != 0) { | |||
if (zend_hash_num_elements(Z_ARRVAL_P(cookies)) != 0 && !HT_IS_PACKED(Z_ARRVAL_P(cookies))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably wanted to use ZEND_HASH_FOREACH_MAP below.
Merged via 90b7bde |
Store just an array of zvals without keys.
Because of smaller data set, this patch may show performance improvement
on some apps and benchmarks that use packed arrays. (~1% on PHP-Parser)
The patch has a number of disadvantages:
because they have to support both representations (hash and packed).
CLI PHP became 37KB bigger just because of these macros.