Skip to content

Attempt to fix GH-10008: Narrowing occurred during type inference of ZEND_ADD_ARRAY_ELEMENT #10294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Jan 11, 2023

Attempt to fix GH-10008, see #10008 (comment)

This test triggers narrowing for two ops: first ZEND_ADD_ARRAY_ELEMENT, and then ZEND_ASSIGN.

The type inference happens in the following order:

The ZEND_ADD_ARRAY_ELEMENT infers type 0x40e04080 (packed flag is set), arr_type=0 at this point because it hasn't been set by ZEND_INIT_ARRAY yet
The ZEND_INIT_ARRAY infers type 0x40804080
The ZEND_ADD_ARRAY_ELEMENT infers type 0x40e04080, arr_type=0x40804080, which does not have the packed flag set while the existing result of ZEND_ADD_ARRAY_ELEMENT has the packed flag set

This seems to occur because of the phi node introduced by the while loop. If I remove the loop the problem goes away.

As Arnaud noted, this seems to be caused by a too wide type inference for arr_type==0. We should keep the invariant that if x>=y then key_type(x) >= key_type(y).
If we write the possible results down in a table we get:

arr_type           resulting key type
---------------    --------------------------------------------------------------------------
HASH_ONLY       -> MAY_BE_ARRAY_NUMERIC_HASH
PACKED_ONLY     -> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
HASH || PACKED  -> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
0               -> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)

As we can see, HASH_ONLY > 0 but MAY_BE_ARRAY_NUMERIC_HASH < MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED, which violates the invariant.
Instead if we modify the zero case to have MAY_BE_ARRAY_NUMERIC_HASH instead, we get the following table which satisfies the invariant.

arr_type           resulting key type
---------------    --------------------------------------------------------------------------
HASH_ONLY       -> MAY_BE_ARRAY_NUMERIC_HASH
PACKED_ONLY     -> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
HASH || PACKED  -> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
0               -> MAY_BE_ARRAY_NUMERIC_HASH

@arnaud-lb
Copy link
Member

Although arr_type = RES_USE_INFO() | RES_INFO() prevents the narrowing since RES_INFO() is the type of ssa_op->result_def, I don't think that we can use ssa_op->result_def to compute itself because it's not in its use chain.

MAY_BE_ARRAY_PACKED is added here:

tmp |= MAY_BE_HASH_ONLY(arr_type) ? MAY_BE_ARRAY_NUMERIC_HASH : MAY_BE_ARRAY_KEY_LONG;

arr_type is 0 in this case. When assign_dim_array_result_type is called a second time, arr_type is 0x40804080 and MAY_BE_ARRAY_PACKED is not added.

The issue may be that assign_dim_array_result_type returns a two wide type with arr_type=0. I think that we are supposed to have assign_dim_array_result_type(x) >= assign_dim_array_result_type(y) for all arr_types x and y with x >= y.

For reference, the issue has been introduced in 1ffbb73.

@nielsdos
Copy link
Member Author

Right... that makes sense

The issue may be that assign_dim_array_result_type returns a two wide type with arr_type=0. I think that we are supposed to have assign_dim_array_result_type(x) >= assign_dim_array_result_type(y) for all arr_types x and y with x >= y.

That sounds like the issue indeed and makes sense as an invariant, thanks for analyzing.

…_ARRAY_ELEMENT

This test triggers narrowing for two ops: first ZEND_ADD_ARRAY_ELEMENT,
and then ZEND_ASSIGN.

The type inference happens in the following order:
1) The ZEND_ADD_ARRAY_ELEMENT infers type 0x40e04080 (packed flag is set),
   arr_type=0 at this point because it hasn't been set by ZEND_INIT_ARRAY yet.
2) The ZEND_INIT_ARRAY infers type 0x40804080
3) The ZEND_ADD_ARRAY_ELEMENT infers type 0x40e04080, arr_type=0x40804080,
   which does not have the packed flag set while the existing result of
   ZEND_ADD_ARRAY_ELEMENT has the packed flag set.

This seems to occur because of the phi node introduced by the while
loop. If I remove the loop the problem goes away.

As Arnaud noted, this seems to be caused by a too wide type inference
for arr_type==0. We should keep the invariant that if x>=y then
key_type(x) >= key_type(y).
If we write the possible results down in a table we get:

```
arr_type           resulting key type
---------------    --------------------------------------------------------------------------
HASH_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH
PACKED_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
HASH || PACKED	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
0		-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
```

As we can see, `HASH_ONLY > 0` but
`MAY_BE_ARRAY_NUMERIC_HASH < MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED`,
which violates the invariant.
Instead if we modify the zero case to have MAY_BE_ARRAY_NUMERIC_HASH instead,
we get the following table which satisfies the invariant.

```
arr_type           resulting key type
---------------    --------------------------------------------------------------------------
HASH_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH
PACKED_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
HASH || PACKED	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
0		-> MAY_BE_ARRAY_NUMERIC_HASH
```
@nielsdos
Copy link
Member Author

nielsdos commented Sep 27, 2023

I looked at this again.
You're right I think that the inference for arr_type==0 is too wide.
I changed it in the code. I also made a common function because the same pattern was repeated a couple of times.
If we write it down in a table:

CURRENTLY

arr_type           resulting key type
---------------    --------------------------------------------------------------------------
HASH_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH
PACKED_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
HASH || PACKED	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
0		-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)

As we can see, HASH_ONLY > 0 but MAY_BE_ARRAY_NUMERIC_HASH < MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED, which violates the invariant.

AFTER THIS PATCH

Instead if we modify the zero case to have MAY_BE_ARRAY_NUMERIC_HASH instead we get the following table which satisfies the invariant.

arr_type           resulting key type
---------------    --------------------------------------------------------------------------
HASH_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH
PACKED_ONLY	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
HASH || PACKED	-> MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED (== MAY_BE_ARRAY_KEY_LONG)
0		-> MAY_BE_ARRAY_NUMERIC_HASH

@nielsdos nielsdos marked this pull request as ready for review September 28, 2023 06:16
@arnaud-lb
Copy link
Member

This has the unfortunate effect of always widening MAY_BE_ARRAY_PACKED to MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED when adding a numeric key to the array.

I'm not sure this is a significant issue, but after looking more into it I believe that we can avoid this.

We should have the same issue with this code:

$array = ["b" => $bool];
$array[$string_key] = 123;
0006 #16.T8 = INIT_ARRAY 1 #14.CV0($bool) string("b")
0007 ASSIGN #9.CV3($array) -> #17.CV3($array) #16.T8
0008 ASSIGN_DIM #17.CV3($array) -> #18.CV3($array) #15.CV2($string_key) 
0009 OP_DATA int(123)

It turns out that this is prevented in _zend_update_type_info by skipping oplines whose operands do not have a type yet:

/* If one of the operands cannot have any type, this means the operand derives from
* unreachable code. Propagate the empty result early, so that that the following
* code may assume that operands have at least one type. */
if (!(t1 & (MAY_BE_ANY|MAY_BE_UNDEF|MAY_BE_CLASS))

  • INIT_ARRAY uses op1=#14.CV0, which has type 0 initialy (due to $bool = true; if ($bool !== true) {)
  • ASSIGN uses op2=#16.T8, which has type 0 because INIT_ARRAY was skipped
  • ASSIGN_DIM uses op1=#17.CV3, which has type 0 because ASSIGN was skipped

All oplines are skipped initially, so the problem does not happen. The oplines are visited again after $bool = false;.

In gh10008.phpt we don't skip ADD_ARRAY_ELEMENT because the array is passed in result. This seems to be the root cause. I think that we could fix this issue by skipping ADD_ARRAY_ELEMENT if the type of result is 0, in case ZEND_ADD_ARRAY_ELEMENT:.

@nielsdos
Copy link
Member Author

Thanks for the review.

This has the unfortunate effect of always widening MAY_BE_ARRAY_PACKED to MAY_BE_ARRAY_NUMERIC_HASH | MAY_BE_ARRAY_PACKED when adding a numeric key to the array.

This is a pre-existing problem though, no?
i.e. the flag MAY_BE_ARRAY_PACKED is already never returned on its own. And trying to do so when the array is packed only results in narrowing errors in ext/opcache/tests/jit/assign_dim_010.phpt.

In gh10008.phpt we don't skip ADD_ARRAY_ELEMENT because the array is passed in result. This seems to be the root cause. I think that we could fix this issue by skipping ADD_ARRAY_ELEMENT if the type of result is 0, in case ZEND_ADD_ARRAY_ELEMENT:.

By adding this in the else branch of if (opline->opcode == ZEND_INIT_ARRAY) { it also fixes the problem indeed:

if (arr_type == 0) {
    break;
}

However, I don't understand how this avoids the problem with packed that you mentioned.
The following code sets both MAY_BE_ARRAY_HASH|MAY_BE_ARRAY_PACKED on the array. Seems to be because the result (#13.T6) of INIT_ARRAY has both flags.
#13.T6 NOESC [array [long] of [false]] = INIT_ARRAY 2 (packed) #12.CV0($bool) [false] RANGE[0..1] NEXT

function test()
{
    $bool = true;
    for ($i = 0; $i < 10; $i++) {
        if ($bool !== true) {
            $array = [$bool, 123];
        }

        $bool = false;
    }
}

So I'm a bit confused honestly.

@arnaud-lb
Copy link
Member

arnaud-lb commented Sep 29, 2023

Indeed, you are right. I was not expected that MAY_BE_ARRAY_PACKED was never returned on its own. The only references to MAY_BE_PACKED outside of zend_inference.c are in zend_jit_trace.c, and appear to be used in conditions to add jit guards (for arrays that are both PACKED and HASH), so the JIT may benefit from returning just MAY_BE_PACKED in the future, as this would remove some guards.

Looks good to me then. Just a nit pick on the name array_type_to_kind: I think it's too general; what do you think of assign_long_dim_array_result_type() or something along of line of "array type after adding possibly long key"?

@nielsdos nielsdos closed this in e72fc12 Sep 29, 2023
@nielsdos
Copy link
Member Author

Thank you Arnaud, I've fixed the name to assign_long_dim_array_result_type in the merge. I also added you to the credits in NEWS as well.

@nielsdos
Copy link
Member Author

nielsdos commented Sep 29, 2023

I had to revert the fix. It passes CI tests in 8.1, but the JIT segfaults on 2 tests in 8.2 and higher.
I haven't had a chance to look into why this happens yet.
See https://github.com/php/php-src/actions/runs/6357716793/job/17269225109 for example

@nielsdos nielsdos reopened this Sep 29, 2023
@nielsdos nielsdos closed this Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants