Skip to content

Infer value is not undef within foreach #4982

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Dec 7, 2019

I discovered that opcache infers that the variable can be undefined
within the loop body (or any types prior to the loop)

This seems to be unnecessary - the sources of blocks outside of the loop body
include both FE_FETCH_R and FE_FETCH_RW.

/* Handle cases such as `$i = &$obj->typedProperty`, which can change the type of assignments */
tmp = t2;
}
// fprintf(stderr, "Previously %x for FE_FETCH_ARR at line %d\n", (int)tmp, (int)opline->lineno);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split this into a separate PR and remove the commented code.

I also don't really understand the particular logic you used here. Possibly what you want is just tmp = (t2 & MAY_BE_REF)? That's the only thing that should be inherited from the original variable...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

I also don't really understand the particular logic you used here. Possibly what you want is just tmp = (t2 & MAY_BE_REF)? That's the only thing that should be inherited from the original variable...

I wasn't familiar with why t2 was being used or if there were edge cases that I wasn't aware of that would require t2 (e.g. if this was deliberate for some op types). Apparently not.

I discovered that opcache infers that the variable can be undefined
within the loop body (or any types prior to the loop)

This seems to be unnecessary - the sources of blocks outside of the loop body
include both FE_FETCH_R and FE_FETCH_RW.
@TysonAndre TysonAndre changed the title Infer value is not undef within foreach; Speed up ZEND_IS_IDENTICAL Infer value is not undef within foreach Dec 7, 2019
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 7, 2019
Add a specialized opcode handler to use for `===`/`!==` when:

1. At least one side is a $cv, and the other is a $cv or CONST
   (avoids the need to free operands)
2. Neither operand can be undefined or a reference
   (avoids the need for error handling and dereferencing)

```
// Elapsed time decreased from 0.275 seconds to 0.243 seconds in combination
// with PR php#4982
function count_same(array $values) {
    $same = 0;
    foreach ($values as $x) {
        foreach ($values as $y) {
            if ($y === $x) {
                $same++;
            }
        }
    }
    return $same;
}
$values = range(0, 5000);
$values[] = new stdClass();
$values[] = null;
$values[] = 3;
$start = microtime(true);
$total = count_same($values);
```
@nikic
Copy link
Member

nikic commented Dec 7, 2019

Unfortunately this won't work: The old value is going to be preserved when FE_FETCH aborts the iteration, as we don't distinguish the two cases here.

It also looks like we have a related miscompile:

<?php

function test() {
    $a = ["3"];
    $x = 1;
    foreach ($a as $x) {
        $x = 2.0;
    }
    var_dump($x);
}
test();

This prints "3" instead of 2.

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Dec 7, 2019

It also looks like we have a related miscompile in sccp:

That miscompile predates this PR, and can also be found in php 7.4-dev.

I also see the same issue (both with and without this PR) for this example:

<?php

function test(string $a, string $b) {
    $a = [$a];
    $x = 1;
    foreach ($a as $x) {
        $x = 2.0;
    }
    var_dump($x === 2.0);
    var_dump($x === 'y');
    var_dump($x);
}
test('x', 'y');

Opcache enabled (both 8.0 NTS and 7.4 debug zts):

bool(false)
bool(false)
string(1) "x"

No opcache (correct behavior):

bool(true)
bool(false)
float(2)

Unfortunately this won't work: The old value is going to be preserved when FE_FETCH aborts the iteration, as we don't distinguish the two cases here.

Could you give a different example - both of these examples were broken prior to this PR.

I would have hoped that the ssa graph could take the definitions from the locations of the continue [x] and break [x] statements (and implicit continue at end of loops) instead of the FE_FETCH_R opcode.

@TysonAndre
Copy link
Contributor Author

I can also reproduce this issue for the example in #4982 (comment) in php 7.2.6, and 7.3.9/7.3.14. 7.1.30 seems to be unaffected

@nikic
Copy link
Member

nikic commented Dec 7, 2019

Sorry for being unclear, the miscompile is just something I noticed while looking at this PR, it's not caused by it (though I suspect the root issue is the same).

I think we'll want to adjust SSA construction to use the pre-assignment value on the loop exit edge. This is slightly fishy in that we don't do it anywhere else, but I also don't think it should cause any issues in this case.

@nikic
Copy link
Member

nikic commented Dec 8, 2019

To clarify what I mean, currently we generate the following type-inferred SSA form for my code example:

$_main: ; (lines=3, args=0, vars=0, tmps=0, ssa_vars=0, no_loops)
    ; (before dfa pass)
    ; /home/nikic/php-src/t083.php:1-12
    ; return  [long] RANGE[1..1]
BB0: start exit lines=[0-2]
    ; level=0
            INIT_FCALL 0 192 string("test")
            DO_UCALL
            RETURN int(1)

test: ; (lines=11, args=0, vars=2, tmps=1, ssa_vars=9)
    ; (before dfa pass)
    ; /home/nikic/php-src/t083.php:3-10
    ; return  [null] RANGE[0..0]
    ; #0.CV0($a) NOVAL [undef]
    ; #1.CV1($x) NOVAL [undef]
BB0: start lines=[0-2]
    ; to=(BB3, BB1)
    ; level=0
    ; children=(BB1, BB3)
            ASSIGN #0.CV0($a) NOVAL [undef] -> #2.CV0($a) [array [long] of [string]] array(...)
            ASSIGN #1.CV1($x) NOVAL [undef] -> #3.CV1($x) [long] RANGE[1..1] int(1)
            #4.V2 [array [long] of [string]] = FE_RESET_R #2.CV0($a) [array [long] of [string]] BB3
BB1: follow target loop_header lines=[3-3]
    ; from=(BB0, BB2)
    ; to=(BB3, BB2)
    ; idom=BB0
    ; level=1
    ; children=(BB2)
        #5.CV1($x) NOVAL [long, double] = Phi(#3.CV1($x) [long] RANGE[1..1], #7.CV1($x) NOVAL [double])
            FE_FETCH_R #4.V2 [array [long] of [string]] #5.CV1($x) NOVAL [long, double] -> #6.CV1($x) [long, double, string] BB3
BB2: follow lines=[4-5]
    ; from=(BB1)
    ; to=(BB1)
    ; idom=BB1
    ; level=2
    ; loop_header=1
            ASSIGN #6.CV1($x) [long, double, string] -> #7.CV1($x) NOVAL [double] float(2)
            JMP BB1
BB3: target exit lines=[6-10]
    ; from=(BB0, BB1)
    ; idom=BB0
    ; level=1
        #8.CV1($x) [long, double, string] = Phi(#3.CV1($x) [long] RANGE[1..1], #6.CV1($x) [long, double, string])
            FE_FREE #4.V2 [array [long] of [string]]
            INIT_FCALL 1 96 string("var_dump")
            SEND_VAR #8.CV1($x) [long, double, string] 1
            DO_ICALL
            RETURN null

Note that the phi node in BB3 uses #6, which is nominally the post-assignment result of FE_FETCH. We should change SSA construction to make it use #5 instead (on the exit edge) and only use #6 on the non-exit edge.

This will allow us to treat op2 of FE_FETCH as a proper no-val use. Which we currently already mistaken do, resulting in the DCE issue and possibly other issues, see

if (opline->opcode == ZEND_FE_FETCH_R) {
return ssa_op->op2_use == var && ssa_op->op1_use != var;
}

Once this is done, we can also change type-inference as proposed here.

@dstogov
Copy link
Member

dstogov commented Dec 9, 2019

@nikic The problem, that FE_FETCH in BB1 includes ASSIGN to #6.CV1($x), that should be only in BB2. Proper fix would require splitting ASSIGN form FE_FETCH (and reduce performance). Using #5 looks wrong, because it's redefined by #6 in dominated block.

@nikic
Copy link
Member

nikic commented Dec 9, 2019

@dstogov I agree that having the separate ASSIGN (like we did in PHP 5) would be cleanest, but I'm not sure it's worth it just for the SSA representation. Using #5 instead of #6 here should be fine as far as SSA is concerned. In general using dominated values can cause issues because it breaks CSSA form (and thus may need SSA deconstruction), but because this is directly on the control-flow edge and represents existing behavior, it shouldn't be a problem here.

php-pulls pushed a commit that referenced this pull request Dec 10, 2019
For now, don't treat FE_FETCH op2 as no-val use. See GH-4982.
@nikic
Copy link
Member

nikic commented Dec 10, 2019

I've committed 87691e7 as a temporary fix for the DCE issue.

TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 23, 2019
Add a specialized opcode handler to use for `===`/`!==` when:

1. At least one side is a $cv, and the other is a $cv or CONST
   (avoids the need to free operands)
2. Neither operand can be undefined or a reference
   (avoids the need for error handling and dereferencing)

```
// Elapsed time decreased from 0.275 seconds to 0.243 seconds in combination
// with PR php#4982
function count_same(array $values) {
    $same = 0;
    foreach ($values as $x) {
        foreach ($values as $y) {
            if ($y === $x) {
                $same++;
            }
        }
    }
    return $same;
}
$values = range(0, 5000);
$values[] = new stdClass();
$values[] = null;
$values[] = 3;
$start = microtime(true);
$total = count_same($values);
```
@nikic
Copy link
Member

nikic commented Dec 30, 2019

I've opened #5040 to address the issue in SSA construction. I'll close this PR in favor of that one.

@nikic nikic closed this Dec 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants