Proposal: Run the Control Flow Graph pass twice #5097
Conversation
Run the Control Flow Graph optimizations again, after all of the type inference steps and optimizations have run. Passes run after the CFG pass eliminate various redundant conditions, infer types and constants for values, and can also change or remove opcodes. Running the CFG optimizations a second time lets them optimize the resulting code further, and lets the later passes rely on transformations implemented in the Control Flow Graph code without duplicating the implementation.

This is meant as a proof of concept:

- I'm not sure of the best way to make the debug output (e.g. pass numbers) follow conventions.
- I'm not familiar with the data structures used. I'm assuming it's safe to run pass 5 at that point.
- There are at most 3 passes on the first run of CFG (see `#define PASSES 3` in block_pass.c). Maybe the second run of CFG should have a lower limit.

Tests were updated:

- CFG can eliminate unnecessary jumps.
- CFG optimized the calls to ECHO into a single call with a string.
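As a hypothetical illustration (not one of the PR's test cases): once SCCP has propagated the constant below and folded the branch, a second run of the block pass could drop the now-unnecessary jump and merge the adjacent ECHO opcodes into a single ECHO of one constant string.

```php
<?php
function emit() {
    $verbose = true;      // SCCP can propagate this constant...
    if ($verbose) {       // ...and fold this condition away
        echo "header\n";
    }
    echo "body\n";        // with the branch gone, a second block pass could
                          // merge the two ECHO opcodes into one
}
emit();
```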
It may be better to extend the DFA pass to perform the missed optimizations. @nikic you had some thoughts about Optimizer improvements...?
I'll look into that. I found out why the tests fail: it's probably because pass 5 doesn't adjust all of the caller_init_opline addresses. It mallocs a new array of oplines, but doesn't update the pointers into the old one. There may be other issues.
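A minimal sketch of the kind of fix that implies (a fragment, not the PR's code): the `caller_init_opline` field is from zend_call_graph.h, while `old_opcodes` and `new_opcodes` are hypothetical names for the old and freshly allocated opline arrays. Every stored pointer into the old array would need to be rebased like this.

```c
/* Sketch only: after a pass allocates a new opcode array, rebase any pointer
 * that still points into the old array (names old_opcodes/new_opcodes are
 * hypothetical; caller_init_opline is the field from zend_call_graph.h). */
if (call_info->caller_init_opline) {
    call_info->caller_init_opline =
        new_opcodes + (call_info->caller_init_opline - old_opcodes);
}
```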
That would have some drawbacks which made me hesitant to propose doing it that way:
Another option would be to add another ini directive, e.g.
Instead directly use return_value. Hopefully this also works around the "may be uninitialized" warning...
The remaining build failure looks like it wasn't caused by my PR itself, but instead by the fact that opcache had to recompile instead of using file_cache (or some other reason) - the implementation of array_reduce changed.
My general preference would be to move optimizations from block pass to SSA pass where possible. There are some things that need to happen before SSA construction (e.g. optimization of
So I don't think it makes sense to "move" optimizations; it would be better to "duplicate" them, or to separate out the valuable patterns and call them from both passes.
@dstogov I don't remember the details, but IIRC the block pass can be much more expensive than the DFA passes, because the variable source tracking it does is so inefficient. Just zeroing out the source buffer for each block is very expensive if there are many temporaries. (We could compact temporaries earlier, but I think the block pass relies on unique temporaries for some of its optimizations.) That's why I think that any optimizations related to variables should be in DFA. Of course, jump optimizations should be in the block pass.
I'm stuck on how that optimization would best be implemented outside of the block pass. Is there a way to (efficiently) tell (during the SCCP phase) if an opcode other than the preceding opcode will also jump to a given opcode? If so, does this information get updated after no-ops are removed?
I see mentions of predecessors and successors, but it looks like they're for data flow, not for control flow (and I'd guess IS_CONST operands wouldn't have data flow). E.g. how would I tell whether the zend_op for `echo "Suffix";` below (at `mid:`) can also be reached by a jump from somewhere other than the preceding opcode?
```php
<?php
function main(bool $flag) {
    if ($flag) {
        echo "True";
    }
    mid: // join point: reachable both from the `if` body and by jumping over it
    echo "Suffix";
}
```
@TysonAndre The DFA pass also builds a CFG, so it should have access to the same information as the block pass. The CFG is stored here: php-src/ext/opcache/Optimizer/zend_ssa.h, line 133 at 3987a31.
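A rough sketch of how that lookup might go (not code from the PR), assuming the `zend_cfg`/`zend_basic_block` layout in ext/opcache/Optimizer/zend_cfg.h (the `map`, `blocks`, `predecessors_count`, `predecessor_offset`, and `predecessors` fields; exact names may differ between PHP versions), with `op_array`, `opline`, and `ssa` taken from the surrounding pass:

```c
/* Sketch only: find the basic block containing `opline` and check whether
 * it can be entered from more than one place. Field names are assumed from
 * ext/opcache/Optimizer/zend_cfg.h and may vary between versions. */
int block_num = ssa->cfg.map[opline - op_array->opcodes];
const zend_basic_block *b = &ssa->cfg.blocks[block_num];

if (b->predecessors_count > 1) {
    /* e.g. the block at `mid:` above, reachable both from the `if` body
     * and from the jump taken when $flag is false */
    const int *preds = ssa->cfg.predecessors + b->predecessor_offset;
    (void)preds; /* indices of the predecessor basic blocks */
}
```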
And filter out echoes of the empty string (e.g. false/null). Split out of php#5097.
Looking again, some of the optimizations in the block pass seem unsound to use the way this PR would start using them (not just less effective), such as the merging of ZEND_ECHO calls. E.g. if `$x = self::SOME_CONST` and `echo $x;` is used multiple times, then `ZEND_OP1_LITERAL(last_op)` might refer to the same literal id in more than one opcode, and the merging code modifies that literal in place:

```c
if (Z_TYPE(ZEND_OP1_LITERAL(opline)) != IS_STRING) {
    convert_to_string(&ZEND_OP1_LITERAL(opline));
}
if (Z_TYPE(ZEND_OP1_LITERAL(last_op)) != IS_STRING) {
    convert_to_string(&ZEND_OP1_LITERAL(last_op));
}
/* grow last_op's literal so opline's string can be appended to it */
old_len = Z_STRLEN(ZEND_OP1_LITERAL(last_op));
l = old_len + Z_STRLEN(ZEND_OP1_LITERAL(opline));
if (!Z_REFCOUNTED(ZEND_OP1_LITERAL(last_op))) {
    /* interned/persistent string: copy it into a fresh allocation first */
    zend_string *tmp = zend_string_alloc(l, 0);
    memcpy(ZSTR_VAL(tmp), Z_STRVAL(ZEND_OP1_LITERAL(last_op)), old_len);
    Z_STR(ZEND_OP1_LITERAL(last_op)) = tmp;
} else {
    Z_STR(ZEND_OP1_LITERAL(last_op)) = zend_string_extend(Z_STR(ZEND_OP1_LITERAL(last_op)), l, 0);
}
```

If I'm reading it right, that pass assumes a given literal id for some opcode types gets used in exactly one place, and that it's safe to modify the zval of the constant. Also, closing this PR because this isn't the way to go, and some of the work has been started separately. If you think any of these tasks from the PR discussion should get done, file an issue.
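For reference, a hypothetical snippet of the shape described above (not one of the PR's tests): if SCCP were to replace both uses of `$x` with the same shared literal, an in-place `convert_to_string`/`zend_string_extend` at one merge site could change what the other opcode prints.

```php
<?php
class C {
    const SOME_CONST = "prefix";
}

function g(bool $flag) {
    $x = C::SOME_CONST;
    if ($flag) {
        echo $x;          // candidate for merging with the following echo
        echo " (flag)";
    }
    echo $x;              // if this shares the literal above, mutating that
    echo " (always)";     // literal in place would change this output too
}
g(true);
```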
Currently, it isn't possible to enable optimizations without enabling caching. They should be orthogonal features - it's already possible to cache without optimization passes (`opcache.optimization_level=0`).

(In use cases such as running a web server in Apache, it's very desirable to both cache and optimize opcodes. For short-lived CLI scripts, either file_cache or disabling opcache (to reduce optimization overhead) is often useful. In a few use cases, it's desirable to optimize without any caching.)

Being forced to either enable shared memory caching or file_cache_only causes these problems:

- **For file_cache_only, users would be forced to manage the file cache.**
  - Users may not be confident in properly handling all potential edge cases.
  - End users of applications may not be familiar with opcache.
  - (Concerns include running out of disk space, needing to clear stale entries, concerns about opcode corruption or opcodes for the wrong file contents not being fixed after restarting a process, etc.)
- **The shared memory block uses up a lot of RAM when individual PHP processes are known to be long-lived and don't have a common memory address space.** (e.g. multiple long-lived PHP scripts managed by something that is not a PHP script, such as `supervisord`, daemons, or tools run in the background in IDEs.)
  - The opcodes are duplicated in shmem, which wastes memory in use cases where opcodes are stored in the cache but never retrieved. The default for opcache.memory_consumption is 128M.
  - Even when this isn't used, the virtual memory to track the shared memory (shmem) segment seems to add 2MB extra per **independent** PHP process in "shared memory" segments, reducing the free RAM available for other processes. (Starting a large number of PHP CLI scripts that sleep() in a loop, `free` (the Linux program to report free memory) reports that `shared` increases by 2MB per process without opcache.no_cache, but barely increases with opcache.no_cache=1.)

`opcache.no_cache` takes precedence over `opcache.file_cache*` settings. (If enabled, neither the shared memory cache nor the file cache will be used. It is an error to use both opcache.no_cache and opcache.preload.)

On an unrelated PR php#5097 (comment) dstogov mentioned that

> Also, it would be great to move optimizer and JIT into core, to make them available even without opcode caching.

This PR is one potential approach for making the optimizer and JIT available without opcode caching.

- Even if everything except caching was moved into core, it would still be useful to have a configurable way to disable opcode caching via a system ini setting (`php -d opcache.no_cache=1 my_script.php`).

This is currently a proof of concept - feel free to point out combinations of settings that should be rejected.

Potential areas for future work:

- Normally, opcache optimizes a file based only on that one file's contents. When `opcache.no_cache=1` is used, it may be possible to use all of the class, function, constant, etc. definitions parsed from previously parsed files (to eliminate dead code, inline function calls, etc.).
Run the Control Flow Graph optimizations again, after all of the type inference steps and optimizations (e.g. SCCP) have run. Passes run after the CFG pass eliminate various redundant conditions, infer types and constants for values, and can also change or remove opcodes. Running the CFG optimizations a second time lets them optimize the resulting code further, and lets the later passes rely on transformations implemented in the Control Flow Graph code without duplicating the implementation.

This is meant as a proof of concept:

- I'm not sure of the best way to make the debug output (e.g. pass numbers) follow conventions.
- I'm not familiar with the data structures used. I'm assuming it's safe to run pass 5 at that point.
- There are at most 3 passes on the first run of CFG (see `#define PASSES 3` in block_pass.c). Maybe the second run of CFG should have a lower limit.
- I'm not sure if the passes are done only once for a different reason (e.g. correctness, reducing startup time, etc.). Hopefully, that's not the case (e.g. the opcache shared memory cache and file cache help with performance).
- I didn't see any similar proposals after a quick search of PRs/issue trackers for "opcache"/"pass"/"cfg".

Tests were updated.

EDIT: INIT_FCALL seems to have the wrong stack size after this patch, investigating (https://ci.appveyor.com/project/php/php-src/builds/30241892/job/1swrsqwxnxkcuhkd/tests). It was previously 192 and 144, and now it's wrong.