Skip to content

Also cache the computed stack size in INIT_FCALL_BY_NAME, speed up calling global functions in other files #7761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Dec 11, 2021

The computed stack size only depends on the function definition fbc and the number
of opcodes in extended_value, and that doesn't change.

INIT_FCALL_BY_NAME is used when the function definition is not available when a
file is being compiled, commonly due to the function being declared in a
different file.
Precomputing the stack size is an optimization already done by INIT_FCALL
used when the function is known.

Replacing hallo in Zend/micro_bench.php with a require_once to load hallo
from a different file in a build with opcache enabled (but not the jit):

(Not sure how reliable these benchmarks are - they're possibly subject to power management affecting performance or some other factors)

Before (best run) - total time and subtracting time needed for an empty loop
empty_loop         0.162
func_elsewhere()   0.615    0.475

After (best run):
empty_loop         0.159 (unaffected, just noise)
func_elsewhere()   0.579    0.420
<?php  // other.php, loaded with require_once __DIR__ . '/other.php';
function hallo() {}

Note that opcache eliminates calls to empty functions if the empty function is defined within the same file, so func() would be doing the same thing as empty_loop when opcache was enabled and take the same time as empty_loop.

I noticed this optimization while reviewing #7728 .
Extensions that modify the function definition in place can clear the run time cache entries corresponding to the opline of INIT_FCALL_BY_NAME (set CACHED_PTR to 0) if the function name matches, to force the stack size to be recomputed.

@Girgias
Copy link
Member

Girgias commented Dec 11, 2021

One of the test failures on Travis seems to be related:

========DIFF========

--

       [2]=>

       int(9)

     }

049+ Segmentation fault

050+ 

051+ Termsig=11

049- array(3) {

050-   [0]=>

051-   int(10)

052-   [1]=>

053-   int(11)

054-   [2]=>

055-   int(12)

056- }

057- array(3) {

058-   [0]=>

059-   int(1)

060-   [1]=>

061-   int(2)

062-   [2]=>

063-   int(3)

064- }

========DONE========

FAIL Statics in nested functions & evals. [tests/lang/static_variation_001.phpt] 

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Dec 12, 2021

ext/opcache/jit/zend_jit_x86.dasc (and other platforms) seems like they also needs to be modified - it's possible the test failures are specific to the jit - I hadn't modified the code path yet, but I need to update that as well in zend_jit_init_fcall.

			} else if (opline->opcode == ZEND_INIT_FCALL_BY_NAME) {
				|	LOAD_ADDR FCARG1a, Z_STR_P(zv + 1);
				|	lea FCARG2a, aword [r2 + opline->result.num]
				|	EXT_CALL zend_jit_find_func_helper, r0

Edit: Still, it's strange - is it possible for something to switch from the JIT to non-JIT? Or is this some other error - https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.jit

Edit: I can reproduce this crash in make test TESTS='-d opcache.jit_buffer_size=16M -d opcache.enable=1 -d opcache.enable_cli=1 --show-diff -d opcache.protect_memory=1 -j8 -d opcache.jit=tracing tests/lang --repeat 2' in an NTS debug build - adding the --repeat 2 is required

    --repeat [n]
                Run the tests multiple times in the same process and check the
                output of the last execution (CLI SAPI only).

The computed stack size only depends on the function definition fbc and the number
of opcodes in extended_value, and that doesn't change.

INIT_FCALL_BY_NAME is used when the function definition is not available when a
file is being compiled, commonly due to the function being declared in a
different file.
Precomputing the stack size is an optimization already done by INIT_FCALL
used when the function is known.

Replacing `hallo` in Zend/micro_bench.php with a require_once to load `hallo`
from a different file in a build with opcache enabled (but not the jit):

```
Before (best run) - total time and subtracting time needed for an empty loop
empty_loop         0.162
func_elsewhere()   0.615    0.475

After (best run):
empty_loop         0.159 (unaffected, just noise)
func_elsewhere()   0.579    0.420
```
@TysonAndre
Copy link
Contributor Author

This is ready for review - https://dev.azure.com/phpazuredevops/PHP/_build/results?buildId=22204&view=logs&j=0d3730b8-0d59-574b-ab4a-5e189a74d53b&t=492e5677-f211-533f-274d-3a4af39d4cce just seems to have been slow overall and hit the 60 minute timeout. E.g. the build step took 3 times longer than normal compared to the unrelated previous jobs - https://dev.azure.com/phpazuredevops/PHP/_build/results?buildId=22220&view=logs&j=0d3730b8-0d59-574b-ab4a-5e189a74d53b

To make this easier to review or bisect if needed, I'd rather start using the precomputed cache size in the JIT in a separate commit

|.if X64
| LOAD_ADDR CARG3, opline
|.else
| sub r4, 12
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about why it's 12 bytes in other calls I've seen with 3 arguments rather than something smaller for a 4-byte pointer (in the examples I used as reference), but even if it was too large it wouldn't cause bugs

@TysonAndre TysonAndre requested review from dstogov and nikic December 13, 2021 14:55
@dstogov
Copy link
Member

dstogov commented Dec 13, 2021

I think it doesn't make a lot of sense to implement this optimization for INIT_FCALL_BY_NAME only.

The same idea may be also applied for INIT_METHOD_CALL, INIT_STATIC_METHOD_CALL, INIT_NS_FCALL_BY_NAME and NEW. Then we may measure impact on performance of real-life apps.

I'm not sure, if this is a good idea or not, yet. Caching is always a compromise between the cost of calculation and memory consumption. Sometimes, it's cheaper to perform a branch-less calculation, than save a few CPU cycles, but extend cache size and get much more significant penalty because of cache misses. This needs to be checked.

I'll help with JIT related changes if this will speed up real-life apps without JIT.

@TysonAndre
Copy link
Contributor Author

The same idea may be also applied for INIT_METHOD_CALL, INIT_STATIC_METHOD_CALL, INIT_NS_FCALL_BY_NAME and NEW. Then we may measure impact on performance of real-life apps.

INIT_NS_FCALL_BY_NAME is possible, but I wasn't sure if rebinding self/static/new would work for method calls/new (IS_UNUSED), and definitely not for others (e.g. vars from ZEND_FETCH_CLASS (new $dynamic())), but may be possible for IS_CONST.

  • Looking at this again, I think I see what you mean - CACHED_POLYMORPHIC_PTR seems to allow this by checking if the ce of self still matches - a third argument could be added with the stack size in method calls.

@dstogov
Copy link
Member

dstogov commented Dec 15, 2021

  • Looking at this again, I think I see what you mean - CACHED_POLYMORPHIC_PTR seems to allow this by checking if the ce of self still matches - a third argument could be added with the stack size in method calls.

yeah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants