Skip to content

opcache: optimize some more functions on compile time #5870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

brainexe
Copy link
Contributor

@brainexe brainexe commented Jul 17, 2020

I profiled our production code and found some (more or less rare) function calls which could get evaluated on compile time already + extended it by some other function calls which should always return a static result.

List: (updated one)

  • dirname(string[, int])
  • array_unique(array)
  • array_filter(array)
  • crc32(string)
  • str_replace(string, string, string)

Dummy code:

// source from official docs: https://www.php.net/manual/en/function.php-uname.php
if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
    echo 'This is a server using Windows!';
} else {
    echo 'This is a server not using Windows!';
}

function getRoot()
{
    return dirname(__DIR__);
}

function getBaseCacheKey()
{
    return str_replace('a', 'z', md5(PHP_VERSION));
}

class A {
    public const BASE_ABILITIES = [
        1,
        2,
    ];
}
class B extends A {
    public const ABILITIES = [
        2,
        3,
        4,
        0,
    ];

    public function getAllAbilities(): string
    {
        return implode(',', array_filter(
            array_unique(array_merge(self::BASE_ABILITIES, self::ABILITIES)))
        );
    }
}

Opcodes before:

$_main:
     ; (lines=9, args=0, vars=0, tmps=2)
     ; (after optimizer)
     ; /www/php-src/test.php:1-144
0000 INIT_FCALL 1 96 string("strtoupper")
0001 SEND_VAL string("Lin") 1
0002 V1 = DO_ICALL
0003 T0 = IS_IDENTICAL V1 string("WIN")
0004 JMPZ T0 0007
0005 ECHO string("This is a server using Windows!")
0006 RETURN int(1)
0007 ECHO string("This is a server not using Windows!")
0008 RETURN int(1)

getRoot:
     ; (lines=1, args=0, vars=0, tmps=0)
     ; (after optimizer)
     ; /www/php-src/test.php:8-11
0000 RETURN string("/www")

getBaseCacheKey:
     ; (lines=10, args=0, vars=0, tmps=1)
     ; (after optimizer)
     ; /www/php-src/test.php:13-16
0000 INIT_FCALL 3 128 string("str_replace")
0001 SEND_VAL string("a") 1
0002 SEND_VAL string("z") 2
0003 INIT_FCALL 1 96 string("md5")
0004 SEND_VAL string("8.0.0-dev") 1
0005 V0 = DO_ICALL
0006 SEND_VAR V0 3
0007 V0 = DO_ICALL
0008 VERIFY_RETURN_TYPE V0
0009 RETURN V0
LIVE RANGES:
     0: 0008 - 0009 (tmp/var)

B::getAllAbilities:
     ; (lines=11, args=0, vars=0, tmps=1)
     ; (after optimizer)
     ; /www/php-src/test.php:32-37
0000 INIT_FCALL 2 112 string("implode")
0001 SEND_VAL string(",") 1
0002 INIT_FCALL 1 96 string("array_filter")
0003 INIT_FCALL 1 96 string("array_unique")
0004 SEND_VAL array(...) 1
0005 V0 = DO_ICALL
0006 SEND_VAR V0 1
0007 V0 = DO_ICALL
0008 SEND_VAR V0 2
0009 V0 = DO_ICALL
0010 RETURN V0

Opcodes afterwards:

$_main:
     ; (lines=2, args=0, vars=0, tmps=0)
     ; (after optimizer)
     ; /www/php-src/test.php:1-159
0000 ECHO string("This is a server not using Windows!")
0001 RETURN int(1)


getRoot:
     ; (lines=1, args=0, vars=0, tmps=0)
     ; (after optimizer)
     ; /www/php-src/test.php:8-11
0000 RETURN string("/www")

getBaseCacheKey:
     ; (lines=1, args=0, vars=0, tmps=0)
     ; (after optimizer)
     ; /www/php-src/test.php:13-16
0000 RETURN string("0583081z712641ez72e8bb3fz6d3d645")

B::getAllAbilities:
     ; (lines=2, args=0, vars=0, tmps=1)
     ; (after optimizer)
     ; /www/php-src/test.php:32-37
0000 V0 = QM_ASSIGN string("1,2,3,4")
0001 RETURN V0

- dirname(string[, int])
- md5(string)
- crc32(string)
- sha1(string)
- array_unique(array)
- array_filter(array)
- basename(string[, string])
@GrahamCampbell
Copy link
Contributor

GrahamCampbell commented Jul 18, 2020

I am not sure those hash functions are a good idea to optimise. sha1 and md5 have no use as crypto hashes and are much slower than modern hashing algorithms if you just want a quick hash with good distribution properties.

EDIT: looks like you have removed the hash functions I was concerned about now. 👍

@GrahamCampbell
Copy link
Contributor

GrahamCampbell commented Jul 18, 2020

I am 👍 for array_filter, but perhaps not any of the others, actually. array_unique is already a slow function (high complexity), and so you gain nothing in reality from optimising it in this way unless your inputs are of size 0 or 1. If you have constant stuff being passed into array_unique, I'd question why that code is like that, more than seeing if we can optimize it. Similarly for the others, having constant input seems like not a practical or common use case?

@TysonAndre
Copy link
Contributor

I am 👍 for array_filter, but perhaps not any of the others, actually. array_unique is already a slow function (high complexity), and so you gain nothing in reality from optimising it in this way unless your inputs are of size 0 or 1. If you have constant stuff being passed into array_unique, I'd question why that code is like that, more than seeing if we can optimize it. Similarly for the others, having constant input seems like not a practical or common use case?

In PHP 8 and probably 7, I think that array_unique with 2 args is of complexity O(n), not of complexity O(n*n).
The default sort type is SORT_STRING. It inserts them all into a HashTable by the numeric or string key and only does that one O(1) operation per element.

See ext/standard/array for PHP_FUNCTION(array_unique)

I'd agree that code needing to use array_unique is uncommon, but not impossible. For example, self::$privateValidTypes = array_values(array_unique(array_merge(self::TYPES_A, self::TYPES_B)));

(timing strangeness possibly due to cache locality or the time being amortized and not accounting for resizing, but it isn't quadratic)

php > function benchmark_unique(int $n) { $x = range(1, $n); $start = microtime(true); $y = array_unique($x); $end = microtime(true); printf("Elapsed: %.3f\n", $end - $start);}
php > benchmark_unique(100000);
Elapsed: 0.014
php > benchmark_unique(200000);
Elapsed: 0.023
php > benchmark_unique(300000);
Elapsed: 0.036
php > benchmark_unique(500000);
Elapsed: 0.051

- don't optimize basename(), version_compare(string, string, string)
- optimize strtolower(string), strtoupper()
@brainexe brainexe force-pushed the scpp branch 2 times, most recently from 98e5128 to 794c5fb Compare July 20, 2020 11:20
- don't optimize basename(), version_compare(string, string, string)
@brainexe
Copy link
Contributor Author

Onother live example for the dirname() optimization: Symfony dumped DIC
-> in our project we have a few dozen such service calls which are all load dirname(__DIR__, 1)

@GrahamCampbell
Copy link
Contributor

I think your updated proposed list of functions looks much better, now. 👍

@brainexe brainexe requested a review from nikic July 31, 2020 07:47
@nikic
Copy link
Member

nikic commented Nov 27, 2020

Sorry for the delay! I finally got around to rewriting this code, as the current approach seemed unnecessarily fragile (see the version_compare case found here). I have added support for throwing functions (we will simply discard the exception), so we only need to check for functions/args that are runtime dependent or can result in warnings, now done in this function:

static zend_bool can_ct_eval_func_call(zend_string *name, uint32_t num_args, zval **args) {

Would you mind rebasing on top of these changes? I think it should now be just a matter of adding functions to the list, without the need to add any new logic.

@TysonAndre
Copy link
Contributor

TysonAndre commented Nov 27, 2020

Would you mind rebasing on top of these changes? I think it should now be just a matter of adding functions to the list, without the need to add any new logic.

Those changes introduced some differences in behavior due to opcache always evaluating the code with strict_types=0 - 0ce9b5f#r44598985 , but look good if that was fixed


Obviously, logic for array_filter would need to be kept to check if there were 1 arguments - the callback has side effects

The argument counts/types for str_replace should also continue to need to be checked for arrays within arrays because those would emit notices during compilation

EDIT: Should also continue to check that there are 3 parameters for str_replace(array|string $search, array|string $replace, string|array $subject, &$count = null) - the 4th parameter is an output reference

php > var_export(str_replace(['foo'], [[]], 'foo'));

Notice: Array to string conversion in php shell code on line 1
'Array'

nikic added a commit that referenced this pull request Jul 19, 2021
@nikic
Copy link
Member

nikic commented Jul 19, 2021

I've added array_unique and str_replace from here in 273720d. dirname has already been added earlier (for non-Windows).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants