Skip to content

zend_compile: Add support for %d to sprintf() optimization #14561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 17, 2024

Conversation

TimWolla
Copy link
Member

This is a follow-up for #14546 that I announced in #14546 (comment)


This extends the existing sprintf() optimization by support for the %d placeholder, which effectively equivalent to an (int) cast followed by a (string) cast.

For a synthetic test using:

<?php

$a = 'foo';
$b = 42;

for ($i = 0; $i < 100_000_000; $i++) {
    sprintf("%s-%d", $a, $b);
}

This optimization yields a 1.3× performance improvement:

$ hyperfine 'sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
      '/tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'
Benchmark 1: sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.296 s ±  0.094 s    [User: 3.287 s, System: 0.005 s]
  Range (min … max):    3.213 s …  3.527 s    10 runs

Benchmark 2: /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      4.300 s ±  0.025 s    [User: 4.290 s, System: 0.007 s]
  Range (min … max):    4.266 s …  4.334 s    10 runs

Summary
  sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.30 ± 0.04 times faster than /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

TimWolla added 2 commits June 13, 2024 13:47
…` in `zend_compile_func_sprintf()`

This is intended to make the diff of a follow-up commit smaller.
This extends the existing `sprintf()` optimization by support for the `%d`
placeholder, which effectively equivalent to an `(int)` cast followed by a
`(string)` cast.

For a synthetic test using:

    <?php

    $a = 'foo';
    $b = 42;

    for ($i = 0; $i < 100_000_000; $i++) {
        sprintf("%s-%d", $a, $b);
    }

This optimization yields a 1.3× performance improvement:

    $ hyperfine 'sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
          '/tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'
    Benchmark 1: sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
      Time (mean ± σ):      3.296 s ±  0.094 s    [User: 3.287 s, System: 0.005 s]
      Range (min … max):    3.213 s …  3.527 s    10 runs

    Benchmark 2: /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
      Time (mean ± σ):      4.300 s ±  0.025 s    [User: 4.290 s, System: 0.007 s]
      Range (min … max):    4.266 s …  4.334 s    10 runs

    Summary
      sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
        1.30 ± 0.04 times faster than /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
@TimWolla TimWolla requested a review from iluuu1994 as a code owner June 13, 2024 12:27
@arnaud-lb
Copy link
Member

Nice! It would be interesting to see benchmark results with more placeholders, and larger ints. I suspect the improvement may be larger for cases where sprintf() has to realloc its initial buffer.

@TimWolla
Copy link
Member Author

For

<?php

$a = 'foo';
$b = PHP_INT_MAX;

for ($i = 0; $i < 10_000_000; $i++) {
    sprintf("%s-%d-%s-%d-%s-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------%d-%s-%d-%s-%d", $a, $b, $a, $b, $a, $b, $a, $b, $a, $b);
}

the improvement actually is smaller (also note how I needed to reduce the number of iterations by an order of magnitude):

$ hyperfine 'sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
      '/tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'
Benchmark 1: sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      2.073 s ±  0.038 s    [User: 2.068 s, System: 0.005 s]
  Range (min … max):    2.023 s …  2.164 s    10 runs
 
Benchmark 2: /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      2.399 s ±  0.023 s    [User: 2.394 s, System: 0.004 s]
  Range (min … max):    2.368 s …  2.442 s    10 runs
 
Summary
  sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.16 ± 0.02 times faster than /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

For this branch:

$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 8K of event 'cpu_core/cycles:P/'
# Event count (approx.): 7989105747
#
# Overhead  Command  Shared Object         Symbol                                   
# ........  .......  ....................  .........................................
#
    48.64%  php      php                   [.] zend_long_to_str
    12.20%  php      php                   [.] ZEND_ROPE_END_SPEC_TMP_TMPVAR_HANDLER
     9.68%  php      libc.so.6             [.] __memmove_avx_unaligned_erms
     5.32%  php      php                   [.] execute_ex
     4.08%  php      php                   [.] _efree
     4.04%  php      php                   [.] _emalloc
     2.94%  php      php                   [.] ZEND_CAST_SPEC_CV_HANDLER
     2.79%  php      php                   [.] memcpy@plt
     2.41%  php      php                   [.] ZEND_ROPE_ADD_SPEC_TMP_CONST_HANDLER
     2.05%  php      php                   [.] zval_get_string_func
     1.87%  php      php                   [.] ZEND_ROPE_ADD_SPEC_TMP_TMPVAR_HANDLER
     1.61%  php      php                   [.] ZEND_ROPE_ADD_SPEC_TMP_CV_HANDLER
     0.83%  php      php                   [.] ZEND_FREE_SPEC_TMPVAR_HANDLER
     0.67%  php      php                   [.] ZEND_ROPE_INIT_SPEC_UNUSED_CV_HANDLER
     0.33%  php      php                   [.] rc_dtor_func
     0.23%  php      [kernel.kallsyms]     [k] acpi_os_read_port
     0.05%  php      ld-linux-x86-64.so.2  [.] do_lookup_x
     0.05%  php      [kernel.kallsyms]     [k] security_inode_permission
     0.04%  php      [kernel.kallsyms]     [k] __pte_offset_map_lock
     0.04%  php      [kernel.kallsyms]     [k] __handle_mm_fault
     0.03%  php      libc.so.6             [.] __memset_avx2_unaligned_erms
     0.03%  php      [kernel.kallsyms]     [k] percpu_counter_add_batch
     0.02%  php      [kernel.kallsyms]     [k] advance_transaction
     0.01%  php      php                   [.] zend_hash_destroy
     0.01%  php      [kernel.kallsyms]     [k] folio_mark_accessed
     0.01%  php      libc.so.6             [.] malloc_consolidate
     0.01%  php      [kernel.kallsyms]     [k] zap_pte_range
     0.01%  php      [kernel.kallsyms]     [k] cpuacct_account_field
     0.00%  php      [kernel.kallsyms]     [k] flush_signal_handlers
     0.00%  php      [kernel.kallsyms]     [k] nmi_restore
     0.00%  perf-ex  [kernel.kallsyms]     [k] __rcu_read_unlock
     0.00%  php      [kernel.kallsyms]     [k] nmi_handle
     0.00%  perf-ex  [kernel.kallsyms]     [k] perf_sample_event_took
     0.00%  php      [kernel.kallsyms]     [k] sched_clock_noinstr
     0.00%  perf-ex  [kernel.kallsyms]     [k] native_sched_clock
     0.00%  perf-ex  [kernel.kallsyms]     [k] native_apic_mem_write
     0.00%  perf-ex  [kernel.kallsyms]     [k] perf_ctx_enable
     0.00%  php      [kernel.kallsyms]     [k] _raw_spin_unlock
     0.00%  php      [kernel.kallsyms]     [k] native_apic_mem_write


# Samples: 133  of event 'cpu_atom/cycles:P/'
# Event count (approx.): 91732603
#
# Overhead  Command  Shared Object      Symbol                                   
# ........  .......  .................  .........................................
#
    49.81%  php      php                [.] zend_long_to_str
     9.57%  php      php                [.] ZEND_ROPE_END_SPEC_TMP_TMPVAR_HANDLER
     8.06%  php      libc.so.6          [.] __memmove_avx_unaligned_erms
     6.82%  php      php                [.] _emalloc
     4.94%  php      php                [.] execute_ex
     4.86%  php      php                [.] ZEND_ROPE_ADD_SPEC_TMP_CV_HANDLER
     4.55%  php      php                [.] _efree
     2.83%  php      php                [.] ZEND_CAST_SPEC_CV_HANDLER
     2.82%  php      php                [.] ZEND_ROPE_ADD_SPEC_TMP_TMPVAR_HANDLER
     2.13%  php      php                [.] zval_get_string_func
     2.11%  php      php                [.] ZEND_ROPE_ADD_SPEC_TMP_CONST_HANDLER
     0.70%  php      php                [.] ZEND_ROPE_INIT_SPEC_UNUSED_CV_HANDLER
     0.70%  php      php                [.] rc_dtor_func
     0.09%  php      [kernel.kallsyms]  [k] rseq_ip_fixup
     0.00%  php      [kernel.kallsyms]  [k] nmi_cpu_backtrace_handler
     0.00%  php      [kernel.kallsyms]  [k] native_sched_clock
     0.00%  php      [kernel.kallsyms]  [k] intel_pmu_handle_irq
     0.00%  php      [kernel.kallsyms]  [k] perf_ctx_enable

For master:

$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'cpu_core/cycles:P/'
# Event count (approx.): 9063440188
#
# Overhead  Command      Shared Object         Symbol                             
# ........  ...........  ....................  ...................................
#
    26.65%  unoptimized  unoptimized           [.] php_sprintf_appendint
    21.62%  unoptimized  unoptimized           [.] php_formatted_print
    14.98%  unoptimized  unoptimized           [.] php_sprintf_appendstring
     8.58%  unoptimized  unoptimized           [.] execute_ex
     8.52%  unoptimized  libc.so.6             [.] __memmove_avx_unaligned_erms
     6.68%  unoptimized  unoptimized           [.] php_sprintf_appendchars
     4.90%  unoptimized  libc.so.6             [.] __memchr_avx2
     1.72%  unoptimized  unoptimized           [.] memcpy@plt
     1.55%  unoptimized  unoptimized           [.] _erealloc
     0.99%  unoptimized  unoptimized           [.] zif_sprintf
     0.89%  unoptimized  unoptimized           [.] memchr@plt
     0.87%  unoptimized  unoptimized           [.] __ctype_b_loc@plt
     0.64%  unoptimized  libc.so.6             [.] __ctype_b_loc
     0.59%  unoptimized  unoptimized           [.] _efree
     0.33%  unoptimized  unoptimized           [.] _emalloc_320
     0.13%  unoptimized  unoptimized           [.] rc_dtor_func
     0.07%  unoptimized  unoptimized           [.] zend_register_functions
     0.04%  unoptimized  ld-linux-x86-64.so.2  [.] strcmp
     0.04%  unoptimized  libc.so.6             [.] __libc_sigaction
     0.03%  unoptimized  [kernel.kallsyms]     [k] __perf_addr_filters_adjust
     0.03%  unoptimized  [kernel.kallsyms]     [k] folio_add_lru
     0.02%  unoptimized  libc.so.6             [.] _IO_file_underflow@@GLIBC_2.2.5
     0.02%  unoptimized  [kernel.kallsyms]     [k] kick_ilb
     0.02%  unoptimized  [kernel.kallsyms]     [k] native_write_msr
     0.01%  unoptimized  [kernel.kallsyms]     [k] psi_task_change
     0.01%  unoptimized  unoptimized           [.] zend_type_release
     0.01%  unoptimized  [kernel.kallsyms]     [k] task_mm_cid_work
     0.01%  unoptimized  [kernel.kallsyms]     [k] tick_program_event
     0.01%  unoptimized  [kernel.kallsyms]     [k] __hrtimer_run_queues
     0.01%  unoptimized  [kernel.kallsyms]     [k] task_work_add
     0.01%  unoptimized  [kernel.kallsyms]     [k] __folio_cancel_dirty
     0.01%  unoptimized  [kernel.kallsyms]     [k] irq_enter_rcu
     0.01%  unoptimized  libgcc_s.so.1         [.] 0x000000000001df08
     0.01%  unoptimized  [kernel.kallsyms]     [k] _compound_head
     0.00%  unoptimized  [kernel.kallsyms]     [k] perf_iterate_ctx
     0.00%  unoptimized  [kernel.kallsyms]     [k] ct_nmi_exit
     0.00%  unoptimized  [kernel.kallsyms]     [k] nmi_restore
     0.00%  perf-exec    [kernel.kallsyms]     [k] native_sched_clock
     0.00%  unoptimized  [kernel.kallsyms]     [k] perf_sample_event_took
     0.00%  perf-exec    [kernel.kallsyms]     [k] perf_sample_event_took
     0.00%  unoptimized  [kernel.kallsyms]     [k] sched_clock_noinstr
     0.00%  unoptimized  [kernel.kallsyms]     [k] native_sched_clock
     0.00%  unoptimized  [kernel.kallsyms]     [k] intel_pmu_handle_irq
     0.00%  unoptimized  [kernel.kallsyms]     [k] perf_ctx_enable
     0.00%  perf-exec    [kernel.kallsyms]     [k] intel_pmu_handle_irq
     0.00%  perf-exec    [kernel.kallsyms]     [k] perf_ctx_enable


# Samples: 23  of event 'cpu_atom/cycles:P/'
# Event count (approx.): 17720139
#
# Overhead  Command      Shared Object      Symbol                      
# ........  ...........  .................  ............................
#
    49.49%  unoptimized  unoptimized        [.] php_sprintf_appendint
    29.13%  unoptimized  unoptimized        [.] php_formatted_print
     9.16%  unoptimized  unoptimized        [.] memcpy@plt
     6.11%  unoptimized  unoptimized        [.] execute_ex
     5.71%  unoptimized  unoptimized        [.] php_sprintf_appendstring
     0.38%  unoptimized  [kernel.kallsyms]  [k] __get_user_8
     0.02%  unoptimized  [kernel.kallsyms]  [k] native_sched_clock
     0.00%  unoptimized  [kernel.kallsyms]  [k] intel_pmu_handle_irq
     0.00%  unoptimized  [kernel.kallsyms]  [k] perf_ctx_enable


#
# (Cannot load tips.txt file, please install perf!)
#

So it appears that stringifying the integer is the expensive part here.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct, but I also wonder if the performance may be improved.

So it appears that stringifying the integer is the expensive part here.

Currently, ROPE_INIT and ROPE_ADD convert the operand to strings before storing them in the string list. I wonder if it may be possible to delay this coercion to ROPE_END, and then coerce strings in-place in the new string buffer. That would avoid allocating a string for the integer, just to discard it right after.

That would be a bigger change though, obviously.

@TimWolla
Copy link
Member Author

That would be a bigger change though, obviously.

Yes. Given that this change even without that optimization appears to be consistently faster than calling sprintf(), this is probably a good follow-up to investigate / make, but should not be a blocker for this PR.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me :)


$a = 42;
$b = -1337;
$c = 3.14;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memo to myself, it might make sense to change the sprintf family of functions to emit the deprecation about implicit truncation for floats.

@TimWolla TimWolla merged commit 2c5ed50 into php:master Jun 17, 2024
1 of 4 checks passed
@TimWolla TimWolla deleted the sprintf-optimize-d branch June 17, 2024 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants