-
Notifications
You must be signed in to change notification settings - Fork 7.9k
zend_compile: Add support for %d
to sprintf()
optimization
#14561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…` in `zend_compile_func_sprintf()` This is intended to make the diff of a follow-up commit smaller.
This extends the existing `sprintf()` optimization by support for the `%d` placeholder, which effectively equivalent to an `(int)` cast followed by a `(string)` cast. For a synthetic test using: <?php $a = 'foo'; $b = 42; for ($i = 0; $i < 100_000_000; $i++) { sprintf("%s-%d", $a, $b); } This optimization yields a 1.3× performance improvement: $ hyperfine 'sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \ '/tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' Benchmark 1: sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php Time (mean ± σ): 3.296 s ± 0.094 s [User: 3.287 s, System: 0.005 s] Range (min … max): 3.213 s … 3.527 s 10 runs Benchmark 2: /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php Time (mean ± σ): 4.300 s ± 0.025 s [User: 4.290 s, System: 0.007 s] Range (min … max): 4.266 s … 4.334 s 10 runs Summary sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran 1.30 ± 0.04 times faster than /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
Nice! It would be interesting to see benchmark results with more placeholders, and larger ints. I suspect the improvement may be larger for cases where sprintf() has to realloc its initial buffer. |
For <?php
$a = 'foo';
$b = PHP_INT_MAX;
for ($i = 0; $i < 10_000_000; $i++) {
sprintf("%s-%d-%s-%d-%s-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------%d-%s-%d-%s-%d", $a, $b, $a, $b, $a, $b, $a, $b, $a, $b);
} the improvement actually is smaller (also note how I needed to reduce the number of iterations by an order of magnitude):
For this branch: $ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 8K of event 'cpu_core/cycles:P/'
# Event count (approx.): 7989105747
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... .........................................
#
48.64% php php [.] zend_long_to_str
12.20% php php [.] ZEND_ROPE_END_SPEC_TMP_TMPVAR_HANDLER
9.68% php libc.so.6 [.] __memmove_avx_unaligned_erms
5.32% php php [.] execute_ex
4.08% php php [.] _efree
4.04% php php [.] _emalloc
2.94% php php [.] ZEND_CAST_SPEC_CV_HANDLER
2.79% php php [.] memcpy@plt
2.41% php php [.] ZEND_ROPE_ADD_SPEC_TMP_CONST_HANDLER
2.05% php php [.] zval_get_string_func
1.87% php php [.] ZEND_ROPE_ADD_SPEC_TMP_TMPVAR_HANDLER
1.61% php php [.] ZEND_ROPE_ADD_SPEC_TMP_CV_HANDLER
0.83% php php [.] ZEND_FREE_SPEC_TMPVAR_HANDLER
0.67% php php [.] ZEND_ROPE_INIT_SPEC_UNUSED_CV_HANDLER
0.33% php php [.] rc_dtor_func
0.23% php [kernel.kallsyms] [k] acpi_os_read_port
0.05% php ld-linux-x86-64.so.2 [.] do_lookup_x
0.05% php [kernel.kallsyms] [k] security_inode_permission
0.04% php [kernel.kallsyms] [k] __pte_offset_map_lock
0.04% php [kernel.kallsyms] [k] __handle_mm_fault
0.03% php libc.so.6 [.] __memset_avx2_unaligned_erms
0.03% php [kernel.kallsyms] [k] percpu_counter_add_batch
0.02% php [kernel.kallsyms] [k] advance_transaction
0.01% php php [.] zend_hash_destroy
0.01% php [kernel.kallsyms] [k] folio_mark_accessed
0.01% php libc.so.6 [.] malloc_consolidate
0.01% php [kernel.kallsyms] [k] zap_pte_range
0.01% php [kernel.kallsyms] [k] cpuacct_account_field
0.00% php [kernel.kallsyms] [k] flush_signal_handlers
0.00% php [kernel.kallsyms] [k] nmi_restore
0.00% perf-ex [kernel.kallsyms] [k] __rcu_read_unlock
0.00% php [kernel.kallsyms] [k] nmi_handle
0.00% perf-ex [kernel.kallsyms] [k] perf_sample_event_took
0.00% php [kernel.kallsyms] [k] sched_clock_noinstr
0.00% perf-ex [kernel.kallsyms] [k] native_sched_clock
0.00% perf-ex [kernel.kallsyms] [k] native_apic_mem_write
0.00% perf-ex [kernel.kallsyms] [k] perf_ctx_enable
0.00% php [kernel.kallsyms] [k] _raw_spin_unlock
0.00% php [kernel.kallsyms] [k] native_apic_mem_write
# Samples: 133 of event 'cpu_atom/cycles:P/'
# Event count (approx.): 91732603
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .........................................
#
49.81% php php [.] zend_long_to_str
9.57% php php [.] ZEND_ROPE_END_SPEC_TMP_TMPVAR_HANDLER
8.06% php libc.so.6 [.] __memmove_avx_unaligned_erms
6.82% php php [.] _emalloc
4.94% php php [.] execute_ex
4.86% php php [.] ZEND_ROPE_ADD_SPEC_TMP_CV_HANDLER
4.55% php php [.] _efree
2.83% php php [.] ZEND_CAST_SPEC_CV_HANDLER
2.82% php php [.] ZEND_ROPE_ADD_SPEC_TMP_TMPVAR_HANDLER
2.13% php php [.] zval_get_string_func
2.11% php php [.] ZEND_ROPE_ADD_SPEC_TMP_CONST_HANDLER
0.70% php php [.] ZEND_ROPE_INIT_SPEC_UNUSED_CV_HANDLER
0.70% php php [.] rc_dtor_func
0.09% php [kernel.kallsyms] [k] rseq_ip_fixup
0.00% php [kernel.kallsyms] [k] nmi_cpu_backtrace_handler
0.00% php [kernel.kallsyms] [k] native_sched_clock
0.00% php [kernel.kallsyms] [k] intel_pmu_handle_irq
0.00% php [kernel.kallsyms] [k] perf_ctx_enable For master:
So it appears that stringifying the integer is the expensive part here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct, but I also wonder if the performance may be improved.
So it appears that stringifying the integer is the expensive part here.
Currently, ROPE_INIT
and ROPE_ADD
convert the operand to strings before storing them in the string list. I wonder if it may be possible to delay this coercion to ROPE_END
, and then coerce strings in-place in the new string buffer. That would avoid allocating a string for the integer, just to discard it right after.
That would be a bigger change though, obviously.
Yes. Given that this change even without that optimization appears to be consistently faster than calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me :)
|
||
$a = 42; | ||
$b = -1337; | ||
$c = 3.14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memo to myself, it might make sense to change the sprintf family of functions to emit the deprecation about implicit truncation for floats.
This is a follow-up for #14546 that I announced in #14546 (comment)
This extends the existing
sprintf()
optimization by support for the%d
placeholder, which effectively equivalent to an(int)
cast followed by a(string)
cast.For a synthetic test using:
This optimization yields a 1.3× performance improvement: