
Optimize in-memory XMLWriter #16120


Closed

nielsdos (Member)

We're currently using a libxml buffer, which requires copying the buffer into a zend_string every time we want to output the string. Furthermore, because it uses the system allocator instead of ZendMM, its memory does not count towards memory_limit, and performance suffers.

This patch adds a custom writer that writes the output into a smart_str instance. This uses ZendMM, which improves performance, and makes it possible to avoid copying the string in the common case where flush() is called with $empty set to true.

Girgias (Member) left a comment:

Looks sensible

staabm (Contributor) commented Sep 30, 2024:

> This patch adds a custom writer such that the strings are written to a smart_str instance, using ZendMM for improved performance

Do we have an idea how much faster this is?

nielsdos (Member, Author) commented:

> > This patch adds a custom writer such that the strings are written to a smart_str instance, using ZendMM for improved performance
>
> do we have an idea how much faster this is?

It depends heavily on the workload. On a simple test like this:

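```php
<?php

$writer = XMLWriter::toMemory();
var_dump($writer);

// Write 10,000 <foo> elements, each wrapping a CDATA section.
for ($i = 0; $i < 10000; $i++) {
    xmlwriter_start_element($writer, 'foo');
    xmlwriter_write_cdata($writer, 'some cdata');
    xmlwriter_end_element($writer);
}

// flush(false) returns the buffered output without emptying the buffer.
for ($i = 0; $i < 10000; $i++) {
    $writer->flush(false);
}
```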

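I get the following, where `./sapi/cli/php` is the build with this patch and `./sapi/cli/php_old` the build without it: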

```
Benchmark 1: ./sapi/cli/php x.php
  Time (mean ± σ):     148.7 ms ±   3.2 ms    [User: 144.6 ms, System: 3.8 ms]
  Range (min … max):   144.3 ms … 156.5 ms    19 runs

Benchmark 2: ./sapi/cli/php_old x.php
  Time (mean ± σ):     212.8 ms ±   4.5 ms    [User: 207.7 ms, System: 4.6 ms]
  Range (min … max):   203.8 ms … 220.1 ms    14 runs

Summary
  ./sapi/cli/php x.php ran
    1.43 ± 0.04 times faster than ./sapi/cli/php_old x.php
```
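As a cross-check, the summary line follows from the two means. Assuming first-order error propagation for a ratio of independent measurements, the two standard deviations reproduce the reported ± 0.04:

```latex
r = \frac{\bar t_{\text{old}}}{\bar t_{\text{new}}}
  = \frac{212.8\ \text{ms}}{148.7\ \text{ms}} \approx 1.431

\sigma_r \approx r \sqrt{\left(\frac{3.2}{148.7}\right)^2 + \left(\frac{4.5}{212.8}\right)^2}
  \approx 1.431 \times 0.0302 \approx 0.043
```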

Further speed improvements are likely possible by switching to fast ZPP (the fast Zend parameter parsing API).

nielsdos closed this in f5e81fe on Sep 30, 2024.