Performance Issue: fromHtml is 10x Slower than hast-util-from-parse5 #6

Closed as not planned
@zolero

Description

Affected packages and versions

hast-util-from-html

Link to runnable example

No response

Steps to reproduce

Set up a benchmark in src/parser.bench.ts (code below).
Load the HTML test file: https://gist.github.com/zolero/6ecc0d238595b37fabb42cc86588998b
Run the benchmarks comparing hast-util-from-html and hast-util-from-parse5.
Observe the performance difference.

import fs from "node:fs";
import path from "node:path";

import { fromHtml } from "hast-util-from-html";
import { fromParse5 } from "hast-util-from-parse5";
import { parse } from "parse5";
import { bench, describe } from "vitest";

describe("performance testing", () => {
    const html = fs.readFileSync(path.join(process.cwd(), `assets/html/real-world/bol-com.html`)).toString();
    bench("performance of HTML to HAST using hast-util-from-html", () => {
        fromHtml(html);
    });
    bench("performance of HTML to HAST using hast-util-from-parse5 + sourceCodeLocationInfo", () => {
        fromParse5(parse(html, { sourceCodeLocationInfo: true }));
    });
    bench("performance of HTML to HAST using hast-util-from-parse5 + w/o sourceCodeLocationInfo", () => {
        fromParse5(parse(html, { sourceCodeLocationInfo: false }));
    });
});
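
For readers who want to sanity-check the numbers without vitest, the measurement can be approximated with a plain timing loop. This is only a minimal sketch: `timeIt` and `TimingSummary` are hypothetical names, not part of vitest or any hast utility, and the workload below is a stand-in that would be swapped for a call such as `fromHtml(html)`. It also lacks the statistical care of vitest's bench runner.

```typescript
// Hypothetical helper: not part of vitest, parse5, or hast-util-from-html.
interface TimingSummary {
  samples: number;
  meanMs: number;
  minMs: number;
  maxMs: number;
  hz: number; // operations per second, derived from the mean
}

function timeIt(fn: () => void, samples = 10, warmup = 2): TimingSummary {
  // Warm-up runs give the JIT a chance to optimize before measuring.
  for (let i = 0; i < warmup; i++) fn();

  const durations: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    fn();
    durations.push(performance.now() - start);
  }

  const total = durations.reduce((sum, d) => sum + d, 0);
  const meanMs = total / samples;
  return {
    samples,
    meanMs,
    minMs: Math.min(...durations),
    maxMs: Math.max(...durations),
    hz: meanMs > 0 ? 1000 / meanMs : Infinity,
  };
}

// Stand-in workload; replace the body with the parser call under test.
const summary = timeIt(() => {
  let acc = 0;
  for (let i = 0; i < 100_000; i++) acc += Math.sqrt(i);
  if (acc < 0) throw new Error("unreachable");
});
console.log(summary);
```

Running each candidate through the same helper, on the same input string, keeps the comparison like-for-like.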

Expected behavior

I expected fromHtml to have similar or better performance compared to hast-util-from-parse5, considering both are intended to perform similar tasks.

Actual behavior

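fromHtml is significantly slower than hast-util-from-parse5: in the runs below it is roughly 5x slower than fromParse5 with sourceCodeLocationInfo enabled and roughly 7x slower than fromParse5 without it. This raises concerns about its efficiency in performance-critical applications.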

Benchmark Results
Here are the results of my performance tests:

 ✓ src/parser.bench.ts (3) 5453ms
   ✓ performance testing (3) 5451ms
     name                                                                                       hz      min      max     mean      p75      p99     p995     p999     rme  samples   
   · performance of HTML to HAST using hast-util-from-html                                  4.5929   208.73   235.69   217.73   220.69   235.69   235.69   235.69  ±2.64%       10   slowest
   · performance of HTML to HAST using hast-util-from-parse5 + sourceCodeLocationInfo      23.1773  38.8248  49.1888  43.1457  44.3062  49.1888  49.1888  49.1888  ±4.92%       12   
   · performance of HTML to HAST using hast-util-from-parse5 + w/o sourceCodeLocationInfo  32.4982  25.4096  36.3745  30.7710  31.6351  36.3745  36.3745  36.3745  ±4.80%       17   fastest

Additional benchmark data:

 ✓ src/parser.bench.ts (3) 5510ms
   ✓ performance testing (3) 5508ms
     name                                                                                       hz      min      max     mean      p75      p99     p995     p999     rme  samples   
   · performance of HTML to HAST using hast-util-from-html                                  4.4994   211.85   247.17   222.25   221.88   247.17   247.17   247.17  ±3.59%       10   slowest
   · performance of HTML to HAST using hast-util-from-parse5 + sourceCodeLocationInfo      22.4136  41.9222  48.7456  44.6158  45.3911  48.7456  48.7456  48.7456  ±3.13%       12   
   · performance of HTML to HAST using hast-util-from-parse5 + w/o sourceCodeLocationInfo  32.9171  26.1313  35.0066  30.3794  31.1243  35.0066  35.0066  35.0066  ±3.47%       17   fastest

BENCH Summary
hast-util-from-parse5 + w/o sourceCodeLocationInfo is 1.47x faster than hast-util-from-parse5 + sourceCodeLocationInfo.
hast-util-from-parse5 + w/o sourceCodeLocationInfo is 7.32x faster than hast-util-from-html.
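
The summary ratios can be reproduced from the hz column of the second run. The sketch below is illustrative arithmetic only; the `hz` object and the `speedup` helper are hypothetical names, and the values are copied from the table above. It also computes the ratio between the first two rows, which the summary does not list.

```typescript
// Throughput (hz, operations per second) copied from the second benchmark run.
const hz = {
  fromHtml: 4.4994,
  fromParse5WithLocation: 22.4136,
  fromParse5WithoutLocation: 32.9171,
};

// Hypothetical helper: ratio of two throughputs, rounded to two decimals.
function speedup(fasterHz: number, slowerHz: number): string {
  return (fasterHz / slowerHz).toFixed(2);
}

const withoutVsWith = speedup(hz.fromParse5WithoutLocation, hz.fromParse5WithLocation);
const withoutVsFromHtml = speedup(hz.fromParse5WithoutLocation, hz.fromHtml);
const withVsFromHtml = speedup(hz.fromParse5WithLocation, hz.fromHtml);

console.log(withoutVsWith);     // "1.47"
console.log(withoutVsFromHtml); // "7.32"
console.log(withVsFromHtml);    // "4.98"
```

On these figures the gap is about 7x against fromParse5 without location info and about 5x against fromParse5 with it, rather than the 10x in the title.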

Affected runtime and version

v22.6.0

Affected package manager and version

1.22.22

Affected OS and version

Win 11

Build and bundle tools

Rollup

Metadata

    Labels

    👎 phase/no: Post cannot or will not be acted on
    🤷 no/invalid: This cannot be acted upon
