Description
Initial checklist
- I read the support docs
- I read the contributing guide
- I agree to follow the code of conduct
- I searched issues and couldn’t find anything (or linked relevant results below)
Affected packages and versions
hast-util-from-html
Link to runnable example
No response
Steps to reproduce
Set up a benchmark using src/parser.bench.ts.
Load the HTML test file: https://gist.github.com/zolero/6ecc0d238595b37fabb42cc86588998b
Run the benchmarks comparing hast-util-from-html and hast-util-from-parse5.
Observe the performance difference.
import fs from "node:fs";
import path from "node:path";
import { fromHtml } from "hast-util-from-html";
import { fromParse5 } from "hast-util-from-parse5";
import { parse } from "parse5";
import { bench, describe } from "vitest";

describe("performance testing", () => {
  // Read the test document once so file I/O is not part of the measured work.
  const html = fs.readFileSync(path.join(process.cwd(), `assets/html/real-world/bol-com.html`)).toString();

  bench("performance of HTML to HAST using hast-util-from-html", () => {
    fromHtml(html);
  });

  bench("performance of HTML to HAST using hast-util-from-parse5 + sourceCodeLocationInfo", () => {
    fromParse5(parse(html, { sourceCodeLocationInfo: true }));
  });

  bench("performance of HTML to HAST using hast-util-from-parse5 + w/o sourceCodeLocationInfo", () => {
    fromParse5(parse(html, { sourceCodeLocationInfo: false }));
  });
});
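To narrow down where the extra time goes, one option is to time the parse step and the hast conversion step separately instead of only end to end. The following is a minimal sketch of such a timing helper that relies only on `node:perf_hooks`; the name `meanMs` and its `runs` parameter are hypothetical and not part of vitest, parse5, or the hast utilities, and the workload at the bottom is a stand-in so the snippet runs on its own.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical helper (not from any library above): run `fn` `runs` times
// and return the mean wall-clock time per call in milliseconds.
function meanMs(fn: () => unknown, runs = 10): number {
  if (runs <= 0) throw new RangeError("runs must be positive");
  fn(); // warm-up call so one-time JIT cost is less likely to skew the samples
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    total += performance.now() - start;
  }
  return total / runs;
}

// Stand-in workload so the sketch is runnable without the HTML fixture.
const sample = meanMs(() => JSON.parse(JSON.stringify({ a: [1, 2, 3] })), 5);
console.log(`mean: ${sample.toFixed(4)} ms`);
```

With the benchmark's `html` string in scope, `meanMs(() => parse(html, { sourceCodeLocationInfo: true }))` would time parsing alone, and building the tree once with `const tree = parse(html, { sourceCodeLocationInfo: true })` and then calling `meanMs(() => fromParse5(tree))` would time the conversion alone. Comparing the sum of the two against `meanMs(() => fromHtml(html))` gives a rough idea of how much time is spent outside those two steps.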
Expected behavior
I expected fromHtml to have similar or better performance compared to hast-util-from-parse5, considering both are intended to perform similar tasks.
Actual behavior
fromHtml is roughly 7x slower than hast-util-from-parse5 without sourceCodeLocationInfo and roughly 5x slower than with it enabled, which raises concerns about its efficiency in performance-critical applications.
Benchmark Results
Here are the results of my performance tests:
✓ src/parser.bench.ts (3) 5453ms
  ✓ performance testing (3) 5451ms
name                                                                                         hz      min      max     mean      p75      p99     p995     p999     rme  samples
· performance of HTML to HAST using hast-util-from-html                                  4.5929   208.73   235.69   217.73   220.69   235.69   235.69   235.69  ±2.64%       10  slowest
· performance of HTML to HAST using hast-util-from-parse5 + sourceCodeLocationInfo      23.1773  38.8248  49.1888  43.1457  44.3062  49.1888  49.1888  49.1888  ±4.92%       12
· performance of HTML to HAST using hast-util-from-parse5 + w/o sourceCodeLocationInfo  32.4982  25.4096  36.3745  30.7710  31.6351  36.3745  36.3745  36.3745  ±4.80%       17  fastest
Additional benchmark data:
✓ src/parser.bench.ts (3) 5510ms
  ✓ performance testing (3) 5508ms
name                                                                                         hz      min      max     mean      p75      p99     p995     p999     rme  samples
· performance of HTML to HAST using hast-util-from-html                                  4.4994   211.85   247.17   222.25   221.88   247.17   247.17   247.17  ±3.59%       10  slowest
· performance of HTML to HAST using hast-util-from-parse5 + sourceCodeLocationInfo      22.4136  41.9222  48.7456  44.6158  45.3911  48.7456  48.7456  48.7456  ±3.13%       12
· performance of HTML to HAST using hast-util-from-parse5 + w/o sourceCodeLocationInfo  32.9171  26.1313  35.0066  30.3794  31.1243  35.0066  35.0066  35.0066  ±3.47%       17  fastest
BENCH Summary
hast-util-from-parse5 + w/o sourceCodeLocationInfo is 1.47x faster than hast-util-from-parse5 + sourceCodeLocationInfo.
hast-util-from-parse5 + w/o sourceCodeLocationInfo is 7.32x faster than hast-util-from-html.
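For reference, the summary ratios can be reproduced from the `hz` column (operations per second) of the second run. The sketch below is only that arithmetic; the property names in `hz` and the `speedup` function are hypothetical labels chosen for this example.

```typescript
// Throughput (hz, operations per second) copied from the second benchmark run.
const hz = {
  fromHtml: 4.4994,
  fromParse5WithLocation: 22.4136,
  fromParse5WithoutLocation: 32.9171,
};

// Relative speed is the ratio of the two throughputs.
function speedup(fasterHz: number, slowerHz: number): number {
  return fasterHz / slowerHz;
}

const vsWithLocation = speedup(hz.fromParse5WithoutLocation, hz.fromParse5WithLocation);
const vsFromHtml = speedup(hz.fromParse5WithoutLocation, hz.fromHtml);
const withLocationVsFromHtml = speedup(hz.fromParse5WithLocation, hz.fromHtml);

console.log(vsWithLocation.toFixed(2)); // 1.47
console.log(vsFromHtml.toFixed(2)); // 7.32
console.log(withLocationVsFromHtml.toFixed(2)); // 4.98
```

The last ratio is not printed in the summary but follows from the same data: even with sourceCodeLocationInfo enabled, the parse5 + fromParse5 path is about 5x faster than fromHtml on this input.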
Affected runtime and version
v22.6.0
Affected package manager and version
1.22.22
Affected OS and version
Win 11
Build and bundle tools
Rollup