
Commit 1caf028

Optimize raw HTML post-processor (#1510)
Don't precompute placeholder replacements in raw HTML post-processor. Fixes #1507.

Previously, the raw HTML post-processor would precompute all possible replacements for placeholders in a string, based on the HTML stash. It would then apply a regular expression substitution using these replacements. Finally, if the text changed, it would recurse and do all of that again.

This was inefficient because the replacements were recomputed each time it recursed, and because only a few of them would be used anyway.

This change moves the recursion into the regular expression substitution, so that:

1. the regular expression does minimal work on the text (instead of re-scanning text already scanned in previous frames);
2. more importantly, replacements are no longer computed ahead of time (let alone *several times*), and are only fetched from the HTML stash as placeholders are found in the text.

The substitution function relies on the ordering of the regular expression groups: we make sure to match `<p>PLACEHOLDER</p>` first, before `PLACEHOLDER`. The presence of a wrapping `p` tag determines whether the substitution result is wrapped again or not (this also depends on whether the substituted HTML is a block-level tag).
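The approach described in the commit message can be sketched outside the library as follows. This is a minimal illustration, not the project's code: the `PLACEHOLDER` format, the `BLOCK_LEVEL` regex, the `restore` function and the plain-list stash are hypothetical stand-ins for `util.HTML_PLACEHOLDER`, `BLOCK_LEVEL_REGEX`, `RawHtmlPostprocessor.run` and `md.htmlStash`.

```python
from __future__ import annotations

import re

# Hypothetical stand-ins for util.HTML_PLACEHOLDER and BLOCK_LEVEL_REGEX.
PLACEHOLDER = "\x02stash:%s\x03"
BLOCK_LEVEL = re.compile(r"^<(div|p|pre|table|ul|ol|blockquote)\b", re.IGNORECASE)


def restore(text: str, stash: list[str]) -> str:
    """Replace placeholders in `text` with stashed HTML, fetched lazily."""
    if not stash:
        return text
    base = PLACEHOLDER % r"([0-9]+)"
    # Group 1 belongs to the <p>-wrapped alternative, group 2 to the bare one.
    pattern = re.compile(f"<p>{base}</p>|{base}")

    def substitute(m: re.Match[str]) -> str:
        wrapped = m.group(1) is not None
        index = int(m.group(1) if wrapped else m.group(2))
        if index >= len(stash):
            return m.group(0)  # unknown placeholder: leave it untouched
        html = stash[index]  # looked up only when a placeholder is found
        if not wrapped or BLOCK_LEVEL.match(html):
            # Bare placeholder or block-level HTML: do not re-add the <p>.
            return pattern.sub(substitute, html)
        # Inline HTML that was wrapped in <p>: keep the wrapper.
        return pattern.sub(substitute, f"<p>{html}</p>")

    return pattern.sub(substitute, text)
```

Nested placeholders are resolved by the recursive `pattern.sub` call on the fetched HTML only, so the surrounding text is never scanned a second time.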
1 parent f6cfc5c commit 1caf028

2 files changed, with 15 additions and 26 deletions

docs/changelog.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * DRY fix in `abbr` extension by introducing method `create_element` (#1483).
 * Clean up test directory some removing some redundant tests and port
   non-redundant cases to the newer test framework.
+* Improved performance of the raw HTML post-processor (#1510).

 ### Fixed

markdown/postprocessors.py

Lines changed: 14 additions & 26 deletions
@@ -28,7 +28,6 @@

 from __future__ import annotations

-from collections import OrderedDict
 from typing import TYPE_CHECKING, Any
 from . import util
 import re
@@ -73,37 +72,26 @@ class RawHtmlPostprocessor(Postprocessor):

     def run(self, text: str) -> str:
         """ Iterate over html stash and restore html. """
-        replacements = OrderedDict()
-        for i in range(self.md.htmlStash.html_counter):
-            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[i])
-            if self.isblocklevel(html):
-                replacements["<p>{}</p>".format(
-                    self.md.htmlStash.get_placeholder(i))] = html
-            replacements[self.md.htmlStash.get_placeholder(i)] = html
-
         def substitute_match(m: re.Match[str]) -> str:
-            key = m.group(0)
-
-            if key not in replacements:
-                if key[3:-4] in replacements:
-                    return f'<p>{ replacements[key[3:-4]] }</p>'
-                else:
-                    return key
-
-            return replacements[key]
-
-        if replacements:
+            if key := m.group(1):
+                wrapped = True
+            else:
+                key = m.group(2)
+                wrapped = False
+            if (key := int(key)) >= self.md.htmlStash.html_counter:
+                return m.group(0)
+            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[key])
+            if not wrapped or self.isblocklevel(html):
+                return pattern.sub(substitute_match, html)
+            return pattern.sub(substitute_match, f"<p>{html}</p>")
+
+        if self.md.htmlStash.html_counter:
             base_placeholder = util.HTML_PLACEHOLDER % r'([0-9]+)'
             pattern = re.compile(f'<p>{ base_placeholder }</p>|{ base_placeholder }')
-            processed_text = pattern.sub(substitute_match, text)
+            return pattern.sub(substitute_match, text)
         else:
             return text

-        if processed_text == text:
-            return processed_text
-        else:
-            return self.run(processed_text)
-
     def isblocklevel(self, html: str) -> bool:
         """ Check is block of HTML is block-level. """
         m = self.BLOCK_LEVEL_REGEX.match(html)
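
The new `substitute_match` tells the wrapped and bare cases apart by which capture group is populated. The sketch below isolates that step; the placeholder format and the `classify` helper are hypothetical and only serve the illustration.

```python
from __future__ import annotations

import re

# Hypothetical placeholder format; the library builds it from util.HTML_PLACEHOLDER.
BASE = "\x02stash:%s\x03" % r"([0-9]+)"
PATTERN = re.compile(f"<p>{BASE}</p>|{BASE}")


def classify(text: str) -> list[tuple[bool, int]]:
    """Return (wrapped, index) for each placeholder found in `text`."""
    found = []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:
            # First alternative matched: the placeholder sits alone in a <p>.
            found.append((True, int(m.group(1))))
        else:
            # Second alternative matched: a bare placeholder.
            found.append((False, int(m.group(2))))
    return found
```

Because each alternative has its own group, the callback never needs to slice `<p>` and `</p>` off the matched string, as the removed `key[3:-4]` code did.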
