Description
I have stumbled upon a web page online that appears to overwhelm tidy somehow.
Observed behavior: Tidy keeps processing the page until it exhausts all available memory. This appears to happen in a tight loop, as the process saturates one core while acquiring more and more memory. The OS has to step in and kill the process.
Expected behavior: tidy exits quickly with an error message.
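For anyone who wants to reproduce this without letting the process eat the whole machine, here is a minimal sketch of a harness that caps the child's address space and adds a wall-clock timeout. It is POSIX-only, and the helper name `run_limited`, the 1 GiB cap, and the 10 second timeout are my own assumptions, not anything tidy provides.

```python
import resource
import subprocess


def run_limited(cmd, mem_bytes=1 << 30, timeout_s=10.0):
    """Run cmd with a capped address space and a wall-clock timeout.

    Hypothetical helper for reproducing the report safely; the limits
    are arbitrary defaults and can be tuned.
    """

    def cap_memory():
        # Runs in the child just before exec. Only the soft limit is
        # lowered, so this also works when a hard limit is already set.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = mem_bytes
        else:
            soft = min(mem_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    try:
        proc = subprocess.run(
            cmd,
            preexec_fn=cap_memory,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stderr": ""}
    return {
        "timed_out": False,
        "returncode": proc.returncode,
        "stderr": proc.stderr.decode(errors="replace"),
    }
```

Assuming `tidy` is on the PATH and the attachment has been unpacked to `reduced.html`, a call such as `run_limited(["tidy", "-q", "reduced.html"])` should come back either with `timed_out` set or with a non-zero `returncode` once the memory cap is hit, instead of waiting for the OS to kill the process.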
Further info
I managed to reduce the page to essentially a single line with some boilerplate around it, which still reproduces the behavior. Admittedly, that single line is a large and very strange collection of nested HTML tags wrapped in a <div style="display:none"> block.
That single line could probably be reduced further. I made some manual attempts at finding a truly minimal example, but gave up after a while: reduced.html.zip. I removed some blocks of tags at the front and end of that line while still reproducing the memory issue. Interestingly, when I remove what is now at the front (<b><td></td></b>), tidy bails out quickly, telling me that the document is too broken to fix. But when I remove all other tags and keep only this little portion, the issue does not reproduce. So there are some complex dependencies at play here.
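The manual reduction described above could probably be automated with delta debugging. The following is only a sketch of a simplified, complement-only variant of ddmin: it repeatedly drops chunks of tags and keeps a candidate whenever the failure persists. `split_tags` is a crude regex split, not a real HTML parser, and `still_fails` is a hypothetical callback that would write the candidate tokens back into the boilerplate, run tidy under a timeout, and return True if the runaway behavior is still there.

```python
import re


def split_tags(line):
    """Split a line of markup into tag and text tokens (crude regex split)."""
    return [tok for tok in re.split(r"(<[^>]*>)", line) if tok]


def ddmin(items, still_fails):
    """Shrink items to a smaller list for which still_fails(list) stays True.

    Simplified delta debugging: only complements (the input minus one
    chunk) are tested. The result is 1-minimal, meaning that removing
    any single remaining item makes the failure disappear.
    """
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Candidate: everything except chunk i.
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if complement and still_fails(complement):
                items = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if chunk == 1:
                break  # no single item can be removed any more
            n = min(n * 2, len(items))  # retry with finer chunks
    return items
```

Because every candidate is tested as a whole, this copes with the kind of interaction seen here, where a fragment is required for the failure but does not trigger it on its own. Note that removing arbitrary tags changes the nesting, so each candidate is a different broken document rather than a strict subset of the original structure.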