tidy runs out of memory on this small sample file #937

Open
@sthelen

Description

I have stumbled upon a web page online that appears to overwhelm tidy somehow.

Observed behavior: Tidy keeps processing the page until it exhausts all available memory. This appears to happen in a tight loop, as the process saturates one core while acquiring more and more memory. The OS has to step in and kill the process.

Expected behavior: tidy exits quickly with an error message.
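For anyone who wants to reproduce this without the OS having to step in, here is a minimal sketch that runs a command under a memory cap and a wall-clock timeout. It assumes a POSIX system (it uses Python's `resource` module), and the helper name `run_with_limits` is made up for illustration; it is not part of tidy.

```python
import resource
import subprocess
import sys


def run_with_limits(cmd, mem_bytes, timeout_s):
    """Run cmd with a capped address space and a wall-clock timeout.

    Returns ("exited", returncode) or ("timeout", None).
    """

    def cap_memory():
        # Runs in the child just before exec. Only the soft limit is
        # lowered, and never above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = mem_bytes if hard == resource.RLIM_INFINITY else min(mem_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    try:
        proc = subprocess.run(
            cmd,
            preexec_fn=cap_memory,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    return ("exited", proc.returncode)
```

Assuming `tidy` is on the `PATH` and the attachment has been unpacked to a file called `reduced.html` (a guess at the name inside the zip), a call such as `run_with_limits(["tidy", "-q", "reduced.html"], 512 * 1024**2, 10)` should come back within ten seconds instead of exhausting the machine's memory.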

Further info
I managed to reduce the page to basically a single line with some boilerplate around it, which still reproduces the behavior. Admittedly, that single line is a large and very strange collection of nested html tags wrapped in a <div style="display:none"> block.

There is still potential to reduce that single line further. I made some manual attempts at finding a true minimal example, but gave up after a while: reduced.html.zip. I removed some blocks of tags at the front and end of that line while still reproducing the observed memory issue. Interestingly, when I remove what is now at the front (<b><td></td></b>), tidy bails out quickly, telling me that the document is too broken to fix. But when I remove all other tags and keep only this little portion, the issue does not reproduce. So there are some complex dependencies at play here.
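The manual reduction described above could probably be automated. The following is a rough sketch of a greedy, delta-debugging-style reducer: it splits the line into tags and text runs, then repeatedly deletes chunks as long as a caller-supplied predicate still reports the problem. The names `split_tags`, `reduce_chunks`, and `still_fails` are hypothetical, and the result is not guaranteed to be a true minimal example.

```python
import re


def split_tags(line):
    """Split a line of markup into tags and the text runs between them."""
    return re.findall(r"<[^>]*>|[^<]+", line)


def reduce_chunks(items, still_fails):
    """Greedily delete chunks of items while still_fails(candidate) holds."""
    assert still_fails(items), "the starting input must reproduce the problem"
    chunk = max(len(items) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if candidate and still_fails(candidate):
                items = candidate  # the chunk was irrelevant, keep it removed
            else:
                i += chunk  # the chunk is needed, move past it
        chunk //= 2  # retry with finer-grained deletions
    return items
```

For the real file, `still_fails` would have to write `"".join(candidate)` to a temporary file, run tidy on it under a timeout and memory cap, and return true only when the run hits one of those limits. A quick exit with the "too broken to fix" message would count as not reproducing.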
