html_to_vdom breaks on whitespaces and null HTML tags #777

Closed
@Archmonger

Current Situation

If an HTML string has ...

  • Additional whitespace at the tail of the HTML document
  • A null (void, non-terminated) HTML tag such as <hr> within the document

... then html_to_vdom will fail silently, without raising an exception.
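To make the second case concrete, here is a minimal sketch that finds null (void) tags in an HTML string using only the standard library's html.parser. It does not call html_to_vdom and is not taken from IDOM; VoidTagFinder and find_void_tags are hypothetical names used only for illustration, and VOID_ELEMENTS is the list of void elements from the HTML standard.

```python
from html.parser import HTMLParser

# Void elements per the HTML standard: they never take a closing tag.
VOID_ELEMENTS = frozenset(
    ["area", "base", "br", "col", "embed", "hr", "img",
     "input", "link", "meta", "source", "track", "wbr"]
)


class VoidTagFinder(HTMLParser):
    """Hypothetical helper: records every void tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag in VOID_ELEMENTS:
            self.found.append(tag)


def find_void_tags(html: str) -> list:
    """Return the void tags in `html`, in document order."""
    finder = VoidTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found


print(find_void_tags("<div><hr><p>text</p><br></div>"))  # ['hr', 'br']
```

A check like this could help build test inputs for the failing case, since it shows which tags in a document have no closing counterpart.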

See my Django IDOM PR for how I triggered the script tag issue at this line of code.

Proposed Actions

The whitespace issue can be fixed with a simple regex and/or str.strip().
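A minimal sketch of both options for the whitespace fix, assuming the cleanup runs on the raw string before it is handed to html_to_vdom. The helper names strip_with_str and strip_with_regex are hypothetical and not part of IDOM.

```python
import re


def strip_with_str(html: str) -> str:
    """Hypothetical helper: drop leading and trailing whitespace."""
    return html.strip()


def strip_with_regex(html: str) -> str:
    """Hypothetical helper: drop only whitespace at the very end of the string."""
    # \Z anchors at the absolute end, so inner whitespace is left alone.
    return re.sub(r"\s+\Z", "", html)


raw = "<div>hello</div>\n   \n"
print(repr(strip_with_str(raw)))    # '<div>hello</div>'
print(repr(strip_with_regex(raw)))  # '<div>hello</div>'
```

str.strip() is the simpler choice. The regex form is only worth it if leading whitespace must be preserved.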

The script tag issue needs further investigation. It might open up a can of worms: html_to_vdom may need to be tested against all idom.html types.

Labels

priority-2-moderate: Should be resolved on a reasonable timeline.