Assertion failures in Python 2 from the etree treewalker #190

Closed
@mattheww

Assertion failures in Python 2 from the etree treewalker.

If I create an element directly using cElementTree and try to serialise the result using html5lib, I get assertion failures in Python 2 unless I go to great lengths to make sure cElementTree sees unicode strings everywhere.

    from xml.etree import cElementTree as etree
    import html5lib

    doc = html5lib.parse(
        u"<p>test",
        treebuilder="etree",
        namespaceHTMLElements=False)

    head = doc.find("head")
    link = etree.Element("link")
    head.append(link)

    stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
    serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
    rendered = serializer.render(stream)

The render() call fails with:

    AssertionError: <type 'str'>
    html5lib/treewalkers/etree.py:61 (getNodeDetails)
    failing line:
    assert type(node.tag) == text_type, type(node.tag)

Using unicode string literals everywhere isn't enough to avoid trouble, because cElementTree sometimes constructs attribute names from keyword arguments (whose names are always byte strings in Python 2), e.g.:

    doc = html5lib.parse(
        u"<p>test",
        treebuilder="etree",
        namespaceHTMLElements=False)

    head = doc.find("head")
    link = etree.Element(u"link", rel=u"stylesheet")
    head.append(link)

    stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
    serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
    rendered = serializer.render(stream)

The render() call fails with:

    AssertionError
    html5lib/serializer/htmlserializer.py:165 (encodeStrict)
    failing line:
    assert(isinstance(string, text_type))
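
For anyone hitting the same thing, one possible workaround is to coerce the whole tree to text strings before handing it to the treewalker. The following is only a sketch: `coerce_tree_to_text` and `_to_text` are hypothetical helper names (not part of html5lib or ElementTree), and it assumes any byte strings in the tree are UTF-8 (plain ASCII in the examples above).

```python
from xml.etree import ElementTree as etree


def _to_text(value, encoding="utf-8"):
    # Decode byte strings (Python 2 `str`); leave everything else alone,
    # including None and the Comment/ProcessingInstruction factory tags.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def coerce_tree_to_text(root, encoding="utf-8"):
    """Hypothetical helper: decode byte-string tags, attribute names and
    values, text and tails to text strings, in place."""
    for elem in root.iter():
        elem.tag = _to_text(elem.tag, encoding)
        elem.text = _to_text(elem.text, encoding)
        elem.tail = _to_text(elem.tail, encoding)
        # Rebuild the attribute dict so that the keys are decoded too.
        items = list(elem.attrib.items())
        elem.attrib.clear()
        for name, value in items:
            elem.attrib[_to_text(name, encoding)] = _to_text(value, encoding)
    return root
```

Calling `coerce_tree_to_text(doc)` just before `getTreeWalker("etree")(doc)` should avoid the two asserts quoted above, assuming tags, attributes, text and tails are the only places a byte string can end up. It only hides the problem on the caller's side, though.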
