Description
Assertion failures in Python 2 from the etree treewalker.
If I create an element directly with cElementTree and then try to serialise the tree with html5lib, I get assertion failures in Python 2 unless I take special care to make sure cElementTree sees unicode strings everywhere.
```python
from xml.etree import cElementTree as etree
import html5lib

doc = html5lib.parse(
    u"<p>test",
    treebuilder="etree",
    namespaceHTMLElements=False)
head = doc.find("head")
link = etree.Element("link")
head.append(link)
stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
rendered = serializer.render(stream)
```
The render() call fails with:

```
AssertionError: <type 'str'>
html5lib/treewalkers/etree.py:61 (getNodeDetails)
```

Failing line:

```python
assert type(node.tag) == text_type, type(node.tag)
```
Using unicode string literals everywhere isn't enough to avoid trouble, because cElementTree sometimes constructs attribute names from keyword arguments, and in Python 2 keyword argument names are always byte strings (str). For example:
```python
doc = html5lib.parse(
    u"<p>test",
    treebuilder="etree",
    namespaceHTMLElements=False)
head = doc.find("head")
link = etree.Element(u"link", rel=u"stylesheet")
head.append(link)
stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
rendered = serializer.render(stream)
```
The render() call fails with:

```
AssertionError
html5lib/serializer/htmlserializer.py:165 (encodeStrict)
```

Failing line:

```python
assert(isinstance(string, text_type))
```
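Until the treewalker accepts byte strings, one possible caller-side workaround is to normalise the whole tree to text before walking it. The sketch below assumes every byte string in the tree is UTF-8 (ASCII tag and attribute names are a subset); `to_text` and `unicodify_tree` are hypothetical helpers, not part of html5lib or ElementTree. It is written to run on both Python 2.7 (where `bytes` is `str`) and Python 3.

```python
from xml.etree import ElementTree as etree  # used only to build example trees


def to_text(value, encoding="utf-8"):
    """Decode byte strings; return anything else (text, None, callables) unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def unicodify_tree(root, encoding="utf-8"):
    """Hypothetical helper: convert tags, attributes, text and tail to text in place."""
    for elem in root.iter():
        # Comment and processing-instruction nodes use a factory function as
        # their tag, so to_text leaves that tag untouched.
        elem.tag = to_text(elem.tag, encoding)
        elem.text = to_text(elem.text, encoding)
        elem.tail = to_text(elem.tail, encoding)
        # Build a fresh dict instead of updating in place: in Python 2,
        # u"rel" == "rel", so assigning into the old dict would keep the str key.
        elem.attrib = dict(
            (to_text(k, encoding), to_text(v, encoding))
            for k, v in elem.attrib.items()
        )
    return root
```

In the examples above this would mean calling `unicodify_tree(doc)` just before `getTreeWalker("etree")(doc)`. Passing attributes as an explicit dict, e.g. `etree.Element(u"link", {u"rel": u"stylesheet"})`, should also sidestep the keyword-argument problem, since the keys are then the unicode strings you supplied rather than Python 2 keyword names.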