Closed
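This returns an empty string on Python 3.x:

```python
import html5lib
from urllib.request import urlopen

# Should print the serialized page, but prints '' on Python 3.x.
print(repr(html5lib.serialize(html5lib.parse(urlopen(
    'http://html5lib.readthedocs.org/en/latest/')))))
```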
The cause is a bug in Python (http://bugs.python.org/issue20007), but given CPython's release cycle I would like to have a workaround in html5lib.
The bug is triggered here: html5lib-python/html5lib/inputstream.py, line 122 (at commit e269a2f).
Unfortunately, the only workaround I can think of is very ugly: adding a special case such as

```python
if isinstance(source, http.client.HTTPResponse):
    isUnicode = False
```
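A possibly less ugly variant, sketched here under some assumptions: instead of naming `http.client.HTTPResponse` explicitly, check the standard `io` base classes first and only fall back to a zero-length `read(0)` probe for other file-like objects. This assumes the line referenced above decides text vs. bytes by calling `source.read(0)`, and relies on `http.client.HTTPResponse` subclassing `io.BufferedIOBase` (it does on Python 3). `is_unicode_stream` is a hypothetical helper name, not existing html5lib API.

```python
import io


def is_unicode_stream(source):
    """Return True if the file-like `source` yields str, False if it yields bytes.

    Hypothetical helper for illustration; not part of html5lib.
    """
    # Standard io classes declare their kind through their base class,
    # so no read is needed for them.
    if isinstance(source, io.TextIOBase):
        return True
    # http.client.HTTPResponse is an io.BufferedIOBase subclass, so it is
    # handled here without ever calling read(0) on it.
    if isinstance(source, (io.BufferedIOBase, io.RawIOBase)):
        return False
    # Fallback for arbitrary duck-typed objects: probe with a zero-length read.
    return isinstance(source.read(0), str)


print(is_unicode_stream(io.StringIO("<p>hi</p>")))  # True
print(is_unicode_stream(io.BytesIO(b"<p>hi</p>")))  # False
```

The design choice is to trust the `io` class hierarchy where it exists, so the special case covers any buffered binary stream rather than one specific class, while arbitrary objects that merely have a `read` method keep the current behaviour.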