
http.client.HTTPResponse is not playing nice (Python 3.x) #127

Closed
@SimonSapin


This gives an empty string on Python 3.x:

import html5lib
from urllib.request import urlopen

html5lib.serialize(html5lib.parse(urlopen(
    'http://html5lib.readthedocs.org/en/latest/')))

The cause is a bug in Python (http://bugs.python.org/issue20007), but given how long a fix takes to reach users through CPython’s release cycle, I would like to have a workaround in html5lib.

The bug is triggered here:

isUnicode = isinstance(source.read(0), text_type)
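The check above probes the stream with a zero-length read and looks only at the type of the result. Assuming text_type is the Python 3 str type (as six.text_type is), the probe can be sketched with stdlib in-memory streams; is_unicode_stream is a hypothetical name, not part of html5lib.

```python
import io

def is_unicode_stream(source):
    """Hypothetical helper mirroring the quoted check.

    read(0) asks for zero characters (text) or zero bytes (binary).
    On a well-behaved stream it consumes nothing, so only the type of
    the empty result ('' vs b'') matters.
    """
    return isinstance(source.read(0), str)

text = io.StringIO("<p>hi</p>")
data = io.BytesIO(b"<p>hi</p>")

print(is_unicode_stream(text))  # True
print(is_unicode_stream(data))  # False
print(data.read())              # b'<p>hi</p>', the probe consumed nothing
```

For these in-memory streams read(0) is harmless; the report above is precisely that it is not harmless for http.client.HTTPResponse on Python 3.x.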

Unfortunately, the only workaround I can think of, special-casing the response type with if isinstance(source, http.client.HTTPResponse): isUnicode = False, is very ugly.
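One possible, somewhat less ad hoc alternative is to consult the io abstract base classes before falling back to the read(0) probe. http.client.HTTPResponse subclasses io.BufferedIOBase, so it would be classified as binary without ever calling read(0). This is a sketch under the assumption that objects deriving from io.TextIOBase, io.BufferedIOBase or io.RawIOBase report their text/bytes nature honestly; detect_unicode and ClosesOnZeroRead are hypothetical names, not html5lib's API.

```python
import io

def detect_unicode(source):
    """Hypothetical helper: decide text vs bytes, avoiding read(0)
    whenever the stream's io base class already answers the question."""
    if isinstance(source, io.TextIOBase):
        return True
    if isinstance(source, (io.BufferedIOBase, io.RawIOBase)):
        # http.client.HTTPResponse subclasses io.BufferedIOBase,
        # so it takes this branch and read(0) is never called on it.
        return False
    # Arbitrary duck-typed file-like objects: fall back to the read(0) probe.
    return isinstance(source.read(0), str)

# Usage: a binary stream whose read() must not be called is never probed.
class ClosesOnZeroRead(io.BufferedIOBase):
    def read(self, size=-1):
        raise RuntimeError("read() should not be called")

print(detect_unicode(io.StringIO("<p>hi</p>")))  # True
print(detect_unicode(io.BytesIO(b"<p>hi</p>")))  # False
print(detect_unicode(ClosesOnZeroRead()))        # False, read() untouched
```

The design choice is to key on the io class hierarchy rather than on one concrete type, so other binary stream classes get the same treatment; anything outside that hierarchy still goes through the original read(0) check.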
