Description
From Google Code Issue 220:
Reported by steve@strassmann.com, Mar 1, 2013
What steps will reproduce the problem?
Reproducible in Jython 2.5.2 and Jython 2.7b1:
>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/html5lib/__init__.py", line 14, in <module>
    from html5parser import HTMLParser, parse, parseFragment
  File "lib/html5lib/html5parser.py", line 33, in <module>
    import inputstream
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 48-54: illegal Unicode character
What is the expected output? What do you see instead?
Jython cannot read inputstream.py.
Please provide any additional information below.
inputstream.py contains some seriously broken Unicode characters in the range 0xD800-0xDFFF, which are known as "unpaired surrogates".
This has been closed as wont-fix: http://bugs.jython.org/issue1836
It may be necessary to modify inputstream.py to not use these unicode character literals when running in Jython.
n.b. a test for Jython:
import platform
JYTHON = (platform.system() == 'Java')
Apr 7 (2 days ago) geoffers
As I just commented on the Jython bug, I believe this is a bug in Jython not implementing Python as it is documented. I don't particularly want to add in hacks for a bug in Jython.
Apr 7 (2 days ago) geoffers
Furthermore, if I do add a hack for it, we go into an infinite loop in the testsuite. I'm changing this bug to a more generic "support Jython" bug, with no timeframe or even a decision on whether this is likely to happen.