Support Jython #2

Closed
@gsnedders

From Google Code Issue 220:

Reported by steve@strassmann.com, Mar 1, 2013

What steps will reproduce the problem?
Reproducible in Jython 2.5.2 and Jython 2.7b1

>> import html5lib
import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/html5lib/__init__.py", line 14, in <module>
    from html5parser import HTMLParser, parse, parseFragment
  File "lib/html5lib/html5parser.py", line 33, in <module>
    import inputstream
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 48-54: illegal Unicode character

What is the expected output? What do you see instead?
Jython cannot read inputstream.py; the import fails with the UnicodeDecodeError shown above.

Please provide any additional information below.

inputstream.py contains some seriously broken Unicode characters in the range 0xD800-0xDFFF, which are known as "unpaired surrogates".

This has been closed as wont-fix: http://bugs.jython.org/issue1836

It may be necessary to modify inputstream.py so that it does not use these Unicode character literals when running in Jython.

n.b. a test for Jython:

import platform
JYTHON = (platform.system() == 'Java')
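
One way the suggestion above could be approached is to build the surrogate range at runtime instead of writing the characters as literals in the source. This is only a sketch: the function name build_surrogate_pattern is hypothetical and is not part of html5lib, and it has not been confirmed that Jython accepts these code points at runtime either.

```python
import re

try:
    unichr  # Python 2 / Jython 2.x
except NameError:
    unichr = chr  # Python 3 has no unichr; chr covers the full range

def build_surrogate_pattern(first=0xD800, last=0xDFFF):
    """Hypothetical helper: compile a character class covering the
    surrogate range without any surrogate literal in the source file."""
    # The endpoints are created by unichr() when the function runs,
    # so the source parser never sees a \uD800-style escape.
    return re.compile(u"[" + unichr(first) + u"-" + unichr(last) + u"]")

surrogate_re = build_surrogate_pattern()
```

Whether this avoids the failure depends on how the Jython runtime handles unpaired surrogates in unicode objects, so it would need to be tried on Jython 2.5.2 and 2.7b1 before relying on it.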

Apr 7 (2 days ago) geoffers

As I just commented on the Jython bug, I believe this is a bug in Jython: it does not implement Python as documented. I don't particularly want to add hacks to work around a bug in Jython.

Apr 7 (2 days ago) geoffers

Furthermore, if I do add a hack for it, we go into an infinite loop in the testsuite. I'm changing this bug to a more generic "support Jython" bug, with no timeframe or even a decision on whether this is likely to happen.
