Open
Description
I can't believe this hasn't been covered before, but I have so far been
unable to find a solution to the following:
puts HTML5::HTMLParser.new.parse('Test dátá')
provides:
<html><head/><body>Test dรกtรก</body></html>
As can be seen, the text in the body has the wrong characters where á
should be, so I suspected an ordinary UTF-8 conversion bug.
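One observation that may help narrow this down: the wrong characters รก are exactly what the two UTF-8 bytes of á (C3 A1) look like when they are read as the Thai code page Windows-874 (TIS-620 uses the same values for these two bytes). That suggests the input is being decoded with a wrongly guessed encoding rather than corrupted at random. The sketch below reproduces the effect without the html5 gem. It assumes Ruby 1.9 or later for String#force_encoding and String#encode, which Ruby 1.8.6 does not have, and the helper name misread_as_thai is made up for illustration.

```ruby
# Illustration only: reproduces the garbling without HTML5::HTMLParser.
# Assumes Ruby 1.9+; misread_as_thai is a hypothetical helper name.
def misread_as_thai(text)
  # Keep the bytes unchanged, relabel them as Windows-874 (Thai),
  # then transcode to UTF-8 so the result can be printed.
  text.dup.force_encoding("Windows-874").encode("UTF-8")
end

original = "Test d\u00E1t\u00E1" # "Test dátá" encoded as UTF-8

# Show the raw bytes; each á is the pair C3 A1.
puts original.bytes.map { |b| format("%02X", b) }.join(" ")

# Show the same bytes interpreted as Thai text.
puts misread_as_thai(original)
```

If the second line printed matches the body text in the parser output, the problem is most likely in how the encoding of the input is detected, not in the UTF-8 bytes themselves.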
However, just to really mess with my mind, I thought the following would be
a more complete test to post here:
puts HTML5::HTMLParser.new.parse('Sámple Téxt Wíth Acceñts')
produces:
<html><head/><body>Sámple Téxt Wíth Acceñts</body></html>
which is correct!! My next step was to try removing each accent one by one,
until only the first á is present. Each attempt worked except the last,
which produced:
<html><head/><body>Sรกmple Text With Accents</body></html>
Clearly, there is something very strange here, and it's causing major pain.
Does anyone have any suggestions as to what's going on and, more importantly,
how to fix it?
Versions:
gem -v 1.0.1
html5 (0.10.0)
ruby 1.8.6
Ubuntu 7.10 systems
Many thanks, Sam
Original issue reported on code.google.com by sam.l...@gmail.com
on 15 Feb 2008 at 11:44