Closed
Description
THTML.Entities in UHTMLUtils.pas handles the old ASCII control and whitespace characters (codes < 32) fine, but it is not really Unicode safe.
Ideally we'd take encoding into account if outputting any ANSI text, but that would require this method to know the output encoding.
Assuming Unicode, the following may be an improvement:
- Convert all whitespace except CR & LF to numeric entities, but test with the appropriate TCharacter method (e.g. TCharacter.IsWhiteSpace)
- Convert all control characters similarly, again using the appropriate TCharacter method (e.g. TCharacter.IsControl)
It may also be worth explicitly converting some whitespace & symbols using the correct named HTML entities, e.g. &nbsp; and &ndash;.
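
A rough sketch of how the proposal might look, assuming a Unicode Delphi (2009 or later) where TCharacter lives in the Character unit (System.Character under unit scope names; newer compilers may warn that TCharacter is deprecated). EscapeToEntities is a hypothetical stand-alone function, not the existing THTML.Entities code. Leaving the ordinary space character literal is also an assumption, since the proposal above does not say how it should be treated.

```pascal
program EntitiesSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

// Hypothetical helper illustrating the proposal; not the real THTML.Entities.
function EscapeToEntities(const Text: string): string;
var
  Ch: Char;
begin
  Result := '';
  for Ch in Text do
    case Ch of
      // Characters with special meaning in HTML
      '<': Result := Result + '&lt;';
      '>': Result := Result + '&gt;';
      '&': Result := Result + '&amp;';
      '"': Result := Result + '&quot;';
      // CR and LF pass through unchanged, as proposed
      #10, #13: Result := Result + Ch;
      // Assumption: the ordinary space stays literal
      ' ': Result := Result + Ch;
      // Named entities for selected whitespace and symbols
      #$00A0: Result := Result + '&nbsp;';
      #$2013: Result := Result + '&ndash;';
    else
      // Any other control or whitespace character becomes a numeric entity
      if TCharacter.IsControl(Ch) or TCharacter.IsWhiteSpace(Ch) then
        Result := Result + '&#' + IntToStr(Ord(Ch)) + ';'
      else
        Result := Result + Ch;
    end;
end;

begin
  // A tab between two letters is written out as a&#9;b
  Writeln(EscapeToEntities('a'#9'b'));
end.
```

This works on UTF-16 code units, so surrogate pairs pass through untouched. It does not address the ANSI output-encoding question raised above.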