Improve handling of control & whitespace characters in THTML.Entities #17

Closed
@delphidabbler

Description

THTML.Entities in UHTMLUtils.pas correctly handles the old ASCII control and whitespace characters with codes < 32, but it is not really Unicode safe: Unicode defines further whitespace and control characters outside that range.

Ideally we'd take encoding into account if outputting any ANSI text, but that would require this method to know the output encoding.

Assuming Unicode, the following may be an improvement:

  1. Convert all whitespace except CR & LF to numeric entities, testing for whitespace with the appropriate TCharacter method
  2. Convert all control characters similarly, again using the appropriate TCharacter method

It may also be worth explicitly converting some whitespace and symbol characters to their correct named HTML entities, e.g. &nbsp; and &ndash;.
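The proposal above can be sketched roughly as follows. This is not the THTML.Entities code: it is a minimal illustration in Python, where `str.isspace()` and the Unicode category `Cc` stand in for the whitespace and control tests that `TCharacter.IsWhiteSpace` and `TCharacter.IsControl` would provide in Delphi. The function name `html_entities`, the `NAMED_ENTITIES` table and the decision to leave the ordinary space character untouched are all assumptions made for the example; only &nbsp; and &ndash; are taken from the text above.

```python
import unicodedata

# Hypothetical lookup table. Only &nbsp; and &ndash; are mentioned in the
# issue; the markup characters are included on the assumption that any
# entity-encoding routine has to escape them anyway.
NAMED_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "\u00A0": "&nbsp;",   # no-break space
    "\u2013": "&ndash;",  # en dash
}


def html_entities(text: str) -> str:
    """Encode text for HTML, treating whitespace and controls Unicode-wise."""
    out = []
    for ch in text:
        if ch in NAMED_ENTITIES:
            # Prefer a named entity where one is defined in the table.
            out.append(NAMED_ENTITIES[ch])
        elif ch in "\r\n" or ch == " ":
            # CR and LF pass through, as proposed. Leaving the plain space
            # literal is an assumption of this sketch.
            out.append(ch)
        elif ch.isspace() or unicodedata.category(ch) == "Cc":
            # Any other whitespace or control character becomes a
            # decimal numeric entity.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)
```

Checking the named-entity table first means characters such as the no-break space, which also count as whitespace, get their named form rather than a numeric one.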

Metadata

Labels

bug: Bug report
completed: Work has been completed on this issue and changes have been committed to `develop` branch.
