Fix Byte Order Mark (BOM) handling in markdown display and editor. #6716

Closed
@Corey-M

Description

When the README.md file contains a Unicode Byte Order Mark (BOM), the first line of the file is
formatted incorrectly. This happens for all BOMs that I have tested: UTF8, Unicode (UTF-16 little
endian) and Unicode Big Endian (UTF-16 big endian). The text of the file loads correctly in all
cases; the only problem is that the BOM is treated as file content and incorrectly displayed in
both the rendered view and the editor.
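
For reference, the three BOMs above are fixed byte sequences at the very start of the file. The following is a minimal sketch of how they could be recognised; `detectBOM` and the variable names are hypothetical helpers written for illustration and are not taken from the Gitea code base. It does not try to tell a UTF-16 little endian BOM apart from a UTF-32 little endian one, which begins with the same two bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// Byte sequences for the three BOMs mentioned in the report.
var (
	bomUTF8    = []byte{0xEF, 0xBB, 0xBF}
	bomUTF16LE = []byte{0xFF, 0xFE} // labelled "Unicode" by many Windows tools
	bomUTF16BE = []byte{0xFE, 0xFF} // labelled "Unicode Big Endian"
)

// detectBOM is a hypothetical helper. It returns a label for the BOM found
// at the start of content and the BOM's length in bytes, or ("", 0) if the
// content does not start with one of the three BOMs above.
func detectBOM(content []byte) (string, int) {
	switch {
	case bytes.HasPrefix(content, bomUTF8):
		return "UTF-8", len(bomUTF8)
	case bytes.HasPrefix(content, bomUTF16LE):
		return "UTF-16LE", len(bomUTF16LE)
	case bytes.HasPrefix(content, bomUTF16BE):
		return "UTF-16BE", len(bomUTF16BE)
	}
	return "", 0
}

func main() {
	// A README saved as UTF-8 with a BOM in front of the first heading.
	readme := append([]byte{0xEF, 0xBB, 0xBF}, "# Title\n"...)

	name, n := detectBOM(readme)
	fmt.Printf("BOM: %s (%d bytes), content after BOM: %q\n", name, n, readme[n:])
}
```

The first line of such a file is not really `# Title` until those leading bytes are skipped, which would explain why the heading is not rendered as a heading.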

This is a particular problem for developers using Visual Studio and other environments that
insert a BOM by default, a setting that can be difficult to change without add-ons to the environment.

The various Unicode Byte Order Marks are metadata and should not be treated as file content. At a minimum, the markdown renderer should detect and discard BOMs in the content. The markdown editor should likewise detect and remove BOMs during load, and either restore them during save or save without them.
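
The behaviour requested above could be sketched roughly as follows. This assumes the content has already been converted to UTF-8, at which point a BOM from any of the original encodings shows up as the bytes `EF BB BF` (the UTF-8 encoding of U+FEFF). `stripBOM` and `restoreBOM` are hypothetical names used for illustration only; they are not existing Gitea functions.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is U+FEFF encoded as UTF-8.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM (hypothetical) returns content without a leading UTF-8 BOM and
// reports whether one was present, so the caller can restore it on save.
func stripBOM(content []byte) ([]byte, bool) {
	if bytes.HasPrefix(content, utf8BOM) {
		return content[len(utf8BOM):], true
	}
	return content, false
}

// restoreBOM (hypothetical) puts the BOM back in front of edited content
// when the original file had one. It never adds a second BOM.
func restoreBOM(content []byte, hadBOM bool) []byte {
	if !hadBOM || bytes.HasPrefix(content, utf8BOM) {
		return content
	}
	out := make([]byte, 0, len(utf8BOM)+len(content))
	out = append(out, utf8BOM...)
	return append(out, content...)
}

func main() {
	file := append([]byte{0xEF, 0xBB, 0xBF}, "# Title\n"...)

	// On load: the renderer and the editor only ever see the text.
	text, hadBOM := stripBOM(file)
	fmt.Printf("text=%q hadBOM=%v\n", text, hadBOM)

	// On save: the BOM is restored so the file round-trips unchanged.
	saved := restoreBOM(text, hadBOM)
	fmt.Println("round trip identical:", bytes.Equal(saved, file))
}
```

Returning the `hadBOM` flag keeps both save options open: pass it to `restoreBOM` to preserve the marker, or ignore it to save without one.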

I don't understand Go, so I won't be submitting a pull request. From 10 minutes of casting about
in the code, it looks like ToUTF8WithErr and ToUTF8WithFallback might be a place to start. They
appear to be the main methods used to massage text file content into UTF8 for both render and edit.

Screenshots

Screenshots taken from try.gitea.io site mentioned above.

  • Bad first line formatting, treats BOM as unknown non-printable character:
    image
  • Editor showing BOM as unknown/bad character:
    image
