Description
- Gitea version (or commit ref): <= 1.8.0
- Operating system: All.
- Database (use [x]): All.
- Can you reproduce the bug at https://try.gitea.io:
- Yes (provide example URL) - https://try.gitea.io/emonk/Markdown-BOM-Test
- No
- Not relevant
Description
When the README.md file contains a Unicode Byte Order Mark (BOM), the first line of the file is
formatted incorrectly. This happens for all BOMs that I have tested: UTF8, Unicode and Unicode
Big Endian. The text of the file loads correctly in all cases; the only problem is that the BOM
is treated as file content and incorrectly displayed in both the rendered view and the editor.
This is a particular problem for developers using Visual Studio and other environments which
insert a BOM by default, a behavior that can be difficult to change without add-ons to the environment.
The various Unicode Byte Order Marks are metadata and should not be treated as file content. At a minimum, the markdown renderer should detect and discard BOMs in the content. The markdown editor should likewise detect and remove BOMs during load, and then either restore them during save or save without them.
I don't understand Go so I won't be submitting a pull request. From 10 minutes casting about in the code, it looks like ToUTF8WithErr and ToUTF8WithFallback might be a place to start. It looks like those are the main methods used to massage text file content into UTF8 for both render and edit.
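If the fix were applied after conversion instead, a BOM from any of the source encodings would show up as a single leading U+FEFF code point in the UTF-8 text, assuming the converter keeps the mark. The sketch below shows that variant. `trimBOM` and `withBOM` are hypothetical names, and nothing here reflects the actual signatures or behavior of ToUTF8WithErr or ToUTF8WithFallback.

```go
package main

import (
	"fmt"
	"strings"
)

// bomRune is U+FEFF written as an escape; it occupies three bytes in UTF-8.
const bomRune = "\uFEFF"

// trimBOM is a hypothetical helper. It removes one leading U+FEFF from
// already converted UTF-8 text and reports whether a mark was present.
func trimBOM(text string) (string, bool) {
	if strings.HasPrefix(text, bomRune) {
		return text[len(bomRune):], true
	}
	return text, false
}

// withBOM is a hypothetical helper for the save path. It puts the mark
// back only if the loaded file had one.
func withBOM(text string, hadBOM bool) string {
	if hadBOM {
		return bomRune + text
	}
	return text
}

func main() {
	loaded := "\uFEFF# Title\n"
	clean, hadBOM := trimBOM(loaded)
	fmt.Printf("for render/edit: %q (had BOM: %v)\n", clean, hadBOM)
	fmt.Printf("for save:        %q\n", withBOM(clean, hadBOM))
}
```

Keeping the `hadBOM` flag lets an editor choose between the two save behaviors suggested above: restore the mark, or drop it by passing `false`.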
Screenshots
Screenshots taken from try.gitea.io site mentioned above.