
toc: Unusual characters in heading ids not well supported #1493

Closed
@pawamoy

I noticed that `toc` encodes characters like `*` as `\x0242\x03`: the control characters `\x02` and `\x03` wrapped around 42, the ASCII code of `*`. This causes a discrepancy between a heading's permalink and its link in the table of contents.
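For illustration, the mismatch can be reproduced without MkDocs. The sketch below assumes the placeholder format is `\x02` + decimal code point + `\x03`, as the observed `%0242%03` suggests. `escape_char` and `unescape` are hypothetical helpers written for this example, not Python-Markdown functions.

```python
import re
from urllib.parse import quote

STX = "\x02"  # ASCII "start of text" control character
ETX = "\x03"  # ASCII "end of text" control character

# Matches one placeholder: STX, a decimal code point, ETX.
PLACEHOLDER_RE = re.compile(f"{STX}(\\d+){ETX}")


def escape_char(char: str) -> str:
    """Hypothetical helper: wrap the code point of `char` in STX/ETX."""
    return f"{STX}{ord(char)}{ETX}"


def unescape(text: str) -> str:
    """Hypothetical helper: turn every placeholder back into its character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


# The id as it appears to leak into the table of contents.
raw_id = escape_char("*") + "Foo" + escape_char("*")

# Percent-encoding the raw id gives the link seen in the table of contents.
toc_href = "#" + quote(raw_id)

# Unescaping first gives the same anchor as the permalink.
permalink_href = "#" + unescape(raw_id)

print(toc_href)
print(permalink_href)
```

If this assumption holds, unescaping the id before it is stored in the table of contents would make both links agree.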

mkdir /tmp/toc
cd /tmp/toc
python -m venv .venv
. .venv/bin/activate
python -m pip install mkdocs
python -m mkdocs new .

index file:

# Welcome

Demonstrating an issue with HTML ids and `toc`.

## `*Foo*` { id="\*Foo\*" }

- Click on `*Foo*`'s permalink: `#*Foo*` in the URL.
- Click on `*Foo*` in the table of contents: `#%0242%03Foo%0242%03` in the URL.

mkdocs config:

site_name: My Docs

markdown_extensions:
- attr_list
- toc:
    permalink: true

Serve the site (`python -m mkdocs serve`) and observe the behavior described in the index page.

I'm not saying this is a bug. I'm just curious whether this is expected, and whether there would be a way to improve support for headings with such "unusual" ids. This would help with the work I'm doing on mkdocstrings, where we are trying to expand our language support, and some languages might use uncommon characters in object identifiers. Not only would toc have to work, but also mkdocs-autorefs, which picks up ids from the table of contents when registering URLs and anchors to objects.

I believe HTML5 allows any character in ids except whitespace. Some of them just cause a bit of pain, like `.` or `#`, because they then need to be escaped in CSS selectors.
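To make the CSS escaping point concrete, here is a minimal sketch of what a selector for such an id would need. `css_escape_id` is a hypothetical helper and only a simplified subset of what browsers do in `CSS.escape`: it does not special-case a NUL character, a lone `-`, or a digit following a leading `-`.

```python
def css_escape_id(ident: str) -> str:
    """Hypothetical helper: escape an id for use after '#' in a CSS selector."""
    parts = []
    for index, char in enumerate(ident):
        code = ord(char)
        if code < 0x20 or code == 0x7F or (index == 0 and "0" <= char <= "9"):
            # Control characters and a leading digit need a hex escape,
            # terminated by a space.
            parts.append(f"\\{code:x} ")
        elif code >= 0x80 or char.isalnum() or char in "-_":
            # Non-ASCII characters, letters, digits, '-' and '_' are safe as-is.
            parts.append(char)
        else:
            # Any other ASCII character is escaped with a backslash.
            parts.append("\\" + char)
    return "".join(parts)


print("#" + css_escape_id("*Foo*"))
print("#" + css_escape_id("a.b#c"))
```

The id itself stays valid in HTML. Only the selector that targets it has to carry the backslashes.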


Labels: bug, confirmed, extension
