str: utf8 encoder allows encoding invalid code points #6943

Closed
@thestinger

Description

The highest valid code point is 1114111 (0x10FFFF), and the modern UTF-8
standard guarantees that encoding a code point takes at most 4 bytes
(instead of 6 in the legacy standard).

From https://tools.ietf.org/html/rfc3629:

Changes from RFC 2279

o Restricted the range of characters to 0000-10FFFF (the UTF-16
accessible range).
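
To illustrate the range restriction quoted above, here is a minimal sketch of an encoder that refuses out-of-range input. `encode_utf8_checked` and `MAX_CODE_POINT` are hypothetical names made up for this example, not the actual `str` implementation or its fix. The surrogate check is an extra assumption beyond what this issue describes, since RFC 3629 also prohibits encoding U+D800 through U+DFFF.

```rust
/// Highest valid code point; RFC 3629 restricts UTF-8 to 0000-10FFFF.
const MAX_CODE_POINT: u32 = 0x10FFFF;

/// Hypothetical helper (not the std API): encodes `cp` as UTF-8 and
/// returns `None` when `cp` is not a valid code point to encode.
fn encode_utf8_checked(cp: u32) -> Option<Vec<u8>> {
    // Reject values above 0x10FFFF, and (as an added assumption) the
    // UTF-16 surrogate range, which RFC 3629 also forbids in UTF-8.
    if cp > MAX_CODE_POINT || (0xD800..=0xDFFF).contains(&cp) {
        return None;
    }
    let bytes = if cp < 0x80 {
        // 1 byte: 0xxxxxxx
        vec![cp as u8]
    } else if cp < 0x800 {
        // 2 bytes: 110xxxxx 10xxxxxx
        vec![0xC0 | (cp >> 6) as u8, 0x80 | (cp & 0x3F) as u8]
    } else if cp < 0x10000 {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]
    };
    Some(bytes)
}
```

With this sketch, `encode_utf8_checked(0x10FFFF)` yields a 4-byte sequence, while `encode_utf8_checked(0x110000)` yields `None` rather than a 5- or 6-byte legacy encoding.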
