Description
The current UTF-8 implementation contains some assertions that check the validity of UTF-8 bytes. Specifically, passing an invalid sequence such as "\x80\xae" to a string function fails with "Assertion is_utf8(v) failed".
However, overlong encodings are accepted without any such error. A sequence such as "\xC0\xAE" (an overlong encoding of \x2E, a dot) is accepted and appears in the final Rust string.
This raises some security concerns as described in RFC3629 Section 10:
https://tools.ietf.org/html/rfc3629#section-10
Short example: when a program lets a user access files but wants to reject paths containing "../", it must not be possible to circumvent this check with an overlong encoding of a dot, and the author of the program shouldn't have to rely on the OS to perform such a check either.
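A minimal sketch of the check described above, written against current Rust rather than the implementation this issue reports on. The function name is_safe_relative_path is hypothetical. It assumes std::str::from_utf8 from today's standard library, which rejects overlong forms, so the bytes are validated strictly before the ".." test runs:

```rust
/// Hypothetical helper: returns true only if `raw` is well-formed UTF-8
/// and no '/'-separated segment is "..".
fn is_safe_relative_path(raw: &[u8]) -> bool {
    // Validate first. An overlong dot (0xC0 0xAE) is rejected here,
    // so it can never reach the segment check in disguise.
    let text = match std::str::from_utf8(raw) {
        Ok(t) => t,
        Err(_) => return false,
    };
    // Only now is it meaningful to look for parent-directory segments.
    !text.split('/').any(|segment| segment == "..")
}

fn main() {
    // Ordinary relative path.
    println!("{}", is_safe_relative_path(b"docs/readme.txt"));
    // Plain "../" traversal.
    println!("{}", is_safe_relative_path(b"../secret"));
    // "../" spelled with overlong dots: 0xC0 0xAE 0xC0 0xAE '/'.
    println!("{}", is_safe_relative_path(&[0xC0, 0xAE, 0xC0, 0xAE, b'/', b'x']));
}
```

The ordering is the point: if the ".." test ran on raw bytes and a lenient decoder ran afterwards, the overlong form would pass the test and still decode to a dot.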
fn main() {
    // overlong dot, should be invalid but is accepted
    let s1 = str::from_bytes([0xc0 as u8, 0xae as u8]);
    io::println(fmt!("len: %u, chars: %u, value: %s", s1.len(), str::char_len(s1), s1));

    // regular invalid utf, triggering an assertion fail
    let s2 = str::from_bytes([0x80 as u8, 0xae as u8]);
    io::println(fmt!("len: %u, chars: %u, value: %s", s2.len(), str::char_len(s2), s2));
}
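To illustrate why the first sequence in the example is an overlong dot, here is a sketch in current Rust of a lenient two-byte decoder and the extra range check that a strict decoder needs. Both function names are hypothetical and this is not the code in the implementation under discussion. It only shows the arithmetic: the payload bits of 0xC0 0xAE decode to 0x2E.

```rust
/// Hypothetical lenient decoder: checks only the bit patterns of the
/// two bytes, not whether the result needed two bytes at all.
fn decode_two_bytes_lenient(b0: u8, b1: u8) -> Option<u32> {
    // Lead byte must look like 110xxxxx, continuation byte like 10xxxxxx.
    if (b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80 {
        return None;
    }
    // Five payload bits from the lead byte, six from the continuation.
    Some((u32::from(b0 & 0x1F) << 6) | u32::from(b1 & 0x3F))
}

/// A two-byte sequence is overlong when the decoded value is below 0x80,
/// because such a value fits in a single byte.
fn is_overlong_two_bytes(b0: u8, b1: u8) -> bool {
    matches!(decode_two_bytes_lenient(b0, b1), Some(cp) if cp < 0x80)
}

fn main() {
    // Overlong dot: decodes to 0x2E but must be rejected.
    println!("{:?}", decode_two_bytes_lenient(0xC0, 0xAE));
    println!("{}", is_overlong_two_bytes(0xC0, 0xAE));
    // 0x80 is a stray continuation byte, so it is not a lead byte at all.
    println!("{:?}", decode_two_bytes_lenient(0x80, 0xAE));
    // 0xC3 0xA9 is the shortest encoding of U+00E9 and is not overlong.
    println!("{}", is_overlong_two_bytes(0xC3, 0xA9));
}
```

For two-byte sequences the range check is equivalent to rejecting the lead bytes 0xC0 and 0xC1 outright, since those are the only lead bytes whose payload is below 0x80.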