Description
I made https://github.com/SimonSapin/rust-casefold for Servo, because the HTML spec requires “compatibility caseless matching”. Some of it might be interesting to have in libunicode/libcollections. @aturon, @alexcrichton, how much do you think is appropriate to include? I’d like your input before I prepare a PR (and have to deal with Rust bootstrapping).
`zip_all` and `iter_eq` are two generic functions (independent of Unicode) that could be default methods of `Iterator`. The former is like `i.zip(j).all(f)`, but also returns `false` if the two iterators have different lengths. The latter (which uses the former) checks that the iterators have the same content. That is, it is equivalent to `i.collect::<Vec<_>>() == j.collect::<Vec<_>>()`, but compares elements one by one and does not allocate. (It also stops at the first difference rather than consuming both iterators until the end.)
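To make the intended semantics concrete, here is a minimal sketch of the two helpers as free functions, written in current Rust syntax. The exact signatures are an assumption for illustration; as default methods on `Iterator` they would take `self` instead of `i`.

```rust
/// Like `i.zip(j).all(f)`, but also returns `false` when one iterator
/// ends before the other. Stops at the first pair rejected by `f`.
fn zip_all<I, J, F>(mut i: I, mut j: J, mut f: F) -> bool
where
    I: Iterator,
    J: Iterator,
    F: FnMut(I::Item, J::Item) -> bool,
{
    loop {
        match (i.next(), j.next()) {
            // Both iterators yielded an item: check the predicate.
            (Some(a), Some(b)) => {
                if !f(a, b) {
                    return false;
                }
            }
            // Both ended at the same time: every pair matched.
            (None, None) => return true,
            // Exactly one ended early: the lengths differ.
            _ => return false,
        }
    }
}

/// Element-wise equality of two iterators, without allocating.
fn iter_eq<I, J>(i: I, j: J) -> bool
where
    I: Iterator,
    J: Iterator,
    I::Item: PartialEq<J::Item>,
{
    zip_all(i, j, |a, b| a == b)
}
```

For example, `iter_eq("abc".chars(), "ab".chars())` is `false` because of the length check, whereas `"abc".chars().zip("ab".chars()).all(|(a, b)| a == b)` would be `true`.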
Case folding is fairly straightforward. The data could be generated with `src/etc/unicode.py` and kept in `src/libunicode/tables.rs`, like existing Unicode data.
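As a rough illustration of table-driven case folding, the sketch below uses a tiny hand-written excerpt of mappings from Unicode's `CaseFolding.txt` plus an ASCII shortcut. The names `FOLD_TABLE`, `fold_char`, `case_fold` and `default_caseless_match` are hypothetical, and a real table would be generated and cover every mapping, so this only handles the listed characters and ASCII.

```rust
/// Hypothetical excerpt of full case folding mappings, sorted by code point
/// so that it can be binary-searched. A generated table would be complete.
const FOLD_TABLE: &[(char, &[char])] = &[
    ('\u{00B5}', &['\u{03BC}']),      // MICRO SIGN -> GREEK SMALL LETTER MU
    ('\u{00DF}', &['s', 's']),        // LATIN SMALL LETTER SHARP S -> "ss"
    ('\u{0130}', &['i', '\u{0307}']), // CAPITAL I WITH DOT ABOVE -> i + dot above
    ('\u{03A3}', &['\u{03C3}']),      // GREEK CAPITAL SIGMA -> small sigma
    ('\u{03C2}', &['\u{03C3}']),      // GREEK SMALL FINAL SIGMA -> small sigma
];

/// Folds one char into zero allocation: either a table entry or the char itself.
fn fold_char(c: char) -> impl Iterator<Item = char> {
    let hit = FOLD_TABLE.binary_search_by_key(&c, |&(k, _)| k).ok();
    let multi: &'static [char] = match hit {
        Some(idx) => FOLD_TABLE[idx].1,
        None => &[],
    };
    // Assumption of this sketch: anything outside the excerpt is folded
    // with the ASCII rule only, which leaves non-ASCII chars unchanged.
    let single = if hit.is_none() {
        Some(c.to_ascii_lowercase())
    } else {
        None
    };
    single.into_iter().chain(multi.iter().copied())
}

/// Case-folds any iterator of chars lazily.
fn case_fold<I: Iterator<Item = char>>(chars: I) -> impl Iterator<Item = char> {
    chars.flat_map(fold_char)
}

/// "Default" caseless matching: compare the folded sequences element by
/// element, using the standard library's `Iterator::eq`.
fn default_caseless_match(a: &str, b: &str) -> bool {
    case_fold(a.chars()).eq(case_fold(b.chars()))
}
```

With this excerpt, `default_caseless_match("Straße", "STRASSE")` holds because `ß` folds to `ss`, and no intermediate `String` is built for either side.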
Caseless matching, however, is more complex: there are different variants of it. Other than the “default” variant, they require NFD and NFKD normalization. libunicode already has `nfd_chars` and `nfkd_chars` methods on `&str`, but here that would require allocating an intermediate `String`. So, in the same spirit as #19042, it might be useful to expose another API for Unicode normalization (all four variants of it, while we’re at it) from a generic `Iterator<char>` rather than just `&str` / `Chars`.
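To show the shape such an API could take, here is a sketch of an adaptor that wraps any iterator of `char` and yields decomposed characters lazily. It is not a real NFD implementation: the two-entry `DECOMP_TABLE` is a hypothetical excerpt, and recursive decomposition, canonical reordering of combining marks, and Hangul are all left out. The names `Decompose` and `decompose` are made up for illustration.

```rust
/// Hypothetical excerpt of canonical decomposition mappings, sorted by
/// code point. A real table would be generated from the Unicode data files.
const DECOMP_TABLE: &[(char, &[char])] = &[
    ('\u{00C5}', &['A', '\u{030A}']), // A WITH RING ABOVE
    ('\u{00E9}', &['e', '\u{0301}']), // E WITH ACUTE
];

/// Adaptor over any `Iterator<Item = char>`; no intermediate `String`.
struct Decompose<I> {
    inner: I,
    /// Remaining chars of the decomposition currently being emitted.
    pending: &'static [char],
}

/// Wraps `chars` in the decomposing adaptor.
fn decompose<I: Iterator<Item = char>>(chars: I) -> Decompose<I> {
    Decompose { inner: chars, pending: &[] }
}

impl<I: Iterator<Item = char>> Iterator for Decompose<I> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        if self.pending.is_empty() {
            // Nothing buffered: pull the next source char.
            let c = self.inner.next()?;
            match DECOMP_TABLE.binary_search_by_key(&c, |&(k, _)| k) {
                Ok(idx) => self.pending = DECOMP_TABLE[idx].1,
                // No mapping in the excerpt: pass the char through.
                Err(_) => return Some(c),
            }
        }
        // Emit the next char of the buffered decomposition.
        let pending: &'static [char] = self.pending;
        let (&first, rest) = pending.split_first()?;
        self.pending = rest;
        Some(first)
    }
}
```

Because the adaptor is generic over its input, it can sit after another adaptor (for example one that case-folds) as well as directly on `str::chars()`, which is the composition the caseless matching variants need.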
Thoughts?
Nothing urgent here, but consider this when stabilizing libunicode.