Unicode case folding, caseless matching, and iterator methods

I made https://github.com/SimonSapin/rust-casefold for Servo, because the HTML spec requires “compatibility caseless matching”. Some of it might be interesting to have in libunicode/libcollections. @aturon, @alexcrichton, how much do you think is appropriate to include? I’d like your input before I prepare a PR (and have to deal with Rust bootstrapping).

`zip_all` and `iter_eq` are two generic functions (independent of Unicode) that could be default methods of `Iterator`. The former is like `i.zip(j).all(f)`, but also returns `false` if the two iterators have different lengths. The latter (which uses the former) checks that the iterators have the same content. That is, it is equivalent to `i.collect::<Vec<_>>() == j.collect::<Vec<_>>()`, but compares elements one by one and does not allocate. (It also stops at the first difference rather than consuming both iterators until the end.)
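To make the intended semantics concrete, here is a minimal sketch of the two helpers as free functions. It is written against current Rust syntax (associated `Item` types, `where` clauses), which is an assumption on my part; the exact signatures in rust-casefold may differ.

```rust
/// Like `i.zip(j).all(f)`, but also returns `false`
/// when one iterator ends before the other.
fn zip_all<I, J, F>(mut i: I, mut j: J, mut f: F) -> bool
where
    I: Iterator,
    J: Iterator,
    F: FnMut(I::Item, J::Item) -> bool,
{
    loop {
        match (i.next(), j.next()) {
            // Both iterators yielded an item: stop at the first failing pair.
            (Some(a), Some(b)) => {
                if !f(a, b) {
                    return false;
                }
            }
            // Both ended at the same time: every pair passed.
            (None, None) => return true,
            // Exactly one ended early: the lengths differ.
            _ => return false,
        }
    }
}

/// True if both iterators yield equal items in the same order.
/// Compares element by element and never allocates.
fn iter_eq<I, J>(i: I, j: J) -> bool
where
    I: Iterator,
    J: Iterator,
    I::Item: PartialEq<J::Item>,
{
    zip_all(i, j, |a, b| a == b)
}

fn main() {
    assert!(iter_eq("abc".chars(), vec!['a', 'b', 'c'].into_iter()));
    // A plain `zip(..).all(..)` would accept this pair; `zip_all` does not.
    assert!(!zip_all("ab".chars(), "abc".chars(), |a, b| a == b));
    assert!(!iter_eq("abd".chars(), "abc".chars()));
}
```

The length check is the only difference from `i.zip(j).all(f)`, which silently ignores the tail of the longer iterator.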

Case folding is fairly straightforward. The data could be generated with `src/etc/unicode.py` and kept in `src/libunicode/tables.rs`, like existing Unicode data.
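As a rough illustration of what a table-driven implementation could look like, the sketch below does full case folding with a binary search over a sorted table. The table name `CASE_FOLDING_SAMPLE` and the functions `fold_char` and `fold` are hypothetical, and the five entries are a hand-picked sample; a real table would be generated from Unicode's `CaseFolding.txt` rather than written by hand.

```rust
// Hypothetical hand-written sample, sorted by code point.
// A generated table would cover every entry of CaseFolding.txt.
static CASE_FOLDING_SAMPLE: &[(char, &[char])] = &[
    ('K', &['k']),
    ('S', &['s']),
    ('\u{00DF}', &['s', 's']), // ß: full folding expands to two chars
    ('\u{017F}', &['s']),      // ſ (long s)
    ('\u{212A}', &['k']),      // K (Kelvin sign)
];

/// Case-folds one `char`. Characters without a table entry are yielded unchanged.
fn fold_char(c: char) -> impl Iterator<Item = char> {
    let mapped: &'static [char] =
        match CASE_FOLDING_SAMPLE.binary_search_by_key(&c, |&(k, _)| k) {
            Ok(idx) => CASE_FOLDING_SAMPLE[idx].1,
            Err(_) => &[],
        };
    // No mapping found: pass the original character through.
    let unchanged = if mapped.is_empty() { Some(c) } else { None };
    mapped.iter().copied().chain(unchanged)
}

/// Lazily case-folds any iterator of `char`s, without an intermediate `String`.
fn fold<I: Iterator<Item = char>>(chars: I) -> impl Iterator<Item = char> {
    chars.flat_map(fold_char)
}

fn main() {
    let folded: String = fold("K\u{212A}".chars()).collect();
    assert_eq!(folded, "kk");
    // "ß" and "SS" have the same case folding.
    assert!(fold("\u{00DF}".chars()).eq(fold("SS".chars())));
}
```

Because `fold` takes and returns iterators, comparing two folded sequences element by element gives a non-allocating comparison of case-folded text.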

Caseless matching, however, is more complex: there are different variants of it. Other than the “default” variant, they require NFD and NFKD normalization. libunicode already has `nfd_chars` and `nfkd_chars` methods on `&str`, but using them here would require allocating an intermediate `String`, since normalization has to run on the output of case folding. So, in the same spirit as #19042, it might be useful to expose another API for Unicode normalization (all four variants of it, while we’re at it) that works from a generic `Iterator<char>` rather than just `&str` / `Chars`.

Thoughts?

Nothing urgent here, but consider this when stabilizing libunicode.

