
slice contains subslice #499

Closed
@folkertdev

Description


Proposal

Problem statement

Determining whether one slice contains another as a contiguous subslice, analogous to "foobar".contains("foo") for strings.

Motivating examples or use cases

I recently had cause to write

use std::ffi::OsStr;

fn contains_osstr(haystack: impl AsRef<OsStr>, needle: impl AsRef<OsStr>) -> bool {
    let needle = needle.as_ref().as_encoded_bytes();
    let haystack = haystack.as_ref().as_encoded_bytes();

    // windows(0) panics, so an empty needle is treated as always contained.
    needle.is_empty() || haystack.windows(needle.len()).any(|h| h == needle)
}

Part of the problem here is the limited API on OsStr (and similarly on Path and CStr), but I've also wanted a "contains slice" operation on plain slices. It is especially odd that substring search via contains is available on &str but disappears once you drop down to the raw bytes: [u8]::contains only checks for a single element.

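I see two problems:

  • I can't neatly express the intuitive "contains" operation that I want
  • The windows-based implementation is slower than it could be: every window is compared against the needle, which costs O(n·m) comparisons in the worst case for a haystack of length n and a needle of length m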

Solution sketch

I think the nicest solution is to mirror the core::str::pattern design, so that we could have

// Signatures only; bodies elided.
impl<T> [T] {
    pub fn contains<P>(&self, pat: P) -> bool
    where
        T: PartialEq,
        P: core::slice::pattern::Pattern<T>;

    pub fn find<P>(&self, pat: P) -> Option<usize>
    where
        T: PartialEq,
        P: core::slice::pattern::Pattern<T>;
}

The Pattern trait itself appears to work out:

trait Pattern<T: PartialEq>: Sized { /* ... */ }

impl<T: PartialEq> Pattern<T> for T { /* ... */ }

impl<'b, T: PartialEq> Pattern<T> for &'b [T] { /* ... */ }

// potentially arrays too, maybe even str, OsStr, CStr, and so on

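It looks backwards-compatible to me, but I'm not 100% sure that it is. One thing to check: the existing [T]::contains takes x: &T, so existing calls like v.contains(&x) would only keep compiling if &T also implements Pattern<T>.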
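A minimal sketch of how the Pattern idea could behave, assuming the trait shape above. Because only core can add inherent methods to [T], it is written as an extension trait, and the names Pattern, find_in, SlicePatternExt, find_pattern and contains_pattern are all hypothetical. The subslice impl uses a naive window comparison, not one of the faster algorithms discussed in the notes.

```rust
// Sketch only: hypothetical names standing in for the proposed
// core::slice::pattern::Pattern and the inherent contains/find on [T].

/// Something that can be searched for in a &[T].
pub trait Pattern<T: PartialEq>: Sized {
    /// Returns the index of the first match in `haystack`, if any.
    fn find_in(self, haystack: &[T]) -> Option<usize>;
}

/// A single element matches wherever it occurs.
impl<T: PartialEq> Pattern<T> for T {
    fn find_in(self, haystack: &[T]) -> Option<usize> {
        haystack.iter().position(|x| *x == self)
    }
}

/// A subslice matches wherever an equal-length window equals it.
/// An empty needle matches at index 0, like "abc".find("") for strings.
impl<'b, T: PartialEq> Pattern<T> for &'b [T] {
    fn find_in(self, haystack: &[T]) -> Option<usize> {
        if self.is_empty() {
            return Some(0); // windows(0) would panic
        }
        haystack.windows(self.len()).position(|w| w == self)
    }
}

/// Extension trait standing in for the proposed inherent methods.
pub trait SlicePatternExt<T: PartialEq> {
    fn find_pattern<P: Pattern<T>>(&self, pat: P) -> Option<usize>;

    fn contains_pattern<P: Pattern<T>>(&self, pat: P) -> bool {
        self.find_pattern(pat).is_some()
    }
}

impl<T: PartialEq> SlicePatternExt<T> for [T] {
    fn find_pattern<P: Pattern<T>>(&self, pat: P) -> Option<usize> {
        pat.find_in(self)
    }
}
```

With this sketch, a single element (such as 4i32) or a subslice (such as a &[i32]) can be passed to the same method. The two impls do not overlap, for the same reason core can have both From<T> for T and From<T> for Option<T>.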

Some final notes:

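  • an algorithm like KMP (Knuth-Morris-Pratt) should be used, so the search runs in linear time instead of comparing the needle against every window.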
  • specialization could be used for better performance (e.g. for u8 and T: Copy)
  • SIMD could be used for better performance
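To make the KMP note concrete, here is a minimal Knuth-Morris-Pratt subslice search. kmp_find is a hypothetical helper name, not a standard library function, and this sketch makes no attempt at the specialization or SIMD ideas above. It needs only T: PartialEq and performs O(n + m) element comparisons plus an O(m) failure table.

```rust
/// Knuth-Morris-Pratt search: index of the first occurrence of `needle`
/// in `haystack`, or None. Hypothetical helper for illustration only.
pub fn kmp_find<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }

    // failure[i] is the length of the longest proper prefix of
    // needle[..=i] that is also a suffix of needle[..=i].
    let mut failure = vec![0usize; needle.len()];
    let mut k = 0;
    for i in 1..needle.len() {
        while k > 0 && needle[i] != needle[k] {
            k = failure[k - 1];
        }
        if needle[i] == needle[k] {
            k += 1;
        }
        failure[i] = k;
    }

    // Scan the haystack once; on a mismatch, fall back within the needle
    // via the failure table instead of moving backwards in the haystack.
    let mut matched = 0;
    for (i, x) in haystack.iter().enumerate() {
        while matched > 0 && *x != needle[matched] {
            matched = failure[matched - 1];
        }
        if *x == needle[matched] {
            matched += 1;
        }
        if matched == needle.len() {
            // The match ends at index i, so it starts needle.len() - 1 earlier.
            return Some(i + 1 - needle.len());
        }
    }
    None
}
```

For example, searching for b"abca" in b"abababca" returns Some(4). The trade-off against the windows approach is the extra O(m) allocation for the failure table, which a real implementation might avoid for short needles.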

Alternatives

Issue rust-lang/rust#54961 instead proposed:

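fn contains_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> bool {
    // windows(0) panics, so an empty needle is handled separately.
    needle.is_empty() || data.windows(needle.len()).any(|w| w == needle)
}

fn position_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> Option<usize> {
    // An empty needle matches at the start, as with str::find("").
    if needle.is_empty() {
        return Some(0);
    }
    data.windows(needle.len()).position(|w| w == needle)
}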

This idea works fine too, should the pattern idea above turn out to have backwards-compatibility issues (though I'd suggest the name find_subslice instead of position_subslice, for consistency with find).

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Metadata


Assignees: No one assigned

Labels: ACP-accepted (API Change Proposal is accepted, seconded with no objections); T-libs-api; api-change-proposal (a proposal to add or alter unstable APIs in the standard libraries)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
