Proposal
Add slice::split_once and slice::rsplit_once methods, analogous to the existing str::split_once and str::rsplit_once methods.
Problem statement
When doing ad-hoc parsing of a format that isn't guaranteed to be valid Unicode, it's often useful to split byte slices on the first occurrence of a specific delimiter. There isn't currently an API that expresses this directly for byte slices, although there is one for strings.
Motivation, use-cases
There are some examples in the aprs-parser-rs crate, which is being refactored from treating its input data as strings to using byte slices. APRS packets consist of a header and a body, separated by a b':' byte. This is currently being parsed like this:
let header_delimiter = s
.iter()
.position(|x| *x == b':')
.ok_or_else(|| AprsError::InvalidPacket(s.to_owned()))?;
let (header, rest) = s.split_at(header_delimiter);
let body = &rest[1..];
Solution sketches
There are currently two ways to do this in stable Rust.
Using position to find the delimiter's index, then splitting at that index and explicitly discarding the first byte of the second slice (which is the delimiter):
let v = b"first:second";
let split_index = v.iter().position(|&x| x == b':')?;
let (first, second) = v.split_at(split_index);
let second = &second[1..];
Using splitn:
let v = b"first:second";
let mut split = v.splitn(2, |&x| x == b':');
let first = split.next()?;
let second = split.next()?;
These options are okay, but not great. They're both relatively verbose and don't express the actual intention very directly. They also have the issue that mistakes aren't necessarily going to show up in the type system.
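To illustrate the point about mistakes not showing up in the type system, here is a minimal sketch. The function names are hypothetical and exist only for this example. Both versions have the same signature and both compile, but the first forgets to skip the delimiter, so it silently leaks into the second half.

```rust
// Hypothetical helper: forgets to drop the delimiter from the second half.
// It compiles without warnings because the types are identical either way.
fn split_forgetting_delimiter(v: &[u8]) -> Option<(&[u8], &[u8])> {
    let index = v.iter().position(|&x| x == b':')?;
    let (first, second) = v.split_at(index);
    Some((first, second))
}

// Hypothetical helper: the intended behaviour, skipping the delimiter byte.
fn split_skipping_delimiter(v: &[u8]) -> Option<(&[u8], &[u8])> {
    let index = v.iter().position(|&x| x == b':')?;
    let (first, second) = v.split_at(index);
    Some((first, &second[1..]))
}

fn main() {
    let v = b"first:second";

    // The buggy version keeps the ':' at the start of the second slice.
    assert_eq!(
        split_forgetting_delimiter(v),
        Some((&b"first"[..], &b":second"[..]))
    );

    // The correct version drops it.
    assert_eq!(
        split_skipping_delimiter(v),
        Some((&b"first"[..], &b"second"[..]))
    );
}
```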
With strings, there is currently a split_once method that handles this exact use case:
let v = "first:second";
let (first, second) = v.split_once(':')?;
A similar method could be added for slices:
pub fn split_once<F>(&self, pred: F) -> Option<(&[T], &[T])>
where F: FnMut(&T) -> bool
{
let index = self.iter().position(pred)?;
Some((&self[..index], &self[index+1..]))
}
Along with an rsplit_once equivalent. I also think it might make sense to add split_once_mut and rsplit_once_mut, however those don't currently exist for str.
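As a rough sketch of how the pair might behave, the two functions below are written as free functions so they can be tried on stable Rust today. The free-function form and the use of Iterator::rposition for the reverse variant are assumptions made for this illustration, not a statement of how the methods would be implemented in the standard library.

```rust
// Sketch of the proposed split_once as a free function (assumed form).
// Splits on the first element matching `pred`, excluding that element.
fn split_once<T, F>(slice: &[T], pred: F) -> Option<(&[T], &[T])>
where
    F: FnMut(&T) -> bool,
{
    let index = slice.iter().position(pred)?;
    Some((&slice[..index], &slice[index + 1..]))
}

// Sketch of the rsplit_once equivalent (assumed form).
// Splits on the last element matching `pred`, excluding that element.
fn rsplit_once<T, F>(slice: &[T], pred: F) -> Option<(&[T], &[T])>
where
    F: FnMut(&T) -> bool,
{
    let index = slice.iter().rposition(pred)?;
    Some((&slice[..index], &slice[index + 1..]))
}

fn main() {
    let v = b"a:b:c";

    // Forward split stops at the first ':'.
    assert_eq!(
        split_once(v, |&x| x == b':'),
        Some((&b"a"[..], &b"b:c"[..]))
    );

    // Reverse split stops at the last ':'.
    assert_eq!(
        rsplit_once(v, |&x| x == b':'),
        Some((&b"a:b"[..], &b"c"[..]))
    );

    // No delimiter means no split, mirroring str::split_once.
    assert_eq!(split_once(&b"abc"[..], |&x| x == b':'), None);
}
```

Returning None when the delimiter is absent matches str::split_once, and lets callers use ? or ok_or_else as in the APRS example above.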
Links and related work
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.