Description
Proposal
Problem statement
std::io::Read::take currently returns a new instance of Read but not Seek, regardless of whether self implements Seek. As a result, code that works with self may not work with the instance returned by .take, because the former implements Seek while the latter doesn't. Moreover, there is no reason for this to be the case: Take is just a wrapper around self that can read at most limit bytes. If self implements Seek, then Take can naturally implement Seek as well.
This proposal is to add a Seek instance for Take.
Motivating examples or use cases
For example, binary file parsers could benefit from this feature. Suppose we have a file format that is a list of records, where each record consists of the record's length followed by the record's bytes. It is useful to read the length and then use .take to delimit the bytes that the record parser should process. The record parser then no longer has to deal with trailing bytes, which results in simpler code.
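A minimal sketch of this pattern, assuming a hypothetical format in which each record starts with a one-byte length (the function name read_records and the one-byte prefix are illustrative assumptions, not part of the proposal):

```rust
use std::io::{self, Read};

/// Reads all records from `input`. Assumed (hypothetical) format: each record
/// is a one-byte length followed by that many bytes.
fn read_records<R: Read>(mut input: R) -> io::Result<Vec<Vec<u8>>> {
    let mut records = Vec::new();
    let mut len_byte = [0u8; 1];
    loop {
        // A read of zero bytes here means there are no more records.
        if input.read(&mut len_byte)? == 0 {
            break;
        }
        // `take` delimits the record: the "record parser" below can simply
        // read to the end without knowing where the record stops.
        let mut record = input.by_ref().take(u64::from(len_byte[0]));
        let mut body = Vec::new();
        record.read_to_end(&mut body)?;
        records.push(body);
    }
    Ok(records)
}
```

Because the Take stops after the declared length, read_to_end never consumes bytes that belong to the next record.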
Now suppose the parser needs to skip certain parts of the record because they are, e.g., obsolete or not needed. seek is a natural fit for this task, but currently Take doesn't support it even if the underlying File does. The parser would have to read the bytes it wants to skip and write them to a no-op sink.
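The workaround described above might look like the following sketch. The function name read_record_skipping and the one-byte length prefix are assumptions made for illustration; only Read::take, Read::by_ref, io::copy and io::sink are real standard library calls.

```rust
use std::io::{self, Read};

/// Reads one length-prefixed record from `input` (hypothetical format: a
/// one-byte length, then the record bytes), discards the first `skip` bytes
/// of the record, and returns the remaining bytes.
fn read_record_skipping<R: Read>(input: &mut R, skip: u64) -> io::Result<Vec<u8>> {
    let mut len_byte = [0u8; 1];
    input.read_exact(&mut len_byte)?;

    // Delimit the record so that nothing past its end is consumed.
    let mut record = input.by_ref().take(u64::from(len_byte[0]));

    // Without Seek, the only way to skip is to read the unwanted bytes and
    // throw them away.
    io::copy(&mut record.by_ref().take(skip), &mut io::sink())?;

    let mut rest = Vec::new();
    record.read_to_end(&mut rest)?;
    Ok(rest)
}
```

Every skipped byte is still read from the underlying source, which is the cost that a Seek implementation would avoid for seekable inputs such as File.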
There is also a reusability reason: any generic code that works with byte sequences should work with the result of .take too. For instance, let's write the interface of the parser above in a generic way:
fn record_parser<I: Read + Seek>(i: I) -> Result<Record> { ... }
This function should work for Take if it wraps a type that implements Seek.
Solution sketch
seek can be expensive, and therefore a requirement for the solution is that a call to Take.seek results in a single call to seek on the inner object of Take.
Take currently keeps track only of the remaining limit bytes that can be read from the inner object. This information alone is not enough to implement seek because, e.g., we don't know where the Take "started", so seek(SeekFrom::Start) cannot be computed.
This solution proposes to keep track of the initial limit of the Take in a new field called len. This can then be used to compute the position of the cursor inside the Take as len - limit. The reason not to track position directly is to avoid adding one more mutable variable that changes together with limit. len is constant (unless set_limit is called).
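The arithmetic can be illustrated with the Take that exists today, assuming the caller remembers the initial limit itself. The helper position_in_take is hypothetical; Take::limit is the existing accessor for the remaining byte count.

```rust
use std::io::Take;

/// Position of the cursor inside `take`, given the initial limit `len` that
/// the caller passed to `Read::take`. Assumes the limit was never raised
/// above `len` via `set_limit`; otherwise the subtraction would underflow.
fn position_in_take<T>(take: &Take<T>, len: u64) -> u64 {
    len - take.limit()
}
```

For instance, after reading 3 bytes from a Take created with a limit of 8, limit() returns 5 and the helper returns 3.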
Once we know the position of the cursor inside Take, we can compute the offset from the current position based on the argument of seek. If the argument is SeekFrom::Start(n), the offset is n - position; if the argument is SeekFrom::End(n), the offset is limit + n, because the end of the Take lies limit bytes past the current position (at position + limit = len); if the argument is SeekFrom::Current(n), the offset is n. This offset is then passed to a single relative seek on the inner object.
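The scheme can be sketched on a standalone wrapper. SeekableTake is a hypothetical stand-in for std::io::Take, not the code from the linked PR, and rejecting targets outside the window with InvalidInput is an assumption of this sketch rather than something the proposal specifies.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for `std::io::Take` that also remembers `len`.
pub struct SeekableTake<T> {
    inner: T,
    len: u64,   // initial limit; constant in this sketch
    limit: u64, // bytes that can still be read
}

impl<T> SeekableTake<T> {
    pub fn new(inner: T, limit: u64) -> Self {
        Self { inner, len: limit, limit }
    }

    /// Position of the cursor inside the window.
    pub fn position(&self) -> u64 {
        self.len - self.limit
    }
}

fn out_of_window() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, "seek outside the Take window")
}

impl<T: Read> Read for SeekableTake<T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.limit == 0 {
            return Ok(0);
        }
        // Never hand the inner reader more room than the remaining limit.
        let max = buf.len().min(usize::try_from(self.limit).unwrap_or(usize::MAX));
        let n = self.inner.read(&mut buf[..max])?;
        self.limit -= n as u64;
        Ok(n)
    }
}

impl<T: Seek> Seek for SeekableTake<T> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let position = self.position();
        // Target position inside the window, in 0..=len.
        // Assumption of this sketch: anything outside that range is an error.
        let target = match pos {
            SeekFrom::Start(n) => Some(n),
            SeekFrom::End(n) => self.len.checked_add_signed(n),
            SeekFrom::Current(n) => position.checked_add_signed(n),
        }
        .filter(|&t| t <= self.len)
        .ok_or_else(out_of_window)?;

        // Offset relative to the current position of the inner object.
        let offset = i64::try_from(i128::from(target) - i128::from(position))
            .map_err(|_| out_of_window())?;

        // Exactly one seek on the inner object.
        self.inner.seek(SeekFrom::Current(offset))?;
        self.limit = self.len - target;
        Ok(target)
    }
}
```

With a Cursor positioned at byte 2 and a limit of 5, seek(SeekFrom::Start(0)) moves the inner cursor back to byte 2, and seek(SeekFrom::End(-1)) moves it to byte 6, the last byte of the window.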
API
Add a position method that returns the position within the Take:
impl<T> Take<T> {
    pub fn position(&self) -> u64 {
        self.len - self.limit
    }
}
Add the Seek implementation for Take:
impl<T: Seek> Seek for Take<T> {
...
}
Alternatives
The solution proposed in 97227 keeps track of the position instead of the original len. It is equivalent to the solution proposed here. I prefer the solution above because it is simpler to maintain: len never changes, while cursor in 97227 must be updated together with limit, leading to code like:
self.cursor += amt as u64;
self.limit -= amt as u64;
Links and related work
- 97227
- My implementation of the solution proposed here: PR #138023