Add has_data_left() to BufRead #85815

Merged 2 commits on Jun 18, 2021

31 changes: 31 additions & 0 deletions library/std/src/io/mod.rs
@@ -1953,6 +1953,37 @@ pub trait BufRead: Read {
#[stable(feature = "rust1", since = "1.0.0")]
fn consume(&mut self, amt: usize);

/// Check if the underlying `Read` has any data left to be read.
///
/// This function may fill the buffer to check for data,
/// so it returns `Result<bool>`, not `bool`.
///
/// The default implementation calls `fill_buf` and checks whether the
/// returned slice is empty (which means that there is no data left,
/// since EOF is reached).
///
/// # Examples
///
/// ```
/// #![feature(buf_read_has_data_left)]
/// use std::io;
/// use std::io::prelude::*;
///
/// let stdin = io::stdin();
/// let mut stdin = stdin.lock();
///
/// while stdin.has_data_left().unwrap() {
/// let mut line = String::new();
/// stdin.read_line(&mut line).unwrap();
/// // work with line
/// println!("{:?}", line);
/// }
/// ```
#[unstable(feature = "buf_read_has_data_left", reason = "recently added", issue = "86423")]
fn has_data_left(&mut self) -> Result<bool> {
@the8472 (Member), May 29, 2021

has_remaining_data seems less... informal?

Is there a reason to check the negated case (not empty) rather than providing an is_empty, which can be found on many other things?

Also, what's the benefit of this method if it requires error handling? If you have to do error handling anyway, you might as well do the following. That way you only need to handle errors once, instead of doing it both in the loop condition and when reading the line.

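use std::io::BufRead; // brings `read_line` into scope

let stdin = std::io::stdin();
let mut stdin = stdin.lock();
let mut line = String::new();
loop {
    match stdin.read_line(&mut line) {
        Ok(0) => break, // EOF
        Ok(_) => {
            // do something with line
        }
        Err(e) => panic!("read error: {}", e),
    }
    line.clear(); // reuse the buffer for the next line
}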

@YuhanLiin (Contributor Author), May 30, 2021

I just couldn't think of a good name for the positive case, but I'm open to suggestions.

As for the benefits, my main use case is deserializing objects directly from a file in a loop. The deserialization function will return an error if it encounters an EOF, and depending on the Serde library it's not always easy to tell whether the error was caused by EOF. It's easier to check for EOF separately in the loop condition.

while file.has_data_left()? {
    let obj = rmp_serde::decode::from_read(&mut file)?;
    // rest of the loop
}

As shown above, the additional error handling isn't really a big deal when using question mark syntax (or just unwrapping).
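
For readers who want to try the loop without the `buf_read_has_data_left` feature gate, here is a minimal sketch of the same pattern. It makes two assumptions that are not part of the PR: `has_data_left` is written as a free function over `fill_buf`, and a hypothetical `read_record` (one big-endian `u32` per record) stands in for the `rmp_serde` call.

```rust
use std::io::{self, BufRead, Cursor, Read};

/// Hypothetical stand-in for `BufRead::has_data_left`, written as a free
/// function so it builds without the `buf_read_has_data_left` feature gate.
fn has_data_left<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    // An empty slice from `fill_buf` means EOF has been reached.
    reader.fill_buf().map(|buf| !buf.is_empty())
}

/// Hypothetical decoder: reads one big-endian `u32` and fails with
/// `UnexpectedEof` if fewer than four bytes remain.
fn read_record<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes)?;
    Ok(u32::from_be_bytes(bytes))
}

/// Decodes records until the reader is exhausted.
fn read_all<R: BufRead>(mut reader: R) -> io::Result<Vec<u32>> {
    let mut records = Vec::new();
    while has_data_left(&mut reader)? {
        records.push(read_record(&mut reader)?);
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let input = Cursor::new(vec![0, 0, 0, 1, 0, 0, 0, 2]);
    let records = read_all(input)?;
    println!("{:?}", records); // prints "[1, 2]"
    Ok(())
}
```

The loop ends on a clean EOF without the decoder ever being asked to read from an exhausted reader, so no EOF-specific error matching is needed.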

Member

Does that really solve the stated problem though? serde could still encounter an EOF in the middle of parsing an input.

Contributor Author

If serde encounters an EOF in the middle of deserializing an object then it's a syntax error and the input is invalid, at least for that object. If serde finishes parsing the last object in a valid input then all that's left in the reader would be EOF. In that case having EOF checking would prevent serde from attempting to deserialize another object and throwing a syntax error. Having EOF checking between deserialize calls allows the loop to end gracefully when encountering EOF in a valid input.
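
The distinction can be sketched on stable Rust. This is an illustration under assumptions, not code from the PR: `count_records` and `read_record` are hypothetical names, a fixed 4-byte record stands in for a serde object, and the loop condition inlines the same `fill_buf` check that the default `has_data_left` performs.

```rust
use std::io::{self, BufRead, Cursor, ErrorKind, Read};

/// Hypothetical stand-in for a deserializer: one record is exactly 4 bytes.
fn read_record<R: Read>(reader: &mut R) -> io::Result<[u8; 4]> {
    let mut record = [0u8; 4];
    reader.read_exact(&mut record)?;
    Ok(record)
}

/// Hypothetical helper: counts the complete records in `reader`.
fn count_records<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    // Same check as the default `has_data_left`: a non-empty `fill_buf` slice.
    while !reader.fill_buf()?.is_empty() {
        // EOF in the middle of a record surfaces here as an error.
        read_record(&mut reader)?;
        count += 1;
    }
    Ok(count)
}

fn main() {
    // Valid input: EOF falls between records, so the loop ends gracefully.
    let valid = count_records(Cursor::new(vec![0u8; 8]));
    assert_eq!(valid.unwrap(), 2);

    // Truncated input: EOF falls inside the second record, so decoding fails.
    let truncated = count_records(Cursor::new(vec![0u8; 6]));
    assert_eq!(truncated.unwrap_err().kind(), ErrorKind::UnexpectedEof);
}
```

With the check in the loop condition, an error from the decoder always means invalid input, never merely the end of a valid stream.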

self.fill_buf().map(|b| !b.is_empty())
}

/// Read all bytes into `buf` until the delimiter `byte` or EOF is reached.
///
/// This function will read bytes from the underlying stream until the
10 changes: 10 additions & 0 deletions library/std/src/io/tests.rs
@@ -71,6 +71,16 @@ fn lines() {
assert!(s.next().is_none());
}

#[test]
fn buf_read_has_data_left() {
let mut buf = Cursor::new(&b"abcd"[..]);
assert!(buf.has_data_left().unwrap());
buf.read_exact(&mut [0; 2]).unwrap();
assert!(buf.has_data_left().unwrap());
buf.read_exact(&mut [0; 2]).unwrap();
assert!(!buf.has_data_left().unwrap());
}

#[test]
fn read_to_end() {
let mut c = Cursor::new(&b""[..]);