Skip to content

Commit fbd2537

Browse files
SkiFire13BurntSushi
authored andcommitted
safety: guard in Input::new against incorrect AsRef implementations
Before this commit, Input::new calls haystack.as_ref() twice, once to get the actual haystack slice and the second time to get its length. It makes the assumption that the second call will return the same slice, but malicious implementations of AsRef can return different slices and thus different lengths. This is important because there's unsafe code relying on the Input's span being inbounds with respect to the haystack, but if the second call to .as_ref() returns a bigger slice this won't be true. For example, this snippet causes Miri to report UB on an unchecked slice access in find_fwd_imp (though it will also panic sometime later when run normally, but at that point the UB already happened): use regex_automata::{Input, meta::{Builder, Config}}; use std::cell::Cell; struct Bad(Cell<bool>); impl AsRef<[u8]> for Bad { fn as_ref(&self) -> &[u8] { if self.0.replace(false) { &[] } else { &[0; 1000] } } } let bad = Bad(Cell::new(true)); let input = Input::new(&bad); let regex = Builder::new() // Not setting this causes some checked access to occur before // the unchecked ones, avoiding the UB .configure(Config::new().auto_prefilter(false)) .build("a+") .unwrap(); regex.find(input); This commit fixes the problem by just calling .as_ref() once and use the length of the returned slice as the span's end value. A regression test has also been added. Closes #1154
1 parent 027eebd commit fbd2537

File tree

2 files changed

+29
-3
lines changed

2 files changed

+29
-3
lines changed

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,14 @@
11
1.10.3 (TBD)
22
============
33
This is a new patch release that fixes the feature configuration of optional
4-
dependencies.
4+
dependencies, and fixes an unsound use of bounds check elision.
55

66
Bug fixes:
77

88
* [BUG #1147](https://github.com/rust-lang/regex/issues/1147):
99
Set `default-features=false` for the `memchr` and `aho-corasick` dependencies.
10+
* [BUG #1154](https://github.com/rust-lang/regex/pull/1154):
11+
Fix unsound bounds check elision.
1012

1113

1214
1.10.2 (2023-10-16)

regex-automata/src/util/search.rs

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -110,9 +110,14 @@ impl<'h> Input<'h> {
110110
/// Create a new search configuration for the given haystack.
111111
#[inline]
112112
pub fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> Input<'h> {
113+
// Perform only one call to `haystack.as_ref()` to protect from incorrect
114+
// implementations that return different values from multiple calls.
115+
// This is important because there's code that relies on `span` not being
116+
// out of bounds with respect to the stored `haystack`.
117+
let haystack = haystack.as_ref();
113118
Input {
114-
haystack: haystack.as_ref(),
115-
span: Span { start: 0, end: haystack.as_ref().len() },
119+
haystack,
120+
span: Span { start: 0, end: haystack.len() },
116121
anchored: Anchored::No,
117122
earliest: false,
118123
}
@@ -1966,4 +1971,23 @@ mod tests {
19661971
let expected_size = 3 * core::mem::size_of::<usize>();
19671972
assert_eq!(expected_size, core::mem::size_of::<MatchErrorKind>());
19681973
}
1974+
1975+
#[test]
1976+
fn incorrect_asref_guard() {
1977+
struct Bad(std::cell::Cell<bool>);
1978+
1979+
impl AsRef<[u8]> for Bad {
1980+
fn as_ref(&self) -> &[u8] {
1981+
if self.0.replace(false) {
1982+
&[]
1983+
} else {
1984+
&[0; 1000]
1985+
}
1986+
}
1987+
}
1988+
1989+
let bad = Bad(std::cell::Cell::new(true));
1990+
let input = Input::new(&bad);
1991+
assert!(input.end() <= input.haystack().len());
1992+
}
19691993
}

0 commit comments

Comments
 (0)