This fixes a bug where one could ask the PikeVM to perform an anchored
search, but in some cases it could return a match where the start of the
match is greater than the start of the search. For example, an anchored
search of the pattern '.c' on the haystack 'abc' starting at '0' would
report a match at '1..3'. No other engine (other than the meta engine,
which we'll address in a subsequent commit) had this bug.

The issue in the PikeVM was our simulation of the '(?s-u:.)*?' prefix
for implementing unanchored searches. Namely, instead of using the NFA
itself to implement the unanchored search (it has both unanchored and
anchored start states), the PikeVM simulates it in code for performance
reasons. This simulation was actually incorrect for the anchored case,
because we were re-computing the epsilon closure for every step in the
search. Effectively, we were simulating an unanchored search
unconditionally.

The reason this bug wasn't caught is that the PikeVM only gets things
half wrong. Namely, the regex '[b-z]c' does not match 'abc' when
starting the search at offset '0', and that's correct. The reason is
that '[b-z]' doesn't match 'a', whereas '.' in the aforementioned
regex does. Since the PikeVM doesn't match there, its current list of
states becomes empty, and *this* case is anchor-aware and knows not to
continue the search. In other words, the PikeVM only half-implemented
the unanchored search simulation. It gets it right in some cases, but
not all.

We fix the bug by computing the epsilon closure only when the search is
unanchored or, if it's anchored, when the current position is the start
of the search. We add a regression test from #1036 as well.
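
To illustrate the bug and the fix, here is a minimal, self-contained sketch
of a Pike-style simulation. It is not the actual PikeVM code: the 'Step'
type, the 'search' function and its 'fixed' flag are all hypothetical, and
the toy only handles patterns that are a plain sequence of single-byte
matchers. The only part meant to mirror this commit is the 'seed' condition.

```rust
/// One byte-matching step of a toy pattern (hypothetical type, not part of
/// any real regex library).
#[derive(Clone, Copy, Debug)]
pub enum Step {
    /// Like '.': any byte except a line feed.
    Any,
    /// Like '[b-z]': any byte in an inclusive range.
    Range(u8, u8),
    /// A single literal byte.
    Byte(u8),
}

impl Step {
    fn matches(self, b: u8) -> bool {
        match self {
            Step::Any => b != b'\n',
            Step::Range(lo, hi) => lo <= b && b <= hi,
            Step::Byte(x) => b == x,
        }
    }
}

/// Toy search over a linear pattern. With `fixed == false`, a new thread is
/// seeded at every position (the buggy behavior). With `fixed == true`, a
/// thread is seeded only if the search is unanchored or we are at its start.
pub fn search(
    pattern: &[Step],
    haystack: &[u8],
    start: usize,
    anchored: bool,
    fixed: bool,
) -> Option<(usize, usize)> {
    // Each thread is (index of the next step to run, offset where it began).
    let mut curr: Vec<(usize, usize)> = Vec::new();
    let mut at = start;
    while at <= haystack.len() {
        // The anchor-aware check that already existed: once no threads are
        // left, an anchored search past its start can never match.
        if curr.is_empty() && anchored && at > start {
            break;
        }
        // Stand-in for computing the epsilon closure of the start state.
        let seed = if fixed { !anchored || at == start } else { true };
        if seed {
            // Appended last, so older threads keep priority.
            curr.push((0, at));
        }
        let mut next = Vec::new();
        for &(pc, match_start) in &curr {
            if pc == pattern.len() {
                // This thread has consumed the whole pattern.
                return Some((match_start, at));
            }
            if at < haystack.len() && pattern[pc].matches(haystack[at]) {
                next.push((pc + 1, match_start));
            }
        }
        curr = next;
        at += 1;
    }
    None
}
```

Under these assumptions, an anchored search for '.c' on 'abc' from offset
'0' returns 'Some((1, 3))' with 'fixed == false' and 'None' with
'fixed == true'. The same search for '[b-z]c' returns 'None' either way,
because the thread list empties at offset '1' and the existing
anchor-aware check stops the search.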

Partially resolves #1036