-
Notifications
You must be signed in to change notification settings - Fork 38
Conversation
Parsing past Eof can lead to infinite loops, as it violates the assumptions of the termination checker. See: #540 This PR makes `Parser.next` assert false when called on Eof. This should not happen as one should check the token before calling `Parser.next`. The one exception is during lookahead, for which we provide a `nextUnsafe` function which does not make progress.
There are 2 functions that the parser can call: - next can raise Eof - nextUnsafe does not raise The former is counted in the progress measure. It cannot be part of infinite loops, as eventually Eof would be reached and the exception thrown. This means that every use of Parser.next needs to be audited for exceptions, since infinite loops have been turned into runtime exceptions. Sometimes it's enough to use Parser.expect if we know exactly what is the current token. Sometimes the current token has just been checked, so a `@doesNotRaise` annotation is added. Sometimes there are several call-sites to the current function that have different invariants, and this is where some changes were required.
@@ -612,25 +614,25 @@ let parseStringLiteral s = | |||
if parse Start 0 0 then Buffer.contents b else s | |||
|
|||
let rec parseLident p = | |||
let recoverLident p = | |||
let recoverLidentNoEof p = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of this function is that It's the caller's responsibility to make sure it's obvious the current token is not Eof.
| _ -> | ||
begin match recoverLident p with | ||
begin match recoverLidentNoEof p with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK to call: we just checked for Eof in the previous case of the match.
| Some () -> | ||
parseLident p | ||
| None -> | ||
("_", mkLoc startPos p.prevEndPos) | ||
end | ||
|
||
let parseIdent ~msg ~startPos p = | ||
let parseIdentNoNext ~msg ~startPos p = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do Parser.next
after calling this function.
This is because the caller has more context on what the token can be.
("", mkLoc startPos p.prevEndPos) | ||
|
||
let parseIdent ~msg ~startPos p = | ||
let res = parseIdentNoNext ~msg ~startPos p in | ||
Parser.nextUnsafe p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we don't know, just do nextUnsafe
. That's OK as long as termination analysis is happy.
Parser.next p; | ||
ident | ||
|
||
let parseValuePath p = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what we call when we don't know what the token is.
Parser.next p; | ||
Some (true, PatField (parseRecordPatternField p)) | ||
Parser.expect DotDotDot p; | ||
Some (true, PatField (parseRecordPatternField ~label:(parseValuePath p) p)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we don't know what the current token is.
But, termination analysis is happy so we're good.
| Uident _ | Lident _ -> | ||
Some (false, PatField (parseRecordPatternField p)) | ||
Some (false, PatField (parseRecordPatternField ~label:(parseValuePathNotEof p) p)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we do know the token is not Eof
. If we did not propagate this info, which lets us use next
instead of nextUnsafe
termination analysis would fire.
let lbracket = p.startPos in | ||
Parser.next p; | ||
let stringStart = p.startPos in | ||
and parseBracketAccess p ~lbracket expr startPos = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do the next
in the caller, which has more context on what the token is.
parseBracketAccess p expr startPos | ||
Parser.leaveBreadcrumb p Grammar.ExprArrayAccess; | ||
let lbracket = p.startPos in | ||
Parser.expect Lbracket p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was a next
inside the callee, is now an expect
in the caller.
| _ -> () | ||
in | ||
match p.Parser.token with | ||
| Lident _ | Uident _ -> | ||
let startToken = p.token in | ||
let field = parseValuePath p in | ||
let field = parseValuePathNotEof p in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know this is not Eof
.
This is way too noisy to even think about merging something like this. |
match p.token with | ||
| Backtick -> Parser.next p; () | ||
| _ -> skipTokens () | ||
if p.token <> Eof then ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let rec loop p = | ||
if not (Recover.shouldAbortListParse p) | ||
if not (Recover.shouldAbortListParse p) && p.token <> Eof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like could be another infinite loop
{typ with | ||
ptyp_attributes = List.concat [typ.ptyp_attributes; attrs]; | ||
ptyp_loc = mkLoc startPos p.prevEndPos} | ||
if p.token <> Eof then ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be turned into an infinite loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not look like it, it would need to go into parseTypExpr
/parseEs6ArrowType
loop, but that only works if there are fresh =>
or ~
to look at.
There are 2 functions that the parser can call:
The former is counted in the progress measure. It cannot be part of infinite loops, as eventually Eof would be reached and the exception thrown.
This means that every use of Parser.next needs to be audited for exceptions, since infinite loops have been turned into runtime exceptions.
Sometimes it's enough to use Parser.expect if we know exactly what is the current token.
Sometimes the current token has just been checked, so a
@doesNotRaise
annotation is added.Sometimes there are several call-sites to the current function that have different invariants, and this is where some changes were required.
Also audited
expect
. There are 2 variants:expect
: use when it's clear the token is not EofexpectUnsafe
: when it's unclear whether the token is Eof