Skip to content
This repository was archived by the owner on Jun 15, 2023. It is now read-only.

Eof audit (DO NOT MERGE) #543

Closed
wants to merge 7 commits into from
Closed

Eof audit (DO NOT MERGE) #543

wants to merge 7 commits into from

Conversation

cristianoc
Copy link
Contributor

@cristianoc cristianoc commented Jun 14, 2022

There are 2 functions that the parser can call:

  • next can raise Eof
  • nextUnsafe does not raise

The former is counted in the progress measure. It cannot be part of infinite loops, as eventually Eof would be reached and the exception thrown.

This means that every use of Parser.next needs to be audited for exceptions, since infinite loops have been turned into runtime exceptions.
Sometimes it's enough to use Parser.expect if we know exactly what is the current token.
Sometimes the current token has just been checked, so a @doesNotRaise annotation is added.
Sometimes there are several call-sites to the current function that have different invariants, and this is where some changes were required.

Also audited expect. There are 2 variants:

  • expect: use when it's clear the token is not Eof
  • expectUnsafe: when it's unclear whether the token is Eof

Parsing past Eof can lead to infinite loops, as it violates the assumptions of the termination checker.
See: #540

This PR makes `Parser.next` assert false when called on Eof. This should not happen as one should check the token before calling `Parser.next`.

The one exception is during lookahead, for which we provide a `nextUnsafe` function which does not make progress.
There are 2 functions that the parser can call:
- next can raise Eof
- nextUnsafe does not raise

The former is counted in the progress measure. It cannot be part of infinite loops, as eventually Eof would be reached and the exception thrown.

This means that every use of Parser.next needs to be audited for exceptions, since infinite loops have been turned into runtime exceptions.
Sometimes it's enough to use Parser.expect if we know exactly what is the current token.
Sometimes the current token has just been checked, so a `@doesNotRaise` annotation is added.
Sometimes there are several call-sites to the current function that have different invariants, and this is where some changes were required.
@cristianoc cristianoc mentioned this pull request Jun 14, 2022
@cristianoc cristianoc changed the base branch from master to eof June 14, 2022 06:57
@@ -612,25 +614,25 @@ let parseStringLiteral s =
if parse Start 0 0 then Buffer.contents b else s

let rec parseLident p =
let recoverLident p =
let recoverLidentNoEof p =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of this function is that It's the caller's responsibility to make sure it's obvious the current token is not Eof.

| _ ->
begin match recoverLident p with
begin match recoverLidentNoEof p with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to call: we just checked for Eof in the previous case of the match.

| Some () ->
parseLident p
| None ->
("_", mkLoc startPos p.prevEndPos)
end

let parseIdent ~msg ~startPos p =
let parseIdentNoNext ~msg ~startPos p =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do Parser.next after calling this function.
This is because the caller has more context on what the token can be.

("", mkLoc startPos p.prevEndPos)

let parseIdent ~msg ~startPos p =
let res = parseIdentNoNext ~msg ~startPos p in
Parser.nextUnsafe p;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we don't know, just do nextUnsafe. That's OK as long as termination analysis is happy.

Parser.next p;
ident

let parseValuePath p =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we call when we don't know what the token is.

Parser.next p;
Some (true, PatField (parseRecordPatternField p))
Parser.expect DotDotDot p;
Some (true, PatField (parseRecordPatternField ~label:(parseValuePath p) p))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we don't know what the current token is.
But, termination analysis is happy so we're good.

| Uident _ | Lident _ ->
Some (false, PatField (parseRecordPatternField p))
Some (false, PatField (parseRecordPatternField ~label:(parseValuePathNotEof p) p))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we do know the token is not Eof. If we did not propagate this info, which lets us use next instead of nextUnsafe termination analysis would fire.

let lbracket = p.startPos in
Parser.next p;
let stringStart = p.startPos in
and parseBracketAccess p ~lbracket expr startPos =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do the next in the caller, which has more context on what the token is.

parseBracketAccess p expr startPos
Parser.leaveBreadcrumb p Grammar.ExprArrayAccess;
let lbracket = p.startPos in
Parser.expect Lbracket p;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was a next inside the callee, is now an expect in the caller.

| _ -> ()
in
match p.Parser.token with
| Lident _ | Uident _ ->
let startToken = p.token in
let field = parseValuePath p in
let field = parseValuePathNotEof p in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know this is not Eof.

@cristianoc cristianoc requested a review from IwanKaramazow June 14, 2022 07:11
@cristianoc
Copy link
Contributor Author

This is way too noisy to even think about merging something like this.
But it serves as an indication of the gap between what's out there and what one would need in order to get stronger static guarantees.

match p.token with
| Backtick -> Parser.next p; ()
| _ -> skipTokens ()
if p.token <> Eof then (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is an infinite loop:

let foo = x =>
  switch x {
  | `${

See #542

let rec loop p =
if not (Recover.shouldAbortListParse p)
if not (Recover.shouldAbortListParse p) && p.token <> Eof
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like could be another infinite loop

{typ with
ptyp_attributes = List.concat [typ.ptyp_attributes; attrs];
ptyp_loc = mkLoc startPos p.prevEndPos}
if p.token <> Eof then (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be turned into an infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not look like it, it would need to go into parseTypExpr/parseEs6ArrowType loop, but that only works if there are fresh => or ~ to look at.

Base automatically changed from eof to master June 16, 2022 00:04
@cristianoc cristianoc marked this pull request as draft June 16, 2022 00:04
@cristianoc cristianoc changed the title Eof audit Eof audit (DO NOT MERGE) Jun 16, 2022
@cristianoc cristianoc closed this Jul 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant