Eof audit (DO NOT MERGE) #543

cristianoc · 2022-06-14T06:55:21Z

There are 2 functions that the parser can call:

- next can raise Eof
- nextUnsafe does not raise

The former (next) is counted in the progress measure. It cannot be part of an infinite loop, because eventually Eof would be reached and the exception raised.
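A minimal OCaml sketch of the two primitives, assuming a simplified token list instead of the real scanner. The token type, the `parser` record, `make`, `advance` and `Eof_reached` are hypothetical. It shows why `next` is a sound progress measure: it either moves forward or raises at Eof. `nextUnsafe`, in contrast, silently stays put at Eof.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Eof

type parser = { mutable token : token; mutable rest : token list }

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = [] }
  | t :: rest -> { token = t; rest }

(* Move to the following token; past the end of input, stay on Eof. *)
let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

(* Counted as progress: raises at Eof, so any loop that keeps calling it
   must eventually stop. *)
let next p = if p.token = Eof then raise Eof_reached else advance p

(* Never raises: at Eof it is a no-op, so it makes no progress there. *)
let nextUnsafe p = if p.token <> Eof then advance p

let () =
  let p = make [ Lident "x" ] in
  next p;
  assert (p.token = Eof);
  nextUnsafe p;
  assert (p.token = Eof);
  assert (match next p with () -> false | exception Eof_reached -> true)
```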

This means that every use of Parser.next needs to be audited for exceptions, since infinite loops have been turned into runtime exceptions. There are three cases:

- Sometimes it's enough to use Parser.expect, if we know exactly what the current token is.
- Sometimes the current token has just been checked, so a @doesNotRaise annotation is added.
- Sometimes the current function has several call sites with different invariants, and this is where some changes were required.
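The first two cases can be sketched as follows. This is a hypothetical simplification, not the actual `Parser` module: `expect` here only advances when the token matches and otherwise just counts an error. The `[@doesNotRaise]` attribute is the annotation mentioned above; OCaml ignores it, and it is only meaningful to an exception analysis.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Lbracket | Eof

type parser = {
  mutable token : token;
  mutable rest : token list;
  mutable errors : int;
}

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = []; errors = 0 }
  | t :: rest -> { token = t; rest; errors = 0 }

let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p

(* Simplified expect: consume the token if it matches, otherwise count an
   error without advancing. When t <> Eof, the call to next cannot raise. *)
let expect t p = if p.token = t then next p else p.errors <- p.errors + 1

(* Case 1: the current token is known exactly, so use expect. *)
let parseOpenBracket p = expect Lbracket p

(* Case 2: the token was just checked, so next cannot raise; the attribute
   records that fact for the exception analysis. *)
let skipLident p =
  match p.token with
  | Lident _ -> (next p [@doesNotRaise])
  | _ -> ()
```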

Parser.expect was also audited. There are 2 variants:

- expect: use when it's clear the token is not Eof
- expectUnsafe: use when it's unclear whether the token is Eof

Parsing past Eof can lead to infinite loops, as it violates the assumptions of the termination checker. See #540.

This PR makes `Parser.next` assert false when called on Eof. This should not happen, as one should check the token before calling `Parser.next`. The one exception is during lookahead, for which we provide a `nextUnsafe` function that does not make progress.


cristianoc · 2022-06-14T06:59:43Z

src/res_core.ml

@@ -612,25 +614,25 @@ let parseStringLiteral s =
    if parse Start 0 0 then Buffer.contents b else s

 let rec parseLident p =
-  let recoverLident p =
+  let recoverLidentNoEof p =


The idea behind this function is that it's the caller's responsibility to make sure the current token is not Eof.

cristianoc · 2022-06-14T07:02:23Z

src/res_core.ml

  | _ ->
-    begin match recoverLident p with
+    begin match recoverLidentNoEof p with


OK to call: we just checked for Eof in the previous case of the match.
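A simplified sketch of this calling pattern, assuming a toy token list. Here `recoverLidentNoEof` just skips the offending token, which is a hypothetical stand-in for the real recovery logic. Because the `Eof` case is matched before the catch-all, the precondition of `recoverLidentNoEof` holds, so its `next` cannot raise, and every recursive call follows a consumed token.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Int of string | Eof

type parser = { mutable token : token; mutable rest : token list }

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = [] }
  | t :: rest -> { token = t; rest }

let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p

(* Precondition (caller's responsibility): p.token <> Eof, so next cannot
   raise. Simplified recovery: skip the offending token. *)
let recoverLidentNoEof p =
  next p;
  Some ()

let rec parseLident p =
  match p.token with
  | Lident name -> next p; name
  | Eof -> "_" (* Eof is handled here, before the catch-all case *)
  | _ -> (
      match recoverLidentNoEof p with
      | Some () -> parseLident p
      | None -> "_")
```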

cristianoc · 2022-06-14T07:03:04Z

src/res_core.ml

    | Some () ->
      parseLident p
    | None ->
      ("_", mkLoc startPos p.prevEndPos)
    end

-let parseIdent ~msg ~startPos p =
+let parseIdentNoNext ~msg ~startPos p =


The caller does Parser.next after calling this function.
This is because the caller has more context on what the token can be.

cristianoc · 2022-06-14T07:03:39Z

src/res_core.ml

    ("", mkLoc startPos p.prevEndPos)

+let parseIdent ~msg ~startPos p =
+  let res = parseIdentNoNext ~msg ~startPos p in
+  Parser.nextUnsafe p;


When we don't know what the token is, just call nextUnsafe. That's OK as long as termination analysis is happy.
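A minimal sketch of the split, assuming a simplified parser that returns plain strings instead of located identifiers (the original returns `("", mkLoc startPos p.prevEndPos)` on error). `parseKnownIdent` is a hypothetical caller illustrating the case where the caller has more context and can safely use `next`.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Uident of string | Eof

type parser = { mutable token : token; mutable rest : token list }

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = [] }
  | t :: rest -> { token = t; rest }

let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p
let nextUnsafe p = if p.token <> Eof then advance p

(* Read the identifier at the current token without advancing. *)
let parseIdentNoNext p =
  match p.token with
  | Lident name | Uident name -> name
  | _ -> "" (* error case, simplified *)

(* For callers that do not know whether the current token is Eof. *)
let parseIdent p =
  let res = parseIdentNoNext p in
  nextUnsafe p;
  res

(* A caller that has just matched an identifier knows the token is not
   Eof, so it can advance with next itself. *)
let parseKnownIdent p =
  match p.token with
  | Lident _ | Uident _ ->
      let res = parseIdentNoNext p in
      next p;
      Some res
  | _ -> None
```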

cristianoc · 2022-06-14T07:04:44Z

src/res_core.ml

-  Parser.next p;
+  ident
+
+let parseValuePath p =


This is what we call when we don't know what the token is.

cristianoc · 2022-06-14T07:06:55Z

src/res_core.ml

-    Parser.next p;
-    Some (true, PatField (parseRecordPatternField p))
+    Parser.expect DotDotDot p;
+    Some (true, PatField (parseRecordPatternField ~label:(parseValuePath p) p))


Here we don't know what the current token is.
But termination analysis is happy, so we're good.

cristianoc · 2022-06-14T07:07:20Z

src/res_core.ml

  | Uident _ | Lident _ ->
-    Some (false, PatField (parseRecordPatternField p))
+    Some (false, PatField (parseRecordPatternField ~label:(parseValuePathNotEof p) p))


Here we do know the token is not Eof. Propagating this information lets us use next instead of nextUnsafe; if we did not propagate it, termination analysis would fire.
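The two call sites can be sketched side by side. This is a simplification under stated assumptions: a value path is reduced to a single identifier, the result is a hypothetical `(isSpread, label)` pair rather than a `PatField`, and `expect` omits error recording. After `...` the next token is unknown, so the unsafe variant is used; after matching an identifier, the `NotEof` variant can use `next`.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Uident of string | DotDotDot | Eof

type parser = { mutable token : token; mutable rest : token list }

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = [] }
  | t :: rest -> { token = t; rest }

let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p
let nextUnsafe p = if p.token <> Eof then advance p

(* Simplified: error recording on mismatch is omitted. *)
let expect t p = if p.token = t then next p

let identText p = match p.token with Lident s | Uident s -> s | _ -> "_"

(* No precondition: nextUnsafe never raises. *)
let parseValuePath p =
  let path = identText p in
  nextUnsafe p;
  path

(* Precondition: p.token <> Eof, so next is safe and counts as progress. *)
let parseValuePathNotEof p =
  let path = identText p in
  next p;
  path

let parseRecordPatternItem p =
  match p.token with
  | DotDotDot ->
      expect DotDotDot p;
      (* The token after "..." is unknown and may be Eof. *)
      Some (true, parseValuePath p)
  | Uident _ | Lident _ ->
      (* An identifier was just matched, so the token is not Eof. *)
      Some (false, parseValuePathNotEof p)
  | _ -> None
```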

cristianoc · 2022-06-14T07:08:44Z

src/res_core.ml

-  let lbracket = p.startPos in
-  Parser.next p;
-  let stringStart = p.startPos in
+and parseBracketAccess p ~lbracket expr startPos =


We do the next in the caller, which has more context on what the token is.

cristianoc · 2022-06-14T07:09:29Z

src/res_core.ml

-      parseBracketAccess p expr startPos
+      Parser.leaveBreadcrumb p Grammar.ExprArrayAccess;
+      let lbracket = p.startPos in
+      Parser.expect Lbracket p;    


What was a next inside the callee is now an expect in the caller.
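A simplified sketch of moving the `[` handling to the caller, assuming token positions are plain integer indices and the access is returned as a hypothetical `(expr, index, lbracket)` tuple. The caller has just matched `Lbracket`, so its `expect` cannot hit Eof; the callee receives the bracket position through `~lbracket` instead of consuming the token itself.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Lbracket | Rbracket | Eof

type parser = {
  mutable token : token;
  mutable rest : token list;
  mutable pos : int;
}

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = []; pos = 0 }
  | t :: rest -> { token = t; rest; pos = 0 }

let advance p =
  p.pos <- p.pos + 1;
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p

(* Simplified: error recording on mismatch is omitted. *)
let expect t p = if p.token = t then next p

(* Callee: '[' has already been consumed by the caller. *)
let parseBracketAccess p ~lbracket expr =
  let index =
    match p.token with
    | Lident s -> next p; s (* just matched, so not Eof *)
    | _ -> "_"
  in
  expect Rbracket p;
  (expr, index, lbracket)

(* Caller: it has just matched Lbracket, so expect cannot raise. *)
let parseAccess p expr =
  match p.token with
  | Lbracket ->
      let lbracket = p.pos in
      expect Lbracket p;
      Some (parseBracketAccess p ~lbracket expr)
  | _ -> None
```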

cristianoc · 2022-06-14T07:10:05Z

src/res_core.ml

  | _ -> ()
  in
  match p.Parser.token with
  | Lident _ | Uident _ ->
    let startToken = p.token in
-    let field = parseValuePath p in
+    let field = parseValuePathNotEof p in


We know this is not Eof.

cristianoc · 2022-06-14T10:10:17Z

This is far too noisy to even think about merging something like this.
But it indicates the gap between the current code and what one would need in order to get stronger static guarantees.

cristianoc · 2022-06-14T23:28:29Z

src/res_core.ml

-      match p.token with
-      | Backtick -> Parser.next p; ()
-      | _ -> skipTokens ()
+      if p.token <> Eof then (


And this input causes an infinite loop:

let foo = x => switch x { | `${

See #542
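A simplified sketch of an Eof-guarded skip loop, assuming a toy token list. The function name mirrors `skipTokens` from the diff, but its body here is a hypothetical reconstruction: it skips up to and including the closing backtick. With the guard, `next` never runs at Eof, so an unterminated template such as the input above ends the loop; without it, a version that advanced with `nextUnsafe` would spin forever at Eof.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Backtick | Eof

type parser = { mutable token : token; mutable rest : token list }

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = [] }
  | t :: rest -> { token = t; rest }

let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p

(* Skip tokens up to and including the closing backtick, stopping at Eof. *)
let rec skipTokens p =
  if p.token <> Eof then (
    match p.token with
    | Backtick -> next p
    | _ ->
        next p;
        skipTokens p)
```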

cristianoc · 2022-06-14T23:32:36Z

src/res_core.ml

        let rec loop p =
-          if not (Recover.shouldAbortListParse p)
+          if not (Recover.shouldAbortListParse p) && p.token <> Eof


This looks like it could be another infinite loop.
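A sketch of the guarded list loop, assuming a toy grammar of comma-separated identifiers. `shouldAbortListParse` here is a hypothetical stand-in for `Recover.shouldAbortListParse` that simply stops at `]`. The extra `p.token <> Eof` condition means the body only runs on a real token, so each iteration consumes at least one token and the loop ends even on truncated input.

```ocaml
(* Hypothetical, simplified token type and parser state. *)
type token = Lident of string | Comma | Rbracket | Eof

type parser = { mutable token : token; mutable rest : token list }

exception Eof_reached

let make tokens =
  match tokens with
  | [] -> { token = Eof; rest = [] }
  | t :: rest -> { token = t; rest }

let advance p =
  match p.rest with
  | t :: rest -> p.token <- t; p.rest <- rest
  | [] -> p.token <- Eof

let next p = if p.token = Eof then raise Eof_reached else advance p

(* Hypothetical stand-in for Recover.shouldAbortListParse: stop at ']'. *)
let shouldAbortListParse p = p.token = Rbracket

let parseList p =
  let rec loop acc =
    if (not (shouldAbortListParse p)) && p.token <> Eof then (
      match p.token with
      | Lident x ->
          next p;
          if p.token = Comma then next p;
          loop (x :: acc)
      | _ ->
          (* Unexpected token: skip it; safe because it is not Eof. *)
          next p;
          loop acc)
    else List.rev acc
  in
  loop []
```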

cristianoc · 2022-06-14T23:37:27Z

src/res_core.ml

-    {typ with
-      ptyp_attributes = List.concat [typ.ptyp_attributes; attrs];
-      ptyp_loc = mkLoc startPos p.prevEndPos}
+    if p.token <> Eof then (


Could this be turned into an infinite loop?

It does not look like it: it would need to go into the parseTypExpr/parseEs6ArrowType loop, but that only happens if there are fresh => or ~ tokens to look at.

cristianoc added 2 commits June 14, 2022 03:35

cristianoc mentioned this pull request Jun 14, 2022

Do not parse past Eof #542

Merged

cristianoc changed the base branch from master to eof June 14, 2022 06:57

cristianoc commented Jun 14, 2022


cristianoc requested a review from IwanKaramazow June 14, 2022 07:11

cristianoc added 5 commits June 14, 2022 11:01

Half way through auditing Parser.expect.

89c3d7f

More audit of expect.

31e1e18

Complete audit of expect.

3e40870

Comments.

18892cd

Restore error recovery for arrow, except in case of Eof.

57f8e4f

cristianoc commented Jun 14, 2022


cristianoc force-pushed the eof branch from 3500eab to 53ee86b June 15, 2022 00:31

Base automatically changed from eof to master June 16, 2022 00:04

cristianoc marked this pull request as draft June 16, 2022 00:04

cristianoc changed the title from Eof audit to Eof audit (DO NOT MERGE) Jun 16, 2022

cristianoc mentioned this pull request Jul 3, 2022

checking Eof before calling parseIdent #596

Merged

cristianoc closed this Jul 21, 2022
