Skip to content
This repository was archived by the owner on Jun 15, 2023. It is now read-only.

parser crashes on polymorphic variant type #540

Merged
merged 4 commits into from
Jun 15, 2022

Conversation

Sehun0819
Copy link
Contributor

type x = [<y>

Parser crashes with stackoverflow on incomplete polymorphic variant type like above.
Because the closure loop is recursively called in wildcard case, Eofcan't escape.

@cristianoc
Copy link
Contributor

@Sehun0819 thanks for reporting this!
Would you add a test case too?
Just drop a file into tests/parsing/infiniteLoops and check in the resulting generated file.

Separately, I'd like to investigate how that did get past the infinite loop analysis.

@cristianoc
Copy link
Contributor

Looks like the scanner keeps on returning Eof forever if you keep on calling it, violating one of the assumptions of termination analysis.
CC @IwanKaramazow

@IwanKaramazow
Copy link
Contributor

Can you write the loop through parseRegion?
That should take care of some common cases when parsing “lists of things” and brings recovery out of the box

cristianoc added a commit that referenced this pull request Jun 14, 2022
Parsing past Eof can lead to infinite loops, as it violates the assumptions of the termination checker.
See: #540

This PR makes `Parser.next` assert false when called on Eof. This should not happen as one should check the token before calling `Parser.next`.

The one exception is during lookahead, for which we provide a `nextUnsafe` function which does not make progress.
@Sehun0819
Copy link
Contributor Author

@cristianoc I added a new commit.
Is it what you meant?

@cristianoc
Copy link
Contributor

@cristianoc I added a new commit.
Is it what you meant?

Yes thanks for that.

@cristianoc
Copy link
Contributor

cristianoc commented Jun 14, 2022

There's still:

Can you write the loop through parseRegion?

to take care of.
Is the request clear enough?

Then I can merge.

@cristianoc
Copy link
Contributor

If one parses a “list of things” a.k.a. a region, then you should use parseRegion, with let rec loop you’re on your own. parseRegion automatically takes take of the Eof case.

@Sehun0819
Copy link
Contributor Author

You mean,

parseRegion p
  ~grammar: _
  ~f: parseHashIdent

something like above?

2 questions come to mind,

  1. The callback function f should return an option, while parseHashIdent doesn’t.
 Maybe I should define a wrapper closure.
  2. I have no idea which grammar corresponds to the ‘hash ident’.

It would be possible to make things work somehow, but using workaround seems not a good answer for you.
Could you suggest a way to deal with it?

@cristianoc
Copy link
Contributor

You mean,

parseRegion p
  ~grammar: _
  ~f: parseHashIdent

something like above?

2 questions come to mind,

  1. The callback function f should return an option, while parseHashIdent doesn’t.
 Maybe I should define a wrapper closure.
  2. I have no idea which grammar corresponds to the ‘hash ident’.

It would be possible to make things work somehow, but using workaround seems not a good answer for you. Could you suggest a way to deal with it?

Yes that's the idea, which I'm quoting from @IwanKaramazow .
For 1, yes parseHashIdent can be wrapped into a closure. The expectation is that it should return None when there are no more elements of that type to be parsed. Let me take a closer look see if this actually makes sense in this case.
For 2 too, let me take a look at the grammars out there and get back.

@cristianoc
Copy link
Contributor

cristianoc commented Jun 14, 2022

For 1, the closure would return None on Rbracket, and Some (ident) otherwise.
For 2, the grammar is just for debugging. One can define a new grammar e.g. TagNames in ResGrammar and just extend the toString function used for debugging.

@Sehun0819
Copy link
Contributor Author

I tried to replace let rec loop with (parseRegion + parseHashIdent wrapper), and failed to pass termination analysis because the callback ~f:parseHashIdentOpt was not recursive, while the analyzer expected it to be recursive. Because parseHashIdent is not a recursive function, I had no idea how to make them work in a natural way.
(You can see this version at my another branch)

Instead, I reorganized pattern matching priority so that the recursive call to loop always happen in appropriate situation, without any special treatment on token Eof. Is it acceptable?

@cristianoc
Copy link
Contributor

@Sehun0819 I've expanded the parseRegion case here: #545
Had to take a look at how other examples are handles, and figured one needs to have isListTerminator adapted too in the grammar.

@cristianoc
Copy link
Contributor

Feel free to incorporate here and we can merge.

@Sehun0819
Copy link
Contributor Author

Thank you!
Looks much better :)

@cristianoc cristianoc merged commit dd8339a into rescript-lang:master Jun 15, 2022
cristianoc added a commit that referenced this pull request Jun 15, 2022
Parsing past Eof can lead to infinite loops, as it violates the assumptions of the termination checker.
See: #540

This PR makes `Parser.next` assert false when called on Eof. This should not happen as one should check the token before calling `Parser.next`.

The one exception is during lookahead, for which we provide a `nextUnsafe` function which does not make progress.
cristianoc added a commit that referenced this pull request Jun 16, 2022
Parsing past Eof can lead to infinite loops, as it violates the assumptions of the termination checker.
See: #540

This PR makes `Parser.next` assert false when called on Eof. This should not happen as one should check the token before calling `Parser.next`.

The one exception is during lookahead, for which we provide a `nextUnsafe` function which does not make progress.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants