Skip to content

Allow comments between intersection types and by-ref params #10125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

iluuu1994
Copy link
Member

Follow up to GH-10084. Found a way to do this after all. /cc @olleharstedt

@iluuu1994 iluuu1994 requested a review from bwoebi December 18, 2022 00:06
@iluuu1994 iluuu1994 force-pushed the allow-comments-between-intersection-types branch from 3671e1a to dc6daf2 Compare December 18, 2022 00:07
@iluuu1994
Copy link
Member Author

/cc @cmb69 Also mentioning you as you expressed distaste for a regex solution. Let me know if this is fine or if you prefer to see this compared to GLP first. I agree that we should switch to GLP sooner or later.

@olleharstedt
Copy link

Impressive :)

@cmb69
Copy link
Member

cmb69 commented Dec 19, 2022

Let's say that I'm generally not happy with complicating the parser with work-arounds, but this is okay for me.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda cursed... but sure why not.

@iluuu1994
Copy link
Member Author

FWIW I also just tried switching to Bison GLR and it's very simple. Bison also allows declaring the expected number of conflicts using %expect[-rr] n to avoid introducing unwanted conflicts in future additions. There is a performance impact here, as the parser clones itself when encountering a conflict and exploring all branches. However that will be highly dependent on the number of conflicts and the program itself.

master...iluuu1994:bison-glr

@bwoebi Would be interested in your opinion here.

@Girgias
Copy link
Member

Girgias commented Jan 26, 2023

I can imagine @dseguy be frustrating about merging back the ampersand tokens, as I know he found it useful for Exakat. :)

@cmb69
Copy link
Member

cmb69 commented Jan 26, 2023

master...iluuu1994:bison-glr

Cool! (I guess more could be simplified.)

There is a performance impact here, as the parser clones itself when encountering a conflict and exploring all branches. However that will be highly dependent on the number of conflicts and the program itself.

We would need to measure, but assuming OPcache is enabled, some performance degradation in the parser shouldn't be a big deal.

But given that the tokens would change, I don't think we can introduce that in a minor version.

@iluuu1994
Copy link
Member Author

iluuu1994 commented Jan 26, 2023

@cmb69

I guess more could be simplified.

Yes, there are more lexer/parser hacks we could get rid of. I just wanted to see if it worked.

But given that the tokens would change, I don't think we can introduce that in a minor version.

The T_AMPERSAND was replaced with the context-sensitive tokens in PHP 8.1 with intersection types. Not saying this shouldn't wait for PHP 9, just that we don't have any rules on that.

@cmb69
Copy link
Member

cmb69 commented Jan 26, 2023

The T_AMPERSAND was replaced with the context-sensitive tokens in PHP 8.1 with intersection types.

Ah, indeed! So maybe it might be an option to switch to GLR in the next minor. Should be discussed on the ML, anyway.

@bwoebi
Copy link
Member

bwoebi commented Jan 26, 2023

glr

:-)

So, the good thing about glr is that it allows bison to do much more lookahead and the bad thing is that you may have more lookahead.

In general, a GLR parser can take quadratic or cubic worst-case time, and the current Bison parser even takes exponential time and space for some grammars. ~ https://www.gnu.org/software/bison/manual/html_node/Simple-GLR-Parsers.html

The problem is that you now potentially run into degenerate cases (which is hard to test for, given that e.g. it might occur in big generated codes), if the grammar is not carefully constrained, especially with grammars as complex as PHPs.

Even if it's not a problem today (as our grammar currently is well constrained, given it runs at all with no GLR), that's a problem for future developments because it's very hard to estimate how "well constrained" a new addition is, until you get the bug reports from users that their code which worked well up to now suddenly takes a second to compile.

And that's why we haven't used GLR yet. It's a nice catch-all solution, with hidden drawbacks.

Even though I'd love to see GLR because it makes things simpler from a dev perspective... I'd rather not.

@dseguy
Copy link

dseguy commented Jan 26, 2023

I can imagine @dseguy be frustrating about merging back the ampersand tokens, as I know he found it useful for Exakat. :)

It is quite wicked syntax, indeed.
This is going to require some more processing, though nothing impossible.

@iluuu1994
Copy link
Member Author

@bwoebi Thanks for your insight. I'm fine with this approach then. For generics we can use the turbofish syntax, if it ever comes to that. I don't know of any other addition atm that would require a GLR parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants