Add initial tokenizer for Python #310
Conversation
(force-pushed from 03323a8 to d2574a5)
LGTM! Let's merge this once we fix the CI
Hi @certik, I updated the tokenizer to recognise Comments, DocStrings, and Symbols, and also made the CI pass.
I tested this with many Python files. I got some errors, which I fixed and will push here.
There is a conflict between String and Docstring in the tokenizer; we need to fix that.
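For background, this String/Docstring ambiguity exists at the lexical level in CPython as well: its stdlib `tokenize` module emits the same STRING token for both, and only the position of the literal (first statement of a module, class, or function body) makes it a docstring. A minimal sketch illustrating this:

```python
import io
import tokenize

SRC = '''"""Module docstring."""
x = "just a string"  # trailing comment
'''

# Collect only string and comment tokens from the source above.
toks = [
    (tokenize.tok_name[t.type], t.string)
    for t in tokenize.generate_tokens(io.StringIO(SRC).readline)
    if t.type in (tokenize.STRING, tokenize.COMMENT)
]

# Both the docstring and the ordinary string arrive as STRING tokens;
# the docstring-ness is a property of where the statement sits, not of
# the token itself.
print(toks)
```

This suggests one way to resolve the conflict: tokenize both as a single String kind and let the parser (or a later pass) classify docstrings by position.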
Can you get the CI passing again, please?
(force-pushed from b05ef58 to 4a0ae9f)
I left a fix above; after the update I think this is good to go in. After it is merged, we should do three more steps before moving to the parser:
It doesn't have to be perfect (it won't be), but I would definitely do the above three things. Then we can move to the parser and start parsing. After that we'll iterate on the design of the tokenizer as needed. I posted this comment at #298 (comment).
Update the tests
Perfect, I will work on it.
Thanks! I think this looks good; now we just have to iteratively improve it. The TODO list is written up at #298 (comment).
One can try it with `lpython --show-tokens somefile`. Currently the tokenizer is a Fortran tokenizer, so we need to adapt it to become a Python tokenizer. Towards #298.
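For comparison (this is not LPython's actual output; its token names and format may differ), CPython's stdlib `tokenize` module shows the kind of token stream a `--show-tokens` style dump produces:

```python
import io
import tokenize

# Tokenize a tiny Python snippet and print the token kind names,
# analogous in spirit to what a --show-tokens dump would emit.
src = "x = 1 + 2\n"
names = [
    tokenize.tok_name[t.type]
    for t in tokenize.generate_tokens(io.StringIO(src).readline)
]
print(names)
# → ['NAME', 'OP', 'NUMBER', 'OP', 'NUMBER', 'NEWLINE', 'ENDMARKER']
```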