bpo-43892: Make match patterns explicit in AST #25585

ncoghlan · 2021-04-25T08:43:53Z

* Separates "pattern" and "expr" nodes in AST
* Adds a new "Pattern matching" section in the ast docs, separate from the "Control flow" section
* AST node names are inspired by the AST proposed in PEP 642, but adjusted to better match the semantics of PEP 634 and to be more consistent with the existing AST
* Grammar definition has been updated to emit the newly defined AST nodes
* Parser now emits SyntaxError for ill-formed complex literals (so the AST optimiser no longer has to do it)
* Code generator consumes the new nodes
* AST validator consumes the new nodes
* AST unparser consumes new nodes
* test_unparse always checks the AST roundtrip for test_patma.py
* Add some clarifying comments to the unparsing code

https://bugs.python.org/issue43892

* Separates "pattern" and "expr" nodes in AST
* AST node names are derived from the AST proposed in PEP 642, except that MatchValue always matches by equality and a separate MatchConstant node is defined for matching by identity
* Grammar definition has been updated to emit the newly defined AST nodes
* TODO: update code generator to consume new nodes
* TODO: update AST validator to consume new nodes
* TODO: update AST unparser to consume new nodes

…pattern-matching-ast

ncoghlan · 2021-04-26T12:33:53Z

Merged in master, adjusting for the recursion tracking in the AST validator and optimiser
Updated the AST documentation (the MatchAs and MatchOr doctests failed, highlighting that the new nodes also needed documentation)

brandtbucher · 2021-04-27T18:43:25Z

@pablogsal, do you want to see buildbots on this before merging?

pablogsal · 2021-04-27T19:07:13Z

@pablogsal, do you want to see buildbots on this before merging?

Either that or manually run a bunch of related tests yourself with -R : and paste the output here :)

brandtbucher · 2021-04-27T20:07:11Z

This good?

$ ./python -m test -R : test_ast test_symtable test_peg_generator test_patma test_grammar test_unparse test_compile test_syntax
0:00:00 load avg: 2.04 Run tests sequentially
0:00:00 load avg: 2.04 [1/8] test_ast
beginning 9 repetitions
123456789
.........
0:00:21 load avg: 1.82 [2/8] test_symtable
beginning 9 repetitions
123456789
.........
0:00:21 load avg: 1.82 [3/8] test_peg_generator
beginning 9 repetitions
123456789
/home/bucher/src/cpython/Lib/test/test_peg_generator/test_c_parser.py:4: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils.tests.support import TempdirManager
/home/bucher/src/cpython/Lib/test/support/__init__.py:1671: DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
  from distutils import ccompiler, sysconfig, spawn, errors
.........
0:08:10 load avg: 1.41 [4/8] test_patma -- test_peg_generator passed in 7 min 48 sec
beginning 9 repetitions
123456789
.........
0:08:10 load avg: 1.41 [5/8] test_grammar
beginning 9 repetitions
123456789
.........
0:08:10 load avg: 1.41 [6/8] test_unparse
beginning 9 repetitions
123456789
.........
0:08:36 load avg: 1.40 [7/8] test_compile
beginning 9 repetitions
123456789
.........
0:09:14 load avg: 1.27 [8/8] test_syntax -- test_compile passed in 38.1 sec
beginning 9 repetitions
123456789
.........

== Tests result: SUCCESS ==

All 8 tests OK.

Total duration: 9 min 15 sec
Tests result: SUCCESS

pablogsal · 2021-04-27T20:13:46Z

Yup yup, go ahead! 🚀

pablogsal · 2021-04-27T20:15:13Z

Python/pythonrun.c

@@ -1366,7 +1366,7 @@ Py_CompileStringObject(const char *str, PyObject *filename, int start,
        return NULL;

    mod = _PyParser_ASTFromString(str, filename, start, flags, arena);
-    if (mod == NULL) {
+    if (mod == NULL || !_PyAST_Validate(mod)) {


Wait, why is this change here? This is going to slow down the Py_CompileStringObject code path quite a lot. Notice that the parser already calls this only in debug mode, precisely because of performance concerns.

The design principle is that the ast that comes from the parser must be always correct and I think we should certainly preserve that.

Let's remove it for now, then. I understood it as being motivated by the points in this comment.

Regarding this comment, there is an important fact to highlight:

compiling from strings didn't attempt to validate them because it never attempted to validate anything

The actual reason is because by design the parser code must return a valid AST (there is a debug-mode validation step to ensure that this contract is true).

Hm, looks like the new code might have been relying on that pass to give us a SyntaxError in test_patma_240. I'll try to figure out what's going on.

The parser can't verify that a complex literal is actually a complex literal because the tokeniser just emits NUMBER for both real and imaginary numbers. As a result, 0+0 makes it through the parsing rules for pattern matching.

The compiler can't reject the syntax because it never sees it - the value has turned into the legal constant 0 by the time the pattern gets to it.

The original pattern matching implementation worked around that limitation by putting a validation step in the AST constant folding pass: instead of using the standard constant folding for expressions, it reimplemented the subset required to fold the literals that pattern matching supported and emitted a syntax error for binary operations that weren't complex numbers.

This PR switches over to using the standard expression folding functions in the optimisation pass, so forcing validation was needed to keep "0+0" illegal in value patterns and mapping pattern keys.

I'll try to replace it with a more targeted fix that adds a helper function to pegen that ensures the AST coming out of the parser is valid rather than relying on the AST validator or the AST constant folding to reject it.
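
The tokeniser limitation described above can be observed from Python. This is a sketch that uses the pure-Python `tokenize` module as a stand-in for the C tokeniser the parser actually uses (an assumption made for illustration); `number_tokens` is a hypothetical helper name.

```python
import io
import token
import tokenize


def number_tokens(source):
    """Return the text of every NUMBER token found in ``source``."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == token.NUMBER
    ]


# Real and imaginary literals share the NUMBER token type, so the token
# stream alone cannot tell "0+0" apart from "0+0j".
print(number_tokens("0 + 0"))
print(number_tokens("0 + 0j"))
```

Telling the two apart requires looking at the value the token evaluates to, which is why the check has to live in a helper rather than in the token types.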

The compiler can't reject the syntax because it never sees it - the value has turned into the legal constant 0 by the time the pattern gets to it.

We probably shouldn't fold the AST for patterns in that case so the compiler can handle it. Or am I missing something?

I just pushed a change that restores the invariant of the parser always producing a valid AST without having to weaken the checks on the AST nodes for value patterns and mapping pattern keys.

The trick is a new _PyAST_EnsureImaginary helper function that accepts and returns expr_ty.

It looks like I didn't get the reversion/merge with Brandt's changes quite right though, so I'll post an example after cleaning that up.

I had reverted the validator and compiler changes, but the AST optimiser changes needed to be reverted as well.
This is how compilation from a string behaves with the parser enhanced to ensure that complex literals actually are complex numbers:

>>> match x:
...     case 0+0:
SyntaxError: Imaginary number required in complex literal

So with that everything is handling the responsibilities it should be handling:

parser rejects "0+0" when compiling from a string rather than producing an invalid AST

validator rejects "0+0" when compiling from a prebuilt AST tree

optimiser and compiler have enough checks to protect themselves from segfaults, but are otherwise pretty tolerant

After the first iteration, I realised the imaginary number checking would work better as a pegen helper rather than as an AST helper:

>>> match x:
...     case 0+0:
  File "<stdin>", line 2
    case 0+0:
           ^
SyntaxError: Imaginary number required in complex literal

pablogsal · 2021-04-27T20:19:24Z

Grammar/python.gram

+positional_patterns[asdl_pattern_seq*]:
+    | args[asdl_pattern_seq*]=','.pattern+ { args }
+keyword_patterns[asdl_seq*]:
+    | keywords[asdl_seq*]=','.keyword_pattern+ { keywords }


You can probably change this to just ','.keyword_pattern+. The same goes for similar rules around this one, like | items=','.key_value_pattern+ { items }. But we can clean this up afterwards, for sure.

Yeah, this and PEP 642 were my first real ventures into working with the new parser. It's really nice, but it isn't always obvious when it will need help with typecasting and when it will be able to figure it out on its own (e.g. I think this node used to be a more specific type than asdl_seq, so it would have needed the typecast)

Yeah, this and PEP 642 were my first real ventures into working with the new parser. It's really nice, but it isn't always obvious when it will need help with typecasting and when it will be able to figure it out on its own (e.g. I think this node used to be a more specific type than asdl_seq, so it would have needed the typecast)

I'm preparing new documentation for working with the parser :)

How has your experience been so far? Any feedback (good and bad) is good feedback ;)

brandtbucher · 2021-04-28T03:48:49Z

Okay, I disabled some overly-aggressive optimizations for value patterns and mapping patterns in the AST optimizer and moved validation of numeric literals back into the compiler. Reran the refleak tests locally and everything looks good.

brandtbucher · 2021-04-28T05:33:56Z

@ncoghlan, it's almost time for me to go to sleep and for you to wake up. I'm happy with this now, so go ahead and merge if you agree!

ncoghlan · 2021-04-28T12:32:02Z

@brandtbucher The parser potentially producing an illegal AST tree never sat well with me, and it finally occurred to me today how to fix it: filter the required-to-be-imaginary parse nodes through an AST helper function that raises a syntax error if they're not complex numbers. Between the tokenizer ensuring that the nodes are literals, and the helper function ensuring that they're complex numbers, we get a combined assurance that the value is an imaginary number.

So I reworked things to frontload the validation again, while keeping your other changes.

ncoghlan · 2021-04-28T13:05:41Z

Updated refleak hunting results (still clean, although it appears my laptop is slightly slower than Brandt's machine):

$ ./python -m test -R : test_ast test_symtable test_peg_generator test_patma test_grammar test_unparse test_compile test_syntax
0:00:00 load avg: 0.68 Run tests sequentially
0:00:00 load avg: 0.68 [1/8] test_ast
beginning 9 repetitions
123456789
.........
0:00:20 load avg: 0.79 [2/8] test_symtable
beginning 9 repetitions
123456789
.........
0:00:20 load avg: 0.79 [3/8] test_peg_generator
beginning 9 repetitions
123456789
/home/ncoghlan/devel/cpython/Lib/test/test_peg_generator/test_c_parser.py:4: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils.tests.support import TempdirManager
/home/ncoghlan/devel/cpython/Lib/test/support/__init__.py:1671: DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
  from distutils import ccompiler, sysconfig, spawn, errors
.........
0:10:38 load avg: 1.39 [4/8] test_patma -- test_peg_generator passed in 10 min 17 sec
beginning 9 repetitions
123456789
.........
0:10:38 load avg: 1.39 [5/8] test_grammar
beginning 9 repetitions
123456789
.........
0:10:39 load avg: 1.39 [6/8] test_unparse
beginning 9 repetitions
123456789
.........
0:10:58 load avg: 1.28 [7/8] test_compile
beginning 9 repetitions
123456789
.........
0:11:31 load avg: 1.15 [8/8] test_syntax -- test_compile passed in 33.4 sec
beginning 9 repetitions
123456789
.........

== Tests result: SUCCESS ==

All 8 tests OK.

Total duration: 11 min 32 sec
Tests result: SUCCESS

ncoghlan · 2021-04-28T13:08:28Z

Grammar/python.gram

 signed_number[expr_ty]:
    | NUMBER
    | '-' number=NUMBER { _PyAST_UnaryOp(USub, number, EXTRA) }

-capture_pattern[expr_ty]:
+imaginary_number[expr_ty]:
+    | imag=NUMBER { _PyPegen_ensure_imaginary(p, imag) }


@brandtbucher @pablogsal This is the new helper that ensures the parser won't accept "0+0" and similar constructs as "complex literals".
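
To make the idea behind the helper concrete, here is a Python analogue operating on `ast` nodes. It is only a sketch: the real `_PyPegen_ensure_imaginary` is C code inside the parser, and the function names, the use of `ast.parse`, and the error handling below are assumptions made for illustration.

```python
import ast


def ensure_imaginary(node):
    """Hypothetical analogue of the pegen helper: pass the node through
    unchanged if it is an imaginary literal, otherwise raise SyntaxError."""
    if not (isinstance(node, ast.Constant) and isinstance(node.value, complex)):
        raise SyntaxError("Imaginary number required in complex literal")
    return node


def check_complex_literal(source):
    """Apply ensure_imaginary to the right operand of ``left op right``."""
    expr = ast.parse(source, mode="eval").body
    if not isinstance(expr, ast.BinOp):
        raise ValueError("expected a binary operation")
    ensure_imaginary(expr.right)
    return expr


check_complex_literal("0+0j")        # passes: right operand is imaginary
try:
    check_complex_literal("0+0")     # rejected: right operand is an int
except SyntaxError as exc:
    print(exc)
```

Because this sketch inspects only the right-hand operand, an input such as `1j+0j` also passes it; the left operand would need its own check.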

brandtbucher · 2021-04-28T16:25:15Z

Hm, this still allows some illegal value expressions:

>>> match 1j:
...     case 1j+0j:
...         print(":(")
... 
:(

(We should probably add a test for this.)

I personally still prefer checking these in the compiler (as I left the branch last night), since it keeps us from having to put the same logic for rejecting illegal value patterns in two places (the parser and the validator).

Note that there are already lots of illegal trees that make it as far as the compiler: yield/async/await/return outside of functions, break/continue outside of loops, illegal * expressions, etc. I see the job of the validator as being "make sure this tree can't crash the compiler", not "make sure this tree is valid Python".

I'm almost ready to drop complex literal value patterns entirely and wait until somebody asks for them. And then I can just tell them to use a dotted name instead. 😉
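
The point about illegal trees reaching the compiler is easy to reproduce. The following sketch uses a hypothetical helper, `stage_that_rejects`, to report which stage raises SyntaxError for a given piece of source; it relies only on `ast.parse` and the built-in `compile`.

```python
import ast


def stage_that_rejects(source):
    """Return "parser", "compiler", or None depending on where the
    source is rejected with a SyntaxError."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return "parser"
    try:
        compile(tree, "<example>", "exec")
    except SyntaxError:
        return "compiler"
    return None


for snippet in ("return 1", "break", "x = (", "x = 1"):
    print(repr(snippet), stage_that_rejects(snippet))
```

Statements such as a top-level `return` or `break` parse into a tree without complaint and are only rejected once the tree is compiled, whereas an unclosed parenthesis never gets past the parser.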

pablogsal · 2021-04-28T17:40:04Z

Hm, this still allows some illegal value expressions:
>>> match 1j:
...     case 1j+0j:
...         print(":(")
... 
:(
(We should probably add a test for this.)

I personally still prefer checking these in the compiler (as I left the branch last night), since it keeps us from having to put the same logic for rejecting illegal value patterns in two places (the parser and the validator).

Note that there are already lots of illegal trees that make it as far as the compiler: yield/async/await/return outside of functions, break/continue outside of loops, illegal * expressions, etc. I see the job of the validator as being "make sure this tree can't crash the compiler", not "make sure this tree is valid Python".

The key is that the AST check exists to check manually crafted AST objects. Otherwise the compiler and the parser are obviously coupled, so the compiler knows what it needs to check and what it doesn't, which is how it is normally done and keeps it efficient.

The slightly sad thing that I want to improve is moving more syntax errors into the grammar, so that the grammar represents Python and not "almost Python".

I'm almost ready to drop complex literal value patterns entirely and wait until somebody asks for them. And then I can just tell them to use a dotted name instead. 😉

You have until this Sunday to decide 😉

brandtbucher · 2021-04-28T17:57:38Z

Looping in @gvanrossum. Are complex literal value patterns worth the complexity right now? I think we just added them because we could.

If not, then we can throw out BinOp values entirely, and only accept:

strings
byte-strings
floats
negated floats
ints
negated ints
imaginary numbers
negated imaginary numbers

So, continue to allow 4.2j and -4.2j, but not -4.2+4.2j.

gvanrossum · 2021-04-28T18:01:54Z

I find it important that we can spell all values for these types. So I think we should keep it. (But I do not think that checking for all invalid cases needs to be implemented before beta 1.)

brandtbucher · 2021-04-29T05:32:58Z

Okay, that makes sense.

Unfortunately, I'm not sure how to regenerate the parser on somebody else's PR. Since there's already a lot going on here, I am going to just land this as-is and open a smaller follow-up PR for the cleanup.

ncoghlan · 2021-04-29T11:24:23Z

That approach makes sense to me, too. Thanks for the reviews @brandtbucher, @pablogsal!

ncoghlan added 18 commits April 20, 2021 23:40

Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…

6f62cc9

…pattern-matching-ast

Everything except symtable.c now compiles

2d25ff7

Link to the validator ticket

2a319fd

Ensure unparsing tests always cover pattern matching

14a5b2b

Get symtable.c compiling

ba2932f

Require end attributes on pattern nodes

1dccb6f

MatchConstant -> MatchSingleton

207a43e

Formatting tweak

ba104d1

Fix segfault in wildcard name bindings

7cdb735

Get test_patma compiling

efc0f73

Reject matching 0+0 and f-strings

a0e13da

Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…

1680513

…pattern-matching-ast

Dedicated MatchMapping arg for storing remaining keys

54940a3

Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…

0b21140

…pattern-matching-ast

Use 'name' for identifier AST fields, not 'target'

d6e69f0

Implement unparsing

5a0b4f2

Add news entry

e07781e

ncoghlan requested a review from brandtbucher April 25, 2021 08:43

ncoghlan requested review from isidentical, lysnikolaou, markshannon and pablogsal as code owners April 25, 2021 08:43

the-knights-who-say-ni added the CLA signed label Apr 25, 2021

bedevere-bot added the awaiting core review label Apr 25, 2021

ncoghlan mentioned this pull request Apr 25, 2021

WIP bpo-43892: Make match patterns explicit in AST ncoghlan/cpython#9

Closed

ncoghlan added 2 commits April 26, 2021 21:24

Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…

5cb9f0b

…pattern-matching-ast

Update AST documentation

b17e7f4

Update recursion depth for nested patterns in AST optimiser

466fccf

pablogsal reviewed Apr 27, 2021

brandtbucher added 5 commits April 27, 2021 13:56

Revert AST validation step

490db51

Reorder tests

ea4ea93

Remove overly-aggressive optimizations

e4cc5d7

Numeric literals are validated in the compiler

4faea47

Validate numeric literals in the compiler

fe26e14

ncoghlan added 4 commits April 28, 2021 22:09

Revert to checking subexpressions in the validator

ac154fa

Ensure parser rejects 0+0 in match patterns

3ef2880

Don't add renamed test case back

8ed9847

Restore optimisations as compiler is not longer going validation

46e7895

Move syntax checking helper to pegen

8ba335e

ncoghlan requested a review from brandtbucher April 28, 2021 12:49

ncoghlan commented Apr 28, 2021

brandtbucher merged commit 1e7b858 into python:master Apr 29, 2021

bedevere-bot removed the awaiting merge label Apr 29, 2021
