Skip to content

bpo-43892: Make match patterns explicit in AST #25585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
1d729b4
WIP bpo-43892: Make match patterns explicit in AST
ncoghlan Apr 20, 2021
6f62cc9
Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…
ncoghlan Apr 22, 2021
2d25ff7
Everything except symtable.c now compiles
ncoghlan Apr 22, 2021
2a319fd
Link to the validator ticket
ncoghlan Apr 22, 2021
14a5b2b
Ensure unparsing tests always cover pattern matching
ncoghlan Apr 22, 2021
ba2932f
Get symtable.c compiling
ncoghlan Apr 23, 2021
1dccb6f
Require end attributes on pattern nodes
ncoghlan Apr 23, 2021
207a43e
MatchConstant -> MatchSingleton
ncoghlan Apr 23, 2021
ba104d1
Formatting tweak
ncoghlan Apr 23, 2021
7cdb735
Fix segfault in wildcard name bindings
ncoghlan Apr 23, 2021
efc0f73
Get test_patma compiling
ncoghlan Apr 23, 2021
a0e13da
Reject matching 0+0 and f-strings
ncoghlan Apr 23, 2021
1680513
Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…
ncoghlan Apr 23, 2021
54940a3
Dedicated MatchMapping arg for storing remaining keys
ncoghlan Apr 23, 2021
0b21140
Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…
ncoghlan Apr 25, 2021
d6e69f0
Use 'name' for identifier AST fields, not 'target'
ncoghlan Apr 25, 2021
5a0b4f2
Implement unparsing
ncoghlan Apr 25, 2021
e07781e
Add news entry
ncoghlan Apr 25, 2021
5cb9f0b
Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…
ncoghlan Apr 26, 2021
b17e7f4
Update AST documentation
ncoghlan Apr 26, 2021
466fccf
Update recursion depth for nested patterns in AST optimiser
ncoghlan Apr 26, 2021
6a9fbc8
Fix remaining AST doctest
ncoghlan Apr 26, 2021
e8a181a
Validate the AST emitted by the parser
ncoghlan Apr 26, 2021
0563d22
Merge remote-tracking branch 'origin/master' into bpo-43892-explicit-…
ncoghlan Apr 26, 2021
aa429c2
AST validator did not handle FunctionType nodes
ncoghlan Apr 26, 2021
e4dfa5a
Apply suggestions from code review
ncoghlan Apr 27, 2021
2c289e7
Replace MatchAlways() with MatchAs(NULL, NULL)
ncoghlan Apr 27, 2021
4ed22ca
Remove and ignore local doctest execution artifacts
ncoghlan Apr 27, 2021
8e5083c
Fix trailing whitespace
ncoghlan Apr 27, 2021
398929d
Apply suggestions from code review
brandtbucher Apr 27, 2021
fe182d4
Delete Doc/.gitignore
brandtbucher Apr 27, 2021
2b07617
Update assert
brandtbucher Apr 27, 2021
b8add8a
Doon asdl_seq_LEN
brandtbucher Apr 27, 2021
490db51
Revert AST validation step
brandtbucher Apr 27, 2021
ea4ea93
Reorder tests
brandtbucher Apr 28, 2021
e4cc5d7
Remove overly-aggressive optimizations
brandtbucher Apr 28, 2021
4faea47
Numeric literals are validated in the compiler
brandtbucher Apr 28, 2021
fe26e14
Validate numeric literals in the compiler
brandtbucher Apr 28, 2021
ac154fa
Revert to checking subexpressions in the validator
ncoghlan Apr 28, 2021
3ef2880
Ensure parser rejects 0+0 in match patterns
ncoghlan Apr 28, 2021
8ed9847
Don't add renamed test case back
ncoghlan Apr 28, 2021
46e7895
Restore optimisations as compiler is not longer going validation
ncoghlan Apr 28, 2021
8ba335e
Move syntax checking helper to pegen
ncoghlan Apr 28, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
318 changes: 282 additions & 36 deletions Doc/library/ast.rst

Large diffs are not rendered by default.

155 changes: 101 additions & 54 deletions Grammar/python.gram
Original file line number Diff line number Diff line change
Expand Up @@ -234,20 +234,20 @@ case_block[match_case_ty]:
_PyAST_match_case(pattern, guard, body, p->arena) }
guard[expr_ty]: 'if' guard=named_expression { guard }

patterns[expr_ty]:
| values[asdl_expr_seq*]=open_sequence_pattern {
_PyAST_Tuple(values, Load, EXTRA) }
patterns[pattern_ty]:
| patterns[asdl_pattern_seq*]=open_sequence_pattern {
_PyAST_MatchSequence(patterns, EXTRA) }
| pattern
pattern[expr_ty]:
pattern[pattern_ty]:
| as_pattern
| or_pattern
as_pattern[expr_ty]:
| pattern=or_pattern 'as' target=capture_pattern {
as_pattern[pattern_ty]:
| pattern=or_pattern 'as' target=pattern_capture_target {
_PyAST_MatchAs(pattern, target->v.Name.id, EXTRA) }
or_pattern[expr_ty]:
| patterns[asdl_expr_seq*]='|'.closed_pattern+ {
or_pattern[pattern_ty]:
| patterns[asdl_pattern_seq*]='|'.closed_pattern+ {
asdl_seq_LEN(patterns) == 1 ? asdl_seq_GET(patterns, 0) : _PyAST_MatchOr(patterns, EXTRA) }
closed_pattern[expr_ty]:
closed_pattern[pattern_ty]:
| literal_pattern
| capture_pattern
| wildcard_pattern
Expand All @@ -257,78 +257,125 @@ closed_pattern[expr_ty]:
| mapping_pattern
| class_pattern

literal_pattern[expr_ty]:
# Literal patterns are used for equality and identity constraints
literal_pattern[pattern_ty]:
| value=signed_number !('+' | '-') { _PyAST_MatchValue(value, EXTRA) }
| value=complex_number { _PyAST_MatchValue(value, EXTRA) }
| value=strings { _PyAST_MatchValue(value, EXTRA) }
| 'None' { _PyAST_MatchSingleton(Py_None, EXTRA) }
| 'True' { _PyAST_MatchSingleton(Py_True, EXTRA) }
| 'False' { _PyAST_MatchSingleton(Py_False, EXTRA) }

# Literal expressions are used to restrict permitted mapping pattern keys
literal_expr[expr_ty]:
| signed_number !('+' | '-')
| real=signed_number '+' imag=NUMBER { _PyAST_BinOp(real, Add, imag, EXTRA) }
| real=signed_number '-' imag=NUMBER { _PyAST_BinOp(real, Sub, imag, EXTRA) }
| complex_number
| strings
| 'None' { _PyAST_Constant(Py_None, NULL, EXTRA) }
| 'True' { _PyAST_Constant(Py_True, NULL, EXTRA) }
| 'False' { _PyAST_Constant(Py_False, NULL, EXTRA) }

complex_number[expr_ty]:
| real=signed_number '+' imag=imaginary_number { _PyAST_BinOp(real, Add, imag, EXTRA) }
| real=signed_number '-' imag=imaginary_number { _PyAST_BinOp(real, Sub, imag, EXTRA) }

signed_number[expr_ty]:
| NUMBER
| '-' number=NUMBER { _PyAST_UnaryOp(USub, number, EXTRA) }

capture_pattern[expr_ty]:
imaginary_number[expr_ty]:
| imag=NUMBER { _PyPegen_ensure_imaginary(p, imag) }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandtbucher @pablogsal This is the new helper that ensures the parser won't accept "0+0" and similar constructs as "complex literals".


capture_pattern[pattern_ty]:
| target=pattern_capture_target { _PyAST_MatchAs(NULL, target->v.Name.id, EXTRA) }

pattern_capture_target[expr_ty]:
| !"_" name=NAME !('.' | '(' | '=') {
_PyPegen_set_expr_context(p, name, Store) }

wildcard_pattern[expr_ty]:
| "_" { _PyAST_Name(CHECK(PyObject*, _PyPegen_new_identifier(p, "_")), Store, EXTRA) }
wildcard_pattern[pattern_ty]:
| "_" { _PyAST_MatchAs(NULL, NULL, EXTRA) }

value_pattern[expr_ty]:
| attr=attr !('.' | '(' | '=') { attr }
value_pattern[pattern_ty]:
| attr=attr !('.' | '(' | '=') { _PyAST_MatchValue(attr, EXTRA) }
attr[expr_ty]:
| value=name_or_attr '.' attr=NAME {
_PyAST_Attribute(value, attr->v.Name.id, Load, EXTRA) }
name_or_attr[expr_ty]:
| attr
| NAME

group_pattern[expr_ty]:
group_pattern[pattern_ty]:
| '(' pattern=pattern ')' { pattern }

sequence_pattern[expr_ty]:
| '[' values=maybe_sequence_pattern? ']' { _PyAST_List(values, Load, EXTRA) }
| '(' values=open_sequence_pattern? ')' { _PyAST_Tuple(values, Load, EXTRA) }
sequence_pattern[pattern_ty]:
| '[' patterns=maybe_sequence_pattern? ']' { _PyAST_MatchSequence(patterns, EXTRA) }
| '(' patterns=open_sequence_pattern? ')' { _PyAST_MatchSequence(patterns, EXTRA) }
open_sequence_pattern[asdl_seq*]:
| value=maybe_star_pattern ',' values=maybe_sequence_pattern? {
_PyPegen_seq_insert_in_front(p, value, values) }
| pattern=maybe_star_pattern ',' patterns=maybe_sequence_pattern? {
_PyPegen_seq_insert_in_front(p, pattern, patterns) }
maybe_sequence_pattern[asdl_seq*]:
| values=','.maybe_star_pattern+ ','? { values }
maybe_star_pattern[expr_ty]:
| patterns=','.maybe_star_pattern+ ','? { patterns }
maybe_star_pattern[pattern_ty]:
| star_pattern
| pattern
star_pattern[expr_ty]:
| '*' value=(capture_pattern | wildcard_pattern) {
_PyAST_Starred(value, Store, EXTRA) }

mapping_pattern[expr_ty]:
| '{' items=items_pattern? '}' {
_PyAST_Dict(CHECK(asdl_expr_seq*, _PyPegen_get_keys(p, items)), CHECK(asdl_expr_seq*, _PyPegen_get_values(p, items)), EXTRA) }
star_pattern[pattern_ty]:
| '*' target=pattern_capture_target {
_PyAST_MatchStar(target->v.Name.id, EXTRA) }
| '*' wildcard_pattern {
_PyAST_MatchStar(NULL, EXTRA) }

mapping_pattern[pattern_ty]:
| '{' '}' {
_PyAST_MatchMapping(NULL, NULL, NULL, EXTRA) }
| '{' rest=double_star_pattern ','? '}' {
_PyAST_MatchMapping(NULL, NULL, rest->v.Name.id, EXTRA) }
| '{' items=items_pattern ',' rest=double_star_pattern ','? '}' {
_PyAST_MatchMapping(
CHECK(asdl_expr_seq*, _PyPegen_get_pattern_keys(p, items)),
CHECK(asdl_pattern_seq*, _PyPegen_get_patterns(p, items)),
rest->v.Name.id,
EXTRA) }
| '{' items=items_pattern ','? '}' {
_PyAST_MatchMapping(
CHECK(asdl_expr_seq*, _PyPegen_get_pattern_keys(p, items)),
CHECK(asdl_pattern_seq*, _PyPegen_get_patterns(p, items)),
NULL,
EXTRA) }
items_pattern[asdl_seq*]:
| items=','.key_value_pattern+ ','? { items }
key_value_pattern[KeyValuePair*]:
| key=(literal_pattern | value_pattern) ':' value=pattern {
_PyPegen_key_value_pair(p, key, value) }
| double_star_pattern
double_star_pattern[KeyValuePair*]:
| '**' value=capture_pattern { _PyPegen_key_value_pair(p, NULL, value) }

class_pattern[expr_ty]:
| func=name_or_attr '(' ')' { _PyAST_Call(func, NULL, NULL, EXTRA) }
| func=name_or_attr '(' args=positional_patterns ','? ')' {
_PyAST_Call(func, args, NULL, EXTRA) }
| func=name_or_attr '(' keywords=keyword_patterns ','? ')' {
_PyAST_Call(func, NULL, keywords, EXTRA) }
| func=name_or_attr '(' args=positional_patterns ',' keywords=keyword_patterns ','? ')' {
_PyAST_Call(func, args, keywords, EXTRA) }
positional_patterns[asdl_expr_seq*]:
| args[asdl_expr_seq*]=','.pattern+ { args }
keyword_patterns[asdl_keyword_seq*]:
| keywords[asdl_keyword_seq*]=','.keyword_pattern+ { keywords }
keyword_pattern[keyword_ty]:
| arg=NAME '=' value=pattern { _PyAST_keyword(arg->v.Name.id, value, EXTRA) }
| items=','.key_value_pattern+ { items }
key_value_pattern[KeyPatternPair*]:
| key=(literal_expr | attr) ':' pattern=pattern {
_PyPegen_key_pattern_pair(p, key, pattern) }
double_star_pattern[expr_ty]:
| '**' target=pattern_capture_target { target }

class_pattern[pattern_ty]:
| cls=name_or_attr '(' ')' {
_PyAST_MatchClass(cls, NULL, NULL, NULL, EXTRA) }
| cls=name_or_attr '(' patterns=positional_patterns ','? ')' {
_PyAST_MatchClass(cls, patterns, NULL, NULL, EXTRA) }
| cls=name_or_attr '(' keywords=keyword_patterns ','? ')' {
_PyAST_MatchClass(
cls, NULL,
CHECK(asdl_identifier_seq*, _PyPegen_map_names_to_ids(p,
CHECK(asdl_expr_seq*, _PyPegen_get_pattern_keys(p, keywords)))),
CHECK(asdl_pattern_seq*, _PyPegen_get_patterns(p, keywords)),
EXTRA) }
| cls=name_or_attr '(' patterns=positional_patterns ',' keywords=keyword_patterns ','? ')' {
_PyAST_MatchClass(
cls,
patterns,
CHECK(asdl_identifier_seq*, _PyPegen_map_names_to_ids(p,
CHECK(asdl_expr_seq*, _PyPegen_get_pattern_keys(p, keywords)))),
CHECK(asdl_pattern_seq*, _PyPegen_get_patterns(p, keywords)),
EXTRA) }
positional_patterns[asdl_pattern_seq*]:
| args[asdl_pattern_seq*]=','.pattern+ { args }
keyword_patterns[asdl_seq*]:
| keywords[asdl_seq*]=','.keyword_pattern+ { keywords }
Copy link
Member

@pablogsal pablogsal Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably change this to just ','.keyword_pattern+. Same for similar stuff around this like | items=','.key_value_pattern+ { items }. But we can clean this afterwards, for sure.

Copy link
Contributor Author

@ncoghlan ncoghlan Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this and PEP 642 were my first real ventures into working with the new parser. It's really nice, but it isn't always obvious when it will need help with typecasting and when it will be able to figure it on its own (e.g. I think this node used to be a more specific type than adsl_seq, so it would have needed the typecast)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this and PEP 642 were my first real ventures into working with the new parser. It's really nice, but it isn't always obvious when it will need help with typecasting and when it will be able to figure it on its own (e.g. I think this node used to be a more specific type than adsl_seq, so it would have needed the typecast)

I'm preparing a new documentation for working with the parser :)

How has been your experience so far? Any feedback (good and bad) is good feedback ;)

keyword_pattern[KeyPatternPair*]:
| arg=NAME '=' value=pattern { _PyPegen_key_pattern_pair(p, arg, value) }

return_stmt[stmt_ty]:
| 'return' a=[star_expressions] { _PyAST_Return(a, EXTRA) }
Expand Down
108 changes: 89 additions & 19 deletions Include/internal/pycore_ast.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading