Skip to content

Writing Semantic action rules

R. Bernstein edited this page Sep 20, 2017 · 4 revisions

The Earley Algorithm parser builds an Abstract Syntax Tree as it parses. Then in semantic analysis this tree is traversed to accomplish whatever action is needed. For example in uncompyle6 the action is to reproduce Python source code from a parse of (modified) CPython bytecode instructions. In our examples expr and expr2, the example programs, semantic analysis evaluates arithmetic expressions.

The semantic actions class in SPARK is usually derived from the GenericASTTraversal class. That class provides methods for pre- and post-order tree traversal. Most of the time pre-order traversal is used. Since pre-order traversal also allows for hooks to be run after all nodes complete, it can do a more than a strict pre-order traversal.

The way to indicate a semantic action to take on an GenericASTTraversal nonterminal is to create a method in the subclass of GenericaASTTraversal and preface the nonterminal name with n_. For example to have a semantic action for nonterminal stmt, you would define method n_stmt.

After a while, you'll probably find many routines which do similar sorts of things. For example, in the Python 2 grammar there are grammar rules for continue, pass, and break that do about the same thing: they should start on a new line because they start a new statement, they print their corresponding reserved word, and end the line since they end that statement.

In uncompyle6, we often have a tree node which should descend n-1 of its children in order and separate the string fragment results from each with a comma and a space. The last node is usually an operation that is performed on prior instructions, and so that is usually handled by the grammar rule itself.

Coming back to the Python 2 grammar, you could write for the pass statement rule:

def n_pass(self, n):
  self.write(self.indent)  # start a new line
  self.writeln('pass') # write "pass" and finish line

and do the corresponding thing for continue and break.

To reduce tedium, what's generally done instead is to put rules into a table and have C-style sprintf format specifiers to indicate these common kinds of actions to take. Doing this in the Python2 grammar we instead have rules:

TABLE_DIRECT = {
    'break_stmt':       ( '%|break\n', ),
    'continue_stmt':    ( '%|continue\n', ),
    'pass_stmt':        ( '%|pass\n', ),
    ...
    }

The %| format specifier indicates starting on a new line.

The kinds of specifiers is very much tied to how one write the grammar and what kind of activity is needed. For example in uncompyle6 I often need to copy attributes from one node over to one of its siblings when I need be able to give a deparse by program offset. That kind of operation however is not needed when deparsing the entire program, so no offset information needs to be stored.

The method template_engine in the semantic rules is where we define such table-driven rules.

See also Table-driven Semantic Actions from the uncompyle6 documentation for a more description full-blown system of table-directed semantic actions.

Clone this wiki locally