CS 280 Data Structures
Professor John Peterson
Project
“Tree 1”: Questions? It must be in by Wednesday; solutions will be posted after class.
“Tree 2” is almost ready; there is already some simple stuff to do in the wiki.
Parse Trees
Context-free languages are one of the big deals in computer science: we use them to create recursive structures (trees) from linear ones (strings or sequences). There is a whole lot of theory underneath; we’ll skip most of it and concentrate on the practical stuff.
The Problem
Given:
• A sequence of tokens
• A grammar that gives structure to these tokens
Produce:
• A parse tree that covers the sequence
Grammars
• Names: the left side of a production is a name, and this name can be used in other productions
• Constants: specific pieces of the underlying token-level language
• Sequence: x y means that y follows x
• Choice: (x | y) means either x or y may appear here
• Optionals: [x] means x may appear here or be left out
• Repetition: (x)* means x is repeated any number of times, including zero
Example: Java
Tokens: a = a + b * c;
Grammar:
    statement  = assignment
    assignment = var '=' addexp ';'
    addexp     = mulexp ('+' mulexp)*
    mulexp     = aexp ('*' aexp)*
    aexp       = var | num | '(' addexp ')'
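The slide starts from a token sequence without saying where it comes from. As a minimal sketch, assuming tokens are names, numbers, and single-character symbols, the text a = a + b * c; could be split up as below. SimpleTokenizer and tokenize are hypothetical names for illustration, not part of the course code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (an assumption, not course code): splits text into tokens.
class SimpleTokenizer {
    // Runs of letters/digits become one token; any other non-space
    // character becomes a one-character token; whitespace is skipped.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [a, =, a, +, b, *, c, ;]
        System.out.println(tokenize("a = a + b * c;"));
    }
}
```

This sketch does not handle decimal numbers or multi-character operators; a real tokenizer for Java would need both.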
How Does this Work?
You need to know where to start (“statement”).
This grammar is constructed so that you can always decide what to do based on the next token (peek).
When you have a choice, always go as far as possible: keep matching a repetition for as long as the next token fits.
If you get to a place where the current token doesn’t fit into the grammar, you have a “parse error”.
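The rules above can be sketched as a recognizer for the example grammar, one that only answers whether the tokens fit. This is an illustration under assumptions, not the course's implementation: the class name Recognizer, the helpers peek, expect, and accepts, and the test that a variable starts with a letter and a number with a digit are all made up here.

```java
import java.util.List;

// Illustrative recognizer for the example grammar (all names are assumptions).
class Recognizer {
    private final List<String> tokens;
    private int pos = 0;

    Recognizer(List<String> tokens) { this.tokens = tokens; }

    // Look at the next token without consuming it; "" marks end of input.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private IllegalStateException error(String expected) {
        return new IllegalStateException(
            "parse error at token " + pos + ": expected " + expected + " but saw '" + peek() + "'");
    }

    // Consume one token, which must be the given constant.
    private void expect(String constant) {
        if (!peek().equals(constant)) throw error("'" + constant + "'");
        pos++;
    }

    private static boolean isVar(String t) { return !t.isEmpty() && Character.isLetter(t.charAt(0)); }
    private static boolean isNum(String t) { return !t.isEmpty() && Character.isDigit(t.charAt(0)); }

    // statement = assignment   (the start symbol)
    void statement() { assignment(); }

    // assignment = var '=' addexp ';'
    void assignment() {
        if (!isVar(peek())) throw error("a variable");
        pos++;
        expect("=");
        addexp();
        expect(";");
    }

    // addexp = mulexp ('+' mulexp)*   -- loop as long as the next token is '+'
    void addexp() {
        mulexp();
        while (peek().equals("+")) { pos++; mulexp(); }
    }

    // mulexp = aexp ('*' aexp)*
    void mulexp() {
        aexp();
        while (peek().equals("*")) { pos++; aexp(); }
    }

    // aexp = var | num | '(' addexp ')'   -- the next token picks the branch
    void aexp() {
        String t = peek();
        if (isVar(t) || isNum(t)) {
            pos++;
        } else if (t.equals("(")) {
            pos++;
            addexp();
            expect(")");
        } else {
            throw error("a variable, a number, or '('");
        }
    }

    // True when the whole token sequence is one valid statement.
    static boolean accepts(String... input) {
        Recognizer r = new Recognizer(List.of(input));
        try {
            r.statement();
            return r.pos == input.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, Recognizer.accepts("a", "=", "a", "+", "b", "*", "c", ";") returns true, while Recognizer.accepts("a", "=", "+", ";") returns false because '+' cannot start an aexp.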
Parsing Theory
Not all grammars are “easy” to parse.
Grammars can handle things like operator precedence.
Grammars can be ambiguous; we’ll avoid these.
The grammar “inverts” the recursive “print” for a datatype.
There are other ways to represent the same thing, such as railroad diagrams.
Examples
Let’s create syntax rules for other Java constructs.
Parsing
We’re going to use the simplest method of parsing: recursive descent.
Each production becomes a function which processes some tokens from the input stream and returns some value.
In general, we have to turn the abstract tree defined by the grammar into a concrete data object, so we need to figure out how to represent each different kind of object associated with the productions.
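Here is a sketch of recursive descent for the assignment grammar, where each production is a method that consumes tokens and returns a tree. To keep it short it uses one generic Tree class (a label plus children) rather than a separate class per kind of node. That simplification, along with the names Tree and TreeParser, is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Generic tree node: a label plus children (a simplifying assumption).
class Tree {
    final String label;
    final List<Tree> kids = new ArrayList<>();

    Tree(String label, Tree... kids) {
        this.label = label;
        for (Tree k : kids) this.kids.add(k);
    }

    // Leaves print as their label; inner nodes as (label child child ...).
    @Override
    public String toString() {
        if (kids.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (Tree k : kids) sb.append(' ').append(k);
        return sb.append(')').toString();
    }
}

// One method per production; each returns the tree for the tokens it consumed.
class TreeParser {
    private final String[] tokens;
    private int pos = 0;

    TreeParser(String... tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.length ? tokens[pos] : ""; }

    private String next() {
        if (pos >= tokens.length) throw new IllegalStateException("parse error: unexpected end of input");
        return tokens[pos++];
    }

    private void expect(String constant) {
        String got = next();
        if (!got.equals(constant)) {
            throw new IllegalStateException("parse error: expected '" + constant + "' but saw '" + got + "'");
        }
    }

    // assignment = var '=' addexp ';'
    Tree assignment() {
        String name = next();
        if (!Character.isLetter(name.charAt(0))) {
            throw new IllegalStateException("parse error: expected a variable but saw '" + name + "'");
        }
        expect("=");
        Tree value = addexp();
        expect(";");
        return new Tree("=", new Tree(name), value);
    }

    // addexp = mulexp ('+' mulexp)*   -- builds a left-leaning chain of '+' nodes
    Tree addexp() {
        Tree left = mulexp();
        while (peek().equals("+")) {
            next();
            left = new Tree("+", left, mulexp());
        }
        return left;
    }

    // mulexp = aexp ('*' aexp)*
    Tree mulexp() {
        Tree left = aexp();
        while (peek().equals("*")) {
            next();
            left = new Tree("*", left, aexp());
        }
        return left;
    }

    // aexp = var | num | '(' addexp ')'
    Tree aexp() {
        if (peek().equals("(")) {
            next();
            Tree inner = addexp();
            expect(")");
            return inner;
        }
        String t = next();
        if (!Character.isLetterOrDigit(t.charAt(0))) {
            throw new IllegalStateException("parse error: unexpected '" + t + "'");
        }
        return new Tree(t);
    }
}
```

With the tokens of a = a + b * c; the call new TreeParser("a", "=", "a", "+", "b", "*", "c", ";").assignment() yields a tree that prints as (= a (+ a (* b c))). The multiplication sits below the addition because mulexp is nested inside addexp in the grammar, which is how the grammar encodes operator precedence.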
A Simple Programming Language
All we need are the following:
• Definitions of names, like x = 2
• Function calls, like f(2, 3) or x+y
• Functions, like f(x) = x + 1
• Built-in functions, like + or sqrt
• Simple data structures: we’ll use tuples, written (x, y, z). These represent things like points, colors, whatever!
An Evaluation Tree
To create a tree that corresponds to an executable program, we need to figure out what sort of tree nodes are needed.
• Constants: data (like 1.2) and functions (like +)
• Variable references: a
• Function calls: a function (itself given by a variable or a constant) applied to arguments
• Definitions, like f(x) = y
We’ll wrap these up into an abstract class.
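The node kinds listed above might be sketched as follows. Everything here is an assumption for illustration: the class names, the choice of a Map as the environment, and the representation of built-in functions as java.util.function.Function values. Definitions are limited to plain names like x = 2; functions with parameters, like f(x) = y, are left out of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Abstract node: every kind of tree node can be evaluated in an environment.
abstract class EvalNode {
    abstract Object eval(Map<String, Object> env);
}

// Constants: data like 1.2 (a function value would also fit here).
class Constant extends EvalNode {
    final Object value;
    Constant(Object value) { this.value = value; }
    Object eval(Map<String, Object> env) { return value; }
}

// Variable references: look the name up in the environment.
class VarRef extends EvalNode {
    final String name;
    VarRef(String name) { this.name = name; }
    Object eval(Map<String, Object> env) {
        if (!env.containsKey(name)) throw new IllegalStateException("undefined name: " + name);
        return env.get(name);
    }
}

// Function calls: the function position is itself a node (variable or constant).
class Call extends EvalNode {
    final EvalNode fn;
    final List<EvalNode> args;
    Call(EvalNode fn, EvalNode... args) { this.fn = fn; this.args = List.of(args); }

    @SuppressWarnings("unchecked")
    Object eval(Map<String, Object> env) {
        Function<List<Object>, Object> f = (Function<List<Object>, Object>) fn.eval(env);
        List<Object> values = new ArrayList<>();
        for (EvalNode a : args) values.add(a.eval(env));
        return f.apply(values);
    }
}

// Definitions of plain names: evaluate the body and bind the name.
class Definition extends EvalNode {
    final String name;
    final EvalNode body;
    Definition(String name, EvalNode body) { this.name = name; this.body = body; }
    Object eval(Map<String, Object> env) {
        Object v = body.eval(env);
        env.put(name, v);
        return v;
    }
}

class EvalDemo {
    // Evaluates "x = 2.0" and then "x + 1.5" with a built-in "+".
    static Object run() {
        Map<String, Object> env = new HashMap<>();
        Function<List<Object>, Object> plus =
            xs -> ((Double) xs.get(0)) + ((Double) xs.get(1));
        env.put("+", plus);
        new Definition("x", new Constant(2.0)).eval(env);
        return new Call(new VarRef("+"), new VarRef("x"), new Constant(1.5)).eval(env);
    }
}
```

EvalDemo.run() returns 3.5. Because the built-in + is just another value in the environment, a call such as x+y needs no special case: it is the same Call node as f(2, 3).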
Using an Abstract Class
Big idea: create an abstract class which is extended into specific concrete classes.
The abstract class is a name that can stand for any of the concrete subclasses.
Placing abstract methods in the abstract class indicates that the concrete subclasses must implement these methods. In this respect it works the SAME way as an interface!
Advantage: we can place default method definitions / operations in the abstract class.
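A small sketch of these points, using made-up names (TreeNode, Var, Sum) rather than the course's actual classes: show is abstract, so each subclass must supply it, while isLeaf and describe are default definitions written once in the abstract class.

```java
// Abstract class: a name that can stand for any concrete subclass.
abstract class TreeNode {
    // Abstract method: each concrete subclass must implement it.
    abstract String show();

    // Default definition: inherited as-is unless a subclass overrides it.
    boolean isLeaf() { return true; }

    // Default operation written once, in terms of the methods above.
    String describe() { return (isLeaf() ? "leaf " : "node ") + show(); }
}

// Concrete subclass that keeps the default isLeaf().
class Var extends TreeNode {
    final String name;
    Var(String name) { this.name = name; }

    @Override
    String show() { return name; }
}

// Concrete subclass that overrides the default isLeaf().
class Sum extends TreeNode {
    final TreeNode left, right;
    Sum(TreeNode left, TreeNode right) { this.left = left; this.right = right; }

    @Override
    String show() { return "(" + left.show() + " + " + right.show() + ")"; }

    @Override
    boolean isLeaf() { return false; }
}

class AbstractDemo {
    static String demo() {
        // The variable has the abstract type; the object is a concrete Sum.
        TreeNode n = new Sum(new Var("a"), new Var("b"));
        return n.describe();
    }
}
```

AbstractDemo.demo() returns "node (a + b)", and new Var("a").describe() returns "leaf a". Neither subclass had to write describe itself, which is the advantage of putting default definitions in the abstract class.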