abstract syntax

Abstract Syntax

CMSC

CS431

abstract syntax trees
Abstract Syntax Trees
  • So far a parser traces the derivation of a sequence of tokens
  • The rest of the compiler needs a structural representation of the program
  • Abstract syntax trees
    • Like parse trees but ignore some details
    • Abbreviated as AST
abstract syntax tree cont
Abstract Syntax Tree. (Cont.)
  • Consider the grammar

E → int | ( E ) | E + E

  • And the string

5 + (2 + 3)

  • After lexical analysis (a list of tokens)

int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

  • During parsing we build a parse tree …
evaluation of semantic rules
Evaluation of semantic rules
  • Parse-tree methods (compile time)
    • Build a parse tree for each input
    • Build a dependency graph from the parse tree
    • Obtain evaluation order from a topological order of the dependency graph
  • Rule-based methods (compiler-construction time)
    • Predetermine the order of attribute evaluation for each production
  • Oblivious methods (compiler-construction time)
    • Evaluation order is independent of semantic rules
    • Evaluation order forced by parsing methods
    • Restrictive in acceptable attribute definitions
compile time semantic evaluation
Compile-time semantic evaluation

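[Figure: compiler pipeline. Source Program → Lexical Analyzer → Tokens → Syntax Analyzer → Parse tree / Abstract syntax tree → Semantic Analyzer → Attributed AST → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program. Side labels: "Program input" and "Results" mark the path taken by interpreters; the later stages are labeled "compilers".]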

abstract syntax vs concrete syntax
Abstract Syntax vs. Concrete Syntax
  • Concrete syntax: the syntax programmers write
    • Example: different notations of expressions
      • Prefix + 5 * 15 20
      • Infix 5 + 15 * 20
      • Postfix 5 15 20 * +
  • Abstract syntax: the syntax recognized by compilers
    • Identifies only the meaningful components
      • The operation
      • The components of the operation

[Figure: the parse tree for 5+15*20 (interior nodes labeled e) beside the abstract syntax tree for 5 + 15 * 20, in which + is the root with children 5 and *, and * has children 15 and 20.]
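The point that prefix, infix and postfix are only different concrete notations for one abstract tree can be sketched in Java. The names `ExprNode`, `NumNode`, `OpNode` and `NotationDemo` are hypothetical and exist only for this illustration; the tree built is the AST for 5 + 15 * 20 from the slide.

```java
// Hypothetical AST for arithmetic expressions; one tree, three printers.
interface ExprNode {
    String prefix();
    String infix();
    String postfix();
}

final class NumNode implements ExprNode {
    private final int value;
    NumNode(int value) { this.value = value; }
    public String prefix()  { return Integer.toString(value); }
    public String infix()   { return Integer.toString(value); }
    public String postfix() { return Integer.toString(value); }
}

final class OpNode implements ExprNode {
    private final char op;              // the operation
    private final ExprNode left, right; // the components of the operation
    OpNode(char op, ExprNode left, ExprNode right) {
        this.op = op; this.left = left; this.right = right;
    }
    public String prefix()  { return op + " " + left.prefix() + " " + right.prefix(); }
    // Infix needs parentheses because the tree shape, not precedence, carries the grouping.
    public String infix()   { return "(" + left.infix() + " " + op + " " + right.infix() + ")"; }
    public String postfix() { return left.postfix() + " " + right.postfix() + " " + op; }
}

final class NotationDemo {
    // AST for 5 + 15 * 20: '+' at the root, '*' as its right child.
    static ExprNode sample() {
        return new OpNode('+', new NumNode(5), new OpNode('*', new NumNode(15), new NumNode(20)));
    }
    public static void main(String[] args) {
        ExprNode ast = sample();
        System.out.println(ast.prefix());
        System.out.println(ast.infix());
        System.out.println(ast.postfix());
    }
}
```

The same `OpNode` object yields all three strings, which is the sense in which the AST records only the operation and its components.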

abstract syntax trees7
Abstract syntax trees
  • Condensed form of parse tree for representing language constructs
    • Operators and keywords do not appear as leaves
      • They define the meaning of the interior (parent) node
    • Chains of single productions may be collapsed

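[Figure: the parse tree for S → IF B THEN S1 ELSE S2 beside its AST, an If-then-else node with children B, S1 and S2; and a parse tree containing E and T chain nodes for an addition of 3 and 5 beside its AST, a + node with the two number leaves.]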

constructing ast
Constructing AST
  • Use syntax-directed definitions
    • Problem: construct an AST for each expression
    • Attribute grammar approach
      • Associate each non-terminal with an AST
        • Each AST: a pointer to a node in AST

E.nptr T.nptr

    • Definitions: how to compute attribute?
      • Bottom-up: synthesized attribute

if we know the AST of each child, how to compute the AST of the parent?

Grammar:

E ::= E + T | E – T | T

T ::= (E) | id | num

constructing ast for expressions by hand
Constructing AST for expressions (by Hand)
  • Associate each non-terminal with an AST
    • E.nptr, T.nptr: a pointer to ASTtree
  • Synthesized attribute definition:
    • If we know the AST of each child, how to compute the AST of the parent?
example constructing ast
Example: constructing AST

Bottom-up parsing: evaluate attribute at each reduction

1. reduce 5 to T1 using T::=num:

T1.nptr = leaf(5)

2. reduce T1 to E1 using E::=T:

E1.nptr = T1.nptr = leaf(5)

3. reduce 15 to T2 using T::=num:

T2.nptr=leaf(15)

4. reduce T2 to E2 using E::=T:

E2.nptr=T2.nptr = leaf(15)

5. reduce b to T3 using T::=id:

T3.nptr=leaf(b)

6. reduce E2-T3 to E3 using E::=E-T:

E3.nptr=node(‘-’,leaf(15),leaf(b))

7. reduce (E3) to T4 using T::=(E):

T4.nptr=node(‘-’,leaf(15),leaf(b))

8. reduce E1+T4 to E5 using E::=E+T:

E5.nptr=node(‘+’,leaf(5),node(‘-’,leaf(15),leaf(b)))

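Parse tree for 5+(15-b)

[Figure: E5 has children E1, + and T4. E1 derives T1, which derives 5. T4 has children (, E3 and ). E3 has children E2, - and T3. E2 derives T2, which derives 15. T3 derives b.]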
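The eight reductions can be replayed in Java to see how the synthesized nptr attribute is passed upward. This is only a sketch: `Ast`, `Leaf`, `Interior` and `BottomUpDemo` are hypothetical names, and the reductions are written out by hand rather than driven by a real bottom-up parser.

```java
// Hypothetical AST types; leaf(...) and node(...) mirror the slide's constructors.
abstract class Ast {
    static Ast leaf(String label) { return new Leaf(label); }
    static Ast node(char op, Ast left, Ast right) { return new Interior(op, left, right); }
}

final class Leaf extends Ast {
    final String label;
    Leaf(String label) { this.label = label; }
    @Override public String toString() { return "leaf(" + label + ")"; }
}

final class Interior extends Ast {
    final char op;
    final Ast left, right;
    Interior(char op, Ast left, Ast right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "node('" + op + "'," + left + "," + right + ")"; }
}

final class BottomUpDemo {
    // One statement per reduction in the parse of 5+(15-b).
    static Ast build() {
        Ast t1 = Ast.leaf("5");           // 1. T ::= num
        Ast e1 = t1;                      // 2. E ::= T     (pointer is copied, no new node)
        Ast t2 = Ast.leaf("15");          // 3. T ::= num
        Ast e2 = t2;                      // 4. E ::= T
        Ast t3 = Ast.leaf("b");           // 5. T ::= id
        Ast e3 = Ast.node('-', e2, t3);   // 6. E ::= E - T
        Ast t4 = e3;                      // 7. T ::= (E)   (parentheses leave no trace)
        Ast e5 = Ast.node('+', e1, t4);   // 8. E ::= E + T
        return e5;
    }
    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Only reductions 1, 3, 5, 6 and 8 allocate a node; the single productions and the parenthesized form just forward the child's pointer, which is why the AST is smaller than the parse tree.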

implementing ast in c
Implementing AST in C

Grammar:

E ::= E + T | E - T | T

T ::= (E) | id | num

  • Define different kinds of AST nodes
    • typedef enum {PLUS, MINUS, ID, NUM} ASTNodeTag;
  • Define AST node

typedef struct ASTnode {

ASTNodeTag kind;

union { symbol_table_entry* id_entry; /* used when kind == ID */

int num_value; /* used when kind == NUM */

struct ASTnode* opds[2]; /* used when kind == PLUS or MINUS */

} description;

} ASTnode;

  • Define AST node construction routines
    • ASTnode* mkleaf_id(symbol_table_entry* e);
    • ASTnode* mkleaf_num(int n);
    • ASTnode* mknode_plus(ASTnode* opd1, ASTnode* opd2);
    • ASTnode* mknode_minus(ASTnode* opd1, ASTnode* opd2);

implementing ast in java
Implementing AST in Java

Grammar:

E ::= E + T | E - T | T

T ::= (E) | id | num

  • Define different kinds of AST nodes
    • enum ASTNodeTag {PLUS, MINUS, ID, NUM}
  • Define AST node

abstract class ASTexpression {

public abstract ASTNodeTag kind();

}

class ASTidentifier extends ASTexpression { private symbol_table_entry id_entry; … }

class ASTvalue extends ASTexpression { private int num_value; … }

class ASTplus extends ASTexpression { private ASTexpression[] opds = new ASTexpression[2]; … }

class ASTminus extends ASTexpression { private ASTexpression[] opds = new ASTexpression[2]; … }

  • Define AST node construction routines
    • ASTexpression mkleaf_id(symbol_table_entry e);
    • ASTexpression mkleaf_num(int n);
    • ASTexpression mknode_plus(ASTexpression opd1, ASTexpression opd2);
    • ASTexpression mknode_minus(ASTexpression opd1, ASTexpression opd2);

more asts
More ASTs

Abstract syntax:

S::= if-else E S S | while E S | E | ε

E::= var | num | true | false | E bop E | uop E

bop ::= < | <= | > | >= | && | = | + | * | ….

uop ::= - | * | & | …

class ASTstmt {…}

class ASTifElse extends ASTstmt {

private ASTexpr cond;

private ASTstmt tbranch;

private ASTstmt fbranch; …}

class ASTwhile extends ASTstmt {

private ASTexpr cond;

private ASTstmt body; …}

class ASTexpr extends ASTstmt {…}

class ASTvar extends ASTexpr {…}

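[Figure: an abstract syntax tree rooted at an if-else node, containing a < comparison, a while node, the operators =, < and *, and the leaves a, b, 100 and 2.]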

ast covered
AST Covered
  • We built the AST by hand in the 1st project
  • Let's now look at an automated approach.
  • Let's see what the Galles text (see slide 15-42) has to say about ASTs
  • Later we will look at some code
semantic actions an example
Semantic Actions: An Example
  • Consider the grammar

E → int | E + E | ( E )

  • For each symbol X define an attribute X.val
    • For terminals, val is the associated lexeme
    • For non-terminals, val is the expression’s value (and is computed from values of subexpressions)
  • We annotate the grammar with actions:

E → int { E.val = int.val }

| E1 + E2 { E.val = E1.val + E2.val }

| ( E1 ) { E.val = E1.val }

semantic actions an example cont
Semantic Actions: An Example (Cont.)
  • String: 5 + (2 + 3)
  • Tokens: int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

Productions        Equations

E → E1 + E2        E.val = E1.val + E2.val

E1 → int5          E1.val = int5.val = 5

E2 → ( E3 )        E2.val = E3.val

E3 → E4 + E5       E3.val = E4.val + E5.val

E4 → int2          E4.val = int2.val = 2

E5 → int3          E5.val = int3.val = 3

semantic actions notes
Semantic Actions: Notes
  • Semantic actions specify a system of equations
    • Order of resolution is not specified
  • Example:

E3.val = E4.val + E5.val

    • Must compute E4.val and E5.val before E3.val
    • We say that E3.val depends on E4.val and E5.val
  • The parser must find the order of evaluation
dependency graph
Dependency Graph

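  • Each node labeled E has one slot for the val attribute
  • Note the dependencies

[Figure: the parse tree of 5 + (2 + 3) with a val slot on every E node. E has children E1, + and E2. E1 derives int5 (val 5). E2 derives ( E3 ). E3 has children E4, + and E5, which derive int2 (val 2) and int3 (val 3). Dependency edges run from each child's val to its parent's val.]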

evaluating attributes
Evaluating Attributes
  • An attribute must be computed after all its successors in the dependency graph have been computed
    • In previous example attributes can be computed bottom-up
  • Such an order exists when there are no cycles
    • Cyclically defined attributes are not legal
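Evaluating attributes in dependency order, and rejecting cyclic definitions, can be sketched as a depth-first traversal of the dependency graph. The class names `AttributeGraph` and `AttributeDemo` and the string attribute names are assumptions made for this illustration; the equations are the ones listed for 5 + (2 + 3).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToIntFunction;

// Hypothetical container for a system of attribute equations.
final class AttributeGraph {
    private final Map<String, List<String>> deps = new LinkedHashMap<>();
    private final Map<String, ToIntFunction<Map<String, Integer>>> rules = new HashMap<>();

    // Register one equation: attr is computed by rule from the attributes in dependsOn.
    void define(String attr, List<String> dependsOn, ToIntFunction<Map<String, Integer>> rule) {
        deps.put(attr, dependsOn);
        rules.put(attr, rule);
    }

    // Returns attribute values in the order they were computed (a topological order).
    Map<String, Integer> evaluate() {
        Map<String, Integer> values = new LinkedHashMap<>();
        Set<String> inProgress = new HashSet<>();
        for (String attr : deps.keySet()) visit(attr, values, inProgress);
        return values;
    }

    private void visit(String attr, Map<String, Integer> values, Set<String> inProgress) {
        if (values.containsKey(attr)) return;                  // already computed
        if (!deps.containsKey(attr)) throw new IllegalStateException("undefined attribute " + attr);
        if (!inProgress.add(attr)) throw new IllegalStateException("cyclic definition at " + attr);
        for (String d : deps.get(attr)) visit(d, values, inProgress);  // dependencies first
        values.put(attr, rules.get(attr).applyAsInt(values));
        inProgress.remove(attr);
    }
}

final class AttributeDemo {
    static Map<String, Integer> run() {
        AttributeGraph g = new AttributeGraph();
        g.define("E.val",  List.of("E1.val", "E2.val"), v -> v.get("E1.val") + v.get("E2.val"));
        g.define("E1.val", List.of(), v -> 5);
        g.define("E2.val", List.of("E3.val"), v -> v.get("E3.val"));
        g.define("E3.val", List.of("E4.val", "E5.val"), v -> v.get("E4.val") + v.get("E5.val"));
        g.define("E4.val", List.of(), v -> 2);
        g.define("E5.val", List.of(), v -> 3);
        return g.evaluate();
    }
    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The equations are registered root first on purpose: the order in which they are written does not matter, only the dependencies do.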
semantic actions notes cont
Semantic Actions: Notes (Cont.)
  • Synthesized attributes
    • Calculated from attributes of descendants in the parse tree
    • E.val is a synthesized attribute
    • Can always be calculated in a bottom-up order
  • Grammars with only synthesized attributes are called S-attributed grammars
    • Most frequent kinds of grammars
semantic actions top down approach
Semantic Actions: Top-down Approach
  • Recursive-descent interpreter
  • Consider this grammar

S -> E $

E -> T E’

E’ -> + T E’ | - T E’ | ε

T -> F T’

T’ -> * F T’ | / F T’ | ε

F -> id | num | ( E )

  • Needs “type” of non-terminals and tokens
recursive descent interpreter
Recursive-descent interpreter

int T() { switch (tok.kind) {

case ID: case NUM: case LPAREN:

return Tprime( F() );

default: print("expected ID, NUM, or left-paren");

skipto(T_follow); return 0; }}

/* a is the value of the left operand parsed so far */

int Tprime(int a) { switch (tok.kind) {

case TIMES: eat(TIMES); return Tprime(a*F());

case DIVIDE: eat(DIVIDE); return Tprime(a/F());

case PLUS: case MINUS: case RPAREN: case EOF:

return a;

default: /* error handling */ …… }}
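A complete version of this interpreter can be sketched in Java. The sketch makes several assumptions that are not on the slide: it scans characters directly instead of using a separate lexer, it supports only integer literals (no identifiers), and it throws an exception on error instead of skipping to a follow set. The class name `RdInterpreter` is hypothetical.

```java
// Recursive-descent interpreter for
//   E -> T E'    E' -> + T E' | - T E' | epsilon
//   T -> F T'    T' -> * F T' | / F T' | epsilon
//   F -> num | ( E )
final class RdInterpreter {
    private final String src;
    private int pos = 0;

    private RdInterpreter(String src) { this.src = src; }

    static int evaluate(String text) {
        RdInterpreter p = new RdInterpreter(text);
        int v = p.E();
        if (p.peek() != '\0') throw new IllegalArgumentException("unexpected input at " + p.pos);
        return v;
    }

    // Skips blanks and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void eat(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected " + c + " at " + pos);
        pos++;
    }

    private int E() { return Eprime(T()); }

    // a carries the value of the left operand, as in Tprime on the slide.
    private int Eprime(int a) {
        switch (peek()) {
            case '+': eat('+'); return Eprime(a + T());
            case '-': eat('-'); return Eprime(a - T());
            default:  return a;                      // E' -> epsilon
        }
    }

    private int T() { return Tprime(F()); }

    private int Tprime(int a) {
        switch (peek()) {
            case '*': eat('*'); return Tprime(a * F());
            case '/': eat('/'); return Tprime(a / F());
            default:  return a;                      // T' -> epsilon
        }
    }

    private int F() {
        if (peek() == '(') { eat('('); int v = E(); eat(')'); return v; }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number or '(' at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

Passing the left operand down as the parameter `a` is what keeps subtraction and division left-associative even though the grammar itself is right-recursive.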

javacc version
JavaCC version
  • Grammar

S -> E $

E -> T ( + T | - T)*

T -> F ( * F | / F)*

F -> id | num | ( E )

Note: the starred form replaces the recursive rules

E -> T E’

E’ -> + T E’ | - T E’ | ε

javacc version24
JavaCC version (Cont.)

void Start() :

{ int i; }

{ i=Exp() <EOF> { System.out.println(i); }

}

int Exp() :

{ int a, i; }

{ a=Term() ( "+" i=Term() { a=a+i; }

| "-" i=Term() { a=a-i; } )*

{ return a; }

}

int Factor() :

{ Token t; int i; }

{ t=<IDENTIFIER> { return lookup(t.image); }

| t=<INTEGER_LITERAL> { return Integer.parseInt(t.image); }

| "(" i=Exp() ")" { return i; }

}

semantic actions reduce and shift
Semantic Actions – Reduce and Shift
  • We can now illustrate how semantic actions are implemented for LR parsing
  • Keep attributes on the stack
  • On shift a, push attribute for a on stack
  • On reduce X → α
    • pop attributes for α
    • compute attribute for X
    • and push it on the stack
performing semantic actions example
Performing Semantic Actions. Example
  • Recall the example from previous lecture

E → T + E1 { E.val = T.val + E1.val }

| T { E.val = T.val }

T → int * T1 { T.val = int.val * T1.val }

| int { T.val = int.val }

  • Consider the parsing of the string 3 * 5 + 8
performing semantic actions example27
Performing Semantic Actions. Example

| int * int + int          shift
int3 | * int + int         shift
int3 * | int + int         shift
int3 * int5 | + int        reduce T → int
int3 * T5 | + int          reduce T → int * T
T15 | + int                shift
T15 + | int                shift
T15 + int8 |               reduce T → int
T15 + T8 |                 reduce E → T
T15 + E8 |                 reduce E → T + E
E23 |                      accept
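The trace can be replayed with an explicit attribute stack. This is a sketch under stated assumptions: the shift and reduce calls are issued by hand in the order shown in the trace (a real LR parser would choose them from its tables), operator tokens push a dummy 0 because they carry no attribute, and the class `ValueStackDemo` and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ValueStackDemo {
    private final Deque<Integer> values = new ArrayDeque<>();

    // On shift, push the attribute of the shifted token.
    void shiftInt(int lexval) { values.push(lexval); }
    void shiftOperator()      { values.push(0); }   // placeholder: operators have no val

    // On reduce X -> alpha, pop one attribute per symbol of alpha, then push X's attribute.
    void reduceTFromInt() {                 // T -> int          { T.val = int.val }
        int intVal = values.pop();
        values.push(intVal);
    }
    void reduceTFromIntTimesT() {           // T -> int * T1     { T.val = int.val * T1.val }
        int t1 = values.pop();
        values.pop();                       // '*'
        int intVal = values.pop();
        values.push(intVal * t1);
    }
    void reduceEFromT() {                   // E -> T            { E.val = T.val }
        int t = values.pop();
        values.push(t);
    }
    void reduceEFromTPlusE() {              // E -> T + E1       { E.val = T.val + E1.val }
        int e1 = values.pop();
        values.pop();                       // '+'
        int t = values.pop();
        values.push(t + e1);
    }

    int top() { return values.peek(); }

    // Replays the parse of 3 * 5 + 8.
    static int run() {
        ValueStackDemo p = new ValueStackDemo();
        p.shiftInt(3); p.shiftOperator(); p.shiftInt(5);
        p.reduceTFromInt();                 // int3 * T5
        p.reduceTFromIntTimesT();           // T15
        p.shiftOperator(); p.shiftInt(8);
        p.reduceTFromInt();                 // T15 + T8
        p.reduceEFromT();                   // T15 + E8
        p.reduceEFromTPlusE();              // E23
        return p.top();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```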

inherited attributes
Inherited Attributes
  • Another kind of attribute
  • Calculated from attributes of parent and/or siblings in the parse tree
  • Example: a line calculator
a line calculator
A Line Calculator
  • Each line contains an expression

E → int | E + E

  • Each line is terminated with the = sign

L → E = | + E =

  • In second form the value of previous line is used as starting value
  • A program is a sequence of lines

P → ε | P L

attributes for the line calculator
Attributes for the Line Calculator
  • Each E has a synthesized attribute val
    • Calculated as before
  • Each L has a synthesized attribute val

L → E = { L.val = E.val }

| + E = { L.val = E.val + L.prev }

  • We need the value of the previous line
  • We use an inherited attribute L.prev
attributes for the line calculator cont
Attributes for the Line Calculator (Cont.)
  • Each P has a synthesized attribute val
    • The value of its last line

P → ε { P.val = 0 }

| P1 L { P.val = L.val;

L.prev = P1.val }

    • Each L has an inherited attribute prev
    • L.prev is inherited from sibling P1.val
  • Example …
example of inherited attributes
Example of Inherited Attributes

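  • val synthesized
  • prev inherited
  • All can be computed in depth-first order

[Figure: parse tree for a program whose last line has the form + E =. The root P has children P and L. The inner P derives ε with val 0, which L receives as prev. L derives + E3 =, and E3 has children E4, + and E5, which derive int2 (val 2) and int3 (val 3).]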
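The flow of the two attributes can be sketched in Java: `prev` travels down into a line as a method parameter (inherited), while `val` travels up as the return value (synthesized). The class `LineCalc` and its simple string splitting are assumptions for illustration and stand in for a real parser of the line-calculator grammar.

```java
import java.util.List;

final class LineCalc {
    // L -> E =  |  + E =
    // prev is the inherited attribute L.prev; the return value is L.val.
    static int lineVal(String line, int prev) {
        String body = line.trim();
        if (!body.endsWith("=")) throw new IllegalArgumentException("line must end with '='");
        body = body.substring(0, body.length() - 1).trim();
        boolean usesPrev = body.startsWith("+");
        if (usesPrev) body = body.substring(1);
        int eVal = 0;                                   // E.val for E -> int | E + E
        for (String term : body.split("\\+")) eVal += Integer.parseInt(term.trim());
        return usesPrev ? eVal + prev : eVal;
    }

    // P -> epsilon { P.val = 0 }  |  P1 L { L.prev = P1.val; P.val = L.val }
    static int programVal(List<String> lines) {
        if (lines.isEmpty()) return 0;
        int p1Val = programVal(lines.subList(0, lines.size() - 1));
        return lineVal(lines.get(lines.size() - 1), p1Val);
    }

    public static void main(String[] args) {
        System.out.println(programVal(List.of("2 + 3 =", "+ 4 =")));
    }
}
```

Evaluating `P1` before `L` is the depth-first order the slide mentions: the sibling's synthesized value must exist before it can be handed to `L` as `prev`.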

semantic actions notes cont33
Semantic Actions: Notes (Cont.)
  • Semantic actions can be used to build ASTs
  • And many other things as well
    • Also used for type checking, code generation, …
  • Process is called syntax-directed translation
    • Substantial generalization over CFGs
constructing an ast
Constructing An AST
  • We first define the AST data type
    • Supplied by us for the project
  • Consider an abstract tree type with two constructors:

mkleaf(n) = a leaf node labeled n

mkplus(T1, T2) = a PLUS node whose children are the trees T1 and T2

constructing a parse tree
Constructing a Parse Tree
  • We define a synthesized attribute ast
    • Values of ast are ASTs
    • We assume that int.lexval is the value of the integer lexeme
    • Computed using semantic actions

E → int E.ast = mkleaf(int.lexval)

| E1 + E2 E.ast = mkplus(E1.ast, E2.ast)

| ( E1 ) E.ast = E1.ast

parse tree example

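Parse Tree Example
  • Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’
  • A bottom-up evaluation of the ast attribute:

E.ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))

[Figure: the resulting AST, a PLUS node whose children are the leaf 5 and a second PLUS node with leaves 2 and 3.]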

review
Review
  • We can specify language syntax using CFG
  • A parser will answer whether s ∈ L(G)
  • … and will build a parse tree
  • … which we convert to an AST
  • … and pass on to the rest of the compiler
abstract parse trees expression grammar
Abstract Parse Trees: Expression Grammar

Abstract Syntax

E -> E + E

E -> E – E

E -> E * E

E -> E / E

E -> id

E -> num

ast node types
AST : Node types

public abstract class Exp {

public abstract int eval();

}

public class PlusExp extends Exp {

private Exp e1, e2;

public PlusExp(Exp a1, Exp a2) { e1=a1; e2=a2; }

public int eval() {

return e1.eval()+e2.eval();

}

}

public class Identifier extends Exp {

private String f0;

public Identifier(String n0) { f0 = n0; }

public int eval() {

return lookup(f0); // lookup: symbol-table helper, not shown on the slide

}

}

public class IntegerLiteral extends Exp {

private String f0;

public IntegerLiteral(String n0) { f0 = n0; }

public int eval() {

return Integer.parseInt(f0);

}

}

javacc example for ast construction
JavaCC Example for AST construction

Exp Start() :

{ Exp e; }

{ e=Exp() { return e; } }

Exp Exp() :

{ Exp e1, e2; }

{ e1=Term() ( "+" e2=Term() { e1=new PlusExp(e1,e2); }

| "-" e2=Term() { e1=new MinusExp(e1,e2); } )*

{ return e1; }

}

Exp Factor() :

{ Token t; Exp e; }

{ t=<IDENTIFIER> { return new Identifier(t.image); }

| t=<INTEGER_LITERAL>

{ return new IntegerLiteral(t.image); }

| "(" e=Exp() ")" { return e; }

}

positions
Positions
  • Must remember the position in the source file
    • Lexical analysis, parsing and semantic analysis are not done simultaneously.
    • Necessary for error reporting
  • AST must keep the pos fields, which indicate the position within the original source file.
  • Lexer must pass the information to the parser.
  • Ast node constructors must be augmented to init the pos fields.
javacc class token
JavaCC : Class Token
  • Each Token object has the following fields:
    • int kind;
    • int beginLine, beginColumn, endLine, endColumn;
    • String image;
    • Token next;
    • Token specialToken;
    • static final Token newToken(int ofKind);
  • Unfortunately, ….
visitors
Visitors
  • “Syntax separate from interpretation” style of programming
    • Vs. object-oriented style of programming
  • “Visitor pattern”
    • A visitor implements an interpretation.
    • A visitor object contains a visit method for each syntax-tree class.
    • Syntax-tree classes contain “accept” methods.
    • The visitor calls “accept” on a node (in effect asking “what is your class?”); “accept” then calls the matching “visit” method of the visitor.
example expression classes
Example: Expression Classes

public abstract class Exp {

public abstract int accept(Visitor v);

}

public class PlusExp extends Exp {

public Exp e1, e2; // public so that visitors can reach the children

public PlusExp(Exp a1, Exp a2) { e1=a1; e2=a2; }

public int accept(Visitor v) { return v.visit(this); }

}

public class Identifier extends Exp {

public String f0;

public Identifier(String n0) { f0 = n0; }

public int accept(Visitor v) { return v.visit(this); }

}

public class IntegerLiteral extends Exp {

public String f0;

public IntegerLiteral(String n0) { f0 = n0; }

public int accept(Visitor v) { return v.visit(this); }

}

an interpreter visitor
An interpreter visitor

public interface Visitor {

public int visit(PlusExp n);

public int visit(Identifier n);

public int visit(IntegerLiteral n);

}

public class Interpreter implements Visitor {

public int visit(PlusExp n) {

return n.e1.accept(this) + n.e2.accept(this);

}

public int visit(Identifier n) {

return lookup(n.f0); // lookup: symbol-table helper, not shown on the slide

}

public int visit(IntegerLiteral n) {

return Integer.parseInt(n.f0);

}

}
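A self-contained version of the interpreter visitor that compiles and runs as one file is sketched below. Two things are assumptions rather than part of the slides: the visitor interface is named `ExpVisitor` to keep the sketch separate from the MiniJava `Visitor` later in the deck, and `lookup` is replaced by a `Map` passed to the interpreter.

```java
import java.util.HashMap;
import java.util.Map;

interface ExpVisitor {
    int visit(PlusExp n);
    int visit(Identifier n);
    int visit(IntegerLiteral n);
}

abstract class Exp {
    abstract int accept(ExpVisitor v);
}

final class PlusExp extends Exp {
    final Exp e1, e2;
    PlusExp(Exp a1, Exp a2) { e1 = a1; e2 = a2; }
    int accept(ExpVisitor v) { return v.visit(this); }   // dispatches to visit(PlusExp)
}

final class Identifier extends Exp {
    final String f0;
    Identifier(String n0) { f0 = n0; }
    int accept(ExpVisitor v) { return v.visit(this); }
}

final class IntegerLiteral extends Exp {
    final String f0;
    IntegerLiteral(String n0) { f0 = n0; }
    int accept(ExpVisitor v) { return v.visit(this); }
}

// One interpretation of the tree: evaluate it to an int.
final class Interpreter implements ExpVisitor {
    private final Map<String, Integer> env;   // stands in for the slide's lookup()
    Interpreter(Map<String, Integer> env) { this.env = env; }

    public int visit(PlusExp n) { return n.e1.accept(this) + n.e2.accept(this); }

    public int visit(Identifier n) {
        Integer v = env.get(n.f0);
        if (v == null) throw new IllegalStateException("unbound identifier " + n.f0);
        return v;
    }

    public int visit(IntegerLiteral n) { return Integer.parseInt(n.f0); }
}

final class VisitorDemo {
    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 4);
        Exp tree = new PlusExp(new IntegerLiteral("3"), new Identifier("x"));
        System.out.println(tree.accept(new Interpreter(env)));
    }
}
```

A second interpretation, such as a pretty-printer, would be another class implementing `ExpVisitor`; the expression classes would not change.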

abstract syntax for minijava i
Abstract Syntax for MiniJava (I)

package syntaxtree;

Program(MainClass m, ClassDeclList cl)

MainClass(Identifier i1, Identifier i2, Statement s)

----------------------------

abstract class ClassDecl

ClassDeclSimple(Identifier i, VarDeclList vl,

MethodDeclList ml)

ClassDeclExtends(Identifier i, Identifier j,

VarDeclList vl, MethodDeclList ml)

-----------------------------

VarDecl(Type t, Identifier i)

MethodDecl(Type t, Identifier i, FormalList fl,

VarDeclList vl, StatementList sl, Exp e)

Formal(Type t, Identifier i)

abstract syntax for minijava ii
Abstract Syntax for MiniJava (II)

abstract class Type

IntArrayType()

BooleanType()

IntegerType()

IdentifierType(String s)

---------------------------

abstract class Statement

Block(StatementList sl)

If(Exp e, Statement s1, Statement s2)

While(Exp e, Statement s)

Print(Exp e)

Assign(Identifier i, Exp e)

ArrayAssign(Identifier i, Exp e1, Exp e2)

-------------------------------------------

abstract syntax for minijava iii
Abstract Syntax for MiniJava (III)

abstract class Exp

And(Exp e1, Exp e2) LessThan(Exp e1, Exp e2)

Plus(Exp e1, Exp e2) Minus(Exp e1, Exp e2)

Times(Exp e1, Exp e2) Not(Exp e)

ArrayLookup(Exp e1, Exp e2) ArrayLength(Exp e)

Call(Exp e, Identifier i, ExpList el)

IntegerLiteral(int i)

True() False()

IdentifierExp(String s)

This()

NewArray(Exp e) NewObject(Identifier i)

-------------------------------------------------

Identifier(String s)

--list classes-------------------------

ClassDeclList() ExpList() FormalList() MethodDeclList()

StatementList() VarDeclList()

syntax tree nodes details
Syntax Tree Nodes - Details

package syntaxtree;

import visitor.Visitor;

import visitor.TypeVisitor;

public class Program {

public MainClass m;

public ClassDeclList cl;

public Program(MainClass am, ClassDeclList acl) {

m=am; cl=acl;

}

public void accept(Visitor v) {

v.visit(this);

}

public Type accept(TypeVisitor v) {

return v.visit(this);

}

}

classdecl java
ClassDecl.java

package syntaxtree;

import visitor.Visitor;

import visitor.TypeVisitor;

public abstract class ClassDecl {

public abstract void accept(Visitor v);

public abstract Type accept(TypeVisitor v);

}

classdeclextends java
ClassDeclExtends.java

package syntaxtree;

import visitor.Visitor;

import visitor.TypeVisitor;

public class ClassDeclExtends extends ClassDecl {

public Identifier i;

public Identifier j;

public VarDeclList vl;

public MethodDeclList ml;

public ClassDeclExtends(Identifier ai, Identifier aj,

VarDeclList avl, MethodDeclList aml) {

i=ai; j=aj; vl=avl; ml=aml;

}

public void accept(Visitor v) {

v.visit(this);

}

public Type accept(TypeVisitor v) {

return v.visit(this);

}

}

statementlist java
StatementList.java

package syntaxtree;

import java.util.Vector;

public class StatementList {

private Vector list;

public StatementList() {

list = new Vector();

}

public void addElement(Statement n) {

list.addElement(n);

}

public Statement elementAt(int i) {

return (Statement)list.elementAt(i);

}

public int size() {

return list.size();

}

}

package visitor visitor java
Package visitor: Visitor.java

package visitor;

import syntaxtree.*;

public interface Visitor {

public void visit(Program n); public void visit(MainClass n);

public void visit(ClassDeclSimple n); public void visit(ClassDeclExtends n);

public void visit(VarDecl n); public void visit(MethodDecl n);

public void visit(Formal n); public void visit(IntArrayType n);

public void visit(BooleanType n); public void visit(IntegerType n);

public void visit(IdentifierType n); public void visit(Block n);

public void visit(If n); public void visit(While n);

public void visit(Print n); public void visit(Assign n);

public void visit(ArrayAssign n); public void visit(And n);

public void visit(LessThan n); public void visit(Plus n);

public void visit(Minus n); public void visit(Times n);

public void visit(ArrayLookup n); public void visit(ArrayLength n);

public void visit(Call n); public void visit(IntegerLiteral n);

public void visit(True n); public void visit(False n);

public void visit(IdentifierExp n); public void visit(This n);

public void visit(NewArray n); public void visit(NewObject n);

public void visit(Not n); public void visit(Identifier n);

}

x y m 1 4 5
x = y.m(1,4+5)

Statement -> AssignmentStatement

AssignmentStatement -> Identifier1 "=" Expression

Identifier1 -> <IDENTIFIER>

Expression -> Expression1 "." Identifier2 "(" ( ExpList )? ")"

Expression1 -> IdentifierExp

IdentifierExp -> <IDENTIFIER>

Identifier2 -> <IDENTIFIER>

ExpList -> Expression2 ( “,” Expression3 )*

Expression2 -> <INTEGER_LITERAL>

Expression3 -> PlusExp -> Expression “+” Expression

-> <INTEGER_LITERAL> , <INTEGER_LITERAL>

slide55
AST

Statement s ->

Assign(Identifier, Exp)
    Identifier("x")
    Call(Exp, Identifier, ExpList)
        IdentifierExp("y")
        Identifier("m")
        ExpList el (init)
            add: IntegerLiteral(1)
            add: Plus(Exp, Exp)
                IntegerLiteral(4)
                IntegerLiteral(5)

minijava grammar i
MiniJava : Grammar(I)

Program -> MainClass ClassDecl*

Program(MainClass, ClassDeclList)

Program Goal() :

{ MainClass m; ClassDeclList cl = new ClassDeclList();

ClassDecl c;

}

{ m = MainClass() ( c = ClassDecl() { cl.addElement(c); } )*

<EOF> { return new Program(m,cl); }

}

minijava grammar ii
MiniJava : Grammar(II)

MainClass -> class id { public static void main ( String [] id )

      { Statement } }

MainClass(Identifier, Identifier, Statement)

ClassDecl -> class id { VarDecl* MethodDecl* }

-> class id extends id { VarDecl* MethodDecl* }

ClassDeclSimple(…), ClassDeclExtends(…)

VarDecl -> Type id ;

VarDecl(Type, Identifier)

MethodDecl -> public Type id ( FormalList )

       { VarDecl* Statement* return Exp ; }

MethodDecl(Type, Identifier, FormalList, VarDeclList,

StatementList, Exp)

minijava grammar iii
MiniJava : Grammar(III)

FormalList -> Type id FormalRest*

-> ε

FormalRest -> , Type id

Type -> int []

-> boolean

-> int

-> id

minijava grammar iv
MiniJava : Grammar(IV)

Statement -> { Statement* }

-> if ( Exp ) Statement else Statement

-> while ( Exp ) Statement

-> System.out.println ( Exp ) ;

-> id = Exp ;

-> id [ Exp ] = Exp ;

ExpList -> Exp ExpRest*

-> ε

ExpRest -> , Exp

minijava grammar v
MiniJava : Grammar(V)

Exp -> Exp op Exp

-> Exp [ Exp ]

-> Exp . length

-> Exp . id ( ExpList )

-> INTEGER_LITERAL

-> true

-> false

-> id

-> this

-> new int [ Exp ]

-> new id ( )

-> ! Exp

-> ( Exp )

r eferences
References
  • Andrew W. Appel, Modern Compiler Implementation in Java (2nd Edition), Cambridge University Press, 2002
  • http://compiler.kaist.ac.kr/courses/cs420/classtps/Chapter05.pps
  • Modern Compiler Design, Scott Galles, Scott Jones
  • http://www.cs.utsa.edu/~qingyi/cs4713/handouts/AbstractSyntaxTree.ppt
using java tree builder

Using Java Tree Builder

CMSC 431

Shon Vick

lecture outline
Lecture Outline
  • Introduction
  • Syntax Directed Translation
  • Java Virtual Machine
  • Examples
  • Administration
introduction
Introduction
  • The Java Tree Builder (JTB) is a tool used to automatically generate syntax trees with the Java Compiler Compiler (JavaCC) parser generator. 
  • It is based on the Visitor design pattern; see the section entitled “Why Visitors?”
why visitors
Why Visitors?
  • The Visitor pattern is one among many design patterns aimed at making object-oriented systems more flexible
  • The issue addressed by the Visitor pattern is the manipulation of composite objects.
  • Without visitors, such manipulation runs into several problems as illustrated by considering an implementation of integer lists, written in Java
integer lists written in java without generics
Integer lists, written in Java(Without Generics)

interface List {}

class Nil implements List {}

class Cons implements List {

int head;

List tail;

}

What happens when we write a program which computes the sum of all components of a given List object?

first attempt instanceof and type casts
First Attempt: Instanceof and Type Casts

List l;

// The List-object we are working on.

int sum = 0;

// Contains the sum after the loop.

boolean proceed = true;

while (proceed) {

if (l instanceof Nil)

proceed = false;

else if (l instanceof Cons) {

sum = sum + ((Cons) l).head; // Type cast!

l = ((Cons) l).tail; // Type cast!

}

}

What are the problems here?

what are the problems here
What are the problems here?
  • Type Casts?
    • We want static (compile time) type checking
  • Flexible?
    • Probably not well illustrated with this example
second attempt dedicated methods
Second Attempt: Dedicated Methods

interface List {

int sum();

}

class Nil implements List {

public int sum() { return 0; }

}

class Cons implements List {

int head;

List tail;

public int sum() {

return head + tail.sum();

}

}

tradeoffs
Tradeoffs
  • We can compute the sum of all components of a given List object l by writing l.sum().
  • Advantage: the type casts and instanceof operations have disappeared, and the code can be written in a systematic way.
  • Disadvantage: every time we want to perform a new operation on List objects (say, computing the product of all integer parts), new dedicated methods have to be written for all the classes, and the classes must be recompiled.
third attempt the visitor pattern
Third Attempt: The Visitor Pattern.

interface List {

void accept(Visitor v);

}

class Nil implements List {

public void accept(Visitor v) { v.visitNil(this); }

}

class Cons implements List {

int head;

List tail;

public void accept(Visitor v) { v.visitCons(this);

} }

second part of visitor idea
Second Part of Visitor Idea

interface Visitor {

void visitNil(Nil x);

void visitCons(Cons x);

}

class SumVisitor implements Visitor {

int sum = 0;

public void visitNil(Nil x) {}

public void visitCons(Cons x){

sum = sum + x.head;

x.tail.accept(this); } }

summary
Summary
  • Each accept method takes a visitor as argument.
  • The interface Visitor has a header for each of the basic classes.
  • We can now compute and print the sum of all components of a given List-object l by writing

SumVisitor sv = new SumVisitor();

l.accept(sv);

System.out.println(sv.sum);

summary continued
Summary Continued
  • The advantage is that one can write code that manipulates objects of existing classes without recompiling those classes.
  • The price is that all objects must have an accept method.
  • In summary, the Visitor pattern combines the advantages of the two other approaches
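The claim that a new operation needs no change to the list classes can be illustrated with the product example mentioned under Tradeoffs. In this sketch the interfaces are renamed `IntList` and `ListVisitor` (the slides call them `List` and `Visitor`) to avoid clashing with `java.util.List`, and `Cons` is given a constructor; `ProductVisitor` is the only class that would be new.

```java
interface ListVisitor {
    void visitNil(Nil x);
    void visitCons(Cons x);
}

interface IntList {
    void accept(ListVisitor v);
}

final class Nil implements IntList {
    public void accept(ListVisitor v) { v.visitNil(this); }
}

final class Cons implements IntList {
    final int head;
    final IntList tail;
    Cons(int head, IntList tail) { this.head = head; this.tail = tail; }
    public void accept(ListVisitor v) { v.visitCons(this); }
}

// The visitor from the slides, unchanged apart from the renamed interface.
final class SumVisitor implements ListVisitor {
    int sum = 0;
    public void visitNil(Nil x) { }
    public void visitCons(Cons x) { sum = sum + x.head; x.tail.accept(this); }
}

// A new operation: added without editing or recompiling Nil, Cons or IntList.
final class ProductVisitor implements ListVisitor {
    int product = 1;
    public void visitNil(Nil x) { }
    public void visitCons(Cons x) { product = product * x.head; x.tail.accept(this); }
}

final class ListDemo {
    public static void main(String[] args) {
        IntList l = new Cons(2, new Cons(3, new Cons(4, new Nil())));
        SumVisitor sv = new SumVisitor();
        l.accept(sv);
        ProductVisitor pv = new ProductVisitor();
        l.accept(pv);
        System.out.println(sv.sum + " " + pv.product);
    }
}
```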
summary table

Summary Table

Approach                     Frequent type casts?   Frequent recompilation?
Instanceof and type casts    Yes                    No
Dedicated methods            No                     Yes
The Visitor pattern          No                     No
overview of generated files
Overview of Generated Files
  • To begin using JTB, simply run it using your grammar file as an argument
    • Run JTB without any arguments for a list of options.
  • This will generate an augmented grammar file, as well as the needed classes
details
Details
  • jtb.out.jj, the original grammar file, now with syntax tree building actions inserted
  • The subdirectory/package syntaxtree, which contains a Java class for each production in the grammar
  • The subdirectory/package visitor, which contains Visitor.java, the default visitor interface, and also
    • DepthFirstVisitor.java, a default implementation which visits each node of the tree in depth-first order.
  • ObjectVisitor.java, another default visitor interface that supports a return value and an argument.
    • ObjectDepthFirst.java is a default implementation of ObjectVisitor.java.
general instructions
General Instructions
  • To generate your parser, simply run JavaCC using jtb.out.jj as the grammar file. 
  • Let's take a look at all the files and directories JTB generates. 
the grammar file
The grammar file
  • Named jtb.out.jj 
  • This file is the same as the input grammar file except that it now contains code for building the syntax tree during parse. 
  • Typically, this file can be left alone after generation. 
  • The only thing that needs to be done to it is to run it through JavaCC to generate your parser
the syntax tree node classes
The syntax tree node classes
  • This directory contains syntax tree node classes generated based on the productions in your JavaCC grammar. 
  • Each production will have its own class.  If your grammar contains 42 productions, this directory will contain 42 classes (plus the special automatically generated nodes--these will be discussed later), with names corresponding to the left-hand side names of the productions. 
  • Like jtb.out.jj, after generation these files don't need to be edited.  Generate them once, compile them once, and forget about them
example
Example
  • Let's examine one of the classes generated from a production.  Take, for example, the following production

void ImportDeclaration() :

{}

{ "import" Name() [ "." "*" ] ";"

}

what gets produced part 1
What gets produced?Part 1

// Generated by JTB 1.1.2 //

package syntaxtree;

/**

* Grammar production:

* f0 -> "import"

* f1 -> Name()

* f2 -> [ "." "*" ]

* f3 -> ";"

*/

public class ImportDeclaration implements Node {

public NodeToken f0;

public Name f1;

public NodeOptional f2;

public NodeToken f3;

All parts of a production

are represented in the tree,

including tokens.

the syntax tree classes
The Syntax Tree Classes
  • Notice the package "syntaxtree". 
  • The purpose of separating the generated tree node classes into their own package is that it greatly simplifies file organization, particularly when the grammar contains a large number of productions. 
  • It’s often not necessary to pay the syntax classes any more attention.  All of the work is done in the visitor classes. 
  • Note that this class implements an interface named Node.   
slide86
Node
  • The interface Node is implemented by all syntax tree nodes. Node looks like this: 

public interface Node extends java.io.Serializable {

public void accept(visitor.Visitor v);

public Object accept(visitor.ObjectVisitor v,

Object argu);

}

nodes and accept
Nodes and Accept
  • All tree node classes implement the accept() method. 
    • In the case of all the automatically-generated classes, the accept() method simply calls the corresponding visit(XXXX n) (where XXXX is the name of the production) method of the visitor passed to it.  
    • Note that the visit() methods are overloaded, i.e. the distinguishing feature is the argument each takes, as opposed to its name. 
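The dispatch described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the real generated classes live in the syntaxtree and visitor packages, while the types below are cut-down stand-ins nested in one hypothetical class (AcceptSketch), and TraceVisitor is an invented visitor.

```java
// Simplified stand-ins for JTB-generated types (assumption: the real classes
// live in the syntaxtree and visitor packages and carry more fields).
class AcceptSketch {
    interface Node { void accept(Visitor v); }

    // One visit() overload per node class; the argument type selects the overload.
    interface Visitor {
        void visit(Name n);
        void visit(ImportDeclaration n);
    }

    static class Name implements Node {
        final String image;
        Name(String image) { this.image = image; }
        public void accept(Visitor v) { v.visit(this); } // binds to visit(Name)
    }

    static class ImportDeclaration implements Node {
        final Name f1;
        ImportDeclaration(Name f1) { this.f1 = f1; }
        public void accept(Visitor v) { v.visit(this); } // binds to visit(ImportDeclaration)
    }

    // Hypothetical visitor that records the order in which nodes are visited.
    static class TraceVisitor implements Visitor {
        final StringBuilder trace = new StringBuilder();
        public void visit(Name n) {
            trace.append("Name(").append(n.image).append(")");
        }
        public void visit(ImportDeclaration n) {
            trace.append("ImportDeclaration>");
            n.f1.accept(this); // descend into the child through accept()
        }
    }

    static String trace(Node root) {
        TraceVisitor v = new TraceVisitor();
        root.accept(v); // the caller only knows it holds a Node
        return v.trace.toString();
    }

    public static void main(String[] args) {
        Node root = new ImportDeclaration(new Name("java.util"));
        System.out.println(trace(root));
    }
}
```

The caller holds only a Node reference; each accept() supplies the concrete type that picks the right visit() overload.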
two new features
Two New Features
  • Two features introduced in JTB 1.2 may be helpful:
    • First, Node extends java.io.Serializable, so you can now serialize your trees (or subtrees) to an output stream and read them back in.
    • Second, there is an accept() overload that takes an extra argument and returns a value.
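As a rough sketch of the first feature, a tree whose nodes are Serializable can be written out and read back with the standard java.io object streams. The node classes here (NodeToken, Pair) and the wrapper class SerializeSketch are simplified, hypothetical stand-ins rather than the generated classes; only the java.io calls are the real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializeSketch {
    // Stand-in for the generated Node interface, which extends Serializable.
    interface Node extends Serializable { }

    static class NodeToken implements Node {
        private static final long serialVersionUID = 1L;
        final String tokenImage;
        NodeToken(String s) { tokenImage = s; }
        @Override public String toString() { return tokenImage; }
    }

    // Hypothetical node with two children, used only to form a small tree.
    static class Pair implements Node {
        private static final long serialVersionUID = 1L;
        final Node f0, f1;
        Pair(Node f0, Node f1) { this.f0 = f0; this.f1 = f1; }
    }

    // Serializes the tree to a byte array and reads it back as a new tree.
    static Node copy(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    static String render(Pair p) { return p.f0 + " " + p.f1; }

    public static void main(String[] args) {
        Pair tree = new Pair(new NodeToken("import"), new NodeToken(";"));
        Pair restored = (Pair) copy(tree);
        System.out.println(render(restored));
    }
}
```

Writing to a file instead of a byte array only changes the underlying stream; the writeObject() and readObject() calls stay the same.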
nodelistinterface
NodeListInterface
  • The interface NodeListInterface is implemented by NodeList, NodeListOptional, and NodeSequence.   NodeListInterface looks like this: 

public interface NodeListInterface extends Node {
    public void addNode(Node n);
    public Node elementAt(int i);
    public java.util.Enumeration elements();
    public int size();
}

details91
Details
  • This interface is not generally needed, but it can be useful when writing code that deals only with the Vector-like functionality of any of the three classes listed above.
    • addNode() is used by the tree-building code to add nodes to the list.
    • elements() is similar to the method of the same name in Vector, returning an Enumeration of the elements in the list.
    • elementAt() returns the node at the ith position in the list (starting at 0).
    • size() returns the number of elements in the list.
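A small sketch of code written against the interface rather than a concrete list class. The types are simplified stand-ins collected in a hypothetical ListSketch class; Enumeration and Vector are parameterized with generics here, which is an assumption made for type safety (the listing above uses the raw types). The join() helper is invented for illustration.

```java
import java.util.Enumeration;
import java.util.Vector;

class ListSketch {
    interface Node { }

    // Same four methods as the interface shown above, minus accept().
    interface NodeListInterface extends Node {
        void addNode(Node n);
        Node elementAt(int i);
        Enumeration<Node> elements();
        int size();
    }

    static class NodeToken implements Node {
        final String tokenImage;
        NodeToken(String s) { tokenImage = s; }
        @Override public String toString() { return tokenImage; }
    }

    // Minimal Vector-backed list, standing in for NodeList.
    static class NodeList implements NodeListInterface {
        final Vector<Node> nodes = new Vector<>();
        public void addNode(Node n) { nodes.addElement(n); }
        public Node elementAt(int i) { return nodes.elementAt(i); }
        public Enumeration<Node> elements() { return nodes.elements(); }
        public int size() { return nodes.size(); }
    }

    // Hypothetical helper: accepts any NodeListInterface implementation.
    static String join(NodeListInterface list) {
        StringBuilder sb = new StringBuilder();
        for (Enumeration<Node> e = list.elements(); e.hasMoreElements(); ) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.nextElement());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NodeList list = new NodeList();
        list.addNode(new NodeToken("["));
        list.addNode(new NodeToken("]"));
        System.out.println(list.size() + " nodes: " + join(list));
    }
}
```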
nodechoice
NodeChoice
  • NodeChoice is the class which JTB uses to represent choice points in a grammar.  An example of this would be 

( "abstract" | "final" | "public" )

  • JTB would represent the production 

void ResultType() :
{}
{ "void" | Type() }

    • as a class ResultType with a single child of type NodeChoice. 
details93
Details
  • The type stored by this NodeChoice would not be determined until the file was actually parsed. 
  • The node stored by a NodeChoice would then be accessible through the choice field. 
  • Since the choice is of type Node, typecasts are sometimes necessary to access the fields of the node stored in a NodeChoice. 
implementation
Implementation

public class NodeChoice implements Node {
    public NodeChoice(Node node, int whichChoice);
    public void accept(visitor.Visitor v);
    public Object accept(visitor.ObjectVisitor v, Object argu);

    public Node choice;
    public int which;
}

which one
Which One?
  • NodeChoice also has a which field that records which of the choices was selected.
    • If the first choice is selected, which equals 0 (following the programming custom of counting from 0).
    • If the second choice is taken, which equals 1; the third choice would be 2, and so on.
    • Note that your code could break if the order of the choices is changed in the grammar.
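A sketch of how code might branch on which and cast choice, using the ResultType production ("void" | Type()) as the example. The classes are simplified stand-ins inside a hypothetical ChoiceSketch class; in particular, Type here is an invented one-field placeholder for whatever class the Type() production would generate.

```java
class ChoiceSketch {
    interface Node { }

    static class NodeToken implements Node {
        final String tokenImage;
        NodeToken(String s) { tokenImage = s; }
    }

    // Hypothetical placeholder for the class generated from the Type() production.
    static class Type implements Node {
        final String name;
        Type(String name) { this.name = name; }
    }

    static class NodeChoice implements Node {
        final Node choice; // the node that was actually parsed
        final int which;   // 0-based index of the alternative taken
        NodeChoice(Node node, int whichChoice) {
            choice = node;
            which = whichChoice;
        }
    }

    // Alternatives of ResultType, in grammar order: 0 = "void", 1 = Type().
    static String describe(NodeChoice c) {
        switch (c.which) {
            case 0:
                return "void";
            case 1:
                // choice is declared as Node, so a cast is needed to reach Type's fields.
                return "type " + ((Type) c.choice).name;
            default:
                throw new IllegalStateException("unexpected choice " + c.which);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new NodeChoice(new NodeToken("void"), 0)));
        System.out.println(describe(new NodeChoice(new Type("int"), 1)));
    }
}
```

The indices are tied to the order of the alternatives in the grammar, which is why reordering them silently changes the meaning of each case.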
nodelist
NodeList
  • NodeList is the class used by JTB to represent lists.  An example of a list would be 

( "[" Expression() "]" )+

  • JTB would represent the JavaCC production

void ArrayDimensions() :
{}
{ ( "[" Expression() "]" )+ ( "[" "]" )* }

    • as a class ArrayDimensions with children NodeList and NodeListOptional, respectively.
details97
Details
  • A NodeList uses a java.util.Vector to store its list of nodes.
  • Like NodeChoice, typecasts may occasionally be necessary to access fields of nodes contained in the list. 
implementation98
Implementation

public class NodeList implements NodeListInterface {
    public NodeList();
    public void addNode(Node n);
    public Enumeration elements();
    public Node elementAt(int i);
    public int size();
    public void accept(visitor.Visitor v);
    public Object accept(visitor.ObjectVisitor v, Object argu);

    public Vector nodes;
}

nodetoken
NodeToken
  • This class is used by JTB to store all tokens in the tree, including JavaCC "special tokens" (if the -tk command-line option is used).
  • In addition, each NodeToken contains information about its token, including its starting and ending column and line numbers.
implementation100
Implementation

public class NodeToken implements Node {
    public NodeToken(String s);
    public NodeToken(String s, int kind, int beginLine,
                     int beginColumn, int endLine, int endColumn);
    public String toString();
    public void accept(visitor.Visitor v);
    public Object accept(visitor.ObjectVisitor v, Object argu);

    // -1 for these ints means no position info is available.
    // …
}

continued
Continued

public class NodeToken implements Node {
    // …
    public String tokenImage;
    public int beginLine, beginColumn, endLine, endColumn;
    // -1 if not available.
    // Equal to the JavaCC token "kind" integer.
    public int kind;

    // Special Token methods below
    public NodeToken getSpecialAt(int i);
    public int numSpecials();
    public void addSpecial(NodeToken s);
    public void trimSpecials();
    public String withSpecials();
    public Vector specialTokens;
}

token details
Token Details
  • The tokens are simply stored as strings. 
  • The field tokenImage can be accessed directly, and the toString() method returns the same string. 
  • Also available is the kind integer. 
  • JavaCC assigns each type of token a unique integer to identify it. 
  • This integer is now available in each JTB NodeToken.  For more information on using the kind integer, see the JavaCC documentation. 
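A sketch of reading a token's image, kind, and position. This NodeToken is a cut-down stand-in; treating the one-argument constructor as filling every int with -1 is an assumption based on the "-1 if not available" comments. The constants IMPORT and SEMICOLON are invented values; in a real parser the kind values come from the constants interface that JavaCC generates.

```java
class TokenSketch {
    // Hypothetical kind values, standing in for JavaCC-generated constants.
    static final int IMPORT = 1;
    static final int SEMICOLON = 2;

    static class NodeToken {
        final String tokenImage;
        final int kind;
        final int beginLine, beginColumn, endLine, endColumn;

        // Assumption: with only an image, kind and positions are unavailable (-1).
        NodeToken(String s) { this(s, -1, -1, -1, -1, -1); }

        NodeToken(String s, int kind, int beginLine,
                  int beginColumn, int endLine, int endColumn) {
            this.tokenImage = s;
            this.kind = kind;
            this.beginLine = beginLine;
            this.beginColumn = beginColumn;
            this.endLine = endLine;
            this.endColumn = endColumn;
        }

        // Returns the same string as the tokenImage field.
        @Override public String toString() { return tokenImage; }
    }

    // Hypothetical helper that formats a token for a diagnostic message.
    static String describe(NodeToken t) {
        String where = t.beginLine < 0
                ? "unknown position"
                : "line " + t.beginLine + ", column " + t.beginColumn;
        return "'" + t + "' kind=" + t.kind + " at " + where;
    }

    public static void main(String[] args) {
        System.out.println(describe(new NodeToken("import", IMPORT, 3, 1, 3, 6)));
        System.out.println(describe(new NodeToken(";")));
    }
}
```

Comparing kind against named constants is usually more robust than comparing tokenImage strings, since several token types can share an image pattern.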
member variables of generated classes
Member Variables of Generated Classes
  • Next come the member variables of the ImportDeclaration class.
  • These are generated based on the RHS of the production.  Their type depends on the various items in the RHS and their names begin with f0 and work their way up. 
  • Why are they public? 
    • Visitors which must access these fields reside in a different package than the syntax tree nodes
    • Package visibility cannot be used. 
    • Breaking encapsulation was a necessary evil in this case. 
what gets produced part 2
What gets produced? Part 2

    public ImportDeclaration(NodeToken n0, Name n1, NodeOptional n2, NodeToken n3) {
        f0 = n0;
        f1 = n1;
        f2 = n2;
        f3 = n3;
    }

    public ImportDeclaration(Name n0, NodeOptional n1) {
        f0 = new NodeToken("import");
        f1 = n0;
        f2 = n1;
        f3 = new NodeToken(";");
    }

constructors
Constructors
  • The next portion of the generated class is the standard constructor.  It is called from the tree-building actions in the annotated grammar, so you will probably not need to use it.
  • Following the first constructor is a convenience constructor with the constant tokens of the production already filled in by the appropriate NodeToken.  Its purpose is to help in manual construction of syntax trees.
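A sketch of building a tree by hand with the convenience constructor. All classes are simplified stand-ins in a hypothetical BuildSketch class: Name is reduced to a single token, and the optional [ "." "*" ] part is represented by one ".*" token rather than the sequence of two tokens a real tree would hold. The render() helper is invented for the demonstration.

```java
class BuildSketch {
    interface Node { }

    static class NodeToken implements Node {
        final String tokenImage;
        NodeToken(String s) { tokenImage = s; }
        @Override public String toString() { return tokenImage; }
    }

    // Simplified: the real Name production has its own children.
    static class Name implements Node {
        final NodeToken f0;
        Name(NodeToken f0) { this.f0 = f0; }
    }

    static class NodeOptional implements Node {
        final Node node; // null when the optional part is absent
        NodeOptional() { this(null); }
        NodeOptional(Node n) { node = n; }
        boolean present() { return node != null; }
    }

    static class ImportDeclaration implements Node {
        final NodeToken f0;
        final Name f1;
        final NodeOptional f2;
        final NodeToken f3;

        // Standard constructor: every child is supplied by the caller.
        ImportDeclaration(NodeToken n0, Name n1, NodeOptional n2, NodeToken n3) {
            f0 = n0; f1 = n1; f2 = n2; f3 = n3;
        }

        // Convenience constructor: the constant tokens are filled in.
        ImportDeclaration(Name n0, NodeOptional n1) {
            this(new NodeToken("import"), n0, n1, new NodeToken(";"));
        }
    }

    // Hypothetical helper that prints the declaration back as source text.
    static String render(ImportDeclaration d) {
        String star = d.f2.present() ? d.f2.node.toString() : "";
        return d.f0 + " " + d.f1.f0 + star + d.f3;
    }

    public static void main(String[] args) {
        Name name = new Name(new NodeToken("java.util"));
        System.out.println(render(
                new ImportDeclaration(name, new NodeOptional(new NodeToken(".*")))));
        System.out.println(render(new ImportDeclaration(name, new NodeOptional())));
    }
}
```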
what gets produced part 3
What gets produced? Part 3

    public void accept(visitor.Visitor v) {
        v.visit(this);
    }

    public Object accept(visitor.ObjectVisitor v, Object argu) {
        return v.visit(this, argu);
    }
}

the accept methods
The Accept Methods
  • After the constructor are the accept() methods. 
  • These methods are the way in which visitors interact with the class. 

void accept(visitor.Visitor v)

    • works with Visitor

Object accept(visitor.ObjectVisitor v, Object argu)

    • works with ObjectVisitor
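To illustrate the second form of accept(), here is a sketch in which the extra argument carries information down the tree and the return value carries a result back up. The types are simplified stand-ins in a hypothetical ObjectVisitorSketch class; Sequence and DepthVisitor are invented for the example, and only the shape of accept(ObjectVisitor, Object) follows the signature above.

```java
class ObjectVisitorSketch {
    interface Node { Object accept(ObjectVisitor v, Object argu); }

    interface ObjectVisitor {
        Object visit(NodeToken n, Object argu);
        Object visit(Sequence n, Object argu);
    }

    static class NodeToken implements Node {
        final String tokenImage;
        NodeToken(String s) { tokenImage = s; }
        public Object accept(ObjectVisitor v, Object argu) { return v.visit(this, argu); }
    }

    // Hypothetical node with two children.
    static class Sequence implements Node {
        final Node f0, f1;
        Sequence(Node f0, Node f1) { this.f0 = f0; this.f1 = f1; }
        public Object accept(ObjectVisitor v, Object argu) { return v.visit(this, argu); }
    }

    // argu holds the depth of the current node; the result is the deepest level reached.
    static class DepthVisitor implements ObjectVisitor {
        public Object visit(NodeToken n, Object argu) {
            return argu; // a leaf contributes its own depth
        }
        public Object visit(Sequence n, Object argu) {
            int next = ((Integer) argu) + 1; // children sit one level deeper
            int left = (Integer) n.f0.accept(this, next);
            int right = (Integer) n.f1.accept(this, next);
            return Math.max(left, right);
        }
    }

    static int depth(Node root) {
        return (Integer) root.accept(new DepthVisitor(), 0);
    }

    public static void main(String[] args) {
        Node tree = new Sequence(new NodeToken("a"),
                new Sequence(new NodeToken("b"), new NodeToken("c")));
        System.out.println(depth(tree));
    }
}
```

Because both the argument and the result are typed as Object, casts like the ones in DepthVisitor are the price of this flexibility.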
other options
Other Options?
  • Yes, you could use JJTree
  • How does it compare?
references
References
  • Java Tree Builder Documentation
  • Why Visitors?
  • Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995