
Abstract Syntax Trees

Abstract Syntax Trees. Compiler Baojian Hua bjhua@ustc.edu.cn. Front End. lexical analyzer. source code. tokens. abstract syntax tree. parser. semantic analyzer. IR. Recap. Lexer Program source to token sequence Parser token sequence, and answer Y or N Today’s topic:


Presentation Transcript


  1. Abstract Syntax Trees. Compiler. Baojian Hua, bjhua@ustc.edu.cn

  2. Front End: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

  3. Recap
     • Lexer
       • turns the program source into a token sequence
     • Parser
       • takes the token sequence and answers Y or N (does it conform to the grammar?)
     • Today’s topic:
       • abstract syntax trees

  4. Abstract Syntax Trees
     [Figure: parse tree for 15 * ( 3 + 4 )]
     • Parse trees encode the grammatical structure of the source program
     • However, they contain a lot of unnecessary information
     • What is essential here?

  5. Abstract Syntax Trees
     [Figure: parse tree for 15 * ( 3 + 4 )]
     • For the compiler to understand an expression, it only needs to know the operators and operands
       • punctuation, parentheses, etc. are not needed
     • The same holds for statements, functions, etc.

  6. Abstract Syntax Trees
     [Figure: parse tree for 15 * ( 3 + 4 ) next to its abstract syntax tree]
     Abstract syntax tree: Times (Int 15, Plus (Int 3, Int 4))
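
As a rough illustration of why operators and operands are enough, the sketch below builds the abstract syntax tree for 15 * (3 + 4) and evaluates it. The names `struct node`, `mk`, `sample` and `eval` are assumptions made for this note, not part of the slides.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds mirroring the Int / Plus / Times labels in the figure. */
enum kind { K_INT, K_PLUS, K_TIMES };

struct node {
    enum kind kind;
    int value;                  /* used only when kind == K_INT */
    struct node *left, *right;  /* used only for K_PLUS and K_TIMES */
};

/* Allocate one node; leaves pass NULL for both children. */
static struct node *mk(enum kind kind, int value,
                       struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* AST for 15 * (3 + 4): the parentheses survive only as the tree shape. */
static struct node *sample(void)
{
    return mk(K_TIMES, 0,
              mk(K_INT, 15, NULL, NULL),
              mk(K_PLUS, 0,
                 mk(K_INT, 3, NULL, NULL),
                 mk(K_INT, 4, NULL, NULL)));
}

/* Evaluate by structural recursion over operators and operands. */
static int eval(const struct node *n)
{
    switch (n->kind) {
    case K_INT:   return n->value;
    case K_PLUS:  return eval(n->left) + eval(n->right);
    case K_TIMES: return eval(n->left) * eval(n->right);
    }
    return 0; /* not reached for well-formed trees */
}
```

Calling `eval(sample())` walks five nodes; no node exists for "(" or ")", because grouping is already recorded by Plus being a child of Times.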

  7. Concrete and Abstract Syntax
     • Concrete syntax is needed for parsing
       • it includes punctuation symbols and reflects factoring and the elimination of left recursion
       • it depends on the format of the input
     • Abstract syntax is a simpler, more convenient internal representation
       • it is a clean interface between the parser and the later phases of the compiler

  8. Concrete and Abstract Syntax
     Input: 2 + 3 * x
     Concrete grammar:
       E ::= E + T | T
       T ::= T * F | F
       F ::= id | num | ( E )
     [Figure: parse tree for 2 + 3 * x under this grammar, rooted at S]

  9. Concrete and Abstract Syntax
     Input: 2 + 3 * x
     Abstract grammar:
       E ::= id | num | E + E | E * E | ( E )
     Abstract syntax tree: Plus (Int 2, Times (Int 3, Id x))
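
To make the split concrete, here is a minimal sketch of a hand-written recursive-descent parser that follows the layered E / T / F grammar but returns trees of the abstract shape. It is an illustration under assumptions, not the course's parser: identifiers are single letters, errors simply trip an `assert`, and all names (`parse`, `show`, `struct node`, ...) are invented for this note. The left-recursive rules are written as loops.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum kind { K_INT, K_ID, K_PLUS, K_TIMES };

struct node {
    enum kind kind;
    int value;                  /* K_INT */
    char name;                  /* K_ID: single-letter identifiers only */
    struct node *left, *right;  /* K_PLUS, K_TIMES */
};

static const char *p;           /* cursor into the input string */

static struct node *mk(enum kind kind, struct node *l, struct node *r)
{
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->left = l;
    n->right = r;
    return n;
}

static void skip(void) { while (*p == ' ') p++; }

static struct node *parse_e(void);

/* F ::= id | num | ( E ) */
static struct node *parse_f(void)
{
    skip();
    if (*p == '(') {
        p++;
        struct node *e = parse_e();
        skip();
        assert(*p == ')');
        p++;
        return e;               /* the parentheses leave no node behind */
    }
    if (isdigit((unsigned char)*p)) {
        struct node *n = mk(K_INT, NULL, NULL);
        while (isdigit((unsigned char)*p))
            n->value = n->value * 10 + (*p++ - '0');
        return n;
    }
    assert(isalpha((unsigned char)*p));
    struct node *n = mk(K_ID, NULL, NULL);
    n->name = *p++;
    return n;
}

/* T ::= T * F | F, with the left recursion turned into a loop */
static struct node *parse_t(void)
{
    struct node *left = parse_f();
    for (skip(); *p == '*'; skip()) {
        p++;
        left = mk(K_TIMES, left, parse_f());
    }
    return left;
}

/* E ::= E + T | T, same pattern */
static struct node *parse_e(void)
{
    struct node *left = parse_t();
    for (skip(); *p == '+'; skip()) {
        p++;
        left = mk(K_PLUS, left, parse_t());
    }
    return left;
}

static struct node *parse(const char *src) { p = src; return parse_e(); }

/* Append a textual form of the tree to out (caller provides enough room). */
static void show(const struct node *n, char *out)
{
    char buf[32];
    switch (n->kind) {
    case K_INT: snprintf(buf, sizeof buf, "Int %d", n->value); strcat(out, buf); break;
    case K_ID:  snprintf(buf, sizeof buf, "Id %c", n->name);   strcat(out, buf); break;
    case K_PLUS:
    case K_TIMES:
        strcat(out, n->kind == K_PLUS ? "Plus (" : "Times (");
        show(n->left, out);
        strcat(out, ", ");
        show(n->right, out);
        strcat(out, ")");
        break;
    }
}
```

The nonterminals T and F exist only to encode precedence for the parser; the resulting tree has just the four abstract node kinds.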

  10. AST Data Structures
      • In the compiler, abstract syntax uses the data structures of the implementation language to represent the grammatical structure
      • Highly dependent on the target and implementation languages
        • more art than science

  11. AST in SML
      E ::= id | num | E + E | E * E | ( E )
      (* data structures *)
      datatype exp
        = Int of int
        | Id of string
        | Add of exp * exp
        | Times of exp * exp
      (* to encode "2+3*x" *)
      val prog = Add (Int 2, Times (Int 3, Id "x"))
      (* Compile "2+3*x". To be covered later… *)
      val x86 = compile (prog)

  12. AST in SML
      (* calculate the number of nodes in an AST *)
      fun numNodes e =
        case e of
          Int _ => 1
        | Id _ => 1
        | Add (e1, e2) => 1 + numNodes e1 + numNodes e2
        | Times (e1, e2) => 1 + numNodes e1 + numNodes e2
      (* Note this may be too inefficient, why? *)

  13. AST in SML
      (* accumulator-passing version; the last call in each branch is a tail call *)
      fun numNodes (e, n) =
        case e of
          Int _ => 1 + n
        | Id _ => 1 + n
        | Add (e1, e2) =>
            let val n' = numNodes (e1, n)
            in numNodes (e2, 1 + n')
            end
        | Times (e1, e2) => … (* similar *)
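
The same accumulator-passing idea can be sketched in C. This is only an analogue of the SML code above, with hypothetical names (`struct node`, `count`, `sample_count`); whether a C compiler actually turns the final call into a jump is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

enum kind { K_INT, K_ID, K_ADD, K_TIMES };

struct node {
    enum kind kind;
    struct node *left, *right;  /* NULL for K_INT and K_ID */
};

/* n is the number of nodes counted so far;
   the result is the count after also visiting e. */
static int count(const struct node *e, int n)
{
    assert(e != NULL);
    switch (e->kind) {
    case K_INT:
    case K_ID:
        return 1 + n;
    case K_ADD:
    case K_TIMES: {
        int n1 = count(e->left, n);        /* ordinary recursive call */
        return count(e->right, 1 + n1);    /* call in tail position */
    }
    }
    return n; /* not reached for well-formed trees */
}

/* Count the nodes of the tree for "2+3*x", built on the stack. */
static int sample_count(void)
{
    struct node two   = { K_INT, NULL, NULL };
    struct node three = { K_INT, NULL, NULL };
    struct node x     = { K_ID, NULL, NULL };
    struct node times = { K_TIMES, &three, &x };
    struct node add   = { K_ADD, &two, &times };
    return count(&add, 0);
}
```

The caller starts the accumulator at 0, as in `count(&add, 0)`.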

  14. AST in SML
      (* yet another version, using a reference *)
      val nodes = ref 0;
      val op ++ = fn x => x := !x + 1
      fun numNodes e =
        case e of
          Int _ => ++ nodes
        | Id _ => ++ nodes
        | Add (e1, e2) => (numNodes e1 ; ++ nodes ; numNodes e2)
        | Times (e1, e2) => … (* similar *)

  15. AST in C
      E ::= id | num | E + E | E * E | ( E )
      /* data structures */
      typedef struct exp *exp;
      enum expKind {INT, ID, ADD, TIMES};
      struct exp {
        enum expKind kind;
        union {
          int i;
          char *id;
          struct {exp e1; exp e2;} add;
          struct {exp e1; exp e2;} times;
        } u;
      };

  16. AST in C
      E ::= id | num | E + E | E * E | ( E )
      /* sample program "2+3*x" */
      exp e1 = malloc (sizeof (*e1));
      e1->kind = INT;
      e1->u.i = 3;
      exp e2 = malloc (sizeof (*e2));
      e2->kind = ID;
      e2->u.id = "x";
      exp e3 = malloc (sizeof (*e3));
      e3->kind = TIMES;
      e3->u.times.e1 = e1;
      e3->u.times.e2 = e2;
      …
      /* really boring and error-prone :-( */
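
One common way to tame this boilerplate is a small constructor function per node kind. The sketch below reuses the `struct exp` layout from the data-structure slide; the helpers `newNode`, `newInt`, `newId`, `newAdd`, `newTimes` and `sample` are hypothetical names introduced here, not part of the slides.

```c
#include <assert.h>
#include <stdlib.h>

/* Data structures as on the "AST in C" data-structure slide. */
typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
    enum expKind kind;
    union {
        int i;
        char *id;
        struct {exp e1; exp e2;} add;
        struct {exp e1; exp e2;} times;
    } u;
};

/* Allocate a node and set its kind; the caller fills in the payload. */
static exp newNode(enum expKind kind)
{
    exp e = malloc(sizeof(*e));
    assert(e != NULL);
    e->kind = kind;
    return e;
}

static exp newInt(int i)
{
    exp e = newNode(INT);
    e->u.i = i;
    return e;
}

static exp newId(char *id)
{
    exp e = newNode(ID);
    e->u.id = id;   /* the string is borrowed, not copied */
    return e;
}

static exp newAdd(exp e1, exp e2)
{
    exp e = newNode(ADD);
    e->u.add.e1 = e1;
    e->u.add.e2 = e2;
    return e;
}

static exp newTimes(exp e1, exp e2)
{
    exp e = newNode(TIMES);
    e->u.times.e1 = e1;
    e->u.times.e2 = e2;
    return e;
}

/* "2+3*x" now reads almost like the SML version. */
static exp sample(void)
{
    return newAdd(newInt(2), newTimes(newInt(3), newId("x")));
}
```

Each field is assigned in exactly one place, so a slip such as writing a child into the wrong node can only happen inside a constructor, where it is easy to spot.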

  17. AST in C
      /* number of nodes again */
      int numNodes (exp e)
      {
        switch (e->kind) {
        case INT: return 1;
        case ID: return 1;
        case ADD:
        case TIMES:
          return 1 + numNodes(e->u.add.e1) + numNodes(e->u.add.e2);
        default:
          error ("impossible");
        }
      }
      Aha, the C compiler is stupid!

  18. AST in OO
      E ::= id | num | E + E | E * E | ( E )
      /* data structures */
      abstract class Exp {}
      class Int extends Exp {…}
      class Id extends Exp {…}
      class Add extends Exp {…}
      class Times extends Exp {…}
      /* to encode "2+3*x" */
      Exp prog = new Add (new Int (2), new Times (new Int (3), new Id ("x")));
      /* Not as ugly as C, but still boring */

  19. AST in OO
      /* number of nodes again */
      int numNodes (Exp e)
      {
        if (e instanceof Int)
          return 1;
        else if (e instanceof Id)
          return 1;
        else if (e instanceof Add) {
          Add f = (Add)e;
          return 1 + numNodes(f.e1) + numNodes(f.e2);
        }
        …
      }

  20. AST Generation
      • ML-Yacc uses an attribute-grammar scheme
        • each nonterminal may have a semantic value associated with it
        • when the parser reduces with (X ::= s1 … sn)
          • a semantic action is executed
          • the action uses the semantic values of the symbols si
        • when parsing completes successfully
          • the parser returns the semantic value associated with the start symbol
          • usually an abstract syntax tree

  21. Attribute Grammars
      [Figure: step-by-step shift-reduce parse of 2 + 3 * 4, showing the parse stack, the remaining input, and the tree built so far]
      • Each nonterminal is associated with a tree.

  22. Attribute Grammars
      datatype exp
        = Id of string
        | Num of int
        | Add of exp * exp
        | Times of exp * exp
      %%
      %%
      e -> e PLUS e   (Add (e1, e2))
         | e TIMES e  (Times (e1, e2))
         | ID         (Id ID)
         | NUM        (Num NUM)
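
What a generated parser does with such actions can be pictured as a stack of semantic values. The sketch below is not ML-Yacc output; it is a hand-written C approximation in which each `reduce_*` function plays the role of one semantic action, and `parse_sample` replays, by hand, the reductions a bottom-up parser would perform on "2 + 3 * 4" if TIMES binds tighter than PLUS (an assumption, since the slide shows no precedence declarations). All names are invented for this note.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { K_NUM, K_ADD, K_TIMES };

struct node {
    enum kind kind;
    int value;                  /* K_NUM only */
    struct node *left, *right;  /* K_ADD, K_TIMES */
};

static struct node *mk(enum kind kind, int value,
                       struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Stack of semantic values; operator tokens carry no value in this sketch. */
static struct node *stack[16];
static int top = 0;

static void push(struct node *n) { assert(top < 16); stack[top++] = n; }
static struct node *pop(void)    { assert(top > 0);  return stack[--top]; }

/* Action for  e -> NUM : (Num NUM) */
static void reduce_num(int value) { push(mk(K_NUM, value, NULL, NULL)); }

/* Action for  e -> e PLUS e : (Add (e1, e2)) */
static void reduce_add(void)
{
    struct node *e2 = pop();    /* right operand was pushed last */
    struct node *e1 = pop();
    push(mk(K_ADD, 0, e1, e2));
}

/* Action for  e -> e TIMES e : (Times (e1, e2)) */
static void reduce_times(void)
{
    struct node *e2 = pop();
    struct node *e1 = pop();
    push(mk(K_TIMES, 0, e1, e2));
}

/* Reductions for "2 + 3 * 4", in the order a bottom-up parser would make them. */
static struct node *parse_sample(void)
{
    top = 0;
    reduce_num(2);
    reduce_num(3);
    reduce_num(4);
    reduce_times();             /* 3 * 4 is reduced before the addition */
    reduce_add();
    assert(top == 1);
    return pop();               /* semantic value of the start symbol */
}
```

When parsing finishes, the single value left on the stack is the tree for the whole input.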

  23. Source Position
      • In a one-pass compiler, error messages are precise
        • early compilers never had to worry about this
      • But in a multi-pass compiler, source positions must be stored in the AST itself
      (* Example *)
      type pos = …
      datatype exp
        = Int of int * pos
        | Id of string * pos
        | Add of exp * exp * pos
        | Times of exp * exp * pos

  24. Source Position
      datatype exp
        = Id of string * pos
        | Num of int * pos
        | Add of exp * exp * pos
        | Times of exp * exp * pos
      %%
      %%
      e -> e PLUS e   (Add (e1, e2, PLUSleft))
         | e TIMES e  (Times (e1, e2, TIMESleft))
         | ID         (Id (ID, IDleft))
         | NUM        (Num (NUM, NUMleft))
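
The payoff of carrying positions in the tree shows up in later passes. The sketch below assumes a hypothetical `struct pos` holding a line and a column, and a toy check that treats every identifier except one known name as undeclared; none of these names or the message format come from the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pos { int line, col; };  /* assumed representation of "pos" */

enum kind { K_INT, K_ID, K_ADD, K_TIMES };

struct node {
    enum kind kind;
    struct pos pos;             /* where the node's defining token started */
    int value;                  /* K_INT */
    const char *name;           /* K_ID */
    struct node *left, *right;  /* K_ADD, K_TIMES */
};

static struct node *mk(enum kind kind, struct pos pos, int value,
                       const char *name, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->pos = pos;
    n->value = value;
    n->name = name;
    n->left = l;
    n->right = r;
    return n;
}

/* A later pass: find the first identifier other than `known` and write
   a positioned message into msg. Returns 1 if an error was reported. */
static int check_ids(const struct node *n, const char *known,
                     char *msg, size_t size)
{
    if (n == NULL)
        return 0;
    if (n->kind == K_ID && strcmp(n->name, known) != 0) {
        snprintf(msg, size, "%d:%d: undeclared identifier '%s'",
                 n->pos.line, n->pos.col, n->name);
        return 1;
    }
    return check_ids(n->left, known, msg, size)
        || check_ids(n->right, known, msg, size);
}

/* AST for "2 + 3 * y" on line 1; columns are 1-based token start columns,
   and operator nodes take the position of their operator token. */
static struct node *sample(void)
{
    struct node *two   = mk(K_INT, (struct pos){1, 1}, 2, NULL, NULL, NULL);
    struct node *three = mk(K_INT, (struct pos){1, 5}, 3, NULL, NULL, NULL);
    struct node *y     = mk(K_ID,  (struct pos){1, 9}, 0, "y", NULL, NULL);
    struct node *times = mk(K_TIMES, (struct pos){1, 7}, 0, NULL, three, y);
    return mk(K_ADD, (struct pos){1, 3}, 0, NULL, two, times);
}
```

By the time this pass runs the source text and the parser state are gone, so the position stored in the node is the only way to say where the problem is.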

  25. Labs
      • For lab #4, your job is to produce abstract syntax trees from source programs
        • we provide a code skeleton; first familiarize yourself with it
        • your job is to understand the "layout" function etc.
        • and glue it to the parser by adding semantic actions
      • Test your compiler carefully to make sure it parses the source programs correctly

  26. Summary
      • Abstract syntax trees are the compiler's internal representation of source programs
        • they are the interface between the front end and the later parts of the compiler
      • The design of abstract syntax trees is language-dependent, and more art than science
