
Chapter 2. Design of a Simple Compiler

Chapter 2. Design of a Simple Compiler. J. H. Wang Sep. 20, 2011. Outline. An Informal Definition of the ac Language Formal Definition of ac Phases of a Simple compiler Scanning Parsing Abstract Syntax Trees Semantic Analysis Code Generation. Introduction.



  1. Chapter 2. Design of a Simple Compiler J. H. Wang Sep. 20, 2011

  2. Outline • An Informal Definition of the ac Language • Formal Definition of ac • Phases of a Simple Compiler • Scanning • Parsing • Abstract Syntax Trees • Semantic Analysis • Code Generation

  3. Introduction • An overview of compilation process by considering a simple language • A quick overview of a compiler’s phases and their associated data structures

  4. An Informal Definition of the ac Language • ac: adding calculator • Types • integer • float: allows 5 fractional digits after the decimal point • Automatic type conversion from integer to float • Keywords • f: float • i: integer • p: print • Variables • 23 names from lowercase Roman alphabet except the three reserved keywords f, i, and p • Target of translation: dc (desk calculator) • Reverse Polish notation (RPN)

  5. Formal Definition of ac • Syntax specification: context-free grammar (CFG) • (Chap. 4) • Token specification: regular expressions • (Sec. 3.2)

  6. Syntax Specification

  7. CFG: • A set of productions or rewriting rules • E.g.: Stmt → id assign Val Expr | print id • Two kinds of symbols • Terminals: cannot be rewritten • E.g.: id, assign, print • Empty or null string: λ • End of input stream or file: $ • Nonterminals: • E.g.: Val, Expr • Start symbol: Prog • Left-hand side (LHS): the nonterminal being rewritten • Right-hand side (RHS): the symbols that replace it

  8. Token Specification

  9. An Example ac Program • Example ac program:
       f b
       i a
       a = 5
       b = a + 3.2
       p b
       $
     • Corresponding dc code: 5 sa la 3.2 + sb lb p

  10. Phases of a Simple Compiler • Scanner: source ac program -> tokens • Chap. 3 • Parser: tokens -> abstract syntax tree (AST) • Chap. 5 & 6 • Symbol table: created from AST • Chap. 8 • Semantic analysis: AST decoration • Translation

  11. Scanning • To translate a stream of characters into a stream of tokens • Automatic construction of scanners: Chap.3 • Token: • Type: membership in the terminal alphabet • Semantic value: additional information • For most programming languages, the scanner’s job is not so easy • +, ++ • //, “, \” • Variable-length tokens

  12. [Figure: pseudocode for SCANNER, built from PEEK, ADVANCE, SCANDIGITS, and LEXICALERROR]

  13. [Figure: pseudocode for SCANDIGITS, built from PEEK and ADVANCE]
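The scanning step can be sketched in Python as follows. This is a minimal illustration, not the pseudocode from the figures: it swaps the hand-written PEEK/ADVANCE loop for a table of regular expressions. The token type names (`floatdcl`, `intdcl`, `inum`, `fnum`, `plus`, `minus`) and the exact patterns are assumptions inferred from the informal definition of ac, and the scanner appends the end marker `$` itself rather than reading it from the source text.

```python
import re

# Token patterns are assumptions inferred from the informal definition of ac;
# the type names (floatdcl, intdcl, inum, fnum, ...) are illustrative.
TOKEN_SPEC = [
    ("fnum", r"[0-9]+\.[0-9]+"),   # tried before inum so "3.2" is one token
    ("inum", r"[0-9]+"),
    ("floatdcl", r"f"),
    ("intdcl", r"i"),
    ("print", r"p"),
    ("id", r"[a-eghj-oq-z]"),      # the 23 non-keyword lowercase letters
    ("assign", r"="),
    ("plus", r"\+"),
    ("minus", r"-"),
    ("blank", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(source):
    """Return a list of (type, semantic value) pairs, ending with ("$", "")."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
        if m.lastgroup != "blank":          # blanks separate tokens, nothing more
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("$", ""))                # end-of-input marker for the parser
    return tokens
```

Calling `scan("a = 5")` yields an `id`, an `assign`, and an `inum` token followed by the end marker; each pair carries the token type and its semantic value, as described on the Scanning slide.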

  14. Parsing • To determine if the stream of tokens conforms to the language’s grammar specification • Chap. 4, 5, 6 • For ac, a simple parsing technique called recursive descent is used • “Mutually recursive parsing routines that descend through a derivation tree” • Each nonterminal has an associated parsing procedure for determining if the token stream contains a sequence of tokens derivable from that nonterminal

  15. Predicting a Parsing Procedure • Examine the next input token to predict which production should be applied • E.g.: • Stmt → id assign Val Expr • Stmt → print id • Predict set • {id} [1] • {print} [6]

  16. [Figure: pseudocode for the STMT parsing procedure, built from PEEK, MATCH, VAL, EXPR, and ERROR]

  17. Consider the productions for Stmts • Stmts → Stmt Stmts • Stmts → λ • The predict sets • {id, print} [8] • {$} [11]

  18. [Figure: pseudocode for the STMTS parsing procedure, built from PEEK, STMT, STMTS, and ERROR]

  19. Implementing the Production • When a terminal is encountered, a call to MATCH() is placed • For each nonterminal, the corresponding procedure will be called • For the symbol λ, no code is executed
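The rules for turning productions into parsing procedures can be sketched as a small recursive-descent recognizer. The `stmt` and `stmts` methods follow the productions and predict sets quoted on the slides. The `val` and `expr` methods, the token type names `inum`, `fnum`, `plus`, and `minus`, and the predict set used for the empty Expr alternative are assumptions added so the example runs on its own.

```python
class Parser:
    """Recognizer for a token-type list that ends with the marker "$"."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # A terminal in a production becomes one call to match().
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected}, found {self.peek()}")
        self.pos += 1

    def stmts(self):
        if self.peek() in ("id", "print"):   # Stmts -> Stmt Stmts
            self.stmt()
            self.stmts()
        elif self.peek() == "$":             # Stmts -> lambda: no code runs
            pass
        else:
            raise SyntaxError(f"unexpected token {self.peek()}")

    def stmt(self):
        if self.peek() == "id":              # Stmt -> id assign Val Expr
            self.match("id")
            self.match("assign")
            self.val()
            self.expr()
        elif self.peek() == "print":         # Stmt -> print id
            self.match("print")
            self.match("id")
        else:
            raise SyntaxError(f"unexpected token {self.peek()}")

    def val(self):
        # Assumed production: Val -> id | inum | fnum
        if self.peek() in ("id", "inum", "fnum"):
            self.match(self.peek())
        else:
            raise SyntaxError(f"unexpected token {self.peek()}")

    def expr(self):
        # Assumed productions: Expr -> plus Val Expr | minus Val Expr | lambda
        if self.peek() in ("plus", "minus"):
            self.match(self.peek())
            self.val()
            self.expr()
        elif self.peek() in ("id", "print", "$"):
            pass
        else:
            raise SyntaxError(f"unexpected token {self.peek()}")
```

A caller constructs `Parser(token_types)` and invokes `stmts()`; the token stream is accepted if the call returns without raising and `peek()` is then `"$"`.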

  20. Abstract Syntax Trees • Aspects of compilation that can be difficult to perform during syntax analysis • Some aspects of language cannot be specified in a CFG • Symbol usage consistency with type declaration • In Java: x.y.z • Package x, class y, static field z • Variable x, field y, another field z • Operator overloading • +: numerical addition or appending of strings • Separation into phases makes the compiler much easier to write and maintain

  21. Parse trees are large and unnecessarily detailed (Fig. 2.4) • Abstract syntax tree (AST) (Fig. 2.9) • Inessential punctuation and delimiters are not included • A common intermediate representation for all phases after syntax analysis • Declarations need not be in source form • Order of executable statements explicitly represented • Assignment statement must retain identifier and expression • Nodes representing computation: operation and operands • Print statement must retain name of identifier
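One possible shape for such an AST, sketched with Python dataclasses. The class names `SymDeclaring`, `SymReferencing`, `Computing`, `Assigning`, and `Printing` follow the node kinds named elsewhere in these slides; `Consting`, `Program`, and all field names are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


class Node:
    """Common base class for all AST nodes in this sketch."""


@dataclass
class SymDeclaring(Node):
    type: str                      # "float" or "integer"
    id: str


@dataclass
class SymReferencing(Node):
    id: str
    type: Optional[str] = None     # filled in later by semantic analysis


@dataclass
class Consting(Node):              # hypothetical name for a literal node
    value: str
    type: str


@dataclass
class Computing(Node):
    op: str                        # operation, e.g. "plus"
    c1: Node                       # left operand
    c2: Node                       # right operand
    type: Optional[str] = None


@dataclass
class Assigning(Node):
    c1: SymReferencing             # identifier being assigned
    c2: Node                       # expression supplying the value


@dataclass
class Printing(Node):
    id: str


@dataclass
class Program(Node):
    decls: List[SymDeclaring]
    stmts: List[Node]              # list order records statement order


# AST for: f b / i a / a = 5 / b = a + 3.2 / p b
example = Program(
    decls=[SymDeclaring("float", "b"), SymDeclaring("integer", "a")],
    stmts=[
        Assigning(SymReferencing("a"), Consting("5", "integer")),
        Assigning(SymReferencing("b"),
                  Computing("plus", SymReferencing("a"), Consting("3.2", "float"))),
        Printing("b"),
    ],
)
```

The keywords, the `=` sign, and the end marker do not appear anywhere in `example`; only the information later phases need is retained.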

  22. Semantic Analysis • Example processing • Declarations and name scopes are processed to construct a symbol table • Type consistency • Make type-dependent behavior explicit

  23. Symbol Tables • To record all identifiers and their types • 23 entries for 23 distinct identifiers in ac (Fig. 2.11) • Type info.: integer, float, unused (null) • Attributes: scope, storage class, protection properties • Symbol table construction (Fig. 2.10) • Symbol declaration nodes call VISIT(SymDeclaring n) • ENTERSYMBOL checks that the given symbol has not been previously declared
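A minimal sketch of such a table, assuming a plain dictionary keyed by identifier name. The method names `enter_symbol` and `lookup_symbol` mirror ENTERSYMBOL and LOOKUPSYMBOL; the class name and the choice of exceptions are assumptions for illustration.

```python
import string

KEYWORDS = {"f", "i", "p"}
# The 23 legal identifier names: lowercase letters minus the keywords.
VALID_NAMES = [c for c in string.ascii_lowercase if c not in KEYWORDS]


class SymbolTable:
    def __init__(self):
        # Every legal name starts out unused (None).
        self.types = {name: None for name in VALID_NAMES}

    def enter_symbol(self, name, type_):
        if name not in self.types:
            raise KeyError(f"{name!r} is not a legal ac identifier")
        if self.types[name] is not None:
            raise ValueError(f"duplicate declaration of {name!r}")
        self.types[name] = type_

    def lookup_symbol(self, name):
        # Returns "integer", "float", or None for an undeclared name.
        return self.types[name]


table = SymbolTable()
table.enter_symbol("b", "float")
table.enter_symbol("a", "integer")
```

After these two calls, `table.lookup_symbol("b")` returns `"float"`, and a second `enter_symbol("a", ...)` raises because `a` has already been declared.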

  24. [Figure: pseudocode for symbol table construction: VISIT(SymDeclaring n), built from GETTYPE, GETID, ENTERSYMBOL, LOOKUPSYMBOL, and ERROR]

  25. Type Checking • Only two types in ac • Integer • Float • Type hierarchy • Float wider than integer • Automatic widening (or casting) • integer -> float

  26. [Figure: Type Analysis pseudocode: VISIT procedures built from CONSISTENT, CONVERT, and LOOKUPSYMBOL]

  27. [Figure: pseudocode for CONSISTENT, GENERALIZE, and CONVERT, with ERROR reported on an illegal conversion]

  28. Type Checking • Constants and symbol references: simply set the node’s type based on the node’s contents • Computation nodes: CONSISTENT(n.c1, n.c2) • Assignment operation: CONVERT(n.c2, n.c1.type) • CONSISTENT() • GENERALIZE(): determines the least general type that encompasses both operand types • CONVERT(): checks whether conversion is necessary
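The three helpers can be sketched as follows, assuming a tiny hypothetical `Node` class with `kind`, `type`, `c1`, and `c2` fields. One deliberate difference from a replace-in-place formulation: here `convert` returns the node to use, either the original or a new `Converting` wrapper, and the caller stores it back. The functions `check_computing` and `check_assigning` are illustrative stand-ins for the corresponding VISIT procedures.

```python
class Node:
    """Minimal AST node for this sketch: a kind, a type, up to two children."""

    def __init__(self, kind, type=None, c1=None, c2=None):
        self.kind, self.type, self.c1, self.c2 = kind, type, c1, c2


def generalize(t1, t2):
    # Float is wider than integer, so any float operand makes the result float.
    return "float" if "float" in (t1, t2) else "integer"


def convert(node, target):
    if node.type == "float" and target == "integer":
        raise TypeError("illegal type conversion: float to integer")
    if node.type == "integer" and target == "float":
        # Make the widening explicit by wrapping the node.
        return Node("Converting", type="float", c1=node)
    return node                      # types already agree, nothing to do


def consistent(n):
    m = generalize(n.c1.type, n.c2.type)
    n.c1 = convert(n.c1, m)
    n.c2 = convert(n.c2, m)
    return m


def check_computing(n):
    n.type = consistent(n)


def check_assigning(n):
    n.c2 = convert(n.c2, n.c1.type)
    n.type = n.c1.type


# b = a + 3.2, with a declared integer and b declared float
a_ref = Node("SymReferencing", type="integer")
literal = Node("Consting", type="float")
plus = Node("Computing", c1=a_ref, c2=literal)
check_computing(plus)
assign = Node("Assigning", c1=Node("SymReferencing", type="float"), c2=plus)
check_assigning(assign)
```

After these calls the reference to `a` sits under a `Converting` node and the computation has type float, which makes the type-dependent behavior explicit for the code generator.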

  29. Code Generation • The formulation of target-machine instructions that faithfully represent the semantics of the source program • Chap. 11 & 13 • dc: stack machine model • Code generation proceeds by traversing the AST, starting at its root • VISIT (Computing n) • VISIT (Assigning n) • VISIT (SymReferencing n) • VISIT (Printing n) • VISIT (Converting n)

  30. [Figure: pseudocode for code generation: VISIT procedures built from CODEGEN and EMIT]
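A simplified sketch of the traversal, assuming the AST is encoded as nested tuples (a hypothetical encoding chosen for brevity). It reproduces the dc code shown for the example program. Precision management for floats, which dc controls with its `k` command, is left out, so the `convert` case only generates code for its child.

```python
def codegen(node, out):
    """Append dc instructions for `node` to the list `out` and return it."""
    kind = node[0]
    if kind == "program":                 # ("program", [stmt, ...])
        for stmt in node[1]:
            codegen(stmt, out)
    elif kind == "assign":                # ("assign", name, expr)
        codegen(node[2], out)             # value ends up on top of the stack
        out.append("s" + node[1])         # pop it into register `name`
    elif kind in ("plus", "minus"):       # (op, left, right)
        codegen(node[1], out)
        codegen(node[2], out)
        out.append("+" if kind == "plus" else "-")   # postfix operator
    elif kind == "ref":                   # ("ref", name)
        out.append("l" + node[1])         # push register `name`
    elif kind == "const":                 # ("const", text)
        out.append(node[1])
    elif kind == "convert":               # ("convert", child)
        codegen(node[1], out)             # precision handling omitted here
    elif kind == "print":                 # ("print", name)
        out.append("l" + node[1])
        out.append("p")                   # print the top of the stack
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    return out


# AST for: a = 5 / b = a + 3.2 / p b
ast = ("program", [
    ("assign", "a", ("const", "5")),
    ("assign", "b", ("plus", ("convert", ("ref", "a")), ("const", "3.2"))),
    ("print", "b"),
])
dc_code = " ".join(codegen(ast, []))
```

For this tree `dc_code` is `5 sa la 3.2 + sb lb p`. Operands are emitted before the operator, which is what gives the output its reverse Polish form.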

  31. End of Chapter 2
