Week 2 – Lecture 1

dwight

Presentation Transcript


  1. Week 2 – Lecture 1 Compiler Construction • Introduction to Parsing • Recursive Grammars • Derivations and parse trees • Ambiguous Grammars • Overview of Top Down Parsing

  2. Syntax Analysis • aka Parsing • Grouping together tokens into larger structures • Analogous to lexical analysis • Input: • Tokens • Output of Lexical Analyzer • Output: • Structured representation of original program

  3. Parsing Fundamentals • Source program: • 3 + 4 • After Lexical Analysis: ???

  4. Parsing • Expression -> number plus number • Similar to regular definitions: • Concatenation • Choice • Expression -> number Operator number • Operator -> + | - | * | / • Repetition is done differently

  5. BNF Grammar • Expression -> number Operator number • Operator -> + | - | * | / • The structure on the left is defined to consist of the choices on the right hand side • Meta-symbols: -> and | • Different conventions for writing BNF Grammars: • <expression> ::= number <operator> number • Expression -> number Operator number

  6. Derivations • Derivation: • Sequence of replacements of structure names by choices on the RHS of grammar rules • Begins: structure name • Ends: string of token symbols • At each step one replacement is made • Exp -> Exp Op Exp | number • Op -> + | - | * | /

  7. Example Derivation • Example: number * number + number • Note the different arrows: • => Derivation: applies grammar rules • -> Used to define grammar rules • Non-terminals: Exp, Op • Terminals: number, * • Terminals: so called because they terminate the derivation
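A minimal sketch of one derivation of the example sentence, replacing the leftmost non-terminal at each step. The function name `leftmost_derivation` and the list of rule indices are illustrative assumptions, not taken from the lecture; because this grammar is ambiguous, other derivations of the same sentence exist.

```python
# Grammar from the slides: Exp -> Exp Op Exp | number ; Op -> + | - | * | /
GRAMMAR = {
    "Exp": [["Exp", "Op", "Exp"], ["number"]],
    "Op": [["+"], ["-"], ["*"], ["/"]],
}


def leftmost_derivation(start, choices):
    """Apply one rule per step to the leftmost non-terminal.

    `choices` holds, for each step, the index of the alternative to use
    for whichever non-terminal is currently leftmost.
    """
    form = [start]
    steps = [form[:]]
    for choice in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(form[:])
    return steps


# Hypothetical choice sequence that derives: number * number + number
steps = leftmost_derivation("Exp", [0, 0, 1, 2, 1, 0, 1])
print("\n=> ".join(" ".join(form) for form in steps))
```

Each printed line is one sentential form; the last one contains only terminals, which is where the derivation ends.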

  8. Derivations (2) • E -> ( E ) ??????? • E -> ( E ) | a • What sentences does this grammar generate? • An example derivation: • Note that this is what we couldn’t achieve with regular definitions • See pg 96 in textbook

  9. Recursive Grammars • E -> ( E ) | a • is recursive • E -> ( E ) is the general case • E -> a is the terminating case • We have no * operator in context free grammars • Repetition = recursion • E -> E α | β • derives β, βα, βαα, βααα, βαααα, … • All strings beginning with β followed by zero or more repetitions of α • βα*
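The general case and terminating case of E -> ( E ) | a map directly onto a recursive function. This is a small sketch; the name `matches_E` is an assumption made for illustration.

```python
def matches_E(s: str) -> bool:
    """Recognise the language of E -> ( E ) | a by direct recursion."""
    if s == "a":
        # Terminating case: E -> a
        return True
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")":
        # General case: E -> ( E ), so strip one pair of parentheses and recurse.
        return matches_E(s[1:-1])
    return False


for candidate in ["a", "(a)", "((a))", "((a)", "()"]:
    print(candidate, matches_E(candidate))
```

The function accepts exactly the strings with balanced nesting around a single a, which is the kind of language regular definitions cannot describe.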

  10. Recursive Grammars (2) • a+ (regular expression) • E -> E a | a (1) • Or • E -> a E | a (2) • 2 different grammars can derive the same language • (1) is left recursive • (2) is right recursive • a* • Implies we need the empty production • E -> E a | ε
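To check on a small scale that grammars (1) and (2) derive the same language, the sketch below enumerates every sentence up to a length bound. The function `generate` and its string encoding of rule bodies are assumptions for illustration; the pruning step relies on neither grammar having an empty production.

```python
def generate(rules, max_len):
    """Return every terminal string of length <= max_len derivable from E.

    Each rule body is a string over 'E' and 'a'. No body is empty, so a
    sentential form never gets shorter and over-long forms can be dropped.
    """
    sentences, frontier = set(), ["E"]
    while frontier:
        form = frontier.pop()
        if "E" not in form:
            sentences.add(form)  # only terminals left: a finished sentence
            continue
        for body in rules:
            new_form = form.replace("E", body, 1)
            if len(new_form) <= max_len:
                frontier.append(new_form)
    return sentences


left_recursive = ["Ea", "a"]   # grammar (1): E -> E a | a
right_recursive = ["aE", "a"]  # grammar (2): E -> a E | a

print(sorted(generate(left_recursive, 4)))
print(generate(left_recursive, 4) == generate(right_recursive, 4))
```

Agreement up to a bound is only evidence, not a proof, but it shows how the two recursion directions build the same strings from opposite ends.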

  11. Recursive Grammars (3) • Require recursive data structures • -> trees • Parse Trees • Exp -> Exp Op Exp | number • Op -> + | - | * | / • [Figure: parse tree with root exp and children exp, op, exp over the leaves number * number; the nodes are numbered 1 to 4]

  12. Parse Trees & Derivations • Leaves = terminals • Interior nodes = non-terminals • If we replace the non-terminals right to left • The parse tree sequence is right to left • A rightmost derivation -> reverse post-order traversal • If we derive left to right: • A leftmost derivation • -> pre-order traversal • -> parse trees encode information about the derivation process

  13. Abstract Syntax Trees • Parse trees contain surplus information • Token sequence: number + number (3 + 4) • [Figure: the parse tree (exp with children exp, op, exp) beside the abstract syntax tree (+ with children 3 and 4)] • The abstract syntax tree is all the information we actually need
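A short sketch of collapsing the parse tree for 3 + 4 into an abstract syntax tree. The nested-tuple encoding of the trees and the name `to_ast` are assumptions chosen for illustration.

```python
# Parse tree for "3 + 4": interior nodes are (label, child, ...),
# and the lowest level pairs a token kind with its text.
parse_tree = (
    "exp",
    ("exp", ("number", "3")),
    ("op", "+"),
    ("exp", ("number", "4")),
)


def to_ast(node):
    """Drop the surplus exp/op nodes and keep only operator and operands."""
    label = node[0]
    if label == "number":
        return int(node[1])
    if label == "exp" and len(node) == 2:
        # exp -> number: the exp node adds nothing, so pass the child through.
        return to_ast(node[1])
    # exp -> exp op exp: the operator becomes the root of the subtree.
    _, left, op, right = node
    return (op[1], to_ast(left), to_ast(right))


print(to_ast(parse_tree))
```

The result keeps the operator at the root with its two operands as children, which matches the shape of the abstract syntax tree on the slide.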

  14. An exercise • Consider the grammar • lexp -> number | ( op lexp-seq ) • op -> + | - | * • lexp-seq -> lexp-seq lexp | lexp • What are the terminals, nonterminals and start symbol? • Find leftmost and rightmost derivations and parse trees for the following sentences • (+ 4) • (+ 4 (* 5 6 7))

  15. Parsing token sequence: id + id * id • E -> E + E | E * E | ( E ) | - E | id

  16. Ambiguous Grammars • A grammar that generates a string with 2 distinct parse trees is called an ambiguous grammar • 2+3*4 = 2 + (3*4) = 14 • 2+3*4 = (2+3) * 4 = 20 • Our experience of maths says interpretation 1 is correct but the grammar does not express this: • E -> E + E | E * E | ( E ) | - E | id
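The two readings of 2+3*4 can be enumerated mechanically. The sketch below builds every parse tree for a token list under the subset E -> E + E | E * E | id of the slide's grammar, with numbers standing in for id so the trees can be evaluated. The names `parse_trees` and `evaluate` are assumptions for illustration.

```python
def parse_trees(tokens):
    """All parse trees for operand/operator lists under E -> E + E | E * E | id."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at the odd positions; any of them may be the root.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Evaluate a tree of the form (operator, left, right) or a bare number."""
    if isinstance(tree, tuple):
        op, left, right = tree
        if op == "+":
            return evaluate(left) + evaluate(right)
        return evaluate(left) * evaluate(right)
    return tree


trees = parse_trees([2, "+", 3, "*", 4])
for tree in trees:
    print(tree, "=", evaluate(tree))
```

Two distinct trees come out for the same token sequence, one per interpretation, which is exactly what makes the grammar ambiguous.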

  17. Removing Ambiguity • Two methods • 1. Disambiguating Rules • Advantage: leaves the grammar unchanged • Disadvantage: the grammar is no longer the sole source of syntactic knowledge • 2. Rewrite the Grammar • Use knowledge of the meaning that we want later, in the translation into object code, to guide the grammar alteration

  18. Precedence • 2+3*4 • The * binds the 3 more tightly than the + does • E -> E addop E | term • addop -> + | - • term -> term * term | factor • factor -> ( E ) | number | id • Operators of equal precedence are grouped together at the same ‘level’ of the grammar • -> ‘precedence cascade’

  19. Associativity • 45-10-5: 30 or 40? • Subtraction is left associative, left to right (=30) • E -> E addop E | term: does not tell us how to split up 45-10-5 • E -> E addop term | term: forces left associativity via left recursion • Precedence & associativity remove ambiguity of arithmetic expressions • Which is what our maths teachers took years telling us!
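A small sketch contrasting the two groupings of 45-10-5. The left-associative version mirrors E -> E addop term | term; the right-associative version assumes a hypothetical alternative rule E -> term addop E | term, which is not in the slides and is shown only for contrast. Both function names are illustrative.

```python
def eval_left_assoc(tokens):
    """Group to the left, as E -> E addop term | term does."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        op, term = tokens[i], tokens[i + 1]
        result = result + term if op == "+" else result - term
    return result


def eval_right_assoc(tokens):
    """Group to the right, as a rule E -> term addop E | term would."""
    if len(tokens) == 1:
        return tokens[0]
    rest = eval_right_assoc(tokens[2:])
    return tokens[0] + rest if tokens[1] == "+" else tokens[0] - rest


expression = [45, "-", 10, "-", 5]
print(eval_left_assoc(expression))   # (45 - 10) - 5
print(eval_right_assoc(expression))  # 45 - (10 - 5)
```

The direction of recursion in the grammar rule decides which of the two values the expression gets.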

  20. Ambiguous grammars • Statement -> If-statement | other • If-statement -> if (Exp) Statement | if (Exp) Statement else Statement • Exp -> 0 | 1 • Parse: if (0) if (1) other else other

  21. Removing ambiguity • Statement -> Matched-stmt | Unmatched-stmt • Matched-stmt -> if (Exp) Matched-stmt else Matched-stmt | other • Unmatched-stmt -> if (Exp) Statement | if (Exp) Matched-stmt else Unmatched-stmt

  22. Top Down Parsing • Start parsing from the start symbol and end up with a match for the sentence we are parsing • Predictive parsing: non-backtracking • Parses the category of grammars that are LL(1): unambiguous, with no left recursion

  23. Top Down Parsing • Table Driven Predictive Parsing • Recursive Descent Predictive Parsing • E -> TE’ • E’ -> +TE’ | e • T -> FT’ • T’ -> *FT’ | e • F -> (E) | id • Here e denotes the empty string • Note that this grammar has no left recursion, is unambiguous, and gives the correct precedence to the arithmetic operators
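A minimal recursive descent sketch for this grammar, with one method per non-terminal and a single token of lookahead. The class name `Parser`, the helper `accepts`, and the use of "$" as an end marker on the token list are assumptions for illustration; the primes are written with ASCII names such as `E_prime`.

```python
class Parser:
    """Recursive descent recogniser for E -> TE', E' -> +TE' | e, T -> FT', T' -> *FT' | e, F -> (E) | id."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self):        # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):  # E' -> + T E' | e
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise take the empty production and consume nothing

    def T(self):        # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):  # T' -> * F T' | e
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):        # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """Return True if the whole token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))
print(accepts(["id", "+"]))
```

No method ever backs up over input it has consumed, which is possible here because the lookahead token alone decides which alternative to take.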

  24. Table Driven Predictive Parsing • [Figure: the predictive parsing program reads the input buffer (a + b $), works with a stack (X Y Z $) and the parsing table, and produces the output] • Example input: id + id * id

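  25. Table Driven Predictive Parsing • Parsing table (rows: non-terminals; columns: input symbols):

      Non-terminal | id      | +         | *         | (       | )      | $
      E            | E->TE’  |           |           | E->TE’  |        |
      E’           |         | E’->+TE’  |           |         | E’->e  | E’->e
      T            | T->FT’  |           |           | T->FT’  |        |
      T’           |         | T’->e     | T’->*FT’  |         | T’->e  | T’->e
      F            | F->id   |           |           | F->(E)  |        |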

  26. Table Driven Predictive Parsing • Parse id + id * id • Find the leftmost derivation and parse tree using the grammar • E -> TE’ • E’ -> +TE’ | e • T -> FT’ • T’ -> *FT’ | e • F -> (E) | id

  27. Predictive Parsing Table • Now parse id + id * id using the parsing table
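A sketch of the table-driven parse, using a stack and the parsing table for the grammar E -> TE', E' -> +TE' | e, T -> FT', T' -> *FT' | e, F -> (E) | id. The dictionary encoding of the table, the name `ll1_parse`, and the representation of the empty production as an empty list are assumptions for illustration.

```python
# (non-terminal, lookahead) -> right hand side; [] stands for the empty production e.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens):
    """Return the productions applied, in order, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", "E"]  # start symbol on top of the end marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        lookahead = buffer[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {lookahead})")
            output.append(f"{top} -> {' '.join(rhs) or 'e'}")
            # Push the right hand side reversed so its first symbol is on top.
            stack.extend(reversed(rhs))
        elif top == lookahead:
            pos += 1  # terminal (or $) on the stack matches the input
        else:
            raise SyntaxError(f"expected {top!r}, got {lookahead!r}")
    return output


for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The productions come out in the order of a leftmost derivation, so the output can be read as that derivation step by step.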

  28. First and Follow Sets • First and Follow sets tell us when it is appropriate to put the right hand side of some production on the stack (i.e. for which input symbols) • E -> TE’ • E’ -> +TE’ | e • T -> FT’ • T’ -> *FT’ | e • F -> (E) | id • Input: id + id * id

  29. First Sets • If X is a terminal, then FIRST(X) is {X} • If X -> e is a production, then add e to FIRST(X) • If X is a nonterminal and X -> Y1Y2…Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and e is in all of FIRST(Y1), …, FIRST(Yi-1). If e is in FIRST(Yj) for all j = 1, 2, …, k, then add e to FIRST(X).

  30. FIRST sets • E -> TE’ • E’ -> +TE’ | e • T -> FT’ • T’ -> *FT’ | e • F -> (E) | id
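The FIRST-set rules can be applied repeatedly until nothing changes. The sketch below does this for the grammar above; the dictionary encoding, the name `first_sets`, and the use of the string "e" for the empty string are assumptions for illustration.

```python
EPS = "e"  # the slides write the empty string as e

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute FIRST for every non-terminal by iterating to a fixed point."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for rhs in productions:
                before = len(first[nonterminal])
                if rhs == [EPS]:
                    first[nonterminal].add(EPS)  # rule for X -> e
                else:
                    all_nullable = True
                    for symbol in rhs:
                        # FIRST of a terminal is the terminal itself.
                        symbol_first = first[symbol] if symbol in grammar else {symbol}
                        first[nonterminal] |= symbol_first - {EPS}
                        if EPS not in symbol_first:
                            all_nullable = False
                            break  # later symbols cannot contribute
                    if all_nullable:
                        first[nonterminal].add(EPS)
                if len(first[nonterminal]) != before:
                    changed = True
    return first


for nonterminal, symbols in first_sets(GRAMMAR).items():
    print(f"FIRST({nonterminal}) = {sorted(symbols)}")
```

Several passes are needed because FIRST(E) depends on FIRST(T), which in turn depends on FIRST(F); the loop stops once a full pass adds nothing.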
