
Compiler Construction Syntax Analysis


Presentation Transcript


  1. Compiler Construction Syntax Analysis

  2. Goals of parsing • Every programming language has syntactic rules • Decide whether a program satisfies the syntactic structure • Error detection • Error recovery • Simplification: rules are stated on tokens, not characters • Build an Abstract Syntax Tree

  3. Ex. From text to abstract syntax tree • program text 5 + (7 * x) → Lexical Analyzer → token stream → Parser → parse tree on valid input, syntax error otherwise → Abstract syntax tree • Grammar (CFG): E → id | num | E + E | E * E | ( E ) • [Figure: parse tree and abstract syntax tree for 5 + (7 * x)]

  4. CFG • A context-free grammar is a collection G = (V, T, P, S): non-terminals V, terminals T, productions P, and a start symbol S • Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Symbols: terminals (tokens) + * ( ) id num; non-terminal E • Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3 • [Figure: parse tree for 1 + 2 * 3]

  5. Production rules actually define the language • Ex. Let the language L = { aⁿbⁿ | n ≥ 1 } • G = (V, T, P, S), V = {S}, T = {a, b} • Production rules: S → aSb, S → ab

  6. Parsing Techniques

  7. Introduction • A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right • It can also be viewed as finding a leftmost derivation for an input string • Example: id + id * id with grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id • [Figure: the sequence of leftmost (lm) derivation steps growing the parse tree for id + id * id]

  8. Top-down parsing • The parse tree is generated from top to bottom (root to leaves) • Consider grammar S → xPz, P → yw | y • Consider input string xyz • Construct the parse tree for the above grammar • Step 1: expand the start symbol S • Step 2: expand P as yw — not matched with the string • Go back and try P → y
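The backtracking walk described above can be sketched as a small Python program. The grammar encoding and function names are illustrative, not from the slides:

```python
# Backtracking top-down parser for the slide's grammar:
#   S -> x P z
#   P -> y w | y

GRAMMAR = {
    "S": [["x", "P", "z"]],
    "P": [["y", "w"], ["y"]],
}

def parse(symbol, s, pos):
    """Try to derive s[pos:] from symbol; return the new position, or None."""
    if symbol not in GRAMMAR:                  # terminal: must match the input
        if pos < len(s) and s[pos] == symbol:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:         # try each alternative in order
        p = pos
        for sym in production:
            p = parse(sym, s, p)
            if p is None:
                break                          # backtrack: try next alternative
        else:
            return p
    return None

def accepts(s):
    end = parse("S", s, 0)
    return end == len(s)

print(accepts("xyz"))   # True  (P -> yw fails, backtrack to P -> y)
print(accepts("xywz"))  # True
print(accepts("xz"))    # False
```

Note that the order of P's alternatives matters here: trying the longer body yw before y lets both xyz and xywz succeed.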

  9. Backtracking • A backtracking parser tries different production rules to find a match for the input string, backtracking each time a choice fails • It is more powerful than a predictive parser • It is slower

  10. Predictive parser • Tries to predict the next construction using one or more lookahead symbols from the input string • 1. Recursive descent • 2. LL(1) parser

  11. Recursive descent parser • It consists of a set of procedures, one for each non-terminal • The CFG is used to build the recursive routines • The RHS of each production rule is directly converted to program code • Execution begins with the procedure for the start symbol

  12. A typical procedure for a non-terminal • void A() {
        choose an A-production, A → X1 X2 … Xk;
        for (i = 1 to k) {
          if (Xi is a non-terminal)
            call procedure Xi();
          else if (Xi == current input symbol a)
            advance the input to the next symbol;
          else
            an error has occurred;
        }
      }

  13. In the general form it cannot choose an A-production easily, so we need to try all alternatives • If one fails, the input pointer needs to be reset and another alternative tried • Ex: E → num T, T → * num T | ε • Input string: 3 * 4 $

  14. Parsing 3 * 4 $ • 3 * 4 $ — use E → num T, match 3 • * 4 $ — use T → * num T, match * and 4 • $ — use T → ε • Declare success • halt
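For this particular grammar no backtracking is needed: one symbol of lookahead decides between T's two alternatives. A minimal recursive-descent sketch in Python (the class and token names are my own):

```python
# Recursive-descent parser for the slide grammar:
#   E -> num T
#   T -> * num T | epsilon
# One procedure per non-terminal; '$' marks end of input.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def match(self, expected):
        if self.peek() == expected:
            self.i += 1
        else:
            raise SyntaxError(f"expected {expected}, got {self.peek()}")

    def E(self):            # E -> num T
        self.match("num")
        self.T()

    def T(self):            # T -> * num T | epsilon
        if self.peek() == "*":
            self.match("*")
            self.match("num")
            self.T()
        # else: epsilon -- predicted from the lookahead, no backtracking

    def parse(self):
        self.E()
        self.match("$")     # declare success only if all input is consumed
        return True

# "3 * 4 $" lexes to the token stream below (each number becomes 'num')
print(Parser(["num", "*", "num", "$"]).parse())  # True
```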

  15. Predictive LL(1) parser • It is non-recursive • Here a parsing table is built

  16. Parsing methods • LL(1) • first "L": left-to-right scan of the input • second "L": uses a leftmost derivation for the input string • "1": (lookahead) it uses only one symbol to predict the parsing process, i.e., to decide which production to apply • predicts based on one-token lookahead • It uses the following data structures: stack, input buffer, parsing table

  17. Model of LL(1) parser • [Figure: input buffer holding tokens a + b $, a stack with TOP pointer, and the LL(1) parser driver consulting the parsing table and producing output]

  18. The stack is used to hold the left sentential form • The symbols of the RHS of a rule are pushed onto the stack in reverse order (from right to left): for A → PQR, push R, Q, P • Thus the stack makes this algorithm non-recursive • Table entries are M[A, a], where A is a non-terminal and a is the current input symbol • The parser works as follows: it reads the top of the stack and the current input symbol, and with the help of these two symbols the parsing action is determined

  19. Construction of predictive LL(1) parser • The construction of a predictive LL(1) parser is based on two very important functions, FIRST and FOLLOW • Overall construction steps: • 1. Compute the FIRST and FOLLOW functions • 2. Construct the predictive parsing table using FIRST and FOLLOW • 3. Parse the input string with the help of the predictive parsing table

  20. Compute FIRST • FIRST(A) is the set of terminal symbols that can appear first in a derivation from A; if A ⇒* ε then ε is also in FIRST(A) • To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set: • If a is a terminal symbol then FIRST(a) = {a} • If X → ε is a production then add ε to FIRST(X) • For a rule A → X1 X2 … Xk, add FIRST(X1) \ {ε} to FIRST(A); if ε ∈ FIRST(X1), also add FIRST(X2) \ {ε}, and so on; if every Xi derives ε, add ε to FIRST(A)

  21. FOLLOW • FOLLOW(A) is defined as the set of terminal symbols that can appear immediately to the right of A in some sentential form; in other words, FOLLOW(A) = { a | S ⇒* αAaβ } • To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set: • Place $ in FOLLOW(S), where S is the start symbol • If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B) • If there is a production A → αB, or a production A → αBβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B)

  22. Consider grammar • E → TE' • E' → +TE' | ε • T → FT' • T' → *FT' | ε • F → (E) | id • Find FIRST and FOLLOW
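As a cross-check for this exercise, FIRST and FOLLOW can be computed mechanically by iterating the rules of slides 20 and 21 until the sets stop changing. A sketch in Python (E' is written `Ep`, T' is `Tp`, ε is the string `"eps"`; helper names are my own):

```python
# Fixed-point computation of FIRST and FOLLOW for the expression grammar.
GRAMMAR = {
    "E":  [["T", "Ep"]],
    "Ep": [["+", "T", "Ep"], ["eps"]],
    "T":  [["F", "Tp"]],
    "Tp": [["*", "F", "Tp"], ["eps"]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def first_of_seq(seq, first):
    """FIRST of a symbol sequence X1 X2 ... Xk."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in NONTERMS else {sym}
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    out.add("eps")              # every Xi can derive eps
    return out

def compute_first():
    first = {a: set() for a in NONTERMS}
    changed = True
    while changed:              # repeat until no FIRST set grows
        changed = False
        for a, prods in GRAMMAR.items():
            for prod in prods:
                f = first_of_seq(prod, first)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {a: set() for a in NONTERMS}
    follow[start].add("$")      # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, prods in GRAMMAR.items():
            for prod in prods:
                for i, b in enumerate(prod):
                    if b not in NONTERMS:
                        continue
                    f = first_of_seq(prod[i + 1:], first)
                    add = f - {"eps"}          # rule 2: FIRST(beta) \ {eps}
                    if "eps" in f:
                        add |= follow[a]       # rule 3: FOLLOW(A) into FOLLOW(B)
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(sorted(FIRST["E"]))    # ['(', 'id']
print(sorted(FOLLOW["Ep"]))  # ['$', ')']
```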

  23. Construction of predictive parsing table • For each production A → α in the grammar do the following: • For each terminal a in FIRST(α), create entry M[A, a] = A → α • If ε is in FIRST(α), create entry M[A, b] = A → α for each symbol b in FOLLOW(A) • If ε is in FIRST(α) and $ is in FOLLOW(A), then create entry M[A, $] = A → α • All the remaining entries in the table M are marked as syntax error
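These rules translate directly into code. A sketch, assuming FIRST and FOLLOW have already been computed (the hard-coded sets below are the ones for the expression grammar of slide 22; `"eps"` stands for ε):

```python
# Build the predictive parsing table M[A, a] from FIRST/FOLLOW.
GRAMMAR = {                       # E' written Ep, T' written Tp
    "E":  [["T", "Ep"]],
    "Ep": [["+", "T", "Ep"], ["eps"]],
    "T":  [["F", "Tp"]],
    "Tp": [["*", "F", "Tp"], ["eps"]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "Ep": {"+", "eps"}, "T": {"(", "id"},
         "Tp": {"*", "eps"}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "Ep": {")", "$"}, "T": {"+", ")", "$"},
          "Tp": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(alpha):
    """FIRST of a production body alpha, using the sets above."""
    out = set()
    for sym in alpha:
        f = FIRST.get(sym, {sym})
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    out.add("eps")
    return out

def build_table():
    table = {}                        # missing entries mean "syntax error"
    for a, prods in GRAMMAR.items():
        for alpha in prods:
            f = first_of(alpha)
            for t in f - {"eps"}:     # rule 1: terminals in FIRST(alpha)
                table[(a, t)] = alpha
            if "eps" in f:            # rules 2-3: use FOLLOW(A), incl. $
                for t in FOLLOW[a]:
                    table[(a, t)] = alpha
    return table

M = build_table()
print(M[("E", "id")])   # ['T', 'Ep']
print(M[("Ep", ")")])   # ['eps']
print(("E", "+") in M)  # False -> a syntax-error entry
```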

  24. Algorithm for

  25. Example • Grammar: E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id

  FIRST and FOLLOW:
    E:  FIRST = {(, id}   FOLLOW = {), $}
    E': FIRST = {+, ε}    FOLLOW = {), $}
    T:  FIRST = {(, id}   FOLLOW = {+, ), $}
    T': FIRST = {*, ε}    FOLLOW = {+, ), $}
    F:  FIRST = {(, id}   FOLLOW = {+, *, ), $}

  Parsing table M (rows: non-terminals; columns: input symbols id + * ( ) $):
    E:  id → E → TE';   ( → E → TE'
    E': + → E' → +TE';  ) → E' → ε;   $ → E' → ε
    T:  id → T → FT';   ( → T → FT'
    T': + → T' → ε;  * → T' → *FT';  ) → T' → ε;  $ → T' → ε
    F:  id → F → id;    ( → F → (E)
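Driving a parse from this table needs only the stack discipline of slide 18: expand non-terminals via M, match terminals, accept when stack and input are both down to $. A sketch with the table hard-coded from this slide (E'/T' written `Ep`/`Tp`; this is my encoding, not the slides' pseudocode):

```python
# Stack-driven LL(1) parse of "id + id * id" using the slide's table.
M = {
    ("E", "id"): ["T", "Ep"], ("E", "("): ["T", "Ep"],
    ("Ep", "+"): ["+", "T", "Ep"], ("Ep", ")"): [], ("Ep", "$"): [],
    ("T", "id"): ["F", "Tp"], ("T", "("): ["F", "Tp"],
    ("Tp", "*"): ["*", "F", "Tp"],
    ("Tp", "+"): [], ("Tp", ")"): [], ("Tp", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "Ep", "T", "Tp", "F"}

def ll1_parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                     # start symbol on top of $
    i = 0
    output = []                            # productions used, in leftmost order
    while stack:
        top = stack.pop()
        a = tokens[i]
        if top == "$" and a == "$":
            return output                  # success: both reduced to $
        if top in NONTERMS:
            body = M.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            output.append((top, body))
            stack.extend(reversed(body))   # push RHS in reverse (R to L)
        elif top == a:
            i += 1                         # terminal matched: advance input
        else:
            raise SyntaxError(f"expected {top}, got {a}")

steps = ll1_parse(["id", "+", "id", "*", "id"])
print(steps[0])     # ('E', ['T', 'Ep'])
print(len(steps))   # 11 productions applied
```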

  26. Grammar hierarchy • [Figure: nested grammar classes — the non-ambiguous CFGs contain LALR(1), which contains SLR(1), which contains LR(0); LL(1) also sits inside the non-ambiguous CFGs]

  27. See you next time

  28. Compiler Construction tools

  29. If you want to write a compiler, there are some commonly used compiler-construction tools: • 1. Parser generators: automatically produce syntax analyzers from a grammatical description of a programming language • 2. Scanner generators: produce lexical analyzers from a regular-expression description of the tokens of the language • 3. Syntax-directed translation engines: produce collections of routines for walking a parse tree and generating intermediate code

  30. • 4. Code-generator generators: produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine • 5. Data-flow analysis engines: facilitate the gathering of information about how values are transmitted from one part of a program to each other part; data-flow analysis is a key part of code optimization • 6. Compiler-construction toolkits: provide an integrated set of routines for constructing the various phases of a compiler

  31. Error handling • Common programming errors: lexical errors, syntactic errors, semantic errors • Error handler goals: • Report the presence of errors clearly and accurately • Recover from each error quickly enough to detect subsequent errors • Add minimal overhead to the processing of correct programs

  32. Error-recovery strategies • Panic-mode recovery: discard input symbols one at a time until one of a designated set of synchronizing tokens is found • Phrase-level recovery: replace a prefix of the remaining input by some string that allows the parser to continue • Error productions: augment the grammar with productions that generate the erroneous constructs • Global correction: choose a minimal sequence of changes to obtain a globally least-cost correction

  33. Error recovery from syntax errors • There are various methods for error recovery; some strategies are: • Panic mode: • This is the simplest method and is used by most parsing methods • On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found; synchronizing tokens, such as ;, indicate the end of an input statement • Thus in panic-mode recovery a considerable amount of input may be skipped without checking it for additional errors • This method is guaranteed not to go into an infinite loop • If there are few errors in the same statement, this strategy is the best choice
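A toy illustration of panic mode in Python — the "statement grammar" (id = id ;) and the synchronizing set here are invented for the example:

```python
# Panic-mode recovery: on an error, skip tokens until a synchronizing
# token is found, then resume parsing the next statement.
SYNC = {";", "}"}

def parse_statements(tokens):
    """Each statement must look like: id = id ;  Report and skip bad ones."""
    errors, parsed, i = [], 0, 0
    while i < len(tokens):
        if (i + 3 < len(tokens) and tokens[i] == "id"
                and tokens[i + 1] == "=" and tokens[i + 2] == "id"
                and tokens[i + 3] == ";"):
            parsed += 1
            i += 4
        else:
            errors.append(f"syntax error at token {i}")
            while i < len(tokens) and tokens[i] not in SYNC:
                i += 1          # discard input one token at a time
            i += 1              # skip the synchronizing token itself
    return parsed, errors

ok, errs = parse_statements(["id", "=", "id", ";",      # good
                             "id", "+", ";",            # bad: resync at ';'
                             "id", "=", "id", ";"])     # parsed again
print(ok)         # 2
print(len(errs))  # 1
```

The loop cannot diverge: each iteration either consumes a statement or advances past at least one token, which is the guarantee the slide mentions.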

  34. Phrase-level recovery • In this method, on discovering an error the parser performs local correction on the remaining input • It may replace a prefix of the remaining input by some string; typical local corrections are replacing , by ;, deleting an extra semicolon, or inserting a missing semicolon • The type of local correction is decided by the compiler designer • While performing local correction we must be careful to choose replacements that do not lead to infinite loops • The drawback is that it is difficult to handle situations where the actual error occurred before the point of detection

  35. Error productions (used to display error messages) • If we have knowledge of common errors that can be encountered, we can incorporate them by augmenting the grammar with error productions • If an error production is matched during parsing, we can generate an appropriate error message and parsing can continue • This is extremely difficult to maintain, because if we change the grammar it becomes necessary to change the corresponding error productions

  36. Global correction • We often want a compiler that makes as few changes as possible in processing an incorrect input string • We expect a minimal number of insertions, deletions, and changes of tokens to recover from erroneous input • Such methods increase the time and space requirements at parsing time • The idea is to choose a minimal sequence of changes to obtain a globally least-cost correction
