
Syntax Analysis / Parser




  1. Syntax Analysis / Parser • The parser works on a stream of tokens. • The smallest item is a token. (Diagram: source program → lexical analyzer → parser; the parser requests tokens via "get next token" and produces a parse tree.)

  2. Syntax Analysis • In the syntax analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are grouped according to the syntactic rules of the language. • If the tokens in a string are grouped according to the language's rules of syntax, then the string of tokens generated by the lexical analyzer is accepted as a valid construct of the language; otherwise an error handler is called. • Therefore, a suitable notation must be used to specify the constructs of a language: the context-free grammar (CFG).

  3. Syntax Analyzer • The syntax analyzer creates the syntactic structure (parse tree) of the given source program. • The syntax of a programming language is described by a CFG. We will use BNF notation in the description of CFGs. • The syntax analyzer checks whether a given source program satisfies the rules implied by a CFG or not. • If it does, the parser creates the parse tree of that program. • Otherwise the parser gives an error message. • A context-free grammar gives a precise syntactic specification of a programming language. • The design of the grammar is an initial phase of the design of a compiler.

  4. Parsers We categorize parsers into two groups: • Top-Down Parser • the parse tree is created top to bottom, starting from the root (from non-terminals toward terminals). • Bottom-Up Parser • the parse tree is created bottom to top, starting from the leaves (from terminals toward non-terminals). • Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).

  5. Context-Free Grammars (CFG) • Inherently recursive structures of a programming language are defined by a CFG. • In a CFG, we have: • A finite set of terminals (in our case, this will be the set of tokens) • A finite set of non-terminals (syntactic variables) • A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string) • A start symbol (one of the non-terminal symbols) • Example: E → E + E | E – E | E * E | E / E | - E E → ( E ) E → id
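As a concrete sketch, a CFG like the expression grammar above can be represented directly in code; the names `GRAMMAR` and `START` below are illustrative, not part of any standard API:

```python
# A minimal sketch of a CFG as a Python dict: each non-terminal maps
# to its list of alternatives, and each alternative is a tuple of
# grammar symbols (terminals and non-terminals).
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"),
          ("E", "/", "E"), ("-", "E"), ("(", "E", ")"), ("id",)],
}
START = "E"

def is_nonterminal(symbol):
    """A symbol is a non-terminal iff it has productions in the grammar."""
    return symbol in GRAMMAR
```

Terminals (tokens such as `id`, `+`, `(`) never appear as keys, so they are exactly the symbols for which `is_nonterminal` returns False.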

  6. Derivations E ⇒ E+E: E+E derives from E • we can replace E by E+E • to be able to do this, we must have a production rule E → E+E in our grammar. E ⇒ E+E ⇒ id+E ⇒ id+id "A sequence of replacements of non-terminal symbols is called a derivation." In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols. α1 ⇒ α2 ⇒ ... ⇒ αn (αn is derived from α1, or α1 derives αn) ⇒ : derives in one step ⇒* : derives in zero or more steps ⇒+ : derives in one or more steps
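The replacement step described above can be sketched in a few lines of Python (the helper name `derive_step` is hypothetical):

```python
def derive_step(form, index, production):
    """One derivation step: replace the non-terminal at form[index]
    with the body of `production` (a tuple of grammar symbols)."""
    return form[:index] + list(production) + form[index + 1:]

# Replaying E => E+E => id+E => id+id from the slide:
form = ["E"]
form = derive_step(form, 0, ("E", "+", "E"))   # E+E
form = derive_step(form, 0, ("id",))           # id+E
form = derive_step(form, 2, ("id",))           # id+id
print("".join(form))  # id+id
```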

  7. CFG - Terminology • L(G) is the language of G (the language generated by G), which is a set of sentences. • A sentence of L(G) is a string of terminal symbols of G. • If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G. • If G is a context-free grammar, L(G) is a context-free language. • Two grammars are equivalent if they produce the same language. • S ⇒* α - If α contains non-terminals, it is called a sentential form of G. - If α does not contain non-terminals, it is called a sentence of G.

  8. Derivation Example Let the grammar be E → E + E | E – E | E * E | E / E | - E E → ( E ) E → id Left-most: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E*E+E) ⇒ -(id*E+E) ⇒ -(id*id+E) ⇒ -(id*id+id) Right-most: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(E*E+id) ⇒ -(E*id+id) ⇒ -(id*id+id) • At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. • If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. • If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation. Exercise • Derive the sentence -(id + id*id) using LMD and RMD.
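The left-most strategy can be automated: given the sequence of production bodies to apply, always substitute the left-most non-terminal. A minimal sketch, assuming the grammar's only non-terminal is E (the function name is illustrative):

```python
def leftmost_derivation(productions):
    """Apply each production body to the LEFTMOST non-terminal in turn,
    returning every sentential form along the way."""
    nonterminals = {"E"}
    form = ["E"]
    forms = ["".join(form)]
    for body in productions:
        i = next(k for k, s in enumerate(form) if s in nonterminals)
        form = form[:i] + list(body) + form[i + 1:]
        forms.append("".join(form))
    return forms

# The slide's LMD of -(id*id+id):
steps = [("-", "E"), ("(", "E", ")"), ("E", "+", "E"),
         ("E", "*", "E"), ("id",), ("id",), ("id",)]
print(leftmost_derivation(steps))  # last form is -(id*id+id)
```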

  9. Left-Most and Right-Most Derivations Left-Most Derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) Right-Most Derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id) • Top-down parsers try to find the left-most derivation of the given source program. • Bottom-up parsers try to find the right-most derivation of the given source program in reverse order.

  10. Parse Tree (Diagram: the parse tree for -(id+id), grown step by step alongside the derivation below.) • Inner nodes of a parse tree are non-terminal symbols. • The leaves of a parse tree are terminal symbols. • A parse tree can be seen as a graphical representation of a derivation. E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
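One common lightweight encoding of a parse tree is nested tuples, with the non-terminal first and its children after it; concatenating the leaves left to right recovers the derived string. A sketch for the tree of -(id+id) (the helper name is hypothetical):

```python
# Parse tree for -(id+id): inner nodes are ("E", child, child, ...),
# leaves are plain terminal strings.
tree = ("E", "-",
        ("E", "(",
         ("E", ("E", "id"), "+", ("E", "id")),
         ")"))

def yield_of(node):
    """Concatenate the leaves left to right: the string the tree derives."""
    if isinstance(node, str):
        return node
    return "".join(yield_of(child) for child in node[1:])

print(yield_of(tree))  # -(id+id)
```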

  11. Ambiguity • A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. • Therefore, the grammar with the productions E → E + E E → E * E E → id is an ambiguous grammar, because there exists more than one parse tree for the sentence id + id * id in L(G).

  12. Two parse trees for id + id*id (Diagram: one tree groups the sentence as id + (id*id), the other as (id + id) * id.) E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

  13. Ambiguity – Operator Precedence Rewrite the grammar: • Use a different non-terminal for each precedence level • Start with the lowest precedence E → E - E | E / E | ( E ) | id Rewrite to: E → E - T | T T → T / F | F F → id | ( E )
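To see the precedence layers in action, here is a minimal sketch of a parser for the standard unambiguous form E → E - T | T, T → T / F | F, F → id | ( E ); the left-recursive rules are handled with a loop per precedence level, which preserves left associativity (all names are illustrative):

```python
def parse(tokens):
    """Parse a token list into a nested-tuple tree: one function
    per precedence level, lowest (E, '-') calling higher (T, '/')."""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok}, got {peek()}"
        pos += 1
        return tok
    def parse_F():                     # F -> id | ( E )
        if peek() == "(":
            eat("("); node = parse_E(); eat(")")
            return node
        return eat("id")
    def parse_T():                     # T -> T / F | F
        node = parse_F()
        while peek() == "/":
            eat("/")
            node = (node, "/", parse_F())
        return node
    def parse_E():                     # E -> E - T | T
        node = parse_T()
        while peek() == "-":
            eat("-")
            node = (node, "-", parse_T())
        return node
    return parse_E()

# id-id/id groups as id - (id/id), since '/' binds tighter than '-':
print(parse(["id", "-", "id", "/", "id"]))
```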

  14. Cont.. w = id-id/id E ⇒ E-T ⇒ E-T/F ⇒ E-T/id ⇒ E-F/id ⇒ E-id/id ⇒ T-id/id ⇒ F-id/id ⇒ id-id/id

  15. Workout E → E + E | E – E | E * E | E / E E → ( E ) E → id Rewrite the productions above using operator precedence and parse the string w = id+id-id (both in RMD and LMD).

  16. Left Recursion • A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. • Top-down parsing techniques cannot handle left-recursive grammars. • So, we have to convert a left-recursive grammar into a grammar which is not left-recursive. • The left recursion may appear in a single step of the derivation (immediate left recursion), or may appear in more than one step of the derivation.

  17. Left-Recursion -- Problem • Left recursion can be direct (immediate) or indirect. • By just eliminating the immediate left recursion, we may not obtain a grammar which is not left-recursive. • S → Aa | b • A → Sc | d • This grammar is not immediately left-recursive, but it is still left-recursive: • S ⇒ Aa ⇒ Sca, or • A ⇒ Sc ⇒ Aac causes a left recursion. • So, we have to eliminate all left recursion from the grammar.

  18. Eliminating Left-Recursion • Eliminate ε-productions • Eliminate cycles A ⇒+ A (a derivation of one or more steps that produces A from A)

  19. Example A CFG is given as: S → XX | Y | ε X → aSb | S Y → Yb | Z Z → bZa | ε Nullable variables: Z (Z → ε), Y (Y ⇒ Z), S (S → ε), X (X ⇒ S). After eliminating the ε-productions: S → XX | Y | X X → aSb | S | ab Y → Yb | Z | b Z → bZa | ba
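The nullable set used in this example can be computed by a simple fixed-point iteration; a minimal sketch, assuming a grammar encoded as a dict of non-terminal → list of alternatives, with ε as the empty tuple (all names are illustrative):

```python
def nullable_set(grammar):
    """A non-terminal is nullable if some alternative is empty (eps)
    or consists entirely of nullable non-terminals. Iterate until
    no new non-terminal can be added (a fixed point)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, alts in grammar.items():
            if head in nullable:
                continue
            if any(all(s in nullable for s in alt) for alt in alts):
                nullable.add(head)
                changed = True
    return nullable

# The slide's grammar: S -> XX | Y | eps, X -> aSb | S, Y -> Yb | Z, Z -> bZa | eps
g = {"S": [("X", "X"), ("Y",), ()],
     "X": [("a", "S", "b"), ("S",)],
     "Y": [("Y", "b"), ("Z",)],
     "Z": [("b", "Z", "a"), ()]}
print(sorted(nullable_set(g)))  # ['S', 'X', 'Y', 'Z']
```

Terminals such as a and b are never in the nullable set, so any alternative containing one is skipped automatically by the `all(...)` test.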

  20. Workout S → XZ X → Xb | ε Z → aZ | ZX | ε • What are the nullable variables? • What are the new productions?

  21. Cont.. (Eliminating Cycles) S → XX | Y | X X → aSb | b | S Y → Yb | b | Z Z → bZa | ba • Y → Yb is immediately left-recursive; eliminate it with a new non-terminal Y': Y → bY' | ZY' Y' → bY' | ε • Z → bZa | ba is not left-recursive, so it stays unchanged. • The remaining unit productions (S → Y, S → X, X → S, Y → Z) can then be removed by substituting their right-hand sides.

  22. Immediate Left-Recursion Given a rule A → Aα | β, where: • A is a left-recursive non-terminal • α is a sequence of non-terminals and terminals that is not null • β is a sequence of non-terminals and terminals that does not start with A replace the A-production by: A → βA' A' → αA' | ε creating the new non-terminal A'. This newly created symbol is often called the "tail", or the "rest".

  23. Immediate Left-Recursion -- Example For a rule which contains a left-recursive alternative, A → Ac | Bc introduce a new non-terminal A' and rewrite the rule as A → BcA' A' → cA' | ε
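The transformation can be sketched mechanically: split the alternatives of A into the left-recursive ones (A α) and the rest (β), then build the two new rules. The function below is a hypothetical helper, with ε represented by the empty tuple:

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """A -> A a1 | ... | A an | b1 | ... | bm     becomes
       A  -> b1 A' | ... | bm A'
       A' -> a1 A' | ... | an A' | eps   (eps = empty tuple)."""
    tail = head + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    return {
        head: [beta + (tail,) for beta in betas],
        tail: [alpha + (tail,) for alpha in alphas] + [()],
    }

# The slide's example: A -> Ac | Bc   becomes   A -> BcA',  A' -> cA' | eps
print(eliminate_immediate_left_recursion("A", [("A", "c"), ("B", "c")]))
```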

  24. Left-Factoring • A predictive parser (a top-down parser without backtracking) insists that the grammar be left-factored. grammar → a new equivalent grammar suitable for predictive parsing stmt → if expr then stmt else stmt | if expr then stmt (which alternative to use?)

  25. Left-Factoring -- Steps • For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix α, say A → αβ1 | ... | αβn | γ1 | ... | γm convert it into A → αA' | γ1 | ... | γm A' → β1 | ... | βn

  26. Left-Factoring (cont.) • In general, A → αβ1 | αβ2 where α is non-empty and the first symbols of β1 and β2 are different. • When processing α, we cannot know whether to expand A to αβ1 or to αβ2. • But if we rewrite the grammar as A → αA' A' → β1 | β2 we can immediately expand A to αA'.
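This rewrite, too, is mechanical; a sketch that factors a given common prefix out of one rule (all names are illustrative, ε is the empty tuple):

```python
def left_factor(head, alternatives, prefix, tail):
    """Factor the common `prefix` out of the alternatives that start
    with it:   A -> p b1 | ... | p bn | g1 | ... | gm   becomes
               A -> p A' | g1 | ... | gm,   A' -> b1 | ... | bn."""
    k = len(prefix)
    with_prefix = [alt[k:] for alt in alternatives if alt[:k] == prefix]
    without = [alt for alt in alternatives if alt[:k] != prefix]
    return {head: [prefix + (tail,)] + without, tail: with_prefix}

# A -> abB | aB  (common prefix 'a')  becomes  A -> aA',  A' -> bB | B
print(left_factor("A", [("a", "b", "B"), ("a", "B")], ("a",), "A'"))
```

A full left-factoring algorithm would pick the longest common prefix itself and repeat until no two alternatives share a prefix; here the prefix and the fresh name A' are passed in explicitly to keep the sketch short.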

  27. Left-Factoring – Example 1 A → abB | aB | cdg | cdeB | cdfB ⟹ A → aA' | cdg | cdeB | cdfB, A' → bB | B ⟹ A → aA' | cdA'', A' → bB | B, A'' → g | eB | fB

  28. Left-Factoring – Example 2 A → ad | a | ab | abc | b ⟹ A → aA' | b, A' → d | ε | b | bc ⟹ A → aA' | b, A' → d | ε | bA'', A'' → ε | c
