
Parsing

Parsing. G22.2110 Programming Languages May 24, 2012. New York University Chanseok Oh (chanseok@cs.nyu.edu). Chapter 2 Scanning Parsing. Overview Scanner, Tokenizer , Lexer , Lexical Analyzer IF ( A >= .30 ) THEN { …


Presentation Transcript


  1. Parsing G22.2110 Programming Languages May 24, 2012 New York University Chanseok Oh (chanseok@cs.nyu.edu)

  2. Chapter 2 Scanning Parsing

  3. Overview
     • Scanner (tokenizer, lexer, lexical analyzer)
       Input: IF ( A >= .30 ) THEN { …
       Tokens: IF, LPAREN, IDENT(A), GTE, FPN(.30), RPAREN, THEN, …
       • Tokens, lexemes
       • DFA, NFA, regular expressions
       • lex, flex, JLex
     • Parser
       • DPDA, deterministic context-free grammars
       • Yacc, Bison
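The scanner step in the overview can be sketched as a small regular-expression tokenizer. The token names IF, LPAREN, IDENT, GTE, FPN, RPAREN and THEN come from the slide; the regular expressions themselves and the LBRACE name for `{` are assumptions made for this illustration.

```python
import re

# Token names follow the slide; the patterns (and LBRACE) are assumptions.
TOKEN_SPEC = [
    ("IF",     r"IF\b"),
    ("THEN",   r"THEN\b"),
    ("FPN",    r"\d*\.\d+"),        # floating-point number such as .30
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("GTE",    r">="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACE", r"\{"),
    ("SKIP",   r"\s+"),             # whitespace produces no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Turn a character string into a list of token names."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind in ("IDENT", "FPN"):
            tokens.append(f"{kind}({m.group()})")   # keep the lexeme
        elif kind != "SKIP":
            tokens.append(kind)
        pos = m.end()
    return tokens

print(tokenize("IF ( A >= .30 ) THEN {"))
```

Keywords are listed before IDENT so that `IF` and `THEN` win over the identifier pattern, while the `\b` boundary keeps a name such as `IFFY` an identifier.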

  4. Table of Contents
     • Practical parsers (linear time)
       • LL (top-down, predictive)
       • LR (bottom-up, shift-reduce)
     • Related side topics
       • Ambiguity, language and parser hierarchy
     • Examples: Simple Calculator Language

  5. A Language
     • A set of strings (over given symbols)
       • { finite, set, with, five, strings }
       • { ab, aaba, abbaba, … }
       • { 0^n 1^n }
       • { a^i b^j | i < j }
       • { void main() { int i = 0 }, … }
     • Is an input string in the language?
       • cf. recursive (Turing-decidable) languages
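The question "is an input string in the language?" can be made concrete with two membership tests for the example sets above. This is only a sketch: the function names are hypothetical, and treating n ≥ 0 and i ≥ 0 as allowed is an assumption the slide does not state.

```python
import re

def in_0n1n(s):
    """Membership in { 0^n 1^n }, assuming n >= 0."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half

def in_aibj(s):
    """Membership in { a^i b^j | i < j }, assuming i >= 0."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) < len(m.group(2))

print(in_0n1n("000111"), in_0n1n("0101"))   # True False
print(in_aibj("abb"), in_aibj("aab"))       # True False
```

Both tests need to compare counts, which is why neither language is regular even though each is easy to decide.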

  6. Context-Free Languages (CFL)
     • Languages that can be generated by CFGs
     • Languages that can be recognized by PDAs
     • Not all languages are context-free.
     • CFGs are suitable for most programming languages.
       • <sentence> := <subject> <verb> <object> PERIOD
     • Deterministic CFLs

  7. Example
     Here is our CFG:
       S := id A
       A := , id A
       A := ;
     Input: sum , a1 , ptr ;

  8. Parse Tree
     Grammar:
       S := id A
       A := , id A
       A := ;
     Tree for the input sum , a1 , ptr ;
       S
       ├─ id (sum)
       └─ A
          ├─ ,
          ├─ id (a1)
          └─ A
             ├─ ,
             ├─ id (ptr)
             └─ A
                └─ ;

  9. Ambiguous Grammars
     E → E + E | E - E | E * E | E / E
     • Is a given grammar ambiguous? The question is undecidable in general.
     • There is no general procedure for converting an ambiguous grammar to an unambiguous one.
     • Ambiguity can be tolerated to some extent in deterministic parsing, e.g., by defining precedence or associativity.
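To see the ambiguity concretely, the following sketch enumerates every parse tree of a token sequence under E → E op E. The terminal production E → number is an assumption borrowed from the later arithmetic-expression slide, and `all_parses` is a hypothetical helper name.

```python
def all_parses(tokens):
    """Return one fully parenthesized string per parse tree of `tokens`
    under E -> E op E | number (tokens alternate number, op, number, ...)."""
    if len(tokens) == 1:
        return [str(tokens[0])]
    trees = []
    # Any operator position can serve as the root of a parse tree.
    for i in range(1, len(tokens), 2):
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append(f"({left} {tokens[i]} {right})")
    return trees

for tree in all_parses([2, "-", 3, "-", 4]):
    print(tree, "=", eval(tree))
```

The input 2 - 3 - 4 has two parse trees, `(2 - (3 - 4))` with value 3 and `((2 - 3) - 4)` with value -5. Declaring `-` left-associative is the kind of rule that selects the second tree.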

  10. Parsers
     • LL (Left-to-right scan, Leftmost derivation)
       • Top-down
       • Predictive
       • Simple and easy to understand
     • LR (Left-to-right scan, Rightmost derivation in reverse)
       • Bottom-up
       • Shift-reduce
       • Most common in production-level parsers
       • SLR (Simple LR)
       • LALR (Look-Ahead LR)

  11. LL(k) Parser
     • LL(k) parser
       • Uses k look-ahead symbols
       • Does not backtrack (deterministic)
       • LL(1) is the most popular kind of LL parser.
     • LL(k) languages
       • Not all CFLs are LL(k) languages.
     (Diagram: LL(k) drawn inside CFL.)

  12. LL Parsing Example
     <id_list> := id <id_list_tail>
     <id_list_tail> := , id <id_list_tail>
     <id_list_tail> := ;
     • It is an LL grammar. The language is also LL.
     • Input to parse: sum , a1 , ptr ;
     (Diagram: the language marked as a point inside LL, which is drawn inside CFL.)

  13. Parse Tree
     <id_list> := id <id_list_tail>
     <id_list_tail> := , id <id_list_tail>
     <id_list_tail> := ;
     Tree for the input sum , a1 , ptr ;
       <id_list>
       ├─ id (sum)
       └─ <id_list_tail>
          ├─ ,
          ├─ id (a1)
          └─ <id_list_tail>
             ├─ ,
             ├─ id (ptr)
             └─ <id_list_tail>
                └─ ;
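A predictive (LL(1)) parser for this grammar can be sketched as one recursive function per nonterminal, each choosing its production from a single look-ahead token. The function names and the convention that any token other than `,` or `;` counts as an id are assumptions of this sketch.

```python
def parse_id_list(tokens):
    """Recognize id_list -> id id_list_tail,
    id_list_tail -> , id id_list_tail | ;
    Returns the identifiers found; raises SyntaxError on bad input."""
    pos = 0
    ids = []

    def peek():
        return tokens[pos] if pos < len(tokens) else "$$"

    def match_id():
        nonlocal pos
        tok = peek()
        if tok in (",", ";", "$$"):
            raise SyntaxError(f"expected id, found {tok!r}")
        ids.append(tok)
        pos += 1

    def id_list_tail():
        nonlocal pos
        if peek() == ",":        # predict id_list_tail -> , id id_list_tail
            pos += 1
            match_id()
            id_list_tail()
        elif peek() == ";":      # predict id_list_tail -> ;
            pos += 1
        else:
            raise SyntaxError(f"expected ',' or ';', found {peek()!r}")

    match_id()                   # id_list -> id id_list_tail
    id_list_tail()
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return ids

print(parse_id_list("sum , a1 , ptr ;".split()))   # ['sum', 'a1', 'ptr']
```

The sequence of calls mirrors the parse tree from the root downward, which is what "top-down" means here, and no call ever backs up over input it has consumed.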

  14. LR Parser
     • LR(k) parser
       • Uses k look-ahead symbols
       • Usually k is 1, and the term "LR parser" often refers to this case.
     • LR(k) languages
       • Not all CFLs are LR(k) languages.
     (Diagram: LR drawn inside CFL.)

  15. Language Relationships
     (Diagram relating language classes. Region labels: unambiguous languages, ambiguous languages, LL(0), LL(1), LR(0), SLR, LALR, LR(1).)

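  16. LR Parsing Example
     With the same grammar:
       id_list → id id_list_tail
       id_list_tail → , id id_list_tail
       id_list_tail → ;
     • It is also an LR grammar, and the language is LR.
     • Input to parse (as before): sum , a1 , ptr ;
     (Diagram: the language marked as a point inside LL, which is drawn inside LR(1), which is drawn inside CFL.)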

  17. Parse Tree
     <id_list> := id <id_list_tail>
     <id_list_tail> := , id <id_list_tail>
     <id_list_tail> := ;
     Tree for the input sum , a1 , ptr ;
       <id_list>
       ├─ id (sum)
       └─ <id_list_tail>
          ├─ ,
          ├─ id (a1)
          └─ <id_list_tail>
             ├─ ,
             ├─ id (ptr)
             └─ <id_list_tail>
                └─ ;

  18. Another LR Parsing Example
     Consider a modified grammar:
       <id_list> := <id_list_prefix> ;
       <id_list_prefix> := <id_list_prefix> , id
       <id_list_prefix> := id
     The grammar is left-recursive, so it is not LL (though the language itself is both LR and LL).

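  19. LR Parsing
     <id_list> := <id_list_prefix> ;
     <id_list_prefix> := <id_list_prefix> , id
     <id_list_prefix> := id
     Tree for the input sum , a1 , ptr ;
       <id_list>
       ├─ <id_list_prefix>
       │  ├─ <id_list_prefix>
       │  │  ├─ <id_list_prefix>
       │  │  │  └─ id (sum)
       │  │  ├─ ,
       │  │  └─ id (a1)
       │  ├─ ,
       │  └─ id (ptr)
       └─ ;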
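The bottom-up construction for the left-recursive grammar can be sketched as a shift-reduce loop. This is a naive illustration that matches right-hand sides directly on the stack; it is not the table-driven automaton a real LR parser generator would build, and the function name is hypothetical.

```python
def shift_reduce(tokens):
    """Naive shift-reduce recognizer for
         id_list        -> id_list_prefix ;
         id_list_prefix -> id_list_prefix , id | id
    Returns the reductions in the order they are performed."""
    stack, reductions = [], []
    for tok in tokens:
        # Shift: any token other than ',' or ';' is treated as an id.
        stack.append(tok if tok in (",", ";") else "id")
        # Reduce while a right-hand side sits on top of the stack.
        while True:
            if stack[-3:] == ["id_list_prefix", ",", "id"]:
                stack[-3:] = ["id_list_prefix"]
                reductions.append("id_list_prefix -> id_list_prefix , id")
            elif stack == ["id"]:
                stack[:] = ["id_list_prefix"]
                reductions.append("id_list_prefix -> id")
            elif stack == ["id_list_prefix", ";"]:
                stack[:] = ["id_list"]
                reductions.append("id_list -> id_list_prefix ;")
            else:
                break
    if stack != ["id_list"]:
        raise SyntaxError(f"cannot reduce to id_list; stack is {stack}")
    return reductions

for step in shift_reduce("sum , a1 , ptr ;".split()):
    print(step)
```

Read from last to first, the printed reductions form a rightmost derivation of the input. With left recursion the stack never holds more than three symbols, because each `, id` is reduced as soon as it has been shifted.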

  20. Simple Calculator Language
     3 + ( 4 * 1 )
     total := 7
     read n
     write ( 10 - ( total + 1 ) / 3 * n )

  21. Simple Arithmetic Expression
     E → E + E | E - E
     E → E * E | E / E
     E → id | number | ( E )

  22. Simple Arithmetic Expression
     expr → term | expr add_op term
     term → factor | term mult_op factor
     factor → id | number | ( expr )
     add_op → + | -
     mult_op → * | /
     • The language is LL, but this grammar is not an LL grammar (it is an LR grammar).
     • The two most common obstacles to "LL(1)-ness":
       • Left recursion, e.g. stmt_list → stmt_list stmt
       • Common prefixes, e.g. stmt → id := expr | id ( arg_list )

  23. Converting to LL Grammars
     • Left recursion:
       stmt_list → stmt_list stmt
       becomes
       stmt_list → stmt stmt_list | ε
     • Common prefixes:
       stmt → id := expr | id ( arg_list )
       becomes
       stmt → id stmt_list_tail
       stmt_list_tail → := expr | ( arg_list )
     • Alternatively, you can employ conflict-resolution rules.

  24. Converted LL(1) Grammar
     expr → term term_tail
     term_tail → add_op term term_tail | ε
     term → factor factor_tail
     factor_tail → mult_op factor factor_tail | ε
     factor → ( expr ) | id | number
     add_op → + | -
     mult_op → * | /
     • Not every CFG can be converted to an LL grammar. Why?
     (Diagram: LL drawn inside CFL.)
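The rewrite from the left-recursive expression rules to the tail form can be sketched with the standard transformation for immediate left recursion: A → A α | β becomes A → β A_tail and A_tail → α A_tail | ε. The function name and the `_tail` naming scheme are assumptions of this sketch; the slides name the new nonterminals term_tail and factor_tail instead.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Remove immediate left recursion from one nonterminal.
    Productions are lists of symbols; [] stands for ε."""
    tail = nonterminal + "_tail"   # hypothetical name for the new nonterminal
    # Split A -> A α (recursive) from A -> β (everything else).
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    return {
        nonterminal: [p + [tail] for p in others],
        tail: [p + [tail] for p in recursive] + [[]],
    }

result = eliminate_left_recursion(
    "expr", [["term"], ["expr", "add_op", "term"]]
)
for lhs, rhss in result.items():
    print(lhs, "→", " | ".join(" ".join(r) or "ε" for r in rhss))
```

For `expr → term | expr add_op term` this prints `expr → term expr_tail` and `expr_tail → add_op term expr_tail | ε`, which has the same shape as the converted rules for expr. The sketch handles only immediate left recursion; indirect recursion through other nonterminals needs a more general procedure.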

  25. LL(1) for Simple Calculator Language
     program → stmt_list $$
     stmt_list → stmt stmt_list | ε
     stmt → id := expr | read id | write expr
     expr → term term_tail
     term_tail → add_op term term_tail | ε
     term → factor factor_tail
     factor_tail → mult_op factor factor_tail | ε
     factor → ( expr ) | id | number
     add_op → + | -
     mult_op → * | /
     Three more production rules (program, stmt_list, stmt) are added to the previous LL(1) grammar for expressions.

  26. LL Parsing
     Input program:
       read A
       read B
       sum := A + B
       write sum
       write sum / 2

  27. Predict Sets
     program → stmt_list $$ {id, read, write, $$}
     stmt_list → stmt stmt_list {id, read, write}
               | ε {$$}
     stmt → id := expr {id}
          | read id {read}
          | write expr {write}
     expr → term term_tail {(, id, number}
     term_tail → add_op term term_tail {+, -}
               | ε {), id, read, write, $$}
     term → factor factor_tail {(, id, number}
     factor_tail → mult_op factor factor_tail {*, /}
                 | ε {+, -, ), id, read, write, $$}
     factor → ( expr ) {(}
            | id {id}
            | number {number}
     add_op → + {+} | - {-}
     mult_op → * {*} | / {/}

  28. Predict Sets
     stmt → id := expr {id}
          | read id {read}
          | write expr {write}
     • Suppose you are to expand stmt.
     • Look ahead 1 token (LL(1)).
     • Notice the pairwise disjoint sets {id}, {read}, {write}: the look-ahead token selects exactly one production.
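The predict sets can be recomputed mechanically from FIRST and FOLLOW sets. The following is a sketch of the usual fixed-point computation applied to the calculator grammar; the data layout and function names are assumptions of this example.

```python
EPS, END = "ε", "$$"

GRAMMAR = {   # [] stands for an ε production
    "program":     [["stmt_list", END]],
    "stmt_list":   [["stmt", "stmt_list"], []],
    "stmt":        [["id", ":=", "expr"], ["read", "id"], ["write", "expr"]],
    "expr":        [["term", "term_tail"]],
    "term_tail":   [["add_op", "term", "term_tail"], []],
    "term":        [["factor", "factor_tail"]],
    "factor_tail": [["mult_op", "factor", "factor_tail"], []],
    "factor":      [["(", "expr", ")"], ["id"], ["number"]],
    "add_op":      [["+"], ["-"]],
    "mult_op":     [["*"], ["/"]],
}

def first_of(symbols, first):
    """FIRST of a symbol string; includes EPS if the whole string can vanish."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def compute_sets():
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                       # iterate until nothing grows
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue         # FOLLOW is only kept for nonterminals
                    rest = first_of(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow

def predict_sets():
    first, follow = compute_sets()
    predict = {}
    for nt, prods in GRAMMAR.items():
        for rhs in prods:
            f = first_of(rhs, first)
            p = f - {EPS}
            if EPS in f:                 # ε-deriving: add FOLLOW of the left side
                p |= follow[nt]
            predict[(nt, " ".join(rhs) or EPS)] = p
    return predict

for (lhs, rhs), tokens in predict_sets().items():
    print(f"{lhs} → {rhs}  {sorted(tokens)}")
```

A grammar is LL(1) exactly when the predict sets of the alternatives for each nonterminal are pairwise disjoint, which can be checked directly on the returned dictionary.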

  29. LL(1)
     program → stmt_list $$
     stmt_list → stmt stmt_list | ε
     stmt → id := expr | read id | write expr
     expr → term term_tail
     term_tail → add_op term term_tail | ε
     term → factor factor_tail
     factor_tail → mult_op factor factor_tail | ε
     factor → ( expr ) | id | number
     add_op → + | -
     mult_op → * | /
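A recursive-descent recognizer for this LL(1) grammar can be sketched with one method per nonterminal. The tokenizer, the class layout, and the restriction to integer literals are assumptions of this sketch; add_op and mult_op are folded into term_tail and factor_tail for brevity.

```python
import re

TOKEN_RE = re.compile(r"\s*(:=|[-+*/()]|\d+|[A-Za-z_]\w*)")

def tokenize(src):
    """Map source text to grammar terminals, ending with $$."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        lex = m.group(1)
        if lex in ("read", "write"):
            tokens.append(lex)
        elif lex[0].isdigit():
            tokens.append("number")
        elif lex[0].isalpha() or lex[0] == "_":
            tokens.append("id")
        else:
            tokens.append(lex)           # operator or parenthesis
        pos = m.end()
    return tokens + ["$$"]

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    @property
    def tok(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.tok != expected:
            raise SyntaxError(f"expected {expected}, found {self.tok}")
        self.pos += 1

    def program(self):
        self.stmt_list()
        self.match("$$")

    def stmt_list(self):
        if self.tok in ("id", "read", "write"):   # stmt_list → stmt stmt_list
            self.stmt()
            self.stmt_list()
        elif self.tok != "$$":                    # ε is predicted only on $$
            raise SyntaxError(f"unexpected {self.tok}")

    def stmt(self):
        if self.tok == "id":
            self.match("id"); self.match(":="); self.expr()
        elif self.tok == "read":
            self.match("read"); self.match("id")
        else:
            self.match("write"); self.expr()

    def expr(self):
        self.term(); self.term_tail()

    def term_tail(self):
        if self.tok in ("+", "-"):                # otherwise ε
            self.pos += 1; self.term(); self.term_tail()

    def term(self):
        self.factor(); self.factor_tail()

    def factor_tail(self):
        if self.tok in ("*", "/"):                # otherwise ε
            self.pos += 1; self.factor(); self.factor_tail()

    def factor(self):
        if self.tok == "(":
            self.match("("); self.expr(); self.match(")")
        elif self.tok in ("id", "number"):
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected {self.tok}")

program = "read A\nread B\nsum := A + B\nwrite sum\nwrite sum / 2"
Parser(tokenize(program)).program()   # returns normally: input accepted
```

Each `if` on `self.tok` corresponds to a predict set of the grammar. In the ε cases of term_tail and factor_tail the method simply returns and lets its caller reject an unexpected token.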

  30. Better grammar: LR(1)
     program → stmt_list $$
     stmt_list → stmt_list stmt | stmt
     stmt → id := expr | read id | write expr
     expr → term | expr add_op term
     term → factor | term mult_op factor
     factor → id | number | ( expr )
     add_op → + | -
     mult_op → * | /
     • More intuitive than the LL grammar
     • However, it does not describe exactly the same language (no empty string).
     • Left recursion is advantageous.

  31. LR Parsing
     With the same input program:
       read A
       read B
       sum := A + B
       write sum
       write sum / 2

  32. State Transition Diagram
     State 0 (initial state):
       program → • stmt_list $$
       stmt_list → • stmt_list stmt
       stmt_list → • stmt
       stmt → • id := expr
       stmt → • read id
       stmt → • write expr
     State 0' (from State 0 on stmt):
       stmt_list → stmt •
       Reduce (shifting stmt_list)
     State 2 (from State 0 on stmt_list):
       program → stmt_list • $$
       stmt_list → stmt_list • stmt
       stmt → • id := expr
       stmt → • read id
       stmt → • write expr
     State 1 (from State 0 on read):
       stmt → read • id
     State 1' (from State 1 on id):
       stmt → read id •
       Reduce (shifting stmt from the viewpoint of State 0)
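The item sets in these states can be reproduced with the standard LR(0) closure and goto operations. The sketch below assumes the left-recursive calculator grammar; the tuple representation of items and the function names are choices made for this illustration.

```python
GRAMMAR = {
    "program":   [("stmt_list", "$$")],
    "stmt_list": [("stmt_list", "stmt"), ("stmt",)],
    "stmt":      [("id", ":=", "expr"), ("read", "id"), ("write", "expr")],
    "expr":      [("term",), ("expr", "add_op", "term")],
    "term":      [("factor",), ("term", "mult_op", "factor")],
    "factor":    [("id",), ("number",), ("(", "expr", ")")],
    "add_op":    [("+",), ("-",)],
    "mult_op":   [("*",), ("/",)],
}

def closure(items):
    """LR(0) closure; an item is (lhs, rhs, dot_position)."""
    result = set(items)
    work = list(result)
    while work:
        _, rhs, dot = work.pop()
        # A dot in front of a nonterminal pulls in that nonterminal's rules.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

def goto(items, symbol):
    """State reached from `items` after shifting `symbol`."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

def show(items):
    lines = []
    for lhs, rhs, dot in items:
        symbols = list(rhs[:dot]) + ["•"] + list(rhs[dot:])
        lines.append(f"{lhs} -> {' '.join(symbols)}")
    return sorted(lines)

state0 = closure({("program", ("stmt_list", "$$"), 0)})
print("\n".join(show(state0)))
print(show(goto(state0, "read")))    # ['stmt -> read • id']
```

`state0` contains the six items of State 0, `goto(state0, "stmt_list")` gives the five items of State 2, and `goto(state0, "read")` gives State 1.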

  33. Conflicts
     • Shift/reduce conflicts
       expr → • term
       factor → id •
       …
     • Reduce/reduce conflicts
       expr → id •
       factor → id •

  34. Resolving Conflicts
     • LR(0)
       • Any LR language has an LR(0) grammar (with $$).
       • Not practical: prohibitively large and unintuitive
     • SLR
       • SLR grammar: no shift/reduce or reduce/reduce conflicts when using FOLLOW sets
       • FOLLOW sets are also used in LL parsing to generate PREDICT sets.
     • LALR(1)
       • LALR(1) grammars (may not be SLR)
       • Same states as SLR
       • Improves on SLR with local look-ahead
       • LALR parsers are the most common parsers in practice.
     • LR(1)
       • LR(1) grammars (may not be LALR(1) or SLR)
