
COMP3190: Principle of Programming Languages



Presentation Transcript


  1. COMP3190: Principle of Programming Languages Formal Language Syntax

  2. Motivation: The problem of parsing structured text is very common. Consider the structure of email addresses, described with a grammar:
     <emailAddress> := <person> @ <host>
     <person> := <word>
     <host> := <word> | <word>.<host>
     Goal: describe and recognize email addresses in arbitrary text.
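
The grammar above is regular, so it can be sketched as a regular expression. The slide does not define <word>; the character class below is an assumption, and the names `is_email` and `find_emails` are hypothetical helpers for illustration.

```python
import re

# Assumption: <word> is one or more letters, digits, hyphens, or underscores.
WORD = r"[A-Za-z0-9_-]+"

# <emailAddress> := <person> @ <host>
# <person>       := <word>
# <host>         := <word> | <word>.<host>   (one or more dot-separated words)
EMAIL = re.compile(rf"{WORD}@{WORD}(?:\.{WORD})*")


def is_email(text):
    """Recognize: does the whole string derive from <emailAddress>?"""
    return EMAIL.fullmatch(text) is not None


def find_emails(text):
    """Describe and recognize: list every address found in arbitrary text."""
    return EMAIL.findall(text)
```

For example, `find_emails("write to ann@cs.example.org or bob@example.org today")` should list both addresses.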

  3. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser

  4. Deterministic Finite Automata (DFA) • Q: finite set of states • Σ: finite set of “letters” (alphabet) • δ: Q × Σ → Q (transition function) • q0: start state (in Q) • F: set of accept states (subset of Q) • Acceptance: the input is fully consumed with the automaton in an accept state.

  5. Example of a DFA [Figure: state diagram with states q1 and q2 and transitions labeled 0 and 1] Accepts all strings that end in 1.
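
A DFA can be simulated with a dictionary for δ and a single current state. The sketch below assumes the diagram is wired so that q1 is the start state (last symbol was not 1) and q2 is the only accept state (last symbol was 1); the original figure is not available, so that wiring is an assumption.

```python
def run_dfa(delta, start, accept, text):
    """Follow exactly one transition per input symbol; accept if we end in F."""
    state = start
    for symbol in text:
        state = delta[(state, symbol)]
    return state in accept


# Assumed wiring of the two-state example (not confirmed by the source figure).
DELTA = {
    ("q1", "0"): "q1",
    ("q1", "1"): "q2",
    ("q2", "0"): "q1",
    ("q2", "1"): "q2",
}


def ends_in_one(text):
    return run_dfa(DELTA, "q1", {"q2"}, text)
```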

  6. Another Example of a DFA [Figure: state diagram with states S, q1, q2, r1, and r2 and transitions labeled a and b] Accepts all strings that start and end with “a” OR start and end with “b”.

  7. Non-deterministic Finite Automata (NFA) The transition function is different: • δ: Q × Σε → P(Q) • P(Q) is the power set of Q (the set of all subsets) • Σε is the union of Σ and the special symbol ε (denoting the empty string) A string is accepted if at least one path consumes all the input and ends in an accept state.

  8. Example of an NFA [Figure: state diagram with states q1 to q4 and transitions labeled “0, 1”, “1”, and “0, ε”] What strings does this NFA accept?
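
One way to explore the question is to simulate the NFA by tracking the set of all states it could be in. The edge list below is an assumption about how the missing diagram is wired (q1 loops on 0 and 1, q1 to q2 on 1, q2 to q3 on 0 or ε, q3 to q4 on 1, q4 loops on 0 and 1, with q4 accepting); the empty string `""` stands for ε.

```python
def eps_closure(states, delta):
    """All states reachable from `states` using only ε-moves (key symbol "")."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def run_nfa(delta, start, accept, text):
    """Accept if at least one path consumes all input and ends in an accept state."""
    current = eps_closure({start}, delta)
    for symbol in text:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, symbol), ()))
        current = eps_closure(moved, delta)
    return bool(current & accept)


# Assumed wiring of the four-state example (not confirmed by the source figure).
NFA_DELTA = {
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"},
    ("q2", ""): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"},
    ("q4", "1"): {"q4"},
}


def example_nfa_accepts(text):
    return run_nfa(NFA_DELTA, "q1", {"q4"}, text)
```

Trying a handful of short strings with `example_nfa_accepts` is a quick way to form a hypothesis about the accepted language before proving it.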

  9. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser

  10. Regular Expressions R is a regular expression if R is • “a” for some a in Σ • ε (the empty string) • ∅ (the empty language) • R1 | R2, the union of two regular expressions • R1R2, the concatenation of two regular expressions • R1* (Kleene closure: zero or more repetitions of R1)

  11. Regular Expression Notation • a: an ordinary letter • ε: the empty string • M | N: choosing from M or N • MN: concatenation of M and N • M*: zero or more times (Kleene star) • M+: one or more times • M?: zero or one occurrence • [a-zA-Z]: character set (choice of one character from the set) • . : a period stands for any single character except newline

  12. Examples of Regular Expressions • {0, 1}* 0: all strings that end in 0 • {0, 1} 0*: strings that start with 1 or 0, followed by zero or more 0s • {0, 1}*: all strings • {0^n 1^n | n ≥ 0}: not a regular language, so no regular expression describes it!

  13. Converting a Regular Expression to an NFA [Figure: NFA fragments for a, MN, M|N, and M*, with the sub-automata for M and N connected by ε-transitions]

  14. Regular expression -> NFA Language: strings of 0s and 1s in which the number of 0s is even. Regular expression: (1*01*0)*1*
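
The claim that (1*01*0)*1* describes exactly the strings with an even number of 0s can be checked by brute force over short strings. This is a sanity check rather than a proof; `has_even_zeros` and `agrees_with_counting` are hypothetical helper names.

```python
import re
from itertools import product

# The regular expression from the slide, in Python's notation.
EVEN_ZEROS = re.compile(r"(1*01*0)*1*")


def has_even_zeros(text):
    """True if the whole string matches (1*01*0)*1*."""
    return EVEN_ZEROS.fullmatch(text) is not None


def agrees_with_counting(max_len):
    """Compare the regex against direct counting on every string up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if has_even_zeros(s) != (s.count("0") % 2 == 0):
                return False
    return True
```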

  15. Converting an NFA to a DFA • For a set of states S, closure(S) is the set of states that can be reached from S without consuming any input. • For a set of states S, DFAedge(S, c) is the set of states that can be reached from S by consuming input symbol c. • Each set of NFA states corresponds to one DFA state (hence at most 2^n DFA states for an NFA with n states).
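
The two definitions above are enough to sketch the subset construction. In this sketch `dfa_edge` applies `closure` to its result, which is an assumption about the intended definition; the dictionary encoding of δ (with `""` for ε) and the function names are also illustrative choices.

```python
def closure(states, delta):
    """States reachable from `states` without consuming input (ε-moves, key "")."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def dfa_edge(states, symbol, delta):
    """States reachable from `states` by consuming `symbol`, then ε-moves."""
    moved = set()
    for q in states:
        moved |= set(delta.get((q, symbol), ()))
    return closure(moved, delta)


def nfa_to_dfa(delta, start, accept, alphabet):
    """Build only the reachable DFA states; each is a frozenset of NFA states."""
    start_set = frozenset(closure({start}, delta))
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            target = frozenset(dfa_edge(current, symbol, delta))
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accept = {s for s in seen if s & accept}
    return dfa_delta, start_set, dfa_accept
```

Building only reachable subsets usually yields far fewer than 2^n states in practice, although the worst case remains exponential.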

  16. NFA -> DFA Initial classes: {A, B, E}, {C, D}. No class requires partitioning! Hence a two-state DFA is obtained.

  17. Obtaining the minimal equivalent DFA • Initially two equivalence classes: final and nonfinal states. • Search for an equivalence class C and an input letter a such that with a as input, the states in C make transitions to states in k>1 different equivalence classes. • Partition C into k classes accordingly • Repeat until unable to find a class to partition.
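
The refinement loop above can be sketched as follows. As a simplification, this version splits each class by its behavior on all input letters at once rather than one letter at a time; the resulting partition is the same. It assumes a total transition function and that unreachable states have already been removed.

```python
def minimize(states, alphabet, delta, accept):
    """Return the equivalence classes of states of a DFA as a list of sets."""
    # Initially two classes: final and nonfinal states (dropping an empty one).
    partition = [b for b in (set(accept), set(states) - set(accept)) if b]
    changed = True
    while changed:
        changed = False
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        new_partition = []
        for block in partition:
            # Group states by which class each input letter leads to.
            groups = {}
            for q in block:
                key = tuple(block_of[delta[(q, a)]] for a in alphabet)
                groups.setdefault(key, set()).add(q)
            if len(groups) > 1:
                changed = True  # this class had to be partitioned
            new_partition.extend(groups.values())
        partition = new_partition
    return partition
```

Each returned set becomes one state of the minimal DFA.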

  18. Example (cont.)

  19. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser

  20. Regular Grammar • Later definitions build on earlier ones • Nothing is defined in terms of itself (no recursion) Regular grammar for numeric literals in Pascal:
     digit -> 0 | 1 | 2 | ... | 8 | 9
     unsigned_integer -> digit digit*
     unsigned_number -> unsigned_integer (( . unsigned_integer ) | ε ) (( e ( + | - | ε ) unsigned_integer ) | ε )
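
Because no rule refers to itself, each definition can be substituted into the next to give one regular expression. A minimal sketch in Python, with `is_unsigned_number` as a hypothetical helper name:

```python
import re

# Each name mirrors a rule of the grammar; later rules build on earlier ones.
DIGIT = r"[0-9]"
UNSIGNED_INTEGER = rf"{DIGIT}{DIGIT}*"
UNSIGNED_NUMBER = (
    rf"{UNSIGNED_INTEGER}"
    rf"(\.{UNSIGNED_INTEGER})?"        # optional fraction: ( . unsigned_integer ) | ε
    rf"(e[+-]?{UNSIGNED_INTEGER})?"    # optional exponent: e ( + | - | ε ) unsigned_integer
)

NUMBER_RE = re.compile(UNSIGNED_NUMBER)


def is_unsigned_number(text):
    """True if the whole string is an unsigned_number under the grammar above."""
    return NUMBER_RE.fullmatch(text) is not None
```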

  21. Languages and Automata in Programming Languages • Regular languages • Recognized (accepted) by finite automata • Useful for tokenizing program text (lexical analysis) • Context-free languages • Recognized (accepted) by pushdown automata • Useful for parsing the syntax of a program

  22. Important Theorems • A language is regular if and only if a regular expression describes it. • A language is regular if and only if a finite automaton recognizes it. • DFAs and NFAs are equally powerful.

  23. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser

  24. Context-free Grammars • Context-free grammars are defined by substitution rules • Big Jim ate green cheese • Green Jim ate green cheese • Jim ate cheese • Cheese ate Jim

  25. Context-free Grammars • Context-free grammars are used to formally describe the syntax of programming languages. • Every syntactically correct program is derived using the context-free grammar of the language. • Parsing a program involves tracing such derivation, given the context-free grammar and the program.

  26. Context-free Grammars A context-free grammar consists of • V: a finite set of variables • Σ: a finite set of terminals • R: a finite set of rules of the form variable -> {variable, terminal}* • S: the start variable

  27. Pushdown Automata (PDA) • A pushdown automaton consists of • Q: a set of states • Σ: input alphabet (of terminals) • Γ: stack alphabet • δ: a set of transition rules Q × Σε × Γε → P(Q × Γε), read as: currentState, inputSymbol, headOfStack -> newState, pushSymbolOnStack • q0: the start state • F: the set of accept states (subset of Q) Deterministic: at most one move is possible from any configuration.

  28. How does a PDA accept? • By final state: • Consume all the input while • Reaching a final state • By empty stack: • Consume all the input while • Having an empty stack • Set of final states is irrelevant

  29. Example of a PDA [Figure: state diagram with states q1 to q4 and transitions labeled “ε, ε->$”, “0, ε->0”, “1, 0->ε” (twice), and “ε, $->ε”] Notation: “a, b->c” means that when the PDA reads “a” from the input, it replaces “b” at the top of the stack with “c”. What does this PDA accept?
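
A small simulator makes the “a, b->c” notation concrete. The sketch below searches over configurations (state, input position, stack) and accepts by final state. The transition list is an assumed wiring of the missing diagram (q1 to q2, a loop on q2, q2 to q3, a loop on q3, q3 to q4), and treating q1 and q4 as accept states is also an assumption. The search is only guaranteed to terminate for PDAs, like this one, that cannot grow the stack forever on ε-moves.

```python
def run_pda(transitions, start, accept, text):
    """transitions: (state, read, pop, next_state, push) tuples; "" stands for ε."""
    todo = [(start, 0, ())]
    seen = set(todo)
    while todo:
        state, pos, stack = todo.pop()
        if pos == len(text) and state in accept:
            return True  # all input consumed in an accept state
        for p, read, pop, q, push in transitions:
            if p != state:
                continue
            if read and (pos >= len(text) or text[pos] != read):
                continue
            if pop and (not stack or stack[-1] != pop):
                continue
            new_stack = stack[:-1] if pop else stack
            if push:
                new_stack = new_stack + (push,)
            config = (q, pos + len(read), new_stack)
            if config not in seen:
                seen.add(config)
                todo.append(config)
    return False


# Assumed wiring of the four-state example (not confirmed by the source figure).
EXAMPLE_PDA = [
    ("q1", "", "", "q2", "$"),   # ε, ε->$ : mark the bottom of the stack
    ("q2", "0", "", "q2", "0"),  # 0, ε->0 : push each 0
    ("q2", "1", "0", "q3", ""),  # 1, 0->ε : first 1 pops a 0
    ("q3", "1", "0", "q3", ""),  # 1, 0->ε : each further 1 pops a 0
    ("q3", "", "$", "q4", ""),   # ε, $->ε : stack back to the marker
]


def example_pda_accepts(text):
    return run_pda(EXAMPLE_PDA, "q1", {"q1", "q4"}, text)
```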

  30. Important Theorems • A language is context-free iff a pushdown automaton recognizes it • Non-deterministic PDAs are more powerful than deterministic ones

  31. Example of a Context-free Language That Requires a Non-deterministic PDA {w w^R | w belongs to {0, 1}*}, where w^R is w written backwards. Idea: non-deterministically guess the middle of the input string.

  32. The Solution [Figure: state diagram with states q1 to q4 and transitions labeled “ε, ε->$”, “0, ε->0”, “1, ε->1”, “ε, ε->ε”, “0, 0->ε”, “1, 1->ε”, and “ε, $->ε”]
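
The “guess the middle” idea can be imitated in ordinary code by trying every possible middle in turn, which is how a deterministic program simulates the non-deterministic choice. This is a sketch of the idea, not a transcription of the PDA; `is_w_wr` is a hypothetical name.

```python
def is_w_wr(text):
    """True if text has the form w followed by w reversed."""
    for middle in range(len(text) + 1):      # one iteration per guessed middle
        stack = list(text[:middle])          # push phase: everything before the guess
        matched = True
        for symbol in text[middle:]:         # pop phase: must mirror the stack
            if not stack or stack.pop() != symbol:
                matched = False
                break
        if matched and not stack:            # input consumed and stack empty
            return True
    return False
```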

  33. Derivations and Parse Trees Nested constructs require recursion, i.e. context-free grammars. CFG for arithmetic expressions:
     expression -> identifier | number | - expression | ( expression ) | expression operator expression
     operator -> + | - | * | /

  34. Parse Tree for Slope*x + Intercept Is this the only parse tree for this expression and grammar?

  35. A Better Expression Grammar
     1. expression -> term | expression add_op term
     2. term -> factor | term mult_op factor
     3. factor -> identifier | number | - factor | ( expression )
     4. add_op -> + | -
     5. mult_op -> * | /
     A good grammar reflects the internal structure of programs. This grammar is unambiguous and captures (HOW?): - operator precedence (*, / bind tighter than +, -) - associativity (operators group left to right)
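
One way to see how the grammar captures precedence and associativity is to build trees with a small hand-written parser. The sketch below has one function per non-terminal and replaces the left-recursive rules with loops, a standard rewrite that keeps the left-to-right grouping. The token pattern and the tuple tree format are assumptions made for illustration.

```python
import re

# Assumed token shapes: integers, identifiers, and single-character operators.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(text):
    """Return a nested tuple (op, left, right) mirroring the parse tree."""
    tokens = tokenize(text) + ["$end"]
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():            # expression -> term | expression add_op term
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())   # earlier operands end up deeper: left assoc.
        return node

    def term():                  # term -> factor | term mult_op factor
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                # factor -> identifier | number | - factor | ( expression )
        tok = advance()
        if tok == "-":
            return ("neg", factor())
        if tok == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected )")
            return node
        if tok.isdigit():
            return int(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok}")

    tree = expression()
    if peek() != "$end":
        raise SyntaxError(f"unexpected token {peek()}")
    return tree
```

Because `term` is called from inside `expression`, multiplication is grouped before addition, and the loops attach each new operand to the tree built so far, so `10 - 4 - 3` groups as `(10 - 4) - 3`.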

  36. And Better Parse Trees... [Figure: parse trees for 3 + 4 * 5 and 10 - 4 - 3]

  37. Syntax-directed Compilation • Parser calls scanner to obtain tokens. • Assembles tokens into parse tree. • Passes tree to later phases of compilation. • Scanner: deterministic finite automaton. • Parser: pushdown automaton. • Scanners and parsers can be generated automatically from regular expressions and CFGs (e.g. lex/yacc).

  38. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser

  39. Scanning • Accept the longest possible token in each invocation of the scanner. • Implementation. • Capture finite automata. • Case (switch) statements. • Table and driver.
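
The longest-match rule can be sketched by trying every token pattern at the current position and keeping the longest match. The token names and patterns below are assumptions (a few Pascal-like tokens), and using the `re` module in place of an explicit automaton is a shortcut for brevity.

```python
import re

# Hypothetical token set for illustration; order breaks ties between equal lengths.
TOKEN_SPEC = [
    ("number", r"\d+(\.\d+)?"),
    ("ident", r"[A-Za-z]\w*"),
    ("assign", r":="),
    ("colon", r":"),
    ("le", r"<="),
    ("lt", r"<"),
    ("op", r"[-+*/=;()]"),
]
COMPILED = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_SPEC]


def scan(text):
    """Return (kind, lexeme) pairs, always taking the longest possible token."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for kind, pattern in COMPILED:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

With this rule, `:=` is scanned as one assignment token rather than a colon followed by an equals sign.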

  40. Scanner for Pascal

  41. Scanner for Pascal(case Statements)

  42. Scanner (Table & driver)

  43. Scanner Generators • Start with a regular expression. • Construct an NFA from it. • Use a set of subsets construction to obtain an equivalent DFA. • Construct the minimal equivalent DFA.

  44. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser • Top-down parsing • Bottom-up Parsing • Comparison

  45. Parsing approaches • Parsing in general has O(n^3) cost. • Need classes of grammars that can be parsed in linear time. • Top-down or predictive parsing or recursive descent parsing or LL parsing (Left-to-right, Left-most derivation) • Bottom-up or shift-reduce parsing or LR parsing (Left-to-right, Right-most derivation)

  46. A Simple Grammar for a Comma-separated List of Identifiers
     id_list -> id id_list_tail
     id_list_tail -> , id id_list_tail
     id_list_tail -> ;
     String to be parsed: A, B, C;
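
A recursive descent parser for this grammar needs one routine per non-terminal. The sketch below assumes the input has already been split into tokens and that any Python identifier counts as an `id`; both are assumptions for illustration.

```python
def parse_id_list(tokens):
    """Return the identifiers if tokens derive from id_list, else raise SyntaxError."""
    pos = 0
    names = []

    def expect_id():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isidentifier():
            names.append(tokens[pos])
            pos += 1
        else:
            raise SyntaxError(f"expected identifier at token {pos}")

    def id_list_tail():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1              # id_list_tail -> , id id_list_tail
            expect_id()
            id_list_tail()
        elif pos < len(tokens) and tokens[pos] == ";":
            pos += 1              # id_list_tail -> ;
        else:
            raise SyntaxError(f"expected , or ; at token {pos}")

    expect_id()                   # id_list -> id id_list_tail
    id_list_tail()
    if pos != len(tokens):
        raise SyntaxError("trailing input after ;")
    return names
```

For the string on the slide, `parse_id_list(["A", ",", "B", ",", "C", ";"])` returns the three identifiers in order.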

  47. Top-down/bottom-up Parsing

  48. Outline • DFA & NFA • Regular expression • Regular languages • Context free languages & PDA • Scanner • Parser • Top-down parsing • Bottom-up Parsing • Comparison

  49. Top-down Parsing • Predicts a derivation • Matches non-terminal against token observed in input

  50. LL(1) Grammar • A grammar for which a top-down deterministic parser can be produced with one token of look-ahead. • LL(1) grammar: • For a given non-terminal, the lookahead symbol uniquely determines the production to apply • Top-down parsing = predictive parsing • Driven by a predictive parsing table of • non-terminals × terminals → productions
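
A table-driven predictive parser can be sketched with a dictionary keyed by (non-terminal, lookahead terminal). The table below is written by hand for the comma-separated identifier list grammar (id_list, id_list_tail); the input is assumed to be a list of terminal kinds such as `"id"`, and `$` is an assumed end-of-input marker.

```python
# Predictive parsing table: (non-terminal, lookahead) -> production right-hand side.
TABLE = {
    ("id_list", "id"): ["id", "id_list_tail"],
    ("id_list_tail", ","): [",", "id", "id_list_tail"],
    ("id_list_tail", ";"): [";"],
}
NONTERMINALS = {"id_list", "id_list_tail"}


def ll1_parse(tokens):
    """Return the list of productions applied, or None on a syntax error."""
    stack = ["$", "id_list"]           # start symbol on top of the end marker
    remaining = list(tokens) + ["$"]
    pos = 0
    steps = []
    while stack:
        top = stack.pop()
        look = remaining[pos]
        if top in NONTERMINALS:
            production = TABLE.get((top, look))
            if production is None:
                return None            # no table entry: the prediction fails
            steps.append((top, production))
            stack.extend(reversed(production))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1                   # terminal on the stack matches the input
        else:
            return None
    return steps if pos == len(remaining) else None
```

Each table lookup uses exactly one token of look-ahead, and the sequence of productions returned is a left-most derivation of the input.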
