
Scanner

Scanner. Grammar. Language. Recursive Definition. Mathematical Expression. Structure of Expressions. Formal Language. Backus Naur Form (BNF), 1960, by J. Backus and P. Naur. EBNF (Extended BNF). BNF → EBNF. Formalism (Formal notation). N. Chomsky, father of modern structural linguistics.


Presentation Transcript


  1. Scanner

  2. Grammar

  3. Language

  4. Recursive Definition

  5. Mathematical Expression

  6. Structure of Expressions

  7. Formal Language

  8. Backus Naur Form (BNF) • 1960 by J. Backus and P. Naur

  9. EBNF (Extended BNF)

  10. BNF → EBNF

  11. Formalism (Formal notation) • N. Chomsky • Father of modern structural linguistics

  12. Differing structural trees for the same expression

  13. Problem of Different structural trees

  14. No Ambiguous Sentence

  15. Context Free Language • Syntactic equations of the form defined in EBNF generate context-free languages. • The term "context-free" is due to Chomsky and stems from the fact that substitution of the symbol left of = by a sequence derived from the expression to the right of = is always permitted, regardless of the context in which the symbol is embedded within the sentence. • It has turned out that this restriction to context freedom (in the sense of Chomsky) is quite acceptable for programming languages, and that it is even desirable. • Context dependence in another sense, however, is indispensable. We will return to this topic in Chapter 8.

  16. Regular Expression • A language is regular if its syntax can be expressed by a single EBNF expression. • The requirement that a single equation suffices also implies that only terminal symbols occur in the expression. • Such an expression is called a regular expression.

  17. Syntax Analysis vs. Regular Expression • The reason for our interest in regular languages lies in the fact that programs for the recognition of regular sentences are particularly simple and efficient. By "recognition" we mean the determination of the structure of the sentence, and thereby naturally the determination of whether the sentence is well formed, that is, whether it belongs to the language. Sentence recognition is called syntax analysis.

  18. Regular Expression vs. State Machine • For the recognition of regular sentences a finite automaton, also called a state machine, is necessary and sufficient. In each step the state machine reads the next symbol and changes state. The resulting state is solely determined by the previous state and the symbol read. If the resulting state is unique, the state machine is deterministic, otherwise nondeterministic. If the state machine is formulated as a program, the state is represented by the current point of program execution.
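The state-machine idea on this slide can be sketched in Python. This is only an illustration: it assumes a regular language of identifiers of the form letter {letter | digit}, and the state names and the functions `step` and `accepts` are hypothetical, not taken from the slides.

```python
# Deterministic state machine for: identifier = letter {letter | digit}.
# State names and function names are illustrative assumptions.
START, IN_IDENT, REJECT = "start", "in_ident", "reject"

def step(state, ch):
    """The next state depends only on the current state and the symbol read."""
    if state == START:
        return IN_IDENT if ch.isalpha() else REJECT
    if state == IN_IDENT:
        return IN_IDENT if ch.isalnum() else REJECT
    return REJECT  # once rejected, always rejected

def accepts(text):
    """Run the machine over the whole input; accept only in state IN_IDENT."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state == IN_IDENT

print(accepts("x1"), accepts("1x"), accepts(""))
```

Because `step` returns exactly one state for each (state, symbol) pair, this machine is deterministic in the sense described on the slide.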

  19. EBNF → Program • The analyzing program can be derived directly from the defining syntax in EBNF. For each EBNF construct K there exists a translation rule which yields a program fragment Pr(K). The translation rules from EBNF to program text are shown below. Therein sym denotes a global variable always representing the symbol last read from the source text by a call to procedure next. Procedure error terminates program execution, signaling that the symbol sequence read so far does not belong to the language.
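The rule table itself is not part of this transcript. As a rough sketch of the usual recursive-descent reading of such rules, the Python below recognizes one hypothetical EBNF expression, `"(" {"a" | "b"} ")"`. The names `sym`, `next`, and `error` follow the slide; the class `Recognizer`, the helper `expect`, and the example expression are assumptions made for illustration.

```python
class Recognizer:
    """Recognizer for the assumed expression: "(" {"a" | "b"} ")"."""

    def __init__(self, text):
        self.chars = iter(text)
        self.sym = None
        self.next()                      # load the first symbol

    def next(self):
        # sym always holds the symbol last read; "" marks end of input.
        self.sym = next(self.chars, "")

    def error(self):
        raise ValueError(f"unexpected symbol {self.sym!r}")

    def expect(self, x):
        # Terminal "x": if sym = x then next else error.
        if self.sym == x:
            self.next()
        else:
            self.error()

    def parse(self):
        self.expect("(")                 # sequence: fragments one after another
        while self.sym in ("a", "b"):    # {exp}: loop while sym is in first(exp)
            if self.sym == "a":          # alternatives: select by first symbol
                self.expect("a")
            else:
                self.expect("b")
        self.expect(")")
        if self.sym != "":               # nothing may follow the sentence
            self.error()

def recognizes(text):
    try:
        Recognizer(text).parse()
        return True
    except ValueError:
        return False

print(recognizes("(abba)"), recognizes("(ac)"))
```

Here the position in `parse` plays the role of the state, which matches the remark that the state is represented by the current point of program execution.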

  20. Analyzing program

  21. EBNF with only 1 rule

  22. First()

  23. Precondition

  24. Lexical Analysis for Identifier

  25. Lexical Analysis for Integer

  26. Scanner • The process of syntax analysis is based on a procedure to obtain the next symbol. This procedure in turn is based on the definition of symbols in terms of sequences of one or more characters. This latter procedure is called a scanner, and syntax analysis on this second, lower level is called lexical analysis.

  27. Lexical Analysis vs. Syntax Analysis

  28. A Scanner Example • As an example we show a scanner for a parser of EBNF. Its terminal symbols and their definition in terms of characters are

  29. Procedure GetSym() (1)

  30. Procedure GetSym() (2)

  31. Procedure GetSym() (3)
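The GetSym listings are not included in this transcript. As a stand-in, a scanner for EBNF text might look like the Python sketch below. The symbol set (identifiers, quoted strings, and the punctuation `= | . ( ) [ ] { }`), the token kinds, and the name `get_syms` are all assumptions rather than the slides' own definitions.

```python
def get_syms(text):
    """Yield (kind, value) pairs for EBNF source text.

    The kinds "ident" and "literal" and the punctuation set are assumed.
    """
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():                 # skip blanks between symbols
            i += 1
        elif ch.isalpha():               # identifier = letter {letter | digit}
            j = i
            while j < n and text[j].isalnum():
                j += 1
            yield ("ident", text[i:j])
            i = j
        elif ch == '"':                  # string = '"' {character} '"'
            j = text.find('"', i + 1)
            if j < 0:
                raise ValueError(f"unterminated string at position {i}")
            yield ("literal", text[i + 1:j])
            i = j + 1
        elif ch in "=|.()[]{}":          # single-character symbols
            yield (ch, ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")

print(list(get_syms('digit = "0" | "1" .')))
```

A parser would call such a scanner once per symbol, so that syntax analysis never has to look at individual characters.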

  32. Syntax Analysis Overview • Goal – determine if the input token stream satisfies the syntax of the program • What do we need to do this? • An expressive way to describe the syntax • A mechanism that determines if the input token stream satisfies the syntax description • For lexical analysis • Regular expressions describe tokens • Finite automata = mechanisms to generate tokens from input stream

  33. Just Use Regular Expressions? • REs can expressively describe tokens • Easy to implement via DFAs • So just use them to describe the syntax of a programming language? • NO! They don’t have enough power to express any non-trivial syntax • Example: nested constructs (blocks, expressions, statements) • Detect balanced braces: { { { { { . . . } } } } } • We need unbounded counting! • FSAs cannot count except in a strictly modulo fashion
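The counting argument can be made concrete with a small Python sketch. The function name `balanced` is hypothetical; the point is that the check needs an integer that can grow without bound, which a machine with a fixed number of states cannot provide.

```python
def balanced(text):
    """Return True if every '{' in text is closed by a matching '}'."""
    depth = 0                    # unbounded counter: not available to a finite automaton
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:        # a closing brace with nothing left to close
                return False
    return depth == 0            # every opened brace must have been closed

print(balanced("{{}{}}"), balanced("{{}"))
```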

  34. Context-Free Grammars • Consist of 4 components: • Terminal symbols = token or ε • Non-terminal symbols = syntactic variables • Start symbol S = special non-terminal • Productions of the form LHS → RHS • LHS = single non-terminal • RHS = string of terminals and non-terminals • Specify how non-terminals may be expanded • Language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions • L(G) = language generated by grammar G • Example: S → a S a, S → T, T → b T b, T → ε

  35. CFG - Example • Grammar for balanced-parentheses language • S → ( S ) S • S → ε • 1 non-terminal: S • 2 terminals: “(”, “)” • Start symbol: S • 2 productions • If grammar accepts a string, there is a derivation of that string using the productions • “(())” • S ⇒ (S)ε ⇒ ((S)S)ε ⇒ (()ε)ε = (()) • Question: why is the final S required?
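One way to test membership in this language is a recursive-descent recognizer with one function for the non-terminal S. The Python below is a sketch under that assumption; the names `parse_S` and `in_language` are illustrative.

```python
def parse_S(s, i=0):
    """Match S starting at index i; return the index just past it."""
    if i < len(s) and s[i] == "(":       # production S -> ( S ) S
        i = parse_S(s, i + 1)            # inner S
        if i >= len(s) or s[i] != ")":
            raise ValueError(f"expected ')' at position {i}")
        return parse_S(s, i + 1)         # trailing S
    return i                             # production S -> ε (consume nothing)

def in_language(s):
    """True if the whole string is derivable from S."""
    try:
        return parse_S(s) == len(s)
    except ValueError:
        return False

print(in_language("(())"), in_language("()()"), in_language("(()"))
```

In this sketch the trailing call to `parse_S` is what lets a string such as "()()" be accepted; without it only purely nested strings would match.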

  36. More on CFGs • Shorthand notation: vertical bar for multiple productions • S → a S a | T • T → b T b | ε • CFGs are powerful enough to express the syntax of most programming languages • Derivation = successive application of productions starting from S • Acceptance = determining whether there is a derivation for an input token stream

  37. A Parser • Inputs: a context-free grammar G and a token stream s (from the lexer) • Outputs: yes if s is in L(G), no otherwise, plus error messages • Syntax analyzers (parsers) = CFG acceptors which also output the corresponding derivation when the token stream is accepted • Various kinds: LL(k), LR(k), SLR, LALR

  38. RE is a Subset of CFG • Can inductively build a grammar for each RE • ε: S → ε • a: S → a • R1 R2: S → S1 S2 • R1 | R2: S → S1 | S2 • R1*: S → S1 S | ε • Where G1 = grammar for R1, with start symbol S1, and G2 = grammar for R2, with start symbol S2
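The inductive construction can be written as a short recursive function. In the Python sketch below, regular expressions are encoded as nested tuples such as `("star", ("sym", "a"))`. That encoding, the generated non-terminal names S1, S2, ..., and the function name `re_to_cfg` are assumptions for illustration.

```python
import itertools

def re_to_cfg(regex):
    """Return (start_symbol, productions) for a tuple-encoded regex.

    productions maps each non-terminal to a list of right-hand sides;
    a right-hand side is a list of symbols, and [] stands for ε.
    """
    counter = itertools.count(1)
    prods = {}

    def build(r):
        s = f"S{next(counter)}"          # fresh start symbol for this sub-expression
        kind = r[0]
        if kind == "eps":                # ε:       S -> ε
            prods[s] = [[]]
        elif kind == "sym":              # a:       S -> a
            prods[s] = [[r[1]]]
        elif kind == "cat":              # R1 R2:   S -> S1 S2
            s1, s2 = build(r[1]), build(r[2])
            prods[s] = [[s1, s2]]
        elif kind == "alt":              # R1 | R2: S -> S1 | S2
            s1, s2 = build(r[1]), build(r[2])
            prods[s] = [[s1], [s2]]
        elif kind == "star":             # R1*:     S -> S1 S | ε
            s1 = build(r[1])
            prods[s] = [[s1, s], []]
        else:
            raise ValueError(f"unknown regex node {kind!r}")
        return s

    return build(regex), prods

# (a | b)* as a grammar
start, prods = re_to_cfg(("star", ("alt", ("sym", "a"), ("sym", "b"))))
for lhs, alternatives in sorted(prods.items()):
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```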

  39. Grammar for Sum Expression • Grammar • S → E + S | E • E → number | (S) • Expanded • S → E + S • S → E • E → number • E → (S) • 4 productions • 2 non-terminals (S, E) • 4 terminals: “(”, “)”, “+”, number • Start symbol: S

  40. Constructing a Derivation • Start from S (the start symbol) • Use productions to derive a sequence of tokens • For arbitrary strings α, β, γ and for a production A → β • A single step of the derivation is • α A γ ⇒ α β γ (substitute β for A) • Example • S → E + S • (S + E) + E ⇒ (E + S + E) + E

  41. Class Problem • S → E + S | E • E → number | (S) • Derive: (1 + 2 + (3 + 4)) + 5

  42. Parse Tree • Parse tree = tree representation of the derivation • Leaves of the tree are terminals • Internal nodes are non-terminals • No information about the order of the derivation steps • [Figure: parse tree for (1 + 2 + (3 + 4)) + 5]

  43. Parse Tree vs Abstract Syntax Tree • Parse tree also called “concrete syntax” • AST discards (abstracts) unneeded information; more compact format • [Figure: parse tree and AST for (1 + 2 + (3 + 4)) + 5]
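To illustrate what an AST drops, here is a Python sketch of a recursive-descent parser for the sum grammar S → E + S | E, E → number | (S). It returns nested tuples and plain integers, so parentheses and the non-terminals S and E leave no trace in the result. The tuple encoding and the function names are assumptions, not the slides' implementation.

```python
import re

def tokenize(text):
    # Numbers, parentheses and '+'; any other character is silently skipped.
    return re.findall(r"\d+|[()+]", text)

def parse_S(tokens, i):
    """S -> E + S | E; returns (ast, next_index)."""
    left, i = parse_E(tokens, i)
    if i < len(tokens) and tokens[i] == "+":
        right, i = parse_S(tokens, i + 1)     # right recursion: + associates right
        return ("+", left, right), i
    return left, i

def parse_E(tokens, i):
    """E -> number | ( S ); returns (ast, next_index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        node, i = parse_S(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("expected ')'")
        return node, i + 1                    # the parentheses are not kept
    if tok.isdigit():
        return int(tok), i + 1
    raise ValueError(f"unexpected token {tok!r}")

def parse(text):
    tokens = tokenize(text)
    ast, i = parse_S(tokens, 0)
    if i != len(tokens):
        raise ValueError("trailing input")
    return ast

print(parse("(1 + 2 + (3 + 4)) + 5"))
```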

  44. Derivation Order • Can choose to apply productions in any order: select a non-terminal and substitute the RHS of a production • Two standard orders: leftmost and rightmost • Leftmost derivation • In the string, find the leftmost non-terminal and apply a production to it • E + S ⇒ 1 + S • Rightmost derivation • Same, but find the rightmost non-terminal • E + S ⇒ E + E + S

  45. Leftmost/Rightmost Derivation Examples • S → E + S | E • E → number | (S) • Leftmost derive: (1 + 2 + (3 + 4)) + 5 • S ⇒ E+S ⇒ (S)+S ⇒ (E+S)+S ⇒ (1+S)+S ⇒ (1+E+S)+S ⇒ (1+2+S)+S ⇒ (1+2+E)+S ⇒ (1+2+(S))+S ⇒ (1+2+(E+S))+S ⇒ (1+2+(3+S))+S ⇒ (1+2+(3+E))+S ⇒ (1+2+(3+4))+S ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5 • Now, rightmost derive the same input string • S ⇒ E+S ⇒ E+E ⇒ E+5 ⇒ (S)+5 ⇒ (E+S)+5 ⇒ (E+E+S)+5 ⇒ (E+E+E)+5 ⇒ (E+E+(S))+5 ⇒ (E+E+(E+S))+5 ⇒ (E+E+(E+E))+5 ⇒ (E+E+(E+4))+5 ⇒ (E+E+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5 • Result: same parse tree; same productions chosen, but in a different order
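The two derivation orders can be replayed mechanically. The Python sketch below applies a given list of productions to either the leftmost or the rightmost non-terminal of the current sentential form. It uses the shorter input 1 + 2 to keep the step lists readable; the function name `derive` and the list-of-symbols encoding are assumptions.

```python
NONTERMINALS = {"S", "E"}

def derive(start, steps, rightmost=False):
    """Apply productions in order; return every sentential form produced.

    steps is a list of (lhs, rhs) pairs, where rhs is a list of symbols.
    """
    form = [start]
    forms = [form]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        i = positions[-1] if rightmost else positions[0]
        if form[i] != lhs:
            raise ValueError(f"expected {lhs} at position {i}, found {form[i]}")
        form = form[:i] + rhs + form[i + 1:]   # one step: αAγ => αβγ
        forms.append(form)
    return forms

leftmost = derive("S", [("S", ["E", "+", "S"]), ("E", ["1"]),
                        ("S", ["E"]), ("E", ["2"])])
rightmost = derive("S", [("S", ["E", "+", "S"]), ("S", ["E"]),
                         ("E", ["2"]), ("E", ["1"])], rightmost=True)
print(" => ".join(" ".join(f) for f in leftmost))
print(" => ".join(" ".join(f) for f in rightmost))
```

Both runs use the same four productions and end in the same sentence; only the order of application differs.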

  46. Class Problem • S → E + S | E • E → number | (S) | -S • Do the rightmost derivation of: 1 + (2 + -(3 + 4)) + 5

  47. Ambiguous Grammars • In the sum expression grammar, leftmost and rightmost derivations produced identical parse trees • + operator associates to the right in parse tree regardless of derivation order • [Figure: tree for (1+2+(3+4))+5]

  48. An Ambiguous Grammar • + associates to the right because of the right-recursive production: S → E + S • Consider another grammar • S → S + S | S * S | number • Ambiguous grammar = different derivations produce different parse trees • More specifically, G is ambiguous if there are 2 distinct leftmost (rightmost) derivations for some sentence

  49. Ambiguous Grammar - Example • S → S + S | S * S | number • Consider the expression: 1 + 2 * 3 • Derivation 1: S ⇒ S+S ⇒ 1+S ⇒ 1+S*S ⇒ 1+2*S ⇒ 1+2*3, giving the tree + with children 1 and (2 * 3) • Derivation 2: S ⇒ S*S ⇒ S+S*S ⇒ 1+S*S ⇒ 1+2*S ⇒ 1+2*3, giving the tree * with children (1 + 2) and 3 • Obviously not equal!

  50. Impact of Ambiguity • Different parse trees correspond to different evaluations! • Thus, program meaning is not defined!! • Tree with * at the root: (1 + 2) * 3 = 9 • Tree with + at the root: 1 + (2 * 3) = 7
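The two evaluations can be checked with a few lines of Python. The tuple encoding of the trees and the function name `evaluate` are assumptions made for this sketch.

```python
def evaluate(tree):
    """Evaluate a tree given as an int or a tuple (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

tree_plus_root = ("+", 1, ("*", 2, 3))    # from S => S+S => ... : 1 + (2 * 3)
tree_times_root = ("*", ("+", 1, 2), 3)   # from S => S*S => ... : (1 + 2) * 3

print(evaluate(tree_plus_root), evaluate(tree_times_root))  # prints: 7 9
```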
