Compilers

Chapter 5: Compilers. Topics: Basic Compiler Functions; Machine-Dependent Compiler Features; Machine-Independent Compiler Features; Compiler Design Options; Implementation Examples. Basic Compiler Functions covers Grammars, Lexical Analysis, Syntactic Analysis, and Code Generation.

kevin-long

Presentation Transcript


  1. Chapter 5: Compilers • Basic Compiler Functions • Machine-Dependent Compiler Features • Machine-Independent Compiler Features • Compiler Design Options • Implementation Examples

  2. Basic Compiler Functions • Grammars • Lexical Analysis • Syntactic Analysis • Code Generation

  3. High-Level Programming Language • A high-level programming language is described in terms of a grammar, which specifies the syntax of legal statements. • An assignment statement: a variable name + an assignment operator + an expression • Example Pascal program:
        PROGRAM STATS ;
        VAR
           SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER ;
        BEGIN
           SUM := 0 ;
           SUMSQ := 0 ;
           FOR I := 1 TO 100 DO
              BEGIN
                 READ( VALUE ) ;
                 SUM := SUM + VALUE ;
                 SUMSQ := SUMSQ + VALUE * VALUE ;
              END ;
           MEAN := SUM DIV 100 ;
           VARIANCE := SUMSQ DIV 100 - MEAN * MEAN ;
           WRITE( MEAN, VARIANCE )
        END .

  4. Compiler • Compilation: matching statements (written by programmers) to structures (defined by the grammar) and generating the appropriate object code. • Lexical analysis (scanning): scanning the source statement, recognizing and classifying the various tokens, including keywords, variable names, data types, operators, etc. • Syntactic analysis (parsing): recognizing each statement as some language construct described by the grammar. • Semantics (code generation): generation of the object code.

  5. Grammars • A grammar is a formal description of the syntax. • BNF (Backus-Naur Form): a simple and widely used notation for writing grammars, introduced by John Backus and Peter Naur in about 1960. • Meta-symbols of BNF: ::= means "is defined as"; | means "or"; < > angle brackets surround nonterminal symbols. • A BNF rule defining a nonterminal has the form: nonterminal ::= sequence_of_alternatives, where the alternatives are strings of terminals (tokens) or nonterminals separated by the meta-symbol |.

  6. Grammar • G = <N, T, σ, P> • N: nonterminal symbol set • T: terminal symbol set • σ: start symbol, σ ∈ N • P: production rule set; each rule has the form α → β, where α, β ∈ (N ∪ T)*, α ≠ ε, and ε is the empty string • N ∩ T = ∅

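  7. Grammar • G = <N, T, σ, P> • N = {A, B, S, T, σ} • T = {0, 1} • P = {σ → S, S → 1A, A → 1A, A → 0B, B → 1T, T → ε}, where every rule has the form α → β with α, β ∈ (N ∪ T)*, α ≠ ε, and ε is the empty string • Derivation: S ⇒ 1A ⇒ 1(1A) ⇒* 1+A ⇒ 1+0B ⇒ 1+01T ⇒ 1+01 • [Figure: state diagram with states S, A, B, T and transitions labeled 1, 1, 0, 1]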
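
A minimal sketch of how the productions S → 1A, A → 1A, A → 0B, B → 1T, T → ε on this slide can be used to test whether a string is derivable from S. The table layout and the function name `derives` are illustrative assumptions, not part of the slides.

```python
# Each production is (terminal prefix, next nonterminal); None marks T -> empty string.
PRODUCTIONS = {
    "S": [("1", "A")],
    "A": [("1", "A"), ("0", "B")],
    "B": [("1", "T")],
    "T": [("", None)],
}


def derives(symbol, text):
    """Return True if `text` can be derived from the nonterminal `symbol`."""
    for terminal, next_symbol in PRODUCTIONS[symbol]:
        if next_symbol is None:
            # Empty production: succeeds only when no input is left.
            if text == "":
                return True
        elif text.startswith(terminal) and derives(next_symbol, text[len(terminal):]):
            return True
    return False


if __name__ == "__main__":
    for candidate in ["101", "1101", "01", "1011"]:
        print(candidate, derives("S", candidate))
```

The strings accepted are exactly those of the form 1+01 (one or more 1s followed by 01), which matches the derivation on the slide.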

  8. 4 Language/Grammar/Machine Types

  9. 4 Language/Grammar/Machine Types

  10. Regular Expression (Regular Set) • Definition: I is the input set. • Rules:
        ε is a regular expression (ε denotes the empty string)
        For every a ∈ I, a is a RE
        If R, S are RE, R | S is a RE
        If R, S are RE, RS is a RE
        If R is a RE, (R) is a RE
        If R is a RE, R* is a RE
        If R is a RE, R+ is a RE

  11. Regular Expression • (a | b)* aba • [Figures: a nondeterministic finite automaton and a deterministic finite automaton for this expression, each with states 1 to 4 and transitions labeled a and b]
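
A minimal sketch of a deterministic finite automaton for (a | b)* aba, written as a transition table. The state numbering is an assumption (state 1 is the start, and state k means that the last k-1 symbols read match a prefix of "aba"); the diagram on the slide may number its states differently.

```python
# Transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 1,
    (4, "a"): 2, (4, "b"): 3,
}
START_STATE = 1
FINAL_STATES = {4}


def accepts(text):
    """Run the DFA over `text`; accept if it stops in a final state."""
    state = START_STATE
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol outside the alphabet {a, b}
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL_STATES


if __name__ == "__main__":
    for candidate in ["aba", "babaaba", "abab", ""]:
        print(repr(candidate), accepts(candidate))
```

Because the automaton is deterministic, each input symbol is examined exactly once and no backtracking is needed.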

  12. Homework • Give deterministic finite automata (DFA) accepting the following languages over the alphabet {0,1}: • The set of all strings with three consecutive 0’s. • The set of all strings ending in 00. • The set of all strings such that every block of five consecutive symbols contains at least two 0’s. • The set of all strings beginning with a 1 which, interpreted as the binary representation of an integer, is congruent to zero modulo 3. • The set of all strings not containing 101 as a substring.

  13. Regular Expression • Simplification steps (a postfix + means "one or more"):
          b*a+b (ab | b+a+b | aa+b)* a
        = b*a+b ((a | b+a+ | aa+) b)* a
        = b*a+b ((b+a+ | a+) b)* a
        = b*a+b (b*a+b)* a
        = (b*a+b)+ a
        = b*a+b … b*a+b b*a+ba
        = b*a+b … b*a+b b*a*aba
        = b*(a+b+)*a*aba
        = (b*a*)*aba
        = (a | b)*aba
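
The expressions on this slide can be compared mechanically. The sketch below uses Python's standard `re` module to check that several of them match the same strings over {a, b} up to a bounded length. The helper names and the length bound are assumptions of this example, and agreement on short strings is supporting evidence only, not a proof of equivalence.

```python
import itertools
import re

# Expressions taken from the slide, written in Python regex syntax.
STEPS = [
    r"b*a+b(ab|b+a+b|aa+b)*a",
    r"b*a+b((a|b+a+|aa+)b)*a",
    r"b*a+b((b+a+|a+)b)*a",
    r"b*a+b(b*a+b)*a",
    r"(b*a+b)+a",
    r"b*(a+b+)*a*aba",
    r"(b*a*)*aba",
    r"(a|b)*aba",
]


def language(pattern, max_len):
    """All strings over {a, b} of length <= max_len fully matched by `pattern`."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in itertools.product("ab", repeat=n)
        if compiled.fullmatch("".join(chars))
    }


def all_agree(patterns, max_len=8):
    """True if every pattern matches the same set of short strings."""
    reference = language(patterns[0], max_len)
    return all(language(p, max_len) == reference for p in patterns[1:])


if __name__ == "__main__":
    print(all_agree(STEPS))
```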

  14. Simplified Pascal Grammar
         1  <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
         2  <prog-name> ::= id
         3  <dec-list> ::= <dec> | <dec-list> ; <dec>
         4  <dec> ::= <id-list> : <type>
         5  <type> ::= INTEGER
         6  <id-list> ::= id | <id-list> , id
         7  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
         8  <stmt> ::= <assign> | <read> | <write> | <for>
         9  <assign> ::= id := <exp>
        10  <exp> ::= <term> | <exp> + <term> | <exp> - <term>
        11  <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
        12  <factor> ::= id | int | ( <exp> )
        13  <read> ::= READ( <id-list> )
        14  <write> ::= WRITE( <id-list> )
        15  <for> ::= FOR <index-exp> DO <body>
        16  <index-exp> ::= id := <exp> TO <exp>
        17  <body> ::= <stmt> | BEGIN <stmt-list> END
      Recursive rule: a rule in which the nonterminal being defined also appears on its own right-hand side (for example, rules 3, 6, 7, 10, and 11).

  15. Parse Tree (Syntax Tree) • Examples: READ(VALUE) and VARIANCE := SUMSQ DIV 100 - MEAN * MEAN • The multiplication and division precede the addition and subtraction.

  16. Parse Tree

  17. Parse Tree

  18. Lexical Analysis • Tokens might be defined by grammar rules to be recognized by the parser:
        <ident> ::= <letter> | <ident><letter> | <ident><digit>
        <letter> ::= A | B | C | D | … | Z
        <digit> ::= 0 | 1 | 2 | 3 | … | 9
      • For better efficiency, a scanner can be used instead: it recognizes the tokens and outputs them as a sequence of fixed-length token codes, each with an associated token specifier.
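
A minimal sketch of such a scanner, which turns source text into (token code, token specifier) pairs. The numeric codes are generated arbitrarily for this example and are hypothetical; they are not the coding scheme used in the slides. The specifier is the identifier name or integer value, and is None for keywords and operators.

```python
KEYWORDS = {"PROGRAM", "VAR", "BEGIN", "END", "INTEGER", "FOR",
            "TO", "DO", "READ", "WRITE", "DIV"}
# ":=" is listed before ":" so that the longer symbol is tried first.
SYMBOLS = [":=", ";", ":", ",", "+", "-", "*", "(", ")", "."]

# Hypothetical fixed-length codes: one small integer per token type.
CODE = {name: number for number, name in
        enumerate(sorted(KEYWORDS) + SYMBOLS + ["id", "int"], start=1)}


def scan(source):
    """Return a list of (token code, token specifier) pairs."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            word = source[i:j]
            if word in KEYWORDS:
                tokens.append((CODE[word], None))
            else:
                tokens.append((CODE["id"], word))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append((CODE["int"], int(source[i:j])))
            i = j
        else:
            for symbol in SYMBOLS:
                if source.startswith(symbol, i):
                    tokens.append((CODE[symbol], None))
                    i += len(symbol)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


if __name__ == "__main__":
    print(scan("SUM := SUM + VALUE ;"))
```

Keeping keyword recognition in a table lookup rather than in the parser's grammar is what lets the parser work with single token codes instead of individual characters.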

  19. Lexical Scan

  20. Modeling Scanners as Finite Automata • Tokens can often be recognized by a finite automaton, which consists of • A finite set of states (including a starting state and one or more final states) • A set of transitions from one state to another

  21. Finite Automata for Typical Tokens

  22. Finite Automata for Tokens from Fig.5.5

  23. Token Recognition Algorithm

  24. Syntactic Analysis • Operator-Precedence Parsing • Recursive-Descent Parsing

  25. Syntactic Analysis • Syntactic analysis: building the parse tree for the statements being translated. • Parse tree: the root is the goal grammar rule; the leaves are terminal symbols. • Methods: bottom-up (operator-precedence parsing) and top-down (recursive-descent parsing).

  26. Operator-Precedence Parsing • The operator-precedence method uses the precedence relations between consecutive operators to guide the parsing process. • Example: A + B * C - D. The subexpression B * C is to be computed first because * has higher precedence than the surrounding operators; this means that * appears at a lower level than + or - in the parse tree. • Precedence relations: ⋖ (lower precedence), ≐ (equal precedence), ⋗ (higher precedence); in the example, + ⋖ * and * ⋗ -.
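
A minimal sketch of the idea for expressions built from operands and the binary operators +, -, *, and DIV. It is a simplified two-stack variant: the relations are derived from assumed precedence levels instead of a full precedence matrix, parentheses are not handled, and "$" is an assumed end marker. The names `LEVEL`, `relation`, and `parse_expression` are illustrative.

```python
# Assumed precedence levels; a higher number binds more tightly.
LEVEL = {"+": 1, "-": 1, "*": 2, "DIV": 2}


def relation(left, right):
    """Relation between the stacked operator `left` and the incoming `right`."""
    if left == "$":
        return "=" if right == "$" else "<"
    if right == "$":
        return ">"
    # Equal levels reduce first, which makes the operators left-associative.
    return ">" if LEVEL[left] >= LEVEL[right] else "<"


def parse_expression(tokens):
    """Build a tree of (operator, left, right) tuples from a token list."""
    operators = ["$"]  # operator stack with a bottom marker
    operands = []      # operands and subtrees recognized so far
    for token in list(tokens) + ["$"]:
        if token not in LEVEL and token != "$":
            operands.append(token)  # identifier or integer
            continue
        # Reduce while the stacked operator takes precedence over the incoming one.
        while relation(operators[-1], token) == ">":
            operator = operators.pop()
            right = operands.pop()
            left = operands.pop()
            operands.append((operator, left, right))
        operators.append(token)
    return operands[0]


if __name__ == "__main__":
    print(parse_expression("A + B * C - D".split()))
```

For A + B * C - D the subtree for B * C is built first and ends up deepest in the result, as described on the slide.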

  27. Precedence Matrix • An empty entry means that the two tokens cannot appear together.

  28. Example: READ ( VALUE )

  29. Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

  30. Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

  31. Shift-Reduce Parsing • Operator-precedence parsing can deal with operator grammars, which have the property that no production right side has two adjacent nonterminals. • Shift-reduce parsing is a more general bottom-up parsing method for LR(k) grammars. It makes use of a stack to store tokens that have not yet been recognized. • Actions: Shift: push the current token onto the stack. Reduce: recognize symbols on top of the stack according to a grammar rule.
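
A minimal sketch of the shift and reduce actions for the READ statement, using rules 6 and 13 of the simplified Pascal grammar. A real LR parser consults a parsing table to choose between shifting and reducing; here a fixed rule order (longer right sides first) stands in for that table, which is an assumption that happens to work for this small grammar.

```python
# (left side, right side) pairs; longer right sides are tried first.
RULES = [
    ("<read>", ["READ", "(", "<id-list>", ")"]),
    ("<id-list>", ["<id-list>", ",", "id"]),
    ("<id-list>", ["id"]),
]


def shift_reduce(tokens):
    """Return (accepted, trace) for a list of tokens."""
    stack = []
    trace = []
    remaining = list(tokens)
    while True:
        for left_side, right_side in RULES:
            # Reduce: the top of the stack matches the right side of a rule.
            if stack[-len(right_side):] == right_side:
                del stack[-len(right_side):]
                stack.append(left_side)
                trace.append(f"reduce {left_side}")
                break
        else:
            if not remaining:
                break
            # Shift: push the current token onto the stack.
            stack.append(remaining.pop(0))
            trace.append(f"shift {stack[-1]}")
    return stack == ["<read>"], trace


if __name__ == "__main__":
    accepted, steps = shift_reduce(["READ", "(", "id", ")"])
    print(accepted)
    for step in steps:
        print(step)
```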

  32. Example: READ ( VALUE )

  33. Recursive-Descent Parsing • A recursive-descent parser is made up of a procedure for each nonterminal symbol in the grammar. • The procedure attempts to find a substring of the input that can be interpreted as the nonterminal. • The procedure may call other procedures, or even itself recursively, to search for other nonterminals. • The procedure must decide which alternative in the grammar rule to use by examining the next input token. • Top-down parsers cannot be directly used with a grammar containing immediate left recursion.
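
A minimal sketch of a recursive-descent parser for the assignment statement, with one method per nonterminal. It assumes the left-recursive rules for <exp> and <term> have been rewritten with repetition, for example <exp> ::= <term> { + <term> | - <term> }; that rewritten form, the class name `Parser`, and the tuple-based tree are assumptions of this example.

```python
RESERVED = {"DIV", "READ", "WRITE", "FOR", "TO", "DO", "BEGIN", "END"}


class Parser:
    """Each method recognizes one nonterminal by looking at the next token."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.advance()

    def assign(self):
        # <assign> ::= id := <exp>
        target = self.identifier()
        self.expect(":=")
        return (":=", target, self.exp())

    def exp(self):
        # <exp> ::= <term> { + <term> | - <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.advance(), node, self.term())
        return node

    def term(self):
        # <term> ::= <factor> { * <factor> | DIV <factor> }
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            node = (self.advance(), node, self.factor())
        return node

    def factor(self):
        # <factor> ::= id | int | ( <exp> )
        token = self.peek()
        if token == "(":
            self.advance()
            node = self.exp()
            self.expect(")")
            return node
        if token is not None and token.isdigit():
            return self.advance()
        return self.identifier()

    def identifier(self):
        token = self.peek()
        if token is None or not token.isidentifier() or token in RESERVED:
            raise SyntaxError(f"expected identifier, found {token!r}")
        return self.advance()


def parse_assign(text):
    """Parse a whitespace-separated assignment statement into a tree."""
    parser = Parser(text.split())
    tree = parser.assign()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse_assign("VARIANCE := SUMSQ DIV 100 - MEAN * MEAN"))
```

Every loop iteration and every recursive call through `factor` consumes at least one token, so the parser cannot loop forever the way it would with immediate left recursion.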

  34. Modified Grammar without Left Recursion • The grammar is still recursive, but a chain of calls always consumes at least one token.

  35. Recursive-Descent Procedure for READ Statement

  36. Example: READ ( VALUE )

  37. Recursive-Descent Procedure for Assignment Statement

  38. Recursive-Descent Procedure for Assignment Statement

  39. Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

  40. Code Generation • When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding semantic routine (code generation routine) is executed. • As an example, a symbolic representation of the object code for a SIC/XE machine is generated. • Two data structures are used for working storage: a list (associated with a variable LISTCOUNT) and a stack.
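
A minimal sketch of generating symbolic SIC/XE-style code for an assignment statement from an expression tree. It is a simplified stand-in for the semantic routines on the slides: it walks a finished tree instead of running during parsing, keeps results in register A, and invents temporaries named T1, T2, and so on. The class name, the tuple tree format, and the evaluation order are assumptions of this example.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}


def operand(leaf):
    """Integers use immediate addressing (#n); identifiers are used by name."""
    return f"#{leaf}" if leaf.isdigit() else leaf


class CodeGenerator:
    def __init__(self):
        self.code = []
        self.temp_count = 0

    def new_temp(self):
        self.temp_count += 1
        return f"T{self.temp_count}"

    def gen_exp(self, node):
        """Emit code that leaves the value of `node` in register A."""
        if isinstance(node, str):
            self.code.append(f"LDA {operand(node)}")
            return
        operator, left, right = node
        if isinstance(right, str):
            # Simple right operand: compute the left side, then apply the operator.
            self.gen_exp(left)
            self.code.append(f"{OPCODES[operator]} {operand(right)}")
        else:
            # Compound right operand: save it in a temporary first.
            self.gen_exp(right)
            temp = self.new_temp()
            self.code.append(f"STA {temp}")
            self.gen_exp(left)
            self.code.append(f"{OPCODES[operator]} {temp}")

    def gen_assign(self, node):
        """Emit code for (":=", target, expression) and return all code so far."""
        _, target, expression = node
        self.gen_exp(expression)
        self.code.append(f"STA {target}")
        return self.code


if __name__ == "__main__":
    tree = (":=", "VARIANCE",
            ("-", ("DIV", "SUMSQ", "100"), ("*", "MEAN", "MEAN")))
    for line in CodeGenerator().gen_assign(tree):
        print(line)
```

The instruction order may differ from the listing on the later slides, since this sketch evaluates a compound right operand before the left one to keep register use simple.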

  41. Example: READ ( VALUE ) • [Figure annotations: "placed in register L"; "Argument passing"]

  42. Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

  43. Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

  44. Other Code-Generation Routines

  45. Other Code-Generation Routines

  46. Symbolic Representation of the Generated Object Code
