
Introduction to Compilers

Professor Yihjia Tsai, 2006 Spring, Tamkang University.


Presentation Transcript


  1. Introduction to Compilers Professor Yihjia Tsai 2006 Spring Tamkang University

  2. What is a compiler? • Translates source code to target code • Source code is typically a high-level programming language (Java, C++, etc.) but does not have to be • Target code is often a low-level language like assembly or machine code but does not have to be • Can you think of other compilers that you have used, according to this definition?

  3. Before we begin • * star • + plus • , comma • - hyphen, minus • / slash • : colon • ; semicolon • < less than • = equal • A-Z, a-z, 0-9 • " double quote • # hash • $ dollar sign • % percent • & ampersand • ' single quote • ( left parenthesis • ) right parenthesis

  4. Symbols • ` back quote • { open brace • | or • } close brace • ~ tilde • . period, dot • • bullet • > greater than • ? question mark • @ at sign • [ left (open) square bracket • \ back slash • ] right (close) square bracket • ^ caret, power • _ underscore

  5. Greek symbols • μ mu • ν nu • ξ xi • π pi • ρ rho • σ sigma • τ tau • χ chi • ψ psi • η eta • ω omega • α alpha • β beta • γ gamma • δ delta • ε epsilon • φ phi • ζ zeta • θ theta • ι iota • κ kappa • λ lambda

  6. Other Compilers • Javadoc -> HTML • XML -> HTML • SQL query output -> Table • PostScript -> PDF • High-level description of a circuit -> machine instructions to fabricate circuit

  7. The Compilation Process

  8. The analysis Stage • Broken up into four phases • Lexical Analysis (also called scanning or tokenization) • Parsing • Semantic Analysis • Intermediate Code Generation

  9. Lexing Example
     Source:
         double d1;
         double d2;
         d2 = d1 * 2.0;

     Lexeme    Token              Notes
     double    TOK_DOUBLE         reserved word
     d1        TOK_ID             variable name
     ;         TOK_PUNCT          has value of ";"
     double    TOK_DOUBLE         reserved word
     d2        TOK_ID             variable name
     ;         TOK_PUNCT          has value of ";"
     d2        TOK_ID             variable name
     =         TOK_OPER           has value of "="
     d1        TOK_ID             variable name
     *         TOK_OPER           has value of "*"
     2.0       TOK_FLOAT_CONST    has value of 2.0
     ;         TOK_PUNCT          has value of ";"
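A minimal sketch of a scanner that reproduces the token stream for this example. The token names come from the slide; the regular expressions, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for illustration, not the course's lexer.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
# Order matters: the reserved word must be tried before the identifier rule.
TOKEN_SPEC = [
    ("TOK_FLOAT_CONST", r"\d+\.\d+"),
    ("TOK_DOUBLE", r"\bdouble\b"),
    ("TOK_ID", r"[A-Za-z_]\w*"),
    ("TOK_OPER", r"[=*+\-/]"),
    ("TOK_PUNCT", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates lexemes but is not a token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token, lexeme in tokenize("double d1; double d2; d2 = d1 * 2.0;"):
    print(f"{lexeme:8} {token}")
```

Each pair holds the token (the category) and the lexeme (the matched text), which is the distinction the table draws.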

  10. Syntax and Semantics • Syntax - the form or structure of the expressions – whether an expression is well formed • Semantics – the meaning of an expression

  11. Syntactic Structure • Syntax is almost always expressed using some variant of a notation called a context-free grammar (CFG) or simply grammar • BNF • EBNF

  12. A CFG has 4 parts • A set of tokens (lexemes), known as terminal symbols • A set of non-terminals • A set of rules (productions), where each production consists of a left-hand side (LHS) and a right-hand side (RHS). The LHS is a non-terminal and the RHS is a sequence of terminal and/or non-terminal symbols. • A special non-terminal symbol designated as the start symbol

  13. An example of BNF syntax for real numbers
      <r>  ::= <ds> . <ds>
      <ds> ::= <d> | <d> <ds>
      <d>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

      < >   encloses non-terminal symbols
      ::=   'is' or 'is made up of' or 'derives' (sometimes denoted with an arrow ->)
      |     or
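One way to see how these three productions work is to write one function per non-terminal. The sketch below is an illustrative recognizer, not part of the original slides; the function names `parse_d`, `parse_ds`, `parse_r`, and `is_real` are hypothetical.

```python
DIGITS = set("0123456789")


def parse_d(text, pos):
    """<d> ::= 0 | 1 | ... | 9. Return the position after the digit, or None."""
    if pos < len(text) and text[pos] in DIGITS:
        return pos + 1
    return None


def parse_ds(text, pos):
    """<ds> ::= <d> | <d> <ds>. The recursion consumes every following digit."""
    after_digit = parse_d(text, pos)
    if after_digit is None:
        return None
    rest = parse_ds(text, after_digit)
    return after_digit if rest is None else rest


def parse_r(text, pos=0):
    """<r> ::= <ds> . <ds>. Return the position after the number, or None."""
    after_int = parse_ds(text, pos)
    if after_int is None or after_int >= len(text) or text[after_int] != ".":
        return None
    return parse_ds(text, after_int + 1)


def is_real(text):
    """True when the whole string is derivable from <r>."""
    return parse_r(text) == len(text)


print(is_real("3.14"), is_real("42"), is_real(".5"))
```

Note that the grammar requires digits on both sides of the dot, so `42` and `.5` are rejected.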

  14. Example • On the example from the previous slide: • What are the tokens? • What are the lexemes? • What are the non-terminals? • What are the productions?

  15. Token vs. lexeme • to·ken One that represents a group, as an employee whose presence is used to deflect from the employer criticism or accusations of discrimination. • to·ken A basic, grammatically indivisible unit of a language such as a keyword, operator or identifier. • lexeme A minimal unit (as a word or stem) in the lexicon of a language; `go' and `went' and `gone' and `going' are all members of the English lexeme `go' • lexeme A minimal lexical unit of a language. Lexical analysis converts strings in a language into a list of lexemes. For a programming language these word-like pieces would include keywords, identifiers, literals and punctuations. The lexemes are then passed to the parser for syntactic analysis.

  16. BNF Points • A non-terminal can have more than one RHS, written either as separate productions or as alternatives joined with | (or) • Lists or sequences are expressed via recursion • A derivation is just a repeated application of productions (rules) • Examples

  17. Example Grammar
      <program> -> <stmts>
      <stmts>   -> <stmt> | <stmt> ; <stmts>
      <stmt>    -> <var> = <expr>
      <var>     -> a | b | c | d
      <expr>    -> <term> + <term> | <term> - <term>
      <term>    -> <var> | const

  18. Example Derivation
      <program> => <stmts>
                => <stmt>
                => <var> = <expr>
                => a = <expr>
                => a = <term> + <term>
                => a = <var> + <term>
                => a = b + <term>
                => a = b + const
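The derivation can be replayed mechanically. The sketch below writes the four parts of the example grammar as plain Python data and applies one production per step to the leftmost non-terminal. The names `PRODUCTIONS`, `derive`, and the idea of selecting alternatives by index are illustrative assumptions.

```python
# The four parts of the example grammar, written as plain Python data.
TERMINALS = {"a", "b", "c", "d", "=", "+", "-", ";", "const"}
NONTERMINALS = {"<program>", "<stmts>", "<stmt>", "<var>", "<expr>", "<term>"}
START = "<program>"
PRODUCTIONS = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def derive(choices):
    """Apply one production per step to the leftmost non-terminal.

    choices[i] is the index of the alternative used at step i.
    Returns every sentential form, starting with the start symbol.
    """
    form = [START]
    steps = [" ".join(form)]
    for choice in choices:
        index = next(i for i, symbol in enumerate(form) if symbol in NONTERMINALS)
        rhs = PRODUCTIONS[form[index]][choice]
        form = form[:index] + rhs + form[index + 1:]
        steps.append(" ".join(form))
    return steps


# The choices below reproduce the derivation of "a = b + const".
for step in derive([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", step)
```

A derivation ends when the sentential form contains only terminal symbols.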

  19. Parse Trees • Alternative representation for a derivation • Example parse tree for the previous example
      stmts
        stmt
          var
            a
          =
          expr
            term
              var
                b
            +
            term
              const

  20. Another Example
      Expression -> Expression + Expression
                  | Expression - Expression
                  | ...
                  | Variable
                  | Constant
                  | ...
      Variable   -> T_IDENTIFIER
      Constant   -> T_INTCONSTANT
                  | T_DOUBLECONSTANT

  21. The Parse
      Input: a + 2
      Expression => Expression + Expression
                 => Variable + Expression
                 => T_IDENTIFIER + Expression
                 => T_IDENTIFIER + Constant
                 => T_IDENTIFIER + T_INTCONSTANT

  22. Parse Trees
      PS -> P | P PS
      P  -> ε | '(' PS ')' | '<' PS '>' | '[' PS ']'
      What's the parse tree for this statement?
      < [ ] [ < > ] >
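A sketch of a recursive-descent parser that builds a parse tree for this statement as nested tuples. It assumes the first alternative of P is the empty string (epsilon). Because the grammar is ambiguous about where empty P nodes may appear, the parser makes one fixed choice: it continues with `P PS` only when another opening bracket follows. The function names and the tuple layout are hypothetical.

```python
PAIRS = {"(": ")", "<": ">", "[": "]"}


def parse_ps(tokens, pos):
    """PS -> P | P PS. Returns (tree, next_position)."""
    first, pos = parse_p(tokens, pos)
    # Use PS -> P PS only when another group starts here; otherwise PS -> P.
    if pos < len(tokens) and tokens[pos] in PAIRS:
        rest, pos = parse_ps(tokens, pos)
        return ("PS", first, rest), pos
    return ("PS", first), pos


def parse_p(tokens, pos):
    """P -> epsilon | '(' PS ')' | '<' PS '>' | '[' PS ']'."""
    if pos < len(tokens) and tokens[pos] in PAIRS:
        opener = tokens[pos]
        closer = PAIRS[opener]
        inner, pos = parse_ps(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != closer:
            raise SyntaxError(f"expected {closer!r} at token {pos}")
        return ("P", opener, inner, closer), pos + 1
    return ("P", "epsilon"), pos  # the empty alternative consumes nothing


def parse(text):
    """Parse a whitespace-separated bracket string into a tree."""
    tokens = text.split()
    tree, pos = parse_ps(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
    return tree


print(parse("< [ ] [ < > ] >"))
```

Each tuple starts with the non-terminal name, followed by its children in left-to-right order, so the nesting of tuples mirrors the shape of the tree.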

  23. EBNF - Extended BNF • Like BNF except that • Non-terminals start with an uppercase letter • Parentheses are used for grouping • Braces {} represent zero or more occurrences (iteration) • Brackets [] represent an optional construct, that is, a construct that appears either once or not at all.

  24. EBNF example
      Exp    -> Term { ('+' | '-') Term }
      Term   -> Factor { ('*' | '/') Factor }
      Factor -> '(' Exp ')' | variable | constant
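EBNF maps naturally onto a hand-written recursive-descent parser: each non-terminal becomes a method and each `{ ... }` becomes a loop. The sketch below builds a nested-tuple syntax tree for this grammar. The `Parser` class, its tokenizing regular expression, and the tuple layout are assumptions made for illustration.

```python
import re

OPERATORS_AND_CLOSERS = {"+", "-", "*", "/", ")"}


class Parser:
    def __init__(self, text):
        # Simplification: characters the pattern does not recognize are skipped.
        self.tokens = re.findall(r"\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def exp(self):
        """Exp -> Term { ('+' | '-') Term }"""
        node = self.term()
        while self.peek() in ("+", "-"):  # the braces become a loop
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        """Term -> Factor { ('*' | '/') Factor }"""
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        """Factor -> '(' Exp ')' | variable | constant"""
        token = self.advance()
        if token == "(":
            node = self.exp()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in OPERATORS_AND_CLOSERS:
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # a variable or a constant


def parse_expression(text):
    parser = Parser(text)
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse_expression("a + b * 2"))
```

Because Term is parsed inside Exp, multiplication binds tighter than addition, and the loops make operators of equal precedence associate to the left.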

  25. EBNF/BNF • EBNF and BNF are equivalent • How can {} be expressed in BNF? • How can ( ) be expressed? • How can [ ] be expressed?

  26. Semantic Analysis • The syntactically correct parse tree (or derivation) is checked for semantic errors • Check for constructs that, while syntactically valid, do not obey the semantic rules of the source language. • Examples: • Use of an undeclared or uninitialized variable • Function called with improper arguments • Incompatible operands and type mismatches

  27. Examples
      void fun1(int i);
      double d;
      d = fun1(2.1);

      int i;
      int j;
      i = i + 2;

      int arr[2], c;
      c = arr * 10;

      Most semantic analysis pertains to the checking of types.
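A sketch of how a checker might flag the three examples above. Everything here is hypothetical: the tuple-based expression format, the `symbols` and `functions` tables, and the rules themselves. In particular, the sketch assumes C-like implicit conversion between `int` and `double`, so the call `fun1(2.1)` is reported only for using a void result.

```python
NUMERIC = {"int", "double"}


def compatible(actual, expected):
    """Numeric types convert implicitly (an assumption); others must match exactly."""
    return actual == expected or (actual in NUMERIC and expected in NUMERIC)


def type_of(expr, symbols, functions, initialized, errors):
    """Return the type of expr, appending any semantic errors to `errors`."""
    kind = expr[0]
    if kind == "const":  # ("const", type_name)
        return expr[1]
    if kind == "var":  # ("var", name)
        name = expr[1]
        if name not in symbols:
            errors.append(f"undeclared variable {name}")
            return None
        # Simplification: array names are exempt from the initialization check.
        if name not in initialized and not symbols[name].endswith("[]"):
            errors.append(f"variable {name} used before initialization")
        return symbols[name]
    if kind == "call":  # ("call", function_name, [argument expressions])
        _, name, args = expr
        return_type, params = functions[name]
        arg_types = [type_of(a, symbols, functions, initialized, errors) for a in args]
        if len(arg_types) != len(params) or not all(map(compatible, arg_types, params)):
            errors.append(f"improper arguments in call to {name}")
        return return_type
    if kind == "binop":  # ("binop", operator, left, right)
        _, op, left, right = expr
        left_type = type_of(left, symbols, functions, initialized, errors)
        right_type = type_of(right, symbols, functions, initialized, errors)
        if left_type not in NUMERIC or right_type not in NUMERIC:
            errors.append(f"operator {op} applied to {left_type} and {right_type}")
            return None
        return "double" if "double" in (left_type, right_type) else "int"
    raise ValueError(f"unknown expression kind {kind!r}")


def check_assignment(target, expr, symbols, functions, initialized):
    """Check `target = expr` and return the list of error messages."""
    errors = []
    value_type = type_of(expr, symbols, functions, initialized, errors)
    if value_type is not None and not compatible(value_type, symbols[target]):
        errors.append(f"cannot assign {value_type} to {target} (declared {symbols[target]})")
    initialized.add(target)
    return errors


symbols = {"d": "double", "i": "int", "j": "int", "arr": "int[]", "c": "int"}
functions = {"fun1": ("void", ["int"])}
initialized = set()
print(check_assignment("d", ("call", "fun1", [("const", "double")]), symbols, functions, initialized))
print(check_assignment("i", ("binop", "+", ("var", "i"), ("const", "int")), symbols, functions, initialized))
print(check_assignment("c", ("binop", "*", ("var", "arr"), ("const", "int")), symbols, functions, initialized))
```

All three checks rely on the declared types in the symbol table, which illustrates why most semantic analysis comes down to type checking.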

  28. Intermediate Code Generation • This phase creates the intermediate representation of the source program. • The representation can take a variety of forms, but a common one is called three-address code (TAC) • Like assembly, TAC is a sequence of simple instructions, each of which can have at most three operands.

  29. Example
      Source:
          a = b * c + b * d
      TAC:
          _t1 = b * c
          _t2 = b * d
          _t3 = _t1 + _t2
          a = _t3
      Note: temps (_t1, _t2, _t3)

  30. Another Example
      Source:
          if (a <= b)
              a = a - c;
          c = b * c;
      TAC:
              _t1 = a > b
              if _t1 goto L0
              _t2 = a - c
              a = _t2
          L0: _t3 = b * c
              c = _t3
      Note: temps (_t1, _t2, _t3) and symbolic addresses (L0)
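A sketch of how TAC like this can be produced from an expression tree with a post-order walk: operands are translated first, then one instruction per operator stores its result in a fresh temporary. The nested-tuple tree format and the function name `generate_tac` are assumptions for illustration.

```python
import itertools


def generate_tac(target, expr):
    """Translate `target = expr` into a list of three-address instructions.

    expr is either a name (str) or a tuple (operator, left, right).
    """
    code = []
    counter = itertools.count(1)  # numbers the temporaries _t1, _t2, ...

    def emit(node):
        if isinstance(node, str):  # a variable or constant needs no instruction
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"_t{next(counter)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result}")
    return code


# a = b * c + b * d
for line in generate_tac("a", ("+", ("*", "b", "c"), ("*", "b", "d"))):
    print(line)
```

Each emitted instruction has one operator and at most three operands, which is where the name three-address code comes from.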
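To check that the TAC and the source statement agree, one can run the TAC on sample values. The sketch below is a tiny, hypothetical TAC interpreter covering only the instruction shapes used in this example. It writes the label `L0:` on its own line and uses `_t3` for the third temporary throughout; both are assumptions about the intended code.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, ">": operator.gt}

TAC = """\
_t1 = a > b
if _t1 goto L0
_t2 = a - c
a = _t2
L0:
_t3 = b * c
c = _t3
"""


def run_tac(program, env):
    """Execute the TAC program and return the final variable values."""
    lines = [line.split() for line in program.splitlines() if line.strip()]
    # Map each symbolic address (label) to the index of its line.
    labels = {parts[0][:-1]: i for i, parts in enumerate(lines)
              if len(parts) == 1 and parts[0].endswith(":")}
    env = dict(env)
    pc = 0
    while pc < len(lines):
        parts = lines[pc]
        pc += 1
        if parts[0].endswith(":"):  # a label does nothing when reached
            continue
        if parts[0] == "if":  # if x goto L
            if env[parts[1]]:
                pc = labels[parts[3]]
        elif len(parts) == 3:  # x = y
            env[parts[0]] = env[parts[2]]
        else:  # x = y op z
            env[parts[0]] = OPS[parts[3]](env[parts[2]], env[parts[4]])
    return env


result = run_tac(TAC, {"a": 5, "b": 7, "c": 2})
print(result["a"], result["c"])
```

With a = 5, b = 7, c = 2 the condition a <= b holds, so the subtraction runs before the multiplication; with a = 9 the jump to L0 skips it.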

  31. Next Time • Finish introduction to compilation stages • Read Appel, Chapters 1 and 2, if you have not already done so. • What is a splay tree?

  32. Selected References • Appel, A., Modern Compiler Implementation in Java (2nd Ed.), Cambridge University Press, 2002. ISBN 0-521-82060-X. • Aho, A.V., R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1988. ISBN 0-201-10088-6. • Muchnick, S., Advanced Compiler Design and Implementation, Morgan Kaufmann, 1998. ISBN 1-55860-320-4.
