1 / 28

Introduction to Compilers

Introduction to Compilers. CMSC 431 Shon Vick 01/28/02. What is a compiler?. Translates source code to target code Source code is typically a high level programming language (Java, C++, etc) but does not have to be

Download Presentation

Introduction to Compilers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Compilers CMSC 431 Shon Vick 01/28/02

  2. What is a compiler? • Translates source code to target code • Source code is typically a high level programming language (Java, C++, etc) but does not have to be • Target code is often a low level language like assembly or machine code but does not have to be • Can you think of other compilers that you have used – according to this definition?

  3. Other Compilers • Javadoc -> HTML • SQL Query output -> Table • Poscript -> PDF • High level description of a circuit -> machine instructions to fabricate circuit

  4. The Compilation Process

  5. The analysis Stage • Broken up into four phases • Lexical Analysis (also called scanning or tokenization) • Parsing • Semantic Analysis • Intermediate Code Generation

  6. Lexing Example double d1; double d2; d2 = d1 * 2.0; double TOK_DOUBLE reserved word d1 TOK_ID variable name ; TOK_PUNCT has value of “;” double TOK_DOUBLE reserved word d2 TOK_ID variable name ; TOK_PUNCT has value of “;” d2 TOK_ID variable name = TOK_OPER has value of “=” d1 TOK_ID variable name * TOK_OPER has value of “*” 2.0 TOK_FLOAT_CONST has value of 2.0 ; TOK_PUNCT has value of “;” lexemes

  7. Syntax and Semantics • Syntax - the form or structure of the expressions – whether an expression is well formed • Semantics – the meaning of an expression

  8. Syntactic Structure • Syntax almost always expressed using some variant of a notation called a context-free grammar (CFG) or simply grammar • BNF • EBNF

  9. A CFG has 4 parts • A set of tokens (lexemes), known as terminal symbols • A set of non-terminals • A set of rules (productions) where each production consists of a left-hand side (LHS) and a right-hand side (RHS) The LHS is a non-terminal and the RHS is a sequence of terminals and/or non-terminal symbols. • A special non-terminal symbol designated as the start symbol

  10. An example of BNF syntax for real numbers <r> ::= <ds> . <ds> <ds> ::= <d> | <d> <ds> <d> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7| 8 | 9 < > encloses non-terminal symbols ::= 'is' or 'is made up of ' or 'derives' (sometimes denoted with an arrow ->) | or

  11. Example • On the example from the previous slide: • What are the tokens? • What are the lexemes? • What are the non terminals? • What are the productions?

  12. BNF Points • A non terminal can have more than RHS or an OR can be used • Lists or sequences are expressed via recursion • A derivation is just a repeated set of production (rule) applications • Examples

  13. Example Grammar <program> -> <stmts> <stmts> -> <stmt> | <stmt> ; <stmts> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -> <term> + <term> | <term> - <term> <term> -> <var> | const

  14. Example Derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const

  15. Parse Trees • Alternative representation for a derivation • Example parse tree for the previous example stmts stmt expr var = term term + a var const b

  16. Another Example Expression -> Expression + Expression | Expression - Expression | ... Variable | Constant | ... Variable -> T_IDENTIFIER Constant -> T_INTCONSTANT | T_DOUBLECONSTANT

  17. The Parse a + 2 Expression -> Expression + Expression -> Variable + Expression -> T_IDENTIFIER + Expression -> T_IDENTIFIER + Constant -> T_IDENTIFIER + T_INTCONSTANT

  18. Parse Trees PS -> P | P PS P -> e | '(' PS ')' | '<' PS '>' | '[' PS ']' What’s the parse tree for this statement ? < [ ] [ < > ] >

  19. EBNF - Extended BNF • Like BNF except that • Non-terminals start w/ uppercase • Parens are used for grouping terminals • Braces {} represent zero or more occurrences (iteration ) • Brackets [] represent an optional construct , that is a construct that appears either once or not at all.

  20. EBNF example Exp -> Term { ('+' | '-') Term } Term -> Factor { ('*' | '/') Factor } Factor -> '(' Exp ')' | variable | constant

  21. EBNF/BNF • EBNF and BNF are equivalent • How can {} be expressed in BNF? • How can ( ) be expressed? • How can [ ] be expressed?

  22. Semantic Analysis • The syntactically correct parse tree (or derivation) is checked for semantic errors • Check for constructs that while valid syntax do not obey the semantic rules of the source language. • Examples: • Use of an undeclared/un-initialized variable • Function called with improper arguments • Incompatible operands and type mismatches,

  23. Examples void fun1(int i); double d; d = fun1(2.1); int i; int j; i = i + 2; int arr[2], c; c = arr * 10; Most semantic analysis pertains to the checking of types.

  24. Intermediate Code Generation • Where the intermediate representation of the source program is created. • The representation can have a variety of forms, but a common one is called three-address code (TAC) • Like assembly – the TAC is a sequence of simple instructions, each of which can have at most three operands.

  25. Example _t1 = b * c _t2 = b * d _t3 = _t1 + _t2 a = _t3 a = b * c + b * d Note temps

  26. Another Example _t1 = a > b if _t1 goto L0 _t2 = a - c a = _t2 L0: t3 = b * c c = _t3 if (a <= b) a = a - c; c = b * c; Note Temps Symbolic addresses

  27. Next Time • Finish introduction to compilation stages • Read Aho/Sethi/Ullman Chapter 1

  28. Selected References • Compilers Principles, Techniques and Tools, Aho, Sethi, and Ullman • http://www.stanford.edu/class/cs143/

More Related