Compiler 3.2

jerod
Presentation Transcript


  1. Compiler 3.2 A Level computing

  2. So, Sir, what is a compiler? A compiler is a program that converts a source program into an object program.

  3. The Structure of a Compiler: Source → Lexical analysis → Tokens → Parsing → Interm. Language → Optimization → Code Gen. → Machine Code. Today we start.

  4. Lexical Analysis (scanner) • The scanner reads characters from the source program • The scanner groups the characters into lexemes • The scanner is called by the parser • Each lexeme corresponds to a token • Example: if (i == j) z = 0; else z = 1; • To the scanner, the input is just a sequence of characters: \tif (i == j)\n\t\tz = 0;\n\t\telse\n\t\tz = 1;

  5. Sir, what the hell do you mean by token and lexeme, then? A token is a syntactic category • In English: noun, verb, adjective, … • In a programming language: Identifier, Integer, Keyword, Whitespace, … • A lexeme is the actual character sequence that matches a token; for example, rate is a lexeme of the token IDENT • Given the source code I = I + rate * 60; a Java scanner would return the following sequence of tokens: IDENT ASSIGN IDENT PLUS IDENT TIMES INT-LIT SEMI-COLON
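The token sequence on this slide can be reproduced with a small hand-written scanner. The following is a minimal sketch in C, not a real Java scanner: the function name `scan_tokens`, the helper `emit`, and the output-buffer interface are assumptions made for illustration, and only the four operators that appear in the example are recognised. The token names are the ones used on the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append a token name to out, separated by spaces. */
static void emit(char *out, size_t cap, const char *name) {
    size_t used = strlen(out);
    size_t need = strlen(name) + (used > 0 ? 1 : 0);
    assert(used + need < cap);          /* caller must supply a large enough buffer */
    if (used > 0) strcat(out, " ");
    strcat(out, name);
}

/* Scan src and write the token-name sequence into out.
   Returns the number of tokens, or -1 on an illegal character. */
int scan_tokens(const char *src, char *out, size_t cap) {
    int count = 0;
    assert(cap > 0);
    out[0] = '\0';
    while (*src != '\0') {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) { src++; continue; }       /* whitespace separates lexemes */
        if (isalpha(c) || c == '_') {              /* lexeme: letter (letter | digit)* */
            while (isalnum((unsigned char)*src) || *src == '_') src++;
            emit(out, cap, "IDENT");
        } else if (isdigit(c)) {                   /* lexeme: digit+ */
            while (isdigit((unsigned char)*src)) src++;
            emit(out, cap, "INT-LIT");
        } else {
            const char *name;
            switch (c) {
                case '=': name = "ASSIGN"; break;
                case '+': name = "PLUS"; break;
                case '*': name = "TIMES"; break;
                case ';': name = "SEMI-COLON"; break;
                default: return -1;                /* not a legal character here */
            }
            emit(out, cap, name);
            src++;
        }
        count++;
    }
    return count;
}
```

Calling `scan_tokens("I = I + rate * 60;", buf, sizeof buf)` with a sufficiently large `buf` fills it with the eight token names listed on the slide. Each run of characters consumed by one loop iteration is a lexeme; the name appended for it is its token.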

  6. The Parser • Groups tokens into “grammatical phrases”, discovering the underlying structure of the source program • Finds syntax errors. For example, the Java source code I = * 5; corresponds to the following sequence of tokens: IDENT ASSIGN TIMES INT-LIT SEMI-COLON. All are legal tokens, but the sequence of tokens is erroneous.

  7. Once again Sir, what the hell does the parser finally do? • It might find some “static semantic” errors, e.g., use of an undeclared variable, or an operator applied to operands of the wrong type. • It might generate code, or build some intermediate representation of the program such as an abstract-syntax tree.

  8. Syntax Analyzer (Parser)
     Token stream: num ‘*’ ‘(‘ num ‘+’ num ‘)’
     Parse tree (children indented under their parent):
       <expr>
         <expr>
           num
         <op>
           *
         <expr>
           (
           <expr>
             <expr>
               num
             <op>
               +
             <expr>
               num
           )
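To make the parser's job concrete, here is a minimal recursive-descent sketch in C for expressions built from numbers, `+`, `*` and parentheses. It is a sketch under stated assumptions: the slide's grammar (`<expr> <op> <expr>`) is ambiguous, so this code swaps in the usual layered grammar `expr → term ('+' term)*`, `term → factor ('*' factor)*`, `factor → num | '(' expr ')'`. The names `Parser`, `parse_chain`, `parse` and `TREE_MAX`, and the prefix notation used to print the tree, are all hypothetical. For brevity it reads characters directly instead of calling a separate scanner.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define TREE_MAX 256

/* Parser state: remaining input and a flag set on the first syntax error. */
typedef struct {
    const char *p;
    int error;
} Parser;

static void parse_expr(Parser *ps, char *out);

static void skip_spaces(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* factor -> num | '(' expr ')' */
static void parse_factor(Parser *ps, char *out) {
    skip_spaces(ps);
    out[0] = '\0';
    if (isdigit((unsigned char)*ps->p)) {
        size_t n = 0;
        while (isdigit((unsigned char)*ps->p) && n < TREE_MAX - 1) out[n++] = *ps->p++;
        out[n] = '\0';
    } else if (*ps->p == '(') {
        ps->p++;
        parse_expr(ps, out);
        skip_spaces(ps);
        if (*ps->p == ')') ps->p++; else ps->error = 1;   /* missing ')' */
    } else {
        ps->error = 1;                                     /* operand expected */
    }
}

/* Parse operand (op operand)* for one operator character, building a
   left-associative tree written in prefix form, e.g. "(+ 3 4)". */
static void parse_chain(Parser *ps, char *out, char op,
                        void (*operand)(Parser *, char *)) {
    char rhs[TREE_MAX], tmp[TREE_MAX];
    operand(ps, out);
    skip_spaces(ps);
    while (!ps->error && *ps->p == op) {
        ps->p++;
        operand(ps, rhs);
        int len = snprintf(tmp, sizeof tmp, "(%c %s %s)", op, out, rhs);
        if (len < 0 || (size_t)len >= sizeof tmp) { ps->error = 1; break; }
        strcpy(out, tmp);
        skip_spaces(ps);
    }
}

/* term -> factor ('*' factor)*   and   expr -> term ('+' term)* */
static void parse_term(Parser *ps, char *out) { parse_chain(ps, out, '*', parse_factor); }
static void parse_expr(Parser *ps, char *out) { parse_chain(ps, out, '+', parse_term); }

/* Returns 1 and fills tree (at least TREE_MAX bytes) on success,
   0 if the input contains a syntax error. */
int parse(const char *src, char *tree) {
    Parser ps = { src, 0 };
    assert(src != NULL && tree != NULL);
    parse_expr(&ps, tree);
    skip_spaces(&ps);
    return !ps.error && *ps.p == '\0';
}
```

With this sketch, `parse("2 * (3 + 4)", tree)` succeeds and leaves `(* 2 (+ 3 4))` in `tree`, which mirrors the shape of the parse tree on the slide. An input with an extra closing parenthesis, or one that starts with an operator, is rejected as a syntax error even though every character in it is a legal token.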

  9. Semantic Analyzer • Consider the source code: I = I + rate * 60;
     Annotated abstract-syntax tree (type of each node in parentheses, children indented under their parent):
       = (float)
         I (float)
         + (float)
           I (float)
           * (float)
             rate (float)
             60 (int)
     • The semantic analyzer checks for more “static semantic” errors. It may also annotate and/or change the abstract-syntax tree.

  10. The Intermediate Code Generator • The ICG translates from the abstract-syntax tree to intermediate code. One possibility is 3-address code. • Example:
        temp1 = 60
        temp2 = rate * temp1
        temp3 = I + temp2
        I = temp3
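The translation from tree to 3-address code can be sketched as a post-order walk that gives every constant and every operator result a fresh temporary. The C code below is an illustrative sketch only: the `Node` and `CodeBuf` types and the functions `gen` and `gen_assign` are hypothetical names, types are ignored, and variables are used directly while constants are first loaded into a temporary, which is the convention the slide's example appears to follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: kind 'v' = variable, 'c' = integer constant,
   otherwise kind is a binary operator character such as '+' or '*'. */
typedef struct Node {
    char kind;
    const char *name;                 /* variable name, used when kind == 'v' */
    int value;                        /* constant value, used when kind == 'c' */
    const struct Node *left, *right;  /* operands of a binary operator */
} Node;

typedef struct {
    char text[512];                   /* generated code, one instruction per line */
    int next_temp;                    /* number of the next unused temporary */
} CodeBuf;

static void emit_line(CodeBuf *cb, const char *line) {
    /* room is needed for the line, a newline and the terminating NUL */
    assert(strlen(cb->text) + strlen(line) + 2 <= sizeof cb->text);
    strcat(cb->text, line);
    strcat(cb->text, "\n");
}

/* Emit code for n and write the name that holds its value into place
   (a buffer of at least 32 bytes). */
static void gen(CodeBuf *cb, const Node *n, char *place) {
    char line[128], l[32], r[32];
    if (n->kind == 'v') {                       /* variables need no code */
        snprintf(place, 32, "%s", n->name);
        return;
    }
    if (n->kind == 'c') {                       /* load the constant into a temp */
        snprintf(place, 32, "temp%d", cb->next_temp++);
        snprintf(line, sizeof line, "%s = %d", place, n->value);
    } else {                                    /* operands first, then the operator */
        gen(cb, n->left, l);
        gen(cb, n->right, r);
        snprintf(place, 32, "temp%d", cb->next_temp++);
        snprintf(line, sizeof line, "%s = %s %c %s", place, l, n->kind, r);
    }
    emit_line(cb, line);
}

/* Generate three-address code for the assignment target = expr. */
void gen_assign(CodeBuf *cb, const char *target, const Node *expr) {
    char place[32], line[128];
    cb->text[0] = '\0';
    cb->next_temp = 1;
    gen(cb, expr, place);
    snprintf(line, sizeof line, "%s = %s", target, place);
    emit_line(cb, line);
}
```

Building the tree for `I + rate * 60` by hand (a `'+'` node whose right child is a `'*'` node over `rate` and the constant 60) and calling `gen_assign(&cb, "I", &add)` leaves the four instructions from the slide in `cb.text`, one per line.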

  11. The Optimizer • The Optimizer tries to improve the code generated by the intermediate code generator. The optimizer may also try to make the code smaller. • So the previous intermediate code becomes: temp2 = rate * 60, then I = I + temp2. The constant 60 replaces temp1, and the final copy through temp3 is folded into the addition.
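One way to get from the four-instruction intermediate code to the two-instruction version is a pair of simple passes over straight-line code. The C sketch below is illustrative, not how any particular compiler does it: the `Instr` type and the functions `optimize` and `format_instr` are hypothetical, and the passes rely on loudly stated assumptions (each temporary is assigned exactly once, there are no branches, and the source of a copy is not reassigned before the temporary's last use). The first pass substitutes the value of a copy such as `temp1 = 60` into later instructions; the second folds a trailing copy such as `I = temp3` into the instruction that computed `temp3`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One three-address instruction: dest = a op b,
   or a plain copy dest = a when op == 0. */
typedef struct {
    char dest[16], a[16], b[16];
    char op;
} Instr;

static int is_temp(const char *s) { return strncmp(s, "temp", 4) == 0; }

/* Remove instruction k from code[0..n-1]; returns the new length. */
static int remove_at(Instr *code, int n, int k) {
    assert(k >= 0 && k < n);
    memmove(&code[k], &code[k + 1], (size_t)(n - k - 1) * sizeof code[0]);
    return n - 1;
}

/* Does any instruction from index start onwards read name? */
static int used_from(const Instr *code, int n, int start, const char *name) {
    for (int j = start; j < n; j++)
        if (strcmp(code[j].a, name) == 0 ||
            (code[j].op && strcmp(code[j].b, name) == 0))
            return 1;
    return 0;
}

/* Simplify straight-line code in place; returns the new instruction count.
   Assumes each temporary is assigned exactly once and that the source of a
   copy is not reassigned before the temporary's last use. */
int optimize(Instr *code, int n) {
    /* Pass 1: propagate "tempK = x" into later uses, then drop the copy. */
    for (int k = 0; k < n; ) {
        if (code[k].op == 0 && is_temp(code[k].dest)) {
            for (int j = k + 1; j < n; j++) {
                if (strcmp(code[j].a, code[k].dest) == 0)
                    strcpy(code[j].a, code[k].a);
                if (code[j].op && strcmp(code[j].b, code[k].dest) == 0)
                    strcpy(code[j].b, code[k].a);
            }
            n = remove_at(code, n, k);
        } else {
            k++;
        }
    }
    /* Pass 2: fold "tempK = a op b; d = tempK" into "d = a op b". */
    for (int k = 0; k + 1 < n; k++) {
        Instr *def = &code[k], *copy = &code[k + 1];
        if (def->op && is_temp(def->dest) && copy->op == 0 &&
            strcmp(copy->a, def->dest) == 0 &&
            !used_from(code, n, k + 2, def->dest)) {
            strcpy(def->dest, copy->dest);
            n = remove_at(code, n, k + 1);
        }
    }
    return n;
}

/* Format one instruction as text, e.g. "I = I + temp2". */
void format_instr(const Instr *in, char *out, size_t cap) {
    if (in->op) snprintf(out, cap, "%s = %s %c %s", in->dest, in->a, in->op, in->b);
    else snprintf(out, cap, "%s = %s", in->dest, in->a);
}
```

Applied to the four instructions from the Intermediate Code Generator slide, `optimize` returns 2 and leaves `temp2 = rate * 60` followed by `I = I + temp2`. A production optimizer would need data-flow analysis to justify these rewrites when the assumptions above do not hold.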

  12. The Code Generator generates object code from the (optimized) intermediate code. Example:
        sumcalc:
          xorl %r8d, %r8d
          xorl %ecx, %ecx
          movl %edx, %r9d
          cmpl %edx, %r8d
          jg .L7
          sall $2, %edi
        .L5:
          movl %edi, %eax
          cltd
          idivl %esi
          leal 1(%rcx), %edx
          movl %eax, %r10d
          imull %ecx, %r10d
          movl %edx, %ecx
          imull %edx, %ecx
          leal (%r10,%rcx), %eax
          movl %edx, %ecx
          addl %eax, %r8d
          cmpl %r9d, %edx
          jle .L5
        .L7:
          movl %r8d, %eax
          ret
      generated from:
        int sumcalc(int a, int b, int N) {
          int i;
          int x, t, u, v;
          x = 0;
          u = ((a<<2)/b);
          v = 0;
          for(i = 0; i <= N; i++) {
            t = i+1;
            x = x + v + t*t;
            v = v + u;
          }
          return x;
        }

  13. So what? So the compiler converts the source program into an object program. While doing so, it passes the source code through several stages. Lexical analysis reads the source code character by character, groups the characters into lexemes, and hands each lexeme to the parser as a token whenever the parser calls the scanner. The parser groups these tokens into meaningful “grammatical phrases”, finds syntax errors if any, and builds an intermediate representation of the program such as an abstract-syntax tree. This tree is passed to the Semantic Analyzer, which checks for static semantic errors and further annotates and augments the tree. The Intermediate Code Generator then translates the abstract-syntax tree into intermediate code. The Optimizer tries to improve and simplify that code and passes it to the Code Generator, which generates the desired object code. The object code is then linked by the linker and loaded into memory for execution by the loader.

  14. Anatomy of a Computer, Step by Step (we shall see what happens): Program (character stream) → Lexical Analyzer (Scanner) → Token Stream

  15. Lexical Analyzer (Scanner) • Character stream: 2 3 4 * ( 1 1 + - 2 2 ) • Token stream: Num(234) mul_op lpar_op Num(11) add_op Num(-22) rpar_op

  16. Lexical Analyzer (Scanner): lexical errors • Input: 18..23 + val#ue • 18..23: Not a number • val#ue: Variable names cannot have ‘#’ character

  17. Anatomy of a Computer: Program (character stream) → Lexical Analyzer (Scanner) → Token Stream → Syntax Analyzer (Parser) → Parse Tree

  18. Syntax Analyzer (Parser)
      Token stream: num ‘*’ ‘(‘ num ‘+’ num ‘)’
      Parse tree (children indented under their parent):
        <expr>
          <expr>
            num
          <op>
            *
          <expr>
            (
            <expr>
              <expr>
                num
              <op>
                +
              <expr>
                num
            )

  19. Syntax Analyzer (Parser): syntax errors
        int * foo(i, j, k))
        int i;
        int j;
        {
          for(i=0; i j) {
            fi(i>j)
              return j;
        }
      • Extra parentheses: foo(i, j, k)) • Not an expression: i j • Missing increment: the for header has no third clause • Not a keyword: fi

  20. Anatomy of a Computer: Program (character stream) → Lexical Analyzer (Scanner) → Token Stream → Syntax Analyzer (Parser) → Parse Tree → Semantic Analyzer → Intermediate Representation

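  21. Semantic Analyzer: semantic errors
        int * foo(i, j, k)
        int i;
        int j;
        {
          int x;
          x = x + j + N;
          return j;
        }
      • Type not declared: k has no type declaration • Undeclared variable: N • Uninitialized variable used: x is read before it is assigned • Mismatched return type: foo is declared to return int * but returns the int j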

  22. Anatomy of a Computer: Program (character stream) → Lexical Analyzer (Scanner) → Token Stream → Syntax Analyzer (Parser) → Parse Tree → Semantic Analyzer → Intermediate Representation → Code Optimizer → Optimized Intermediate Representation

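  23. Optimizer
      Before optimization:
        int sumcalc(int a, int b, int N) {
          int i;
          int x, y;
          x = 0;
          y = 0;
          for(i = 0; i <= N; i++) {
            x = x+4*a/b*i+(i+1)*(i+1);
          }
          return x;
        }
      After optimization (the intermediate version on this slide is garbled in the transcript; this is the final form shown on the Code Generator slides):
        int sumcalc(int a, int b, int N) {
          int i;
          int x, t, u, v;
          x = 0;
          u = ((a<<2)/b);
          v = 0;
          for(i = 0; i <= N; i++) {
            t = i+1;
            x = x + v + t*t;
            v = v + u;
          }
          return x;
        }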
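An optimizer must not change what the program computes. As a quick sanity check, the sketch below puts the unoptimized loop from this slide next to the optimized version from the Code Generator slides and compares their results over a range of small inputs. The function names `sumcalc_original`, `sumcalc_optimized` and `versions_agree` are made up for this illustration, the unused variable `y` is dropped, and the check is restricted to non-negative `a` and positive `b`, since shifting a negative value left and dividing by zero are both undefined in C.

```c
#include <assert.h>

/* Unoptimized version: recomputes 4*a/b*i and (i+1)*(i+1) on every iteration. */
int sumcalc_original(int a, int b, int N) {
    int x = 0;
    for (int i = 0; i <= N; i++) {
        x = x + 4 * a / b * i + (i + 1) * (i + 1);
    }
    return x;
}

/* Optimized version: 4*a/b is computed once as u, and the product u*i is
   replaced by a running sum v that grows by u each iteration. */
int sumcalc_optimized(int a, int b, int N) {
    int x = 0, v = 0, t;
    int u = (a << 2) / b;
    for (int i = 0; i <= N; i++) {
        t = i + 1;
        x = x + v + t * t;
        v = v + u;
    }
    return x;
}

/* Returns 1 if both versions agree for every 0 <= a <= max,
   1 <= b <= max and 0 <= N <= max, otherwise 0. */
int versions_agree(int max) {
    assert(max >= 1 && max <= 50);   /* small values keep the sums far from overflow */
    for (int a = 0; a <= max; a++)
        for (int b = 1; b <= max; b++)
            for (int N = 0; N <= max; N++)
                if (sumcalc_original(a, b, N) != sumcalc_optimized(a, b, N))
                    return 0;
    return 1;
}
```

The two transformations visible here are moving the loop-invariant `4*a/b` out of the loop and replacing the multiplication by `i` with repeated addition. Agreement on sample inputs is evidence, not proof; a real optimizer applies such rewrites only when its analysis shows they are safe.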

  24. Anatomy of a Computer: Program (character stream) → Lexical Analyzer (Scanner) → Token Stream → Syntax Analyzer (Parser) → Parse Tree → Semantic Analyzer → Intermediate Representation → Code Optimizer → Optimized Intermediate Representation → Code Generator → Assembly code

  25. Code Generator
      Optimized source:
        int sumcalc(int a, int b, int N) {
          int i;
          int x, t, u, v;
          x = 0;
          u = ((a<<2)/b);
          v = 0;
          for(i = 0; i <= N; i++) {
            t = i+1;
            x = x + v + t*t;
            v = v + u;
          }
          return x;
        }
      Assembly code:
        sumcalc:
          xorl %r8d, %r8d
          xorl %ecx, %ecx
          movl %edx, %r9d
          cmpl %edx, %r8d
          jg .L7
          sall $2, %edi
        .L5:
          movl %edi, %eax
          cltd
          idivl %esi
          leal 1(%rcx), %edx
          movl %eax, %r10d
          imull %ecx, %r10d
          movl %edx, %ecx
          imull %edx, %ecx
          leal (%r10,%rcx), %eax
          movl %edx, %ecx
          addl %eax, %r8d
          cmpl %r9d, %edx
          jle .L5
        .L7:
          movl %r8d, %eax
          ret
