Compiler Structures

Compiler Structures, 241-437, Semester 1, 2011-2012. Objective: describe intermediate code generation and explain a stack-based intermediate code for the expression language. 10. Intermediate Code Generation. Overview: 1. Intermediate Code (IC) Generation; 2. IC Examples.


  1. Compiler Structures 241-437, Semester 1, 2011-2012 • Objective • describe intermediate code generation • explain a stack-based intermediate code for the expression language 10. Intermediate Code Generation

  2. Overview 1. Intermediate Code (IC) Generation 2. IC Examples 3. Expression Translation in SPIM 4. The Expressions Language

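  3. Compiler phases (diagram): Source Program --> Lexical Analyzer --> Syntax Analyzer --> Semantic Analyzer --> Int. Code Generator (in this lecture) --> Intermediate Code --> Code Optimizer --> Target Code Generator --> Target Lang. Prog. • Front End: Lexical Analyzer through Int. Code Generator • Back End: Code Optimizer and Target Code Generator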

  4. 1. Intermediate Code (IC) Generation • Helps with retargeting • e.g. can easily attach a back end for a new machine to an existing front end • Enables machine-independent code optimization. • Diagram: Front end --> Intermediate code --> Back end --> Target machine code

  5. Graphical IC Representations • Abstract Syntax Trees (AST) • retains basic parse tree structure, but with unneeded nodes removed • Directed Acyclic Graphs (DAG) • compacted AST to avoid duplication • smaller memory needs • Control Flow Graphs (CFG) • used to model control flow

  6. Linear (text-based) ICs • Stack-based (postfix) • e.g. the JVM • Three-address code: x := y op z • Two-address code: x := op y (the same as x := x op y)

  7. 2. IC Examples • ASTs and DAGs • Stack-based (postfix) • Three-address Code • SPIM

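  8. 2.1. ASTs and DAGs
     Example: a := b * -c + b * -c
     • AST: assign(a, +(*(b, -(c)), *(b, -(c)))), so the subtree for b * -c appears twice
     • DAG: assign(a, +(n, n)), where n = *(b, -(c)) is a single shared node
     • Pros: easy restructuring of code and/or expressions for intermediate code optimization
     • Cons: memory intensive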
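One common way to get a DAG instead of an AST is to look for an identical existing node before creating a new one. The sketch below illustrates that idea for the slide's expression; the `Node` layout, the `mkNode()` and `buildExample()` names, and the operator codes ('m' for unary minus, 'v' for a variable leaf) are assumptions made for this example, not code from the course.

```c
#include <assert.h>

#define MAX_NODES 64

/* One node of the expression graph; a missing child is -1. */
typedef struct {
  char op;          /* '+', '*', 'm' (unary minus) or 'v' (variable leaf) */
  char name;        /* variable name for a leaf, otherwise 0 */
  int left, right;  /* indices of the children in nodes[] */
} Node;

static Node nodes[MAX_NODES];
static int nodeCount = 0;

/* Return the index of an identical existing node, or create a new one.
   Reusing nodes is what turns the tree into a DAG. */
int mkNode(char op, char name, int left, int right)
{
  for (int i = 0; i < nodeCount; i++)
    if (nodes[i].op == op && nodes[i].name == name &&
        nodes[i].left == left && nodes[i].right == right)
      return i;
  assert(nodeCount < MAX_NODES);
  nodes[nodeCount] = (Node){op, name, left, right};
  return nodeCount++;
}

/* Build b * -c + b * -c and return the index of the '+' node. */
int buildExample(void)
{
  int lhs = mkNode('*', 0, mkNode('v', 'b', -1, -1),
                   mkNode('m', 0, mkNode('v', 'c', -1, -1), -1));
  int rhs = mkNode('*', 0, mkNode('v', 'b', -1, -1),
                   mkNode('m', 0, mkNode('v', 'c', -1, -1), -1));
  return mkNode('+', 0, lhs, rhs);
}
```

After `buildExample()` the table holds five nodes (b, c, -c, b * -c and the sum), and both children of the '+' node are the same index. The linear search keeps the sketch short; a hash table would be the usual choice for larger inputs.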

  9. 2.2. Stack-based (postfix)
     a := b * -c + b * -c
     Postfix form: b c uminus * b c uminus * + a assign
     e.g. JVM stack instrs:
         iload 2     // push b
         iload 3     // push c
         ineg        // uminus
         imul        // *
         iload 2     // push b
         iload 3     // push c
         ineg        // uminus
         imul        // *
         iadd        // +
         istore 1    // store a
     Postfix notation represents operations on a stack
     • Pro: easy to generate
     • Cons: stack operations are more difficult to optimize

  10. 2.3. Three-Address Code
      a := b * -c + b * -c
      Translated from the AST:
          t1 := - c
          t2 := b * t1
          t3 := - c
          t4 := b * t3
          t5 := t2 + t4
          a := t5
      Translated from the DAG:
          t1 := - c
          t2 := b * t1
          t5 := t2 + t2
          a := t5
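The AST translation above can be produced by a post-order walk that gives every operator node a fresh temporary. The following is a minimal sketch of that walk; the `Expr` type, the `genTAC()` name and the buffer-based interface are hypothetical and chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: op == 0 marks a variable leaf;
   unary minus keeps its operand in left and leaves right NULL. */
typedef struct Expr {
  char op;                    /* '+', '*', '-' or 0 */
  const char *name;           /* variable name when op == 0 */
  struct Expr *left, *right;
} Expr;

static int tempCount = 0;     /* number of temporaries handed out so far */

/* Append three-address code for e to out, and write the name that
   holds e's value (a variable or a temporary) into result. */
void genTAC(const Expr *e, char *out, size_t outSize,
            char *result, size_t resSize)
{
  if (e->op == 0) {                       /* leaf: no code needed */
    snprintf(result, resSize, "%s", e->name);
    return;
  }
  char l[16] = "", r[16] = "";
  genTAC(e->left, out, outSize, l, sizeof l);
  if (e->right != NULL)
    genTAC(e->right, out, outSize, r, sizeof r);

  snprintf(result, resSize, "t%d", ++tempCount);   /* fresh temporary */
  size_t used = strlen(out);
  assert(used < outSize);
  if (e->right != NULL)
    snprintf(out + used, outSize - used, "%s := %s %c %s\n",
             result, l, e->op, r);
  else
    snprintf(out + used, outSize - used, "%s := %c %s\n",
             result, e->op, l);
}
```

Called on the AST for b * -c + b * -c with an empty output buffer, this appends the five lines t1 to t5 shown in the AST column and reports t5 as the result; the caller then adds the final `a := t5`. Running the same walk over a DAG, and remembering the temporary already assigned to a shared node, would give the shorter DAG column.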

  11. 2.4. SPIM • Three-address code for a simulator that runs MIPS32 assembly language programs • http://www.cs.wisc.edu/~larus/spim.html • Loading/Storing • lw register,var - loads value into register • sw register,var - stores value from register • many, many others

  12. 8 registers: $t0 - $t7 • Binary math ops (reg1 = reg2 op reg3): • add reg1,reg2,reg3 • sub reg1,reg2,reg3 • mul reg1,reg2,reg3 • div reg1,reg2,reg3 • Unary minus (reg1 = - reg2) • neg reg1, reg2

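  13. "a := b * -c + b * -c" in SPIM (from the AST)
          lw  $t0,c
          neg $t1,$t0
          lw  $t0,b
          mul $t2,$t1,$t0
          lw  $t0,c
          neg $t1,$t0
          lw  $t0,b
          mul $t1,$t1,$t0
          add $t1,$t2,$t1
          sw  $t1,a
      Registers on the AST nodes: c and b are loaded into t0, each -c goes into t1, the first product into t2, the second product into t1, and the sum into t1.

  14. "a := b * -c + b * -c" in SPIM (from the DAG)
          lw  $t0,c
          neg $t1,$t0
          lw  $t0,b
          mul $t1,$t1,$t0
          add $t2,$t1,$t1
          sw  $t2,a
      Registers on the DAG nodes: c and b are loaded into t0, -c goes into t1, the shared product into t1, and the sum into t2.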

  15. 3. Expression Translation in SPIM
      Grammar:
          S => id := E
          E => E + E
          E => id
      Example: a := b + c + d + e (parse tree --> code using bottom-up evaluation)
      As we parse, use attributes to pass information about the temporary variables up the tree.
      Generate:
          lw $t1,b
      Parse tree so far: the E node for b is labelled 1.

  16. Generate:
          lw $t1,b
          lw $t2,c
      Parse tree for a := b + c + d + e: the E nodes for b and c are labelled 1 and 2.
      Each number corresponds to a temporary variable.

  17. Generate:
          lw $t1,b
          lw $t2,c
          add $t3,$t1,$t2
      Parse tree: the E node for b + c is labelled 3.
      Each number corresponds to a temporary variable.

  18. Generate:
          lw $t1,b
          lw $t2,c
          add $t3,$t1,$t2
          lw $t4,d
      Parse tree: the E node for d is labelled 4.

  19. Generate:
          lw $t1,b
          lw $t2,c
          add $t3,$t1,$t2
          lw $t4,d
          add $t5,$t3,$t4
      Parse tree: the E node for b + c + d is labelled 5.

  20. Generate:
          lw $t1,b
          lw $t2,c
          add $t3,$t1,$t2
          lw $t4,d
          add $t5,$t3,$t4
          lw $t6,e
      Parse tree: the E node for e is labelled 6.

  21. Generate:
          lw $t1,b
          lw $t2,c
          add $t3,$t1,$t2
          lw $t4,d
          add $t5,$t3,$t4
          lw $t6,e
          add $t7,$t5,$t6
      Parse tree: the top E node, for b + c + d + e, is labelled 7.

  22. Generate:
          lw $t1,b
          lw $t2,c
          add $t3,$t1,$t2
          lw $t4,d
          add $t5,$t3,$t4
          lw $t6,e
          add $t7,$t5,$t6
          sw $t7,a
      Parse tree for a := b + c + d + e: the E nodes are labelled 1 to 7, and S stores temporary 7 into a.
      • Pro: easy to rearrange code for global optimization
      • Cons: lots of temporaries
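The numbering in slides 15 to 22 can be read as a synthesized attribute: each E returns the number of the temporary holding its value. Below is a rough sketch of that scheme for the three grammar rules; the `Sum` type and the `genSum()` and `genAssign()` names are invented for this illustration, and the code makes no attempt to reuse registers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse tree node for E: a leaf (E => id) has id set;
   an inner node (E => E + E) has left and right set. */
typedef struct Sum {
  const char *id;
  struct Sum *left, *right;
} Sum;

static int nextTemp = 0;   /* last temporary number handed out */

/* Append SPIM code for e to out and return the number of the
   temporary that holds its value (the attribute passed up the tree). */
int genSum(const Sum *e, char *out, size_t size)
{
  int t;
  if (e->id != NULL) {                       /* E => id */
    t = ++nextTemp;
    size_t used = strlen(out);
    assert(used < size);
    snprintf(out + used, size - used, "lw $t%d,%s\n", t, e->id);
    return t;
  }
  int l = genSum(e->left, out, size);        /* E => E + E */
  int r = genSum(e->right, out, size);
  t = ++nextTemp;
  size_t used = strlen(out);
  assert(used < size);
  snprintf(out + used, size - used, "add $t%d,$t%d,$t%d\n", t, l, r);
  return t;
}

/* S => id := E : generate the expression, then store its temporary. */
void genAssign(const char *target, const Sum *e, char *out, size_t size)
{
  int t = genSum(e, out, size);
  size_t used = strlen(out);
  assert(used < size);
  snprintf(out + used, size - used, "sw $t%d,%s\n", t, target);
}
```

For the left-nested tree ((b + c) + d) + e and target a, this yields the eight instructions listed on slide 22. Because every node takes a new number, a longer expression would run past the eight registers $t0 - $t7, which is the "lots of temporaries" cost noted above.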

  23. Issues when Processing Expressions • Type checking/conversion. • Address calculation for more complex types (arrays, records, etc.). • Expressions in control structures, such as loops and if tests.

  24. 4. The Expressions Language • exprParse3.c builds a parse tree for the input file (reuses code from exprParse2.c). • An intermediate code is generated from the parse tree, and saved to an output file. • The input file is not executed by exprParse3.c • that is done by a separate emulator.

  25. Usage
      test1.txt:
          let x = 2
          let y = 3 + x
      Commands:
          > gcc -Wall -o exprParse3 exprParse3.c
          > ./exprParse3 < test1.txt
          > cat codeGen.txt
          PUSH 2
          STORE x
          WRITE
          PUSH 3
          LOAD x
          ADD
          STORE y
          WRITE
          STOP
      exprParse3 stores intermediate code in codeGen.txt (test1.txt --> exprParse3 --> codeGen.txt).

  26. Emulator Usage
          > ./emulator codeGen.txt
          Reading code from codeGen.txt
          == 2
          == 5
          Stop
      The emulator runs the intermediate code (codeGen.txt --> emulator).

  27. 4.1. The Instruction Set • The instructions in codeGen.txt are executed by an emulator. • it emulates (simulates) real hardware • The instructions refer to two data structures used in the emulator.

  28. The Emulator's Data Structures • The emulator's data structures: • a symbol table of IDs and their integer values • a stack of integers for evaluating the expressions • Diagram example: the symbol table holds x = 4 and the stack holds 2

  29. The Instructions
          WRITE            // pop top element off stack and print
          STOP             // exit code emulation
          LOAD ID          // get ID value from symbol table, and push onto stack
          STORE ID         // copy stack top into symbol table for ID

  30. The Instructions (continued)
          PUSH integer     // push integer onto stack
          STORE0 ID        // push 0 onto stack, and save to table as value for ID (same as PUSH 0; STORE ID)
          MULT             // pop two stack values, multiply them, push result back
          ADD, MINUS, DIV  // same for those ops

  31. Intermediate Code Type • Because the intermediate code stores values on a stack rather than in registers, it is a stack-based (postfix) representation.

  32. 4.2. exprParse3.c Coding • All the parsing code in exprParse3.c is the same as exprParse2.c. • The difference is that the parse tree is passed to a generateCode() function to convert it to intermediate code • see main()

  33. main()
          #define CODE_FNM "codeGen.txt"    // where to store generated code

          int main(void)
          /* parse, print the tree, then generate code
             which is stored in CODE_FNM */
          {
            Tree *t;
            nextToken();
            t = statements();
            match(SCANEOF);
            printTree(t, 0);
            generateCode(CODE_FNM, t);
            return 0;
          }

  34. Generating the Code
          void generateCode(char *fnm, Tree *t)
          /* Open the intermediate code file, fnm, and write to it. */
          {
            FILE *fp;
            if ((fp = fopen(fnm, "w")) == NULL) {
              printf("Could not write to %s\n", fnm);
              exit(1);
            }
            else {
              printf("Writing code to %s\n", fnm);
              cgTree(fp, t);
              fprintf(fp, "STOP\n");    // last instruction in file
              fclose(fp);
            }
          }  // end of generateCode()

  35. cgTree()
          void cgTree(FILE *fp, Tree *t)
          /* Recurse over the parse tree looking for non-NEWLINE
             subtrees to convert into code.
             Each block of code generated for a non-NEWLINE subtree
             ends with a WRITE instruction, to print out the value
             of the line. */
          {
            if (t == NULL)
              return;
            Token tok = TreeOper(t);
            if (tok == NEWLINE) {
              cgTree(fp, TreeLeft(t));
              cgTree(fp, TreeRight(t));
            }
            else {
              codeGen(fp, t);
              fprintf(fp, "WRITE\n");    // print value at EOL
            }
          }  // end of cgTree()

  36. codeGen()
          void codeGen(FILE *fp, Tree *t)
          /* Convert the tree nodes for ID, INT, ASSIGNOP, PLUSOP,
             MINUSOP, MULTOP, DIVOP into instructions.
             The load/store instructions:
                 LOAD ID, STORE ID, STORE0 ID, PUSH integer
             The math instructions:
                 MULT, ADD, MINUS, DIV */
          {
            if (t == NULL)
              return;

  37. codeGen() (continued)
            Token tok = TreeOper(t);
            if (tok == ID)
              codeGenID(fp, TreeID(t));
            else if (tok == INT)
              fprintf(fp, "PUSH %d\n", TreeValue(t));
            else if (tok == ASSIGNOP) {    // id = expr
              char *id = TreeID(TreeLeft(t));
              getIDEntry(id);    // don't use Symbol info
              codeGen(fp, TreeRight(t));
              fprintf(fp, "STORE %s\n", id);
            }

  38. codeGen() (continued)
            else if (tok == PLUSOP) {
              codeGen(fp, TreeLeft(t));
              codeGen(fp, TreeRight(t));
              fprintf(fp, "ADD\n");
            }
            else if (tok == MINUSOP) {
              codeGen(fp, TreeLeft(t));
              codeGen(fp, TreeRight(t));
              fprintf(fp, "MINUS\n");
            }

  39. codeGen() (continued)
            else if (tok == MULTOP) {
              codeGen(fp, TreeLeft(t));
              codeGen(fp, TreeRight(t));
              fprintf(fp, "MULT\n");
            }
            else if (tok == DIVOP) {
              codeGen(fp, TreeLeft(t));
              codeGen(fp, TreeRight(t));
              fprintf(fp, "DIV\n");
            }
          }  // end of codeGen()

  40. codeGenID()
          void codeGenID(FILE *fp, char *id)
          /* An ID may already be in the symbol table, or be new,
             which is converted into a LOAD or a STORE0 code operation. */
          {
            SymbolInfo *si = NULL;
            if ((si = lookupID(id)) != NULL)    // already declared
              fprintf(fp, "LOAD %s\n", id);
            else {    // new, so add to table
              addID(id, 0);    // 0 is default value
              fprintf(fp, "STORE0 %s\n", id);
            }
          }  // end of codeGenID()

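  41. From Tree to Code
      Input:
          let x = 2
          let y = 3 + x
      Parse tree (diagram): the root \n node joins an inner \n node and the subtree = (y, + (3, x)); the inner \n node holds the subtree = (x, 2) and a NULL child.
      Symbol table in exprParse3.c: x = 0, y = 0
      Generated code:
          PUSH 2
          STORE x
          WRITE
          PUSH 3
          LOAD x
          ADD
          STORE y
          WRITE
          STOP

  42. 4.3. The Emulator
          > gcc -Wall -o emulator emulator.c
          > ./emulator codeGen.txt
          Reading code from codeGen.txt
          == 2
          == 5
          Stop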

  43. Emulator Data Structures
          #define MAX_SYMS 15      // max no of vars
          #define STACK_SIZE 10

          // stack data structure
          int stack[STACK_SIZE];
          int stackTop = -1;

          // symbol table data structures
          typedef struct SymInfo {
            char *id;
            int value;
          } SymbolInfo;

          int symNum = 0;    // number of symbols stored
          SymbolInfo syms[MAX_SYMS];
      Diagram example: the symbol table holds x = 4 and the stack holds 2.
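The emulator code on the following slides calls push(), pop(), topOf(), lookupID() and addID(), but their bodies are not shown in this transcript. Here is one plausible way they could be written on top of the declarations from slide 43. The function bodies, the error messages, and the decision to copy the ID string are all assumptions, not the contents of emulator.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS   15      // max no of vars
#define STACK_SIZE 10

int stack[STACK_SIZE];
int stackTop = -1;         // index of the top element; -1 means empty

typedef struct SymInfo {
  char *id;
  int value;
} SymbolInfo;

int symNum = 0;            // number of symbols stored
SymbolInfo syms[MAX_SYMS];

void push(int v)           // assumed behaviour: exit on overflow
{
  if (stackTop == STACK_SIZE - 1) {
    printf("Error: stack overflow\n");
    exit(1);
  }
  stack[++stackTop] = v;
}

int topOf(void)            // read the top value without removing it
{
  if (stackTop == -1) {
    printf("Error: stack is empty\n");
    exit(1);
  }
  return stack[stackTop];
}

int pop(void)              // remove and return the top value
{
  int v = topOf();
  stackTop--;
  return v;
}

SymbolInfo *lookupID(const char *id)   // linear search; NULL if absent
{
  for (int i = 0; i < symNum; i++)
    if (strcmp(syms[i].id, id) == 0)
      return &syms[i];
  return NULL;
}

void addID(const char *id, int value)  // update an existing ID or add a new one
{
  SymbolInfo *si = lookupID(id);
  if (si != NULL) {
    si->value = value;
    return;
  }
  if (symNum == MAX_SYMS) {
    printf("Error: symbol table is full\n");
    exit(1);
  }
  char *copy = malloc(strlen(id) + 1);  // keep a private copy of the name
  assert(copy != NULL);
  strcpy(copy, id);
  syms[symNum].id = copy;
  syms[symNum].value = value;
  symNum++;
}
```

Copying the name in addID() matters in this design because the caller's argument buffer is reused for every line of the code file. Having addID() overwrite an existing entry is what lets STORE be implemented as `addID(arg, topOf())`.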

  44. Evaluating Input Lines
          void eval(FILE *fp)
          /* Read in the code file a line at a time and process the lines.
             An instruction on a line may be a single command (e.g. WRITE)
             or an instruction name and an argument (e.g. LOAD x). */
          {
            char buf[BUFSIZ];
            char cmd[MAX_LEN], arg[MAX_LEN];
            int no;

  45. eval() (continued)
            while (fgets(buf, sizeof(buf), fp) != NULL) {
              no = sscanf(buf, "%s %s\n", cmd, arg);
              if ((no < 1) || (no > 2))
                printf("Unknown format: %s\n", buf);
              else
                processCmd(cmd, arg);
                  // process commands as they are read in
            }
          }  // end of eval()
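The return value of sscanf() is what tells eval() whether a line held one field or two. The small helper below isolates that step; `splitInstr()` is a hypothetical name, the value 32 for MAX_LEN is a guess (the transcript does not show its definition), and the `%31s` width limits and the clearing of `arg` are additions intended to make the sketch safe to run on arbitrary input.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEN 32   /* assumed size; the real value is not shown */

/* Split one line of intermediate code into a command and an optional
   argument. Returns the number of fields found: 0, 1 or 2. */
int splitInstr(const char *line, char cmd[MAX_LEN], char arg[MAX_LEN])
{
  assert(line != NULL && cmd != NULL && arg != NULL);
  cmd[0] = '\0';
  arg[0] = '\0';     /* stays empty for one-word instructions */

  /* %31s reads at most MAX_LEN - 1 characters, leaving room for '\0' */
  int no = sscanf(line, "%31s %31s", cmd, arg);
  return (no < 0) ? 0 : no;   /* sscanf gives EOF for a blank line */
}
```

With this helper, "LOAD x" gives 2 fields, "WRITE" gives 1 with an empty `arg`, and a blank line gives 0, which corresponds to the "Unknown format" branch in eval().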

  46. Processing an Instruction
          void processCmd(char *cmd, char *arg)
          {
            SymbolInfo *si;
            if (strcmp(cmd, "LOAD") == 0) {
              if ((si = lookupID(arg)) == NULL) {
                printf("Error: load cannot find %s\n", arg);
                exit(1);
              }
              push(si->value);
            }
            else if (strcmp(cmd, "STORE") == 0)
              addID(arg, topOf());
            else if (strcmp(cmd, "STORE0") == 0) {
              push(0);
              addID(arg, 0);
            }

  47. processCmd() (continued)
            else if (strcmp(cmd, "PUSH") == 0)
              push( atoi(arg) );
            else if (strcmp(cmd, "MULT") == 0) {
              int v2 = pop();
              int v1 = pop();
              push( v1*v2 );
            }
            else if (strcmp(cmd, "ADD") == 0) {
              int v2 = pop();
              int v1 = pop();
              push( v1+v2 );
            }
            else if (strcmp(cmd, "MINUS") == 0) {
              int v2 = pop();
              int v1 = pop();
              push( v1-v2 );
            }

  48. processCmd() (continued)
            else if (strcmp(cmd, "DIV") == 0) {
              int v2 = pop();
              if (v2 == 0) {
                printf("Error: div by 0; using 1\n");
                v2 = 1;
              }
              int v1 = pop();
              push( v1/v2 );
            }
            else if (strcmp(cmd, "WRITE") == 0)
              printf("== %d\n", pop());
            else if (strcmp(cmd, "STOP") == 0) {
              printf("Stop\n");
              exit(0);    // normal termination, so report success
            }

  49. processCmd() (continued)
            else
              printf("Unknown instruction: %s\n", cmd);
          }  // end of processCmd()

  50. Evaluating the Code for test1.txt
      test1.txt:
          let x = 2
          let y = 3 + x
      codeGen.txt:
          PUSH 2
          STORE x
          WRITE
          PUSH 3
          LOAD x
          ADD
          STORE y
          WRITE
          STOP
