Optimization of the mMIPS

Sander Stuijk

Presentation Transcript


  1. Optimization of the mMIPS Sander Stuijk

  2. Outline • mMIPS tool flow • Extending the LCC compiler • Video processing • I/O operations on the mMIPS • Assignment

  3. Design flow (figure). Software (sw) path: Application (C sources), LCC C Compiler, Celoxica create memory, Celoxica transfer. Hardware (hw) path: mMIPS (C++ sources that use the SystemC library), Visual C++ (hardware simulator, test), CoCentric SystemC Compiler, Xilinx XST, Xilinx ISE (implementation).

  4. The synthesis report is stored in the file mmips.srp. What is the maximal frequency of the mMIPS? (figure: the maximal frequency and the critical path in the synthesis report)
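The maximal frequency is the reciprocal of the critical-path delay. A worked example with a hypothetical delay of 20 ns (an illustrative number, not one taken from mmips.srp):

```latex
f_{\max} = \frac{1}{t_{\mathrm{critical\ path}}}, \qquad
t_{\mathrm{critical\ path}} = 20\,\mathrm{ns}
\;\Rightarrow\;
f_{\max} = \frac{1}{20 \times 10^{-9}\,\mathrm{s}} = 50\,\mathrm{MHz}
```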

  5. Critical path

  6. Critical path

  7. Forwarding option 1 (figure with two forwarding multiplexers)

  8. Forwarding option 2 (figure with two forwarding multiplexers)

  9. Forwarding
  • Register 0 always holds the value 0, whatever value is written to it
  • Register 0 should never be forwarded
  • Always forward the newest data: EX goes before MEM, MEM goes before WB, WB goes before ID
  • The two source registers can come from the same or from different pipeline stages
  • The two forwarding multiplexers should be controlled separately
  • Load instructions (lw, lb) have 1 delay slot: the data from the memory is only valid in the WB stage
  • There may be a data hazard between a load instruction in the MEM stage and an instruction in the ID stage; this hazard cannot be solved with forwarding
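The forwarding rules of slide 9 can be condensed into one selection function per operand. The sketch below is a plain C model, not the SystemC code of the mMIPS: the names select_forward, load_hazard and struct stage_info are hypothetical, and of the select codes only the value 3 for the WB stage comes from slide 10; the other values are assumptions.

```c
/* Hypothetical select codes for one forwarding multiplexer.
   Only FWD_WB = 3 matches slide 10; the other values are assumptions. */
enum fwd_sel { FWD_REGFILE = 0, FWD_EX = 1, FWD_MEM = 2, FWD_WB = 3 };

/* What a later pipeline stage is going to write back. */
struct stage_info {
    int regwrite;   /* 1 if the instruction in this stage writes a register */
    int writereg;   /* number of its destination register */
};

/* Choose the source for register srcreg of the instruction in ID.
   Newest data wins: EX before MEM, MEM before WB, WB before the register file.
   Register 0 is never forwarded because it always reads as 0. */
enum fwd_sel select_forward(int srcreg, struct stage_info ex,
                            struct stage_info mem, struct stage_info wb)
{
    if (srcreg == 0)
        return FWD_REGFILE;
    if (ex.regwrite && ex.writereg == srcreg)
        return FWD_EX;
    if (mem.regwrite && mem.writereg == srcreg)
        return FWD_MEM;
    if (wb.regwrite && wb.writereg == srcreg)
        return FWD_WB;
    return FWD_REGFILE;
}

/* A load (lw, lb) in MEM only has valid data in WB, so forwarding cannot
   help: if the instruction in ID reads the load's destination, stall. */
int load_hazard(int mem_is_load, int mem_writereg, int src1, int src2)
{
    return mem_is_load && (mem_writereg == src1 || mem_writereg == src2);
}
```

Because the two multiplexers are controlled separately, select_forward is evaluated once per source register (one result for forwardA, one for forwardB). When load_hazard reports a hazard, the pipeline has to stall rather than forward.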

  10. Forwarding – WB stage
  • Hazard detection in the WB stage:

      else if (memwbregwrite_t == 1 &&
               ((memwbwriteregister_t == ifidreadregister1_t) ||
                (memwbwriteregister_t == ifidreadregister2_t))) {
          hazard = 1;
      }

  • Forwarding from the WB stage:

      if (memwbregwrite_t == 1 && memwbwriteregister_t == ifidreadregister1_t &&
          memwbwriteregister_t != 0) {
          forwardA.write(3);
      }
      if (memwbregwrite_t == 1 && memwbwriteregister_t == ifidreadregister2_t &&
          memwbwriteregister_t != 0) {
          forwardB.write(3);
      }

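  11. Register file and the write-back stage (figure: register file with a write input and a read output, for H&P and for the mMIPS)
  • H&P (Hennessy & Patterson): the data written to the register file is available at its output during the current cycle
  • mMIPS: the data is available at the output of the register file only during the next cycle, so an instruction in ID that reads a register being written by the instruction in WB needs forwarding from the WB stage (or a stall)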
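The difference between the two register files comes down to whether a value written in the current cycle is visible to a read in that same cycle. A minimal behavioural sketch under that assumption (all names are hypothetical; the real mMIPS register file is a SystemC module):

```c
#define NREGS 32

/* Register-file model: a write issued in the current cycle is only
   stored in regs[] at the next clock edge. */
struct regfile {
    int regs[NREGS];
    int pending;       /* 1 if a write is waiting for the clock edge */
    int pending_reg;
    int pending_val;
};

/* Write-back stage: schedule a write. Writes to register 0 are ignored. */
void rf_write(struct regfile *rf, int reg, int val)
{
    if (reg == 0)
        return;
    rf->pending = 1;
    rf->pending_reg = reg;
    rf->pending_val = val;
}

/* mMIPS behaviour: a read in the same cycle still sees the old value. */
int rf_read_mmips(const struct regfile *rf, int reg)
{
    return rf->regs[reg];
}

/* H&P behaviour: the value written in this cycle is already visible. */
int rf_read_hp(const struct regfile *rf, int reg)
{
    if (rf->pending && rf->pending_reg == reg)
        return rf->pending_val;
    return rf->regs[reg];
}

/* Clock edge: commit the pending write. */
void rf_clock(struct regfile *rf)
{
    if (rf->pending)
        rf->regs[rf->pending_reg] = rf->pending_val;
    rf->pending = 0;
}
```

In the mMIPS variant, rf_read_mmips returns the old value until rf_clock commits the write; that window is exactly the case the WB-stage forwarding has to cover.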

  12. Outline • mMIPS tool flow • Extending the LCC compiler • Video processing • I/O operations on the mMIPS • Assignment

  13. Design flow (figure, same as slide 3)

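  14. LCC compiler: it’s a C compiler, not a C++ compiler
  • Consider the following code fragment; LCC does not accept it, because the variable is declared inside the for statement:

      for (int i = 0; i < 3; i++) a[i] = ...;

  • It should be:

      int i;
      for (i = 0; i < 3; i++) {
          a[i] = ...;
      }

  • Compile with: lcc prog.c -o mips_mem.bin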

  15. Adding special functions
  Examples:
  • swap, clip, bit-masking, multiply-accumulate, ...
  Constraints:
  • At most 2 input operands and 1 output operand
  • Manifest loop bounds (known at compile time)
  • Clock frequency
  • Chip area
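As an illustration of an operation that fits these constraints, here is the reference behaviour of a clip function in C. The name clip255 and the 0..255 range are assumptions, chosen to match the clipping to one byte in the filter code later in this presentation. It has one input operand, one output operand and no loops; it is written in C89 style so that LCC should accept it.

```c
/* Reference behaviour of a hypothetical "clip" special function:
   saturate a value to the 8-bit pixel range 0..255.
   1 input operand, 1 output operand, no loops. */
int clip255(int x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return x;
}
```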

  16. Securing our skies • Measure the height every second • The airplane may never be below 10000 ft for more than 1 second • If needed, take appropriate action…

  17. Securing our skies (same as slide 16)

  18. missile.c

      #define TRUE 1
      #define FALSE 0

      int launch(int height1, int height2)
      {
          int l;
          if (height1 < 10000 && height2 < 10000)
              l = TRUE;
          else
              l = FALSE;
          return l;
      }

      void main(void)
      {
          int height1, height2;
          int l;
          while (TRUE) {
              /* height1, height2: the two latest height measurements
                 (reading the altimeter is not shown) */
              l = launch(height1, height2);
              sleep(1);
          }
      }

  19. Assembler

      int launch(int height1, int height2)
      {
          int l;
          if (height1 < 10000 && height2 < 10000)
              l = TRUE;
          else
              l = FALSE;
          return l;
      }

  Compiled and disassembled with "lcc missile.c -o missile" and "disas missile":

      80: addiu sp,sp,-8
      84: li t8,10000
      88: slt s8,a0,t8
      8c: beqz s8,0xac
      90: nop
      94: slt s8,a1,t8
      98: beqz s8,0xac
      9c: nop
      a0: li t8,1
      a4: b 0xb0
      a8: sw t8,4(sp)
      ac: sw zero,4(sp)
      b0: lw v0,4(sp)
      b4: jr ra
      b8: addiu sp,sp,8

  20. Adding a special function to the mMIPS (overview)
  • New mMIPS instruction: launch
  • Select an opcode and a function code: opcode → 0, function code → 0x10 (not yet used)
  (figure: R-type instruction format with the fields opcode (6 bits), rs (5 bits), rt (5 bits), rd (5 bits), shamt (5 bits) and funct (6 bits); the two launch heights are the source operands)
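The fields of an R-type instruction can be packed into a 32-bit word as follows. This is a sketch assuming the standard MIPS R-type layout and assuming that height 1 goes in rs and height 2 in rt; encode_rtype and encode_launch are hypothetical helper names.

```c
#include <stdint.h>

/* Pack the six fields of a MIPS R-type instruction into one word:
   opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6). */
uint32_t encode_rtype(unsigned opcode, unsigned rs, unsigned rt,
                      unsigned rd, unsigned shamt, unsigned funct)
{
    return ((uint32_t)(opcode & 0x3Fu) << 26) |
           ((uint32_t)(rs     & 0x1Fu) << 21) |
           ((uint32_t)(rt     & 0x1Fu) << 16) |
           ((uint32_t)(rd     & 0x1Fu) << 11) |
           ((uint32_t)(shamt  & 0x1Fu) << 6)  |
            (uint32_t)(funct  & 0x3Fu);
}

/* launch: opcode 0 and function code 0x10 (slide 20);
   heights in rs and rt (assumed), result in rd. */
uint32_t encode_launch(unsigned rd, unsigned rs, unsigned rt)
{
    return encode_rtype(0, rs, rt, rd, 0, 0x10);
}
```

For example, encode_launch(2, 4, 5) gives 0x00851010: launch with the result in $2 ($v0) and the heights in $4 ($a0) and $5 ($a1).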

  21. Converting a C program to the LCC IR

      int main(void) {
          int a = 3;
          if (a == 3)
              return 1;
          return 0;
      }

  (figure: the tree of this program, with the nodes ADDRLP4, ADDRGP4, CNSTI4, ASGNI4, INDIRI4, NEI4, RETI4 and JUMPV)
  The LCC data representation is called an Abstract Syntax Tree (AST). The data representation is converted to assembler using rules. Rules map a set of nodes (one or more) onto assembler instructions.

  22. What does a rule look like?
  A rule for adding two unsigned integers (4 bytes):

      reg: ADDU4(reg,reg)  "\taddu $%c,$%0,$%1\n"  1

  • ADDU4(reg,reg): the node, with two source registers; the reg on the left is the one output register
  • "\taddu $%c,$%0,$%1\n": the assembler instruction
  • 1: the weight (cost) of the rule
  • %0 – the first source operand register
  • %1 – the second source operand register
  • %c – the destination register
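To see what the template does, the sketch below substitutes register names into "\taddu $%c,$%0,$%1\n": %c becomes the destination register, %0 and %1 the registers of the first and second operand. It is a simplified illustration with a hypothetical function expand_template, not LCC's own emitter, which handles more cases.

```c
#include <string.h>

/* Expand %c, %0 and %1 in an assembler template; copy everything else.
   The result is always NUL-terminated and truncated to outsize bytes. */
void expand_template(const char *tmpl, const char *dst,
                     const char *src0, const char *src1,
                     char *out, size_t outsize)
{
    size_t n = 0, len;
    const char *p, *subst;
    char one[2];

    if (outsize == 0)
        return;
    for (p = tmpl; *p != '\0'; p++) {
        subst = NULL;
        if (*p == '%' && p[1] == 'c')
            subst = dst;
        else if (*p == '%' && p[1] == '0')
            subst = src0;
        else if (*p == '%' && p[1] == '1')
            subst = src1;

        if (subst != NULL) {
            p++;                 /* consume the character after '%' */
        } else {
            one[0] = *p;         /* copy any other character literally */
            one[1] = '\0';
            subst = one;
        }
        len = strlen(subst);
        if (n + len >= outsize)
            break;               /* output buffer full: stop here */
        memcpy(out + n, subst, len);
        n += len;
    }
    out[n] = '\0';
}
```

With destination register 2 and operands in registers 4 and 5, the ADDU4 rule yields the line "\taddu $2,$4,$5\n".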

  23. Converting the LCC data structure to assembler (figure: the tree of slide 21 next to the generated code)

          .set reorder
          .globl main
          .text
          .text
          .align 2
          .ent main
      main:
          .frame $sp,8,$31
          addu $sp,$sp,-8
          la $24,3
          sw $24,-4+8($sp)
          lw $24,-4+8($sp)
          la $15,3
          bne $24,$15,L.2
          la $2,1
          b L.1
      L.2:
          move $2,$0
      L.1:
          addu $sp,$sp,8
          j $31
          .end main

  24. Adding a special function to the mMIPS (software)
  • The launch function must be detected by LCC
  • Use a special pattern to indicate the use of the launch function, for example: ((a) - ((b) + *(int *) 0x12344321))
  • The following 4 constructs map to custom operations in LCC:

      ((a) - ((b) + *(int *) 0x12344321))
      ((a) + ((b) + *(int *) 0x12344321))
      ((a) - ((b) - *(int *) 0x12344321))
      ((a) + ((b) - *(int *) 0x12344321))

  • More operations (possibly with more operands) can be added; see the website for more information.

  25. All rules are defined in the file ‘lcc/src/minimips.md’. The file can be edited with a standard text editor; LCC must be recompiled after editing (see the website for details). What does a rule look like? (figure: a rule with three inputs)

  26. Custom operation in C and assembler

      #define TRUE 1
      #define FALSE 0
      #define launch(h1, h2) ((h1) - ((h2) + *(int *) 0x12344321))

      void main(void)
      {
          int height1, height2;
          int l;
          while (TRUE) {
              l = launch(height1, height2);
          }
      }

  Compiled and disassembled with "lcc missile.c -o missile" and "disas missile"; the disassembler shows the custom launch instruction as tgeu s7,s6,0x2a0:

      80: addiu sp,sp,-16
      84: sw s5,0(sp)
      88: sw s6,4(sp)
      8c: b 0x98
      90: sw s7,8(sp)
      94: tgeu s7,s6,0x2a0
      98: b 0x94
      9c: nop
      a0: lw s5,0(sp)
      a4: lw s6,4(sp)
      a8: lw s7,8(sp)
      ac: jr ra
      b0: addiu sp,sp,16

  27. Comparison
  Original:

      80: addiu sp,sp,-8
      84: li t8,10000
      88: slt s8,a0,t8
      8c: beqz s8,0xac
      90: nop
      94: slt s8,a1,t8
      98: beqz s8,0xac
      9c: nop
      a0: li t8,1
      a4: b 0xb0
      a8: sw t8,4(sp)
      ac: sw zero,4(sp)
      b0: lw v0,4(sp)
      b4: jr ra
      b8: addiu sp,sp,8

  With the added custom instruction:

      94: tgeu s7,s6,0x2a0

  Reduction of 14 instructions per execution!
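The count follows from the addresses in the original listing: the launch function occupies 0x80 to 0xb8 with 4 bytes per instruction, and it is replaced by a single custom instruction (counting only the instructions of the launch function itself):

```latex
N_{\mathrm{original}} = \frac{\mathtt{0xb8} - \mathtt{0x80}}{4} + 1 = \frac{56}{4} + 1 = 15,
\qquad
N_{\mathrm{original}} - N_{\mathrm{custom}} = 15 - 1 = 14
```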

  28. Adding a special function to the mMIPS (hardware) (figure: the aluctrl and alu blocks of the mMIPS)
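A behavioural sketch of what the two blocks do once launch is added: aluctrl picks the operation from the function code and alu computes it. The C names and enum values are hypothetical and the real blocks are SystemC modules; 0x21, 0x23 and 0x2a are the standard MIPS function codes for addu, subu and slt, and 0x10 is the code chosen for launch on slide 20.

```c
#include <stdint.h>

enum alu_op { ALU_ADDU, ALU_SUBU, ALU_SLT, ALU_LAUNCH, ALU_NONE };

/* aluctrl: map the function code of an R-type instruction (opcode 0)
   to an ALU operation. */
enum alu_op aluctrl(unsigned funct)
{
    switch (funct) {
    case 0x21: return ALU_ADDU;
    case 0x23: return ALU_SUBU;
    case 0x2a: return ALU_SLT;
    case 0x10: return ALU_LAUNCH;
    default:   return ALU_NONE;
    }
}

/* alu: compute the result for two 32-bit operands. */
uint32_t alu(enum alu_op op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ALU_ADDU:
        return a + b;                              /* wraps modulo 2^32 */
    case ALU_SUBU:
        return a - b;
    case ALU_SLT:
        return (int32_t)a < (int32_t)b;            /* signed compare */
    case ALU_LAUNCH:
        /* 1 if both heights are below 10000, like launch() on slide 18 */
        return (int32_t)a < 10000 && (int32_t)b < 10000;
    default:
        return 0;
    }
}
```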

  29. Outline • mMIPS tool flow • Extending the LCC compiler • Video processing • I/O operations on the mMIPS • Assignment

  30. Video processing in the ES group (figure labels: WEB; picture rates 24 Hz 1:1, 25 Hz 1:1, 30 Hz 1:1, 50 Hz 2:1, 60 Hz 2:1; CIF and QCIF at 1-25 Hz 1:1)

  31. Video processing – display systems

  32. Video processing – face detection (figure: mapping face recognition onto a smart camera; labels: Smart-cam, TriMedia, Xetal)

  33. Video processing – car safety systems

  34. Video processing – motion estimation

  35. Video processing – object detection

  36. Video processing – algorithm/architecture codesign
  (figure: picture-rate up-converter in a system with MBS + VIP, MMI + AICP, CAB, MPEG, 1394, conditional access, T-PI, M-PI, a TriMedia VLIW and a MIPS)
  Figure data, ASIP versus GP MSP:

                           ASIP     GP MSP
      area (mm²)           1.41     11.4
      load (%)             31       37
      eff. area (mm²)      0.437    4.2
      power (mW)           6.0      124
      bandwidth (MB/s)     135      75
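The effective-area figures equal the area multiplied by the load, as the numbers confirm (this relation is inferred from the data, not stated on the slide):

```latex
A_{\mathrm{eff}} = A \cdot \mathrm{load}: \qquad
1.41\,\mathrm{mm^2} \times 0.31 \approx 0.437\,\mathrm{mm^2}, \qquad
11.4\,\mathrm{mm^2} \times 0.37 \approx 4.2\,\mathrm{mm^2}
```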

  37. What is an image? (figure: the same picture at 400 pixels/line × 300 lines and at 40 pixels/line × 30 lines) • A black and white image is a matrix of luminance values • More pixels mean higher image quality

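  38. How do you store an image?
  • An image is stored as a one-dimensional pixel array, row by row (figure: pixel (x, y) in an image whose addresses run from 0 at the top left, via width-1 at the top right, to width*height-1 at the bottom right)
  • Address of pixel (x, y): [y*width+x]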
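The address formula and its inverse as small C helpers (the names pixel_addr and pixel_xy are hypothetical; the row-by-row layout is the one described on the slide):

```c
#include <stddef.h>

/* Address of pixel (x, y) in an image stored row by row:
   (0, 0) is the top-left pixel at address 0,
   (width-1, height-1) is the bottom-right pixel at width*height-1. */
size_t pixel_addr(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Inverse mapping: recover (x, y) from an address. */
void pixel_xy(size_t addr, size_t width, size_t *x, size_t *y)
{
    *x = addr % width;
    *y = addr / width;
}
```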

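  39. How many bits do we need per pixel? Experiments show that we can distinguish about 200 luminance levels in an image. We therefore use an 8-bit representation of luminance: 8 bits give 2^8 = 256 levels, whereas 7 bits (128 levels) would not be enough.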

  40. The input file with image data (name.y format), e.g. football.y and bicycle.y
  File: {byte 0, byte 1, ..., byte n, ..., byte width*height-1}; byte 0 is the top-left pixel and the last byte is the bottom-right pixel.
  Example: two pixels directly above each other are byte n and byte n+width.
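A minimal sketch for loading such a file, assuming the .y file holds only the width*height luminance bytes (no header), so the caller has to supply the image size; read_y is a hypothetical name.

```c
#include <stdio.h>
#include <stddef.h>

/* Read one .y image: width*height bytes of 8-bit luminance, row by row,
   starting with the top-left pixel. Assumes no header in the file.
   Returns 0 on success, -1 if the file holds fewer bytes. */
int read_y(FILE *f, unsigned char *buf, size_t width, size_t height)
{
    size_t n = width * height;
    return fread(buf, 1, n, f) == n ? 0 : -1;
}
```

After a successful read, the pixel directly below buf[n] is buf[n + width].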

  41. How many images per second? • Video is discrete in time • The number of pictures per second affects: • Motion portrayal • Flicker

  42. How many images per second? • The number needed depends on the brightness level and the viewing angle • The flicker threshold shifts to higher frequencies in the periphery of the visual field • This allows us to rapidly recognize approaching danger

  43. Video processing
  • Spatial domain: image processing on a still image
    Examples: edge detection, blurring, ...
  • Temporal domain: image processing across different points in time
    Examples: motion estimation, object recognition, ...
  The assignment deals with still images.

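  44. 3x3 filter
  What does a 3x3 filter do with an image?
  • A 3x3 filter replaces each pixel (byte) in the file with the weighted sum of the pixel and its eight direct neighbors:

      $$P_{\mathrm{out}}[a] = \sum_{l=-1}^{1} \sum_{p=-1}^{1} C_{l,p} \cdot P_{\mathrm{in}}[a + l \cdot \mathrm{width} + p]$$

  • with $a = y \cdot \mathrm{width} + x$ the address of the pixel,
  • and filter coefficient $C_{l,p}$ (l: line offset, p: pixel offset) represented by one byte.
  Example (figure): filter coefficients are all “1”.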
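A generic version of the 3x3 filter, written in C89 style so that it should also be accepted by LCC (slide 14). This is a sketch: filter3x3 is a hypothetical name, and it copies the border pixels unchanged, whereas the loops on the following slides process every buffer address from width+1 up to width*height-(width+1). The parameters offset and divisor correspond to the "+4 )/ 9", "+2 )/ 4" and "+128 )/ 1" in that code.

```c
/* Generic 3x3 filter on an 8-bit image stored row by row.
   coeff[l+1][p+1] weights the neighbour at line offset l and pixel
   offset p (l, p in -1..1). Border pixels are copied unchanged. */
void filter3x3(const unsigned char *in, unsigned char *out,
               int width, int height,
               const int coeff[3][3], int offset, int divisor)
{
    int a, x, y, l, p, sum, result;

    /* Start from a copy so that the border keeps its input values. */
    for (a = 0; a < width * height; a++)
        out[a] = in[a];

    for (y = 1; y < height - 1; y++) {
        for (x = 1; x < width - 1; x++) {
            /* Weighted sum of the pixel and its eight direct neighbours. */
            sum = 0;
            for (l = -1; l <= 1; l++)
                for (p = -1; p <= 1; p++)
                    sum += coeff[l + 1][p + 1] *
                           (int)in[(y + l) * width + (x + p)];

            result = (sum + offset) / divisor;

            /* Clip the result back to one byte. */
            if (result < 0)
                out[y * width + x] = 0;
            else if (result > 255)
                out[y * width + x] = 255;
            else
                out[y * width + x] = (unsigned char)result;
        }
    }
}
```

Blur: all coefficients 1, offset 4, divisor 9. Sharpening: center 12, others -1, offset 2, divisor 4. Edge detection: center 8, others -1, offset 128, divisor 1. Declaring the coefficient tables as const int [3][3] matches the parameter type and avoids a pointer-type warning.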

  45. Blur filter
  Filter coefficients:

      +1 +1 +1
      +1 +1 +1
      +1 +1 +1

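  46. C-code for blur filter

      /* buf_i: input image, buf_o: output image, result: int (declared elsewhere).
         LCC does not accept "for (int a = ...)", so a is declared before the loop. */
      int a;
      for (a = width + 1; a < width * height - (width + 1); a++) {
          /* weighted sum (pixel value); adding 4 before dividing by 9 rounds to nearest */
          result = (( 1 * (int)buf_i[a - 1 - width] + 1 * (int)buf_i[a - width] + 1 * (int)buf_i[a + 1 - width]
                    + 1 * (int)buf_i[a - 1]         + 1 * (int)buf_i[a]         + 1 * (int)buf_i[a + 1]
                    + 1 * (int)buf_i[a - 1 + width] + 1 * (int)buf_i[a + width] + 1 * (int)buf_i[a + 1 + width]
                    + 4) / 9);
          /* clip back to one byte */
          if (result < 0) buf_o[a] = 0;
          else if (result > 255) buf_o[a] = 255;
          else buf_o[a] = result;
      }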

  47. Sharpening filter
  Filter coefficients:

      -1 -1 -1
      -1 12 -1
      -1 -1 -1

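  48. C-code for sharpening

      int a;  /* declared before the loop, as LCC requires */
      for (a = width + 1; a < width * height - (width + 1); a++) {
          /* divide by the sum of the coefficients (12 - 8 = 4); +2 rounds the result */
          result = (( -1 * (int)buf_i[a - 1 - width] + -1 * (int)buf_i[a - width] + -1 * (int)buf_i[a + 1 - width]
                    + -1 * (int)buf_i[a - 1]         + 12 * (int)buf_i[a]         + -1 * (int)buf_i[a + 1]
                    + -1 * (int)buf_i[a - 1 + width] + -1 * (int)buf_i[a + width] + -1 * (int)buf_i[a + 1 + width]
                    + 2) / 4);
          /* clip back to one byte */
          if (result < 0) buf_o[a] = 0;
          else if (result > 255) buf_o[a] = 255;
          else buf_o[a] = result;
      }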

  49. Edge detection
  Filter coefficients:

      -1 -1 -1
      -1  8 -1
      -1 -1 -1

  Offset: +128

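  50. C-code for edge detection

      int a;  /* declared before the loop, as LCC requires */
      for (a = width + 1; a < width * height - (width + 1); a++) {
          /* the coefficients sum to 0; adding 128 maps "no edge" to mid-gray */
          result = (( -1 * (int)buf_i[a - 1 - width] + -1 * (int)buf_i[a - width] + -1 * (int)buf_i[a + 1 - width]
                    + -1 * (int)buf_i[a - 1]         +  8 * (int)buf_i[a]         + -1 * (int)buf_i[a + 1]
                    + -1 * (int)buf_i[a - 1 + width] + -1 * (int)buf_i[a + width] + -1 * (int)buf_i[a + 1 + width]
                    + 128) / 1);
          /* clip back to one byte */
          if (result < 0) buf_o[a] = 0;
          else if (result > 255) buf_o[a] = 255;
          else buf_o[a] = result;
      }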
