
Processor Architectures and Program Mapping



  1. Processor Architectures and Program Mapping (5kk10). H. Corporaal, J. van Meerbergen, and B. Mesman

  2. Figure: flexibility versus efficiency for programmable CPU, programmable DSP, application specific instruction set processor (ASIP), and application specific processor.

  3. Figure: efficiency (high, medium, low) plotted against flexibility (low, medium, high) for ASIC, ASIP, DSP, GP proc, and FPGA.

  4. Programmable CPU cores
     • introduction
     • architecture of the MIPS core
     • discussed as an example
     • pipelining
     • application examples
     • software issues
     • comparison between different CPU cores
     • towards application specific architectures
     • discussion

  5. Introduction
     • rationale: a multiplex factor R that is as high as possible
     • consequence: often a manually handcrafted design optimised for clock rate
     • problem: fast changes in the IC process technology
     • embedded examples:
       • MIPS (first one, licensing the instruction set architecture)
       • ARM (Advanced RISC Machines; telecom, low power, small code size, most popular one, also licensing the micro-architecture as hard or soft IP)
       • Sparc
     • derivatives from general purpose CPUs: Intel, NEC, Hitachi, National, PowerPC

  6. Introduction: instruction set architectures
     • implicit operands: stack machines (e.g. ST20), accumulator machines
     • explicit operands: general purpose registers
       • register-register (= load-store)
       • register-memory

  7. Introduction: C = A + B
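The statement C = A + B is the usual textbook vehicle for comparing the instruction set classes listed on the previous slide. The slide's own table did not survive, so the following is only a minimal sketch under that assumption: the mnemonics, register names, and the tiny interpreters are hypothetical and do not belong to any real ISA.

```python
# Hypothetical mini-interpreters: C = A + B in four operand styles.
# Each function mutates `mem` and returns the number of instructions used.

def run_stack(mem):
    # Stack machine: operands are implicit (top of stack).
    program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:  # pop
            mem[args[0]] = stack.pop()
    return len(program)

def run_accumulator(mem):
    # Accumulator machine: one operand is implicitly the accumulator.
    program = [("load", "A"), ("add", "B"), ("store", "C")]
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        else:  # store
            mem[addr] = acc
    return len(program)

def run_register_memory(mem):
    # Register-memory: one operand may come straight from memory.
    program = [("load", "r1", "A"), ("add", "r1", "B"), ("store", "r1", "C")]
    regs = {}
    for op, reg, addr in program:
        if op == "load":
            regs[reg] = mem[addr]
        elif op == "add":
            regs[reg] += mem[addr]
        else:  # store
            mem[addr] = regs[reg]
    return len(program)

def run_load_store(mem):
    # Register-register (load-store): arithmetic only on registers.
    program = [("load", "r1", "A"), ("load", "r2", "B"),
               ("add", "r3", "r1", "r2"), ("store", "r3", "C")]
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:  # store
            mem[args[1]] = regs[args[0]]
    return len(program)
```

All four variants compute the same result; they differ in how many operands each instruction names explicitly, which is the distinction the taxonomy draws.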

  8. Architecture of the MIPS core [Hennessy & Patterson]. Figure: PC (clocked) drives the instruction address of the instruction memory; the 32-bit instruction supplies Rd, Rt, Rs (5 bits each) and Imm (16 bits); register file with 32 32-bit registers and ports Rw, Ra, Rb; data memory with data address, data in, and data out (32 bits each).

  9. MIPS instruction formats (32 bits) [Hennessy & Patterson]
     R-type: op (bits 31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | rd (15..11, 5 bits) | shamt (10..6, 5 bits) | funct (5..0, 6 bits)
     I-type: op (bits 31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits)
     J-type: op (bits 31..26, 6 bits) | target address (25..0, 26 bits)
     • op: operation of the instruction
     • rs, rt, rd: source and destination registers
     • shamt: shift amount
     • funct: operation of the instruction, part 2
     • imm: for program constants
     • addr: target address of a jump
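The field boundaries above translate directly into shifts and masks. The following sketch shows one way to split a 32-bit word into its fields. The function name `decode` is hypothetical, and the format selection is a simplification: it assumes opcode 0 means R-type and opcodes 2 and 3 (`j`, `jal`) mean J-type, and it treats everything else as I-type, which ignores coprocessor and other special encodings.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields (simplified)."""
    op = (word >> 26) & 0x3F              # bits 31..26
    if op == 0:                           # assumed: opcode 0 selects R-type
        return {
            "format": "R",
            "op": op,
            "rs": (word >> 21) & 0x1F,    # bits 25..21
            "rt": (word >> 16) & 0x1F,    # bits 20..16
            "rd": (word >> 11) & 0x1F,    # bits 15..11
            "shamt": (word >> 6) & 0x1F,  # bits 10..6
            "funct": word & 0x3F,         # bits 5..0
        }
    if op in (2, 3):                      # j and jal
        return {"format": "J", "op": op, "target": word & 0x03FFFFFF}
    return {                              # everything else treated as I-type
        "format": "I",
        "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,             # bits 15..0, not yet sign-extended
    }
```

For instance, `decode(0x00221820)` yields an R-type record with rs = 1, rt = 2, rd = 3 and funct = 0x20, which is the standard encoding of `add $3, $1, $2`.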

  10. Example 1: R-type: add instruction [Hennessy & Patterson]
     Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
     • add rd, rs, rt
     • mem[PC] (fetch the instruction)
     • R[rd] = R[rs] + R[rt]
     • PC = PC + 4
     Figure: register file (32 32-bit registers; ports Rw, Ra, Rb addressed by Rd, Rs, Rt; RegWr; Clk) drives BusA and BusB (32 bits) into the ALU (ALUctr); the Result returns on BusW.

  11. Critical path R-type operation [Hennessy & Patterson]. Figure: the MIPS core datapath of slide 8 (PC, instruction memory, register file, data memory) with the critical path of an R-type operation marked.

  12. Critical path R-type operation. Timing diagram, in order after the clock edge:
     • clock-to-Q: PC changes from old to new value
     • instruction memory access time: rs, rt, rd, op, funct become valid
     • RFile access time: Bus A, B become valid
     • ALU delay: Bus W becomes valid
     • set up + skew: write into RFile

  13. Example 2: I-type: load word [Hennessy & Patterson]
     Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
     • lw rs, rt, imm16
     • mem[PC] (fetch the instruction)
     • addr = R[rs] + ext[imm16]
     • R[rt] = mem[addr]
     • PC = PC + 4
     Figure: register file (Rw selected by RegDst between Rd and Rt, Rd being don't care; Ra = Rs; Rb = Rt; RegWr; Clk); Imm16 passes through the Extender (ExtOp) to 32 bits; ALUSrc selects BusB or the extended immediate as ALU input (ALUctr); the ALU result is the address (Adr) of the data memory (WrEn = MemWr, Data In, Clk); MemtoReg selects the ALU result or the memory output for BusW.

  14. Critical path load operation. Timing diagram, in order after the clock edge:
     • clock-to-Q: PC changes from old to new value
     • instruction memory access time: rs, rt, rd, op, funct become valid
     • RFile access time: Bus A, B become valid
     • ALU delay: address becomes valid
     • mem access time: Bus W becomes valid
     • set up + skew

  15. Example 3: I-type: branch [Hennessy & Patterson]
     Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
     • beq rs, rt, imm16
     • mem[PC] (fetch the instruction)
     • cond = R[rs] - R[rt]
     • if cond = 0: PC = PC + 4 + ext(imm16)*4
     • else: PC = PC + 4
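The next-PC rule for `beq` can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that `ext` means sign extension of the 16-bit immediate and that the PC wraps at 32 bits; the function names are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Next PC after 'beq rs, rt, imm16' following the rule on the slide."""
    cond = (rs_val - rt_val) & 0xFFFFFFFF          # cond = R[rs] - R[rt]
    if cond == 0:                                   # branch taken
        return (pc + 4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF                    # fall through
```

With `pc = 0x100` and equal register values, an immediate of 3 gives 0x110, and an immediate of 0xFFFF (that is, -1) branches back to 0x100, the branch instruction itself.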

  16. Example 3: I-type: branch [Hennessy & Patterson]
     Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
     Figure: register file (Rw selected by RegDst between Rd and Rt, Rd being don't care; Ra = Rs; Rb = Rt; RegWr; Clk) drives BusA and BusB into the ALU (ALUctr, ALUSrc, Extender with ExtOp); the ALU's Zero output, the Branch control signal, and Imm16 feed the Next Address Logic, which updates the PC (Clk) that goes to the instruction memory.

  17. Example 3: I-type: branch [Hennessy & Patterson]. Figure: next address logic. The 30-bit PC supplies Addr<31:2> of the instruction memory and Addr<1:0> is "00"; one adder adds "1"; Imm16 (Instruction<15:0>) is sign-extended (SignExt) to 30 bits for the branch target; a multiplexer (inputs 0 and 1) controlled by Branch and Zero selects the next PC.

  18. Example 3: I-type: branch [Hennessy & Patterson]. Figure: alternative next address logic with the same signals (PC, Addr<31:2>, Addr<1:0> = "00", Imm16 from Instruction<15:0>, SignExt, Branch, Zero), using an adder carry-in (c_in = "1") and a multiplexer that selects between "0" and the sign-extended immediate.

  19. Architecture of the MIPS core
     • problem: long critical path
     • defined by the slowest instruction (load)
     • solution? = pipelining
     • break the instruction into smaller steps
     • all steps have about the same critical path
     E.g. load, 5 stages: Ifetch (cycle 1), RF read (cycle 2), ALU (cycle 3), dmem (cycle 4), RF write (cycle 5)
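The effect of breaking the load into steps can be illustrated with a small calculation. The stage delays below are invented for illustration only (they do not come from the slides), and the sketch ignores clock-to-Q, set-up, and skew overheads: a single-cycle design needs a clock period equal to the sum of all step delays, while a pipelined design needs only the longest step.

```python
# Hypothetical stage delays in nanoseconds (assumed values, not from the slides).
STAGE_DELAYS_NS = {
    "Ifetch": 2.0,
    "RF read": 1.0,
    "ALU": 2.0,
    "dmem": 2.0,
    "RF write": 1.0,
}

def single_cycle_period(delays):
    """Clock period when one instruction passes through all steps in one cycle."""
    return sum(delays.values())

def pipelined_period(delays):
    """Clock period when each step is its own pipeline stage."""
    return max(delays.values())

def speedup(delays):
    """Ideal clock-rate gain of pipelining over the single-cycle design."""
    return single_cycle_period(delays) / pipelined_period(delays)
```

With these assumed numbers the period drops from 8 ns to 2 ns. The gain stays below the stage count of 5 because the stages are not perfectly balanced, which is why the slide asks for steps with about the same critical path.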

  20. Pipelining lw instructions [Hennessy & Patterson]
     Pipeline diagram (cycles 1 to 7): three consecutive lw instructions, each passing through Ifetch, RF read, ALU, dmem, RF write, each starting one cycle after the previous one.
     • One instruction enters the pipeline every clock cycle
     • One instruction leaves the pipeline every clock cycle
     • => CPI = 1 (Cycles per Instruction)
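The cycle count behind the diagram follows from a simple rule: in a k-stage pipeline without stalls, instruction i (counting from 0) occupies stage s in cycle i + s + 1, so n instructions finish after n + k - 1 cycles. A minimal sketch, with hypothetical function names:

```python
STAGES = ("Ifetch", "RF read", "ALU", "dmem", "RF write")

def schedule(n, stages=STAGES):
    """For each of n instructions, map cycle number -> stage (no stalls)."""
    return [{i + s + 1: name for s, name in enumerate(stages)} for i in range(n)]

def total_cycles(n, k=len(STAGES)):
    """Cycles until the last of n instructions leaves a k-stage pipeline."""
    return n + k - 1

def cpi(n, k=len(STAGES)):
    """Average cycles per instruction, including the pipeline fill time."""
    return total_cycles(n, k) / n
```

Three lw instructions need `total_cycles(3) == 7` cycles, matching the seven cycles on the slide. The CPI including fill time approaches 1 as n grows.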

  21. Pipelining lw instructions. Figure: five instructions in flight, each with stages I, R, A, M, W; axes are instructions and data, with the current CPU cycle marked.

  22. 4 stages of R-type instruction [Hennessy & Patterson]. E.g. ADD: Ifetch (cycle 1), RF read (cycle 2), ALU (cycle 3), RF write (cycle 4)

  23. Pipelining lw and R-type instructions [Hennessy & Patterson]. Pipeline diagram (cycles 1 to 7): lw (Ifetch, RF read, ALU, dmem, RF write) followed by add (Ifetch, RF read, ALU, RF write). Resource conflict on the write port of the Rfile.

  24. Solution: stretch R-type to 5 stages (Ifetch, RF read, ALU, dmem, RF write), where dmem is a dummy op (noop) [Hennessy & Patterson]. Pipeline diagram (cycles 1 to 7): lw, add, add, each passing through Ifetch, RF read, ALU, dmem, RF write.

  25. Figure: pipelined datapath [Hennessy & Patterson]. Stages: Ifetch, Reg/dec, exec, mem, wr. Blocks: Next PC (+4, branch, flags), Prog mem (adr, Di), Rfile (Ra = Rs, Rb = Rt, Rw from Rt or Rd, BusA, BusB), ext. (Imm16), Data mem (adr, Din, Dout). Control signals: RegWr, MemtoReg, MemWr, RegDst, ALUSrc, ExtOp, ALUop.

  26. Data dependencies: R-type instructions [Hennessy & Patterson]. Pipeline diagram (stages IM, RF, DM, RF) for the sequence R1 = ... followed by four instructions of the form … = R1 + ...

  27. Data dependencies: R-type instructions [Hennessy & Patterson]. Same sequence as the previous slide: R1 = ... followed by four instructions of the form … = R1 + ... Solution: bypasses.

  28. Bypasses [Hennessy & Patterson]. Figure: datapath with bypass paths (labels: adr, Data mem).

  29. Data dependencies: load instruction [Hennessy & Patterson]. Pipeline diagram (stages IM, RF, DM, RF) for the sequence R1 = lw... followed by three instructions of the form … = R1 + ...

  30. Data dependencies: load instruction [Hennessy & Patterson]. Sequence: R1 = lw..., … = R1 + ..., … = R1 - ..., … = R1 - ... Bypass is no solution for the + instruction.

  31. Data dependencies: load instruction [Hennessy & Patterson]. Sequence: R1 = lw..., … = R1 + ..., … = R1 - ..., … = R1 - ... Solution: pipeline interlock = detects a data hazard and stalls the pipeline until the hazard is cleared.
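The interlock decision can be sketched as a check on adjacent instructions. The code below is a simplified model rather than the mechanism of any particular core: it assumes full bypassing, so the only remaining hazard is an instruction that reads a register loaded by the immediately preceding lw, and it assumes that this costs exactly one stall cycle. The tuple layout `(op, dest, src1, src2)` is a hypothetical representation.

```python
def count_load_use_stalls(program):
    """Count stall cycles in a list of (op, dest, src1, src2) tuples.

    Assumed model: with bypasses in place, a stall is needed only when an
    instruction uses the destination of the lw directly before it.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest = prev[0], prev[1]
        cur_sources = cur[2:]
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

def total_cycles_with_interlock(program, stages=5):
    """Pipeline fill time plus one cycle per instruction plus stalls."""
    return len(program) + stages - 1 + count_load_use_stalls(program)
```

For a load into r10 immediately followed by an add that reads r10, the model reports one stall. Placing an independent instruction between the two removes it.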

  32. Instructions: i1) lw r10, r2, r0; i2) add r8, r9, r10. Pipeline diagram: i1 passes through I, R, A, M, W; data is available from the data cache after M; i2 passes through I, R (interlocked), A, M, W.

  33. Instructions: i1) MULT r3, r2, r1; i2) ADD r5, r4, r3. Pipeline diagram: i1 passes through I, R, A, M, W; i2 passes through I, R (interlocked), A, M, W.

  34. Control hazards [Hennessy & Patterson]. Figure: five instructions in flight (stages I, R, A, M, W) above the pipelined datapath: Next PC (+4, branch, flags), Prog mem (adr, Di), Rfile (Ra = Rs, Rb = Rt, Rw from Rt or Rd, BusA, BusB), ext. (Imm16), Data mem (adr, Din, Dout).

  35. Control hazards [Hennessy & Patterson]. Figure: three instructions in flight (stages I, R, A, M, W) above the same datapath, with a zero test ("0?") feeding the Next PC logic.

  36. Control hazards. Instructions: i1) beq r10, r2, 1b; i2) nop/independent instructions; i3) add r8, r9, r10. Pipeline diagram: i1, i2, i3 each pass through I, R, A, M, W; the branch address becomes available for the instruction fetch of i3. Solution: compiler action, possibly filling the branch delay slot.

  37. PR3930 CPU. Figure: PR3930 IU (including MAD), MMU, DSU, PIO, 8K I$ with Itag, 4K D$ with dtag.

  38. TCP chip: TV controller
     • PR3930 + peripherals (with I$ and D$)
     • Gfx, SDRAM controller, serial interconnect bus, I2C, UART, timers
     • PI bus architecture
     • 80 mm2
     • 352 pins
     • 0.35 micron process
     • 48 MHz (96 for gfx)

  39. Programmable CPU cores
     • introduction
     • architecture of the MIPS core
     • discussed as an example
     • pipelining
     • application examples
     • software issues
     • comparison between different CPU cores
     • towards application specific architectures
     • discussion

  40. Application examples (1). Figure: 5-tap FIR filter. Samples x4, x3, x2, x1, x0 are separated by four Z-1 delay elements; each sample is multiplied by its coefficient c4, c3, c2, c1, c0; the five products are summed to give y.
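The filter structure in the figure, a delay line with one multiply per tap and a final sum, can be written compactly as follows. This is a minimal sketch: the function name is hypothetical, and placing the newest sample at index 0 of the delay line is an assumption, since the figure does not say which of x0..x4 is newest.

```python
def fir_step(delay_line, coeffs, sample):
    """Shift one new sample into the delay line and compute one output.

    delay_line: previous samples, newest first (assumed ordering)
    coeffs:     one coefficient per tap, same length as delay_line
    Returns (y, new_delay_line).
    """
    new_line = [sample] + delay_line[:-1]     # each Z^-1 moves a sample one tap on
    y = sum(c * x for c, x in zip(coeffs, new_line))
    return y, new_line

def fir(coeffs, samples):
    """Run the filter over a whole sequence, starting from a zeroed delay line."""
    line = [0] * len(coeffs)
    out = []
    for s in samples:
        y, line = fir_step(line, coeffs, s)
        out.append(y)
    return out
```

Feeding a unit impulse through `fir` returns the coefficients in order, which is a quick sanity check of the tap wiring.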

  41. Application examples (1): 19 instructions per tap!!

  42. Application examples (2). Bit level operations: finite field arithmetic. 10 instructions!! Very simple in hardware.

  43. Application examples (2). Bit level operations: DES example. Bits 27, 26, 25, 23, 22, 20 of the source register ($2) are moved to bits 7, 6, 5, 4, 3, 2 of the destination register ($24):
     srl $13, $2, 20
     andi $25, $13, 1
     srl $14, $2, 21
     andi $24, $14, 6
     or $15, $25, $24
     srl $13, $2, 22
     andi $14, $13, 56
     or $25, $15, $14
     sll $24, $25, 2
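To see that the instruction sequence really performs the stated bit gathering, it can be emulated step by step and compared with a direct bit-by-bit version. This is an illustrative check only: the function names are hypothetical, registers are modelled as 32-bit unsigned integers, and the variable names mirror the register numbers on the slide.

```python
MASK32 = 0xFFFFFFFF

def des_gather_mips(src):
    """Emulate the nine MIPS instructions, one Python statement each."""
    r2 = src & MASK32
    r13 = r2 >> 20                # srl  $13, $2, 20
    r25 = r13 & 1                 # andi $25, $13, 1
    r14 = r2 >> 21                # srl  $14, $2, 21
    r24 = r14 & 6                 # andi $24, $14, 6
    r15 = r25 | r24               # or   $15, $25, $24
    r13 = r2 >> 22                # srl  $13, $2, 22
    r14 = r13 & 56                # andi $14, $13, 56
    r25 = r15 | r14               # or   $25, $15, $14
    r24 = (r25 << 2) & MASK32     # sll  $24, $25, 2
    return r24

def des_gather_reference(src):
    """Move source bits 20, 22, 23, 25, 26, 27 to destination bits 2..7."""
    source_bits = [20, 22, 23, 25, 26, 27]
    out = 0
    for dest_bit, src_bit in enumerate(source_bits, start=2):
        out |= ((src >> src_bit) & 1) << dest_bit
    return out
```

Both functions agree for every input. In hardware the same operation is just six wires, which is the point the slides make about bit-level operations on a CPU.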

  44. Application examples (2). Bit level operations: A5 example (GSM encryption). Bits 18, 17, 16, and 13 of register $5 are XORed into a single bit (bit 0) of $13:
     srl $24, $5, 18
     srl $25, $5, 17
     xor $8, $24, $25
     srl $9, $5, 16
     xor $10, $8, $9
     srl $11, $5, 13
     xor $12, $10, $11
     andi $13, $12, 1
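The eight instructions compute the XOR of four tap bits. The sketch below emulates them and then shows how such a bit is typically used as shift register feedback. The feedback step is an assumption that goes beyond the slide: it models a 19-bit register that shifts left and inserts the new bit at position 0, and the function names are hypothetical.

```python
def feedback_bit(r5):
    """Emulate the slide's instruction sequence: XOR of bits 18, 17, 16, 13."""
    r24 = r5 >> 18            # srl  $24, $5, 18
    r25 = r5 >> 17            # srl  $25, $5, 17
    r8 = r24 ^ r25            # xor  $8, $24, $25
    r9 = r5 >> 16             # srl  $9, $5, 16
    r10 = r8 ^ r9             # xor  $10, $8, $9
    r11 = r5 >> 13            # srl  $11, $5, 13
    r12 = r10 ^ r11           # xor  $12, $10, $11
    return r12 & 1            # andi $13, $12, 1

def lfsr_step(state, width=19):
    """Assumed usage: shift left by one and insert the feedback bit at bit 0."""
    mask = (1 << width) - 1
    return ((state << 1) & mask) | feedback_bit(state)
```

Only the final `andi` isolates the result, so the upper bits of the intermediate registers carry unrelated data until then. This is harmless, but it shows how much work the CPU spends on what is a 4-input XOR gate in hardware.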

  45. Application examples (3): video conferencing, H263
     • CIF format = 352 * 288 px, 2:1:1, 8 bits/sample
     • QCIF = 1/4 CIF
     • SQCIF = 96 * 128
     • process = 0.25 micron
     • power consumption = 100 mW @ 10 Hz
     • 96 * 128 * 1.5 * 10 Hz = 180 KB/s; divided by 72: 20 Kb/s
     • compare: 852 * 576 * 2 B/p * 50 = 49 MB/s
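The data rates on this slide can be reproduced with a short calculation. This sketch rests on two readings of the slide that are assumptions: the factor 1.5 is taken as bytes per pixel, and ":72" is taken as a compression ratio of 72 applied to the raw rate. The function names are hypothetical.

```python
def raw_rate_bytes_per_s(width, height, bytes_per_pixel, frames_per_s):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_s

def compressed_rate_bits_per_s(raw_bytes_per_s, ratio):
    """Bit rate after compressing the raw stream by the given ratio."""
    return raw_bytes_per_s * 8 / ratio

# SQCIF at 10 frames per second, 1.5 bytes per pixel (assumed reading)
sqcif_raw = raw_rate_bytes_per_s(96, 128, 1.5, 10)

# After 72:1 compression (assumed reading of ":72")
sqcif_compressed = compressed_rate_bits_per_s(sqcif_raw, 72)

# The comparison case from the slide: 852 * 576 pixels, 2 bytes per pixel, 50 Hz
tv_raw = raw_rate_bytes_per_s(852, 576, 2, 50)
```

The raw SQCIF rate is 184320 bytes/s, which is exactly 180 KB/s with 1 KB = 1024 bytes; the compressed rate is 20480 bits/s, or 20 Kb/s; and the comparison case is 49075200 bytes/s, about 49 MB/s.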

  46. Application examples (3): H.263 video encoder. Figure: block diagram with blocks in, DCT, Q, VLC, out, IQ, IDCT, frame store, motion estimation (best match), motion comp, and motion vectors, connected through adders (+ and -).

  47. Application examples (3). Figure: PR3940 with I$, D$, and memory. 10 Hz => 140 MHz CPU.

  48. Application examples (3). In which process can the H263 video encoder be executed on a single MIPS processor? Conclude: power consumption is the limiting factor!!

  49. Application examples: conclusions
     • CPUs offer flexibility, but…
     • not efficient in performance
     • not efficient in code size
     • not efficient in power consumption

  50. Figure: timing annotation flow.
     • source: func() { a=x.value & 0x3; if (a != 0) { b = a * c + d; } else { b = … ; } y.post(b); }
     • parser: splits the function into basic blocks. BB1: a=x.value & 0x3; with outgoing edges a == 0 and a != 0. BB2 and BB3: b = a * c + d; and b = … ; BB4: y.post(b);
     • compile each BB to instructions, e.g. ldi #0x3, R5; and R4,R5,R6; cmp R0,R6,R7; br R7,true; ba false
     • Arch. Model: ldi = 2 cycles, nop = 1 cycle, ...
     • generate new C with delay counts: func() { a=x.value & 0x3; DelayCycles(7); if (a != 0) { b = a * c + d; DelayCycles(8); } else { b = … ; DelayCycles(5); } y.post(b); DelayCycles(4); }
     • compile and run
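The annotation step of this flow, looking up each instruction of a basic block in the architecture model and emitting one delay call per block, can be sketched as follows. Only `ldi = 2` and `nop = 1` come from the slide. The other cycle costs are invented for illustration and were picked so that the example block sums to the slide's `DelayCycles(7)`; the function names are likewise hypothetical.

```python
# Architecture model: cycles per instruction mnemonic.
ARCH_MODEL = {
    "ldi": 2,   # from the slide
    "nop": 1,   # from the slide
    "and": 1,   # assumed
    "cmp": 1,   # assumed
    "br": 2,    # assumed
    "ba": 1,    # assumed
}

def block_delay(instructions, model=ARCH_MODEL):
    """Sum the cycle cost of every instruction in one basic block."""
    return sum(model[line.split()[0]] for line in instructions)

def annotate(c_statement, instructions, model=ARCH_MODEL):
    """Return the C statement followed by its delay annotation."""
    return f"{c_statement} DelayCycles({block_delay(instructions, model)});"

# Instructions shown on the slide for the first basic block.
BB1 = ["ldi #0x3, R5", "and R4,R5,R6", "cmp R0,R6,R7", "br R7,true", "ba false"]
```

Under these assumed costs, `annotate("a=x.value & 0x3;", BB1)` produces `a=x.value & 0x3; DelayCycles(7);`. The annotated C source can then be compiled and run natively while still accounting for target cycle counts.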
