
Advanced Computer Architectures



Presentation Transcript


  1. Advanced Computer Architectures: Laboratory on DLX Pipelining. Vittorio Zaccaria

  2. DLX
     • Load/Store architecture
     • Registers are faster than memory
     • The compiler can do deeper optimization
     • 16-bit offsets and immediates
     • 32-bit integer registers
     • 64-bit floating-point registers
     • Fixed operation encoding:
       • Addressing mode contained in the operation code
       • Fits in one word
       • Faster decoding
     Vittorio Zaccaria – Laboratory of Architectures

  3. DLX (cont.)
     • 32 general-purpose registers
     • 32-bit instructions [figure: instruction formats]
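As a rough illustration of how a fixed 32-bit encoding with a 16-bit immediate fits in one word, the sketch below packs a single I-type instruction. The field layout (6-bit opcode, 5-bit source register, 5-bit destination register, 16-bit immediate) is the usual textbook description of DLX and is an assumption here, since the slide's format figure is not in the transcript. The function name `encode_itype` and the opcode value are hypothetical.

```python
def encode_itype(opcode: int, rs1: int, rd: int, imm: int) -> int:
    """Pack an I-type instruction into one 32-bit word.

    Assumed layout, most significant bits first:
    opcode (6 bits) | rs1 (5 bits) | rd (5 bits) | immediate (16 bits)
    """
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    if not (0 <= rs1 < 32 and 0 <= rd < 32):
        raise ValueError("register numbers must be in 0..31")
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate must fit in 16 signed bits")
    # Mask the immediate so negative values become 16-bit two's complement.
    return (opcode << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


HYPOTHETICAL_ADDI = 0x08  # placeholder opcode, for illustration only

# Something like "addi r3,r3,4": source r3, destination r3, immediate 4.
word = encode_itype(HYPOTHETICAL_ADDI, rs1=3, rd=3, imm=4)
print(f"{word:032b}")
```

Because every field has a fixed position, a decoder can extract the opcode and register numbers with shifts and masks alone, which is the "faster decoding" point made on the previous slide.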

  4. DLX Pipeline

  5. Pipeline Visualization
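Since the pipeline figure is not part of the transcript, here is a minimal sketch that prints an ideal (stall-free) pipeline chart as text. It assumes the classic five stages IF, ID, EX, MEM, WB and one instruction issued per clock cycle; the helper name `pipeline_chart` is made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(instructions):
    """Return one text row per instruction for an ideal pipeline.

    Instruction i enters IF in clock cycle i + 1 and then advances
    one stage per cycle, with no stalls.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, text in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        body = " ".join(f"{c:<3}" for c in cells).rstrip()
        rows.append(f"{text:<18}" + body)
    return rows


program = ["lw r4,dati_a(r3)", "lw r5,dati_b(r3)", "sub r5,r5,r4"]
for row in pipeline_chart(program):
    print(row)
```

With n instructions the chart is n + 4 columns wide: the last instruction needs four more cycles after its fetch to reach WB.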

  6. Hazards
     • Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
       • Structural hazards: the hardware cannot support this combination of instructions
       • Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
       • Control hazards: pipelining of branches and other instructions that change the PC
     • The common solution is to stall the pipeline until the hazard is resolved, inserting one or more "bubbles" in the pipeline

  7. Structural Hazards

  8. Data Hazards

  9. Control Hazards

  10. An example program:
              .data
      dati_a: .word 1,2,3,4,5,6,7,8
      dati_b: .word 2,3,4,5,6,7,7,9
              .text
              .global main
              addi r3,r0,0
      loop:   lw   r4,dati_a(r3)
              lw   r5,dati_b(r3)
              sub  r5,r5,r4
              addi r3,r3,4
              bnez r5,loop
      exit:
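To see what the loop computes and how many times it runs, the following sketch mirrors the assembly in Python. It assumes 4-byte words (hence the offset step of 4) and reuses the two data arrays from the slide; `run_loop` is a name chosen for this example.

```python
dati_a = [1, 2, 3, 4, 5, 6, 7, 8]
dati_b = [2, 3, 4, 5, 6, 7, 7, 9]


def run_loop(a, b):
    """Mirror the loop: r5 = b[i] - a[i], repeated while r5 != 0."""
    r3 = 0            # byte offset into both arrays
    iterations = 0
    while True:
        r4 = a[r3 // 4]   # lw   r4,dati_a(r3)
        r5 = b[r3 // 4]   # lw   r5,dati_b(r3)
        r5 = r5 - r4      # sub  r5,r5,r4
        r3 += 4           # addi r3,r3,4
        iterations += 1
        if r5 == 0:       # bnez r5,loop falls through when r5 is zero
            return iterations, r3


print(run_loop(dati_a, dati_b))
```

With this data the difference is 1 for the first six element pairs and 0 for the seventh (7 and 7), so the loop runs seven times.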

  11. 1st Exercise:
     • Draw the pipeline chart
     • Indicate:
       • Data hazards between WB stages and ID stages
       • Control hazards between the EX stage and the IF stage
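One way to cross-check a hand-drawn chart is to list the read-after-write dependences mechanically. The sketch below does this for the straight-line body of the example program, under two assumptions that are not stated on the slides: there is no forwarding, and a register written in WB can be read by ID in the same cycle, so a consumer conflicts only if it follows its producer by one or two instructions. Dependences carried around the loop are ignored, and the tuple layout is hypothetical.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("addi r3,r0,0",     "r3", ["r0"]),
    ("lw r4,dati_a(r3)", "r4", ["r3"]),
    ("lw r5,dati_b(r3)", "r5", ["r3"]),
    ("sub r5,r5,r4",     "r5", ["r5", "r4"]),
    ("addi r3,r3,4",     "r3", ["r3"]),
    ("bnez r5,loop",     None, ["r5"]),
]


def raw_hazards(program, window=2):
    """Return (producer, consumer, register) index triples.

    A consumer is flagged when it follows the most recent writer of one
    of its source registers by at most `window` instructions.
    """
    hazards = []
    last_writer = {}
    for j, (_, dest, srcs) in enumerate(program):
        for reg in srcs:
            i = last_writer.get(reg)
            # r0 is hard-wired to zero, so it never causes a hazard.
            if reg != "r0" and i is not None and j - i <= window:
                hazards.append((i, j, reg))
        if dest is not None:
            last_writer[dest] = j
    return hazards


for producer, consumer, reg in raw_hazards(PROGRAM):
    print(f"{PROGRAM[consumer][0]} needs {reg} from {PROGRAM[producer][0]}")
```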

  12. Hazard Identification

  13. 2nd Exercise: Hazard Resolution
     • Software solution
       • NOP insertion
     • Hardware solutions
       • Bubble/stall generation
       • Register forwarding
     • Software optimizations
       • Code rescheduling

  14. NOP insertion
              addi r3,r0,0
              nop
              nop
      Loop:   lw   r4,dati_a(r3)
              lw   r5,dati_b(r3)
              nop
              nop
              sub  r5,r5,r4
              addi r3,r3,4
              nop
              bnez r5,Loop
              nop
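The placement of the NOPs above can be reproduced with a simple rule, sketched here under assumptions inferred from the listing rather than stated on the slide: without forwarding, a consumer must sit at least three slots after the latest writer of each source register, and one NOP follows every branch. The function `insert_nops` and its tuple format are hypothetical, and only straight-line code is handled (dependences carried around the loop are not checked).

```python
def insert_nops(program, min_distance=3):
    """Insert NOPs into a straight-line listing.

    program: list of (text, dest, sources, is_branch) tuples.
    Returns the list of instruction texts with "nop" entries added.
    """
    out = []
    last_writer = {}   # register name -> index of its latest writer in `out`
    for text, dest, srcs, is_branch in program:
        for reg in srcs:
            if reg in last_writer:
                # Pad until the producer is far enough behind.
                while len(out) - last_writer[reg] < min_distance:
                    out.append("nop")
        if dest is not None:
            last_writer[dest] = len(out)
        out.append(text)
        if is_branch:
            out.append("nop")   # slot after the branch
    return out


program = [
    ("addi r3,r0,0",     "r3", ["r0"],       False),
    ("lw r4,dati_a(r3)", "r4", ["r3"],       False),
    ("lw r5,dati_b(r3)", "r5", ["r3"],       False),
    ("sub r5,r5,r4",     "r5", ["r5", "r4"], False),
    ("addi r3,r3,4",     "r3", ["r3"],       False),
    ("bnez r5,Loop",     None, ["r5"],       True),
]
result = insert_nops(program)
print("\n".join(result))
```

On this input the sketch emits six NOPs in the same positions as the listing: two after the initialisation, two before `sub`, one before `bnez`, and one after it.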

  15. NOP dynamic execution [figure: first and second loop iterations]
     • Each loop iteration is composed of 5 instructions and 4 NOPs

  16. Performance Indexes
     • CPI = average number of clock cycles per instruction
     • Clock cycles = n° of instructions + n° of stalls/NOPs + 4, where 4 is the number of additional cycles the last instruction needs to complete
     • CPI = [clock cycles] / [n° of instructions]
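The two indexes are easy to evaluate in code. The sketch below is a direct transcription of the formulas, with function names chosen for this example; the sample numbers (7 instructions, 6 NOPs, 200 MHz clock) are the ones used for the NOP case later in the deck.

```python
def clock_cycles(instructions: int, stalls: int, drain: int = 4) -> int:
    """Total cycles: instructions + stalls/NOPs + cycles to drain the pipe."""
    return instructions + stalls + drain


def cpi(instructions: int, stalls: int, drain: int = 4) -> float:
    """Average clock cycles per instruction."""
    return clock_cycles(instructions, stalls, drain) / instructions


def mips(frequency_hz: float, cpi_value: float) -> float:
    """Millions of instructions per second at the given clock frequency."""
    return frequency_hz / (cpi_value * 1e6)


c = cpi(instructions=7, stalls=6)
print(round(c, 2), round(mips(200e6, c), 2))
```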

  17. Performance evaluation of NOPs
     • Actual CPI: $\mathrm{CPI} = \frac{\text{Instructions} + \text{NOPs} + 4}{\text{Instructions}} = \frac{13 + 4}{7} \approx 2.43$
     • $\mathrm{MIPS} = \frac{\text{frequency}}{\mathrm{CPI} \times 10^{6}} = \frac{200 \times 10^{6}}{(17/7) \times 10^{6}} \approx 82.35$, with frequency = 200 MHz

  18. NOPs Manual Exercise
     • Manually execute the loop for two iterations (finishing on the NOP after the 2nd bnez) and calculate CPI and MIPS
     • 10 minutes

  19. Results
     • CPI = (21 + 4)/11 ≈ 2.27
     • MIPS ≈ 88

  20. Asymptotic loop performance
     • Consider an intermediate iteration of the loop.
     • Count the instructions plus NOPs in that iteration and divide by the number of effective instructions -> asymptotic CPI
     • 10 minutes

  21. Performance evaluation of NOPs (asymptotic)
     • Asymptotic loop CPI: $\frac{(\text{Instructions} + \text{NOPs}) \cdot n + 4}{\text{Instructions} \cdot n} = \frac{9n + 4}{5n} \to 1.8$ as $n \to \infty$
     • $\mathrm{MIPS} = \frac{\text{frequency}}{\mathrm{CPI} \times 10^{6}} = \frac{200 \times 10^{6}}{1.8 \times 10^{6}} \approx 111$, with frequency = 200 MHz
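A quick numerical check shows how fast the trace CPI approaches the asymptotic value as the number of iterations n grows. This sketch evaluates the (9n + 4)/(5n) expression directly; like that expression, it ignores the instructions before the loop. The function name `trace_cpi` is an assumption made for this example.

```python
def trace_cpi(n, instr_per_iter=5, nops_per_iter=4, drain=4):
    """CPI of n loop iterations: ((instr + nops) * n + drain) / (instr * n)."""
    return ((instr_per_iter + nops_per_iter) * n + drain) / (instr_per_iter * n)


for n in (1, 10, 100, 1000):
    print(n, round(trace_cpi(n), 4))
```

The constant 4 cycles needed to drain the pipeline are amortised over more and more instructions, so the printed values fall toward 9/5 = 1.8.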

  22. Bubbles
     • Bubbles are NOPs inserted by the hardware.
     • A branch instruction causes a NOP to be generated.
     • The following instructions are stalled.
     • The preceding instructions continue to execute.

  23. Bubbles Example

  24. Performance evaluation of bubbles
     • Actual CPI: $\mathrm{CPI} = \frac{\text{Instructions} + \text{Bubbles/aborts} + 4}{\text{Instructions}} = \frac{7 + 6 + 4}{7} \approx 2.43$
     • $\mathrm{MIPS} = \frac{\text{frequency}}{\mathrm{CPI} \times 10^{6}} = \frac{200 \times 10^{6}}{(17/7) \times 10^{6}} \approx 82.35$, with frequency = 200 MHz

  25. Verify on the simulator
     • File -> load code ... -> pipe1.s -> select -> load -> yes
     • Configuration -> disable forwarding
     • Open the clock cycle diagram
     • Execute -> single cycle (until the 1st load of the 2nd iteration has been executed)

  26. Result

  27. Manual Exercise
     • Predict what happens in an intermediate iteration
     • Calculate the asymptotic CPI and MIPS
     • 10 minutes

  28. Let's simulate it
     • Simulate the program until the 4th iteration

  29. Solutions
     • After the 1st iteration, every iteration shows the same behavior:
       • 5 instructions
       • 1 NOP
       • 3 stalls
     • So the asymptotic values are:
       • CPI = 1.8
       • MIPS = 111.11

  30. Result Forwarding

  31. Result Forwarding

  32. Forwarding Example

  33. Simulation of 2 iterations of the loop
     • Configuration -> enable forwarding
     • Open the clock cycle diagram
     • File -> Reset DLX
     • Execute -> single cycle
     • Stop at the WB stage of the 2nd bnez

  34. Simulation results

  35. Manual Exercise
     • Calculate CPI and MIPS for the 2 iterations.
     • Calculate the asymptotic CPI and MIPS.
     • 15 minutes

  36. Results
     • 2 iterations:
       • 11 instructions
       • 1 NOP
       • 2 stalls
       • 4 cycles to flush the pipe
     • CPI = (11 + 1 + 2 + 4)/11 = 18/11 ≈ 1.64
     • MIPS ≈ 122

  37. Asymptotic Results
     • Per iteration: 5 instructions, 1 NOP, 1 stall
     • CPI = (7n + 4)/(5n) -> 1.4 as n grows
     • MIPS = 142.86

  38. Speedup
     • Speedup of A w.r.t. B: $\mathrm{Speedup}(A, B) = \frac{\text{Exec. time}_B}{\text{Exec. time}_A}$

  39. Calculate the asymptotic speedup
     • Speedup(NOPs, Bubbles)
     • Speedup(Forwarding, NOPs)
     • Speedup(Forwarding, Bubbles)
     • 5 minutes

  40. Asymptotic speedup: results
     • Speedup(NOPs, Bubbles) = 1
     • Speedup(Forwarding, NOPs) = 1.29
     • Speedup(Forwarding, Bubbles) = 1.29
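These values follow from the asymptotic CPIs found earlier (1.8 with NOPs, 1.8 with bubbles, 1.4 with forwarding). The sketch below assumes that all three configurations execute the same useful instructions at the same clock frequency, so the ratio of execution times reduces to a ratio of CPIs. The dictionary and function names are made up for this example.

```python
ASYMPTOTIC_CPI = {"NOPs": 1.8, "Bubbles": 1.8, "Forwarding": 1.4}


def speedup(a: str, b: str, cpi=ASYMPTOTIC_CPI) -> float:
    """Speedup of configuration a with respect to b.

    Exec. time = instructions * CPI * clock period, so with equal
    instruction counts and clock periods the ratio is CPI_b / CPI_a.
    """
    return cpi[b] / cpi[a]


for a, b in [("NOPs", "Bubbles"), ("Forwarding", "NOPs"), ("Forwarding", "Bubbles")]:
    print(f"Speedup({a},{b}) = {speedup(a, b):.2f}")
```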

  41. Scheduling Optimizations
     • Change the order of operations to minimize stalls/bubbles (forwarding enabled).
     • Original order, CPI = (5 + 2 + 4)/5:
         lw  r3,0(r2)
         add r3,r3,r7
         lw  r4,0(r2)
         add r4,r4,r8
         add r4,r4,r3
     • Rescheduled order, CPI = (5 + 4)/5:
         lw  r3,0(r2)
         lw  r4,0(r2)
         add r3,r3,r7
         add r4,r4,r8
         add r4,r4,r3
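The two CPI figures can be checked with a small stall counter. The sketch assumes that, with forwarding enabled, the only remaining data hazard is the load-use case: an instruction that reads a register loaded by the instruction immediately before it costs one stall. The tuple format and function names are hypothetical.

```python
def load_use_stalls(program):
    """Count one stall per instruction that reads the register loaded
    by the instruction immediately before it (forwarding assumed)."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "lw" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


def cpi(program, drain=4):
    """(instructions + stalls + cycles to drain the pipe) / instructions."""
    return (len(program) + load_use_stalls(program) + drain) / len(program)


# (opcode, destination, sources)
original = [
    ("lw",  "r3", ["r2"]),
    ("add", "r3", ["r3", "r7"]),
    ("lw",  "r4", ["r2"]),
    ("add", "r4", ["r4", "r8"]),
    ("add", "r4", ["r4", "r3"]),
]
# Hoist the second load so neither add directly follows its own load.
rescheduled = [original[0], original[2], original[1], original[3], original[4]]

print(load_use_stalls(original), cpi(original))
print(load_use_stalls(rescheduled), cpi(rescheduled))
```

Moving the second load up fills both load delay slots with useful work, which removes the two stalls without changing the result.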

  42. 1st Exercise
              addi r1,r0,1
              seq  r2,r1,r1
              add  r3,r3,r3
      Loop:   lw   r4,0(r3)
              sub  r3,r3,r4
              bnez r1,Loop

  43. Manual Exercises
     • Draw the conflicts between operations until the end of the 3rd iteration of the loop (last instruction: bnez). No forwarding is possible.
     • Insert bubbles/aborts in the right places to resolve the hazards.
     • Calculate the CPI and throughput of the trace.
     • Calculate the asymptotic CPI of the loop.
     • 20 minutes

  44. Hazard Diagram

  45. Bubbles/Stall insertion

  46. CPIs
     • Trace CPI = (24 + 4)/12 ≈ 2.33
     • Asymptotic CPI = (6n + 4)/(3n) -> 2 as n grows

  47. Manual Exercises
     • Suppose now that forwarding is possible.
     • Draw the new pipeline execution diagram (until the execution of the 3rd bnez) and indicate when stalls must be generated by the hardware.
     • Calculate CPI and MIPS
     • Calculate the asymptotic CPI and MIPS
     • 20 minutes

  48. Pipeline Diagram

  49. Results
     • CPI = 21/12 = 1.75
     • Asymptotic CPI = ((4 + 1)n + 4)/(3n) -> 5/3 ≈ 1.67 as n grows
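The asymptotic figure is a limit over the number of loop iterations n rather than an equality for any finite n. Written out, with the symbol CPI(n) introduced here only for the derivation:

```latex
\[
\mathrm{CPI}(n) = \frac{(4 + 1)\,n + 4}{3n} = \frac{5}{3} + \frac{4}{3n},
\qquad
\lim_{n \to \infty} \mathrm{CPI}(n) = \frac{5}{3} \approx 1.67
\]
```

The 4/(3n) term is the cost of draining the pipeline, spread over the 3n instructions of the loop body, and it vanishes as n grows.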

  50. 2nd exercise
      loop:   lw   r2,dati_a(r4)
              lw   r3,dati_b(r5)
              add  r1,r2,r3
              sw   dati_a(r6),r1
              addi r4,r4,4
              addi r5,r5,4
              addi r6,r6,4
              j    loop
