Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining

Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining. Xiuzhen Cheng, cheng@gwu.edu. Announcement: Homework assignment #11, due by April 8. Reading: Section 6.8. Problems: 6.30 – 6.31. Project #3 is due on April 10, 2004. Final: Tuesday, May 4th, 11:00-1:00PM.


Presentation Transcript


  1. Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining Xiuzhen Cheng cheng@gwu.edu

  2. Announcement
     • Homework assignment #11, due by April 8
     • Reading: Section 6.8
     • Problems: 6.30 – 6.31
     • Project #3 is due on April 10, 2004
     • Final: Tuesday, May 4th, 11:00-1:00PM
     • Note: you must pass the final to pass this course!

  3. SW is in EX Stage
     • Forwarding condition on MEM/WB: ID/EX.MemWrite and MEM/WB.RegWrite and MEM/WB.RegisterRd = ID/EX.RegisterRt and EX/MEM.RegisterRd != ID/EX.RegisterRt and MEM/WB.RegisterRd != 0
     • Forwarding condition on EX/MEM: ID/EX.MemWrite and EX/MEM.RegWrite and EX/MEM.RegisterRd = ID/EX.RegisterRt and EX/MEM.RegisterRd != 0
     • [Figure: pipeline diagram with sw in the EX stage; labels: R-Type, R-Type or lw, Sign-Ext]
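
The two conditions on this slide can be read as a selector for where the store data of a sw in EX comes from. The following is a minimal sketch under that reading, not the course's implementation: the class `PipelineRegs`, the function `store_data_source`, and the returned labels are hypothetical names, and the conditions are transcribed exactly as the slide states them.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Only the pipeline-register fields named on the slide (names are illustrative)."""
    id_ex_mem_write: bool   # ID/EX.MemWrite: the instruction in EX is a sw
    id_ex_rt: int           # ID/EX.RegisterRt: register holding the store data
    ex_mem_reg_write: bool  # EX/MEM.RegWrite
    ex_mem_rd: int          # EX/MEM.RegisterRd
    mem_wb_reg_write: bool  # MEM/WB.RegWrite
    mem_wb_rd: int          # MEM/WB.RegisterRd


def store_data_source(r: PipelineRegs) -> str:
    """Pick the source of the sw store data using the slide's two conditions."""
    # Condition on EX/MEM: the newest producer wins.
    if (r.id_ex_mem_write and r.ex_mem_reg_write
            and r.ex_mem_rd == r.id_ex_rt and r.ex_mem_rd != 0):
        return "EX/MEM"
    # Condition on MEM/WB: used only when EX/MEM does not name the same register.
    if (r.id_ex_mem_write and r.mem_wb_reg_write
            and r.mem_wb_rd == r.id_ex_rt
            and r.ex_mem_rd != r.id_ex_rt and r.mem_wb_rd != 0):
        return "MEM/WB"
    # Otherwise no forwarding: use the value read from the register file.
    return "REGFILE"


# Example: both older instructions write $8; the value in EX/MEM is newer.
regs = PipelineRegs(id_ex_mem_write=True, id_ex_rt=8,
                    ex_mem_reg_write=True, ex_mem_rd=8,
                    mem_wb_reg_write=True, mem_wb_rd=8)
print(store_data_source(regs))
```

Checking EX/MEM first reflects the usual priority of the most recent result; the `!= 0` tests keep writes to `$zero` from ever being forwarded.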

  4. The Big Picture: Where are We Now?
     • The Five Classic Components of a Computer: Control and Datapath (together the Processor), Memory, Input, Output
     • Current topic: Superscalar and Dynamic Pipelining

  5. Is a Faster Processor Possible?
     • Potentially, pipelining can provide CPI = 1. Is it possible to design a faster processor?
     • Yes:
       • Superpipelining – longer pipelines
         • Divide the washer into 3 machines: wash, rinse, spin
       • Superscalar – replicate the internal components of the computer so that it can launch multiple instructions per clock cycle (CC)
         • Buy 3 washers, 3 dryers, etc.
       • Dynamic pipelining – use hardware to avoid pipeline hazards
         • Out-of-order execution is possible
         • More complicated pipeline control and instruction execution model

  6. Issuing Multiple Instructions/Cycle
     • Two main variations: Superscalar and VLIW
     • Superscalar: varying number of instructions per cycle (1 to 6)
       • Parallelism and dependencies determined/resolved by HW
       • IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164, HP 7100
     • Very Long Instruction Words (VLIW): fixed number of instructions (16); parallelism determined by the compiler
       • Pipeline is exposed; the compiler must schedule delays to get the right result
     • Explicitly Parallel Instruction Computing (EPIC) / Intel
       • 128-bit packets containing 3 instructions (can execute sequentially)
       • Can link 128-bit packets together to allow more parallelism
       • Compiler determines parallelism; HW checks dependencies and forwards/stalls

  7. Superscalar MIPS
     • Assume two instructions are issued per clock cycle:
       • One ALU operation or branch
       • One memory access instruction

  8. Additional Hardware Requirements
     • Instructions must be paired and aligned
     • Extra ports in the register file to serve 2 instructions
     • Separate adder for lw/sw address computation
     • What will happen for load-use instructions?
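
The pairing rule for the two-issue MIPS described above (one ALU operation or branch together with one memory access) can be sketched as a simple slot check. This is an illustrative sketch only: the opcode sets are a small assumed subset of MIPS, the slot order (ALU/branch first, memory second) is an assumption, and the function `can_dual_issue` is a hypothetical name. It does not check data dependencies between the two instructions.

```python
# Assumed subsets of MIPS opcodes, just enough for illustration.
ALU_OR_BRANCH = {"add", "addu", "addi", "sub", "and", "or", "slt", "beq", "bne"}
MEMORY = {"lw", "sw"}


def can_dual_issue(first: str, second: str) -> bool:
    """True if the pair fits one issue packet: ALU/branch slot, then lw/sw slot."""
    op1 = first.split()[0]   # opcode is the first whitespace-separated token
    op2 = second.split()[0]
    return op1 in ALU_OR_BRANCH and op2 in MEMORY


print(can_dual_issue("addi $s1, $s1, -4", "lw $t0, 0($s1)"))   # fits both slots
print(can_dual_issue("lw $t0, 0($s1)", "sw $t1, 4($s1)"))      # two memory ops
```

A packet that fails the check would have to be split across two cycles, with the unused slot filled by a nop.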

  9. Simple Superscalar Example
     • How would this loop be scheduled on a superscalar pipeline for MIPS?
         Loop: lw   $t0, 0($s1)
               addu $t0, $t0, $s2
               sw   $t0, 0($s1)
               addi $s1, $s1, -4
               bne  $s1, $zero, Loop
     • Reorder the instructions to avoid as many pipeline stalls as possible
     • Solution hints:
       • Identify instructions with data dependencies – these cannot be reordered with respect to each other!
       • Identify load-use instructions that require pipeline stalls
     • Any performance (in CPI) improvement?
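
One possible two-issue schedule for the loop above, following the usual textbook treatment rather than anything shown on the slide, moves addi up and adjusts the sw offset to 4. The sketch below checks only the timing of that schedule and computes its CPI. It assumes full forwarding, a one-cycle load-use delay, and ignores branch timing; `Instr` and `check_schedule` are hypothetical names, and the checker does not verify that the reordering preserves program meaning.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, or None
    srcs: tuple              # registers read
    is_load: bool = False


def check_schedule(schedule):
    """Return (cycles, instructions, CPI); raise ValueError on a latency violation."""
    ready = {}   # register -> first cycle in which its newest value may be used
    count = 0
    for cycle, packet in enumerate(schedule, start=1):
        issued = [i for i in packet if i is not None]
        # Both slots read their operands before either slot's result is recorded.
        for ins in issued:
            for src in ins.srcs:
                if ready.get(src, 0) > cycle:
                    raise ValueError(f"{ins.text} reads {src} too early in cycle {cycle}")
        for ins in issued:
            if ins.dest is not None:
                # A loaded value is usable two cycles later, an ALU result one cycle later.
                ready[ins.dest] = cycle + (2 if ins.is_load else 1)
        count += len(issued)
    cycles = len(schedule)
    return cycles, count, cycles / count


# Each packet is (ALU-or-branch slot, memory slot); None marks an empty slot.
schedule = [
    (None, Instr("lw $t0, 0($s1)", "$t0", ("$s1",), True)),
    (Instr("addi $s1, $s1, -4", "$s1", ("$s1",)), None),
    (Instr("addu $t0, $t0, $s2", "$t0", ("$t0", "$s2")), None),
    (Instr("bne $s1, $zero, Loop", None, ("$s1", "$zero")),
     Instr("sw $t0, 4($s1)", None, ("$t0", "$s1"))),
]

print(check_schedule(schedule))   # (4, 5, 0.8)
```

Under these assumptions the five instructions take four cycles, a CPI of 0.8, compared with the ideal 0.5 for a two-issue machine; three of the eight issue slots stay empty.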

  10. Loop Unrolling
      • Purpose: to gain more performance from loops
      • Idea: schedule multiple copies of the loop body together
      • The previous example: assume the loop index is a multiple of 4
      • What is the performance improvement?
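
To put a number on the improvement, the sketch below counts cycles and instructions for one possible schedule of the loop unrolled four times on the two-issue MIPS. The schedule follows the common textbook treatment and is an assumption, not taken from the slide: the temporaries `$t1` to `$t3` are renamed copies of `$t0`, and the first lw is assumed to read `$s1` before the addi in the same packet updates it. The helper `cpi` is a hypothetical name.

```python
# Each packet is (ALU-or-branch slot, memory slot); None marks an empty slot.
unrolled = [
    ("addi $s1, $s1, -16",   "lw $t0, 0($s1)"),
    (None,                   "lw $t1, 12($s1)"),
    ("addu $t0, $t0, $s2",   "lw $t2, 8($s1)"),
    ("addu $t1, $t1, $s2",   "lw $t3, 4($s1)"),
    ("addu $t2, $t2, $s2",   "sw $t0, 16($s1)"),
    ("addu $t3, $t3, $s2",   "sw $t1, 12($s1)"),
    (None,                   "sw $t2, 8($s1)"),
    ("bne $s1, $zero, Loop", "sw $t3, 4($s1)"),
]


def cpi(schedule):
    """Cycles per instruction: one cycle per packet, empty slots not counted."""
    instructions = sum(1 for packet in schedule for slot in packet if slot is not None)
    return len(schedule) / instructions


print(len(unrolled), "cycles for 4 array elements")
print(round(cpi(unrolled), 2))   # 0.57
```

With this schedule, four elements take 8 cycles (2 per element) instead of 4 cycles per element for the non-unrolled two-issue schedule, and CPI drops from 0.8 to about 0.57. The cost is more code and three extra temporary registers.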

  11. Dynamic Pipeline Scheduling
      • The hardware performs the “scheduling”
        • Hardware tries to find instructions to execute
        • Out-of-order execution is possible
        • Speculative execution and dynamic branch prediction
      • Basic idea
        • DPS tries to find later instructions to execute while waiting for a stall to be resolved
      • Pipeline is divided into 3 major units:
        • Instruction fetch and issue unit – IF, ID
        • Execute unit – 5 to 10 independent functional units
        • Commit unit – determines when to put the result back into a register or memory
      • In-order completion vs. out-of-order completion
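
The difference between out-of-order execution and in-order commit can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, all of them mine: every instruction is already fetched, there are unlimited functional units, destination registers are distinct (as if renamed), and latencies are made-up numbers. The function `simulate` is a hypothetical name.

```python
def simulate(program):
    """program: list of (name, dest, srcs, latency) in program order.

    Returns (finish, commit): per-instruction cycle at which execution
    finishes, and the cycle at which the result is committed in order.
    """
    ready_at = {}   # register -> cycle when its value becomes available
    finish = []
    for name, dest, srcs, latency in program:
        # An instruction starts as soon as all of its operands are ready.
        start = max((ready_at.get(s, 0) for s in srcs), default=0)
        done = start + latency
        ready_at[dest] = done
        finish.append((name, done))

    commit = []
    last = 0
    for name, done in finish:
        # The commit unit retires in program order: never before an older instruction.
        last = max(last, done)
        commit.append((name, last))
    return finish, commit


program = [
    ("lw",   "t0", ("s1",),      3),   # slow load (latency is an assumed value)
    ("addu", "t1", ("t0", "s2"), 1),   # must wait for the load
    ("sub",  "t2", ("s3", "s4"), 1),   # independent: can run during the stall
]
finish, commit = simulate(program)
print(finish)   # [('lw', 3), ('addu', 4), ('sub', 1)]
print(commit)   # [('lw', 3), ('addu', 4), ('sub', 4)]
```

Here sub finishes executing in cycle 1, ahead of the two older instructions, but its result is committed only in cycle 4, after lw and addu have committed.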

  12. Basic Idea

  13. Summary
      • All modern processors are very complicated
        • DEC Alpha 21264: 9-stage pipeline, 6 instructions in parallel, 4 instructions per CC
        • PowerPC and Pentium/Itanium: branch history table, dynamic pipelining
      • Compiler technology is important
      • Combining dynamic pipelining with branch prediction is very challenging
        • The commit unit should know how to “roll back”, i.e., discard instructions when a prediction is wrong
      • Dynamic execution is based on prediction:
        • Hide memory latency
        • Avoid stalls
        • Execute instructions while waiting for hazards to be resolved
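
The rollback idea from the summary can be sketched with a small in-order buffer: results update the architectural registers only at commit, so instructions fetched after a mispredicted branch can simply be dropped. This is an illustrative model, not a description of any particular processor; `CommitUnit` and its methods are hypothetical names.

```python
from collections import deque


class CommitUnit:
    """Toy commit unit: holds in-flight results, oldest first."""

    def __init__(self):
        self.buffer = deque()   # entries are (tag, dest, value)
        self.registers = {}     # architectural state, changed only at commit

    def add(self, tag, dest, value):
        self.buffer.append((tag, dest, value))

    def commit_oldest(self):
        """Retire the oldest instruction and make its result visible."""
        tag, dest, value = self.buffer.popleft()
        if dest is not None:
            self.registers[dest] = value
        return tag

    def rollback_after(self, branch_tag):
        """Discard every entry younger than the mispredicted branch."""
        if all(tag != branch_tag for tag, _, _ in self.buffer):
            raise ValueError(f"{branch_tag} is not in flight")
        discarded = []
        while self.buffer[-1][0] != branch_tag:
            discarded.append(self.buffer.pop()[0])
        return discarded


cu = CommitUnit()
cu.add("i1", "t0", 5)
cu.add("br", None, None)   # predicted branch
cu.add("i3", "t1", 7)      # speculative, fetched along the predicted path
cu.add("i4", "t2", 9)      # speculative
print(cu.rollback_after("br"))   # ['i4', 'i3']
print(cu.commit_oldest(), cu.commit_oldest(), cu.registers)
```

Because i3 and i4 never reached commit, discarding them leaves the register state exactly as if they had never been executed.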

  14. Questions?