Computer Architecture Pipelines & Superscalars

Presentation Transcript


  1. Computer Architecture: Pipelines & Superscalars • [Photo: sunset over the Pacific Ocean, taken from Iolanthe II about 100 nautical miles north of Cape Reanga]

  2. Pipelines • Data Hazards • Code:
       lw  $4, 0($1)
       add $15, $1, $1
       sub $2, $1, $3
       and $12, $2, $5
       or  $13, $6, $2
       add $14, $2, $2
       sw  $15, 100($2)
     The last four instructions all depend on the result ($2) produced by the sub! • MIPS instructions have the format op dest, srca, srcb

  3. Pipelines - Data Hazards • Examine the pipeline (ignore the first 2 instructions!) • r2 is only updated in time for the add!

  4. Pipelines - Data Hazards • Compiler solution • Insert NOOPs • Inefficient!
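
The NOOP approach can be sketched as a small scheduling pass. This is only an illustration, not the compiler's actual algorithm: the `(text, dest, sources)` tuples are transcribed by hand from the slide's code, `insert_noops` is a hypothetical helper, and `min_distance=3` is an assumption taken from the slide 3 remark that the add, three instructions after the sub, is the first to see the updated r2.

```python
# Each instruction: (text, register written or None, set of registers read).
PROGRAM = [
    ("sub $2, $1, $3",   "$2",  {"$1", "$3"}),
    ("and $12, $2, $5",  "$12", {"$2", "$5"}),
    ("or $13, $6, $2",   "$13", {"$6", "$2"}),
    ("add $14, $2, $2",  "$14", {"$2"}),
    ("sw $15, 100($2)",  None,  {"$15", "$2"}),
]

def insert_noops(program, min_distance=3):
    """Pad with noops so every reader sits at least min_distance
    slots after the instruction that writes its source register."""
    out = []
    ready_at = {}  # register -> earliest slot at which it may be read
    for text, dest, sources in program:
        earliest = max((ready_at.get(reg, 0) for reg in sources), default=0)
        while len(out) < earliest:
            out.append("noop")          # wasted slot: no useful work done
        if dest is not None:
            ready_at[dest] = len(out) + min_distance
        out.append(text)
    return out

for line in insert_noops(PROGRAM):
    print(line)
```

Under these assumptions two of the seven issue slots do no useful work, which is the inefficiency the slide points at.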

  5. Pipelines - Data Hazards • Second compiler solution • Reorder • [Figure annotations: Read, Written]
     Original order:
       lw  $4, 0($1)
       add $15, $1, $1
       sub $2, $1, $3
       and $12, $2, $5
       or  $13, $6, $2
       add $14, $2, $2
       sw  $15, 100($2)
     Reordered (sub moved to the front):
       sub $2, $1, $3
       lw  $4, 0($1)
       add $15, $1, $1
       and $12, $2, $5
       or  $13, $6, $2
       add $14, $2, $2
       sw  $15, 100($2)
     The two instructions the sub moves past (lw and add $15) must not define $1 or $3!

  6. Pipelines - Data Hazards • Second compiler solution • Reorder • [Figure annotations: Read, Written]
       sub $2, $1, $3
       lw  $4, 0($1)
       add $15, $1, $1
       and $12, $2, $5      # first use of $2
       or  $13, $6, $2
       add $14, $2, $2
       sw  $15, 100($2)

  7. Pipelines - Data Hazards • Compiler analyses dependencies • Register definitions • Register use • Read After Write (RAW) dependency • No dependencies • Instruction can be moved!
       sub $2, $1, $3       # $2 written
       lw  $4, 0($1)
       add $15, $1, $1
       and $12, $2, $5      # uses $2
       or  $13, $6, $2      # uses $2
       add $14, $2, $2      # uses $2
       sw  $15, 100($2)     # uses $2
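
The definition/use analysis can be sketched in a few lines. The register sets below are transcribed by hand from the original (unreordered) code, and `raw_dependencies` is a hypothetical name for illustration; a real compiler works on its own intermediate representation and also tracks memory accesses, which this sketch ignores.

```python
# Each instruction: (text, register defined or None, set of registers used).
PROGRAM = [
    ("lw $4, 0($1)",     "$4",  {"$1"}),
    ("add $15, $1, $1",  "$15", {"$1"}),
    ("sub $2, $1, $3",   "$2",  {"$1", "$3"}),
    ("and $12, $2, $5",  "$12", {"$2", "$5"}),
    ("or $13, $6, $2",   "$13", {"$6", "$2"}),
    ("add $14, $2, $2",  "$14", {"$2"}),
    ("sw $15, 100($2)",  None,  {"$15", "$2"}),
]

def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) triples."""
    last_writer = {}  # register -> index of the latest instruction defining it
    deps = []
    for i, (_text, dest, sources) in enumerate(program):
        for reg in sorted(sources):
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

for producer, consumer, reg in raw_dependencies(PROGRAM):
    print(f"{PROGRAM[consumer][0]:18} reads {reg:4} defined by {PROGRAM[producer][0]}")
```

The sub reads only $1 and $3, which no earlier instruction defines, so it has no RAW dependency on the lw or the add and is free to move.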

  8. Pipelines - Data Hazards • Hardware solution • Value forwarding • Hardware detects dependency • scoreboard • Forwards result from WB to EX for subsequent use • Hardware • Transparent to software!

  9. Data Hazards - classification • Read after Write (RAW) • Instruction 1 must write before instruction 2 reads • Write after Write (WAW) • Instructions 1 and 2 both write • Instruction 2 must write after 1 • Write after Read (WAR) • Instruction 1 reads • Instruction 2 writes (overwrites) • Instruction 2 must not write before 1 reads • Reordering algorithms must consider all three!
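
A reordering check that considers all three classes might look like the sketch below. It is an illustration under stated assumptions: `blocking_hazard` and `can_move_up` are hypothetical helpers, instructions are hand-written `(text, dest, sources)` tuples, only register hazards are checked (memory ordering is ignored), and the last two test instructions are invented to show WAW and WAR.

```python
def blocking_hazard(earlier, later):
    """Name the hazard that forbids moving `later` ahead of `earlier`,
    or return None if the two instructions are independent."""
    _, e_dest, e_srcs = earlier
    _, l_dest, l_srcs = later
    if e_dest is not None and e_dest in l_srcs:
        return "RAW"   # later reads what earlier writes
    if e_dest is not None and e_dest == l_dest:
        return "WAW"   # both write the same register
    if l_dest is not None and l_dest in e_srcs:
        return "WAR"   # later overwrites what earlier still has to read
    return None

def can_move_up(program, index, target):
    """True if program[index] may be hoisted to position target (< index)."""
    return all(blocking_hazard(program[j], program[index]) is None
               for j in range(target, index))

lw   = ("lw $4, 0($1)",    "$4",  {"$1"})
add  = ("add $15, $1, $1", "$15", {"$1"})
sub  = ("sub $2, $1, $3",  "$2",  {"$1", "$3"})
and_ = ("and $12, $2, $5", "$12", {"$2", "$5"})

print(can_move_up([lw, add, sub, and_], 2, 0))   # sub past lw and add
print(blocking_hazard(sub, and_))                # and cannot pass sub
```

Hoisting the sub is legal because none of the three checks fires against the lw or the add; hoisting the and above the sub is not.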

  10. Lecture 5 - Key Points • Data Hazards • RAW - most common • WAW • WAR • Compiler looks for dependencies • then re-orders • Hardware • Scoreboard • Monitors dependencies • ensures correct operation • Value forwarding hardware • Forwards results from EX stage

  11. Pipelines - Exceptions • Caused by overflow, underflow • Example • add $1, $2, $1 • Overflow detected in EX stage • Causes jump to exception handler • as branch - remainder of pipeline flushed but • Compiler needs original $1 causing overflow • Register must not be overwritten • EX stage needs to squash WB operation • Precise Exception problem - more later!

  12. Superpipelines

  13. Superpipelines • Time to complete each instruction = t • Total: Fetch + decode + fetch operands + operation + write-back • Clock frequency: f = 1/t • An n-stage pipeline allows n instructions ‘in flight’ simultaneously • Each pipeline stage does 1/n of the work • Each stage requires time t/n • Assumes a perfectly balanced pipeline! • Balanced = each stage requires the same time • Clock frequency: fpipe = 1/(t/n) = n/t • Increasing n increases processor power?
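
To put numbers on the ideal case, the sketch below compares the time to run k instructions with and without a perfectly balanced n-stage pipeline. The helper names and the values t = 10 (arbitrary time units) and k = 100 are assumptions for illustration; stalls and register overheads are ignored here.

```python
def unpipelined_time(k, t):
    """k instructions, each taking the full time t, one after another."""
    return k * t

def pipelined_time(k, t, n):
    """The first instruction needs n cycles of t/n to fill the pipeline;
    each later instruction completes one cycle after its predecessor."""
    cycle = t / n
    return (n + k - 1) * cycle

t, k = 10.0, 100
for n in (1, 2, 5, 10):
    speedup = unpipelined_time(k, t) / pipelined_time(k, t, n)
    print(f"n={n:2d}  clock={n / t:.2f} per time unit  speedup={speedup:.2f}")
```

For long instruction streams the speedup approaches n, but never quite reaches it because of the fill time.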

  14. Pipelines - Depth • Pipeline can’t be too deep • Hazards are frequent • many stalls in deep pipelines • [Graph: relative performance (axis 0.5 to 2.5) against pipeline depth (1, 2, 4, 8, 16), with a region labelled “Too Deep!”]

  15. Pipelines - Depth • Pipeline can’t be too deep • Hazards are frequent • many stalls in deep pipelines • [Graph: relative performance (axis 0.5 to 2.5) against pipeline depth (1, 2, 4, 8, 16), with regions labelled “Superpipelined” and “Too Deep!”]

  16. Pipeline depth • Increasing number of stages • Each stage adds overheads • Problems balancing pipeline • Require tpd1 ≈ tpd2 ≈ tpd3 • Stage time is tpdj + tpdreg • n stages means n tpdreg overhead • [Figure: three stages in series, each an operation (work) block with delay tpd1, tpd2 or tpd3 followed by a register with delay tpdreg]
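
The effect of the register overhead can be made concrete with a short calculation. This is a sketch assuming a perfectly balanced pipeline, so every stage takes t/n plus the register delay; the function names and the values t = 10 and t_reg = 0.5 (arbitrary time units) are invented for illustration. It models only the register overhead, not the hazard stalls shown on the depth graph.

```python
def clock_period(t, n, t_reg):
    """Stage time: 1/n of the work plus one pipeline register delay."""
    return t / n + t_reg

def speedup(t, n, t_reg):
    """Clock-rate gain relative to a single stage with one register."""
    return clock_period(t, 1, t_reg) / clock_period(t, n, t_reg)

t, t_reg = 10.0, 0.5
for n in (1, 2, 4, 8, 16):
    print(f"n={n:2d}  period={clock_period(t, n, t_reg):6.3f}  "
          f"speedup={speedup(t, n, t_reg):5.2f}")
```

With these numbers, 16 stages give a clock only about 9.3 times faster, and no depth can exceed (t + t_reg) / t_reg = 21, since the register delay does not shrink as n grows.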

  17. CISC and pipelines • High Speed CISC processors are pipelined • Overlap IF, EX • Variable • instruction length • running time (number of microcode cycles) • pipeline imbalance • “backup” in pipe stages • complicate hazard detection • Complex addressing modes • auto-increment updates address register • multiple memory accesses required • smooth pipeline flow more difficult!

  18. Instruction Queues • Vital performance determinant • Rate of instruction fetch • High Performance processors • Fetch multiple instructions in each cycle • 2 - 4 common • Use wide datapath to memory • PowerPC 604 128 bits = 4 instructions • Despatch unit • Examine dependencies • Determine which instructions can be despatched

  19. Instruction Queues • Q “matches” fetch/despatch rates • General strategy for matching Producers - Consumers • Use of FIFO-style queues • Absorb asynchronous delivery / consumption rates • Provides elasticity in pipelines • [Figure: Producer → FIFO → Consumer, with differing instantaneous rates]
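
The elasticity a FIFO provides can be seen in a toy simulation. Everything here is assumed for illustration: a fetch unit that delivers bursts of 4 instructions every other cycle, a despatch unit that takes at most 2 per cycle, and a queue capacity of 8. It does not model any particular processor.

```python
from collections import deque

def simulate(fetch_per_cycle, despatch_limit, capacity=8):
    """Return the list of instruction ids despatched in each cycle."""
    queue = deque()
    despatched = []
    next_id = 0
    for fetched in fetch_per_cycle:
        # Producer: the fetch unit fills the queue and stalls when it is full.
        room = capacity - len(queue)
        for _ in range(min(fetched, room)):
            queue.append(next_id)
            next_id += 1
        # Consumer: the despatch unit drains the queue in FIFO order.
        count = min(despatch_limit, len(queue))
        despatched.append([queue.popleft() for _ in range(count)])
    return despatched

bursty_fetch = [4, 0, 4, 0]
for cycle, group in enumerate(simulate(bursty_fetch, despatch_limit=2)):
    print(f"cycle {cycle}: despatched {group}")
```

Although the producer alternates between 4 and 0 instructions per cycle, the consumer sees a steady 2 per cycle, because the queue absorbs the difference in instantaneous rates.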

  20. Superscalar Processors

  21. PowerPC organisation • PowerPC 601 ~1993 • 3-way superscalar • Integer • Branch • Floating Point • [Figure: block diagram with the boundary of the Si die marked] • A newer machine will have more functional units here! • New - Look in the “Example Processors” section of the Web notes

  22. Superscalar Processors • Multiple Functional Units • PowerPC 604 • 6-way superscalar • Despatch Unit • Sends “ready” instructions to all free units • PowerPC 604: • potential 4 instructions/cycle (pipeline lengths are different!) • reality: 2-3 instructions/cycle? (program dependent!) • [Figure labels: Branch Unit, Load/Store Unit, 3 Integer Units, Floating Point Unit]

  23. Superscalar Processors • Mix of functional units • Up to 8-way superscalar common now • 2 Floating point units • Usually have ~3 cycle latency • 3 Integer Arithmetic • Branch unit • Load / store unit • + ….? • Marketing departments can play some games with the ‘n’ of an n-way superscalar!

  24. Pentium Quad Core - 2008 • Distinguish between • Multiple ‘cores’ (separate processors), covered later, and • Superscalars: multiple functional units per processor • “Wide dynamic execution” in Intel-speak • Quad core • 4 cores • Complete up to 4 instructions / cycle each • IIU can issue four instructions / cycle • 3 Mb L2 cache / processor (total 12 Mb) • Master clock 3.2 GHz, front side bus 1.6 GHz • 771 pins

  25. Superscalar Limitations • To achieve maximum performance • Instruction mix must match Functional Unit mix • eg if we have 2 Integer ALUs, 2 FPUs, 1 branch unit, 1 load/store unit • Instruction issue unit (IIU) can issue 4 instructions • Each four instructions should be able to use 4 of the functional units • If instruction stream doesn’t have right mix • Some functional units will remain idle • FPUs require multiple cycles • Additional stalls • Pipeline hazards stall pipeline • 4-way superscalar gets 1.8-3 instructions completed per cycle • Program dependent!
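
The dependence on instruction mix can be illustrated with a toy in-order issue model using the slide's example machine (2 integer ALUs, 2 FPUs, 1 branch unit, 1 load/store unit, issue width 4). The model is a deliberate simplification and an assumption throughout: every unit is treated as free again each cycle, data hazards and multi-cycle FPU latency are ignored, and `issue_groups` is a hypothetical helper.

```python
UNITS = {"int": 2, "fp": 2, "branch": 1, "mem": 1}
ISSUE_WIDTH = 4

def issue_groups(stream, units=UNITS, width=ISSUE_WIDTH):
    """Issue in program order; a cycle ends when the width is reached
    or the next instruction finds no free unit of its kind."""
    groups = []
    i = 0
    while i < len(stream):
        free = dict(units)   # all functional units free at the start of a cycle
        group = []
        while i < len(stream) and len(group) < width and free[stream[i]] > 0:
            free[stream[i]] -= 1
            group.append(stream[i])
            i += 1
        groups.append(group)
    return groups

balanced = ["int", "fp", "mem", "int"] * 2
int_heavy = ["int"] * 8
for name, stream in (("balanced", balanced), ("int-heavy", int_heavy)):
    cycles = len(issue_groups(stream))
    print(f"{name}: {len(stream)} instructions in {cycles} cycles, "
          f"{len(stream) / cycles:.1f} per cycle")
```

Even with no hazards at all, the integer-only stream is limited to 2 instructions per cycle because the FPUs, branch unit and load/store unit sit idle.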
