
EECS 470


binah

Presentation Transcript


  1. EECS 470: Dynamic Scheduling – Part I (Lecture 9). Coverage: Chapter 3

  2. Reorder Buffer (ROB) Recap
     • Reorder Buffer (ROB)
        • Circular queue of speculative state
        • May contain multiple definitions of the same register
     • @ Alloc: allocate result storage at Tail
     • @ Sched: get inputs (search the ROB from Tail to Head, then the ARF); wait until all inputs are ready
     • @ WB: write results/fault to the ROB; indicate that the result is ready
     • @ CT: wait until the instruction @ Head is done
        • If fault, initiate handler
        • Else, write results to the ARF
        • Deallocate entry from the ROB
     [Figure: pipeline IF → ID → Alloc (in-order) → Sched → EX/MEM (any order) → CT (in-order); each ROB entry holds PC, Dst regID, Dst value, and Except?, between Head and Tail pointers; committed results go to the ARF]
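A minimal Python sketch of the ROB steps above, assuming one destination register per instruction and ignoring the scheduler itself. The class and method names (`ReorderBuffer`, `alloc`, `read_operand`, `writeback`, `commit`) are hypothetical. The operand search starts just older than the reading instruction's own entry (which was the Tail when it was allocated) and walks toward Head, so the first match is the youngest older definition; if nothing matches, the value comes from the ARF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    pc: int
    dst_reg: int                  # Dst regID (architectural destination)
    value: Optional[int] = None   # Dst value, filled in at WB
    done: bool = False            # result is ready
    fault: bool = False           # Except? bit


class ReorderBuffer:
    """Circular queue of speculative state (simplified sketch)."""

    def __init__(self, size: int, arf: list[int]):
        self.entries: list[Optional[ROBEntry]] = [None] * size
        self.head = 0   # oldest in-flight instruction
        self.tail = 0   # next slot to allocate
        self.count = 0
        self.arf = arf  # architected register file

    def alloc(self, pc: int, dst_reg: int) -> int:
        """@Alloc: allocate result storage at Tail; returns the ROB index."""
        if self.count == len(self.entries):
            raise RuntimeError("ROB full: stall allocation")
        idx = self.tail
        self.entries[idx] = ROBEntry(pc, dst_reg)
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return idx

    def read_operand(self, reg: int, reader_idx: int) -> tuple[bool, Optional[int]]:
        """@Sched: search older entries toward Head, then fall back to the ARF.

        Returns (ready, value); value is None while the producer is not done.
        """
        size = len(self.entries)
        for k in range(1, (reader_idx - self.head) % size + 1):
            e = self.entries[(reader_idx - k) % size]
            if e is not None and e.dst_reg == reg:
                return (e.done, e.value if e.done else None)
        return (True, self.arf[reg])

    def writeback(self, idx: int, value: int, fault: bool = False) -> None:
        """@WB: write result/fault into the ROB entry and mark it ready (any order)."""
        e = self.entries[idx]
        e.value, e.fault, e.done = value, fault, True

    def commit(self) -> bool:
        """@CT: retire the Head entry if it is done; returns True if one retired."""
        if self.count == 0 or not self.entries[self.head].done:
            return False
        e = self.entries[self.head]
        if e.fault:
            raise RuntimeError(f"fault at pc={e.pc:#x}: initiate handler")
        self.arf[e.dst_reg] = e.value   # write results to the ARF
        self.entries[self.head] = None  # deallocate the ROB entry
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return True


arf = [0] * 8
rob = ReorderBuffer(4, arf)
i0 = rob.alloc(pc=0x100, dst_reg=1)   # I0: r1 = ...
i1 = rob.alloc(pc=0x104, dst_reg=2)   # I1: r2 = ... (independent of I0)
i2 = rob.alloc(pc=0x108, dst_reg=3)   # I2: r3 = r1 + r2
print(rob.read_operand(1, i2))        # (False, None): I0 not done yet
rob.writeback(i1, 7)                  # I1 finishes out of order
print(rob.commit())                   # False: Head (I0) is not done
rob.writeback(i0, 5)
print(rob.read_operand(1, i2), rob.read_operand(2, i2))  # (True, 5) (True, 7)
print(rob.commit(), rob.commit(), arf[1], arf[2])        # True True 5 7
```

Note that I1 writes back before I0 but cannot retire until I0 does, which is what keeps the ARF in program order.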

  3. Register Renaming Recap (ROB-based)
     • Rename Table
        • Indexed with regID; returns (valid, robIDX)
        • If valid, the ROB does/will contain the value of the register
        • If invalid, the ARF holds the value (no instruction in flight defines this register)
     • @ REN
        • Index the table with each source operand regID to locate its ROB/ARF entry
        • Update the table at the dest regID with the ROB entry assigned to the destination
     • @ Sched: get inputs from the ROB/ARF entry specified by REN; wait until all inputs are ready
     • @ CT: wait until the instruction @ Head is done
        • If fault, initiate handler
        • Else, write results to the ARF entry for the dest regID
        • Deallocate entry from the ROB
        • Invalidate the rename table entry @ dest regID iff the entry still points to the ROB entry being deallocated. Why?
     [Figure: pipeline IF → ID → Alloc → REN (in-order) → Sched → EX/MEM (any order) → CT (in-order), with the Rename Table (regID → v, robIDX) and the ARF]
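The rename table and its conditional invalidation at CT can be sketched as two Python lists (hypothetical names throughout). The example shows the case the "iff" guards against: if I0 cleared r3's mapping unconditionally when it retired, later readers of r3 would be sent to the ARF even though the younger I1 still holds the newest value of r3 in the ROB.

```python
from typing import Optional


class RenameTable:
    """Maps an architectural regID to (valid, robIDX) of its newest in-flight producer."""

    def __init__(self, num_regs: int):
        self.valid = [False] * num_regs
        self.rob_idx = [0] * num_regs

    def lookup(self, reg: int) -> Optional[int]:
        """@REN, sources: ROB index if valid, else None (value lives in the ARF)."""
        return self.rob_idx[reg] if self.valid[reg] else None

    def rename_dest(self, reg: int, rob_idx: int) -> None:
        """@REN, destination: point the register at its newly assigned ROB entry."""
        self.valid[reg] = True
        self.rob_idx[reg] = rob_idx

    def on_commit(self, reg: int, rob_idx: int) -> None:
        """@CT: invalidate iff the entry still points to the retiring ROB entry."""
        if self.valid[reg] and self.rob_idx[reg] == rob_idx:
            self.valid[reg] = False


rt = RenameTable(8)
rt.rename_dest(3, rob_idx=0)   # I0: r3 = ...
rt.rename_dest(3, rob_idx=1)   # I1: r3 = ... (younger redefinition of r3)
rt.on_commit(3, rob_idx=0)     # I0 retires; the table still points at I1
print(rt.lookup(3))            # 1
rt.on_commit(3, rob_idx=1)     # I1 retires
print(rt.lookup(3))            # None: read r3 from the ARF
```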

  4. Putting It All Together: Out-of-Order Issue
     • Goal: use ILP to get more work done, thus shortening run-time
     • Possible at compile time or run-time… trade-offs?
     • Most effective around branches, stores, and with few registers
     • H/W uses a dynamic scheduler
        • Invented at IBM in the mid-60’s
        • Also called Tomasulo’s Algorithm
        • Instructions wait in reservation stations
        • They wake up when their sources are ready
        • Instructions are selected each cycle
     [Figure: program order I1, I2, I3, I4 compared with an out-of-order schedule of the same four instructions]

  5. Dynamic Instruction Scheduling
     • Reservation Stations (RS)
        • Associative storage indexed by the phyID of dest; returns insts ready to execute
        • phyID is the ROB index of the inst that will compute the operand (used to match on broadcast)
        • Value contains the actual operand
        • Valid bits are set when the operand is available (after broadcast)
     • @ Alloc: allocate ROB storage at Tail; allocate an RS entry for the instruction
     • @ REG: get inputs from the ROB/ARF entry specified by REN; write the instruction with its available operands into the assigned RS entry
     • @ WB
        • Write the result into the ROB entry
        • Broadcast the result into the RS with the phyID of the dest register
        • Deallocate the RS entry (requires maintenance of an RS free map)
     [Figure: pipeline IF → ID → Alloc → REN → REG → EX/MEM → WB → CT, in-order at the front end and at CT, any order in between; each RS entry holds Op, dstID, and for each source V, phyID, Value; ROB and ARF]
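A rough model of the RS pool with its free map and result broadcast, assuming two source operands per entry and first-free allocation. `Src`, `RSEntry`, and `ReservationStations` are illustrative names, not a real design. Deallocation is a separate call because the slide lists it under @ WB, while some designs free the entry as soon as the instruction issues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Src:
    valid: bool                   # V bit
    phy_id: Optional[int] = None  # ROB index of the producer (matched on broadcast)
    value: Optional[int] = None   # actual operand once available


@dataclass
class RSEntry:
    op: str
    dst_id: int                   # ROB index this instruction will write
    srcs: list[Src]


class ReservationStations:
    def __init__(self, n: int):
        self.slots: list[Optional[RSEntry]] = [None] * n
        self.free = [True] * n    # RS free map

    def alloc(self, entry: RSEntry) -> int:
        """@Alloc: take the first free RS entry, or stall if none is free."""
        for i, is_free in enumerate(self.free):
            if is_free:
                self.free[i] = False
                self.slots[i] = entry
                return i
        raise RuntimeError("no free RS entry: stall allocation")

    def dealloc(self, i: int) -> None:
        """Return entry i to the free map (entries can be freed in any order)."""
        self.slots[i] = None
        self.free[i] = True

    def broadcast(self, dst_id: int, result: int) -> None:
        """@WB: every waiting source whose phyID matches captures the result."""
        for e in self.slots:
            if e is None:
                continue
            for s in e.srcs:
                if not s.valid and s.phy_id == dst_id:
                    s.valid, s.value = True, result

    def ready(self) -> list[int]:
        """Indices of occupied entries whose sources are all valid."""
        return [i for i, e in enumerate(self.slots)
                if e is not None and all(s.valid for s in e.srcs)]


rs = ReservationStations(2)
a = rs.alloc(RSEntry("add", dst_id=7, srcs=[Src(True, value=8), Src(False, phy_id=3)]))
print(rs.ready())          # []
rs.broadcast(dst_id=3, result=3)
print(rs.ready())          # [0]
rs.dealloc(a)
print(rs.free)             # [True, True]
```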

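  6. The Wakeup-Select-Execute Loop
     [Figure: RS entries, each with src1/val1, src2/val2, and dest fields and a tag comparator (=) on each source; ready entries send req to the Selection Logic, which returns grant; the granted instruction goes to EX/MEM, and at WB its dstID and result are broadcast back to the comparators of every RS entry]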

  7. Dynamic Scheduling Example
     [RS contents; each entry holds src1/val1, src2/val2, dest]
     • p41 = p52 + p43: p52 = 8 (valid); p43 waiting; dest p41
     • p42 = p41 + p43: p41 waiting; p43 waiting; dest p42
     • p43 = p51 + p50: p51 = 1 (valid); p50 = 2 (valid); dest p43; selected in cycle 1

  8. Dynamic Scheduling Example
     • p41 = p52 + p43: p52 = 8 (valid); p43 = 3 (valid); dest p41; selected in cycle 2
     • p42 = p41 + p43: p41 waiting; p43 = 3 (valid); dest p42
     • p43 = p51 + p50: selected in cycle 1; result 3 broadcast as p43

  9. Dynamic Scheduling Example
     • p41 = p52 + p43: selected in cycle 2; result 11 broadcast as p41
     • p42 = p41 + p43: p41 = 11 (valid); p43 = 3 (valid); dest p42; selected in cycle 3
     • p43 = p51 + p50: selected in cycle 1
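The example can be replayed with a small cycle-level loop. This is a sketch assuming single-cycle adds whose results are broadcast in time for dependents to be selected in the next cycle, and a select policy that scans the pool left to right; `Entry` and `run` are hypothetical names. Under those assumptions it reproduces the selection order in the example (p43, then p41, then p42) and the broadcast values 3 and 11.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity equality, so list.remove() drops exactly this entry
class Entry:
    dest: str                   # phyID this instruction produces
    src: list[str]              # source phyIDs (matched against broadcasts)
    val: list[Optional[int]]    # captured operand values; None = valid bit clear


def run(rs: list[Entry], width: int = 1) -> tuple[dict[str, int], dict[str, int]]:
    """Cycle-by-cycle wakeup-select-execute loop over an RS pool.

    Every op is treated as a single-cycle add. Consumes (empties) `rs`.
    Returns ({dest: cycle selected}, {dest: result}).
    """
    cycle_of: dict[str, int] = {}
    result_of: dict[str, int] = {}
    cycle = 0
    while rs:
        cycle += 1
        # Select: grant up to `width` ready entries, scanning left to right.
        grants = [e for e in rs if all(v is not None for v in e.val)][:width]
        if not grants:
            raise RuntimeError("no ready instruction: a source is never produced")
        for e in grants:
            rs.remove(e)                              # deallocate the RS entry
            cycle_of[e.dest] = cycle
            result_of[e.dest] = e.val[0] + e.val[1]   # execute
        # Wakeup: broadcast each (dest tag, result) to every remaining entry.
        for e in grants:
            for w in rs:
                for k, tag in enumerate(w.src):
                    if w.val[k] is None and tag == e.dest:
                        w.val[k] = result_of[e.dest]
    return cycle_of, result_of


# RS contents from the example (p52 = 8, p51 = 1, p50 = 2 are already available).
pool = [
    Entry("p41", ["p52", "p43"], [8, None]),
    Entry("p42", ["p41", "p43"], [None, None]),
    Entry("p43", ["p51", "p50"], [1, 2]),
]
cycles, results = run(pool)
print(cycles)    # {'p43': 1, 'p41': 2, 'p42': 3}
print(results)   # {'p43': 3, 'p41': 11, 'p42': 14}
```

Because the three instructions form a dependence chain, raising `width` does not change the schedule here; independent ready entries would be granted in the same cycle.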

  10. Selection Logic
     • Why do we need it?
        • More instructions may "wake up" than we have resources to execute
     • Which is the best instruction to choose?
        • The best one
           • The inst that will result in the shortest run-time, i.e., the one on the program's critical path
           • Computationally infeasible to identify this instruction
        • Random
           • A suitable baseline
        • The one closest to the left/right side of the RS pool
           • Simple to implement; only requires priority-select logic
           • Similar to random, due to out-of-order deallocation of RS entries
        • Oldest first (inst closest to the Head of the ROB)
           • Slightly more complicated to implement than the random techniques
           • Usually a good choice: the oldest instructions tend to be long-latency instructions or instructions with many output dependencies
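Three of the policies above, sketched as functions over a ready vector with one bit per RS entry. The oldest-first version assumes each RS entry records the ROB index of its instruction and measures age as the distance from Head modulo the ROB size; all function and parameter names are hypothetical.

```python
import random


def select_position(ready: list[bool], width: int) -> list[int]:
    """Priority select: grant the ready RS entries closest to the left of the pool."""
    return [i for i, r in enumerate(ready) if r][:width]


def select_random(ready: list[bool], width: int, rng: random.Random) -> list[int]:
    """Random baseline: grant up to `width` ready entries chosen uniformly."""
    cands = [i for i, r in enumerate(ready) if r]
    return rng.sample(cands, min(width, len(cands)))


def select_oldest(ready: list[bool], rob_idx: list[int], head: int,
                  rob_size: int, width: int) -> list[int]:
    """Oldest first: grant ready entries whose ROB index is closest to the ROB Head.

    Age is the distance from Head modulo the ROB size, so the ordering stays
    correct after the circular ROB wraps around.
    """
    cands = [i for i, r in enumerate(ready) if r]
    cands.sort(key=lambda i: (rob_idx[i] - head) % rob_size)
    return cands[:width]


ready = [False, True, True, True]
rob_idx = [0, 3, 6, 5]    # ROB index of the instruction in each RS entry
print(select_position(ready, width=2))                             # [1, 2]
print(select_oldest(ready, rob_idx, head=5, rob_size=8, width=2))  # [3, 2]
print(select_random(ready, width=2, rng=random.Random(0)))
```

In hardware the position-based policy is a priority encoder over the request lines, whereas oldest first needs an age comparison among the requesters, which is where its extra cost comes from.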

  11. Window Size vs. Clock Speed
     • Increasing the number of RS entries [Brainiac]
        • Longer broadcast paths
        • Thus more capacitance, and slower signal propagation
        • More ILP extracted
     • Decreasing the number of RS entries [Speed Demon]
        • Shorter broadcast paths
        • Thus less capacitance, and faster signal propagation
        • Less ILP extracted
     • Which approach is better and when?
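One common way to frame this trade-off is the processor performance equation (the "iron law"): a larger window (Brainiac) aims to raise IPC, while a smaller one (Speed Demon) aims to raise the clock frequency. The numbers in the second line are purely hypothetical, chosen only to show that an IPC gain can be outweighed by a larger frequency loss.

```latex
\text{Run time} = N_{\text{inst}} \times \frac{1}{\text{IPC}} \times \frac{1}{f_{\text{clk}}}
\quad\Longrightarrow\quad
\text{Performance} \propto \text{IPC} \times f_{\text{clk}}

\text{Hypothetical: } +10\%\ \text{IPC},\ -15\%\ f_{\text{clk}}:\quad
1.10 \times 0.85 = 0.935 \;\;(\approx 6.5\%\ \text{slower})
```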

  12. Cross-cutting Issue: Misspeculation
     • What are the impacts of misspeculation or exceptions?
        • When instructions are flushed from the pipeline, the RS entries they occupied must be reclaimed
        • Otherwise, storage leaks in the microarchitecture
     • Typical recovery approach
        • Checkpoint the RS free map at potential fault/misspeculation points
        • Recover the RS free map associated with the recovery PC

  13. Discussion Points
     • What about memory dependencies?
     • We can deallocate RS entries out-of-order (which improves RS utilization); why not allocate them out-of-order as well?
     • If we didn’t rename the registers, would the dynamic scheduler still work?
     • Could the wakeup-select-execute loop be reduced to a wakeup-select loop with parallel execute?
