1 / 16

Lecture 19: Core Design

Lecture 19: Core Design. Today: issue queue, ILP, clock speed, ILP innovations. Wakeup Logic. tag1. tagIW. …. =. =. or. or. rdyL. tagL. tagR. rdyR. rdyL. tagL. tagR. rdyR. Selection Logic. Issue window. req. grant. enable. anyreq. Arbiter cell. enable.

Download Presentation

Lecture 19: Core Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 19: Core Design • Today: issue queue, ILP, clock speed, ILP innovations

  2. Wakeup Logic tag1 tagIW … = = or or rdyL tagL tagR rdyR . . . . . . rdyL tagL tagR rdyR

  3. Selection Logic Issue window req grant enable anyreq Arbiter cell enable • For multiple FUs, will need sequential selectors

  4. Structure Complexities • Critical structures: • register map tables, issue queue, LSQ, register file, • register bypass • Cycle time is heavily influenced by: • window size (physical register size), issue width (#FUs) • Conflict between the desire to increase IPC and clock speed • Can achieve both if we use large structures and deep • pipelining; but, some structures can’t be easily pipelined and • long-latency structures can also hurt IPC

  5. Deep Pipelines • What does it mean to have •  2-cycle wakeup •  2-cycle bypass •  2-cycle regread

  6. Scaling Options 2-cycle wakeup 2-cycle regread 2-cycle bypass Pipeline Scaling 20-IQ F 20-IQ F F F 40 Regs 40 Regs F F F F Capacity Scaling 15-IQ F F Replicated Capacity Scaling 30 Regs F 15-IQ F 15-IQ F F F 30 Regs 30 Regs F F

  7. Recent Trends • Not much change in structure capacities • Not much change in cycle time • Pipeline depths have become shorter (circuit delays have • reduced); this is good for energy efficiency • Optimal performance is observed at about 50 pipeline • stages (we are currently at ~20 stages for energy reasons) • Deep pipelines improve parallelism (helps if there’s ILP); • Deep pipelines increase the gap between dependent • instructions (hurts when there is little ILP)

  8. ILP Limits Wall 1993

  9. Techniques for High ILP • Better branch prediction and fetch (trace cache) •  cascading branch predictors? • More physical registers, ROB, issue queue, LSQ •  two-level regfile/IQ? • Higher issue width •  clustering? • Lower average cache hierarchy access time • Memory dependence prediction • Latency tolerance techniques: ILP, MLP, prefetch, runahead, • multi-threading

  10. Impact of Mem-Dep Prediction • In the perfect model, loads only wait for conflicting • stores; in naïve model, loads issue speculatively and must • be squashed if a dependence is later discovered From Chrysos and Emer, ISCA’98

  11. Clustering Reg-rename & Instr steer IQ IQ Regfile Regfile F F F F 40 regs in each cluster p21  p2 + p3 p22  p21 + p2 p42  p21 p41  p56 + p57 p43  p42 + p41 r1  r2 + r3 r4  r1 + r2 r5  r6 + r7 r8  r1 + r5 r1 is mapped to p21 and p42 – will influence steering and instr commit – on average, only 8 replicated regs

  12. 2Bc-gskew Branch Predictor BIM Address Pred G0 Vote Address+History G1 Meta 44 KB; 2-cycle access; used in the Alpha 21464

  13. Rules • On a correct prediction • if all agree, no update • if they disagree, strengthen correct preds and chooser • On a misprediction • update chooser and recompute the prediction • on a correct prediction, strengthen correct preds • on a misprediction, update all preds

  14. Runahead Mutlu et al., HPCA’03 Trace Cache Current Rename IssueQ Regfile (128) Checkpointed Regfile (32) ROB L1 D Runahead Cache Retired Rename FUs • When the oldest instruction is a cache miss, behave like it • causes a context-switch: • checkpoint the committed registers, rename table, return • address stack, and branch history register • assume a bogus value and start a new thread • this thread cannot modify program state, but can prefetch

  15. Memory Bottlenecks • 128-entry window, real L2  0.77 IPC • 128-entry window, perfect L2  1.69 • 2048-entry window, real L2  1.15 • 2048-entry window, perfect L2  2.02 • 128-entry window, real L2, runahead  0.94

  16. Title • Bullet

More Related