“Flea-flicker” Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense

Ronald Barnes, George Mason University
Shane Ryoo and Wen-mei Hwu, University of Illinois Urbana-Champaign
Dr. Ronald D. Barnes, Department of Electrical and Computer Engineering

Dynamic scheduling approach:
• Tolerating memory latency and finding ILP at runtime comes at a heavy cost
• Aggressive out-of-order execution is incompatible with overriding power and power-density concerns
  • ALPHA21264: 18% of chip power, as much as int + fp exec
  • POWER4: 10% of core power, scheduler highest power density
• Power concerns are steering development towards efficiency rather than a wide instruction window (Pentium M)
In-order approach:
• Rely on compiler-planned execution
• Compiler techniques (e.g. prefetching) do not solve the problem of unanticipated memory latency


Compiler Expressed Parallelism
• Compiler can find a significant number of instructions for parallel execution on a 6-issue processor

Compiler Expressed Parallelism
• Dynamic stalls (of which cache misses are most important [Sias04]) drastically reduce observed performance
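The gap between compiler-planned and observed performance can be illustrated with a back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a measurement from the presentation: the function name, the planned IPC, the miss rate, and the miss penalty are all hypothetical, and every miss is assumed to stall the in-order pipeline for its full latency with no overlap.

```python
def effective_ipc(planned_ipc, misses_per_kilo_inst, miss_penalty_cycles):
    """Observed IPC when every cache miss fully stalls an in-order pipeline.

    All three parameters are hypothetical inputs chosen for illustration.
    """
    planned_cpi = 1.0 / planned_ipc                      # cycles per instruction in the static schedule
    stall_cpi = misses_per_kilo_inst / 1000.0 * miss_penalty_cycles  # added stall cycles per instruction
    return 1.0 / (planned_cpi + stall_cpi)


# Hypothetical numbers: a schedule planned at 2 IPC, 5 misses per
# 1000 instructions, 200 cycles per miss.
observed = effective_ipc(2.0, 5, 200)
print(f"planned IPC 2.00, observed IPC {observed:.2f}")
```

Under these assumed numbers the stall term (1.0 cycle per instruction) is twice the planned schedule cost (0.5 cycles per instruction), so the stalls, not the schedule, dominate execution time.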

In-order runahead performance

Benefits of multipass approach

Key Multipass Contributions
• Advance restart allows processing of newly woken instructions
  • Initial implementation relies on compiler-controlled restart
  • No expensive, fine-grain wakeup mechanism is needed
• Re-use makes results of independent instructions persistent
  • Improves efficiency (no re-computation)
  • Hides long latency operations
• Instruction regrouping allows schedule-height reduction without reordering instructions
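The value of persistent advance results can be seen in a toy cycle-count model. The sketch below is not the authors' simulator: it assumes a single-issue pipeline, a trace in which every register is written exactly once, one advance pass per stall (no advance restart), and no instruction regrouping, so a reused instruction still spends one issue slot. The `Inst` type, both function names, and the 100-cycle miss latency are hypothetical.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Inst:
    dest: str       # destination register (assumed written only once in the trace)
    srcs: tuple     # source registers
    latency: int    # 1 = ALU op or cache hit, large = cache miss (hypothetical values)


def in_order_cycles(prog):
    """Single-issue in-order pipeline that stalls on the first use of a missing value."""
    avail, cycle = {}, 0
    for inst in prog:
        issue = max([cycle] + [avail.get(s, 0) for s in inst.srcs])
        avail[inst.dest] = issue + inst.latency
        cycle = issue + 1
    return max([cycle] + list(avail.values()))


def multipass_cycles(prog):
    """Same pipeline, but stall cycles are spent pre-executing later instructions
    whose sources are valid; those results are kept and reused afterwards."""
    all_dests = {inst.dest for inst in prog}
    avail, preserved, cycle = {}, set(), 0
    for i, inst in enumerate(prog):
        if i in preserved:
            cycle += 1          # result reused: only an issue slot is spent
            continue
        ready_at = max([cycle] + [avail.get(s, 0) for s in inst.srcs])
        t, j = cycle + 1, i + 1
        while t < ready_at and j < len(prog):      # advance mode during the stall
            nxt = prog[j]
            valid = all(s not in all_dests or avail.get(s, inf) <= t
                        for s in nxt.srcs)
            if valid and j not in preserved:
                avail[nxt.dest] = t + nxt.latency  # executed early, result preserved
                preserved.add(j)
            # otherwise the instruction is deferred until normal execution reaches it
            t, j = t + 1, j + 1
        avail[inst.dest] = ready_at + inst.latency
        cycle = ready_at + 1
    return max([cycle] + list(avail.values()))


trace = [
    Inst("r1", (), 100),       # load that misses
    Inst("r2", ("r1",), 1),    # first use of the miss: the in-order pipeline stalls here
    Inst("r3", (), 100),       # independent load that also misses
    Inst("r4", ("r3",), 1),    # use of the second load
]
print(in_order_cycles(trace), multipass_cycles(trace))
```

In this trace the second miss is started during the stall on the first, so the two miss latencies overlap instead of being paid back to back. The model deliberately omits advance restart, which would give deferred instructions another chance to execute once their inputs arrive.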

Implementation cost of Multipass
• Speculative memory state discussed in paper

Experimental configuration
• Benchmarks compiled with IMPACT C compiler using control-flow profiling and interprocedural alias analysis
• Simulator augmented with power models of array structures

Comparison with Out-of-Order Execution

Overheads of Out-of-Order execution
• Register renaming hardware to overcome output and anti-dependencies
• Complex scheduling table to issue instructions as dependencies are met
• Increase in pipeline length
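To make the first overhead concrete, here is a minimal sketch of what register renaming does: each write gets a fresh physical register, so a later write to the same architectural register no longer conflicts with earlier readers or writers. The function name, the tuple encoding of instructions, and the `p0, p1, ...` naming are assumptions made for this illustration. A real renamer also needs a free list and recovery on misprediction, which is part of the hardware cost.

```python
def rename(program):
    """program: list of (dest, srcs) pairs over architectural register names.

    Returns the same instructions with every destination mapped to a fresh
    physical register and every source mapped to the latest producer.
    Live-in registers (never written earlier in the list) keep their names.
    """
    mapping = {}     # architectural register -> current physical register
    renamed = []
    for n, (dest, srcs) in enumerate(program):
        new_srcs = tuple(mapping.get(s, s) for s in srcs)  # read the old mapping first
        phys = f"p{n}"                                     # fresh register per write
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed


example = [
    ("r1", ("r2", "r3")),   # r1 = r2 op r3
    ("r4", ("r1",)),        # reads r1 (true dependence, must be kept)
    ("r1", ("r5",)),        # rewrites r1: output dep. on inst 0, anti-dep. on inst 1
]
print(rename(example))
```

After renaming, the third instruction writes `p2` rather than the register read by the second instruction, so only the true dependence between the first two instructions remains.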

Power Ratio Comparison
• Sequential, in-order access gives multipass structures their advantage

Related approaches
• In-order runahead [Dundas97]; runahead to extend out-of-order window [Mutlu03]
  • Checkpoint and repair run-ahead execution
  • All “pre-execution” results are thrown away
• Subordinate microthreads [Chappel99]; speculative precomputation [Collins01]
  • Helper threads initiate memory accesses early
• Two-pass pipelining [Barnes03]
  • In-order advance execution on a separate, tightly-coupled pipeline

Conclusions
• Multipass execution provides a cache-miss latency tolerant microarchitecture
• Advance restart facilitates the execution of independent, newly ready instructions
  • Initial implementation uses compiler direction
• Instruction regrouping achieves significant speedup by increasing “rally” mode throughput
• Future work
  • Microarchitectural mechanism for controlling advance restart
  • Examination of tradeoffs between continuing (perhaps with prediction) vs. restarting advance execution
  • Partial reuse of results
