
Exploiting Postdominance for Speculative Parallelization



Presentation Transcript


  1. Exploiting Postdominance for Speculative Parallelization • Mayank Agarwal, Kshitiz Malik, Kevin Woley, Sam Stone, Matthew Frank • Implicitly Parallel Architectures Group, University of Illinois at Urbana-Champaign • Originally in HPCA-13 • Modified and presented by Borys Bradel

  2. Outline • Motivation • Introduction • PolyFlow Architecture • Evaluations • Conclusions CARG March 14, 2007

  3. Speculative Parallelization • Parallelize single-threaded applications • Dynamically break execution into concurrent tasks • Multi-threaded and multi-core systems • Maintain sequential semantics • [Figure: sequential tasks A, B, C, D distributed across PU1 to PU4]

  4. Task Extraction Policies • Identify possible points for task creation • Critical to successful parallelization • Desirable features • Large set of possible tasks • Restrict amount of speculation • Exploit different kinds of parallelism • Work for varying application behaviors

  5. Limitations of Branch Prediction • Branch mispredictions limit the exploitable amount of ILP • Superscalars discard all instructions fetched after a mispredicted branch • Not all need to be discarded • Immediate postdominator • Earliest control-equivalent point • Control flow guaranteed to reconverge at E • [Figure: control-flow graph with blocks A through F]

  6. Control-Equivalent Spawning • Start new task at the immediate postdominator of a branch • Spawn E as a new task at B • Control-equivalent to B • Main thread can speculate past B • Spawned thread is as (control) speculative as branch B • [Figure: control-flow graph A through F, with B marked as spawner and E as spawnee]
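The spawn point on slide 6 can be reproduced with a small postdominator computation. The following is a minimal sketch, not the compiler pass used by the authors. The edge list in CFG is an assumption read off the slide text (B is the branch, C and D are its two arms, E is the join, F is the exit), and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Assumed control-flow graph for the A..F example (successor lists).
CFG: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["C", "D"],   # B ends in the branch
    "C": ["E"],
    "D": ["E"],
    "E": ["F"],
    "F": [],           # exit block
}

def postdominators(succ: Dict[str, List[str]], exit_node: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def immediate_postdominator(pdom: Dict[str, Set[str]], node: str) -> Optional[str]:
    """The closest strict postdominator: the one whose own set equals all strict ones."""
    strict = pdom[node] - {node}
    for candidate in strict:
        if pdom[candidate] == strict:
            return candidate
    return None  # the exit block has no postdominator

PDOM = postdominators(CFG, "F")
SPAWN_POINT_FOR_B = immediate_postdominator(PDOM, "B")  # "E" under the assumed edges
```

Under these assumed edges, the task spawned when the main thread reaches B starts at E, whichever of C or D the branch eventually resolves to.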

  7. Control-Equivalent Spawning • [Figure: control-flow graph A through F and two execution timelines across PU1 to PU3, labeled Task Spawn (at B), Spawned Task (E, F), Resolve Mispredict (C, D), and Reconnect]

  8. Managing Data Dependences • Spawned tasks • Control-equivalent to spawner • Data dependent • Restrict data speculation • Delay dependent instructions • Register and memory • Until data becomes available • Independent instructions can execute in parallel • [Figure: spawner containing Branch, Prod1, Prod2; spawned task containing Cons1, Cons2, Cons3]
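The delay rule on slide 8 can be illustrated with a toy model. This is only a sketch under strong assumptions: it treats the set of locations the spawner has yet to produce as known up front, whereas the slides say PolyFlow learns dependences. The instruction tuples, register names, and the function split_spawned_task are hypothetical.

```python
from typing import List, Set, Tuple

# (name, locations read, locations written); locations stand for registers or memory.
Instr = Tuple[str, Set[str], Set[str]]

def split_spawned_task(pending: Set[str], task: List[Instr]) -> Tuple[List[str], List[str]]:
    """Partition a spawned task into instructions that may run now and ones to delay."""
    unavailable = set(pending)        # values the spawner has not produced yet
    ready: List[str] = []
    delayed: List[str] = []
    for name, reads, writes in task:
        if reads & unavailable:
            delayed.append(name)
            unavailable |= writes     # results of a delayed instruction are also late
        else:
            ready.append(name)
            unavailable -= writes     # a local write supersedes the spawner's value
    return ready, delayed

# Prod1 and Prod2 in the spawner are assumed to write r1 and r2.
TASK: List[Instr] = [
    ("Cons1", {"r1"}, {"r3"}),
    ("Indep", {"r4"}, {"r5"}),
    ("Cons2", {"r2"}, {"r6"}),
    ("Cons3", {"r3"}, {"r7"}),
]
READY, DELAYED = split_spawned_task({"r1", "r2"}, TASK)
```

Here only Indep runs in parallel with the spawner. Cons3 is delayed transitively, because it reads a value produced by the delayed Cons1.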

  9. Control-Equivalent Parallelization • Spawn immediate postdominator of branch • Task control-equivalent to spawner • Benefits • Subsumes heuristics based on program structures • Better performance than hybrid heuristic policies • Amenable to dynamic implementations

  10. Outline • Motivation • Introduction • PolyFlow Architecture • Evaluations • Conclusions

  11. Immediate Postdominator Spawns • Broad classification into 4 categories: • Hammocks • Loop fall-throughs • Procedure fall-throughs • Others

  12. Imm PDom Hammocks • A ends in if-then-else branch • D postdominates A • Upon reaching A • Spawn new task starting at D • Main task resolves branch • Merits • Spawns across mispredicts • Finds useful work beyond mispredicts • Parallelize inner loops • Not directly exploited in most systems • [Figure: control-flow graph A through E with main task and spawned task marked]

  13. Imm PDom Loop Fall-Throughs • D ends in a loop branch • Upon reaching D • Start new task at E • Main task executes loop • New task executes fall-through • Merits • Exploit parallelism in outer loops • Reduce wastage from mispredicted loop branch • [Figure: control-flow graph A through E with main task and spawned task marked]

  14. Imm PDom Procedure Fall-Throughs • C postdominates call instruction • Upon reaching B • Spawn new task at C • Main task executes procedure • New task executes fall-through • Merits • Spawns tasks in distant regions • Warms up ICache • [Figure: blocks A, B (call x), C and procedure X, with main task and spawned task marked]

  15. Others • Remaining immediate postdominators • Postdominators of indirect calls and jumps • Complex control flow • ~5-10% of static postdominators • Important in several programs
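The four categories from slides 11 to 15 can be approximated by looking at the instruction that ends the spawning block. The sketch below is a rough heuristic and not the authors' classifier: it assumes a backward conditional branch is a loop branch and a forward one opens a hammock, which misclassifies cases such as early loop exits. The opcode strings and the function classify_spawn are hypothetical.

```python
from enum import Enum
from typing import Optional

class SpawnKind(Enum):
    HAMMOCK = "hammock"
    LOOP_FALL_THROUGH = "loop fall-through"
    PROCEDURE_FALL_THROUGH = "procedure fall-through"
    OTHER = "other"

def classify_spawn(op: str, pc: int, target: Optional[int]) -> SpawnKind:
    """Classify a spawn point by the control instruction at `pc`.

    `op` is one of the assumed strings "call", "cond_branch", "indirect_call",
    "indirect_jump"; `target` is the static branch target when one exists.
    """
    if op == "call":
        return SpawnKind.PROCEDURE_FALL_THROUGH
    if op == "cond_branch" and target is not None:
        # Assumption: backward target means loop branch, forward means hammock.
        return SpawnKind.LOOP_FALL_THROUGH if target <= pc else SpawnKind.HAMMOCK
    return SpawnKind.OTHER  # indirect calls and jumps, complex control flow
```

For example, classify_spawn("cond_branch", 0x400, 0x3c0) falls into the loop fall-through category, while an indirect jump falls into the others category.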

  16. Dynamic Spawn Distribution • Hammock and Others constitute ~65% of dynamic spawns • Not captured by most speculative parallelization systems

  17. Twolf new_dbox_a • [Figure: execution timeline across Processors 1 to 5, with spawns at 9dbc, 9dc8, 9dd8, and 9dec]

  18. Outline • Motivation • Introduction • PolyFlow Architecture • Evaluations • Conclusions

  19. PolyFlow • [Figure: pipeline diagram with Task Spawn Unit (if (nextPC==x) spawn y), Fetch PC 1-8, Unified Scheduler, Divert Queue, Execute 1-8, Flush, Retire]

  20. The PolyFlow Architecture • Speculative parallelization system • Current evaluations on wide SMT core • Extend SMT system with task spawn unit • Manage task spawn, reconnection • Learn dependence and handle misspeculation • Use compiler-generated postdominators • Passed as hints to dynamic system • Stored in a separate “spawn hint cache”

  21. Evaluation Environment • Baseline Superscalar • 8-wide fetch/issue OOO core • 64-entry scheduler, 512-entry ROB • 8K 2-way assoc L1 ICache, 16K 4-way assoc L1 DCache • 512K 8-way assoc L2 Cache • Speculative Parallelization System • 8-context SMT

  22. Limitations • Each thread can spawn one successor • Only outermost branch in if-else nest • 512 entries in reorder buffer • Cannot reclaim resources • Limits parallelism • Superscalar: fetch 1 taken branch per cycle • PolyFlow: fetch from 2 tasks per cycle, 1 taken branch per cycle
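The spawn rule from slide 19 (if nextPC == x, spawn y), the spawn hint cache from slide 20, and the one-successor limit above can be combined in a small behavioral model. This is a sketch under assumptions, not the PolyFlow hardware: the hint cache is modeled as a plain dictionary, contexts are never freed, and the class name, method name, and addresses are hypothetical.

```python
from typing import Dict, Optional, Set

class TaskSpawnUnit:
    """Toy model: look up each next PC in a hint table and decide whether to spawn."""

    def __init__(self, hint_cache: Dict[int, int], contexts: int = 8) -> None:
        self.hint_cache = hint_cache           # maps trigger PC x to spawn PC y
        self.free_contexts = contexts - 1      # one context runs the initial task
        self.already_spawned: Set[int] = set() # tasks that have used their one spawn

    def on_next_pc(self, task_id: int, next_pc: int) -> Optional[int]:
        """Return the PC of the new task, or None if no spawn happens."""
        spawn_pc = self.hint_cache.get(next_pc)
        if spawn_pc is None:
            return None                        # no hint for this PC
        if task_id in self.already_spawned or self.free_contexts == 0:
            return None                        # one successor per task, limited contexts
        self.already_spawned.add(task_id)
        self.free_contexts -= 1
        return spawn_pc
```

With a hint table of {0x100: 0x140}, the first time task 0 fetches 0x100 the model returns 0x140, and a second fetch of 0x100 by the same task returns None because that task has already spawned its successor.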

  23. Outline • Motivation • Introduction • PolyFlow Architecture • Evaluations • Conclusions

  24. Individual Spawn Heuristics • No single heuristic suitable for all applications • Control-equivalent spawning performs well overall • [Figure legend: FT = fall-through]

  25. Hybrid Spawn Policies

  26. Dynamic Implementation • Dynamic Reconvergence Analysis* • Learns immediate postdominators dynamically • Trains quickly • Can drive control-equivalent spawning • Spawn reconvergence point of branches • Alternative to compiler hints • * J. D. Collins et al., Control Flow Optimization Via Dynamic Reconvergence Prediction, MICRO 2004
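The idea of learning a reconvergence point from observed execution can be shown with a deliberately simplified sketch. It is not the predictor from the cited MICRO 2004 paper: it assumes one recorded PC trace for each outcome of a branch and takes the first PC common to both as the estimate. The function name and addresses are hypothetical.

```python
from typing import Optional, Sequence

def first_common_pc(taken_path: Sequence[int], fall_path: Sequence[int]) -> Optional[int]:
    """Estimate a branch's reconvergence point from one trace per outcome."""
    seen_on_fall_path = set(fall_path)
    for pc in taken_path:
        if pc in seen_on_fall_path:
            return pc          # earliest PC on the taken path that both paths reach
    return None                # no reconvergence observed in these traces

# Hypothetical traces: the two arms differ, then both reach 0x30.
ESTIMATE = first_common_pc([0x10, 0x14, 0x30, 0x34], [0x20, 0x30, 0x34])
```

An estimate obtained this way could stand in for a compiler-generated hint as the spawn target of the branch, which is the role the slide assigns to dynamic reconvergence analysis.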

  27. Outline • Motivation • Introduction • PolyFlow Architecture • Evaluations • Conclusions

  28. Conclusions • Control-Equivalent Spawning • Reduces control speculation in spawned tasks • Generalizes common heuristics • For an SMT-based system • Over twice the speedups of best heuristics • Better than an aggressive hybrid policy • Amenable to dynamic implementations

  29. Thank You
