1 / 20

Frequently executed path Not frequently executed path Hard to predict path

A. A. Diverge Branch. Hard to predict. nop. B. C. B. p1. D. C. !p1. E. E. F. G. Insert select- µ ops ( φ -nodes SSA). H. CFM point. H. Frequently executed path Not frequently executed path Hard to predict path. A. A. A. A. A. A.

emmett
Download Presentation

Frequently executed path Not frequently executed path Hard to predict path

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A A Diverge Branch Hard to predict nop B C B p1 D C !p1 E E F G Insert select-µops (φ-nodes SSA) H CFM point H Frequently executed path Not frequently executed path Hard to predict path

  2. A A A A A A Diverge-Merge Processor (DMP) [MICRO 2006, TOP PICKS 2007] A C B D E F G H Frequently executed path Not frequently executed path diverge-branch executed block CFM point

  3. Profile-Assisted Compiler Supportfor Dynamic Predicationin Diverge-Merge Processors Hyesoon Kim José A. Joao Onur Mutlu* Yale N. Patt HPS Research Group University of Texas at Austin *

  4. A A A A A . . . . . . . . . . . simple hammock nested hammock frequently-hammock loop non-merging Control-Flow Graphs 66% of mispredicted branches can be dynamically predicated by DMP. Exact CFM points Approximate CFM points 4

  5. Diverge-Merge Processor (DMP) • DMP can dynamically predicate complex branches (in addition to simple hammocks). • The compileridentifies • Diverge branches • Control-flow merge (CFM) points • The microarchitecturedecideswhen and what to predicate dynamically. 5

  6. Why hardware and compiler? • Compiler-centric solution (static predication): predicated ISA, not adaptive, applicable to limited CFG. • Microarchitecture-only solution: complex, expensive, limited in scope. • Compiler-microarchitecture interaction • Each one does what it is good at. 6

  7. Compiler Support Analysis: Identify Diverge Branch Candidates and CFM points Select Diverge Branches and CFM points Code generation: mark the selected diverge branches and CFM points (ISA extensions) 7

  8. A CFM point = IPOSDOM(A) Simple/Nested Hammocks: Alg-exact 8

  9. CFM point candidate ! Frequently-Hammocks: Alg-freq • Use edge-profiling frequencies A pT(X): conditional probability of reaching basic block X if branch A is taken NT T 0.8 0.2 pNT(X): conditional probability of reaching basic block X if branch A is not taken 0.1 0.9 0.5 0.5 • Stop at IPOSDOM or MAX_INSTR • Compute pMerging(X) = pT(X) * pNT(X) 1 1 1 1 pT(H)=1 H pMerging(H) = 1 * 0.5 = 0.5 pNT(H)=0.5 9

  10. Compiler Support Analysis: Identify Diverge Branch Candidates and CFM points  • Simple/nested hammocks • Frequently-hammocks • Chains of CFM points • Short hammocks • Return CFM points • Diverge loop branches Select Diverge Branches and CFM points Code generation: mark the selected diverge branches and CFM points (ISA extensions) 10

  11. Compiler Support Analysis: Identify Diverge Branch Candidates and CFM points  Select Diverge Branches and CFM points • Heuristics • Cost-benefit model Code generation: mark the selected diverge branches and CFM points (ISA extensions) 11

  12. Heuristics-Based Selection Do not select: • CFM points too far from the diverge-branch • Hammocks with too many branches on each path • Approximate CFM points with a low probability of merging Motivation: minimize overhead: wrong path maximize benefit: control-independence 12

  13. Cost-Benefit Model-Based Selection High - - - Same Select if cost < 0 Overhead None Lose Inaccurate Low Accurate possibly WIN Overhead No flush A = accuracy of the conf. estimator = mispredicted / low_conf. cost = (1 – A) x overhead + A x (overhead – misprediction_penalty) 13

  14. Compiler Support Analysis: Identify Diverge Branch Candidates and CFM points  Select Diverge Branches and CFM points  • Heuristics • Cost-benefit model Code generation: mark the selected diverge branches and CFM points (ISA extensions)  14

  15. Methodology • All compiler algorithms implemented on a binary analysis and annotation toolset. • Cycle-accurate execution-driven simulation of a DMP processor: • Alpha ISA • Processor configuration • 16KB perceptron predictor • Minimum 25-cycle branch misprediction penalty • 8-wide, 512-entry instruction window • 2KB 12-bit history enhanced JRS confidence estimator • 32 predicate registers, 3 CFM registers • 12 SPEC CPU 2000 INT, 5 SPEC 95 INT 15

  16. 70 exact 60 exact+freq exact+freq+short 50 exact+freq+short+ret exact+freq+short+ret+loop 40 IPC delta (%) 30 20 10 0 li go vpr gcc eon gap mcf gzip twolf ijpeg bzip2 comp crafty vortex parser hmean perlbmk m88ksim Heuristic-Based Selection 20.4% 16

  17. Cost-Benefit-Based Selection The cost-benefit model is simpler and effective 20.2% 17

  18. Input Set Effects 19.8% 18

  19. Conclusion • Compiler-microarchitecture interaction is good! • DMP exploits frequently-hammocks. • We developed new algorithms that select beneficial diverge-branches and CFM points. • We proposed a new cost-benefit model for dynamic predication. • DMP and our algorithms improve performance by 20%. 19

  20. Thank You! Questions?

More Related