Feedback-directed optimizations with estimated edge profiles from hardware event sampling

Open64 workshop, CGO 2008, April 6, 2008. Vinodha Ramasamy, Robert Hundt (Google Inc.); Dehao Chen, Wenguang Chen (Tsinghua University).


Presentation Transcript


  1. Feedback-directed optimizations with estimated edge profiles from hardware event sampling
     Open64 workshop, CGO 2008, April 6, 2008
     Vinodha Ramasamy, Robert Hundt (Google Inc.); Dehao Chen, Wenguang Chen (Tsinghua University)

  2. Background
     • Traditional FDO model: Instrument – Run – Recompile
     • Usage model
       • Difficulties in generating representative training datasets
       • High overhead of profile collection
       • Requires dual compilation: tightly coupled builds
     • Benefits
       • Supports both value and edge profiling
       • High performance potential
     [Figure: traditional FDO flow. Labels: INSTRUMENTATION BUILD, INSTRUMENTED BINARY, TRAINING DATA, PROFILE DATA, FDO BUILD, OPTIMIZED BINARY]

  3. Our methodology
     • Skip the instrumentation step
     • Use INST_RETIRED event samples for feedback
     • Source position information used to correlate samples to basic blocks
     • Generate traditional edge profiles from basic block samples
     • Feedback data stored in same data structures as instrumented FDO
     • Leverage feedback-directed optimizations, validation and propagation
     [Figure: Overview. Labels: INPUT DATA, SAMPLE PROFILE, FDO BUILD, OPTIMIZED BINARY]

  4. Algorithm
     • Basic block counts
       • Scale samples per source line by # of instructions
       • Samples per source line stored in profile datafile
       • Annotate IR statements in basic blocks with source line sample counts
       • Scale basic block sample count: BB.count = (∑ IR.count) / num_IR_stmts
     Example, source line pbla.c:60:
       iplus = iplus->pred;   // 280 ÷ 4 = 70
       100 : 804a8b7: mov    0x10(%ebp),%eax
        30 : 804a8ba: mov    0x8(%eax),%eax
        70 : 804a8bd: mov    %eax,0x10(%ebp)
        80 : 804a8c0: jmp    804a94b <primal_iminus+0x137>
     Example, basic block with five IR statements:
       IR1 = 70, IR2 = 10, IR3 = 70, IR4 = 0, IR5 = 0
       ∑IR.count = 70 + 10 + 70 + 0 + 0 = 150
       BB.count = 150 ÷ 5 = 30
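The two scaling steps on this slide can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only, using the numbers from the slide; the function names `line_count` and `bb_count` are illustrative and do not come from the presentation or from Open64.

```python
def line_count(instr_samples):
    """Scale the samples of one source line by its number of instructions."""
    if not instr_samples:
        return 0.0
    return sum(instr_samples) / len(instr_samples)


def bb_count(ir_counts):
    """BB.count = (sum of IR.count) / num_IR_stmts, as on the slide."""
    if not ir_counts:
        return 0.0
    return sum(ir_counts) / len(ir_counts)


# pbla.c:60 has four instructions with 100, 30, 70 and 80 samples.
line_60 = line_count([100, 30, 70, 80])

# A basic block whose five IR statements carry these line-level counts.
block = bb_count([70, 10, 70, 0, 0])

print(line_60, block)
```

With the slide's inputs, `line_60` is 70 (280 ÷ 4) and `block` is 30 (150 ÷ 5).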

  5. Edge frequency estimation
     • Edge counts from basic block counts
     • Uses higher-level program structure: branch, loop, etc.
     • Recursive algorithm used to smooth sample counts
     [Figure: control-flow graph with sample counts before → after smoothing; a stray label "500" also appears in the figure]
       ENTRY: 0 → 500
       BODY:  0 → 7954
       BR:    7954 → 7954
       T:     7922 → 7922
       NT:    30 → 32
       JOIN:  420 → 7954
       BACK:  0 → 7454
       EXIT:  0 → 500
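The slide does not spell out the recursive smoothing algorithm, so the sketch below shows only the simpler, related step of deriving edge counts from already-smoothed block counts by flow conservation (the count of a block equals the sum of its incoming edges and the sum of its outgoing edges). The graph shape is an assumption reconstructed from the figure labels, every label is treated as a block, and `estimate_edges` is a hypothetical name, not the presenters' implementation.

```python
def estimate_edges(block_counts, edges):
    """Solve edge counts by flow conservation.

    Whenever all but one of a block's incoming (or outgoing) edges are
    known, the remaining edge is the block count minus the known ones.
    """
    known = {}
    changed = True
    while changed:
        changed = False
        for block, count in block_counts.items():
            outgoing = [e for e in edges if e[0] == block]
            incoming = [e for e in edges if e[1] == block]
            for group in (outgoing, incoming):
                unknown = [e for e in group if e not in known]
                if len(unknown) == 1:
                    solved = count - sum(known[e] for e in group if e in known)
                    known[unknown[0]] = max(solved, 0)  # clamp sampling noise
                    changed = True
    return known


# Smoothed block counts taken from the "after" side of the figure.
blocks = {"ENTRY": 500, "BODY": 7954, "BR": 7954, "T": 7922,
          "NT": 32, "JOIN": 7954, "BACK": 7454, "EXIT": 500}

# Assumed loop shape: ENTRY -> BODY -> BR -> {T, NT} -> JOIN -> {BACK, EXIT}.
cfg = [("ENTRY", "BODY"), ("BODY", "BR"), ("BR", "T"), ("BR", "NT"),
       ("T", "JOIN"), ("NT", "JOIN"), ("JOIN", "BACK"), ("JOIN", "EXIT"),
       ("BACK", "BODY")]

edges_est = estimate_edges(blocks, cfg)
for edge, count in sorted(edges_est.items()):
    print(edge, count)
```

Under these assumptions the back edge BACK → BODY comes out as 7454 and the exit edge JOIN → EXIT as 500, which agrees with the smoothed counts in the figure. With raw, unsmoothed counts the conservation equations generally conflict, which is why a smoothing pass is needed first.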

  6. Challenges
     • Inaccuracies inherent to sampling
     • Source position information issues
       • Missing information due to optimization transformations
       • Disambiguating samples per source line, e.g. if (cond) {stmt1; stmt2;}
     • Edge estimation heuristics
       • Evaluate algorithm proposed by Levin et al.
     • Inlining
       • Annotate early inlined functions with scaled sample counts

  7. Results
     • SPEC2006 C benchmarks
     • Intel Core-2 platform using 64-bit binaries
     • -O2 FDO with instrumented runs
       • 4–5% gain over default -O2 runs
     • -O2 FDO with sampled profiles
       • Profile collection using -O2 binaries
       • ~60% of FDO instrumented gain
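Read together, the two results imply a rough absolute gain for sampled profiles. The short calculation below is a back-of-the-envelope sketch that assumes "~60% of FDO instrumented gain" applies uniformly to the 4–5% range; the slide itself does not state the resulting figure.

```python
# Gain of instrumented -O2 FDO over default -O2, in percent (from the slide).
instrumented_gain = (4.0, 5.0)

# Sampled profiles recover roughly this fraction of that gain (from the slide).
sampled_fraction = 0.60

# Implied gain of sampled-profile FDO over default -O2, in percent.
sampled_gain = tuple(round(g * sampled_fraction, 1) for g in instrumented_gain)

print(sampled_gain)
```

Under that assumption, sampled-profile FDO would yield roughly a 2.4–3% gain over default -O2.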

  8. Q&A Thank You!
