
Load-Reuse Analysis design and evaluation

Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa


Presentation Transcript


  1. Load-Reuse Analysis: design and evaluation. Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa

  2. Partial Redundancy Elimination (PRE)
     • Partially redundant = computed on some incoming paths
     • Example: x := a+b; y := a+b

  3. [Figure: control-flow graph with two computations of a+b and an assignment a := ..]

  4. Steps: find “reuse” paths, •  remove redundancy from “reuse” paths.

  5. Register promotion = PRE of loads
     • Three steps:
       • load-reuse analysis: find loads that can reuse prior loads/stores
       • alias analysis: which stores may kill reuse?
       • transformation: remove redundancy: PRE [PLDI ‘98]
     • Example: store a1, x; load a2; store a3; load a4
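
How the first two steps interact can be sketched on straight-line code. The sketch below is a simplification under stated assumptions: addresses are symbolic names, `may_alias` is a hypothetical oracle standing in for the alias analysis, and `find_reuse` is an illustrative helper, not the analysis from the talk.

```python
def find_reuse(ops, may_alias):
    """ops is a list of ("load" | "store", address_name) pairs.
    Return {index of a load: index of the earlier op it can reuse}."""
    available = {}  # address name -> index of the op whose value is still valid
    reuse = {}
    for i, (kind, addr) in enumerate(ops):
        if kind == "load":
            if addr in available:
                reuse[i] = available[addr]   # value already held from an earlier op
            else:
                available[addr] = i          # this load generates reuse
        else:
            # A store kills every tracked address it may overwrite ...
            for other in [a for a in available if a != addr and may_alias(addr, a)]:
                del available[other]
            available[addr] = i              # ... and generates reuse for its own address
    return reuse


# Hypothetical sequence: store p, load p, store q, load p.
ops = [("store", "p"), ("load", "p"), ("store", "q"), ("load", "p")]
conservative = find_reuse(ops, may_alias=lambda a, b: True)
precise = find_reuse(ops, may_alias=lambda a, b: False)
print(conservative, precise)
```

With the conservative oracle, the store to q kills the value of p, so only the first load is a reuse. With the precise oracle, both loads can reuse the value written by the first store, which is why the quality of alias information bounds how much reuse the analysis can report.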

  6. Load-reuse analysis
     • Design goal: completeness (find all reuse)
     • To approach completeness, the analysis is:
       • uniform: analyzes scalar, array, and pointer loads
       • path-sensitive: different source of reuse on each path
     • Evaluation goal: how complete? Compare with an ideal analysis
     • Detecting all reuse is undecidable: no ideal algorithm exists; instead, use simulation

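  7. Experimental framework
     [Figure: pipeline from program and input through four numbered stages]
     • 1. load-reuse analysis (produces the data-flow solution)
     • 2. simulator (produces the reuse level)
     • 3. estimator (combines the data-flow solution with a profile into a weighted solution)
     • 4. comparison
     • transformation [PLDI ‘98]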

  8. 1. Load-reuse analysis
     • It’s a data-flow analysis on a reuse-aware representation: the Value Name Graph (VNG) [POPL ’98]
     • What’s new?
       • Sparse version of the VNG: up to 30 times smaller than the non-sparse one
       • Analyzing indirect loads/stores; also, modeling killing stores

  9. Naming the value
       y := b+c
       a := c-1
       x := a+b+1

  10. Names for the value in ‘x’: x, a+b+1, b+c
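
The three names can be reproduced by substituting backward through the assignments. The following is a minimal sketch, assuming straight-line code and expressions that are linear forms stored as dictionaries from symbol to coefficient. The helpers `substitute` and `names_for` and the `CONST` key are hypothetical and are not part of the VNG construction itself.

```python
CONST = "1"  # key that holds the constant term of a linear form


def substitute(name, var, definition):
    """Replace `var` inside the linear form `name` by `definition`."""
    coeff = name.get(var, 0)
    result = {sym: c for sym, c in name.items() if sym != var}
    for sym, c in definition.items():
        result[sym] = result.get(sym, 0) + coeff * c
    return {sym: c for sym, c in result.items() if c != 0}


def names_for(variable, assignments):
    """Walk the assignments backward and collect every name the value takes."""
    current = {variable: 1}
    names = [current]
    for lhs, rhs in reversed(assignments):
        if lhs in current:
            current = substitute(current, lhs, rhs)
            names.append(current)
    return names


# The three statements from the slide, in program order.
assignments = [
    ("y", {"b": 1, "c": 1}),             # y := b+c
    ("a", {"c": 1, CONST: -1}),          # a := c-1
    ("x", {"a": 1, "b": 1, CONST: 1}),   # x := a+b+1
]
names = names_for("x", assignments)
print(names)
```

The last name, b+c, is exactly the expression computed by y := b+c, so the value stored in x is already available in y even though the two statements look different syntactically.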

  11. [Figure: GEN bits set to 1 for the names x, b+c, and a+b+1]

  12. Naming the value across loads
      • Field offsets: f = 0, next = 4
      • Statements shown: .. := p->f; .. := p->next->f; *r := ...; p := p->next
      • Names: *p, **(p+4)
      [Figure: GEN bits for *p and **(p+4) along the code]

  13. KILL: kill if r = p+4 or r = *(p+4)
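
The kill condition follows from listing every address that the name **(p+4) dereferences: a store through r can change the named value only if r equals one of them. A minimal sketch, assuming names are encoded as nested tuples; the encoding and the helpers `deref`, `plus`, `addresses_read`, and `show` are hypothetical.

```python
def deref(addr):
    return ("deref", addr)


def plus(base, offset):
    return ("plus", base, offset)


def addresses_read(name):
    """Every address expression that `name` dereferences, outermost first."""
    found = []
    if isinstance(name, tuple):
        if name[0] == "deref":
            found.append(name[1])
        for part in name[1:]:
            found.extend(addresses_read(part))
    return found


def show(expr):
    """Render an expression tuple as text, parenthesizing sums."""
    if not isinstance(expr, tuple):
        return str(expr)
    if expr[0] == "plus":
        return f"({show(expr[1])}+{show(expr[2])})"
    return "*" + show(expr[1])


name = deref(deref(plus("p", 4)))          # the name **(p+4)
kill_if = [f"r = {show(a)}" for a in addresses_read(name)]
print(show(name), "is killed by *r := ... if", " or ".join(kill_if))
```

Whether r can actually equal one of these addresses is the question passed on to alias analysis.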

  14. Sparse representation
        for I = 1, N { .. := A[I] + A[I-1] }
      Address computations and loads:
        a1 := A+I
        load a1
        a2 := A+I-1
        load a2
        I := I+1

  15. [Figure: GEN bits for load a1 and load a2 in the sparse representation; Ø marks empty entries]

  16. 2. The simulator algorithm
        for I = 1, N { .. := A[I] + A[I-1] }
      • Memory access history (history length = 1 to 4; Ø = empty):
        • A[I], load a1: 103 102 101 100
        • A[I-1], load a2: 102 101 100 99
      • The simulator detects all PRE-exploitable reuse (up to the given history length), but also some “noise”, e.g. due to hash table accesses
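
A rough sketch of the history idea follows. It is a simplification under stated assumptions, not the simulator from the talk: the trace contains loads only, each static load keeps its last `history_length` addresses, and a dynamic load counts as reuse when its address appears in any history. The function name `simulate` and the base address 100 are illustrative.

```python
from collections import deque


def simulate(trace, history_length):
    """trace is an iterable of (static_instruction, address) pairs.
    Return (dynamic loads with reuse, total dynamic loads)."""
    history = {}  # static instruction -> its most recent addresses
    reused = total = 0
    for instr, addr in trace:
        total += 1
        if any(addr in recent for recent in history.values()):
            reused += 1
        history.setdefault(instr, deque(maxlen=history_length)).append(addr)
    return reused, total


# The loop from the slide with a hypothetical base address A = 100 and N = 4.
A, N = 100, 4
trace = []
for i in range(1, N + 1):
    trace.append(("a1", A + i))      # load A[I]
    trace.append(("a2", A + i - 1))  # load A[I-1]

short = simulate(trace, history_length=1)
longer = simulate(trace, history_length=2)
print(short, longer)
```

In this simplified model, the load of A[I-1] finds its address in the history of the load of A[I] once that history holds at least two entries, because the matching access happened one iteration earlier. Unrelated accesses that happen to revisit an address would be counted as well, which is the kind of "noise" the slide mentions.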

  17. Ideal amount of load reuse
      [Figure: bar chart of % of all dynamic loads for history lengths 1 and 4; benchmarks: go, m88ksim, gcc, compress, li, ijpeg, vortex, tomcatv, swim, su2cor, hydro]
      • 65% of executed loads have reuse exploitable by PRE (intra-procedural reuse, history = 1)

  18. 3. How frequent is the reuse?
      • Edge profile:
        + cheap and available
        - cannot reconstruct frequencies of reuse paths
      [Figure: control-flow graph with load x and kill x nodes, annotated with edge frequencies]

  19. • Path profile:
        + precise
        - more expensive
      • Use the edge profile, but bound its inherent error: compute lower & upper bounds on reuse

  20. Hierarchy of estimators
      • Estimator: data-flow solution + edge profile → weighted data-flow solution
      • Estimators, ordered toward smaller error (but more complex): PRE, CMP1, CMPc, CMPr, CMPf
      • Hierarchy: a practical approach
      • A simple estimator not precise enough? Use the next better one!

  21. The algorithms
      • 1. The bounds:
        • generators: points generating reuse
        • stealers: points with no reuse
        • upper bound: all reuse consumed
        • lower bound: all reuse stolen
      [Figure: control-flow graph with load x and kill x nodes, annotated with edge frequencies]
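
Why an edge profile yields only bounds can be seen at a single node where paths merge and then split. The sketch below is a simplified illustration, not the estimator algorithms from the talk. It assumes one node whose incoming flow is divided into flow that carries reuse and flow that does not, and whose outgoing flow goes either to the consuming load or elsewhere; the function `reuse_bounds` and all frequencies are hypothetical.

```python
def reuse_bounds(reuse_in, other_in, load_out, other_out):
    """Bound how often reuse reaches the load, given only edge frequencies."""
    if reuse_in + other_in != load_out + other_out:
        raise ValueError("edge profile violates flow conservation")
    # Best case: as much reuse-carrying flow as possible continues to the load.
    upper = min(reuse_in, load_out)
    # Worst case: the other exit absorbs as much reuse-carrying flow as it can.
    lower = max(0, reuse_in - other_out)
    return lower, upper


# Hypothetical frequencies: 60 executions arrive with reuse and 40 without;
# 70 continue to the load and 30 leave on the other edge.
lower, upper = reuse_bounds(reuse_in=60, other_in=40, load_out=70, other_out=30)
print(lower, upper)
```

Any value between the two bounds is consistent with the same edge profile. A path profile would pin the number down, at the higher collection cost noted on the earlier slide.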

  22. 2. Separating uncertainty:
      • using the CMP region, defined for PRE [PLDI ‘98]
      • CMP = code-motion preventing
      • all error is contained in the CMP region!

  23. Improving precision
      • “one” region
      • connected regions
      • control flow reachability
      • network flow reachability

  24. Estimators: precision
      [Figure: error of the estimators PRE, CMP1, CMPc, CMPr, CMPf (smaller error toward CMPf), for the FP and INT benchmarks]

  25. 4. Analysis: how close to ideal?
      • 100% = reuse seen by simulator
      [Figure: chart with labels **p and *p; reuse killed by: ideal alias info, calls, array & pointer stores + calls, all stores + calls]

  26. Related Work
      • Load-Reuse Analysis
        • makes value numbering path-sensitive
        • Steffen, Knoop, Rüthing: Value Flow Graph [ESOP ‘90]; we show how to analyze indirect loads, via symbolic evaluation
      • Simulation-based analysis evaluation
        • Diwan, McKinley, Moss [PLDI ’98]: type-based alias analysis: how powerful does it need to be?
      • Estimators
        • Ramalingam: “Frequency Analysis” [PLDI ’96]; returns a single estimate, not its bounds

  27. Summary
      • Load-reuse analysis:
        • reuse across indirect memory references
        • sparse representation
      • Estimators: three principles
        • confidence: bound the edge-profile error
        • separation of uncertainty: inside/outside the CMP region
        • hierarchy: increasing precision and complexity
      • Evaluation:
        • about 65% of loads are amenable to PRE
        • our analysis can find about 80% of those

  28. Combine three removal methods [PLDI ‘98]
      • S: control speculation
      • M: code motion
      • R: restructuring

  29. Example (S, M, R)
      [Figure: control-flow graph with three computations of a+b and edge frequencies 10 and 50]

  30. Relative removal power (S, M, R)
      [Figure: loads removed (dynamic count, normalized) for the INT and FP benchmarks; labels: Global CSE, path-insensitive]
