
Performance Visualizations using XML Representations

Performance Visualizations using XML Representations. Presented by Kristof Beyls, Yijun Yu, and Erik H. D’Hollander. Overview: background (program optimization research), XML representations, visualizations, conclusion.


Presentation Transcript


  1. Performance Visualizations using XML Representations. Presented by Kristof Beyls, Yijun Yu, and Erik H. D’Hollander

  2. Overview • Background: program optimization research • XML representations • Visualizations • Conclusion

  3. Program optimization research • What slows down a program's execution? We need to pinpoint the performance bottlenecks (by analyzing the program). • How to improve the performance? By program transformations, based on the pinpointed bottlenecks. • How to transform the program? • Compiler: advantage: automatic optimization; disadvantage: sometimes hard to understand what the program does • Programmer: advantage: has a good understanding of program functionality; disadvantage: requires human effort • How to present performance bottlenecks best? • How to construct a research infrastructure that supports all the above in a common framework? (XML)

  4. Two main performance factors • Parallelism: performing computation in parallel reduces execution time • Data locality: fetching data from fast CPU caches reduces execution time

  5. Overview • Background: program optimization research • XML representations • Visualizations • Conclusion

  6. Why XML representations? • Extensible and versatile • Standard and interoperable • Language independent • Tools connected through XML: yaxx (YACC extension to XML), oc (Omega calculator), isv (iteration space visualizer), cv (cache trace visualizer), distv (cache reuse distance visualizer)

  7. 1. AST (Abstract Syntax Tree) (ast) • XML is a good representation for an AST because of its hierarchical nature. • The ast namespace captures the syntactic information of a program. • We can construct an AST from source code through YAXX and regenerate the source code through XSLT.

  8. Program optimization research • What slows down a program's execution? We need to pinpoint the performance bottlenecks (by analyzing the program). • How to improve the performance? By program transformations, based on the pinpointed bottlenecks. • Who transforms the program? • Compiler: advantage: automatic optimization; disadvantage: sometimes hard to understand what the program does • Programmer: advantage: has a good understanding of program functionality; disadvantage: requires human effort • How to present performance bottlenecks best? • How to construct a research infrastructure that supports all the above in a common framework? (XML)

  9. 2. Parallel loops (par) • Identified parallel loops are annotated with a <par:true/> element in the "par" namespace:
      <ast:DO_Loop>
        <par:true/>
        …
      </ast:DO_Loop>
      • In this way, semantic and syntactic information live in orthogonal namespaces. Syntax-based tools (e.g. an unparser) can ignore the annotation or translate it into directive comments, e.g. Fortran C$DOALL.
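As a rough illustration of the orthogonal-namespace idea on slide 9, the sketch below lists annotated loops and strips the annotation the way a syntax-only tool might. It is only a sketch under assumptions: the namespace URIs, the `ast:Program` wrapper, and the `id` attributes are hypothetical, since the slides give only the prefixes `ast` and `par`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the slides name only the prefixes.
AST_NS = "urn:example:ast"
PAR_NS = "urn:example:par"
NS = {"ast": AST_NS, "par": PAR_NS}

# Hypothetical sample document: an outer parallel loop with a nested loop.
SAMPLE = """
<ast:Program xmlns:ast="urn:example:ast" xmlns:par="urn:example:par">
  <ast:DO_Loop id="outer">
    <par:true/>
    <ast:DO_Loop id="inner"/>
  </ast:DO_Loop>
</ast:Program>
"""

def parallel_loops(xml_text):
    """Return the ids of DO loops that have a direct <par:true/> child."""
    root = ET.fromstring(xml_text)
    return [loop.get("id")
            for loop in root.iter("{%s}DO_Loop" % AST_NS)
            if loop.find("par:true", NS) is not None]

def strip_annotations(xml_text):
    """Remove every element in the par namespace, leaving pure syntax."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith("{%s}" % PAR_NS):
                parent.remove(child)
    return root
```

Because the annotation lives in its own namespace, removing it leaves the `ast` tree unchanged for tools that only understand syntax.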

  10. XFPT: an extended optimizing compiler

  11. Program optimization research • What slows down a program's execution? We need to pinpoint the performance bottlenecks (by analyzing the program). • How to improve the performance? By program transformations, based on the pinpointed bottlenecks. • Who transforms the program? • Compiler: advantage: automatic optimization; disadvantage: sometimes hard to understand what the program does • Programmer: advantage: has a good understanding of program functionality; disadvantage: requires human effort • How to present performance bottlenecks best? • How to construct a research infrastructure that supports all the above in a common framework? (XML)

  12. 3. Traces (trace) • A trace records a sequence of memory address accesses:
      <trace:seq>
        <access addr="0x00ffe8" bytes="8" />
        <access addr="0x00fff0" bytes="16" />
        …
      </trace:seq>
      • A trace alone can be used to identify runtime data dependences and, through a cache simulator, to identify cache misses. • By associating each address with the array reference number or loop iteration index on the program's AST, the trace can also be used for advanced loop dependence analysis and cache reuse distance analysis:
      <trace:seq>
        <access addr="0x00ffe8" bytes="8" hotspot:id="1"> <!-- The 1st reference -->
          <do_loop hotspot:id="1" vector="1 2"/> <!-- The 1st DO loop: (I,J)=(1,2) -->
          <array hotspot:id="1" vector="1"/> <!-- Reference to array element X(1) -->
        </access>
        …
      </trace:seq>
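To make the "trace plus cache simulator" step on slide 12 concrete, here is a minimal sketch that reads a trace in the format shown there and counts misses in a direct-mapped cache. The namespace URI, the third access in the sample, and the cache geometry (32-byte lines, 32 line slots) are assumptions chosen for illustration; this is not the simulator used by the authors.

```python
import xml.etree.ElementTree as ET

# Sample trace; the URI and the repeated third access are assumptions.
TRACE = """
<trace:seq xmlns:trace="urn:example:trace">
  <access addr="0x00ffe8" bytes="8"/>
  <access addr="0x00fff0" bytes="16"/>
  <access addr="0x00ffe8" bytes="8"/>
</trace:seq>
"""

def read_trace(xml_text):
    """Return the trace as a list of (address, byte_count) pairs."""
    root = ET.fromstring(xml_text)
    return [(int(a.get("addr"), 16), int(a.get("bytes")))
            for a in root.iter("access")]

def count_misses(trace, line_size=32, num_lines=32):
    """Count misses in a direct-mapped cache with one line per slot."""
    slots = [None] * num_lines      # memory line currently held by each slot
    misses = 0
    for addr, nbytes in trace:
        first = addr // line_size
        last = (addr + nbytes - 1) // line_size
        for line in range(first, last + 1):   # an access may span lines
            slot = line % num_lines
            if slots[slot] != line:
                misses += 1
                slots[slot] = line
    return misses
```

With these assumed parameters all three sample accesses fall into the same 32-byte line, so only the first one misses.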

  13. 4. Hotspots (hotspot) • Hotspots are the identified bottlenecks of the program. • Two types are used: bottleneck loops, which tell which loops are performance bottlenecks, and bottleneck references, which tell which references are performance bottlenecks.
      <hotspot:list>
        <do_loop id="1">
          <index vector="I J"/>
          <start lineno="3" colno="1"/>
          <end lineno="7" colno="12"/>
        </do_loop>
        …
        <array id="2" name="X">
          <dim><lb>1</lb><ub>10</ub></dim>
        </array>
        …
        <reference id="1" type="R">
          <start lineno="5" colno="9"/>
          <end lineno="5" colno="14"/>
        </reference>
        …
      </hotspot:list>
      Source program (the lineno attributes above refer to these lines):
      DIM T(3), X(10)
      REAL S, X
      DO I = 1, 10
      DO J = 1, 10
      S = S + X(I)*J
      ENDDO
      ENDDO
      …

  14. Overview • Background: program optimization research • XML representations • Visualizations • Conclusion

  15. Program optimization research • What slows down a program's execution? We need to pinpoint the performance bottlenecks (by analyzing the program). • How to improve the performance? By program transformations, based on the pinpointed bottlenecks. • Who transforms the program? • Compiler: advantage: automatic optimization; disadvantage: sometimes hard to understand what the program does • Programmer: advantage: has a good understanding of program functionality; disadvantage: requires human effort • How to present performance bottlenecks best? • How to construct a research infrastructure that supports all the above in a common framework? (XML)

  16. Performance Visualizations • XML plays an important role in gluing the visualizers to an optimizing compiler: • Loop dependence visualization • Reuse distance visualization • Cache behavior visualization

  17. Visualization 1: ISDG (iteration space dependence graph) • An iteration is an instance of the loop body statements. • An iteration space is the set of integer vector values of the DO loop index variables for the traversed iterations. • A loop-carried dependence is a dependence caused by two references R1 and R2 that access the same memory address, where one of R1, R2 is a write, R1 belongs to loop iteration (i1, j1), and R2 belongs to loop iteration (i2, j2) ≠ (i1, j1). • An ISDG is a graph with nodes representing the iteration space and edges representing loop-carried dependences. • Example:
      DO i=1,5
      DO j=1,5
      A(i,j) = A(i,j+1)
      ENDDO
      ENDDO
      [Figure: ISDG of the example loop; axes i and j, each running from 1 to 5]
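The ISDG definition on slide 17 can be checked by brute force on the example loop. The sketch below replays the iterations of `A(i,j) = A(i,j+1)` in program order and records an edge whenever two different iterations touch the same array element and at least one access is a write. It is an illustrative assumption about how such a graph could be built, not a description of how ISV does it.

```python
def isdg_edges(n=5):
    """Return loop-carried dependence edges (source_iter, sink_iter)
    for:  DO i=1,n / DO j=1,n / A(i,j) = A(i,j+1)."""
    last_write = {}    # element -> iteration that last wrote it
    reads_since = {}   # element -> iterations that read it since that write
    edges = set()
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            it = (i, j)
            # Right-hand side: read A(i, j+1).
            r = (i, j + 1)
            if r in last_write and last_write[r] != it:
                edges.add((last_write[r], it))        # write then read
            reads_since.setdefault(r, []).append(it)
            # Left-hand side: write A(i, j).
            w = (i, j)
            for src in reads_since.get(w, []):
                if src != it:
                    edges.add((src, it))              # read then write
            if w in last_write and last_write[w] != it:
                edges.add((last_write[w], it))        # write then write
            last_write[w] = it
            reads_since[w] = []
    return edges

def distance_vectors(edges):
    """Distinct (di, dj) distances between dependent iterations."""
    return {(d[0] - s[0], d[1] - s[1]) for s, d in edges}
```

For the 5x5 example every edge runs from iteration (i, j) to (i, j+1), so the only distance vector is (0, 1): the dependences are carried by the inner j loop, which suggests the outer i iterations are independent of each other.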

  18. The WTCM CFD application • WTCM has a Computational Fluid Dynamics simulator, which involves solving partial differential equations (PDEs) through a Gauss-Seidel solver. [Figure labels: 3D geometry + 1D time; temperature]

  19. The visualized dependences

  20. The loop transformation • A 3-D unimodular transformation is found after visualizing the 4-D loop nest, which has 177 array references at run time for each iteration. Here we use a regular shape. • The transformation makes it possible to speed up the program by around N²/6 times, where N is the diameter of the geometry.

  21. Visualization 2: Reuse distances • Reuse distance is the amount of data accessed before a memory address is reused. • reuse distance > cache size ⇒ cache miss
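A small sketch of the reuse distance definition on slide 21 follows. It assumes that "amount of data" means the number of distinct addresses touched between two consecutive uses of the same address, which is one common reading; the function names are hypothetical and the quadratic scan is only meant for short traces.

```python
def reuse_distances(addresses):
    """For each access, return the number of distinct other addresses
    touched since the previous access to the same address, or None
    for the first use of an address."""
    last_pos = {}
    distances = []
    for pos, addr in enumerate(addresses):
        if addr in last_pos:
            between = addresses[last_pos[addr] + 1:pos]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_pos[addr] = pos
    return distances

def flagged_as_misses(distances, cache_size):
    """Count reuses that the slide's rule (distance > cache size) flags."""
    return sum(1 for d in distances if d is not None and d > cache_size)
```

For example, `reuse_distances(["a", "b", "c", "a", "b", "b"])` gives `[None, None, None, 2, 2, 0]`: the second use of `a` has two distinct addresses in between, and the final `b` is reused immediately.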

  22. Execution time reduction on an Itanium processor (Spec2000 programs).

  23. Visualization 3: Cache miss traces (Tomcatv/Spec95) • Legend: white = hit, blue = compulsory miss, green = capacity miss, red = conflict miss. [Figure label: 56.7%]

  24. 4.2 Visualizing hotspots of conflict cache misses • X(I,J+1) and X(I,J) conflict if X has dimensions (512,512). • The conflict is resolved by changing the dimensions to (524,524), a technique known as array padding.
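The padding effect on slide 24 can be reproduced with a little address arithmetic. The sketch below assumes Fortran column-major layout, 8-byte array elements, and a direct-mapped cache with 32-byte lines and 128 sets (4 KiB); none of these parameters come from the slides, they are chosen only so that a leading dimension of 512 produces the conflict.

```python
LINE_SIZE = 32   # bytes per cache line (assumed)
NUM_SETS = 128   # direct-mapped, 128 sets = 4 KiB (assumed)
ELEM_SIZE = 8    # bytes per array element (assumed)

def element_addr(i, j, dim1, base=0):
    """Byte address of X(i, j) in column-major order, 1-based indices."""
    return base + ((j - 1) * dim1 + (i - 1)) * ELEM_SIZE

def set_index(addr):
    """Cache set that a byte address maps to."""
    return (addr // LINE_SIZE) % NUM_SETS

def neighbours_conflict(i, j, dim1):
    """True if X(i, j) and X(i, j+1) map to the same cache set."""
    return set_index(element_addr(i, j, dim1)) == \
           set_index(element_addr(i, j + 1, dim1))
```

With `dim1 = 512` the two elements are 4096 bytes apart, exactly one cache size, so they always share a set; with `dim1 = 524` they are 4192 bytes apart and land three sets away from each other.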

  25. 4.2 Cache miss trace after array padding • Most spatial locality is exploited and the conflict misses are resolved. • On an Intel 550 MHz Pentium III (single CPU), the speedup measured with VTune is >50%. [Figure label: 17.2%]

  26. Overview • Background: program optimization research • XML representations • Visualizations • Conclusion

  27. Conclusion • An existing optimizing compiler FPT was extended with an extensible XML interface. • The performance factors, in particular loop parallelism and data locality, were exported from FPT. • These factors were visualized through • Loop dependence visualizer ISV • Execution trace visualizer CacheVis • Reuse distance visualizer ReuseVis • The programmer can use the visualized feedback to improve the performance.

  28. The End. • Any questions?

  29. Program semantics (Software) vs. Architecture capabilities (Hardware) Visualize them!

  30. 2. Major performance factors • Parallelism: loop dependences, loop-level parallelism, instruction-level parallelism, partition load balance • Data locality: temporal locality, spatial locality, CCC (compulsory, capacity, conflict) cache misses, reuse distances

  31. 3.6 Cache parameters • To tune for different architectural cache configurations, we represent the cache parameters (cache size, cache line size, and set associativity) in an XML configuration file. For example, a 2-level cache is specified as follows:
      <cache:hierarchy>
        <parameters level="1">
          <size>1024</size>
          <line>32</line>
          <associativity>32</associativity>
        </parameters>
        <parameters level="2">
          <size>65536</size>
          <line>32</line>
          <associativity>1</associativity>
        </parameters>
      </cache:hierarchy>
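A configuration like the one on slide 31 could be loaded as sketched below. The namespace URI is a placeholder, and the sketch assumes that sizes are in bytes and that the number of sets is size / (line size x associativity), which is the usual relation for set-associative caches.

```python
import xml.etree.ElementTree as ET

# Same values as the slide; the namespace URI is a placeholder.
CONFIG = """
<cache:hierarchy xmlns:cache="urn:example:cache">
  <parameters level="1">
    <size>1024</size><line>32</line><associativity>32</associativity>
  </parameters>
  <parameters level="2">
    <size>65536</size><line>32</line><associativity>1</associativity>
  </parameters>
</cache:hierarchy>
"""

def load_levels(xml_text):
    """Return {level: {"size", "line", "ways", "sets"}} from the config."""
    root = ET.fromstring(xml_text)
    levels = {}
    for p in root.iter("parameters"):
        size = int(p.findtext("size"))
        line = int(p.findtext("line"))
        ways = int(p.findtext("associativity"))
        levels[int(p.get("level"))] = {
            "size": size,
            "line": line,
            "ways": ways,
            "sets": size // (line * ways),   # derived, not in the file
        }
    return levels
```

Under these assumptions the level-1 cache in the example has a single set (fully associative) and the level-2 cache has 2048 sets of one line each (direct-mapped).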

  32. 4.2 Visualizing data locality histogram distributed over reuse distances
