Data Access Profiling & Improved Structure Field Regrouping in Pegasus

Vas Chellappa & Matt Moore, May 2, 2005 / Optimizing Compilers / Project Poster Session

  1. Data Access Profiling & Improved Structure Field Regrouping in Pegasus Vas Chellappa & Matt Moore May 2, 2005 / Optimizing Compilers / Project Poster Session

  2. Introduction • Structure definitions group fields by semantics, not by access contemporaneity • Data access profiling can be used to improve cache performance by reordering fields for contemporaneity. In this context, contemporaneity is a measure of how close in time two accesses to structure fields occur

  3. Problem Statement • Obtaining contemporaneity information for structure fields • Exploiting this information to improve the ordering of the fields • Doing this within the CASH/Pegasus environment

  4. Approach • Pegasus Implementation • Data access profiling tracks contemporaneous field accesses to build the Field Affinity Graphs • The Simulator's interface to SimpleScalar (a third-party cache simulator) is modified to achieve this • Regrouping Algorithm • The Field Affinity Graphs built by the modified Simulator are then used to recommend reorderings based on a new regrouping algorithm

  5. Project Design

  6. Design Overview • Build stage: Tag structure field accesses in the Pegasus IR • Simulation stage: Propagate tag information through SimpleScalar to the new regroup library • Final stage: Invoke regrouping algorithm to calculate reordering recommendations

  7. Build Stage, Tagging Accesses • Objective: Identify and tag structure field accesses in the Pegasus IR • Not trivial, since SUIF/C2DIL do not preserve the required type information when translating to the IR • Need to identify patterns in the IR that indicate structure field accesses
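The slides do not show how these patterns are matched. As a rough C-level sketch, assuming the structure's layout is known (here via `offsetof`), an access that appears in the IR as a load or store at "base pointer + constant offset" can be mapped back to a field by a range lookup. The `field_info` table, `MEMBER_SIZE`, and `field_for_offset` are hypothetical names for illustration, not part of Pegasus.

```c
#include <assert.h>
#include <stddef.h>

/* Running example: the structure from the results slide. */
struct my_t { int f1; int f2; char nu[4096]; int f3; int f4; };

/* Size of a member without needing an object (sizeof does not evaluate). */
#define MEMBER_SIZE(type, member) sizeof(((type *)0)->member)

/* Hypothetical layout table a tagging pass could consult if the
   structure's type information were available. */
struct field_info { const char *name; size_t offset; size_t size; };

static const struct field_info my_t_fields[] = {
    { "f1", offsetof(struct my_t, f1), MEMBER_SIZE(struct my_t, f1) },
    { "f2", offsetof(struct my_t, f2), MEMBER_SIZE(struct my_t, f2) },
    { "nu", offsetof(struct my_t, nu), MEMBER_SIZE(struct my_t, nu) },
    { "f3", offsetof(struct my_t, f3), MEMBER_SIZE(struct my_t, f3) },
    { "f4", offsetof(struct my_t, f4), MEMBER_SIZE(struct my_t, f4) },
};
#define MY_T_NFIELDS (sizeof my_t_fields / sizeof my_t_fields[0])

/* Map the constant byte offset of an access (base + offset) to the index
   of the field containing it, or -1 for padding or out-of-range offsets. */
static int field_for_offset(const struct field_info *fields, size_t nfields,
                            size_t offset) {
    assert(fields != NULL);
    for (size_t i = 0; i < nfields; i++) {
        if (offset >= fields[i].offset &&
            offset < fields[i].offset + fields[i].size)
            return (int)i;
    }
    return -1;
}
```

For example, an access at `offsetof(struct my_t, f4)` maps to index 4 (`f4`), and any offset inside the 4096-byte array maps to `nu`.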

  8. Field Accesses in Pegasus

  9. Actual Pegasus Illustration
    int foo(struct my_t stestfoo) {
        int retval = stestfoo.f2;
        return(retval);
    }
  Which wire here should have struct type?
    int foo(struct my_t* stestfoo) {
        return(stestfoo->f2);
    }
  Which wire here has struct type?
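A C-level analogy (not actual Pegasus IR) of why the pointer case is hard to recognize: once `stestfoo->f2` is lowered, what remains is an ordinary `int` load from the pointer plus a constant byte offset. The definition of `struct my_t` is borrowed from the results slide as an assumption, and `foo_lowered` is a hypothetical hand-lowered version written only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition (taken from the results slide). */
struct my_t { int f1; int f2; char nu[4096]; int f3; int f4; };

/* Source form: the access names the field f2 explicitly. */
static int foo(struct my_t *stestfoo) {
    return stestfoo->f2;
}

/* Hypothetical hand-lowered form: a plain int load from base + constant.
   Nothing here mentions the struct type; only the constant offset
   hints that field f2 is being read. */
static int foo_lowered(const void *base) {
    assert(base != NULL);
    return *(const int *)((const char *)base + offsetof(struct my_t, f2));
}
```

Both functions return the same value for the same object; the tagging stage has to recover the field from the second form.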

  10. Simulation Process • Tag info on loads and stores is propagated through SimpleScalar to the regrouping library that builds the field affinity graph (done online, during simulation)
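The slides do not say how contemporaneity is quantified. One minimal sketch, assuming "close in time" means "within the last `WINDOW` tagged accesses to the same structure type": each new access adds one to the edge between its field and every different field still in the window. `WINDOW`, `MAX_FIELDS`, and the function names are hypothetical choices, not the regrouping library's actual interface.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 16  /* hypothetical cap on fields per structure */
#define WINDOW 4       /* hypothetical: last WINDOW accesses count as contemporaneous */

/* Field affinity graph for one structure type, plus a ring buffer of the
   most recently accessed field indices. */
struct affinity_graph {
    int nfields;
    unsigned long weight[MAX_FIELDS][MAX_FIELDS];  /* symmetric edge weights */
    int recent[WINDOW];
    int nrecent;  /* valid entries in recent[] (at most WINDOW) */
    int head;     /* next slot of recent[] to overwrite */
};

static void affinity_init(struct affinity_graph *g, int nfields) {
    assert(nfields > 0 && nfields <= MAX_FIELDS);
    memset(g, 0, sizeof *g);
    g->nfields = nfields;
}

/* Called once per tagged load/store; field is the index carried by the tag. */
static void affinity_record(struct affinity_graph *g, int field) {
    assert(field >= 0 && field < g->nfields);
    for (int i = 0; i < g->nrecent; i++) {
        int other = g->recent[i];
        if (other != field) {  /* no self-edges */
            g->weight[field][other]++;
            g->weight[other][field]++;
        }
    }
    g->recent[g->head] = field;
    g->head = (g->head + 1) % WINDOW;
    if (g->nrecent < WINDOW)
        g->nrecent++;
}
```

For the loop in the results example, which alternates accesses to `f1` and `f4`, this scheme quickly builds a heavy `f1`/`f4` edge while leaving `f2` and `f3` unconnected.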

  11. Regrouping Stage • After simulation, analyze the collected profiling data to produce a reordering recommendation • Previous work used a greedy approach, which can be improved upon • Optimal regrouping is NP-hard, so a heuristic is used instead • Field Affinity Graph (one per structure): • Vertices: fields in the structure • Edge weights: degree of contemporaneity of accesses between the fields

  12. Matching Heuristic • Find a maximum weight matching in the field affinity graph • Fields that will not fit into a cache line together anyway are identified and ignored • Structure is reordered by placing matched fields together
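A minimal sketch of the matching step, assuming a symmetric weight matrix and per-field sizes in bytes. The authors used the Mathprog weighted matching code; this sketch swaps in an exhaustive dynamic program over subsets of fields, which is exact but only practical for small structures (at most 16 fields here). Pairs that cannot share a cache line are skipped, as on the slide, and `order_fields` is one hypothetical way to place matched fields together.

```c
#include <assert.h>

#define MAX_FIELDS 16  /* exhaustive DP below is only practical for small n */

/* Index of the lowest set bit in a nonzero mask. */
static int lowest_field(unsigned mask) {
    int i = 0;
    while (!(mask & (1u << i)))
        i++;
    return i;
}

/* Maximum weight matching by DP over subsets of fields, O(2^n * n).
   Pairs whose combined size exceeds line_size are skipped. Fills mate[i]
   with i's partner (or -1) and returns the total matched weight. */
static long max_weight_matching(int n, long w[][MAX_FIELDS], const int size[],
                                int line_size, int mate[]) {
    static long best[1u << MAX_FIELDS];   /* best[mask]: optimum within mask */
    static int choice[1u << MAX_FIELDS];  /* partner of mask's lowest field, or -1 */
    assert(n > 0 && n <= MAX_FIELDS);

    best[0] = 0;
    for (unsigned mask = 1; mask < (1u << n); mask++) {
        int i = lowest_field(mask);
        unsigned rest = mask & ~(1u << i);
        best[mask] = best[rest];  /* option 1: leave field i unmatched */
        choice[mask] = -1;
        for (int j = i + 1; j < n; j++) {  /* option 2: pair i with j in rest */
            if (!(rest & (1u << j)) || size[i] + size[j] > line_size)
                continue;
            long cand = w[i][j] + best[rest & ~(1u << j)];
            if (cand > best[mask]) {
                best[mask] = cand;
                choice[mask] = j;
            }
        }
    }

    /* Follow the recorded choices from the full set to recover the pairs. */
    for (int k = 0; k < n; k++)
        mate[k] = -1;
    unsigned mask = (1u << n) - 1;
    while (mask) {
        int i = lowest_field(mask);
        int j = choice[mask];
        mask &= ~(1u << i);
        if (j >= 0) {
            mate[i] = j;
            mate[j] = i;
            mask &= ~(1u << j);
        }
    }
    return best[(1u << n) - 1];
}

/* Hypothetical layout rule: keep the original field order, but move each
   field's matched partner up to sit immediately after it. */
static void order_fields(int n, const int mate[], int order[]) {
    int placed[MAX_FIELDS] = {0};
    int k = 0;
    for (int i = 0; i < n; i++) {
        if (placed[i])
            continue;
        order[k++] = i;
        placed[i] = 1;
        if (mate[i] >= 0 && !placed[mate[i]]) {
            order[k++] = mate[i];
            placed[mate[i]] = 1;
        }
    }
}
```

With a single heavy `f1`/`f4` edge on the results-slide structure (indices 0 to 4 for `f1, f2, nu, f3, f4`), `order_fields` yields `f1, f4, f2, nu, f3`, the same order as the modified structure shown in the results.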

  13. Greedy vs. Matching
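The comparison figure did not survive in this transcript. As a small illustration of how greedy choices can fall short, take a hypothetical affinity graph that is a path a-b (weight 3), b-c (4), c-d (3). A generic greedy pairing, which repeatedly takes the heaviest remaining edge between two unpaired fields (not necessarily the exact greedy algorithm of previous work), picks b-c first and stops at total weight 4, while the maximum weight matching {a-b, c-d} has weight 6.

```c
#include <assert.h>

#define MAX_FIELDS 16

/* Generic greedy pairing, for contrast with maximum weight matching:
   repeatedly pick the heaviest remaining edge whose two fields are both
   still unpaired. Fills mate[] and returns the total weight picked. */
static long greedy_pairing(int n, long w[][MAX_FIELDS], int mate[]) {
    assert(n > 0 && n <= MAX_FIELDS);
    long total = 0;
    for (int k = 0; k < n; k++)
        mate[k] = -1;
    for (;;) {
        int bi = -1, bj = -1;
        long bw = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (mate[i] < 0 && mate[j] < 0 && w[i][j] > bw) {
                    bw = w[i][j];
                    bi = i;
                    bj = j;
                }
        if (bi < 0)
            break;  /* no positive-weight edge left between unpaired fields */
        mate[bi] = bj;
        mate[bj] = bi;
        total += bw;
    }
    return total;
}
```

On the path example (a, b, c, d as indices 0 to 3), this returns 4 and leaves a and d unpaired.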

  14. NP-Hardness • NP-hardness is shown by reducing the graph coloring problem to the regrouping problem
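The transcript does not define the regrouping problem formally. One plausible formalization, stated here as an assumption: given fields $1,\dots,n$ with sizes $s_i$, affinity weights $w_{ij} \ge 0$, and cache line size $L$, partition the fields into groups that each fit in a line (single oversized fields may stand alone) and maximize the affinity kept inside groups. It ignores alignment and the ordering between groups.

```latex
\max_{\{G_1,\dots,G_k\}} \; \sum_{m=1}^{k} \sum_{\substack{i<j \\ i,j \in G_m}} w_{ij}
\quad \text{subject to} \quad
\bigcup_{m=1}^{k} G_m = \{1,\dots,n\},\quad
G_m \cap G_{m'} = \emptyset \;(m \ne m'),\quad
\sum_{i \in G_m} s_i \le L \;\text{ whenever } |G_m| \ge 2 .
```

Under this reading, the matching heuristic restricts every group to at most two fields, which turns the problem into maximum weight matching and makes it solvable in polynomial time.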

  15. Results • Implemented successfully for structure field accesses made through pointers (ptr->fld) • So far, only small programs have been tested • Reordering is applied manually, and the modified program is fed into the simulator again to obtain cycle counts for comparison

  16. Results - Example
  Original (750 cycles per call):
    struct my_t { int f1; int f2; char nu[4096]; int f3; int f4; };
    int foo(struct my_t *elt) {
        int i;
        elt->f1 = 2;
        elt->f4 = 100;
        for (i = 0; i < 50; i++) {
            elt->f1++;
            elt->f4--;
        }
        return elt->f1 + elt->f4;
    }
  Modified (745 cycles per call, one fewer cache miss):
    struct my_t { int f1; int f4; int f2; char nu[4096]; int f3; };
    int foo(struct my_t *elt) {
        int i;
        elt->f1 = 2;
        elt->f4 = 100;
        for (i = 0; i < 50; i++) {
            elt->f1++;
            elt->f4--;
        }
        return elt->f1 + elt->f4;
    }
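The saved miss is plausibly explained by simple offset arithmetic. A sketch, assuming 4-byte `int`s, a 64-byte cache line (the slide does not state the line size), and a structure that starts on a line boundary: in the original layout `f4` sits 4108 bytes past `f1`, on a different line, while in the modified layout the two fields are adjacent. `LINE_SIZE` and `same_line` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64  /* assumed cache line size in bytes */

struct my_t_orig { int f1; int f2; char nu[4096]; int f3; int f4; };
struct my_t_mod  { int f1; int f4; int f2; char nu[4096]; int f3; };

/* Nonzero if fields [off_a, off_a + size_a) and [off_b, off_b + size_b)
   both lie entirely within one cache line, assuming the structure
   itself starts on a line boundary. */
static int same_line(size_t off_a, size_t size_a, size_t off_b, size_t size_b) {
    assert(size_a > 0 && size_b > 0);
    size_t line = off_a / LINE_SIZE;
    return (off_a + size_a - 1) / LINE_SIZE == line &&
           off_b / LINE_SIZE == line &&
           (off_b + size_b - 1) / LINE_SIZE == line;
}
```

Under these assumptions, `f1` and `f4` do not share a line in `struct my_t_orig` but do in `struct my_t_mod`, so the loop touches one line instead of two.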

  17. Conclusion • Performance improvements are achievable even on simple programs by applying the reordering recommendations • Propagating full source type information through SUIF/c2dil would be required to optimize non-pointer accesses • Languages that expose memory layout less directly than C would allow the reordering recommendations to be implemented easily and quickly

  18. References • Trishul M. Chilimbi, Bob Davidson, and James R. Larus, “Cache-Conscious Structure Definition,” in Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, pages 13-24, May 1999. • Mathprog (Weighted Matching Algorithm): http://elib.zib.de/pub/Packages/mathprog/matching/weighted/ • Pegasus: http://www-2.cs.cmu.edu/~phoenix/ • SUIF: http://suif.stanford.edu/ • SimpleScalar tool set: http://www.cs.wisc.edu/~mscalar/simplescalar.html

More Related