
GC Advantage: Improving Program Locality

Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng


Presentation Transcript


  1. GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng

  2. Motivation • Memory gap • How are Java programs affected?

  3. MarkSweep vs. Copying (pseudojbb)

  4. Motivation • javac with perfect L1 and L2 caches • 16K L1, 256K L2 • Appel, GCTk • Breadth first

  5. Motivation • A copying collector can reorder objects • Goal: take advantage of the copying collector's ability to reorder objects to improve locality

  6. Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is the best? • Class-based traversal orders • How to find the “important” data structures?

  7. Different Root Traversal Policies • Two different types of roots: • Stack, global variables • Remembered sets (for generational collectors) • Different traversal orders: • Copy all roots before traversing any children • Copy each root and its children (root-by-root) • Split roots: • Stack first, then its children • Remset first, then its children
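
The root policies on this slide can be sketched in a few lines. This is a minimal illustration rather than the collector's actual code: the object graph is a plain dictionary, "copying" an object just appends it to a list that stands for to-space order, and the function names and the example graph are hypothetical. The split-roots variant assumes that each root group is copied and fully traversed before the next group starts.

```python
from collections import deque

def _copy(obj, seen, order, queue):
    # "Copy" an object once: record its to-space position and queue it for scanning.
    if obj not in seen:
        seen.add(obj)
        order.append(obj)
        queue.append(obj)

def _drain(queue, children, seen, order):
    # Scan queued objects in FIFO order, copying any children not yet copied.
    while queue:
        obj = queue.popleft()
        for child in children.get(obj, ()):
            _copy(child, seen, order, queue)

def all_roots_first(roots, children):
    # Copy every root before traversing any children.
    seen, order, queue = set(), [], deque()
    for root in roots:
        _copy(root, seen, order, queue)
    _drain(queue, children, seen, order)
    return order

def root_by_root(roots, children):
    # Copy each root together with everything reachable from it, one root at a time.
    seen, order, queue = set(), [], deque()
    for root in roots:
        _copy(root, seen, order, queue)
        _drain(queue, children, seen, order)
    return order

def split_roots(first_group, second_group, children):
    # Assumed reading of "split roots": finish one root group before the other.
    seen, order, queue = set(), [], deque()
    for group in (first_group, second_group):
        for root in group:
            _copy(root, seen, order, queue)
        _drain(queue, children, seen, order)
    return order

# Hypothetical example: two stack roots and one remembered-set root.
CHILDREN = {"s1": ["a", "b"], "s2": ["c"], "r1": ["d"], "a": ["e"]}
STACK, REMSET = ["s1", "s2"], ["r1"]

print(all_roots_first(STACK + REMSET, CHILDREN))
print(root_by_root(STACK + REMSET, CHILDREN))
print(split_roots(STACK, REMSET, CHILDREN))
```

Root-by-root keeps each root next to the objects reachable from it, whereas copying all roots first places the roots together and pushes their children further away in to-space.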

  8. Experiment Setup • JikesRVM, JMTk • Generational copying collector with bounded nursery size of 4MB • PseudoAdaptive 2nd iteration

  9. Different Root Traversal Policies • Root-by-root (RxR) has the best mutator locality

  10. Different Root Traversal Policies • Total execution time

  11. Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is the best? • Class-based traversal orders • How to find the “important” data structures?

  12. Different Traversal Orders • Breadth first: 1,2,3,4,5,6,7 • Pure depth first: 1,2,6,3,4,7,5 • Pure depth first, LIFO: 1,5,4,7,3,2,6 [Figure: example object graph with nodes 1 to 7]

  13. Different Traversal Orders • Breadth first: 1,2,3,4,5,6,7 • Pure depth first: 1,2,6,3,4,7,5 • Pure depth first, LIFO: 1,5,4,7,3,2,6 • Partial depth first, 2 children: 1,2,6,3,4,5,7 [Figure: example object graph with nodes 1 to 7]
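
The four orders listed on this slide can be reproduced with a short sketch. The graph used here is an assumption inferred from the listed orders (object 1 points to 2, 3, 4 and 5; 2 points to 6; 4 points to 7), since the original figure is not available. The partial depth-first rule is also an assumed reading: the first k children of an object are copied and scanned immediately, while the remaining children are copied and their scanning is deferred to a FIFO queue. All function names are hypothetical.

```python
from collections import deque

# Assumed example graph, consistent with the orders on the slide.
CHILDREN = {1: [2, 3, 4, 5], 2: [6], 4: [7]}

def breadth_first(root, children):
    order, seen, queue = [root], {root}, deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

def depth_first(root, children):
    order, seen = [], set()
    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        order.append(obj)
        for child in children.get(obj, ()):
            visit(child)
    visit(root)
    return order

def depth_first_lifo(root, children):
    # Explicit stack: the last child pushed is the first one copied.
    order, seen, stack = [], set(), [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        stack.extend(children.get(obj, ()))
    return order

def partial_depth_first(root, children, k=2):
    order, seen, queue = [], set(), deque()
    def copy(obj):
        if obj in seen:
            return False
        seen.add(obj)
        order.append(obj)
        return True
    def scan(obj):
        kids = list(children.get(obj, ()))
        for child in kids[:k]:      # first k children: copy and scan right away
            if copy(child):
                scan(child)
        for child in kids[k:]:      # remaining children: copy now, scan later
            if copy(child):
                queue.append(child)
    copy(root)
    scan(root)
    while queue:
        scan(queue.popleft())
    return order

print(breadth_first(1, CHILDREN))        # [1, 2, 3, 4, 5, 6, 7]
print(depth_first(1, CHILDREN))          # [1, 2, 6, 3, 4, 7, 5]
print(depth_first_lifo(1, CHILDREN))     # [1, 5, 4, 7, 3, 2, 6]
print(partial_depth_first(1, CHILDREN))  # [1, 2, 6, 3, 4, 5, 7]
```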

  14. Class Oblivious Type • Different traversal policies • Partial DF is the best

  15. Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is the best? • Class-based traversal orders • How to find the “important” data structures?

  16. Class-based Traversal • Class-oblivious traversal orders are inflexible • Class-based object traversal • Static profiling • Dynamic sampling

  17. Static Profiling • Profile object accesses • Find hot pairs with strong correlation • Example: (1,4), (4,7) and (2,6) have strong correlation • Order: 1,4,7,2,6,3,5 [Figure: example object graph with nodes 1 to 7]
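
One way to arrive at the order 1,4,7,2,6,3,5 from the hot pairs is a depth-first copy that follows hot references before cold ones. The sketch below is an assumption about how the advice is applied, not the authors' implementation. The graph (1 points to 2, 3, 4 and 5; 2 points to 6; 4 points to 7) is inferred from the traversal orders listed earlier in the talk, and `hot_first_order` is a hypothetical name.

```python
# Assumed example graph and the hot pairs named on the slide.
CHILDREN = {1: [2, 3, 4, 5], 2: [6], 4: [7]}
HOT_PAIRS = {(1, 4), (4, 7), (2, 6)}

def hot_first_order(root, children, hot_pairs):
    """Depth-first copy order that visits hot (parent, child) references first."""
    order, seen = [], set()

    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        order.append(obj)
        kids = children.get(obj, ())
        # sorted() is stable: hot children move to the front,
        # all other children keep their original field order.
        for child in sorted(kids, key=lambda c: (obj, c) not in hot_pairs):
            visit(child)

    visit(root)
    return order

print(hot_first_order(1, CHILDREN, HOT_PAIRS))  # [1, 4, 7, 2, 6, 3, 5]
```

With no hot pairs the function degenerates to the pure depth-first order, which shows that the advice only changes the placement of objects it actually covers.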

  18. Online Profiling • Use the adaptive compiler's sampling • Hot method • Hot basic block • Use field accesses to indicate hot fields • Example (in a hot method): { Class A a; a.b=…; … } [Figure: class A with field b referencing class B]
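
The idea of turning sampled field accesses into per-class advice can be illustrated with a small counter. This is only a sketch under stated assumptions: the sample format (a list of (class, field) pairs observed in hot methods), the `threshold` parameter, and the function name `hot_field_advice` are all hypothetical and do not reflect the actual JikesRVM sampling interface.

```python
from collections import Counter, defaultdict

def hot_field_advice(samples, threshold):
    """Return {class_name: [hot fields, most frequently accessed first]}.

    samples:   iterable of (class_name, field_name) pairs, one per sampled
               field access inside a hot method (assumed format).
    threshold: minimum number of samples for a field to count as hot
               (hypothetical tuning parameter).
    """
    counts = Counter(samples)
    advice = defaultdict(list)
    for (cls, field), n in counts.most_common():
        if n >= threshold:
            advice[cls].append(field)
    return dict(advice)

# Hypothetical samples: field b of class A is accessed most often.
SAMPLES = [("A", "b"), ("A", "b"), ("A", "c"), ("B", "x"), ("A", "b"), ("B", "x")]
print(hot_field_advice(SAMPLES, threshold=2))  # {'A': ['b'], 'B': ['x']}
```

A collector using such advice would follow the listed fields of an object first when copying it, and fall back to a class-oblivious order for classes with no advice.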

  19. Online Profiling • Micro benchmark results

  20. Online Profiling • Geometric mean

  21. Reasons • No advice for most of the objects copied • For jess, db and raytrace, we only pick <<1% of the objects as hot objects • 5% for javac • The hot fields are within the first 2 pointers • 90% of the advised objects for javac

  22. Online Profiling • PseudoJBB mutator results • Generate advice for 23% of the copied objects • 75% of the objects have advised hot fields other than the first 2

  23. Questions • Have we found all the hot objects? • Not all hot objects are connected? • Is class-based good enough? • For pseudojbb, do we need instance-based? • Locality for the nursery objects?

  24. Future Work • Sampling technique • Catch more hot object accesses • Lower the threshold • Hot objects that are not connected • Dynamically change the advice on phase changes • Nursery locality • Different traversal orders for cold objects • Instance-based

  25. Conclusion • Reordering objects during copying collection can improve locality • Among class-oblivious traversal orders, partial depth first is the best • Online profiling with class-based traversal is: • more flexible, up to 50% better • very low overhead, ~0% • Still mysteries

  26. Questions?

  27. Answers? • Lower the threshold of the sampling, not only the hot methods • For objects with only 1 or 2 pointers, it may be easier to just use depth first • Maybe the nursery locality is more important • Instance-based advice

  28. Online Profiling • Execution overhead

  29. Online Profiling • Micro benchmark results for mutator time

  30. Different Root Traversal Policies (_227_mtrt)

  31. Static Profiling • Results

  32. Answers? • Most objects have only one pointer • Percentage of objects copied by advice (are they really hot?) • For pseudojbb ~50%, for jess <<1%, for our micro benchmark ~16% • Change! Half of the pairs do not form chains longer than 2 • Maybe the nursery locality is more important

  33. Class Oblivious Orderings • Different traversal policies • Partial DF is better (pseudoJBB)

  34. Motivation • MarkSweep vs. Copying Collector: mutator time of _213_javac

  35. Motivation • Mutator L2 misses of _213_javac
