
Cork: Dynamic Memory Leak Detection with Garbage Collection

Cork: Dynamic Memory Leak Detection with Garbage Collection. Maria Jump Kathryn S. McKinley {mjump,mckinley}@cs.utexas.edu.


Presentation Transcript


  1. Cork: Dynamic Memory Leak Detection with Garbage Collection. Maria Jump, Kathryn S. McKinley. {mjump,mckinley}@cs.utexas.edu

  2. A memory leak in a garbage-collected language occurs when a program inadvertently maintains references to objects that it no longer needs, preventing the collector from reclaiming space. • Best case: increases GC workload • Worst case: systematic heap growth causes a crash after days of execution • Cork accurately pinpoints systematic heap growth completely online UMCP

  3. Cork’s Solution 1. Summarize heap growth by calculating a type points-from graph (TPFG) • Piggybacks on full-heap object scan • Summarizes the heap by type 2. Interpret the summarization using differencing 3. Generate debugging reports • Candidate Report • Slice Report • Allocation Site Report
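
The summarization step can be illustrated with a toy heap. This is a minimal sketch, not Cork's implementation: the `Obj` class, the `build_tpfg` function, and the choice to weight each points-from edge by the bytes of the referenced object are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical toy heap object: type name, size in bytes, outgoing references."""
    type_name: str
    size: int
    refs: list = field(default_factory=list)


def build_tpfg(roots):
    """Scan every reachable object once and summarize the heap by type.

    Returns (volume, edges):
      volume[t]     = total bytes of live objects of type t
      edges[(t, s)] = bytes of type-t objects referenced from type-s objects
                      (a points-from edge: t is pointed to *from* s)
    An object referenced twice contributes to an edge twice (sketch simplification).
    """
    volume, edges = Counter(), Counter()
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        volume[obj.type_name] += obj.size
        for child in obj.refs:
            edges[(child.type_name, obj.type_name)] += child.size
            stack.append(child)
    return volume, edges


# Usage: one HashTable holding three Company objects.
table = Obj("HashTable", 32)
table.refs.extend(Obj("Company", 16) for _ in range(3))
volume, edges = build_tpfg([table])
```

The traversal visits each object once, which is why the summary can ride along with a full-heap scan that the collector performs anyway.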

  4. Type Points-From Graph [Figure: a heap of instances next to its TPFG of types; legend: instance, type; types shown: HashTable, Queue, PQueue, Company, People]

  5. Differencing TPFGs [Figure: three graphs, TPFGi, TPFGi+1, and TPFGi+2, with node and edge weights]
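
Differencing can be sketched as comparing per-type volumes across consecutive summaries. The function name `growing_types` and the rule used here (volume never shrinks and grows at least once within the window) are assumptions of this sketch; the slides do not spell out Cork's exact candidate condition.

```python
def growing_types(snapshots):
    """Return {type: total growth} over a window of type-volume summaries.

    snapshots: list of dicts mapping type name -> volume, oldest first.
    Assumed rule (illustrative only): a type is growing if its volume
    never decreases between consecutive snapshots and increases at least once.
    """
    result = {}
    all_types = set().union(*snapshots)
    for t in all_types:
        series = [snap.get(t, 0) for snap in snapshots]
        deltas = [b - a for a, b in zip(series, series[1:])]
        if deltas and all(d >= 0 for d in deltas) and any(d > 0 for d in deltas):
            result[t] = series[-1] - series[0]
    return result


# Usage with made-up volumes from three collections.
window = [
    {"Order": 10, "Date": 5},
    {"Order": 20, "Date": 5},
    {"Order": 40, "Date": 3},
]
growth = growing_types(window)
```

The same comparison applies to edge weights, which is what later lets growing edges identify the data structure that holds the growth.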

  6. Finding Growth (SRT) • Rank growing nodes • Rank all growing nodes • Designate node as a candidate if …

  7. Reported Candidates [Chart: number of candidates reported with SRT for fop, jess, SPECjbb]

  8. Finding Growth (RRT) • Find nodes that are growing • Rank all growing nodes • Designate node as a candidate if …

  9. Reported Candidates [Chart: number of candidates reported with SRT and RRT for fop, jess, SPECjbb]

  10. Finding Data Structure • Type is not enough • Growing edges identify the data structure • Rank edges • Calculate a slice from each candidate • Set of all paths (n_0 … n_n) such that … • “Sees” beyond non-candidate nodes
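
A slice can be sketched as a graph walk that starts at a candidate type and follows only growing edges. The path condition is not given in the transcript, so this sketch assumes "every edge on the path has positive growth" and walks from each type to the types that point to it; `slice_from` and the type names in the usage are hypothetical.

```python
def slice_from(candidate, edge_growth):
    """Collect the growing points-from edges reachable from `candidate`.

    edge_growth maps (target_type, source_type) -> change in edge weight
    between two collections. Only edges with positive growth are followed
    (an assumption of this sketch).
    """
    # Index growing edges by the type being pointed to.
    pointed_from = {}
    for (target, source), delta in edge_growth.items():
        if delta > 0:
            pointed_from.setdefault(target, []).append(source)

    slice_edges, seen, work = set(), {candidate}, [candidate]
    while work:
        target = work.pop()
        for source in pointed_from.get(target, []):
            slice_edges.add((target, source))
            if source not in seen:  # the walk passes through any type, candidate or not
                seen.add(source)
                work.append(source)
    return slice_edges


# Usage with made-up types: Leaf objects held by a Bucket held by a Table.
edge_growth = {
    ("Leaf", "Bucket"): 8,
    ("Bucket", "Table"): 4,
    ("Table", "Root"): 0,   # not growing, so the slice stops here
}
edges = slice_from("Leaf", edge_growth)
```

Because the walk continues through intermediate types regardless of whether they are candidates themselves, the slice can reach the container that actually roots the growth.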

  11. Implementation and Methodology • Jikes RVM with MMTk • Benchmarks: SPECjvm98, DaCapo, SPECjbb2000, Eclipse 3.1.2 • Garbage collector: generational with a 4 MB bounded nursery • For performance, report application only • Replay compilation • 2nd run methodology

  12. Efficiency and Scalability • Node/type data stored in the type information block (TIB), adding 5 words • 1 word for type volume and edge list pointer for each of the previous 4 collections • 1 word for # of phases (p) • Edge data stored in lists • Prune parts of the TPFG that are non-growing

  13. Space Overhead [Chart; labeled values: 19%, 2.7X, 0.233%]

  14. Time Overhead [Chart: normalized total time vs. heap size relative to minimum]

  15. Benchmarks on Cork (fop, jess, SPECjbb) • Cork identified: • Systematic heap growth • Growing types • Growing data structure • Analysis: • fop: application design • jess: memory leak • SPECjbb2000: memory leak

  16. SPECjbb2000 [Chart: heap occupancy (MB) vs. time (MB of allocation)]

  17. Slice Diagram: SPECjbb2000 • Types: 1663 (71) • Nodes: 318 • Edges: 904 [Diagram legend: candidate, non-candidate; types shown: longStaticBTree, longBTree, longBTreeNode, Object[], Orderline, NewOrder, Date, Order]

  18. SPECjbb2000 [Chart: heap occupancy (MB) vs. time (MB of allocation)]

  19. Eclipse 3.1.2 on Cork • IDE • Big, complex, and open-source • Bug repository details known memory leaks and how to reproduce them • #115789: Memory Leak • Comparing 2 source trees or jar files • Manually repeat while running Cork

  20. Eclipse 115789 [Chart: heap occupancy (MB) vs. time (MB of allocation)]

  21. Slice Diagram: Eclipse 115789 • Types: 3365 (1773) • Nodes: 667 • Edges: 4090 [Diagram legend: candidate, non-candidate; types shown: HashMap$HashIterator, HashMap, HashMap$HashEntry[], HashMap$HashEntry, ResourceCompareInput$MyDiffNode, ResourceCompareInput, ResourceCompareInput$FilteredBufferedResourceNode, ListenerList, ArrayList, ElementTree, Folder, File, RuleBasedCollator, Object[], ElementTree$ChildIDsCache, Path]

  22. Eclipse 115789 [Chart: heap occupancy (MB) vs. time (MB of allocation)]

  23. Slice Diagram: Eclipse 115789 • Types: 3365 (1773) • Nodes: 667 • Edges: 4090 [Diagram legend: candidate, non-candidate; types shown: HashMap$HashIterator, HashMap, HashMap$HashEntry[], HashMap$HashEntry, ResourceCompareInput$MyDiffNode, ResourceCompareInput, ResourceCompareInput$FilteredBufferedResourceNode, ListenerList, ArrayList, ElementTree, Folder, File, RuleBasedCollator, Object[], ElementTree$ChildIDsCache, Path]

  24. Eclipse 115789 [Chart: heap occupancy (MB) vs. time (MB of allocation)]

  25. Cork’s Contributions • Very low-overhead technique • <0.5% space overhead • ~2% time overhead • Accurately identifies • Systematic heap growth • Data structure containing the growth • First mechanism for detecting memory leaks in production systems

  26. Thank You! mjump@cs.utexas.edu http://www.cs.utexas.edu/~mjump

  27. Second Run Methodology • Replay compilation • Profiling run chooses hot methods • Deterministically applies optimizing compiler • Mixture of optimized & unoptimized code • Measure 2nd run • First run applies replay compilation • Turn off compilation • Flush compiler objects from heap • Measure second run

  28. Gartner Report predicts that by 2010, 80% of all new software will be in Java or C# [Wikipedia: Comparison of Java and C++, Dec 2006]

  29. Panacea for Bugs? • PMD, FindBugs, JLint, … • ESC/Java, Bandera, … • HPROF, JProbe, HAT, Leakbot, … • Microsoft reports that, even in C#, 75% of development time is spent in debugging • Provide a good start • Programs still ship with memory and semantic errors

  30. My Research Focus • PROBLEM: Dynamically detect statistical and anomalous per-object behavior in production systems • Low overhead and high accuracy • SOLUTION: • Exploit GC and underlying runtime system • Focus only on interesting objects • Find ways to summarize object properties

  31. Outline • Motivation: Programs have bugs • Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages • Summarize using a type points-from graph • Interpret the summarization • Find memory leaks with Cork • How to focus only on interesting objects • Heap summarization with focus • Conclusions and future work

  32. Memory-Related Bugs with GC • Lost pointer: losing the pointer to memory before freeing it (GC reclaims automatically) • Dangling pointer: dereferencing a pointer to memory previously freed (with GC, the object is live) • Unnecessary reference: keeping a pointer to memory no longer needed (objects are live, so GC cannot reclaim them)

  33. Heap Occupancy Graph [Chart: heap occupancy (MB) vs. time (MB of allocation)]

  34. Related Work • Offline Techniques: • Static analysis [Heine et al. 03] • Heap differencing [JProbe, DePauw et al. 98, 99, 00] • Allocation and/or usage tracking [OptimizeIt, Rational Purify, HAT, HPROF, Shaham et al. 00] • Online Techniques: • Leakbot (partially online) [Mitchell et al. 03] • Adaptive usage tracking [Chilimbi et al. 04, Bond et al. 06] • Cork accurately pinpoints systematic heap growth completely online

  35. Outline • Motivation: Programs have bugs • Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages • Summarize using a type points-from graph • Interpret the summarization • Find memory leaks with Cork • How to focus only on interesting objects • Heap summarization with focus • Conclusions and future work

  36. What do we know? • Objects have special properties • Lifetime, allocation site, last-use site, calling context, thread usage, etc. • Tracking individual object properties is useful for debugging • Can use dynamic object sampling to gather fine-grained object statistics at very low overhead [Jump et al. 04]

  37. Dynamic Object Sampling • Tag objects with special properties • One bit in the header indicates a tag • Sample tag encodes object properties • Examples: • Allocation site • Last-use site • Lifetime • Which data structure

  38. Dynamic Object Sampling • For example, modify a bump-pointer allocator [Figure: allocated object with a sample tag]
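
A modified bump-pointer allocator might look like the following sketch. The memory model (a Python list of words), the `TAG_BIT` value, and the placement of the tag word immediately after the payload are all assumptions made for illustration; the slide's figure does not survive in the transcript.

```python
TAG_BIT = 0x1  # hypothetical header bit meaning "this object carries a sample tag"


class BumpAllocator:
    """Toy bump-pointer allocator over a word-addressed region."""

    def __init__(self, words):
        self.memory = [0] * words
        self.cursor = 0  # index of the next free word

    def alloc(self, size_words, sample_tag=None):
        """Reserve one header word plus `size_words` payload words.

        If `sample_tag` is given, reserve one extra word for it and set
        TAG_BIT in the header. Returns the address of the header word.
        """
        extra = 1 if sample_tag is not None else 0
        total = 1 + size_words + extra
        if self.cursor + total > len(self.memory):
            raise MemoryError("bump region exhausted")
        addr = self.cursor
        self.cursor += total  # the "bump"
        if sample_tag is not None:
            self.memory[addr] = TAG_BIT
            # Assumed layout: the tag word follows the payload.
            self.memory[addr + 1 + size_words] = sample_tag
        return addr


# Usage: one untagged object, then one sampled object tagged with site id 42.
heap = BumpAllocator(16)
plain = heap.alloc(2)
sampled = heap.alloc(2, sample_tag=42)
```

Untagged objects pay nothing beyond the header bit, so the space cost scales with the fraction of objects that are sampled.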

  39. During Garbage Collection • Gather object statistics • Piggyback on object scanning of survivors • When a sample tag is found: 1. Examine tag 2. Collect statistics
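
The collection-time step can be sketched as a single pass over the survivors. Representing each survivor as a `(header, sample_tag)` pair, the `TAG_BIT` value, and counting survivors per tag are assumptions of this sketch, chosen only to show the two numbered steps.

```python
from collections import Counter

TAG_BIT = 0x1  # hypothetical header bit meaning "this object carries a sample tag"


def collect_tag_stats(survivors):
    """Count surviving tagged objects per sample tag.

    survivors: iterable of (header, sample_tag) pairs; sample_tag is
    ignored unless TAG_BIT is set in the header.
    """
    stats = Counter()
    for header, sample_tag in survivors:
        if header & TAG_BIT:        # 1. examine tag
            stats[sample_tag] += 1  # 2. collect statistics
    return stats


# Usage with made-up survivors; tags here stand for allocation-site names.
stats = collect_tag_stats([(1, "siteA"), (0, None), (1, "siteA"), (1, "siteB")])
```

The only work added to the collector's existing scan is one bit test per survivor, plus the bookkeeping for objects that are actually tagged.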

  40. Focus DOS Overhead • Sampling every object • 12% space overhead • 6-7% time overhead • What is interesting depends on the application • Memory leak detection: candidate types • Malformed data structures: nodes • Dynamic pretenuring: random sampling • Focus only on 6% of objects • 0.8% space overhead • 2-3% time overhead

  41. DOS in Cork • Encode allocation site and lifetime for candidates • <1.3% space overhead, ~4% time overhead • Find specific allocation sites causing growth • Future work • Encode last-use site in sample tag • Requires read/write barrier for candidates • Will overhead still be low enough for use in production systems?

  42. Conclusions • Developed two synergistic techniques • Dynamic object sampling • Points-from graphs • See detailed object characteristics in high-level summarizations • Unique ways to debug software in production systems
