Hans-Juergen Boehm, Computer Science Dept., Rice University, Houston • Mark Weiser, Xerox Corporation, Palo Alto • Presented by Srilakshmi Swati Pendyala • GARBAGE COLLECTION IN AN UNCOOPERATIVE ENVIRONMENT
Outline • Introduction • Garbage Collection In Different Languages • Problem Domain • Need for conservative garbage collection in uncooperative environments • Overview of the proposed Garbage Collector • Use of the proposed GC as a debugging tool • Implementation Results • Conclusion
Introduction • Garbage Collection in Different Languages? • Java: http://www.folgmann.com/en/gc.html
Introduction • Garbage Collection in Different Languages: • .NET, VB, C#? • Perl, Python? • C, C++?
Introduction • Garbage Collection in Different Languages: • .NET, VB, C# – Mark and sweep, generational • Perl, Python – Reference counting • C, C++ – No built-in garbage collection; managed options available • Ada, Modula-3 – Manual and automatic garbage collection
Introduction • Java, .NET, etc. • Automatic garbage collection • No memory management effort for the programmer • At run time, the program must tell the GC which memory objects are still in use • C, C++, etc. • The program must explicitly “free” the memory it allocates • Prone to memory leaks and similar errors • Both cases require additional effort from the program/compiler • GC affects the performance of the program • Better performance can be achieved (in some cases) when the program does not have to cooperate with the GC at all
Why avoid cooperation? • Programmers don’t want to pay for GC unless they need it • Disadvantages of tagging integers • Fewer bits are available for the value • It is difficult to manipulate the standard machine representation of data • Interfacing routines are needed • To implement specific programming languages such as Russell • To enable garbage collection in conjunction with C, Pascal, etc. • It is difficult to design compilers that always preserve garbage collection invariants • Hence the need for a garbage collector that expects less from the program/compiler
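The cost of tagging integers can be made concrete with a small sketch. It assumes a 32-bit word and a scheme in which the low bit is 1 for integers and 0 for pointers; the word size, the tag choice, and the names `tag_int`, `untag_int`, and `looks_like_pointer` are illustrative assumptions, not taken from the paper.

```python
WORD_BITS = 32                      # assumed machine word size
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(value: int) -> int:
    """Pack a signed integer into a word whose low bit is the tag 1."""
    lo = -(1 << (WORD_BITS - 2))    # only 31 bits remain for the value
    hi = (1 << (WORD_BITS - 2)) - 1
    if not lo <= value <= hi:
        raise OverflowError("value does not fit in 31 bits")
    return ((value << 1) | 1) & WORD_MASK


def untag_int(word: int) -> int:
    """Recover the integer; every arithmetic use pays for this step."""
    if word & 1 == 0:
        raise ValueError("word is tagged as a pointer, not an integer")
    payload = word >> 1
    if payload & (1 << (WORD_BITS - 2)):   # sign-extend from 31 bits
        payload -= 1 << (WORD_BITS - 1)
    return payload


def looks_like_pointer(word: int) -> bool:
    """With tagging, the collector can tell pointers apart by the low bit."""
    return word & 1 == 0
```

The range check in `tag_int` is the lost bit from the slide, and the shift in `untag_int` is the extra work needed before the value can be passed to code that expects the standard machine representation.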
Uncooperative Environment • The program/compiler does not provide information to recognize pointers • Every register/word is a potential pointer • Not all storage reachable from the stack, registers, etc. is actually needed by the program • Compiled code may fail to destroy references (for performance reasons or because of bugs) • Particular run-time representations may involve unnecessary references not intended by the programmer • It is difficult to tell whether an object is actually required by the program • Deleting a necessary object can lead to program failure • Need for CONSERVATIVE Garbage Collection
Conservative Garbage Collection • Imagine doing a mark and sweep GC, but not knowing for sure if a cell has a pointer in it or some other data. • If it looks like a pointer (that is, is a valid word-aligned address within heap memory bounds), assume that it IS a pointer, and trace that and other pointers in that record too. • Any heap data that is not marked in this way is garbage and can be collected. (There are no pointers to it.)
Discussion • Is conservative garbage collection needed in cooperative systems? • Disadvantages of conservative garbage collection? • Some amount of inaccessible memory is not reclaimed. • How can we reduce the memory lost to conservative garbage collection? • Better checks to detect false pointers
How does the Garbage Collector work? • Uses a stop-the-world mark-sweep garbage collection algorithm • Procedure: • Scan all objects referenced directly by pointer variables (roots) in the stack and registers • Verify that the pointers actually point to objects (validity check) and mark the objects referenced by validated pointers • Mark objects directly reachable from newly marked objects, repeating until no new objects are marked • Finally, identify unmarked objects and free them (sweep) • E.g. put them in free lists • Reuse them to satisfy allocation requests • Objects are not moved
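The procedure above can be sketched on a simulated heap. This is a minimal illustration, not the authors' implementation: the heap is modeled as a dictionary from object start address to the list of words stored in that object, and a word counts as a pointer if it equals the start address of some object. The names `mark`, `sweep`, and `collect` are hypothetical.

```python
def mark(heap, roots):
    """Return the set of object addresses reachable from the root words."""
    marked = set()
    # Every root word that passes the (simplified) validity check is traced.
    stack = [word for word in roots if word in heap]
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # Every word inside a marked object is also a potential pointer.
        for word in heap[addr]:
            if word in heap and word not in marked:
                stack.append(word)
    return marked


def sweep(heap, marked):
    """Remove unmarked objects and return their addresses as a free list."""
    free_list = sorted(addr for addr in heap if addr not in marked)
    for addr in free_list:
        del heap[addr]          # objects are freed in place, never moved
    return free_list


def collect(heap, roots):
    return sweep(heap, mark(heap, roots))


heap = {
    0x1000: [0x1010, 7],        # reachable from a root
    0x1010: [0, 0],             # reachable through 0x1000
    0x1020: [0x1030, 0],        # unreachable cycle ...
    0x1030: [0x1020, 0],        # ... still collected by mark-sweep
    0x1040: [1, 2],             # kept alive only by a false pointer
}
roots = [42, 0x1000, 0x1040]    # 0x1040 stands for an integer that looks like a pointer
freed = collect(heap, roots)    # freed == [0x1020, 0x1030]
```

The object at `0x1040` shows the conservative trade-off: an integer in the roots happens to equal its address, so it is retained even though the program holds no real reference to it.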
[Figure: Mark/Sweep illustration (1): stack with pointer variables]
[Figure: Mark/Sweep illustration (2): stack with pointer variables]
Allocator design • The allocation scheme obtains “chunks” of memory. • Chunk sizes are always multiples of 4 KB. • Separate free lists are kept for each object size. • Characteristics: • No per-object space overhead (except mark bits) • Partial sweeps are possible.
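A rough sketch of this allocator design follows. It simulates addresses as integers and makes several simplifying assumptions that are not from the slides: every chunk is exactly 4 KB, a chunk holds objects of a single size, chunk header space is ignored, and requests larger than one chunk are rejected. The class name `ChunkAllocator` and its methods are hypothetical.

```python
CHUNK_SIZE = 4096  # 4 KB chunks, as on the slide


class ChunkAllocator:
    def __init__(self):
        self.next_chunk = CHUNK_SIZE       # simulated address of the next fresh chunk
        self.free_lists = {}               # object size -> list of free object addresses
        self.chunk_object_size = {}        # chunk base address -> object size

    def _refill(self, size):
        """Obtain a fresh chunk and carve it into equal-sized free objects."""
        base = self.next_chunk
        self.next_chunk += CHUNK_SIZE
        self.chunk_object_size[base] = size
        starts = range(base, base + CHUNK_SIZE - size + 1, size)
        self.free_lists.setdefault(size, []).extend(starts)

    def allocate(self, size):
        if not 0 < size <= CHUNK_SIZE:
            raise ValueError("sketch only handles objects up to one chunk")
        if not self.free_lists.get(size):
            self._refill(size)
        return self.free_lists[size].pop()

    def release(self, addr):
        """Return an object to the free list for its size (what sweep would do)."""
        base = addr - addr % CHUNK_SIZE
        self.free_lists[self.chunk_object_size[base]].append(addr)
```

Because the object size is recorded once per chunk rather than once per object, the only per-object cost left is the mark bit, which is the "no per-object space overhead" property listed above.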
[Figure: Heap layout: heap data in 4 KB chunks, with free lists]
Data Structure for Chunks • A list of allocated chunks contains pointers to the beginning of each chunk • Contents of a chunk C: • The size of the objects in the chunk • A pointer to the entry for C in the list of allocated chunks • An area reserved for the mark bits corresponding to the objects in the chunk • The data objects themselves • Discussion: Is this better than “tagging” integers?
Finding Roots & Pointers • Possible roots: registers, stack, static areas • No cooperation from the compiler • Treat every word as a potential pointer • Ignore interior pointers (standard) • Prefer marking from false pointers over ignoring valid pointers • Conservative pointer identification, given a word p: • Does p refer to the collected heap? • Does it point into a heap block allocated by the collector? • Does it point to the beginning of an object in that block? • If yes to all: • Mark the object in the block header • Push the object onto the mark stack • Sweep: • If a chunk is completely empty, return it to the chunk allocator
Pointer Validity Check • Goal: to minimize marking from false pointers • The candidate pointer “p” must fall within the heap address range for it to correspond to an object • The header of the chunk that would contain “p” must hold a pointer to an entry in the list of allocated chunks, and that entry must point back to the same chunk • The offset of the supposed object from the chunk header must be a multiple of the object size given in the chunk header, and the object must end within the chunk
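The three checks can be sketched as a single predicate. This is an illustration under stated assumptions, not the collector's actual code: chunks are 4 KB and chunk-aligned, each chunk starts with a fixed 64-byte header, objects are laid out back to back after the header, and the header is modeled as a small record. `HEADER_SIZE`, `ChunkHeader`, and `is_valid_pointer` are hypothetical names.

```python
from dataclasses import dataclass

CHUNK_SIZE = 4096
HEADER_SIZE = 64  # assumed size of the per-chunk header


@dataclass
class ChunkHeader:
    object_size: int   # size of every object in this chunk
    list_index: int    # position of this chunk's entry in the allocated-chunk list


def is_valid_pointer(p, heap_start, heap_end, headers, allocated_chunks):
    """Return True if word p passes all three conservative validity checks."""
    # Check 1: p must lie inside the heap address range.
    if not heap_start <= p < heap_end:
        return False
    # Check 2: the chunk header and the allocated-chunk list must agree.
    base = p - p % CHUNK_SIZE
    header = headers.get(base)
    if header is None:
        return False
    i = header.list_index
    if not 0 <= i < len(allocated_chunks) or allocated_chunks[i] != base:
        return False
    # Check 3: p must be the start of an object that fits inside the chunk.
    offset = p - base - HEADER_SIZE
    if offset < 0 or offset % header.object_size != 0:
        return False
    return offset + header.object_size <= CHUNK_SIZE - HEADER_SIZE
```

The second check is what rejects stale or accidental headers: memory that merely looks like a chunk header fails unless the list entry it names points back to that exact chunk.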
Garbage Collector as a Debugging Tool • Use the GC to identify allocated memory that is no longer needed by the program but has not yet been freed by it. • Use a tracer to track memory leaks back to the subroutine responsible for them. • Procedure: • Run an allocation-and-free tracer. • Record the stack of subroutine names with every call to “malloc”. • Mark the storage as freed when “free” is called. • When the collector runs, storage that has no pointers to it and was never explicitly deallocated with “free” is a likely storage leak. • Running the collector together with the tracer can find most of the storage that the collector leaves unmarked but that was never explicitly freed.
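The tracer procedure above could look roughly like the following sketch. It assumes the tracer is notified of every allocation and deallocation and is later given the set of addresses the collector marked as reachable; `AllocationTracer`, `on_malloc`, `on_free`, and `report_leaks` are hypothetical names, not part of any real allocator interface.

```python
class AllocationTracer:
    def __init__(self):
        # address -> subroutine names on the call stack when it was allocated
        self.outstanding = {}

    def on_malloc(self, addr, call_stack):
        """Record who allocated this block."""
        self.outstanding[addr] = tuple(call_stack)

    def on_free(self, addr):
        """Explicitly freed storage is no longer a leak candidate."""
        self.outstanding.pop(addr, None)

    def report_leaks(self, marked):
        """Blocks never freed and not reachable according to the collector."""
        return {
            addr: stack
            for addr, stack in self.outstanding.items()
            if addr not in marked
        }


tracer = AllocationTracer()
tracer.on_malloc(0x1000, ["main", "parse"])
tracer.on_malloc(0x1010, ["main", "build"])
tracer.on_malloc(0x1020, ["main", "cache_fill"])
tracer.on_free(0x1010)
leaks = tracer.report_leaks(marked={0x1000})
# leaks == {0x1020: ("main", "cache_fill")}
```

Since the collector is conservative, a block kept alive by a false pointer will not be reported, so the report may miss some leaks but should not flag storage that is genuinely still referenced.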
Experimental Results • The mark phase of the Russell collector took 1.9 seconds per megabyte of accessible memory in the heap. The sweep phase took 0.4 seconds per megabyte. • Garbage collection was added to TimberWolf and SDI. • The systems were re-linked so that calls to the Unix allocation routines called the collecting allocator instead. • SunView presented problems because of dynamically allocated memory remapping and the ‘notifier’. • Programming styles involving disguised pointers will not work with this collector. • Use of the proposed GC as a debugging tool has also been demonstrated on the SunView system.
Conclusions • GC is effective for traditional imperative languages with minimal cooperation from the program/compiler • A realistic alternative to explicit memory management for most applications • May not be suitable for real-time applications • No major constraints on coding style, except for the hidden pointer problem • Garbage-collecting allocators are competitive even with code not written for GC • The same GC can be used as a debugging tool for programs that manage memory manually • An implementation of this garbage collector can be downloaded online