
Portable, Unobtrusive Garbage Collection for Multiprocessor Systems

Damien Doligez, Georges Gonthier. POPL 1994. Presented by Eran Yahav (yahave@math.tau.ac.il).

Presentation Transcript


  1. Portable, Unobtrusive Garbage Collection for Multiprocessor Systems. Damien Doligez, Georges Gonthier. POPL 1994. Presented by Eran Yahav (yahave@math.tau.ac.il)

  2. Portable, Unobtrusive Garbage Collection for Multiprocessor Systems • “A concurrent, generational garbage collector for a multithreaded implementation of ML” - Doligez & Leroy (POPL 1993) • “On-the-fly garbage collection: an exercise in cooperation” - Dijkstra et al. (1978)

  3. Overview • Motivation • Concurrent collection strategies • Concurrent collection constraints • The basic algorithm (Dijkstra) • Doligez-Leroy model • Doligez-Leroy concurrent collector

  4. Concurrent GC • Known as a tough problem • Published algorithms contain simplifying assumptions that either: • impose unbearable overhead on mutators • require high degree of hardware/OS support • Other algorithms are buggy

  5. “Stop the world” • all threads stop synchronously and perform GC • introduces synchronization between independent threads [Figure: threads T1 to T4 with synchronous GC pauses]

  6. “Stop the world” • all threads stop synchronously and perform GC • introduces synchronization between independent threads [Figure: threads T1 to T4 with a synchronous GC pause]

  7. “Stop the world” - Mostly Parallel GC (Boehm et al.) • uses virtual memory page protections • reduces duration of the “stop the world” period • does not prevent synchronization between threads at “stop the world” points [Figure: threads T1 to T4, with spans labeled “marking” and “Sync. GC”]

  8. “Stop the world” - Scalable mark-sweep GC • uses a parallelization of Boehm’s mostly parallel collector • reduces duration of “stop the world” periods • does not prevent synchronization between threads at “stop the world” points [Figure: threads T1 to T4, with spans labeled “marking” and “Sync. GC”]

  9. “Stop the world” - Real Time GC (Nettles & O’Toole) • incremental copying collector • reduces duration of “stop the world” periods • does not prevent synchronization between threads at the swap point [Figure: threads T1 to T4, with a span labeled “Sync. GC”]

  10. Concurrent collector • run the collector concurrently with user threads • use as little synchronization as possible between user threads and the GC thread [Figure: GC thread running alongside threads T1 to T4]

  11. Concurrent Collection strategies • Reference counting • copying (relocation) • mark & sweep

  12. Concurrent GC - Reference counting • Locks on reference counters [Figure: mutators M1, M2, M3 applying -1 and +1 to a heap object with RC = 2]
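
To make the cost on this slide concrete, here is a minimal Python sketch of a lock-protected reference count. The class `Counted`, its methods and the helper `mutator` are hypothetical names introduced for illustration; they do not come from the presentation. The point is only that every pointer copy or drop has to take a lock.

```python
import threading

class Counted:
    """Heap object with a lock-protected reference count (illustrative only)."""
    def __init__(self):
        self.rc = 0
        self.lock = threading.Lock()

    def incref(self):
        with self.lock:              # every pointer copy pays for this lock
            self.rc += 1

    def decref(self):
        with self.lock:
            self.rc -= 1
            return self.rc == 0      # True means the object can be reclaimed

def mutator(obj, n):
    # Each mutator repeatedly takes and drops a reference to the same object.
    for _ in range(n):
        obj.incref()
        obj.decref()

obj = Counted()
obj.incref()
obj.incref()                         # RC = 2, as in the slide's figure
threads = [threading.Thread(target=mutator, args=(obj, 10_000)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, concurrent `+1` and `-1` updates could be lost, which is why this strategy puts synchronization on the most frequent mutator actions.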

  13. Concurrent GC - relocation • relocating objects while mutators are running [Figure: mutators M1, M2 and the GC; a heap object moves from “from” to “to”, with question marks on the mutators’ pointers]

  14. Concurrent GC - relocation • relocating objects while mutators are running • must ensure that mutators are aware of relocation • test on heap pointer deref • extra indirection word for each object • virtual memory page protections • significant run-time penalty
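
One of the options listed above, a test on every heap pointer dereference, can be sketched as follows. This is an illustrative Python model, not code from the presentation: `Obj`, `forward`, `deref` and `relocate` are assumed names, and a real collector would store the forwarding pointer in the object header.

```python
class Obj:
    def __init__(self, value):
        self.value = value
        self.forward = None          # set by the collector when the object moves

def deref(ref):
    """Read barrier: follow forwarding pointers before using the object."""
    while ref.forward is not None:
        ref = ref.forward
    return ref

def relocate(old):
    """Collector side: copy the object and leave a forwarding pointer behind."""
    new = Obj(old.value)
    old.forward = new
    return new

a = Obj(41)
moved = relocate(a)
moved.value = 42                     # later writes go to the new copy
```

Every access through `deref` pays for the extra test, which is the run-time penalty the slide refers to.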

  15. Concurrent GC - mark/sweep • Mark all threads’ roots • No inherent locks • Mutators may change trace graph during any collection phase [Figure: global variables and threads 1, 2, 3 pointing into the heap]

  16. Multiprocessor facts of life • Registers are local • impossible to track down machine registers of a running process • Synchronization is expensive • semaphores and synchronization are only available through expensive system calls

  17. Unobtrusive? • No overhead on frequent actions: • move data between registers and memory • deref a heap pointer • fill a field in a new heap object • imposes sync. overhead only on reserve actions (for which it is unavoidable) • mutator cooperation with collector is done only at mutator’s convenience

  18. Portable ? • No special use of OS synchronization primitives • no hardware support

  19. Where all else fails • relocating GC algorithms break locality or impose large overhead • proposed incremental algorithms require global synchronization • mark & sweep - collector working while mutators change trace graph - complicated but possible

  20. The basic algorithm • Dijkstra et al. - “On-the-fly garbage collection” • published in 1978 • breaks locality • assumes fixed set of roots [Figure: global variables, threads 1, 2, 3 and the GC, all accessing the heap]

  21. Dijkstra’s collector
      Mark:
        for each x in Globals do MarkGray(x)
      Scan:
        repeat
          dirty ← false
          for each x in heap do
            if color[x] = gray then
              dirty ← true
              MarkGray(x.Sons)
              color[x] ← black
        until not dirty
      Sweep:
        for each x in heap do
          if color[x] = white then append x to free list
          else if color[x] = black then color[x] ← white
      [Figure: color transitions among white, gray and black, with edges labeled mark, update, allocate and sweep]
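
The three phases on this slide can be run as a small sequential Python sketch. It leaves out the concurrent mutators entirely, so it only shows the coloring logic. The names `Node`, `mark_gray` and `collect` are assumptions made for this example.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Node:
    def __init__(self, name):
        self.name = name
        self.sons = []
        self.color = WHITE

def mark_gray(x):
    # Only white objects are shaded; gray and black stay as they are.
    if x.color == WHITE:
        x.color = GRAY

def collect(heap, globals_):
    # Mark: shade the roots.
    for x in globals_:
        mark_gray(x)
    # Scan: repeat full heap passes until a pass finds no gray object.
    dirty = True
    while dirty:
        dirty = False
        for x in heap:
            if x.color == GRAY:
                dirty = True
                for s in x.sons:
                    mark_gray(s)
                x.color = BLACK
    # Sweep: white objects are garbage, black objects are reset to white.
    free_list = []
    for x in heap:
        if x.color == WHITE:
            free_list.append(x)
        elif x.color == BLACK:
            x.color = WHITE
    return free_list

a, b, c, d = (Node(n) for n in "abcd")
a.sons = [b]
b.sons = [a]          # a and b form a reachable cycle
c.sons = [d]          # c and d are unreachable from the roots
freed = collect([a, b, c, d], [a])
```

The repeated full passes over the heap are what the previous slide means by "breaks locality".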

  22. Doligez-Leroy model • Damien Doligez & Xavier Leroy, 1993 • a concurrent, generational GC for a multithreaded implementation of ML • relies on ML properties: • compile time distinction between mutable and immutable objects • duplicating immutable objects is semantically transparent • does not stop program threads

  23. Doligez-Leroy model • Do anything to avoid synchronization • trade collection “quality” for level of synchronization - allow large amounts of floating garbage • trade collection “simplicity” for level of synchronization - complicated algorithm (not to mention correctness proof)

  24. Doligez-Leroy model [Figure: threads 1, 2, 3 with their stacks and minor heaps, the global variables, and the major heap]

  25. Collection generations • Each thread treats the two heaps (private and shared) as two generations • private = young generation • shared = old generation • immutable objects are allocated in private heaps • does not require synchronization • mutable objects handled differently (later)

  26. Minor collection • When private heap is full - stop and perform minor collection • copy live objects from private heap to shared heap (old generation) • after minor collection, whole private heap is free • can be performed at any time • synchronization is only required for allocation of the copied object on shared heap
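
A minor collection as described on this slide can be sketched in Python as follows. This is a simplified model under stated assumptions: heaps are Python lists, `shared_lock` stands in for whatever synchronization protects allocation in the shared heap, and `Obj`, `minor_collect` and `copy` are hypothetical names.

```python
import threading

shared_heap = []
shared_lock = threading.Lock()        # the only synchronization point

class Obj:
    def __init__(self, value, fields=()):
        self.value = value
        self.fields = list(fields)

def minor_collect(private_heap, roots):
    """Copy objects reachable from roots into the shared heap; return new roots."""
    private_ids = {id(o) for o in private_heap}
    copies = {}                       # id(original) -> copy, preserves sharing

    def copy(obj):
        if id(obj) not in private_ids:    # already in the shared heap
            return obj
        if id(obj) in copies:
            return copies[id(obj)]
        new = Obj(obj.value)
        copies[id(obj)] = new
        new.fields = [copy(f) for f in obj.fields]
        with shared_lock:             # allocation in the shared heap
            shared_heap.append(new)
        return new

    new_roots = [copy(r) for r in roots]
    private_heap.clear()              # the whole private heap is free again
    return new_roots

leaf = Obj(1)
live = Obj(2, [leaf, leaf])
dead = Obj(3)                         # not reachable from any root
private = [leaf, live, dead]
(root,) = minor_collect(private, [live])
```

Only the thread that owns the private heap runs this, so no other thread has to stop.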

  27. Major collection • Dedicated GC thread • uses a variation of Dijkstra’s algorithm (mark & sweep) • does not move objects, no synchronization is required when accessing/modifying objects in shared heap • will be described later

  28. Major and minor collection [Figure: threads 1, 2, 3 and the GC thread, with stacks, global variables, minor heaps and the major heap]

  29. Copy on update • We assumed no pointers from shared heap to private heap [Figure: major heap; an object labeled “not reachable from thread’s roots”]

  30. Copy on update • Copy the referenced object (and descendants) • similar to minor collection with a single root • simply does some of the minor collection right away [Figure: major heap]
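
Copy on update can be sketched as a store operation that first promotes the stored value. The Python below is an illustration only: the `shared` flag, `promote` and `update` are assumed names, and for brevity `promote` does not preserve sharing between descendants, which a real minor-collection copy would.

```python
class Obj:
    def __init__(self, value, fields=(), shared=False):
        self.value = value
        self.fields = list(fields)
        self.shared = shared          # True once the object lives in the shared heap

def promote(obj):
    """Copy a private object and its descendants to the shared heap."""
    if obj is None or obj.shared:
        return obj
    return Obj(obj.value, [promote(f) for f in obj.fields], shared=True)

def update(target, index, value):
    """Store value into a field of a shared object, copying it first if private."""
    target.fields[index] = promote(value)

cell = Obj("ref", [None], shared=True)        # object already in the shared heap
young = Obj("pair", [Obj("x"), Obj("y")])     # still in the private heap
update(cell, 0, young)
```

After the update the shared heap holds no pointer into the private heap, so the assumption from the previous slide still holds.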

  31. Copy on update • Until next minor collection, copying thread can access original and copied objects • immutable objects - semantically equivalent • what about mutable objects? [Figure: major heap]

  32. Allocation of mutable objects • If copied - can update both objects separately • no equivalence of original and copied object • solution: always allocate mutable objects in the shared heap • requires synchronization (free list) • ML programs usually use few mutable objects • mutable objects have longer life span than average

  33. The Concurrent collector • Adapted version of Dijkstra’s algorithm • naming conventions • mutator = thread + minor collection thread • collector = major collector • major collector only requires marking of mutator roots. • does not demand minor collections

  34. Four color marking • White - not yet marked (or unreachable) • Gray - marked but sons not marked • Black - marked and sons marked • Blue - free list blocks [Figure: heap]

  35. Collection phases • Root enumeration • end of marking • sweeping

  36. Root enumeration • Raise a flag to signal beginning of marking • shade globals • ask mutators to shade roots • wait until all mutators answered • meanwhile - start scanning and marking

  37. Root enumeration
      Collector
        Mark:
          for each x in Globals do MarkGray(x)
          call mutators to mark roots
          wait until all mutators answered
          ...
      Mutators
        Cooperate:
          if call to roots is pending then
            call MarkGray on all roots
            answer the call
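
The call-and-answer handshake on this slide can be modeled in Python. This is a sketch under loud assumptions: `threading.Event` stands in for the flags the collector and mutators exchange (the presented algorithm is meant to avoid OS synchronization primitives), and `Mutator`, `cooperate`, `enumerate_roots` and `run` are hypothetical names.

```python
import threading
import time

WHITE, GRAY = "white", "gray"

class Obj:
    def __init__(self):
        self.color = WHITE

def mark_gray(x):
    if x.color == WHITE:
        x.color = GRAY

class Mutator:
    def __init__(self, roots):
        self.roots = roots
        self.call_pending = threading.Event()   # stand-in for the collector's call
        self.answered = threading.Event()       # stand-in for the mutator's answer

    def cooperate(self):
        """Run by the mutator at its own convenience."""
        if self.call_pending.is_set():
            for r in self.roots:
                mark_gray(r)
            self.call_pending.clear()
            self.answered.set()

def enumerate_roots(globals_, mutators):
    for x in globals_:
        mark_gray(x)
    for m in mutators:
        m.answered.clear()
        m.call_pending.set()
    # A real collector would start scanning here; this sketch only waits.
    for m in mutators:
        m.answered.wait()

def run(m, stop):
    while not stop.is_set():
        m.cooperate()                 # stands in for a safe point between real work
        time.sleep(0.001)

g, r1, r2 = Obj(), Obj(), Obj()
muts = [Mutator([r1]), Mutator([r2])]
stop = threading.Event()
threads = [threading.Thread(target=run, args=(m, stop)) for m in muts]
for t in threads:
    t.start()
enumerate_roots([g], muts)
stop.set()
for t in threads:
    t.join()
```

The mutators are never stopped by the collector; each one answers when it next reaches `cooperate`.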

  38. End of marking • Repeatedly mark gray objects until no more gray objects remain
      Scan:
        repeat
          dirty ← false
          for each x in heap do
            if color[x] = gray then
              dirty ← true
              MarkGray(x.Sons)
              color[x] ← black
        until not dirty

  39. Sweeping • Scan heap • all white objects are free - set to blue and add to the free list • all black objects are reset to white • some objects may have been set to gray since the end of the marking phase - set to white
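
The sweeping rules above translate into a short loop. In this Python sketch the heap is modeled as a list of colors indexed by block number, and `sweep` is a hypothetical name; the free list is a list of block indices.

```python
WHITE, GRAY, BLACK, BLUE = "white", "gray", "black", "blue"

def sweep(heap_colors, free_list):
    """Apply the four-color sweep rules to every block in the heap."""
    for i, color in enumerate(heap_colors):
        if color == WHITE:
            heap_colors[i] = BLUE         # unreachable: goes to the free list
            free_list.append(i)
        elif color in (BLACK, GRAY):
            heap_colors[i] = WHITE        # survivor: reset for the next cycle
        # BLUE blocks are already on the free list and are left alone.
    return free_list

colors = [BLACK, WHITE, GRAY, BLUE, WHITE]
free = sweep(colors, [3])
```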

  40. Invariants (1/2) • Objects can become reachable by allocation and modification, which are performed concurrently with the collection • All objects reachable from a mutator’s roots at the time the mutator shaded its roots, or that become reachable after that time, are black at the end of the marking phase

  41. Invariants (2/2) • gray objects that are unreachable at the beginning of the mark phase become black during mark, then white during sweep and reclaimed by the next cycle (floating garbage) • all white objects unreachable at the start of the marking phase remain white • No unreachable object ever becomes reachable again • there are no blue objects outside the free list

  42. Concurrent allocation and modification • Mutators must consider collector status when performing modification or allocation of heap objects • first, let’s consider modification of heap objects

  43. Concurrent modification • Updating a black object could result in a reachable object that remains white at the end of marking • even worse - the set of roots is not fixed during collection • must shade both the new value and the old value
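
The rule "shade both the new value and the old value" amounts to a write barrier on updates. The Python below is a minimal sketch, assuming a boolean `collector_marking` that tells the mutator a marking phase is in progress; `Obj`, `mark_gray` and `update` are illustrative names.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self, fields=()):
        self.fields = list(fields)
        self.color = WHITE

def mark_gray(x):
    if x.color == WHITE:
        x.color = GRAY

def update(obj, index, new, collector_marking):
    """Write barrier: shade the overwritten value and the stored value."""
    if collector_marking:
        old = obj.fields[index]
        if old is not None:
            mark_gray(old)            # old value may still be held by another thread
        if new is not None:
            mark_gray(new)            # new value is now reachable from obj
    obj.fields[index] = new

a = Obj([None])
a.color = BLACK                       # a has already been scanned
old, new = Obj(), Obj()
a.fields[0] = old
update(a, 0, new, collector_marking=True)
```

Both objects end up gray, so the scan will visit them even though `a` itself is already black.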

  44. What happens if we don’t shade new value [Figure: threads T1, T2 and objects A, B in the major heap; during root enumeration: mark T1 root, T2 updates A, T2 pops]

  45. What happens if we don’t shade new value [Figure: threads T1, T2 and objects A, B in the major heap; during root enumeration: mark T1 root, T2 updates A, T2 pops]

  46. What happens if we don’t shade new value [Figure: threads T1, T2 and objects A, B in the major heap; phases: root enumeration, end mark, sweep; events: mark T1 root, T2 updates A, T2 pops, mark T2 root]

  47. What happens if we don’t shade old value [Figure: thread T and objects A, B in the major heap; phases: root enumeration, end mark; events: mark T root, T pushes B]

  48. What happens if we don’t shade old value [Figure: thread T and objects A, B in the major heap; phases: root enumeration, end mark; events: mark T root, T pushes B]

  49. What happens if we don’t shade old value [Figure: thread T and objects A, B in the major heap; phases: root enumeration, end mark, sweep; events: mark T root, T pushes B, T updates A]

  50. Concurrent Allocation • Assign right color to new objects • during marking - allocated objects are black • allocated objects are reachable • sons of allocated objects are reachable and will eventually be set to black • during sweeping - white if already swept, gray otherwise • set to gray to avoid immediate deallocation
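
The allocation-color rule can be written as a small function. This Python sketch makes two assumptions that the slide does not state: the sweep proceeds in increasing address order, tracked by a `sweep_pointer`, and objects allocated while the collector is idle are white. The name `allocation_color` and the phase strings are hypothetical.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def allocation_color(phase, addr, sweep_pointer=0):
    """Pick the color of a freshly allocated block at heap address addr."""
    if phase == "marking":
        return BLACK                  # new objects are reachable
    if phase == "sweeping":
        # Already-swept region: white is safe. Unswept region: gray, so the
        # sweeper resets the block to white instead of freeing it.
        return WHITE if addr < sweep_pointer else GRAY
    return WHITE                      # assumption: collector is idle

colors = [
    allocation_color("marking", 10),
    allocation_color("sweeping", 10, sweep_pointer=50),
    allocation_color("sweeping", 80, sweep_pointer=50),
]
```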
