Portable, mostly-concurrent, mostly-copying GC for multi-processors

Tony Hosking

Secure Software Systems Lab

Purdue University

Platform assumptions

  • Symmetric multi-processor (SMP/CMP)

  • Multiple mutator threads

  • (Large heaps)

Desirable properties

  • Maximize throughput

  • Minimize collector pauses

  • Scalability

Exploiting parallelism

  • Avoid contention

  • (Mostly-)Concurrent allocation

  • (Mostly-)Concurrent collection

Concurrent allocation

  • Use thread-private allocation “pages”

  • Threads contend for free pages

  • Each thread allocates from its own page

    • multiple small objects per page, or

    • multiple pages per large object
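
The small-object half of this scheme can be sketched in a few lines of Python (the class name, page size, and (page, offset) address representation are illustrative assumptions, not the talk's runtime; large-object handling is omitted):

```python
import threading

PAGE_SIZE = 256  # illustrative page size

class PageAllocator:
    """Threads contend only when taking a fresh page from the global
    pool; allocation within a thread's private page is a contention-free
    bump of a thread-local offset."""
    def __init__(self):
        self._lock = threading.Lock()     # guards only the free-page pool
        self._next_page = 0
        self._local = threading.local()   # per-thread private page state

    def _fresh_page(self):
        with self._lock:                  # the one point of contention
            page = self._next_page
            self._next_page += 1
            return page

    def alloc(self, size):
        t = self._local
        if getattr(t, "page", None) is None or t.used + size > PAGE_SIZE:
            t.page, t.used = self._fresh_page(), 0
        addr = (t.page, t.used)           # "address" = (page, offset)
        t.used += size
        return addr
```

Each thread packs multiple small objects into its own page, touching the shared lock once per page rather than once per object.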

Concurrent collection: the tricolour abstraction

  • Black

    • “live”

    • scanned

    • cannot refer to white

  • Grey

    • “live” wavefront

    • still to be scanned

  • may refer to any colour

  • White

    • hypothetical garbage

Garbage collection

  • White = whole heap

  • Shade root targets grey

  • While grey nonempty

    • Shade one grey object black

    • Shade its white children grey

  • At end, white objects are garbage
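
The loop above is a straightforward worklist computation. A minimal Python sketch (the dict-of-edges object graph is an illustrative assumption, not the talk's heap layout):

```python
def mark(heap, roots):
    """Tricolour marking: heap maps an object id to the ids it references."""
    black, grey = set(), set(roots)       # shade root targets grey
    while grey:                           # while grey nonempty
        obj = grey.pop()                  # shade one grey object black
        black.add(obj)
        for child in heap[obj]:           # shade its white children grey
            if child not in black and child not in grey:
                grey.add(child)
    return black                          # everything not black is garbage

heap = {"A": ["B"], "B": [], "C": ["A"], "D": []}
live = mark(heap, roots=["A"])            # {"A", "B"}
garbage = set(heap) - live                # {"C", "D"}
```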

Copying collection

  • Partition white from black by copying

  • Reclaim white partition wholesale

  • At next GC, “flip” black to white
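
Partitioning by copying is classically done with a Cheney-style scan, in which the copied-but-unscanned region of to-space is exactly the grey set. A minimal sketch (integer object ids and the dict-of-fields model are illustrative assumptions):

```python
def copy_collect(heap, roots):
    """Copy live objects into to-space; from-space is reclaimed wholesale.
    heap maps an object id to a dict of fields naming other object ids."""
    to_space, forward = [], {}

    def copy(obj_id):
        if obj_id in forward:
            return forward[obj_id]        # already copied (forwarded)
        new_id = len(to_space)            # bump the to-space free pointer
        forward[obj_id] = new_id
        to_space.append(dict(heap[obj_id]))
        return new_id

    root_copies = [copy(r) for r in roots]  # shade root targets grey
    scan = 0
    while scan < len(to_space):           # objects between scan and the
        obj = to_space[scan]              # free pointer are the grey set
        for field, target in obj.items():
            obj[field] = copy(target)     # copying shades white targets grey
        scan += 1
    return to_space, root_copies
```

Anything left in from-space after the scan (object 2 in the test below) is white, and its space is reclaimed in one step.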

Incremental collection

[Timeline diagram: mutator threads]

Concurrent collection

[Timeline diagram: mutator threads and a background GC thread]

Concurrent mutators

  • Mutation changes reachability during GC

  • Loss of black/grey reference is safe

    • Non-white object losing its last reference will be garbage at next GC

  • New reference from black to white

    • New reference may make target live

    • Collector may never see new reference

  • Mutations may require compensation

Compensation options

  • Prevent mutator from creating black-to-white references

    • write barrier on black

    • read barrier on grey to prevent mutator obtaining white refs

  • Prevent destruction of any path from a grey object to a white object without telling GC

    • write barrier on grey
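
The first option (a write barrier on black) fits in a few lines; the colour constants and object model below are illustrative assumptions, not the talk's runtime:

```python
# Write barrier on black: when the mutator installs a reference into a
# black object, shade a white target grey so the collector still scans it
# (i.e., never create a black-to-white edge).
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    def __init__(self):
        self.colour = WHITE
        self.fields = {}

grey_queue = []  # pending work for the collector

def write_barrier(src, field, target):
    if src.colour == BLACK and target.colour == WHITE:
        target.colour = GREY          # compensation step
        grey_queue.append(target)
    src.fields[field] = target        # the actual store
```

Stores into white or grey sources need no compensation: any reference they hold will still be seen when the collector scans them.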

Mostly-copying GC [Bartlett]

  • Copying collection with ambiguous roots

    • Uncooperative compilers

    • Untidy references

    • Explicit pinning

  • Pin ambiguously-referenced objects

    • Shade their page grey without copying

  • Assume heap accuracy

    • Copy remaining heap-referenced objects

Incremental MCGC [DeTreville]

  • Enforce grey mutator invariant

    • STW greys ambiguously-referenced pages

    • Read barrier on grey using VM page protection

  • Read barrier

    • Stop mutator threads

    • Unprotect page

    • Copy white targets to grey

    • Shade page black

    • Restart threads

  • Atomic system-call wrappers unprotect the pages that parameters refer to (otherwise page-protection traps inside the OS return an error)

Concurrent MCGC?

  • Stopping all threads at each increment is prohibitive on an SMP and impedes concurrency

  • BUT barriers are difficult to place on ambiguous references with uncooperative compilers

  • ALSO preemptive scheduling may break wrapper atomicity

Mostly-concurrent MCGC

  • Enforce black mutator invariant

    • STW blackens ambiguously-referenced pages

    • Read barrier on load of accurate (tidy) grey reference

  • Read barrier:

    • Blacken grey references as they are loaded

  • No system call wrappers: arguments are always black

Read barrier on load of grey

  • Object header bit marks grey objects

  • Inline fast path checks grey bit in target header, calls out to slow path if set

  • Out-of-line slow path:

    • Lock heap meta-data

    • For each (grey) source object in target page

      • Copy white targets to grey

      • Clear grey header bit

    • Shade target page black

    • Unlock heap meta-data

Coherence for fast path

  • STW phase synchronizes mutators’ views of heap state

  • Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since the most recent STW

  • Mutators can never see a cleared grey header unless the page is also black

  • Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize


Implementation

  • Modula-3:

    • gcc-based compiler back-end

    • No tricky target-specific stack-maps

    • Compiler front-end emits barriers

    • M3 threads map to preemptively-scheduled POSIX pthreads

    • Stop/start threads: signals + semaphores, or OS primitives if available

    • Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF


Evaluation

  • Parallelized the GCOld benchmark to permit throughput measurements with multiple mutator threads

  • Measures steady-state GC throughput

  • 2 platforms:

    • 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4

    • 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6

Read barriers: STW, 1 user-level mutator thread, work=1

Elapsed time (s), 1 system-level mutator thread, work=1

Heap size, 1 system-level mutator thread

BMU, 1 system-level mutator thread, work=1000, ratio=1

Scalability, work=1000, ratio=1, 8xP3

Java HotSpot server, work=1000, 8xP3


Conclusions

  • Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof of existence)

  • Performance is good (scalable)

  • Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system

  • Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads

Future work

  • Convert read barrier to “clean” only target object instead of whole page

Scalability, work=10, ratio=1, 8xP3

Java HotSpot server, work=10, 8xP3