Portable, mostly-concurrent, mostly-copying GC for multi-processors


Tony Hosking

Secure Software Systems Lab

Purdue University

Platform assumptions
  • Symmetric multi-processor (SMP/CMP)
  • Multiple mutator threads
  • (Large heaps)
Desirable properties
  • Maximize throughput
  • Minimize collector pauses
  • Scalability
Exploiting parallelism
  • Avoid contention
  • (Mostly-)Concurrent allocation
  • (Mostly-)Concurrent collection
Concurrent allocation
  • Use thread-private allocation “pages”
  • Threads contend for free pages
  • Each thread allocates from its own page
    • multiple small objects per page, or
    • multiple pages per large object
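The allocation scheme above can be sketched as follows. This is a minimal illustration, not the talk's Modula-3 run-time: the names `PagePool`, `ThreadAllocator`, and `PAGE_SIZE` are hypothetical, and addresses are modelled as `(page, offset)` pairs.

```python
import threading

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class PagePool:
    """Shared pool of free pages; the only point of contention."""

    def __init__(self, num_pages):
        self._lock = threading.Lock()
        self._free = list(range(num_pages))

    def take(self, count=1):
        # Threads contend here, but only once per page (or per large object).
        with self._lock:
            if len(self._free) < count:
                raise MemoryError("out of pages; a real system would trigger GC")
            pages = self._free[:count]
            del self._free[:count]
            return pages


class ThreadAllocator:
    """Per-thread bump allocator over a private page; no lock on the fast path."""

    def __init__(self, pool):
        self.pool = pool
        self.page = None
        self.offset = PAGE_SIZE  # forces a page fetch on the first small allocation

    def alloc(self, size):
        if size > PAGE_SIZE:
            # Large object: several whole pages for one object.
            npages = -(-size // PAGE_SIZE)  # ceiling division
            pages = self.pool.take(npages)
            return (pages[0], 0)
        if self.offset + size > PAGE_SIZE:
            # Current page exhausted: fetch a fresh private page.
            self.page = self.pool.take(1)[0]
            self.offset = 0
        addr = (self.page, self.offset)
        self.offset += size
        return addr
```

Each mutator thread would own one `ThreadAllocator`, so small allocations touch the shared lock only when a page fills up.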
Concurrent collection: The tricolour abstraction
  • Black
    • “live”
    • scanned
    • cannot refer to white
  • Grey
    • “live” wavefront
    • still to be scanned
    • may refer to any colour
  • White
    • hypothetical garbage
Garbage collection
  • White = whole heap
  • Shade root targets grey
  • While grey nonempty
    • Shade one grey object black
    • Shade its white children grey
  • At end, white objects are garbage
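The loop above can be written out as a short sketch. The heap representation (a dict from object id to a list of child ids) and the function name `tricolour_collect` are assumptions made for illustration.

```python
from collections import deque


def tricolour_collect(heap, roots):
    """heap maps object id -> list of child ids; returns (black, white)."""
    white = set(heap)   # White = whole heap
    grey = deque()      # the "live" wavefront, still to be scanned
    black = set()       # live and scanned

    def shade(obj):
        # Only white objects change colour; grey and black stay as they are.
        if obj in white:
            white.remove(obj)
            grey.append(obj)

    for r in roots:     # shade root targets grey
        shade(r)
    while grey:         # while grey nonempty
        obj = grey.popleft()
        for child in heap[obj]:
            shade(child)    # shade its white children grey
        black.add(obj)      # the scanned object is now black
    return black, white     # whatever is still white is garbage
```

For example, with `heap = {"a": ["b"], "b": ["a"], "c": ["a"]}` and `roots = ["a"]`, object `"c"` is never shaded and stays white.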
Copying collection
  • Partition white from black by copying
  • Reclaim white partition wholesale
  • At next GC, “flip” black to white
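One way to picture the partitioning is a Cheney-style scan over a to-space. The sketch below is an assumption-laden illustration (the slides do not name Cheney's algorithm): objects are dict entries, new "addresses" are consecutive integers, and `copy_collect` is a hypothetical name.

```python
def copy_collect(from_space, roots):
    """from_space: old id -> list of child ids. Returns (to_space, new_roots)."""
    to_space = {}     # new id -> fields; the black/grey partition being built
    forwarding = {}   # old id -> new id

    def copy(old):
        # Copy an object at most once; later visits reuse the forwarding entry.
        if old not in forwarding:
            new = len(to_space)
            forwarding[old] = new
            to_space[new] = list(from_space[old])  # fields still hold old ids
        return forwarding[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    # Objects at index >= scan are copied but unscanned (grey);
    # objects below scan have had all fields redirected (black).
    while scan < len(to_space):
        to_space[scan] = [copy(child) for child in to_space[scan]]
        scan += 1
    return to_space, new_roots
```

Everything left behind in `from_space` is white and can be reclaimed wholesale; at the next collection the roles of the two spaces flip.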
Incremental collection
  • [Figure: mutator threads]
Concurrent collection
  • [Figure: mutator threads and a background GC thread]
Concurrent mutators
  • Mutation changes reachability during GC
  • Loss of black/grey reference is safe
    • Non-white object losing its last reference will be garbage at next GC
  • New reference from black to white
    • New reference may make target live
    • Collector may never see new reference
  • Mutations may require compensation
Compensation options
  • Prevent mutator from creating black-to-white references
    • write barrier on black
    • read barrier on grey to prevent mutator obtaining white refs
  • Prevent destruction of any path from a grey object to a white object without telling GC
    • write barrier on grey
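The first option, a write barrier on black, can be sketched as below. This is a single-threaded toy model with hypothetical names (`Obj`, `Collector`, `write`); it shows only the compensation step, not a real concurrent collector.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.colour = WHITE
        self.fields = {}


class Collector:
    def __init__(self):
        self.grey = []  # worklist of grey objects

    def shade(self, obj):
        if obj.colour == WHITE:
            obj.colour = GREY
            self.grey.append(obj)

    def step(self):
        """One increment: blacken one grey object, shading its white children."""
        obj = self.grey.pop()
        for child in obj.fields.values():
            self.shade(child)
        obj.colour = BLACK

    def write(self, src, field, target):
        """Mutator store, with a write barrier on black sources."""
        if src.colour == BLACK:
            self.shade(target)  # never leave a black-to-white reference
        src.fields[field] = target
```

Suppose `a -> b -> c`, the collector has blackened `a`, and the mutator then stores `c` into `a` and deletes the `b -> c` reference. Without the barrier, `c` would stay white and be reclaimed although it is reachable; with it, `c` is shaded grey at the store.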
Mostly-copying GC [Bartlett]
  • Copying collection with ambiguous roots
    • Uncooperative compilers
    • Untidy references
    • Explicit pinning
  • Pin ambiguously-referenced objects
    • Shade their page grey without copying
  • Assume heap accuracy
    • Copy remaining heap-referenced objects
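A rough sketch of the pin-then-copy split follows. It assumes, for illustration, that every object on a pinned page is retained and scanned, and it records which objects would be copied instead of moving them; `mostly_copying_collect` and the page layout are hypothetical.

```python
def mostly_copying_collect(pages, ambiguous_roots):
    """pages: page id -> {object id -> list of child object ids}."""
    page_of = {obj: pid for pid, objs in pages.items() for obj in objs}
    children = {obj: kids for objs in pages.values() for obj, kids in objs.items()}

    # 1. Pin: a page hit by an ambiguous root stays in place (shaded grey).
    #    Root values that match no object are ignored as non-pointers.
    pinned = {page_of[r] for r in ambiguous_roots if r in page_of}

    # 2. Scan pinned pages; heap references are assumed accurate, so their
    #    targets on unpinned pages may be moved.
    worklist = [obj for pid in sorted(pinned) for obj in pages[pid]]
    seen = set(worklist)
    copied = []
    while worklist:
        obj = worklist.pop()
        for child in children[obj]:
            if child not in seen:
                seen.add(child)
                if page_of[child] not in pinned:
                    copied.append(child)  # would be copied to a new page
                worklist.append(child)
    return pinned, copied
```

Objects that are neither on a pinned page nor reached from one are left behind and reclaimed with their pages.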
Incremental MCGC [DeTreville]
  • Enforce grey mutator invariant
    • STW greys ambiguously-referenced pages
    • Read barrier on grey using VM page protection
  • Read barrier
    • Stop mutator threads
    • Unprotect page
    • Copy white targets to grey
    • Shade page black
    • Restart threads
  • Atomic system call wrappers unprotect parameter targets (otherwise traps in OS return error)
Concurrent MCGC?
  • Stopping all threads at each increment is prohibitive on SMP & impedes concurrency
  • BUT barriers difficult to place on ambiguous references with uncooperative compilers
  • ALSO Preemptive scheduling may break wrapper atomicity
Mostly-concurrent MCGC
  • Enforce black mutator invariant
    • STW blackens ambiguously-referenced pages
    • Read barrier on load of accurate (tidy) grey reference
  • Read barrier:
    • Blacken grey references as they are loaded
  • No system call wrappers: arguments are always black
Read barrier on load of grey
  • Object header bit marks grey objects
  • Inline fast path checks grey bit in target header, calls out to slow path if set
  • Out-of-line slow path:
    • Lock heap meta-data
    • For each (grey) source object in target page
      • Copy white targets to grey
      • Clear grey header bit
    • Shade target page black
    • Unlock heap meta-data
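The fast-path/slow-path split can be sketched as below. The classes `Page`, `Obj`, and `Heap` and the `forward` field are hypothetical stand-ins for the run-time's data structures, and the sketch ignores memory-ordering concerns entirely.

```python
import threading


class Page:
    def __init__(self, colour):
        self.colour = colour  # "white", "grey", or "black"
        self.objects = []


class Obj:
    def __init__(self, page, fields=()):
        self.grey = False     # header bit marking grey objects
        self.forward = None   # forwarding pointer once copied
        self.fields = list(fields)
        self.page = page
        page.objects.append(self)


class Heap:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the heap meta-data lock
        self.to_page = Page("grey")   # destination for newly copied objects

    def copy(self, obj):
        # Only white targets move; copies are grey (header bit set).
        if obj.page.colour != "white":
            return obj
        if obj.forward is None:
            obj.forward = Obj(self.to_page, obj.fields)
            obj.forward.grey = True
        return obj.forward

    def load(self, src, i):
        target = src.fields[i]
        if target.grey:              # inline fast path: one header-bit test
            self.clean(target.page)  # out-of-line slow path
        return target

    def clean(self, page):
        with self.lock:
            if page is self.to_page:
                # Keep new copies off the page that is about to turn black.
                self.to_page = Page("grey")
            for obj in list(page.objects):
                if obj.grey:
                    obj.fields = [self.copy(t) for t in obj.fields]
                    obj.grey = False
            page.colour = "black"
```

If two threads race into `clean` for the same page, the second finds no grey objects left and merely re-marks the page black.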
Coherence for fast path
  • STW phase synchronizes mutators’ views of heap state
  • Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since the most recent STW
  • Mutators can never see a cleared grey header unless the page is also black
  • Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize
Implementation
  • Modula-3:
    • gcc-based compiler back-end
    • No tricky target-specific stack-maps
    • Compiler front-end emits barriers
    • M3 threads map to preemptively-scheduled POSIX pthreads
    • Stop/start threads: signals + semaphores, or OS primitives if available
    • Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF
Experiments
  • Parallelized GCOld benchmark to permit throughput measurements for multiple mutators
  • Measures steady-state GC throughput
  • 2 platforms:
    • 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4
    • 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6
Conclusions
  • Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence)
  • Performance is good (scalable)
  • Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system
  • Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads
Future work
  • Convert read barrier to “clean” only target object instead of whole page