1 / 45

David F. Bacon T.J. Watson Research Center

Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation. David F. Bacon T.J. Watson Research Center. Part 2: Trace (aka Mark). Clearing Monitor Table clearing JNI Weak Global clearing Debugger Reference clearing JVMTI Table clearing

mele
Download Presentation

David F. Bacon T.J. Watson Research Center

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel and ConcurrentReal-time Garbage CollectionPart III:Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center

  2. Part 2: Trace (aka Mark) • Clearing • Monitor Table clearing • JNI Weak Global clearing • Debugger Reference clearing • JVMTI Table clearing • Phantom Reference clearing • Re-Trace 2 • Trace Master • (Trace*) • (Trace Terminate***) • Class Unloading • Completion • Finalizer Wakeup • Class Unloading Flush • Clearable Compaction** • Book-keeping • Initiation • Setup • turn double barrier on • Root Scan • Active Finalizer scan • Class scan • Thread scan** • switch to single barrier, color to black • Debugger, JNI, Class Loader scan • Trace • Trace* • Trace Terminate*** • Re-materialization 1 • Weak/Soft/Phantom Reference List Transfer • Weak Reference clearing** (snapshot) • Re-Trace 1 • Trace Master • (Trace*) • (Trace Terminate***) • Re-materialization 2 • Finalizable Processing • Flip • Move Available Lists to Full List* (contention) • turn write barrier off • Flush Per-thread Allocation Pages** • switch allocation color to white • switch to temp full list • Sweeping • Sweep* • Switch to regular Full List** • Move Temp Full List to regular Full List* (contention) * Parallel ** Callback *** Single actor symmetric

  3. U Y T W X Z a a a a a a b b b b b b Let’s Assume a Stack Snapshot Stack p r

  4. r U Y T X W Z a a a a a a b b b b b b Yuasa Algorithm Review:2(a): Copy Over-written Pointers Stack p

  5. U T W X Z Y Z Y W X a a a a a a a a a a b b b b b b b b b b Yuasa Algorithm Review: 2(b): Trace Stack p * Color is per-object mark bit

  6. s W Z U X Y T V X Z Y W a a a a a a a a a a a b b b b b b b b b b b Yuasa Algorithm Review: 2(c): Allocate “Black” Stack p

  7. Non-monotonicity in Tracing 1 GB

  8.   Which Design Pattern is This? Shared Monotonic Work Pool pool of work Per-Thread State Update

  9. Trace is Non-Monotonic…and requires thread-local data GC Master Thread GC Worker Threads Application Threads pool of non-monotonic work

  10. Basic Solution • Check if there are more work packets • If some found, trace is not done yet • If none found, “probably done” • Pause all threads • Re-scan for non-empty buffers • Resume all threads • If none, done • Otherwise, try again later

  11. Ragged Barriers:How to Stop without Stopping Epoch 1 Epoch 2 Epoch 3 Epoch 4 • Local Epoch • Global Min • Global Max A B C

  12. “Trace” Phase

  13. thread list pthread pthread_t Epoch agreed newest PhaseInfo phase workers writebuf 43 35 trace 3 43 color color BarrierOn BarrierOn true true pthread pthread_t writebuf 41 color color epoch epoch 37 43 The Thread’s Full Monty 16 64 256 sweep  dblb  Thread 1 next inVM  16 64 256 sweep  dblb  Thread 2 next inVM 

  14. WBufEpoch 39 39 37 37 Work Packet Data Structures wbuf.epoch < Epoch.agreed WBufCount 3 totrace wbuf trace wbuf fill free

  15. trace() { thread->epoch = Epoch.newest; bool canTerminate = true; if (WBufCount > 0) getWriteBuffers(); canTerminate = false; while (b = wbuf-trace.pop()) if (! moreTime()) return; int traceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); while (b = totrace.pop()) if (! moreTime()) return; int TraceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); if (canTerminate) traceTerminate(); }

  16. Getting Write Buffer Roots getWriteBuffers() { thread->epoch = fetchAndAdd(Epoch.newest, 1); WBufEpoch = thread->epoch; // mutators will dump wbufs LOCK(wbuf-fill); LOCK(wbuf-trace); for each (wbuf in wbuf-fill) if (wbuf.epoch < Epoch.agreed) remove wbuf from wbuf-fill; add wbuf to wbuf-trace; UNLOCK(wbuf-trace); UNLOCK(wbuf-fill); }

  17. Write Barrier writeBarrier(Object object, Field field, Object new) { if (BarrierOn) Object old = object[field]; if (old != null && ! old.marked) outOfLineBarrier(old); if (thread->dblb) // double barrier outOfLineBarrier(new); }

  18. Write Barrier Slow Path outOfLineBarrier(Object obj) { if (obj == null || obj.marked) return; obj.marked = true; bool epochOK = thread->wbuf->epoch == WBufEpoch; bool haveRoom = thread->wbuf->data < thread->wbuf->end; if (! (epochOK && enoughSpace)) thread->wbuf = flushWBufAndAllocNew(thread->wbuf); // Updates WBufEpoch, Epoch.newest *thread->wbuf->data++ = obj; }

  19. “Trace Terminate” Phase

  20. WBufEpoch 39 Trace Termination WBufCount 0 totrace wbuf trace wbuf fill free

  21. Epoch agreed newest PhaseInfo phase workers PhaseInfo phase workers 42 37 trace 1 terminate 1 BarrierOn BarrierOn true true Asynchronous Agreement desiredEpoch = Epooch.newest; … WAIT FOR Epoch.agreed == desiredEpoch if (WBufCount == 0) DONE else RESUME TRACING

  22. Ragged Barrier bool raggedBarrier(desiredEpoch, urgent) { if (Epoch.agreed >= desiredEpoch) return true; LOCK(threadlist); int latest = MAXINT; for each (Thread thread in threadlist) latest = min(latest, thread.epoch); Epoch.agreed = latest; UNLOCK(threadlist); if (epoch.agreed >= desiredEpoch) return true; else doCallbacks(RAGGED_BARRIER, true, urgent); return false; } * Non-locking implementation?

  23. Part 1: Scan Roots • Clearing • Monitor Table clearing • JNI Weak Global clearing • Debugger Reference clearing • JVMTI Table clearing • Phantom Reference clearing • Re-Trace 2 • Trace Master • (Trace*) • (Trace Terminate***) • Class Unloading • Completion • Finalizer Wakeup • Class Unloading Flush • Clearable Compaction** • Book-keeping • Initiation • Setup • turn double barrier on • Root Scan • Active Finalizer scan • Class scan • Thread scan** • switch to single barrier, color to black • Debugger, JNI, Class Loader scan • Trace • Trace* • Trace Terminate*** • Re-materialization 1 • Weak/Soft/Phantom Reference List Transfer • Weak Reference clearing** (snapshot) • Re-Trace 1 • Trace Master • (Trace*) • (Trace Terminate***) • Re-materialization 2 • Finalizable Processing • Flip • Move Available Lists to Full List* (contention) • turn write barrier off • Flush Per-thread Allocation Pages** • switch allocation color to white • switch to temp full list • Sweeping • Sweep* • Switch to regular Full List** • Move Temp Full List to regular Full List* (contention) * Parallel ** Callback *** Single actor symmetric

  24. Fuzzy Snapshot • Finally, we assume no magic • Initiate Collection

  25. “Initiate Collection” Phase

  26. thread list pthread pthread_t Epoch agreed newest PhaseInfo phase workers writebuf 42 37 initiate 3 43 color color BarrierOn BarrierOn true true pthread pthread_t writebuf 41 color color epoch epoch 37 43 Initiate: Color Black, Double Barrier 16 64 256 sweep  dblb  dblb  Thread 1 next inVM  16 64 256 sweep  dblb  dblb  Thread 2 next inVM 

  27. r U Y T X W Z a a a a a a b b b b b b What is a Double Barrier?Store both Old and New Pointers Stack p

  28. X Z W V a a a a b b b b Why Double Barrier? T2: m.b = n (writes X.b = W) T3: j.b = k (writes X.b = V) T1: q = p.b (reads X.b: V, W, or Z??) T1 Stack T2 Stack T3 Stack m j p q n k “Snapshot” = { V, W, X, Z }

  29. T2 T3 X Z W V a a a a b b b b Yuasa (Single) Barrier with 2 Writers T2: m.b = n (X.b = W) T3: j.b = k (X.b = V) T1 Stack T2 Stack T3 Stack m j p q n k

  30. T2 T3 X Z W V a a a a b b b b Yuasa Barrier Lost Update T2: m.b = n (X.b = W) T3: j.b = k (X.b = V) T1 Stack T2 Stack T3 Stack m j p q n k

  31. T1 T2 T3 n X Z W V a a a a b b b b Hosed! T1: Scan Stack T2: m.b = n (X.b = W) T3: j.b = k (X.b = V) T1: q = p.b (q <- W) T2: n = null T2: Scan Stack T1 Stack T2 Stack T3 Stack m j p q k

  32. “Thread Stack Scan” Phase

  33. thread list pthread pthread_t Epoch agreed newest PhaseInfo phase workers writebuf 37 42 initiate 3 43 color color BarrierOn BarrierOn true true pthread pthread_t writebuf 41 color color epoch epoch 37 43 Scan Stacks (double barrier off) 16 64 256 sweep  dblb  dblb  Scan Stack 1 Thread 1 next inVM  16 64 256 sweep  dblb  dblb  Scan Stack 2 Thread 2 next inVM 

  34. All Done!

  35. Boosting: Ensuring Progress Boost to master priority GC Master Thread GC Worker Threads Application Threads (may do GC work) Revert(?)

  36. Part 4: Defragmentation • Clearing • Monitor Table clearing • JNI Weak Global clearing • Debugger Reference clearing • JVMTI Table clearing • Phantom Reference clearing • Re-Trace 2 • Trace Master • (Trace*) • (Trace Terminate***) • Class Unloading • Completion • Finalizer Wakeup • Class Unloading Flush • Clearable Compaction** • Book-keeping • Initiation • Setup • turn double barrier on • Root Scan • Active Finalizer scan • Class scan • Thread scan** • switch to single barrier, color to black • Debugger, JNI, Class Loader scan • Trace • Trace* • Trace Terminate*** • Re-materialization 1 • Weak/Soft/Phantom Reference List Transfer • Weak Reference clearing** (snapshot) • Re-Trace 1 • Trace Master • (Trace*) • (Trace Terminate***) • Re-materialization 2 • Finalizable Processing • Flip • Move Available Lists to Full List* (contention) • turn write barrier off • Flush Per-thread Allocation Pages** • switch allocation color to white • switch to temp full list • Sweeping • Sweep* • Switch to regular Full List** • Move Temp Full List to regular Full List* (contention) • Defragmentation * Parallel ** Callback *** Single actor symmetric

  37. free 16 free 64 free 256 alloc’d sweep 256 16 64 ? free

  38. pointers have changed objects have moved Two-way Communication GC Master Thread GC Worker Threads Application Threads (may do GC work)

  39. T’ C D X Z B U E T a a a a a a a a a b b b b b b b b b Defragmentation

  40. T’ T T T T T’ a a a a a a b b b b b b Staccato Algorithm create store FREE NORMAL access load allocate nop defragment cas reap store access (abort) cas MOVED COPYING access load  commit cas

  41. Scheduling

  42. Guaranteeing Real Time • Guaranteeing usability without realtime: • Must know maximum live memory • If fragmentation & metadata overhead bounded • We also require: • Maximum allocation rate (MB/s) • How does the user figure this out??? • Very simple programming style • Empirical measurement • (Research) Static analysis

  43. Conclusions • Systems are made of concurrent components • Basic building blocks: • Locks • Try-locks • Compare-and-Swap • Non-locking stacks, lists, … • Monotonic phases • Logical clocks and asynchronous agreement • Encapsulate so others won’t suffer!

  44. http://www.research.ibm.com/metronomehttps://sourceforge.net/projects/tuningforkvphttp://www.research.ibm.com/metronomehttps://sourceforge.net/projects/tuningforkvp

  45. GC Phases … … … APP APP APP APP APP APP APP APP … … APP APP APP APP APP APP APP … … APP APP APP APP APP APP APP APP APP … APP APP APP APP APP APP APP Legend APP SNAPSHOT TRACE FLIP SWEEP TERMINATE

More Related