
Ulterior Reference Counting: Fast GC Without The Wait

Ulterior Reference Counting: Fast GC Without The Wait. Steve Blackburn and Kathryn McKinley. Presented by Dimitris Prountzos; slides adapted from a presentation by Steve Blackburn. Outline: the throughput-responsiveness problem, reference counting and its optimizations, Ulterior in detail, BG-RC in action.


Presentation Transcript


  1. Ulterior Reference Counting: Fast GC Without The Wait • Steve Blackburn – Kathryn McKinley • Presented by: Dimitris Prountzos • Slides adapted from presentation by Steve Blackburn

  2. Outline • Throughput-Responsiveness problem • Reference counting & optimizations • Ulterior in detail • BG-RC in action • Experimental evaluation • Conclusion

  3. Throughput/Responsiveness Trade-off • GC and mutator share CPU • Throughput: net GC/mutator ratio • Responsiveness: length of GC pauses • [Figure: CPU utilization over time, with mutator and GC phases; the maximum GC pause is marked as poor responsiveness]

  4. The Ulterior approach • Match mechanisms to object demographics • Copying nursery (young space) • Highly mutated, high mortality young objects • Ignores most mutations • GC time proportional to survivors, space efficient • RC mature space • Low mutation, low mortality old objects • GC time proportional to mutations, space efficient • Generalize deferred RC to heap objects • Defer fields of highly mutated objects & enumerate them quickly • Reference count only infrequently mutated fields

  5. Pure Reference Counting • Tracks mutations: RCM(p) • RCM(p) generates a decrement for the before value of p and an increment for the after value: RCM(p) → RC(p_before)--, RC(p_after)++ • If RC==0, free the object • [Figure: RC space with objects a (RC 1) and b (RC 1)]

  6. Pure Reference Counting (continued) • [Figure: RC space with objects a, b and c; counts shown are 1, 0 and 1]

  7. Pure Reference Counting (continued) • [Figure: same objects as the previous slide, with one pointer marked]

  8. Pure Reference Counting (continued) • [Figure: RC space with objects a (RC 1) and c (RC 1); b is gone] • RCM(p) for every mutation is very expensive
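The RCM(p) rule from the Pure Reference Counting slides can be sketched in a few lines of Java. This is a toy model, not the collector from the talk: `RcObject`, `PureRc`, the single `field` per object and the `freed` flag are all hypothetical names chosen for illustration, and "freeing" just sets a flag.

```java
/** Toy heap object: a reference count plus one outgoing pointer field (illustrative only). */
final class RcObject {
    final String name;
    int rc;            // number of incoming references
    RcObject field;    // single outgoing pointer, may be null
    boolean freed;

    RcObject(String name) { this.name = name; }
}

final class PureRc {
    private PureRc() {}

    /** RCM(p): store newTarget into src.field and adjust both reference counts immediately. */
    static void write(RcObject src, RcObject newTarget) {
        RcObject before = src.field;
        if (newTarget != null) newTarget.rc++;   // RC(p_after)++ first, so re-storing the same value is safe
        src.field = newTarget;
        if (before != null) decrement(before);   // RC(p_before)--
    }

    /** Decrement a count; at zero, mark the object freed and release what it pointed to. */
    static void decrement(RcObject o) {
        if (--o.rc == 0) {
            o.freed = true;
            if (o.field != null) decrement(o.field);
            o.field = null;
        }
    }

    public static void main(String[] args) {
        RcObject a = new RcObject("a");
        a.rc = 1;                          // pretend a root holds a
        RcObject b = new RcObject("b");
        RcObject c = new RcObject("c");
        write(a, b);                       // a.field = b
        write(a, c);                       // a.field = c; b loses its only reference
        System.out.println("b freed: " + b.freed + ", RC(c) = " + c.rc);
    }
}
```

Every call to `write` touches two counts, which is the cost the last slide of this sequence calls very expensive.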

  9. RC Optimizations • Buffering: apply RC(p)--, RC(p)++ later • Coalescing: apply RCM(p) only for the initial and final values of p (coalesce intermediate values): {RCM(p), RCM(p_1), ..., RCM(p_n)} → RC(p_initial)--, RC(p_final)++ • Deferral of RCM events
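Buffering and coalescing can be illustrated with a slot-level log, assuming objects and slots are identified by plain strings. `CoalescingLog`, `write`, `flush` and `rcOperations` are hypothetical names for this sketch; the idea it shows is only that a run of writes to one slot costs one decrement and one increment when the log is flushed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class CoalescingLog {
    /** Reference counts keyed by object name (a stand-in for real object headers). */
    final Map<String, Integer> rc = new HashMap<>();
    /** Current contents of each pointer slot. */
    final Map<String, String> slots = new HashMap<>();
    /** Value each mutated slot held before its first mutation since the last flush. */
    private final Map<String, String> initialValue = new LinkedHashMap<>();
    /** Number of count updates actually applied. */
    int rcOperations;

    /** Buffered write: remember the slot's initial value once, then just store. */
    void write(String slot, String target) {
        if (!initialValue.containsKey(slot)) {
            initialValue.put(slot, slots.get(slot));
        }
        slots.put(slot, target);
    }

    /** Apply RC(p_initial)-- and RC(p_final)++ for every logged slot. */
    void flush() {
        for (Map.Entry<String, String> e : initialValue.entrySet()) {
            adjust(slots.get(e.getKey()), +1);  // final value
            adjust(e.getValue(), -1);           // initial value
        }
        initialValue.clear();
    }

    private void adjust(String obj, int delta) {
        if (obj == null) return;
        rc.merge(obj, delta, Integer::sum);
        rcOperations++;
    }
}
```

If slot `p` starts at `o0` and is then written with `o1`, `o2` and `o3` before `flush()`, only `o0` and `o3` have their counts touched; the intermediate values never generate any work.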

  10. Deferred Reference Counting • Goal: ignore RCM(p) for stacks & registers • Deferral of p • A mutation of p does not generate an RCM(p) • Correctness: • For all deferred p: RCR(p) at each GC • Retain Event: RCR(p) • p → o temporarily retains o regardless of RC(o) • Deutsch/Bobrow use a Zero Count Table • Bacon et al. use a temporary increment

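  11. Classic Deferral • In deferral phase: ignore RCM(p) for stacks & registers • [Figure: stacks & registers pointing into RC space; objects a (RC 0) and b (RC 1)]

  12. Classic Deferral • Ignore RCM(p) for stacks & registers • Breaks the RC==0 invariant • [Figure: stacks & registers and RC space with objects a, b and c; counts shown are 0, 0 and 1]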

  13. Classic Deferral (Bacon et al.) • Divide execution into epochs • Store information in buffers • Root buffer (RB): store 1st level objects • Increment buffer (IB): store increments to 1st level objects • Decrement buffer (DB): store decrements to 1st level objects • At GC time do: • Look at RB and apply temporary increments to all objects there • Process IB of this epoch • Look at RB of previous epoch and apply decrements to all objects there • Process DB of previous epoch • During DB processing recycle o if RC(o)=0 • Avoid race conditions by • Processing IB before DB • Processing DB of one epoch behind
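The four GC-time steps listed on this slide can be sketched as a small single-threaded Java model. `DeferredRc` and its members are hypothetical names. Two simplifications are assumptions of the sketch rather than claims about the slides: objects are plain strings with no fields (so there is no recursive freeing), and a new object starts at RC 1 with a matching buffered decrement so that an object nobody references is still reclaimed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DeferredRc {
    final Map<String, Integer> rc = new HashMap<>();
    final Set<String> freed = new HashSet<>();
    private List<String> incBuf = new ArrayList<>();      // IB, this epoch
    private List<String> decBuf = new ArrayList<>();      // DB, this epoch
    private List<String> prevDecBuf = new ArrayList<>();  // DB, previous epoch
    private List<String> prevRoots = new ArrayList<>();   // RB, previous epoch

    /** Assumption of this sketch: allocate at RC 1 and buffer a decrement to cancel it later. */
    void alloc(String o) {
        rc.put(o, 1);
        decBuf.add(o);
    }

    /** Heap pointer store: buffered. Stack and register stores call nothing at all (deferral). */
    void heapWrite(String before, String after) {
        if (after != null) incBuf.add(after);
        if (before != null) decBuf.add(before);
    }

    /** One collection at an epoch boundary; roots is the current content of stacks and registers. */
    void collect(List<String> roots) {
        for (String o : roots) rc.merge(o, 1, Integer::sum);    // 1. temporary increments for RB
        for (String o : incBuf) rc.merge(o, 1, Integer::sum);   // 2. IB of this epoch
        for (String o : prevRoots) dec(o);                      // 3. undo temporary increments of previous RB
        for (String o : prevDecBuf) dec(o);                     // 4. DB of previous epoch
        prevRoots = new ArrayList<>(roots);
        prevDecBuf = decBuf;
        decBuf = new ArrayList<>();
        incBuf = new ArrayList<>();
    }

    private void dec(String o) {
        if (rc.merge(o, -1, Integer::sum) == 0) freed.add(o);   // recycle o if RC(o) = 0
    }
}
```

Because increments of the current epoch are applied before decrements of the previous one, an object that was only reachable from the stack during an epoch survives that collection and is reclaimed at the next one if the stack reference has gone.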

  14. Classic Deferral (Bacon et al.) • At GC time, RCR(p) for root pointers applies temporary increments • [Figure: stacks & registers; RC space with a, b and c (RC 1 each); dec buf and root buf, with a and b buffered]

  15. Classic Deferral (Bacon et al.) • At next GC, apply decrements • [Figure: same heap and buffers as the previous slide]

  16. Classic Deferral (Bacon et al.) • Key: efficient enumeration of deferred pointers • At next GC, apply decrements • [Figure: same heap and buffers as the previous slide]

  17. Classic Deferral (Bacon et al.) • Better, but not good enough! • [Figure: stacks & registers; RC space with a, b and c (RC 1 each); dec buf and root buf now empty]

  18. Ulterior Reference Counting • Idea: extend deferral to select heap pointers • e.g. all pointers within nursery objects • Deferral is not a fixed property of p • e.g. a nursery object gets promoted • Integrate Event I(p) • Changes p from deferred to not deferred

  19. BG-RC: Bounded Nursery Generational - RC • Heap organization • Bounded copying nursery • Ignore mutations to nursery pointer fields • RC old space • Object remembering, coalescing, buffering • Collection • Process roots • Nursery phase promotes live p to old space and I(p) • RC phase processes object buffer, dec buffer

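  20. View of heap in Ulterior RC • How can we efficiently • Enumerate all deferred pointer fields? • Remember old to young pointers? • [Figure: stacks & registers (deferred); RC space with a, b, d and e (RC 1 each); non-RC space with r, s and t (deferred); pointers from RC space into non-RC space are labelled "remember"]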

  21. Bringing it Together • Deferral: • Defer nursery & roots • Perform I(p) on nursery promotion • Piggyback on copying nursery collection • Coalescing: • Remember mutated RC objects • Upon first mutation, dec each referent • At GC time, inc each referent • Piggyback remset onto this mechanism

  22. BG-RC Write Barrier

      private void writeBarrier(VM_Address srcObj,
                                VM_Address srcSlot,
                                VM_Address tgtObj)
          throws VM_PragmaInline {
        if (getLogState(srcObj) != LOGGED)       // unsync check for uniqueness
          writeBarrierSlow(srcObj);
        VM_Magic.setMemoryAddress(srcSlot, tgtObj);
      }

      private void writeBarrierSlow(VM_Address srcObj)
          throws VM_PragmaNoInline {
        if (attemptToLog(srcObj)) {
          modifiedBuffer.push(srcObj);
          enumeratePointersToDecBuffer(srcObj);  // trade-off for sparsely
          setLogState(srcObj, LOGGED);           // modified objects
        }
      }
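The barrier above depends on Jikes RVM internals (`VM_Address`, `VM_Magic`), so it cannot be run on its own. The following plain-Java model mimics its logic under stated assumptions: `Node`, `BarrierModel` and `processBuffers` are hypothetical names, the log state is a simple boolean (so the atomic `attemptToLog` step is dropped and the model is single-threaded only), and freeing at RC 0 is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy RC-space object with a fixed number of pointer fields (illustrative only). */
final class Node {
    final String name;
    final Node[] fields;
    int rc;
    boolean logged;   // stands in for the LOGGED state in the object header

    Node(String name, int numFields) {
        this.name = name;
        this.fields = new Node[numFields];
    }
}

final class BarrierModel {
    final Deque<Node> modifiedBuffer = new ArrayDeque<>();
    final Deque<Node> decBuffer = new ArrayDeque<>();

    /** Fast path: one check, then the store. */
    void writeBarrier(Node src, int slot, Node tgt) {
        if (!src.logged) writeBarrierSlow(src);
        src.fields[slot] = tgt;
    }

    /** Slow path, taken once per object per epoch: remember the object and its old referents. */
    private void writeBarrierSlow(Node src) {
        modifiedBuffer.push(src);
        for (Node old : src.fields) {
            if (old != null) decBuffer.push(old);   // a decrement for each referent before mutation
        }
        src.logged = true;
    }

    /** GC time: an increment for each current referent, then the buffered decrements. */
    void processBuffers() {
        while (!modifiedBuffer.isEmpty()) {
            Node src = modifiedBuffer.pop();
            for (Node cur : src.fields) {
                if (cur != null) cur.rc++;
            }
            src.logged = false;
        }
        while (!decBuffer.isEmpty()) {
            decBuffer.pop().rc--;
        }
    }
}
```

A field that was never overwritten receives one decrement and one increment that cancel out, which is the price of logging whole objects instead of individual slots; the slide's comment about sparsely modified objects refers to this trade-off.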

  23. BG-RC Mutation Phase • [Figure: stacks & registers; RC space with a, b, d and e; empty non-RC space; empty obj buf, dec buf and root buf]

  24. BG-RC Mutation Phase • [Figure: a pointer in RC space is marked; b, d and e appear in the obj buf and dec buf]

  25. BG-RC Mutation Phase • [Figure: same heap; b, d and e in the obj buf and dec buf]

  26. BG-RC Mutation Phase • [Figure: object r appears in non-RC space]

  27. BG-RC Mutation Phase • [Figure: object s appears in non-RC space]

  28. BG-RC Mutation Phase • [Figure: object t appears in non-RC space]

  29. BG-RC Mutation Phase • [Figure: RC space with a, b, d and e; non-RC space with r, s and t; b, d and e in the buffers]

  30. BG-RC Nursery Collection: Scan Roots • [Figure: a second entry for b appears in the buffers; counts in RC space are updated]

  31. BG-RC Nursery Collection: Scan Roots • [Figure: a copy of s appears in RC space; s is added to the buffers]

  32. BG-RC Nursery Collection: Scan Roots • [Figure: a copy of t appears in RC space]

  33. BG-RC Nursery Collection: Process Object Buffer • [Figure: a copy of r appears in RC space; an entry in the buffers is marked as processed]

  34. BG-RC Nursery Collection: Reclaim Nursery • [Figure: non-RC space is reclaimed; RC space holds a, b, r, s, d, e and t]

  35. BG-RC RC Collection: Process Decrement Buffer • [Figure: one count in RC space drops to 0]

  36. BG-RC RC Collection: Recursive Decrement • [Figure: d is freed]

  37. BG-RC RC Collection: Process Decrement Buffer • [Figure: RC space holds a, b, r, s, t and e]

  38. BG-RC Collection Complete! • [Figure: RC space holds a, b, r, s, t and e; remaining buffer entries are b and s]
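The collection sequence walked through on these slides (scan roots, process the object buffer, reclaim the nursery, process the decrement buffer with recursive decrements) can be approximated by a compact Java model. It is a sketch under several assumptions, not the BG-RC implementation: `Obj` and `BgRcModel` are hypothetical, promotion only flips a flag instead of copying, root references are retained through temporary increments that expire at the next collection, and the time cap and cycle detection are omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class Obj {
    final String name;
    final Obj[] fields = new Obj[2];
    int rc;
    boolean inNursery;
    boolean logged;

    Obj(String name, boolean inNursery) { this.name = name; this.inNursery = inNursery; }
}

final class BgRcModel {
    final List<Obj> nursery = new ArrayList<>();
    final List<Obj> objBuf = new ArrayList<>();    // mutated RC-space objects
    final List<Obj> decBuf = new ArrayList<>();
    List<Obj> rootBuf = new ArrayList<>();         // roots retained at the previous collection
    final List<String> freed = new ArrayList<>();
    final List<String> reclaimed = new ArrayList<>();

    Obj alloc(String name) {
        Obj o = new Obj(name, true);
        nursery.add(o);
        return o;
    }

    /** Write barrier: nursery sources are ignored; RC-space sources are logged once. */
    void write(Obj src, int slot, Obj tgt) {
        if (!src.inNursery && !src.logged) {
            src.logged = true;
            objBuf.add(src);
            for (Obj old : src.fields) if (old != null) decBuf.add(old);
        }
        src.fields[slot] = tgt;
    }

    void collect(List<Obj> roots) {
        decBuf.addAll(rootBuf);                    // last collection's temporary increments expire
        rootBuf = new ArrayList<>();
        for (Obj o : roots) {                      // scan roots
            promote(o);
            o.rc++;
            rootBuf.add(o);
        }
        for (Obj src : objBuf) {                   // process object buffer
            for (Obj f : src.fields) if (f != null) { promote(f); f.rc++; }
            src.logged = false;
        }
        objBuf.clear();
        for (Obj o : nursery) if (o.inNursery) reclaimed.add(o.name);   // reclaim nursery
        nursery.clear();
        for (Obj o : decBuf) decrement(o);         // RC phase
        decBuf.clear();
    }

    /** Integrate event: the object's fields stop being deferred, so its referents are counted. */
    private void promote(Obj o) {
        if (!o.inNursery) return;
        o.inNursery = false;
        for (Obj f : o.fields) if (f != null) { promote(f); f.rc++; }
    }

    private void decrement(Obj o) {
        if (--o.rc == 0) {
            freed.add(o.name);
            for (Obj f : o.fields) if (f != null) decrement(f);   // recursive decrement
        }
    }
}
```

As a usage example, take an RC-space object `b` held by a root and pointing to `d`, which points to `e`. If the mutator allocates `r`, `s` and `t`, stores `t` into `s`, and then overwrites `b`'s field with `s`, a collection promotes `s` and `t`, reclaims `r` with the nursery, and frees `d` and `e` through the decrement buffer.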

  39. Controlling Pause Times • Modest bounded nursery size • Meta Data • Decrement and modified object buffers • Trigger a collection if too big • RC time cap • Limits time recursively decrementing RC obj & in cycle detection • Cycles - pure RC is incomplete • Use Bacon/Rajan trial deletion algorithm
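One way to picture the RC time cap and the buffer-size trigger is a decrement loop that stops after a fixed amount of work and leaves the rest queued for a later collection. The sketch below is an assumption-laden illustration: `CappedDecrementer`, `Cell`, `process` and `shouldTriggerCollection` are hypothetical names, and the cap is expressed as a count of decrements rather than wall-clock time so that the behavior is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CappedDecrementer {
    /** Toy object with a count and a single outgoing pointer. */
    static final class Cell {
        int rc;
        final Cell next;
        boolean freed;

        Cell(int rc, Cell next) { this.rc = rc; this.next = next; }
    }

    final Deque<Cell> decBuffer = new ArrayDeque<>();

    /** Apply at most budget decrements; returns how many were applied. */
    int process(int budget) {
        int done = 0;
        while (done < budget && !decBuffer.isEmpty()) {
            Cell c = decBuffer.poll();
            done++;
            if (--c.rc == 0) {
                c.freed = true;
                // The recursive decrement is queued as ordinary work, so it is subject to the cap too.
                if (c.next != null) decBuffer.add(c.next);
            }
        }
        return done;
    }

    /** Meta-data trigger: ask for a collection when too many decrements are pending. */
    boolean shouldTriggerCollection(int maxBufferedEntries) {
        return decBuffer.size() > maxBufferedEntries;
    }
}
```

With a chain of five cells that each have RC 1 and a budget of 3, the first call frees three cells and leaves one pending decrement; the second call finishes the chain. The pause is bounded by the budget rather than by the length of the dead structure.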

  40. Experimental evaluation • Jikes RVM with MMTk • Compare MS, BG-MS, BG-RC, RC • Examine various heap sizes • Collection triggers • Each 4 MB of allocation for BG-RC (1 MB for RC) • Time cap of 60 ms • Cycle detection at 512 KB

  41. Throughput/Pause time, Moderate Heap Size

                  MS                BG-MS             BG-RC             RC
                  norm    max       norm    max       norm    max       norm    max
                  time    pause     time    pause     time    pause     time    pause
      jess        1.91    182       1.00    181       0.99     44       2.36    131
      javac       1.01    268       1.00    285       1.00     68       1.78    580
      jack        1.52    184       1.00    185       0.94     44       1.66     72
      raytrace    1.31    203       1.00    184       1.03     49       1.71    133
      mtrt        1.29    241       1.00    180       1.04     49       1.75    130
      compress    0.98    160       1.00    175       0.88     68       0.93     72
      pjbb        1.00    264       1.00    281       1.00     53       1.33    297
      db          1.01    238       1.00    244       1.01     59       1.11     43
      mpeg        1.05    185       1.00    178       0.96     43       1.14    121
      mean        1.23    214       1.00    210       0.98     53       1.53    175
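The mean row of this table can be reproduced as a plain arithmetic mean over the nine benchmarks. The short Java check below uses only numbers copied from the BG-MS and BG-RC columns; `TableMeans` and its constants are hypothetical names for this illustration.

```java
final class TableMeans {
    private TableMeans() {}

    // Values copied from the table, in benchmark order jess ... mpeg.
    static final double[] BG_RC_TIME  = {0.99, 1.00, 0.94, 1.03, 1.04, 0.88, 1.00, 1.01, 0.96};
    static final double[] BG_RC_PAUSE = {44, 68, 44, 49, 49, 68, 53, 59, 43};
    static final double[] BG_MS_PAUSE = {181, 285, 185, 184, 180, 175, 281, 244, 178};

    /** Arithmetic mean of the given values. */
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static void main(String[] args) {
        System.out.printf("BG-RC mean norm time: %.2f%n", mean(BG_RC_TIME));
        System.out.printf("BG-RC mean max pause: %.0f%n", mean(BG_RC_PAUSE));
        System.out.printf("BG-MS mean max pause: %.0f%n", mean(BG_MS_PAUSE));
        System.out.printf("pause ratio BG-MS / BG-RC: %.1f%n",
                mean(BG_MS_PAUSE) / mean(BG_RC_PAUSE));
    }
}
```

The computed means match the table's 0.98, 53 and 210, so on these figures BG-RC's mean maximum pause is roughly a quarter of BG-MS's at about the same normalized time.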

  42. Throughput & Responsiveness

  43. Conclusion • Ulterior design based on careful study of object demographics and making collector aware of them • Extends deferred RC to heap objects • Practically shows that high throughput & low pause times are compatible
