

  1. The Metronome: A Hard Real-time Garbage Collector David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center

  2. The Problem • Real-time systems growing in importance • Many CPUs in a BMW: “80% of innovation in SW” • Programmers left behind • Still using assembler and C • Lack productivity advantages of Java • Result • Complexity • Low reliability or very high validation cost

  3. Problem Domain • Hard real-time garbage collection • Uniprocessor • Multiprocessors very rare in real-time systems • Complication: collector must be finely interleaved • Simplification: memory model easier to program • No truly concurrent operations • Sequentially consistent

  4. Metronome Project Goals • Make GC feasible for hard real-time systems • Provide simple application interface • Develop technology that is efficient: • Throughput, Space comparable to stop-the-world • BREAK THE MILLISECOND BARRIER • While providing even CPU utilization

  5. Outline • What is “Real-Time” Garbage Collection? • Problems with Previous Work • Overview of the Metronome • Scheduling • Empirical Results • Conclusions

  6. What is “Real-Time” Garbage Collection?

  7. 3 Uniprocessor GC Types [Figure: execution timelines for stop-the-world (STW), incremental (Inc), and real-time (RT) collection, showing when GC #1 and GC #2 run over time]

  8. Utilization vs Time: 2 s Window [Figure: CPU utilization over time for the STW, Inc, and RT collectors, measured over a 2-second sliding window]

  9. Utilization vs Time: 0.4 s Window [Figure: CPU utilization over time for the STW, Inc, and RT collectors, measured over a 0.4-second sliding window]

  10. Minimum Mutator Utilization [Figure: MMU curves for the STW, Inc, and RT collectors]

  11. What is Real-time? It Is… • Maximum pause time < required response • CPU Utilization sufficient to accomplish task • Measured with MMU • Memory requirement < resource limit

  12. Problems with Previous Work

  13. No Compaction • Hypothesis: fragmentation not a problem • Can use avoidance and coalescing [Johnstone] • Non-moving incremental collection is simpler • Problem: long-running applications • Reboot your car every 500 miles? [Figure: heap growth over time, with gridlines at 1x, 2x, 3x, and 4x max live]

  14. Copying Collection • Idea: copy into to-space concurrently • Compaction is part of basic operation • Problem: space usage • 2 semi-spaces plus space to mutate during GC • Requires 4-8 X max live data in practice [Figure: space usage over time, with gridlines at 1x, 2x, 3x, and 4x max live]

  15. Work-Based Scheduling • The Baker fallacy: • “A real-time list processing system is one in which the time required by the elementary list processing operations…is bounded by a small constant” [Baker’78] • Implicitly assumes GC work done in mutator • What does “small constant” mean? • Typically, constant is not so small • And there is variability (fault rate analogy)

  16. Overview of the Metronome

  17. Is it real-time? Yes • Maximum pause time < 4 ms • MMU > 50% ±2% • Memory requirement < 2 X max live

  18. Components of the Metronome • Segregated free list allocator • Geometric size progression limits internal fragmentation • Write barrier: snapshot-at-the-beginning [Yuasa] • Read barrier: to-space invariant [Brooks] • New techniques with only 4% overhead • Incremental mark-sweep collector • Mark phase fixes stale pointers • Selective incremental defragmentation • Moves < 2% of traced objects • Arraylets: bound fragmentation, large object ops • Time-based scheduling

  19. Segregated Free List Allocator • Heap divided into fixed-size pages • Each page divided into fixed-size blocks • Objects allocated in smallest block that fits [Figure: pages carved into 12-, 16-, and 24-byte blocks]
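
To make the allocation path concrete, here is a minimal Java sketch of a segregated free-list allocator in the style this slide describes; the page/block bookkeeping is simulated with integer addresses, and the names are illustrative rather than the Metronome's actual implementation.

    import java.util.ArrayDeque;

    // Illustrative sketch: the heap is divided into fixed-size pages, each page
    // into fixed-size blocks of one size class, and an object is allocated in
    // the smallest block that fits.
    final class SegregatedAllocator {
        static final int PAGE_SIZE = 16 * 1024;          // P = 16 KB, as on the slides
        private final int[] sizeClasses;                  // block sizes, ascending
        private final ArrayDeque<Integer>[] freeBlocks;   // free block addresses per class
        private int nextFreePage = 0;                     // simulated page allocator

        @SuppressWarnings("unchecked")
        SegregatedAllocator(int[] sizeClasses) {
            this.sizeClasses = sizeClasses;
            this.freeBlocks = new ArrayDeque[sizeClasses.length];
            for (int k = 0; k < sizeClasses.length; k++) freeBlocks[k] = new ArrayDeque<>();
        }

        /** Return the (simulated) address of a block for an object of `size` bytes. */
        int allocate(int size) {
            for (int k = 0; k < sizeClasses.length; k++) {
                if (sizeClasses[k] < size) continue;        // too small, try next class
                if (freeBlocks[k].isEmpty()) carvePage(k);  // lazily split a fresh page
                return freeBlocks[k].pop();
            }
            throw new OutOfMemoryError("larger than the largest size class (use arraylets)");
        }

        /** Split one fresh page into blocks of size class k. */
        private void carvePage(int k) {
            int pageStart = nextFreePage;
            nextFreePage += PAGE_SIZE;
            for (int off = 0; off + sizeClasses[k] <= PAGE_SIZE; off += sizeClasses[k])
                freeBlocks[k].push(pageStart + off);
        }
    }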

  20. Fragmentation on a Page • Internal: wasted space at end of object • Page-internal: wasted space at end of page • External: blocks needed for other size [Figure: one page annotated with internal, page-internal, and external fragmentation]

  21. Limiting Internal Fragmentation • Choose page size P and block sizes s_k such that • s_k = s_{k-1}(1 + ρ) • s_max = Pρ • Then • Internal and page-internal fragmentation < ρ • Example: • P = 16 KB, ρ = 1/8, s_max = 2 KB • Internal and page-internal fragmentation < 1/8
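
A small worked example of this progression, assuming an illustrative smallest class of 16 bytes (the starting size is not given on the slide; the bound depends only on the ratio between adjacent classes).

    // Size-class progression from this slide: s_k = s_{k-1} * (1 + rho),
    // with the largest class s_max = P * rho.
    public final class SizeClassProgression {
        public static void main(String[] args) {
            final double rho = 1.0 / 8.0;             // rho = 1/8
            final int pageSize = 16 * 1024;           // P = 16 KB
            final int sMax = (int) (pageSize * rho);  // s_max = 2 KB

            int s = 16;                               // illustrative smallest class
            while (s <= sMax) {
                System.out.println("size class: " + s + " bytes");
                // Adjacent classes differ by a factor (1 + rho), so an object placed in
                // the smallest class that fits wastes less than a fraction rho of its
                // size (internal fragmentation < rho).
                s = (int) Math.ceil(s * (1.0 + rho));
            }
            // The largest class is P * rho, so the unusable tail of a 16 KB page is
            // also less than rho of the page (page-internal fragmentation < rho).
        }
    }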

  22. Fragmentation: ρ=1/8 vs ρ=1/2

  23. Write Barrier: Snapshot-at-start • Problem: mutator changes object graph • Solution: write barrier prevents lost objects • Logically, collector takes atomic snapshot • Objects live at snapshot will not be collected • Write barrier saves overwritten pointers [Yuasa] • Write buffer must be drained periodically [Figure: objects A, B, C with the overwritten pointer recorded in the write buffer (WB)]
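
A minimal Java sketch of a Yuasa-style snapshot write barrier as described above; the class and buffer names are illustrative, not the Metronome's code.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Before a pointer field is overwritten during a collection, the old value is
    // logged, so the object it referenced cannot be lost relative to the snapshot.
    final class SnapshotWriteBarrier {
        static class Node { Node next; }

        /** Pointers overwritten since the collector last drained the buffer. */
        static final Deque<Node> writeBuffer = new ArrayDeque<>();
        static volatile boolean collectionInProgress = false;

        /** All mutator pointer stores to `next` go through this barrier. */
        static void storeNext(Node obj, Node newValue) {
            if (collectionInProgress && obj.next != null) {
                writeBuffer.push(obj.next);  // remember the overwritten pointer
            }
            obj.next = newValue;             // then perform the actual store
        }
        // The collector periodically drains writeBuffer, treating its entries as roots.
    }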

  24. Read Barrier: To-space Invariant • Problem: Collector moves objects (defragmentation) • and mutator is finely interleaved • Solution: read barrier ensures consistency • Each object contains a forwarding pointer [Brooks] • Read barrier unconditionally forwards all pointers • Mutator never sees old versions of objects [Figure: before/after diagram of object A forwarded to its to-space copy A′; X, Y, and Z reach the current copy through the forwarding pointer]
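
A minimal Java sketch of the Brooks-style forwarding-pointer scheme described above; the object layout is illustrative.

    // Every object carries a forwarding pointer that refers to itself until the
    // object is moved; every access goes through it, so the mutator only ever
    // operates on the current copy.
    final class BrooksReadBarrier {
        static class Obj {
            Obj forward = this;  // forwarding pointer; self-referential until moved
            Obj field;           // an ordinary pointer field
        }

        /** Lazy read barrier: forward at the point of access. */
        static Obj read(Obj x) {
            return x.forward.field;
        }

        /** Defragmentation: copy the object and install the forwarding pointer. */
        static Obj move(Obj fromSpaceCopy) {
            Obj toSpaceCopy = new Obj();                    // its forward points to itself
            toSpaceCopy.field = fromSpaceCopy.forward.field;
            fromSpaceCopy.forward = toSpaceCopy;            // future reads see the new copy
            return toSpaceCopy;
        }
    }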

  25. Read Barrier Optimizations • Barrier variants: when to redirect • Lazy: easier for collector (no fixup) • Eager: better for performance (loop over a[i]) • Standard optimizations: CSE, code motion • Problem: pointers can be null • Augment read barrier for GetField(x,offset): tmp = x[offset]; return tmp == null ? null : tmp[redirect] • Optimize by null-check combining, sinking
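
The GetField pseudocode above corresponds roughly to the following eager-variant Java sketch, which forwards the loaded value only when it is non-null; names are illustrative.

    // Null-tolerant eager read barrier for GetField(x, offset): the null test can be
    // combined with the JIT's existing null check and sunk by the optimizer.
    final class NullCheckedBarrier {
        static class Obj { Obj forward = this; Obj[] fields; }

        static Obj getField(Obj x, int offset) {
            Obj tmp = x.fields[offset];
            return (tmp == null) ? null : tmp.forward;  // forward the loaded reference
        }
    }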

  26. Read Barrier Results • Conventional wisdom: read barriers too slow • Previous studies: 20-40% overhead [Zorn,Nielsen]

  27. Incremental Mark-Sweep • Mark/sweep finely interleaved with mutator • Write barrier ensures no lost objects • Must treat objects in write buffer as roots • Read barrier ensures consistency • Marker always traces correct object • With barriers, interleaving is simple

  28. Pointer Fixup During Mark • When can a moved object be freed? • When there are no more pointers to it • Mark phase updates pointers • Redirects forwarded pointers as it marks them • Object moved in collection n can be freed: • At the end of mark phase of collection n+1 [Figure: pointers from X, Y, Z redirected from the from-space copy A to the to-space copy A′ during marking]
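
A Java sketch combining this slide and the previous one: a bounded mark increment drains the write buffer as extra roots and redirects every pointer it traces through its forwarding pointer, so after the next mark phase no stale from-space pointers remain. All names are illustrative.

    import java.util.ArrayDeque;
    import java.util.Deque;

    final class IncrementalMarker {
        static class Obj { Obj forward = this; Obj[] fields; boolean marked; }

        final Deque<Obj> markStack = new ArrayDeque<>();
        final Deque<Obj> writeBuffer = new ArrayDeque<>();  // filled by the write barrier

        /** Do at most `budget` units of marking work, then return to the mutator. */
        void markStep(int budget) {
            while (!writeBuffer.isEmpty()) push(writeBuffer.pop());  // snapshot roots
            while (budget-- > 0 && !markStack.isEmpty()) {
                Obj obj = markStack.pop();
                if (obj.fields == null) continue;
                for (int i = 0; i < obj.fields.length; i++) {
                    Obj child = obj.fields[i];
                    if (child == null) continue;
                    child = child.forward;   // fix up a possibly stale pointer
                    obj.fields[i] = child;   // store the redirected pointer back
                    push(child);
                }
            }
        }

        private void push(Obj o) {
            o = o.forward;
            if (!o.marked) { o.marked = true; markStack.push(o); }
        }
    }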

  29. Selective Defragmentation • When do we move objects? • When there is fragmentation • Usually, program exhibits locality of size • Dead objects are re-used quickly • Defragment either when • Dead objects are not re-used for a GC cycle • Free pages fall below limit for performing a GC • In practice: we move 2-3% of data traced • Major improvement over copying collector
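
An illustrative sketch of that trigger logic; the heap-statistics interface is hypothetical, and only the two conditions come from the slide.

    final class DefragPolicy {
        interface HeapStats {
            boolean deadSpaceReusedSinceLastGC();  // did size locality recycle dead objects?
            int freePages();
            int gcTriggerPages();                  // free-page threshold for starting a GC
        }

        /** Defragment only when the program's size locality has broken down. */
        static boolean shouldDefragment(HeapStats heap) {
            return !heap.deadSpaceReusedSinceLastGC()
                || heap.freePages() < heap.gcTriggerPages();
        }
    }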

  30. Arraylets • Large arrays create problems • Fragment memory space • Cannot be moved in a short, bounded time • Solution: break large arrays into arraylets • Access via indirection; move one arraylet at a time • Optimizations • Type-dependent code optimized for contiguous case • Opportunistic contiguous allocation [Figure: array A split into arraylets A1, A2, A3 reached through indirection]
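
A minimal sketch of arraylet-style indexed access; the chunk size and names are illustrative.

    // A large array is split into fixed-size chunks reached through a spine of
    // pointers, so each chunk can be moved independently in bounded time.
    final class Arraylet {
        static final int ARRAYLET_BYTES = 2 * 1024;                 // one chunk, e.g. 2 KB
        static final int ELEMS_PER_ARRAYLET = ARRAYLET_BYTES / 4;   // 4-byte ints

        final int[][] spine;                                        // indirection to arraylets

        Arraylet(int length) {
            int chunks = (length + ELEMS_PER_ARRAYLET - 1) / ELEMS_PER_ARRAYLET;
            spine = new int[chunks][ELEMS_PER_ARRAYLET];
        }

        int get(int i) {
            return spine[i / ELEMS_PER_ARRAYLET][i % ELEMS_PER_ARRAYLET];
        }

        void set(int i, int value) {
            spine[i / ELEMS_PER_ARRAYLET][i % ELEMS_PER_ARRAYLET] = value;
        }
        // A JIT can still emit the contiguous fast path when an array happens to be
        // allocated contiguously, as the slide's "opportunistic" optimization notes.
    }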

  31. Scheduling

  32. Work-based Scheduling • Trigger the collector to collect C_W bytes • Whenever the mutator allocates Q_W bytes [Figure: MMU (CPU utilization) vs. window size (log scale) and space (MB) vs. time under work-based scheduling]
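
A sketch of the work-based trigger described above; the quantum values and interfaces are illustrative.

    // Every time the mutator allocates Q_W bytes, the collector is run until it has
    // done C_W bytes of collection work, so pause times track collector work, not time.
    final class WorkBasedScheduler {
        static final long Q_W = 64 * 1024;    // mutator allocation quantum (bytes)
        static final long C_W = 256 * 1024;   // collector work quantum (bytes traced/swept)

        long allocatedSinceLastIncrement = 0;

        void onAllocate(long bytes, Collector collector) {
            allocatedSinceLastIncrement += bytes;
            if (allocatedSinceLastIncrement >= Q_W) {
                allocatedSinceLastIncrement = 0;
                collector.doWork(C_W);        // pause length depends on how long C_W takes
            }
        }

        interface Collector { void doWork(long bytes); }
    }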

  33. Time-based Scheduling • Trigger collector to run for C_T seconds • Whenever mutator runs for Q_T seconds [Figure: MMU (CPU utilization) vs. window size (log scale) and space (MB) vs. time under time-based scheduling]
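
A sketch of the time-based alternative, with collector and mutator alternating fixed time quanta; under strict alternation, steady-state mutator utilization is Q_T / (Q_T + C_T). The quantum values and interfaces are illustrative (equal quanta give the 50% utilization target).

    // During a collection cycle, the mutator and collector strictly alternate fixed
    // time slices, so pause times and utilization are bounded independently of the
    // allocation rate.
    final class TimeBasedScheduler {
        static final long Q_T_MICROS = 500;  // mutator quantum
        static final long C_T_MICROS = 500;  // collector quantum

        void runCycle(Collector collector, Mutator mutator) {
            while (collector.hasWork()) {
                mutator.runFor(Q_T_MICROS);    // mutator gets its slice...
                collector.runFor(C_T_MICROS);  // ...then the collector gets its slice
            }
        }

        interface Collector { boolean hasWork(); void runFor(long micros); }
        interface Mutator   { void runFor(long micros); }
    }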

  34. Parameterization [Figure: the tuner takes the mutator's allocation rate a*(ΔGC) and maximum live memory m, the collector's collection rate R, and a real-time interval Δt, and yields the achievable CPU utilization u and maximum used memory s]

  35. Empirical Results

  36. Pause Time Distribution: javac [Figure: pause-time histograms for time-based vs. work-based scheduling, both plotted on a 12 ms scale]

  37. Utilization vs Time: javac [Figure: mutator utilization (0.0-1.0) over time under time-based and work-based scheduling, with a marked level at 0.45 on the time-based plot]

  38. MMU: javac

  39. Space Usage: javac

  40. Conclusions

  41. Conclusions • The Metronome provides true real-time GC • First collector to do so without major sacrifice • Short pauses (4 ms) • High MMU during collection (50%) • Low memory consumption (2x max live) • Critical features • Time-based scheduling • Hybrid, mostly non-copying approach • Integration w/compiler

  42. Future Work • Two main goals: • Reduce pause time, memory requirement • Increase predictability • Pause time: • Expect sub-millisecond using current techniques • For 10’s of microseconds, need interrupt-based • Predictability • Studying parameterization of collector • Good research area
