
A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization


Presentation Transcript


  1. A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization. David Bacon, Perry Cheng (presenting), V.T. Rajan. IBM T.J. Watson Research

  2. Roadmap • What is Real-time Garbage Collection? • Pause Time, CPU utilization (MMU), and Space Usage • Heap Architecture • Types of Fragmentation • Incremental Compaction • Read Barriers • Barrier Performance • Scheduling: Time-Based vs. Work-Based • Empirical Results • Pause Time Distribution • Minimum Mutator Utilization (MMU) • Pause Times • Summary and Conclusion

  3. Problem Domain • Real-time Embedded Systems • Memory usage important • Uniprocessor

  4. 3 Styles of Uniprocessor Garbage Collection: Stop-the-World vs. Incremental vs. Real-Time [Figure: GC activity over time for stop-the-world (STW), incremental (Inc), and real-time (RT) collection]

  5. Pause Times (Average and Maximum) [Figure: STW pauses of roughly 1.5-1.7 s, incremental pauses of roughly 0.3-0.9 s, and real-time pauses of roughly 0.15-0.19 s]

  6. Coarse-Grained Utilization vs. Time (2.0 s window) [Figure: CPU utilization over time for STW, Inc, and RT]

  7. Fine-Grained Utilization vs. Time (0.4 s window) [Figure: CPU utilization over time for STW, Inc, and RT]

  8. Minimum Mutator Utilization (MMU) [Figure: MMU curves for STW, Inc, and RT]

  9. Space Usage over Time [Figure: heap space over time, with the collection trigger at 2x the maximum live data]

  10. Problems with Existing RT Collectors [Figure: space usage over time for a non-moving collector and for a replicating collector, plotted against 1x, 2x, 3x, and 4x the maximum live data] • Not fully incremental • Tight coupling • Work-based scheduling

  11. Our Collector • Goals and Results: Real-Time ~10 ms, Low Space Overhead ~2X, Good Utilization during GC ~40% • Solution • Incremental Mark-Sweep Collector • Write barrier: snapshot-at-the-beginning [Yuasa] • Segregated free list heap architecture • Read barrier: to support defragmentation [Brooks] • Incremental defragmentation • Segmented arrays: to bound fragmentation

  12. Roadmap • What is Real-time Garbage Collection? • Pause Time, CPU utilization (MMU), and Space Usage • Heap Architecture • Types of Fragmentation • Incremental Compaction • Read Barriers • Barrier Performance • Scheduling: Time-Based vs. Work-Based • Empirical Results • Pause Time Distribution • Minimum Mutator Utilization (MMU) • Pause Times • Summary and Conclusion

  13. Fragmentation and Compaction • Intuitively: available but unusable memory • Avoidance and coalescing: no guarantees • Compaction [Figure: a heap of used and free blocks where the needed contiguous space is only available after compaction]

  14. Heap Architecture • Segregated free lists • Heap divided into pages • Each page has equally-sized blocks (1 object per block) • Large arrays are segmented [Figure: pages with size-24 and size-32 blocks; legend: used, free, page-internal, internal, and external fragmentation]

  15. Controlling Internal and Page-Internal Fragmentation • Choose the page size (page) and the block sizes (s_k) • If s_k = s_{k-1}(1 + q), internal fragmentation ≤ q • Page-internal fragmentation ≤ s_max / page • E.g., if page = 16 KB, q = 1/8, and s_max = 2 KB, the maximum non-external fragmentation is 12.5% (a sketch checking these bounds follows below)
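  The arithmetic on slide 15 can be checked directly. The Java sketch below is illustrative only: the 16-byte minimum block size, the class name FragmentationBounds, and the ceiling-based rounding are assumptions of mine, not details from the paper. It generates the geometric size-class series and reports the worst-case internal and page-internal fragmentation for page = 16 KB, q = 1/8, s_max = 2 KB; both stay within the 12.5% bound.

      // Illustrative check of the bounds quoted on slide 15 (page = 16 KB,
      // q = 1/8, s_max = 2 KB). The 16-byte minimum block size and the
      // rounding of class sizes are assumptions of this sketch.
      public class FragmentationBounds {
          public static void main(String[] args) {
              final int page = 16 * 1024;   // page size in bytes
              final double q = 1.0 / 8.0;   // spacing ratio: s_k = s_{k-1} * (1 + q)
              final int sMax = 2 * 1024;    // largest non-segmented block size

              int prev = 16;                // smallest block size (assumed)
              double worstInternal = 0.0, worstPageInternal = 0.0;

              for (int s = (int) Math.ceil(prev * (1 + q)); s <= sMax;
                       s = (int) Math.ceil(s * (1 + q))) {
                  int n = prev + 1;         // smallest request that must round up to class s
                  // Internal fragmentation: wasted bytes as a fraction of the request.
                  worstInternal = Math.max(worstInternal, (double) (s - n) / n);
                  // Page-internal fragmentation: the unusable tail of a page of s-byte blocks.
                  worstPageInternal = Math.max(worstPageInternal, (double) (page % s) / page);
                  prev = s;
              }
              // Both values come out at or below q = 12.5% for these parameters.
              System.out.printf("internal <= %.1f%%, page-internal <= %.1f%%%n",
                      100 * worstInternal, 100 * worstPageInternal);
          }
      }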

  16. Fragmentation, small heap (q = 1/8 vs. q = 1/2) [Figure: measured fragmentation with q = 1/2 and with q = 1/8]

  17. Incremental Compaction • Compact only a part of the heap • Requires knowing what to compact ahead of time • Key Problems • Popular objects • Determining references to moved objects

  18. Incremental Compaction: Redirection • Access all objects via per-object redirection pointers • Redirection is initially self-referential • Move an object by updating ONE redirection pointer [Figure: an original object whose redirection pointer targets its replica]
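  To make the redirection idea concrete, here is a minimal Java sketch; ManagedObject and Compactor are illustrative stand-ins of mine, not the collector's real data structures (which keep the forwarding pointer in the object header).

      // Minimal sketch of Brooks-style redirection (illustrative names).
      class ManagedObject {
          ManagedObject redirect;      // forwarding pointer, initially self-referential
          Object[] fields;             // stand-in for the object's payload

          ManagedObject(int numFields) {
              this.redirect = this;    // new objects point at themselves
              this.fields = new Object[numFields];
          }
      }

      class Compactor {
          // Moving an object is a copy plus ONE pointer update; every later
          // access follows `redirect` and therefore reaches the replica.
          static ManagedObject move(ManagedObject original) {
              ManagedObject replica = new ManagedObject(original.fields.length);
              System.arraycopy(original.fields, 0, replica.fields, 0, original.fields.length);
              original.redirect = replica;
              return replica;
          }
      }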

  19. Consistency via Read Barrier [Brooks] • Correctness requires always using the replica • E.g., field selection must be modified: the normal access x[offset] becomes the read-barrier access x[redirect][offset]

  20. Some Important Details • Our read barrier is decoupled from collection • Complication: in Java, any reference might be null, so the actual read barrier for GetField(x, offset) must be augmented: tmp = x[offset]; return (tmp == null) ? null : tmp[redirect] • Optimizations: CSE, code motion (LICM and sinking), null-check combining • Barrier variants (when to redirect): lazy (easier for the collector) vs. eager (better for optimization); a sketch contrasting the two follows below
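  The sketch below contrasts the two variants, reusing the illustrative ManagedObject class from the redirection sketch above; forwarded(), getFieldEager(), and getFieldLazy() are hypothetical names, not the paper's API.

      class ReadBarriers {
          // Follow the redirection pointer (Brooks forwarding), tolerating null.
          static ManagedObject forwarded(ManagedObject x) {
              return (x == null) ? null : x.redirect;
          }

          // Eager variant: forward when a reference is LOADED, matching the slide's
          // "tmp = x[offset]; return (tmp == null) ? null : tmp[redirect]".
          // Mutator-held references are then always up to date, which lets the
          // compiler apply CSE, code motion, and null-check combining.
          static ManagedObject getFieldEager(ManagedObject x, int offset) {
              ManagedObject tmp = (ManagedObject) x.fields[offset];
              return forwarded(tmp);
          }

          // Lazy variant: forward when a reference is USED (x[redirect][offset]).
          // Mutator-held references may be stale between uses: easier for the
          // collector, harder for the optimizer.
          static ManagedObject getFieldLazy(ManagedObject x, int offset) {
              return (ManagedObject) forwarded(x).fields[offset];  // result may itself be stale
          }
      }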

  21. Barrier Overhead to Mutator • Conventional wisdom says read barriers are too expensive • Studies found overheads of 20-40% (Zorn, Nielsen) • Our barrier has 4-6% overhead with optimizations

  22. Program start [Figure: stack and heap, one block size only]

  23. Program is allocating [Figure: stack and heap; legend: free, allocated]

  24. GC starts [Figure: stack and heap; legend: free, unmarked]

  25. Program allocating and GC marking [Figure: stack and heap; legend: free, unmarked, marked or allocated]

  26. Sweeping away blocks [Figure: stack and heap; legend: free, unmarked, marked or allocated]

  27. GC moving objects and installing redirection [Figure: stack and heap; legend: free, evacuated, allocated]

  28. 2nd GC starts tracing and redirection fixup [Figure: stack and heap; legend: free, evacuated, unmarked, marked or allocated]

  29. 2nd GC complete [Figure: stack and heap; legend: free, allocated]
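  Slides 22-29 walk through roughly one and a half collection cycles. The Java-flavored outline below is a hedged summary of those phases; the method names are descriptive placeholders of mine, and in the real collector every phase runs incrementally, interleaved with the program.

      // Hedged outline of the phases illustrated on slides 22-29.
      class CollectionCycleSketch {
          void collect() {
              snapshotRoots();           // slide 24: collection starts from stack and global roots
              markReachable();           // slide 25: trace while the program keeps allocating
                                         //           (objects allocated during GC are treated as marked)
              sweepUnmarked();           // slide 26: return unmarked blocks to the free lists
              if (fragmented()) {
                  evacuateSparsePages(); // slide 27: copy survivors off mostly-empty pages and
                                         //           install redirection pointers to the replicas
              }
              // slides 28-29: the NEXT collection's trace forwards stale references
              // it encounters, after which the evacuated originals can be freed.
          }

          // Empty placeholders so the outline stands alone.
          void snapshotRoots() {}
          void markReachable() {}
          void sweepUnmarked() {}
          boolean fragmented() { return false; }
          void evacuateSparsePages() {}
      }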

  30. Roadmap • What is Real-time Garbage Collection? • Pause Time, CPU utilization (MMU), and Space Usage • Heap Architecture • Types of Fragmentation • Incremental Compaction • Read Barriers • Barrier Performance • Scheduling: Time-Based vs. Work-Based • Empirical Results • Pause Time Distribution • Minimum Mutator Utilization (MMU) • Pause Times • Summary and Conclusion

  31. Scheduling the Collector • Issues: bad CPU utilization and space usage; loose program and collector coupling • Time-based: trigger the collector to run for C_T seconds whenever the program runs for Q_T seconds • Work-based: trigger the collector to perform C_W work whenever the program allocates Q_W bytes

  32. Time-Based Scheduling • Trigger the collector to run for C_T seconds whenever the program runs for Q_T seconds [Figures: MMU (CPU utilization) vs. window size (s); space (MB) vs. time (s)]

  33. Work-Based Scheduling • Trigger the collector to collect C_W bytes whenever the program allocates Q_W bytes [Figures: MMU (CPU utilization) vs. window size (s); space (MB) vs. time (s)] (a sketch of both trigger policies follows below)
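  Here is a hedged Java sketch contrasting the two trigger policies; SchedulerSketch, collectFor(), collectWork(), and the quantum values are illustrative stand-ins, not the collector's interface or the paper's measured parameters.

      class SchedulerSketch {
          // Time-based: after every Q_T seconds of program time, run the collector
          // for C_T seconds. Coarse-grained utilization is then roughly
          // Q_T / (Q_T + C_T), but space usage floats with the allocation rate.
          static final double Q_T = 0.010, C_T = 0.005;       // example quanta in seconds
          void onMutatorQuantumExpired() { collectFor(C_T); }

          // Work-based: after every Q_W bytes allocated, do C_W bytes of collector
          // work. Space is tied to allocation, but a fast-allocating phase drags
          // utilization down exactly when the program is busiest.
          static final long Q_W = 64 * 1024, C_W = 96 * 1024; // example quanta in bytes
          long allocatedSinceLastIncrement = 0;
          void onAllocate(long bytes) {
              allocatedSinceLastIncrement += bytes;
              if (allocatedSinceLastIncrement >= Q_W) {
                  allocatedSinceLastIncrement -= Q_W;
                  collectWork(C_W);
              }
          }

          // Empty placeholders so the sketch stands alone.
          void collectFor(double seconds) {}
          void collectWork(long bytes) {}
      }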

  34. Roadmap • What is Real-time Garbage Collection? • Pause Time, CPU utilization (MMU), and Space Usage • Heap Architecture • Types of Fragmentation • Incremental Compaction • Read Barriers • Barrier Performance • Scheduling: Time-Based vs. Work-Based • Empirical Results • Pause Time Distribution • Minimum Mutator Utilization (MMU) • Pause Times • Summary and Conclusion

  35. Pause Time Distribution for javac (Time-Based vs. Work-Based) [Figure: pause-time histograms for both schedulers, with a 12 ms marker on each]

  36. Utilization vs. Time for javac (Time-Based vs. Work-Based) [Figure: utilization (0 to 1.0) over time for both schedulers, with a reference line at 0.45]

  37. Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

  38. Space Usage for javac (Time-Based vs. Work-Based)

  39. Intrinsic Tradeoff • Three inter-related factors: space bound (tradeoff), utilization (tradeoff), allocation rate (lower is better) • Other factors: collection rate (higher is better), pointer density (lower is better)

  40. Summary: Mostly Non-moving RT GC • Read Barriers • Permits incremental defragmentation • Overhead is 4-6% with compiler optimizations • Low Space Overhead • Space usage is only about 2 X max live data • Fragmentation still bounded • Consistent Utilization • Always at least 45% at 12 ms resolution

  41. Conclusions • Real-time GC is real • There are tradeoffs, just like in traditional GC • Scheduling should be primarily time-based • Fall back to work-based when the user's parameter estimates are incorrect • Incremental defragmentation is possible • Compiler support is important!

  42. Future Work • Lowering the real-time resolution • Sub-millisecond worst-case pause • Main issue: breaking up stack scan • Segmented array optimizations • Reduce segmented array cost below ~2% • Opportunistic contiguous layout • Type-based specialization with invalidation • Strip-mining
