
The Metronome: A Hard Real-time Garbage Collector

David F. Bacon, Perry Cheng, V.T. Rajan

IBM T.J. Watson Research Center

The Problem
  • Real-time systems growing in importance
    • Many CPUs in a BMW: “80% of innovation in SW”
  • Programmers left behind
    • Still using assembler and C
    • Lack productivity advantages of Java
  • Result
    • Complexity
    • Low reliability or very high validation cost
Problem Domain
  • Hard real-time garbage collection
  • Uniprocessor
    • Multiprocessors very rare in real-time systems
    • Complication: collector must be finely interleaved
    • Simplification: memory model easier to program
      • No truly concurrent operations
      • Sequentially consistent
Metronome Project Goals
  • Make GC feasible for hard real-time systems
  • Provide simple application interface
  • Develop technology that is efficient:
    • Throughput, Space comparable to stop-the-world
  • BREAK THE MILLISECOND BARRIER
    • While providing even CPU utilization
Outline
  • What is “Real-Time” Garbage Collection?
  • Problems with Previous Work
  • Overview of the Metronome
  • Scheduling
  • Empirical Results
  • Conclusions
3 Uniprocessor GC Types

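[Figure: timelines for the three collector types, STW (stop-the-world), Inc (incremental), and RT (real-time), showing GC #1 and GC #2 along a time axis]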

What is Real-time? It Is…
  • Maximum pause time < required response
  • CPU Utilization sufficient to accomplish task
    • Measured with MMU
  • Memory requirement < resource limit
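MMU (minimum mutator utilization) can be illustrated with a small sketch: for a given window size, it is the smallest fraction of any window of that size that is left to the mutator. The function names and the pause trace below are hypothetical and only serve as an illustration.

```python
def mutator_utilization(pauses, start, window):
    """Fraction of [start, start + window) not spent in collector pauses."""
    end = start + window
    gc_time = sum(max(0.0, min(p_end, end) - max(p_start, start))
                  for p_start, p_end in pauses)
    return (window - gc_time) / window


def mmu(pauses, total_time, window):
    """Minimum mutator utilization over all windows of the given size."""
    # A worst-case window can always be found that starts at a pause start,
    # ends at a pause end, or sits at either end of the trace.
    candidates = {0.0, total_time - window}
    for p_start, p_end in pauses:
        candidates.add(p_start)
        candidates.add(p_end - window)
    valid = [t for t in candidates if 0.0 <= t <= total_time - window]
    return min(mutator_utilization(pauses, t, window) for t in valid)


# Hypothetical trace (times in ms): a 5 ms collector pause every 10 ms.
pauses = [(5.0, 10.0), (15.0, 20.0), (25.0, 30.0)]
mmu_10ms = mmu(pauses, 30.0, 10.0)  # every 10 ms window holds 5 ms of GC
mmu_5ms = mmu(pauses, 30.0, 5.0)    # a 5 ms window can fall inside a pause
```

The same trace gives very different answers at different window sizes, which is why MMU is reported against window size.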
No Compaction
  • Hypothesis: fragmentation not a problem
    • Can use avoidance and coalescing [Johnstone]
    • Non-moving incremental collection is simpler
  • Problem: long-running applications
    • Reboot your car every 500 miles?

[Figure: space usage, scale marked at 1 X, 2 X, 3 X, and 4 X max live]

Copying Collection
  • Idea: copy into to-space concurrently
  • Compaction is part of basic operation
  • Problem: space usage
    • 2 semi-spaces plus space to mutate during GC
    • Requires 4-8 X max live data in practice

[Figure: space usage, scale marked at 1 X, 2 X, 3 X, and 4 X max live]

Work-Based Scheduling
  • The Baker fallacy:
    • “A real-time list processing system is one in which the time required by the elementary list processing operations…is bounded by a small constant” [Baker’78]
  • Implicitly assumes GC work done in mutator
    • What does “small constant” mean?
    • Typically, constant is not so small
      • And there is variability (fault rate analogy)
Is it real-time? Yes
  • Maximum pause time < 4 ms
  • MMU > 50% ±2%
  • Memory requirement < 2 X max live
Components of the Metronome
  • Segregated free list allocator
    • Geometric size progression limits internal fragmentation
  • Write barrier: snapshot-at-the-beginning [Yuasa]
  • Read barrier: to-space invariant [Brooks]
    • New techniques with only 4% overhead
  • Incremental mark-sweep collector
    • Mark phase fixes stale pointers
  • Selective incremental defragmentation
    • Moves < 2% of traced objects
  • Arraylets: bound fragmentation, large object ops
  • Time-based scheduling

Segregated Free List Allocator
  • Heap divided into fixed-size pages
  • Each page divided into fixed-size blocks
  • Objects allocated in smallest block that fits

[Figure: pages divided into blocks of sizes 12, 16, and 24]

Fragmentation on a Page
  • Internal: wasted space at end of object
  • Page-internal: wasted space at end of page
  • External: blocks needed for other size

[Figure: a page annotated with page-internal, internal, and external fragmentation]

Limiting Internal Fragmentation
  • Choose page size $P$ and block sizes $s_k$ such that
    • $s_k = s_{k-1}(1 + \rho)$
    • $s_{max} = P\rho$
  • Then
    • Internal and page-internal fragmentation < $\rho$
  • Example:
    • $P$ = 16 KB, $\rho$ = 1/8, $s_{max}$ = 2 KB
    • Internal and page-internal fragmentation < 1/8
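The geometric progression can be checked numerically. The sketch below builds size classes for the example parameters and measures the waste; the smallest block size (16 bytes) and the rounding to whole bytes are assumptions made for illustration, not values from the slides.

```python
import bisect


def size_classes(s_min, s_max, rho):
    """Block sizes growing by a factor of at most (1 + rho), up to s_max."""
    sizes = [s_min]
    while sizes[-1] < s_max:
        # Round down to whole bytes, but always advance by at least one byte.
        nxt = max(sizes[-1] + 1, int(sizes[-1] * (1 + rho)))
        sizes.append(min(nxt, s_max))
    return sizes


def block_size_for(request, sizes):
    """Smallest block size that fits the request."""
    i = bisect.bisect_left(sizes, request)
    if i == len(sizes):
        raise ValueError("request exceeds the largest size class")
    return sizes[i]


P = 16 * 1024           # page size from the example
RHO = 1 / 8             # growth factor from the example
S_MAX = int(P * RHO)    # 2 KB
S_MIN = 16              # assumed smallest block size

SIZES = size_classes(S_MIN, S_MAX, RHO)

# Internal fragmentation: waste relative to the requested size.
worst_internal = max((block_size_for(r, SIZES) - r) / r
                     for r in range(S_MIN, S_MAX + 1))
# Page-internal fragmentation: unusable tail of a page, relative to the page.
worst_page_internal = max((P % s) / P for s in SIZES)
```

Requests smaller than the assumed minimum block are outside the bound, which is why the check starts at S_MIN.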
Write Barrier: Snapshot-at-start
  • Problem: mutator changes object graph
  • Solution: write barrier prevents lost objects
  • Logically, collector takes atomic snapshot
    • Objects live at snapshot will not be collected
    • Write barrier saves overwritten pointers [Yuasa]
    • Write buffer must be drained periodically
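A minimal sketch of a snapshot-style write barrier, assuming a toy object model: the class and method names are hypothetical, and the collector here runs to completion rather than in increments. It shows the overwritten pointer being saved and the write buffer being drained into the marker.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}
        self.marked = False


class Collector:
    def __init__(self):
        self.collecting = False
        self.write_buffer = []

    def write_field(self, obj, field, new_value):
        """Mutator store with a write barrier that logs the old pointer."""
        old = obj.fields.get(field)
        if self.collecting and old is not None:
            self.write_buffer.append(old)  # save the overwritten pointer
        obj.fields[field] = new_value

    def mark(self, roots):
        """Mark from the roots, treating write-buffer entries as roots too."""
        stack = list(roots)
        while stack or self.write_buffer:
            stack.extend(self.write_buffer)  # drain the write buffer
            self.write_buffer.clear()
            obj = stack.pop()
            if obj.marked:
                continue
            obj.marked = True
            stack.extend(v for v in obj.fields.values() if v is not None)


a, b = Obj("A"), Obj("B")
a.fields["f"] = b

collector = Collector()
collector.collecting = True           # logical snapshot: B is reachable via A.f
collector.write_field(a, "f", None)   # mutator removes the only heap pointer to B
collector.mark([a])                   # B is still marked, via the write buffer
```

Without the barrier, B would be missed in this scenario even though it was live at the snapshot.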

[Figure: object graph with nodes A, B, and C; label: WB]

Read Barrier: To-space Invariant
  • Problem: Collector moves objects (defragmentation)
    • and mutator is finely interleaved
  • Solution: read barrier ensures consistency
    • Each object contains a forwarding pointer [Brooks]
    • Read barrier unconditionally forwards all pointers
    • Mutator never sees old versions of objects
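A minimal sketch of a forwarding-pointer read barrier, assuming a toy object model in which every object carries a `forward` field that points to itself until the object is moved. The names `Obj`, `move`, and `get_field` are hypothetical.

```python
class Obj:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.forward = self  # forwarding pointer; refers to self until moved


def move(obj):
    """Collector copies an object and leaves a forwarding pointer behind."""
    copy = Obj(**obj.fields)
    obj.forward = copy
    return copy


def get_field(x, name):
    """Field load with an unconditional read barrier on the loaded pointer."""
    tmp = x.fields[name]
    return None if tmp is None else tmp.forward


a = Obj(value=None)
x = Obj(a=a)
before = get_field(x, "a")   # not moved yet: the barrier returns A itself

a_prime = move(a)            # collector relocates A during defragmentation
after = get_field(x, "a")    # X still holds the old pointer; barrier forwards it
```

The barrier does no test for "has this object moved": it always follows the forwarding pointer, which is what keeps it cheap and unconditional.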

[Figure: objects X, Y, Z, and A in From-space and To-space, BEFORE and AFTER A is copied to A′]

Read Barrier Optimizations
  • Barrier variants: when to redirect
    • Lazy: easier for collector (no fixup)
    • Eager: better for performance (loop over a[i])
  • Standard optimizations: CSE, code motion
  • Problem: pointers can be null
    • Augment read barrier for GetField(x, offset):
        tmp = x[offset];
        return tmp == null ? null : tmp[redirect];

    • Optimize by null-check combining, sinking
Read Barrier Results
  • Conventional wisdom: read barriers too slow
    • Previous studies: 20-40% overhead [Zorn,Nielsen]
Incremental Mark-Sweep
  • Mark/sweep finely interleaved with mutator
  • Write barrier ensures no lost objects
    • Must treat objects in write buffer as roots
  • Read barrier ensures consistency
    • Marker always traces correct object
  • With barriers, interleaving is simple
Pointer Fixup During Mark
  • When can a moved object be freed?
    • When there are no more pointers to it
  • Mark phase updates pointers
    • Redirects forwarded pointers as it marks them
  • Object moved in collection n can be freed:
    • At the end of mark phase of collection n+1
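The fixup can be sketched as a mark loop that rewrites each pointer to its forwarded target as it traces it. This is an illustration under a toy object model (hypothetical `Obj` with `forward` and `marked` fields), not the collector's actual code.

```python
class Obj:
    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)
        self.forward = self   # refers to self until the object is moved
        self.marked = False


def mark_and_fixup(roots):
    """Mark reachable objects, redirecting stale pointers along the way."""
    stack = [r.forward for r in roots]
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        for name, ref in list(obj.fields.items()):
            if ref is None:
                continue
            target = ref.forward
            obj.fields[name] = target  # stale pointer now refers to the new copy
            stack.append(target)


# Collection n moved A to A'; X and Y still hold pointers to the old copy.
a = Obj("A")
x = Obj("X", a=a)
y = Obj("Y", a=a)
a_prime = Obj("A'")
a.forward = a_prime

# Mark phase of collection n+1.
mark_and_fixup([x, y])
```

After the mark phase no traced object refers to the old copy, so its storage can be reclaimed.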

[Figure: objects X, Y, Z, A, and A′ across From-space and To-space]

Selective Defragmentation
  • When do we move objects?
    • When there is fragmentation
  • Usually, program exhibits locality of size
    • Dead objects are re-used quickly
  • Defragment either when
    • Dead objects are not re-used for a GC cycle
    • Free pages fall below limit for performing a GC
  • In practice: we move 2-3% of data traced
    • Major improvement over copying collector
Arraylets


  • Large arrays create problems
    • Fragment memory space
    • Cannot be moved in a short, bounded time
  • Solution: break large arrays into arraylets
    • Access via indirection; move one arraylet at a time
  • Optimizations
    • Type-dependent code optimized for contiguous case
    • Opportunistic contiguous allocation
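A minimal sketch of access via indirection, assuming a spine of fixed-size chunks. The arraylet size of 4 elements and the class name are hypothetical, chosen only to keep the example small.

```python
ARRAYLET_SIZE = 4  # elements per arraylet; an assumed value for illustration


class ArrayletArray:
    def __init__(self, length):
        self.length = length
        count = -(-length // ARRAYLET_SIZE)  # ceiling division
        # The spine holds one pointer per arraylet.
        self.spine = [[0] * ARRAYLET_SIZE for _ in range(count)]

    def _locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        return divmod(index, ARRAYLET_SIZE)  # (arraylet number, offset)

    def get(self, index):
        chunk, offset = self._locate(index)
        return self.spine[chunk][offset]

    def set(self, index, value):
        chunk, offset = self._locate(index)
        self.spine[chunk][offset] = value

    def move_arraylet(self, chunk):
        """Relocate one arraylet; the work is bounded by ARRAYLET_SIZE."""
        self.spine[chunk] = list(self.spine[chunk])


arr = ArrayletArray(10)
for i in range(10):
    arr.set(i, i * i)

old_chunk = arr.spine[1]
arr.move_arraylet(1)  # only one chunk is copied, not the whole array
```

Each move touches a single chunk and one spine entry, so the cost of relocating a large array is split into short, bounded steps.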

[Figure: array A split into arraylets A1, A2, A3]

Work-based Scheduling
  • Trigger the collector to collect $C_W$ bytes
    • Whenever the mutator allocates $Q_W$ bytes
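The trigger rule can be sketched as an event trace. The function and parameter names (`q_w` for the mutator allocation quantum, `c_w` for the collector work quantum) and the byte counts are hypothetical; the point is only that collector work follows allocation, so a burst of allocation produces back-to-back collector increments.

```python
def work_based_trace(allocations, q_w, c_w):
    """Interleave allocations with collector increments, triggered by bytes."""
    events = []
    since_last_increment = 0
    for size in allocations:
        events.append(("alloc", size))
        since_last_increment += size
        # One collector increment of c_w bytes per q_w bytes allocated.
        while since_last_increment >= q_w:
            events.append(("collect", c_w))
            since_last_increment -= q_w
    return events


# Steady allocation: collector increments are spread out.
steady = work_based_trace([100, 100, 100, 100], q_w=200, c_w=500)

# One large allocation: five collector increments run back to back.
burst = work_based_trace([1000], q_w=200, c_w=500)
```

In the burst case the mutator is held up for five increments in a row, which is the kind of variability that time-based scheduling is meant to avoid.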

[Figures: MMU (CPU utilization) vs. window size (s, log scale); space (MB) vs. time (s)]

Time-based Scheduling
  • Trigger collector to run for $C_T$ seconds
    • Whenever mutator runs for $Q_T$ seconds
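A minimal sketch of the alternation while a collection is in progress, assuming fixed quanta and integer milliseconds. The names `q_t` and `c_t` and the 10 ms values are hypothetical; with equal quanta the mutator gets half of the CPU regardless of how fast it allocates.

```python
def time_based_schedule(total_time, q_t, c_t):
    """Alternate mutator quanta of q_t and collector quanta of c_t."""
    schedule = []
    t = 0
    while t < total_time:
        end = min(t + q_t, total_time)
        schedule.append(("mutator", t, end))
        t = end
        if t >= total_time:
            break
        end = min(t + c_t, total_time)
        schedule.append(("collector", t, end))
        t = end
    return schedule


def utilization(schedule):
    """Fraction of the scheduled time given to the mutator."""
    mutator_time = sum(end - start for who, start, end in schedule
                       if who == "mutator")
    total = schedule[-1][2] - schedule[0][1]
    return mutator_time / total


schedule = time_based_schedule(total_time=100, q_t=10, c_t=10)
longest_pause = max(end - start for who, start, end in schedule
                    if who == "collector")
```

Here the longest collector pause equals `c_t` and the utilization is `q_t / (q_t + c_t)`, both independent of the mutator's allocation pattern.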

[Figures: MMU (CPU utilization) vs. window size (s, log scale); space (MB) vs. time (s)]

Parameterization

  • Mutator
    • Allocation rate: a*(ΔGC)
    • Maximum live memory: m
  • Collector
    • Collection rate: R
  • Tuner
    • Real-time interval: Δt
    • CPU utilization: u
    • Maximum used memory: s

Pause time distribution: javac

[Figure: two panels, Time-based Scheduling and Work-based Scheduling; each marked at 12 ms]

Utilization vs Time: javac

[Figure: two panels, Time-based Scheduling and Work-based Scheduling; x-axis: Time (s); y-axis: Utilization, 0.0 to 1.0; 0.45 marked in one panel]

Conclusions
  • The Metronome provides true real-time GC
    • First collector to do so without major sacrifice
      • Short pauses (4 ms)
      • High MMU during collection (50%)
      • Low memory consumption (2x max live)
  • Critical features
    • Time-based scheduling
    • Hybrid, mostly non-copying approach
    • Integration with compiler
Future Work
  • Two main goals:
    • Reduce pause time, memory requirement
    • Increase predictability
  • Pause time:
    • Expect sub-millisecond using current techniques
    • For 10’s of microseconds, need interrupt-based
  • Predictability
    • Studying parameterization of collector
    • Good research area