
Getting Real, Getting Dirty (without getting real dirty)



Presentation Transcript


  1. Getting Real, Getting Dirty (without getting real dirty) Funded by the National Science Foundation under grant 0081214 Funded by DARPA under contract F33615-00-C-1697 Ron K. Cytron Joint work with Krishna Kavi University of Alabama at Huntsville Dante Cannarozzi, Sharath Cholleti, Morgan Deters, Steve Donahue Mark Franklin, Matt Hampton, Michael Henrichs, Nicholas Leidenfrost, Jonathan Nye, Michael Plezbert, Conrad Warmbold Center for Distributed Object Computing Department of Computer Science Washington University April 2001

  2. Outline • Motivation • Allocation • Collection • Conclusion

  3. Traditional architecture and object-oriented programs • Caches are still biased toward Fortran-like behavior • The CPU is still responsible for storage management • Object-management activity invalidates caches • GC and compaction are disruptive [Diagram: CPU + cache, L2 cache, memory banks]

  4. An OO-biased design using IRAMs (with Krishna Kavi) • CPU and cache stay the same, off-the-shelf • Memory system redesigned to support OO programs [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  5. IRAM interface • A stable address for an object allows better cache behavior • The object can be relocated within the IRAM, but its address as seen by the CPU is constant [Diagram: CPU + cache, L2 cache; malloc request handled by IRAM logic, which returns addr]

  6. IRAM interface • Object referencing, tracked inside the IRAM, supports garbage collection [Diagram: CPU + cache, L2 cache; putfield/getfield handled by IRAM logic, which returns value]

  7. IRAM interface • Goal: relegate storage-management functions (gc, compact, prefetch) to the IRAM [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  8. Macro accesses • Observe: code sequences contain common gestures (superoperators), e.g. p.getLeft().getNext() compiles to *(*(p+12)+32) [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  9. Gesture abstraction • Goal: decrease traffic between CPU and storage • Name the gesture: M143(x): *(*(x+12)+32), so p.getLeft().getNext() = *(*(p+12)+32) becomes one macro [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  10. Gesture application • For p.getLeft().getNext(), the CPU issues Macro 143 (p) instead of two loads, where M143(x): *(*(x+12)+32) [Diagram: CPU + cache sends Macro 143 (p) to the IRAM]

  11. Gesture application • The IRAM evaluates p.getLeft().getNext() internally and returns the result, where M143(x): *(*(x+12)+32) [Diagram: IRAM logic executes p.getLeft().getNext() against its memory banks]

  12. Automatic prefetching • Goal: decrease traffic between CPU and storage [Diagram: CPU + cache issues Fetch p; IRAM logic returns p]

  13. Automatic prefetching • Goal: decrease traffic between CPU and storage [Diagram: on Fetch p, the IRAM also prefetches p.getLeft().getNext()]

  14. Challenges • Algorithmic • Bounded-time methods for allocation and collection • Good average performance as well • Architectural • Lean interface between the CPU and IRAM • Efficient realization

  15. Storage Allocation (Real Time) • Not necessarily fast • Necessarily predictable • Able to satisfy any reasonable request • Developer should know “maxlive” characteristics of the application • This is true for non-embedded systems as well

  16. How much storage? • curlive—the number of objects live at a point in time • curspace—the number of bytes live at a point in time [Diagram: handles pointing into object space]

  17. Objects concurrently live

  18. How much object space?

  19. Storage Allocation—Free List • Linked list of free blocks • Search for desired fit • Worst case O(n) for n blocks in the list

  20. Worst-case free-list behavior • The longer the free-list, the more pronounced the effect • No a priori bound on how much worse the list-based scheme could get • Average performance similar

  21. Knuth’s Buddy System • Free lists segregated by size (256, 128, 64, 32, 16, 8, 4, 2, 1) • All requests rounded up to a power of 2

  22. Knuth’s Buddy System (1) • Begin with one large block • Suppose we want a block of size 16 [Diagram: size-segregated free lists, 256 down to 1]

  23. Knuth’s Buddy System (2) • Recursively subdivide [Diagram: the 256 block split into two 128s]

  24. Knuth’s Buddy System (3) • Recursively subdivide [Diagram: a 128 block split into two 64s]

  25. Knuth’s Buddy System (4) • Recursively subdivide [Diagram: a 64 block split into two 32s]

  26. Knuth’s Buddy System (5) • Yield: 2 blocks of size 16 [Diagram: a 32 block split into two 16s]

  27. Knuth’s Buddy System (6) • One of those size-16 blocks can be given to the program [Diagram: size-segregated free lists, 256 down to 1]

  28. Worst-case free-list behavior • The longer the free-list, the more pronounced the effect • No a priori bound on how much worse the list-based scheme could get • Average performance similar

  29. Spec Benchmark Results

  30. Buddy System • If a block can be found, it can be found in O(log N) time, where N is the size of the heap • The application cannot make that worse

  31. Defragmentation • To keep up with the diversity of requested block sizes, an allocator may have to reorganize smaller blocks into larger ones

  32. Defragmentation—Free List • The free list permutes adjacent blocks • Storage becomes fragmented, with many small blocks and no large ones [Diagram: free list vs. blocks in memory]

  33. Defragmentation—Free List • The free list permutes adjacent blocks • Two issues: • Join adjacent blocks [Diagram: free list vs. blocks in memory]

  34. Defragmentation—Free List • Two issues: • Join adjacent blocks • Reorganize holes (move live storage) [Diagram: free list vs. blocks in memory]

  35. Defragmentation—Free List • Two issues: • Join adjacent blocks • Reorganize holes • Organization by address can help [Kavi] [Diagram: free list vs. blocks in memory]

  36. Buddies—joining adjacent blocks • The blocks resulting from subdivision are viewed as “buddies” • Their addresses differ by exactly one bit • The address of a block of size 2^n differs from its buddy’s address at bit n [Diagram: two addresses, …0… and …1…, differing at bit n]

  37. Knuth’s Buddy System (6) [Diagram: size-segregated free lists, 256 down to 1; one size-16 block in use]

  38. Knuth’s Buddy System (5) • When a block becomes free, it tries to rejoin its buddy • A bit in its buddy tells whether the buddy is free • If so, they glue together and make a block twice as big [Diagram: size-segregated free lists, 256 down to 1]

  39.–42. Knuth’s Buddy System (4)–(1) [animation: freed blocks glue with their buddies, level by level, back into one large block]

  43. Two problems • Oscillation—Buddy looks like it may split, glue, split, glue—isn’t this wasted effort? • Fragmentation—What happens when Buddy can’t glue but has space it would like to combine?

  44.–49. Buddy—oscillation [animation: a freed block glues with its buddy, only to be split apart again by the next allocation, over and over]

  50. Problem is lack of hysteresis • Some programs allocate objects which are almost immediately deallocated. • Continuous, incremental approaches to garbage collection only make this worse! • Oscillation is expensive: blocks are glued only to be quickly subdivided again
