
Getting Real, Getting Dirty (without getting real dirty)



Presentation Transcript


  1. Getting Real, Getting Dirty (without getting real dirty) Funded by the National Science Foundation under grant 0081214 Funded by DARPA under contract F33615-00-C-1697 Ron K. Cytron Joint work with Krishna Kavi University of Alabama at Huntsville Dante Cannarozzi, Sharath Cholleti, Morgan Deters, Steve Donahue Mark Franklin, Matt Hampton, Michael Henrichs, Nicholas Leidenfrost, Jonathan Nye, Michael Plezbert, Conrad Warmbold Center for Distributed Object Computing Department of Computer Science Washington University April 2001

  2. Outline • Motivation • Allocation • Collection • Conclusion

  3. Traditional architecture and object-oriented programs • Caches are still biased toward Fortran-like behavior • The CPU is still responsible for storage management • Object-management activity invalidates caches • GC and compaction are disruptive [Diagram: CPU + cache, L2 cache, memory banks]

  4. An OO-biased design using IRAMs (with Krishna Kavi) • CPU and cache stay the same, off-the-shelf • Memory system redesigned to support OO programs [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  5. IRAM interface • A stable address for an object allows better cache behavior • The object can be relocated within the IRAM, but its address as seen by the CPU is constant [Diagram: CPU + cache, L2 cache; malloc request handled by IRAM logic, which returns addr]

  6. IRAM interface • Object referencing, tracked inside the IRAM, supports garbage collection [Diagram: CPU + cache, L2 cache; putfield/getfield handled by IRAM logic, which returns value]

  7. IRAM interface • Goal: relegate storage-management functions (gc, compact, prefetch) to the IRAM [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  8. Macro accesses • Observe: code sequences contain common gestures (superoperators), e.g. p.getLeft().getNext() compiles to *(*(p+12)+32) [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  9. Gesture abstraction • Goal: decrease traffic between CPU and storage • Name the gesture: M143(x): *(*(x+12)+32), so p.getLeft().getNext() = *(*(p+12)+32) becomes one macro [Diagram: CPU + cache, L2 cache, IRAM logic, memory banks]

  10. Gesture application • For p.getLeft().getNext(), the CPU issues Macro 143 (p) instead of two loads, where M143(x): *(*(x+12)+32) [Diagram: CPU + cache sends Macro 143 (p) to the IRAM]

  11. Gesture application • The IRAM evaluates p.getLeft().getNext() internally and returns the result, where M143(x): *(*(x+12)+32) [Diagram: IRAM logic executes p.getLeft().getNext() against its memory banks]

  12. Automatic prefetching • Goal: decrease traffic between CPU and storage [Diagram: CPU + cache issues Fetch p; IRAM logic returns p]

  13. Automatic prefetching • Goal: decrease traffic between CPU and storage [Diagram: on Fetch p, the IRAM also prefetches p.getLeft().getNext()]

  14. Challenges • Algorithmic • Bounded-time methods for allocation and collection • Good average performance as well • Architectural • Lean interface between the CPU and IRAM • Efficient realization

  15. Storage Allocation (Real Time) • Not necessarily fast • Necessarily predictable • Able to satisfy any reasonable request • Developer should know “maxlive” characteristics of the application • This is true for non-embedded systems as well

  16. How much storage? • curlive—the number of objects live at a point in time • curspace—the number of bytes live at a point in time [Diagram: handles pointing into object space]

  17. Objects concurrently live

  18. How much object space?

  19. Storage Allocation—Free List • Linked list of free blocks • Search for desired fit • Worst case O(n) for n blocks in the list

  20. Worst-case free-list behavior • The longer the free-list, the more pronounced the effect • No a priori bound on how much worse the list-based scheme could get • Average performance similar

  21. Knuth’s Buddy System • Free lists segregated by size (256, 128, 64, 32, 16, 8, 4, 2, 1) • All requests rounded up to a power of 2

  22. Knuth’s Buddy System (1) • Begin with one large block • Suppose we want a block of size 16 [Diagram: size-segregated free lists, 256 down to 1]

  23. Knuth’s Buddy System (2) • Recursively subdivide [Diagram: the 256 block split into two 128s]

  24. Knuth’s Buddy System (3) • Recursively subdivide [Diagram: a 128 block split into two 64s]

  25. Knuth’s Buddy System (4) • Recursively subdivide [Diagram: a 64 block split into two 32s]

  26. Knuth’s Buddy System (5) • Yield: 2 blocks of size 16 [Diagram: a 32 block split into two 16s]

  27. Knuth’s Buddy System (6) • One of those size-16 blocks can be given to the program [Diagram: size-segregated free lists, 256 down to 1]

  28. Worst-case free-list behavior • The longer the free-list, the more pronounced the effect • No a priori bound on how much worse the list-based scheme could get • Average performance similar

  29. Spec Benchmark Results

  30. Buddy System • If a block can be found, it can be found in O(log N) time, where N is the size of the heap • The application cannot make that worse

  31. Defragmentation • To keep up with the diversity of requested block sizes, an allocator may have to reorganize smaller blocks into larger ones

  32. Defragmentation—Free List • The free list permutes adjacent blocks • Storage becomes fragmented, with many small blocks and no large ones [Diagram: free list vs. blocks in memory]

  33. Defragmentation—Free List • The free list permutes adjacent blocks • Two issues: • Join adjacent blocks [Diagram: free list vs. blocks in memory]

  34. Defragmentation—Free List • Two issues: • Join adjacent blocks • Reorganize holes (move live storage) [Diagram: free list vs. blocks in memory]

  35. Defragmentation—Free List • Two issues: • Join adjacent blocks • Reorganize holes • Organization by address can help [Kavi] [Diagram: free list vs. blocks in memory]

  36. Buddies—joining adjacent blocks • The blocks resulting from subdivision are viewed as “buddies” • Their addresses differ by exactly one bit • The address of a block of size 2^n differs from its buddy’s address at bit n [Diagram: two addresses, …0… and …1…, differing at bit n]

  37. Knuth’s Buddy System (6) [Diagram: size-segregated free lists, 256 down to 1; one size-16 block in use]

  38. Knuth’s Buddy System (5) • When a block becomes free, it tries to rejoin its buddy • A bit in its buddy tells whether the buddy is free • If so, they glue together and make a block twice as big [Diagram: size-segregated free lists, 256 down to 1]

  39.–42. Knuth’s Buddy System (4)–(1) [animation: freed blocks glue with their buddies, level by level, back into one large block]

  43. Two problems • Oscillation—Buddy looks like it may split, glue, split, glue—isn’t this wasted effort? • Fragmentation—What happens when Buddy can’t glue but has space it would like to combine?

  44.–49. Buddy—oscillation [animation: a freed block glues with its buddy, only to be split apart again by the next allocation, over and over]

  50. Problem is lack of hysteresis • Some programs allocate objects which are almost immediately deallocated. • Continuous, incremental approaches to garbage collection only make this worse! • Oscillation is expensive: blocks are glued only to be quickly subdivided again
