
Smart Memory for Smart Phones

Chris Clack

University College London

clack@cs.ucl.ac.uk

Outline
  • Target Architecture
  • Problems
  • Focus on Fragmentation
    • Results from UT
  • A fast allocator (not embedded)
    • Doug Lea’s Allocator
  • Can We Do Better?
  • Overheads
  • Results
Target Architecture
  • Small hand-held integrated phone/PDA devices
  • Soft real-time, “open box”, constrained applications heap
  • Competition pressure for more, more flexible, and better (larger) applications
Problems (1)
[Figure: heap of live and free blocks up to TOP, with a free fragment between live blocks]
To compact: copy when nearly full
  • Memory overhead
  • Compaction delay
Problems (2)
[Figure: heap of live and free blocks up to TOP]
To compact: do sliding compaction when nearly full
  • Compaction delay
Problems (3)
[Figure: live and free blocks, with the free blocks linked on a FREE LIST]
To compact: do sliding compaction when allocation fails
  • Compaction delay
Focus on Fragmentation
  • What happens in real programs?
  • Great paper by Mark Johnstone and Paul Wilson (UT):
    • “The Memory Fragmentation Problem: Solved?”, M. Johnstone & P. Wilson, 1997
  • Fragmentation experiments using real programs running on real data
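[Figure: a program's memory usage over time, marking the max live memory at any time, the max Kb used at any time, and the average lifetime of an allocated byte; labels #3 and #4 mark two measures of fragmentation]
Measure of fragmentation, e.g.: %frag #4 = (value_at_3 - value_at_2) * 100 / value_at_2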
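The %frag formula is plain percentage arithmetic. Here is a minimal Python sketch. It assumes, following Johnstone & Wilson's measure #4, that value_at_3 is the peak memory used by the allocator and value_at_2 is the peak live memory. The function name and the sample figures are hypothetical.

```python
def frag_percent(value_at_3: float, value_at_2: float) -> float:
    """%frag #4: memory used beyond value_at_2, as a percentage of value_at_2.

    Assumption: value_at_3 = peak memory used by the allocator,
    value_at_2 = peak live memory requested by the program.
    """
    if value_at_2 <= 0:
        raise ValueError("value_at_2 must be positive")
    return (value_at_3 - value_at_2) * 100 / value_at_2

# Hypothetical figures: 1050 Kb used by the allocator for 1000 Kb of live data.
print(frag_percent(1050, 1000))  # 5.0
```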
Johnstone & Wilson’s conclusion
  • The best free-list management policy, in terms of fragmentation behaviour on real programs, is BEST-FIT
    • (Knuth notwithstanding)
A Fast Best-Fit Allocator
  • IMPLICATION: use Best-fit allocation and we (maybe?) won’t ever need to compact
    • At least, compaction delays will be minimized
  • BUT: best-fit allocation is S-L-O-W
    • Worst case: have to scan the entire free list
    • Let’s look at a widely used best-fit allocator: Doug Lea’s malloc
    • (arguably) the fastest best-fit allocator
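To show why naive best fit is slow, here is a minimal Python sketch of best fit over one unordered free list. It is a hypothetical simplification that tracks block sizes only, with no headers and no splitting. In the worst case every free block must be examined.

```python
from typing import List, Optional

def best_fit(free_sizes: List[int], request: int) -> Optional[int]:
    """Index of the smallest free block that can hold `request`, or None.

    Worst case: O(n), because the whole free list may have to be scanned.
    """
    best: Optional[int] = None
    for i, size in enumerate(free_sizes):
        if size < request:
            continue  # too small to satisfy the request
        if best is None or size < free_sizes[best]:
            best = i
            if size == request:
                break  # exact fit: nothing can be better
    return best

free_list = [8, 3, 5, 12]
print(best_fit(free_list, 4))   # 2 (the 5-word block)
print(best_fit(free_list, 13))  # None
```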
[Figure: block layout in Doug Lea’s allocator, with a boundary tag at each block, used for coalescing]

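[Figure: Doug Lea’s bins. Small sizes use exact-fit bins; larger sizes use fixed-width bins whose blocks are sorted by size, which costs time to sort. W is the wilderness block. Worst case: all free blocks fall in one bin, which reduces to an O(n) search.]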

Can we do better?
  • Support boundary tags and coalescing
  • Simple Idea (1) (of 4):
    • The probability that fragmentation triggers compaction depends on the RANGE of allocatable block sizes
      • Allocating a very large block is more likely to fail because of fragmentation
      • Very small free blocks create fragments
    • (NB if all blocks are the same size, fragmentation is zero!)
  • Restrict the range of allocatable sizes and create an exact-fit table, with one bin per size from lb to ub:
    • No need to sort
    • Worst case: O(n) search for the next highest occupied bin
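A minimal Python sketch of the exact-fit table. It assumes one chain per size, with a Python list standing in for a linked chain and integer block handles standing in for addresses. Splitting a larger block is omitted. The class and method names are hypothetical. Freed blocks go on the head of their size's chain, and a request that misses its own bin scans upward bin by bin, which is the O(n) worst case.

```python
from typing import List, Optional

class ExactFitTable:
    """One chain of free blocks per size in [lb, ub]; blocks are never sorted."""

    def __init__(self, lb: int, ub: int) -> None:
        self.lb = lb
        # bins[i] is the chain for size lb + i (a list stands in for a linked chain)
        self.bins: List[List[int]] = [[] for _ in range(ub - lb + 1)]

    def free(self, block: int, size: int) -> None:
        self.bins[size - self.lb].append(block)  # push at the chain head (LIFO)

    def find(self, size: int) -> Optional[int]:
        """First non-empty bin holding blocks >= size; may scan every higher bin."""
        for i in range(size - self.lb, len(self.bins)):
            if self.bins[i]:
                return i
        return None

    def malloc(self, size: int) -> Optional[int]:
        i = self.find(size)
        if i is None:
            return None
        return self.bins[i].pop()  # most recently freed block of that size

table = ExactFitTable(lb=2, ub=33)
table.free(100, 5); table.free(101, 5); table.free(102, 20)
print(table.malloc(5))  # 101 (LIFO)
print(table.malloc(6))  # 102 (next occupied bin up holds size 20)
```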

[Figure: the exact-fit table (bins lb to ub) with an occupancy bitmap, one bit per bin]

  • Old idea
    • Use an occupancy bitmap
    • If (ub-lb) = 31, i.e. 32 bins, the bitmap is just one word
    • To search/allocate: read bitmap; AND with mask; find highest set bit; maybe modify bit and write

Example occupancy bitmap: 00110000000000000000000000000101
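The one-word search can be sketched in Python as follows. This assumes bin lb maps to the most significant bit, which matches the left-to-right layout of the example bitmap. Under that assumption the slide's "find highest set bit" step returns the smallest occupied bin at or above the request. The helper names are hypothetical.

```python
from typing import Optional

WORD_BITS = 32  # (ub - lb) = 31 -> 32 bins -> one 32-bit word

def bit_for(i: int) -> int:
    """Assumption: bin lb+i lives at bit (31 - i), so bin lb is the highest bit."""
    return 1 << (WORD_BITS - 1 - i)

def find_bin(bitmap: int, i: int) -> Optional[int]:
    """Smallest occupied bin index >= i, or None."""
    mask = (1 << (WORD_BITS - i)) - 1       # keep only bins i .. 31
    hits = bitmap & mask                    # read bitmap; AND with mask
    if hits == 0:
        return None
    return WORD_BITS - hits.bit_length()    # highest set bit -> bin index

def mark(bitmap: int, i: int, occupied: bool) -> int:
    """Modify one bin's bit, e.g. when its chain becomes empty or non-empty."""
    return (bitmap | bit_for(i)) if occupied else (bitmap & ~bit_for(i))

bm = int("00110000000000000000000000000101", 2)
print(find_bin(bm, 4))  # 29
```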

Problem
  • What if range is very large?
  • E.g. Nikhil wants to allocate blocks that vary from 2 words to 2^12 words
    • 2^12 different block sizes
      • Worst case = linear search of 128 bitmap words (128 reads + …)
  • Two solutions:
    • Use more efficient bitmapping
    • Use unconstrained hybrid scheme (see later)
More efficient bitmapping
  • Simple Idea (2)
  • Use a bitmap tree:
    • Requires 128 + 4 + 1 words
    • Requires worst case 5 reads, 3 tests for zero, 3 masks, 3 finds of greatest set bit, 3 modify&writes
  • Generally: O(log32 ((ub-lb)/32))
    • (Depends what you are counting … but it is fast!)
    • Ten times faster than any other scheme we know of
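A minimal Python sketch of a bitmap tree, assuming this layout: bit b of word w at level j+1 is set if and only if word 32*w+b at level j is non-zero. The sketch numbers bits from the least-significant end and looks for the lowest set bit. The slides find the highest set bit, which is the same idea with the opposite bit order. For 2^12 bins the levels come out at 128, 4 and 1 words. A worst-case search reads levels 0, 1 and 2, then levels 1 and 0 again, so it touches five words. The class and method names are hypothetical.

```python
from typing import List, Optional

WORD = 32

def lowest_set_bit(x: int) -> int:
    return (x & -x).bit_length() - 1

class BitmapTree:
    """Occupancy bitmap for nbins exact-fit bins, plus summary levels above it."""

    def __init__(self, nbins: int) -> None:
        if nbins < 1:
            raise ValueError("need at least one bin")
        self.levels: List[List[int]] = []
        n = nbins
        while True:
            words = (n + WORD - 1) // WORD
            self.levels.append([0] * words)
            if words == 1:
                break
            n = words

    def set(self, i: int) -> None:
        """Mark bin i non-empty, setting summary bits up the tree as needed."""
        for level in self.levels:
            w, b = divmod(i, WORD)
            was_empty = level[w] == 0
            level[w] |= 1 << b
            if not was_empty:
                break  # parent bit is already set
            i = w

    def clear(self, i: int) -> None:
        """Mark bin i empty, clearing summary bits while words become zero."""
        for level in self.levels:
            w, b = divmod(i, WORD)
            level[w] &= ~(1 << b)
            if level[w] != 0:
                break  # word still has other occupied entries
            i = w

    def find(self, k: int, j: int = 0) -> Optional[int]:
        """Smallest set index >= k at level j (level 0 = bins), or None."""
        level = self.levels[j]
        w, b = divmod(k, WORD)
        if w >= len(level):
            return None
        hits = level[w] & ~((1 << b) - 1)  # mask off indices below k
        if hits:
            return w * WORD + lowest_set_bit(hits)
        if j + 1 == len(self.levels):
            return None
        # Ask the level above for the next non-empty word, then descend into it.
        nxt = self.find(w + 1, j + 1)
        if nxt is None:
            return None
        return nxt * WORD + lowest_set_bit(level[nxt])

tree = BitmapTree(4096)                # 2^12 bins
print([len(l) for l in tree.levels])   # [128, 4, 1]
tree.set(5); tree.set(3000)
print(tree.find(6))                    # 3000
```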
LIFO/FIFO?
  • Simple Idea (3)
    • Although J&W found no difference between LIFO, FIFO and address-ordered (AO) best fit, this might be different for embedded apps
    • So far, we can only do LIFO
    • We can achieve FIFO if we double-link ALL free blocks into one big chain
      • Drawback – now free takes as long as malloc (but still O(log32 ((ub-lb)/32)))
[Figure: the bitmap tree above the exact-fit chains lb to ub]
  • Freed blocks are placed at the heads of chains
  • If the requested size is not available, for LIFO: search the bitmap tree to the right
  • Or, for FIFO: search the bitmap tree to the left, then follow the link to the next highest free block

  • Simple Idea (4)
    • We can also trivially support worst-fit by adding a pointer that always refers to the biggest block
    • And this is where we put our wilderness block!
    • We have no data on fragmentation behaviour of worst-fit
      • If it turns out to be similar to best fit, it would be preferable because we would have O(1) alloc and O(log32 ((ub-lb)/32)) free.
[Figure: the bitmap tree over the exact-fit chains lb to ub, plus a max pointer referring to the wilderness block W]

Overheads
  • Dynamic per-block overhead
    • Depends on (ub-lb) – can be very small
    • Example (total 32 bits per live block):
      • 16 bit signed int for size and availability of current block
      • 16 bit signed int for size and availability of previous block
        • Could optimize for live block overhead: 1 bit in header + free blocks also hold size at end of block
        • But, if 4-byte aligned and ANY overhead per block, can’t do better than this!
    • Free blocks additionally need to hold two pointers
      • minimum block size = header + 2 pointers
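The 32-bit header can be sketched in Python with the standard struct module. Several details are assumptions that the slides do not state: sizes are in words, a negative value marks a free block, and pointers are 4 bytes. All helper names are hypothetical. Because each header also records the previous block's size and status, free() can step back to the previous block and check whether it is free, which is what coalescing needs.

```python
import struct
from typing import Tuple

# Two signed 16-bit fields: this block, then the previous block.
HEADER = struct.Struct("<hh")
POINTER_BYTES = 4  # assumption: 32-bit target
MIN_BLOCK_BYTES = HEADER.size + 2 * POINTER_BYTES  # header + 2 free-list links

def encode(size: int, free: bool) -> int:
    """Assumption: negative means free, so sizes must be 1..32767."""
    if not 0 < size <= 0x7FFF:
        raise ValueError("size must fit in a positive 16-bit signed int")
    return -size if free else size

def pack_header(size: int, free: bool, prev_size: int, prev_free: bool) -> bytes:
    return HEADER.pack(encode(size, free), encode(prev_size, prev_free))

def unpack_header(raw: bytes) -> Tuple[int, bool, int, bool]:
    cur, prev = HEADER.unpack(raw)
    return abs(cur), cur < 0, abs(prev), prev < 0

hdr = pack_header(10, False, 6, True)
print(len(hdr), MIN_BLOCK_BYTES)  # 4 12
print(unpack_header(hdr))         # (10, False, 6, True)
```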
  • Static overheads
    • Code
    • A few registers (e.g. max)
    • Data structures:
      • Bitmap tree: 133 words (128 + 4 + 1 for 2^12 block sizes)
      • Table: (ub-lb) words
    • NOTE
      • If (ub-lb) equals the heap size, the table is as large as the heap! (the same overhead as semi-space copying)
      • So we don’t want to use this scheme for large size ranges!!! Instead, use a hybrid
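The static cost can be checked with a little arithmetic. The Python sketch below assumes 4-byte words and one table word (a chain head) per bin, which is ub-lb+1 words rather than exactly ub-lb. The function names are hypothetical.

```python
WORD_BYTES = 4  # assumption: 32-bit words

def bitmap_tree_words(nbins: int) -> int:
    """Words needed for a bitmap tree over nbins bins, 32 bits per word."""
    if nbins < 1:
        raise ValueError("need at least one bin")
    total, n = 0, nbins
    while True:
        words = -(-n // 32)  # ceiling division
        total += words
        if words == 1:
            return total
        n = words

def static_overhead_bytes(nbins: int) -> int:
    """Bitmap tree plus one chain-head word per bin."""
    return (bitmap_tree_words(nbins) + nbins) * WORD_BYTES

print(bitmap_tree_words(4096))      # 133
print(static_overhead_bytes(4096))  # 16916
```

The table term grows linearly with the size range, while the tree stays small. This is why a very wide range pushes toward the hybrid scheme.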
Hybrid scheme
  • Most used range of block sizes:
    • Use the bitmap tree and exact-fit bins as described
  • Bigger block sizes:
    • These are all kept on the doubly linked chain above the biggest exact-fit block
    • Can use fixed-width bins like Lea’s, together with a separate bitmap tree
    • We lose the worst-case property of the primary scheme
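Here is a minimal Python sketch of how a hybrid allocator might route a request. The parameters LB, UB and BIN_WIDTH are hypothetical values chosen for illustration. Sizes in the primary range get one exact-fit bin each. Larger sizes share fixed-width bins, so a single bin can hold blocks of different sizes that must be searched, which is where the worst-case guarantee is lost.

```python
from typing import Tuple

# Hypothetical parameters, for illustration only.
LB, UB = 2, 4097   # exact-fit range: 4096 sizes, in words
BIN_WIDTH = 512    # width of each fixed-width bin for larger sizes

def classify(size: int) -> Tuple[str, int]:
    """Route a request to an exact-fit bin or to a fixed-width large-block bin."""
    if size < LB:
        raise ValueError("below minimum block size")
    if size <= UB:
        return ("exact", size - LB)                  # one bin per size
    return ("fixed", (size - UB - 1) // BIN_WIDTH)   # many sizes share a bin

print(classify(100))   # ('exact', 98)
print(classify(5000))  # ('fixed', 1)
```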
RESULTS
  • Re-run Johnstone and Wilson’s tests, using our allocator on their trace files
Test 1
[Figure: memory required by gmalloc, memory required by the new allocator, and memory requested by the program]
  • Memory requirement halved!
  • Roughly 5% fragmentation?

Test 2
[Figure: memory required by gmalloc, memory required by the new allocator, and memory requested by the program]

Test 3
[Figure: memory required by gmalloc, memory required by the new allocator, and memory requested by the program]

Test 4
[Figure: memory required by gmalloc, memory required by the new allocator, and memory requested by the program]
  • Memory requirements consistently halved!
  • Fragmentation consistently ~5% (?)

Status
  • Currently working with Symbian to conduct malloc-replacement trials using real smartphone applications