
Smart Memory for Smart Phones





Presentation Transcript


  1. Smart Memory for Smart Phones Chris Clack University College London clack@cs.ucl.ac.uk

  2. Outline • Target Architecture • Problems • Focus on Fragmentation • Results from UT • A fast allocator (not embedded) • Doug Lea’s Allocator • Can We Do Better? • Overheads • Results

  3. Target Architecture • Small hand-held integrated phone/PDA devices • Soft real-time, “open box”, constrained applications heap • Competition pressure for more, more flexible, and better (larger) applications

  4. Problems (1) • [Diagram: semi-space heap with live and free regions, a TOP pointer, and a free fragment] • To compact: copy when nearly full • Memory overhead • Compaction delay

  5. Problems (2) • [Diagram: heap with live and free regions and a TOP pointer] • To compact: do sliding compaction when nearly full • Compaction delay

  6. Problems (3) • [Diagram: heap with live blocks and free blocks linked on a FREE LIST] • To compact: do sliding compaction when allocation fails • Compaction delay

  7. Focus on Fragmentation • What happens in real programs? • Great paper by Mark Johnstone and Paul Wilson (UT): • “The Memory Fragmentation Problem: Solved?”, M.Johnstone & P.Wilson, 1997 • Fragmentation experiments using real programs running on real data

  8. [Table: statistics for the test programs – max live data at any time, max Kb at any time, average lifetime of an allocated byte]

  9. RESULTS • No difference within experimental error

  10. MEASURE OF FRAGMENTATION • [Graph with points #3 and #4 marked] • e.g. %frag #4 = (value_at_3 – value_at_2) * 100 / value_at_2
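The formula on the slide can be sketched as plain arithmetic. The interpretation below is an assumption (the slide only labels graph points): value_at_2 is taken as the memory the program actually requested at that point, and value_at_3 as the memory the allocator consumed, so the measure is the percentage overhead.

```c
#include <assert.h>

/* Hedged sketch of the slide's fragmentation measure #4:
 * percentage by which allocator memory use (point #3) exceeds
 * program-requested memory (point #2). The meaning of the two
 * points is an assumed interpretation, not stated on the slide. */
static double frag_percent(double value_at_3, double value_at_2) {
    return (value_at_3 - value_at_2) * 100.0 / value_at_2;
}
```

For example, an allocator using 105 Kb to satisfy 100 Kb of requests would score 5% fragmentation, matching the "roughly 5%" figure reported in the results slides.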

  11. No difference within experimental error

  12. Johnstone & Wilson’s conclusion • The best free-list management policy • in terms of fragmentation behaviour • on real programs is BEST-FIT • (Knuth notwithstanding)

  13. A Fast Best-Fit Allocator • IMPLICATION: use Best-fit allocation and we (maybe?) won’t ever need to compact • At least, compaction delays will be minimized • BUT: best-fit allocation is S-L-O-W • Worst-case: have to scan the entire free list • Let’s look at a widely-used best-fit allocator: Doug Lea’s malloc • (arguably) the fastest best-fit allocator

  14. [Diagram: heap blocks carrying boundary tags at each end – used for coalescing]
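A minimal sketch of why boundary tags make coalescing O(1). Everything here is an illustrative assumption, not the slide's exact layout: each block stores a tag (size in bytes, low bit = "in use") at both its head and its foot, so free() can read the previous block's footer directly and merge adjacent free blocks without any search. A block at the very start of the heap is assumed to have a sentinel neighbour so hdr[-1] is always valid.

```c
#include <assert.h>
#include <stdint.h>

#define IN_USE 1u   /* low bit of a tag flags "allocated" */

/* Tags pack an even byte size with the in-use bit. */
static uint32_t tag_make(uint32_t size, int in_use) {
    return size | (in_use ? IN_USE : 0);
}
static uint32_t tag_size(uint32_t tag) { return tag & ~IN_USE; }
static int      tag_used(uint32_t tag) { return tag & IN_USE; }

/* The footer of the previous physical block sits just before this
 * block's header, so the previous header is one subtraction away. */
static uint32_t *prev_header(uint32_t *hdr) {
    uint32_t prev_size = tag_size(hdr[-1]);
    return (uint32_t *)((char *)hdr - prev_size);
}

/* Merge a just-freed block with a free previous neighbour;
 * returns the header of the (possibly merged) block. */
static uint32_t *coalesce_prev(uint32_t *hdr) {
    uint32_t *p = prev_header(hdr);
    if (!tag_used(*p)) {
        uint32_t merged = tag_size(*p) + tag_size(*hdr);
        *p = tag_make(merged, 0);                       /* new header */
        uint32_t *foot = (uint32_t *)((char *)p + merged) - 1;
        *foot = tag_make(merged, 0);                    /* new footer */
        return p;
    }
    return hdr;
}
```

The same footer read works symmetrically for the next physical block, which is the point of paying for a tag at both ends.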

  15. [Diagram of Doug Lea's allocator: exact-fit bins plus fixed-width bins sorted by size, with wilderness block W] • Sorting costs time • Worst case: all free blocks in one bin – reduces to O(n) search

  16. Can we do better? • Support boundary tags and coalescing • Simple Idea (1) (of 4): • Probability of fragmentation triggering compaction depends on RANGE of allocatable block sizes • Very large block alloc more likely to fail due to frags • Very small free blocks create frags • (NB if all blocks same size, fragmentation is zero!)

  17. Restrict range of allocatable sizes and create an exact-fit table: [bins lb, lb+1, lb+2, lb+3, …, ub-2, ub-1, ub] • No need to sort • Worst case: O(n) search for next highest occupied bin

  18. [Exact-fit table lb … ub with occupancy bitmap 00110000000000000000000000000101] • Old idea • Use an occupancy bitmap • If (ub-lb) = 31, the bitmap is just one word • To search/allocate: read bitmap; AND with mask; find highest set bit; maybe modify the bit and write it back
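The search step above can be sketched in a few lines of C. One assumption is needed about bit layout: matching the slide's "find highest set bit" (and "search to the right" on a later slide), bin i is placed at bit (31 - i), so smaller sizes sit in higher bits and the best (smallest adequate) bin is the highest set bit after masking off the bins that are too small.

```c
#include <assert.h>
#include <stdint.h>

/* One-word occupancy bitmap over 32 exact-fit bins (sizes lb..lb+31).
 * Assumed convention: bin i occupies bit (31 - i). Returns the index
 * of the smallest occupied bin >= want, or -1 if none fits. */
static int find_bin(uint32_t bitmap, int want) {
    uint32_t mask = 0xFFFFFFFFu >> want;   /* keep bins >= want      */
    uint32_t hits = bitmap & mask;         /* AND with mask          */
    if (hits == 0) return -1;              /* nothing big enough     */
    int bit = 31;                          /* find highest set bit   */
    while (!(hits & (1u << bit))) bit--;
    return 31 - bit;                       /* back to a bin index    */
}
```

A real implementation would use a hardware priority-encode instruction (or a compiler built-in) for the find-highest-set-bit step rather than a loop; the loop keeps the sketch portable.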

  19. Problem • What if range is very large? • E.g. Nikhil wants to allocate blocks that vary from 2 words to 2^12 words • 2^12 different block sizes • Worst case = linear search of 128 bitmap words (128 reads + …) • Two solutions: • Use more efficient bitmapping • Use unconstrained hybrid scheme (see later)

  20. More efficient bitmapping • Simple Idea (2) • Use a bitmap tree: • Requires 128 + 4 + 1 words • Requires worst case 5 reads, 3 tests for zero, 3 masks, 3 finds of greatest set bit, 3 modify&writes • Generally: O(log32 ((ub-lb)/32)) • (Depends what you are counting … but it is fast!) • Ten times faster than any other scheme we know
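The idea can be illustrated with a two-level version of the tree (the slides describe a three-level tree of 128 + 4 + 1 words for 2^12 bins; the search principle is identical, just one more level). Everything below is a sketch under assumed conventions: 32 leaf words cover 1024 bins, a root word summarizes which leaf words are non-empty, and the lowest bit is the smallest bin.

```c
#include <assert.h>
#include <stdint.h>

/* Two-level bitmap tree over 1024 exact-fit bins:
 * root bit i is set iff leaf[i] has any bin set. */
typedef struct {
    uint32_t root;
    uint32_t leaf[32];
} BitmapTree;

static void tree_set(BitmapTree *t, int bin) {
    t->leaf[bin / 32] |= 1u << (bin % 32);
    t->root           |= 1u << (bin / 32);
}

static void tree_clear(BitmapTree *t, int bin) {
    t->leaf[bin / 32] &= ~(1u << (bin % 32));
    if (t->leaf[bin / 32] == 0)
        t->root &= ~(1u << (bin / 32));   /* keep summary consistent */
}

/* Find the lowest occupied bin >= want, or -1: two masked word
 * scans instead of a linear walk over all 32 leaf words. */
static int tree_find(const BitmapTree *t, int want) {
    int w = want / 32;
    /* try the leaf word containing 'want', masking smaller bins */
    uint32_t hits = t->leaf[w] & (0xFFFFFFFFu << (want % 32));
    if (hits == 0) {
        /* consult the root: is any later leaf word occupied? */
        uint32_t up = (w == 31) ? 0 : (t->root & (0xFFFFFFFFu << (w + 1)));
        if (up == 0) return -1;
        w = 0; while (!(up & (1u << w))) w++;   /* lowest set bit */
        hits = t->leaf[w];
    }
    int b = 0; while (!(hits & (1u << b))) b++;
    return w * 32 + b;
}
```

This is the source of the O(log32((ub-lb)/32)) bound: each level costs one read, one mask, and one find-set-bit, and the tree depth grows only with the 32-logarithm of the bin count.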

  21. LIFO/FIFO? • Simple Idea (3) • Although J&W found no difference between LIFO/FIFO/AO best fit, this might be different for embedded apps • So far, we can only do LIFO • We can achieve FIFO if we double-link ALL free blocks into one big chain • Drawback – now free takes as long as malloc (but still O(log32 ((ub-lb)/32)))

  22. [Diagram: bitmap tree over exact-fit bins lb … ub; freed blocks placed at heads of chains] • If requested size not available, for LIFO: search bitmap tree to the right • Or for FIFO: search bitmap tree to the left, then follow link to next highest free block

  23. Simple Idea (4) • We can trivially also support Worst-fit by adding a pointer that always refers to the biggest block • And this is where we put our wilderness block! • We have no data on fragmentation behaviour of worst-fit • If it turns out to be similar to best fit, it would be preferable because we would have O(1) alloc and O(log32 ((ub-lb)/32)) free.

  24. [Diagram: bitmap tree over bins lb … ub, with a max pointer to the biggest block and wilderness block W]

  25. Overheads • Dynamic per-block overhead • Depends on (ub-lb) – can be very small • Example (total 32 bits per live block): • 16 bit signed int for size and availability of current block • 16 bit signed int for size and availability of previous block • Could optimize for live block overhead: 1 bit in header + free blocks also hold size at end of block • But, if 4-byte aligned and ANY overhead per block, can’t do better than this! • Free blocks additionally need to hold two pointers • minimum block size = header + 2 pointers
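The 32-bit per-block overhead described above can be sketched concretely. The encoding is an assumption for illustration: each 16-bit signed field carries a block's size in words in its magnitude and its availability in its sign (negative = free), so a single load yields both facts for this block and for its physical predecessor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit block header matching the slide's budget:
 * one 16-bit signed int for the current block, one for the
 * previous physical block. Sign = availability, |value| = size. */
typedef struct {
    int16_t prev;   /* size of previous block; negative if free */
    int16_t self;   /* size of this block;     negative if free */
} BlockHeader;

static int16_t encode(uint16_t size_words, int is_free) {
    return is_free ? (int16_t)-(int16_t)size_words
                   : (int16_t)size_words;
}
static uint16_t hdr_size(int16_t f) { return (uint16_t)(f < 0 ? -f : f); }
static int      hdr_free(int16_t f) { return f < 0; }
```

With 16-bit sizes the scheme caps blocks at 2^15 - 1 words, which fits the talk's premise of a restricted range of allocatable sizes on a constrained handset heap.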

  26. Static overheads • Code • A few registers (e.g. max) • Data structures: • Bitmap tree: 133 words • Table: (ub-lb) words • NOTE: if (ub-lb) spans the whole heap, the table is as large as the heap! (same overhead as semi-space) • So we don’t want to use this scheme for large size ranges!!! – instead use a hybrid

  27. Hybrid scheme • Most used range of block sizes: • Use the bitmap tree and exact-fit bins as described • Bigger block sizes: • These are all kept on the double-linked chain above the biggest exact-fit block. • Can use fixed-width bins like Lea, together with a separate bitmap tree, • We lose the worst-case property of the primary scheme

  28. RESULTS • Re-run Johnstone and Wilson’s tests, using our allocator on their trace files

  29. Test 1 • [Graph: memory required by gmalloc vs. memory required by new allocator vs. memory requested by the program] • Memory requirement halved! • Roughly 5% fragmentation?

  30. Test 2 • [Graph: memory required by gmalloc vs. memory required by new allocator vs. memory requested by the program]

  31. Test 3 • [Graph: memory required by gmalloc vs. memory required by new allocator vs. memory requested by the program]

  32. Test 4 • [Graph: memory required by gmalloc vs. memory required by new allocator vs. memory requested by the program] • Memory requirements consistently halved! • Fragmentation consistently ~5% (?)

  33. Status • Currently working with Symbian to conduct malloc-replacement trials using real smartphone applications
