Hardware support for dynamic memory management



Hardware Support for Dynamic Memory Management

J. Morris Chang, Witawas Srisa-an, Chia-Tien Dan Lo (Illinois Institute of Technology)

Edward F. Gehringer (North Carolina State University)


The Problem

  • Object-oriented (o-o) applications make frequent requests for dynamic memory.

    • C++ programs make an order of magnitude more requests than C programs.

  • Most objects are abandoned quickly.

  • As a result, much time is spent on memory management.

    • Up to 30% in C programs ...

  • Garbage collection has been optimized, but it still takes time.


Hardware-Implemented Allocation

  • Makes use of an allocation vector (A-vector) and a bit-flipper.

Example: a request for two blocks.

address                          0 1 2 3 4 5 6 7
A-vector before the allocation   1 0 1 1 0 0 1 1
A-vector after the allocation    1 0 1 1 1 1 1 1

(a) Combinational logic (the complete binary tree) determines that there is enough free memory to fill the request for two blocks.

(b) The address of the free block is 100 (binary).

(c) The bits at 100 and 101 (binary) are flipped.
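The example above can be replayed in software. The following is a minimal sketch, not the authors' hardware: the function name `allocate` is hypothetical, and a sequential loop stands in for the combinational search that the slides attribute to the complete binary tree.

```python
def allocate(a_vector, n_blocks):
    """Find the first run of n_blocks free (0) bits, flip them to 1, return its address.

    Software stand-in only: the hardware locates the run combinationally.
    """
    for addr in range(len(a_vector) - n_blocks + 1):
        if not any(a_vector[addr:addr + n_blocks]):
            for i in range(addr, addr + n_blocks):
                a_vector[i] = 1  # the "bit-flipper" step
            return addr
    return None  # no free run of the requested length


# A-vector from the slide, before the allocation.
a_vector = [1, 0, 1, 1, 0, 0, 1, 1]
addr = allocate(a_vector, 2)
print(format(addr, "03b"), a_vector)
```

With the slide's A-vector, the request for two blocks lands at binary address 100 and the bits at 100 and 101 are set, matching steps (b) and (c).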


The Complete Binary Tree

  • A binary tree of bits is used to locate the first free region combinationally.

Level 0 (size 2^4):  1
Level 1 (size 2^3):  1 1
Level 2 (size 2^2):  1 0 1 0
Level 3 (size 2^1):  1 1 0 0 0 1 0 0
Level 4 (size 2^0):  1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0
A-vector address:    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
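In the values shown, every node equals the OR of its two children, so a 0 at a node would mean that the whole subtree below it is free. That reading is inferred from the numbers on the slide rather than stated there. Under that assumption, a rough software sketch of the tree and of a lookup for a free chunk of a power-of-two size might look like this (the function names are hypothetical, and the loop replaces what the hardware does combinationally):

```python
def build_or_tree(leaves):
    """Build the levels of the tree, root first; each node is the OR of its children."""
    levels = [list(leaves)]
    while len(levels[0]) > 1:
        below = levels[0]
        levels.insert(0, [below[i] | below[i + 1] for i in range(0, len(below), 2)])
    return levels


def first_free_chunk(levels, size):
    """Return the address of the first free aligned chunk of `size` blocks.

    `size` must be a power of two no larger than the number of leaves.
    """
    depth = len(levels) - 1                  # level number of the leaves
    level = depth - (size.bit_length() - 1)  # level whose nodes cover `size` blocks
    for index, bit in enumerate(levels[level]):
        if bit == 0:                         # 0 means nothing below is allocated
            return index * size
    return None


# Level 4 (the A-vector) from the slide.
a_vector = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
levels = build_or_tree(a_vector)
for level in levels:
    print(level)
print(first_free_chunk(levels, 4))
```

The rebuilt levels agree with the slide, and the first free chunk of four blocks starts at address 4.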


Keeping Track of Object Size

  • Meanwhile, the size bit-vector (S-vector) records the boundaries between objects.

[Figure: block diagram linking the Complete Binary Tree, the allocation bit-vector (A-vector), the S-Unit (size encoder), and the size bit-vector (S-vector).]


Five Hardware-Implemented Instructions

  • h_malloc

  • h_free

  • h_realloc

  • mark

  • sweep

  • All are implemented in the Dynamic Memory Management Unit (DMMU).

  • The DMMU manages the heap.


The DMMU

  • Each entry contains three bit-vectors.

  • The X-vector is used for reallocation and garbage collection.

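[Figure: block diagram of the CPU, the DMMU, and the O.S. kernel. Each DMMU entry holds an A-vector, an S-vector, and an X-vector. Labels on the connections: h_malloc / h_free / h_realloc, mark / sweep, object_size, object_pointer, gc_ack, sbrk/brk.]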


The ALB

  • Each entry in the DMMU tracks the allocation status of a region of memory.

  • Compare with a TLB, which tracks the location of a region of virtual memory.

  • So, these entries make up the Allocation Lookaside Buffer (ALB).

  • Entries can be saved to and fetched from the A-, S-, and X-bitmaps.


Steps in Allocation

  • Compare the requested size with largest_available_size in each ALB entry.

  • Select an entry and pass the requested size to the complete binary tree (CBT).

  • The CBT locates the first available chunk.

  • The chunk is allocated using the buddy system.

  • Unused words at the end are returned to free memory.

  • The address of the block is returned, and its status is changed to allocated.

  • The S-vector is updated accordingly.

[Figure: h_malloc data path. Step A1: size and address pointer at the Complete Binary Tree (CBT). Step A2: allocation bit-vector (A bit-vector). Step A3: S-Unit (size encoder) and size bit-vector (S bit-vector).]
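The allocation steps listed above can be walked through in software. This is a sketch under stated assumptions, not the DMMU itself: the `AlbEntry` class, the helper `longest_free_run`, and the return value are hypothetical; `largest_available_size` is taken to be the longest run of free blocks; and the S-vector is assumed to hold a 1 at the last block of each object, since the slides do not give its encoding. Each bit-vector length is assumed to be a power of two.

```python
from dataclasses import dataclass


@dataclass
class AlbEntry:
    a_vector: list               # 1 = block allocated
    s_vector: list               # assumed encoding: 1 = last block of an object
    largest_available_size: int  # assumed: longest free run, in blocks


def longest_free_run(a_vector):
    longest = run = 0
    for bit in a_vector:
        run = 0 if bit else run + 1
        longest = max(longest, run)
    return longest


def h_malloc(entries, size):
    """Return (entry number, block address) for a request of `size` blocks, or None."""
    chunk = 1
    while chunk < size:          # buddy system: round the request up to a power of two
        chunk *= 2
    for number, entry in enumerate(entries):
        if size > entry.largest_available_size:
            continue             # this entry cannot hold the request
        # Stand-in for the CBT: first free chunk aligned to its own size.
        for start in range(0, len(entry.a_vector), chunk):
            if not any(entry.a_vector[start:start + chunk]):
                # Only `size` blocks are marked; the rest of the chunk stays free.
                for i in range(start, start + size):
                    entry.a_vector[i] = 1
                entry.s_vector[start + size - 1] = 1  # record the object boundary
                entry.largest_available_size = longest_free_run(entry.a_vector)
                return number, start
    return None


entries = [AlbEntry([0] * 16, [0] * 16, 16)]
first = h_malloc(entries, 3)   # takes 3 blocks of a 4-block chunk
second = h_malloc(entries, 2)
third = h_malloc(entries, 1)   # reuses the block left over by the first request
print(first, second, third)
```

The third request shows the "unused words are returned" step: the fourth block of the first chunk was never marked allocated, so a later one-block request can take it.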


Steps in Deallocation

  • Deallocation is very similar to allocation.

[Figure: h_free data path. Step D1: address pointer at the Complete Binary Tree (CBT). Step D2: allocation bit-vector (A bit-vector). Step D3: S-Unit (size encoder), with size boundaries taken from the size bit-vector (S bit-vector).]
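Because only an address is passed in, the size of the object has to be recovered from the S-vector. The slides do not spell out the encoding, so the sketch below assumes that the S-vector holds a 1 at the last block of each object; the function name and return value are hypothetical as well.

```python
def h_free(a_vector, s_vector, addr):
    """Free the object that starts at block `addr`; return its size in blocks.

    Assumed S-vector encoding: 1 marks the last block of an object.
    """
    end = addr
    while s_vector[end] == 0:    # walk to the object's boundary
        end += 1
    for i in range(addr, end + 1):
        a_vector[i] = 0          # flip the allocation bits back to free
    s_vector[end] = 0            # the boundary no longer exists
    return end - addr + 1


# Two objects: blocks 0-2 and blocks 4-5.
a_vector = [1, 1, 1, 0, 1, 1, 0, 0]
s_vector = [0, 0, 1, 0, 0, 1, 0, 0]
freed = h_free(a_vector, s_vector, 4)
print(freed, a_vector, s_vector)
```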


Steps in Marking

  • Each live-object pointer is sent to the CBT, one after another.

    • The page number of the object pointer selects a bit-vector.

  • The signal generated by the CBT is latched in the X-vector.

[Figure: mark data path. Live-object pointers enter the Complete Binary Tree (CBT) as address pointers under the mark signal, and the result is recorded in the auxiliary bit-vector (X-vector).]


Steps in Sweeping

  • The bit-sweeper receives the sweep signal.

  • Size information from the S-vector and liveness status from the X-vector generate the new allocation status and largest_avail_size.

[Figure: sweep data path. Labels tagged E1: sweep signal, Bit-Sweeper/X-Unit, size bit-vector (S vector), auxiliary bit-vector (X vector). Labels tagged E2: S-Unit (size encoder), allocation bit-vector (A vector). Label tagged E3: GC_ack.]
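Marking and sweeping can be combined into one short software sketch. It is only an illustration under assumptions: the function names are hypothetical, live-object pointers are taken to be block addresses within a single bit-vector (page selection is left out), the S-vector is assumed to hold a 1 at the last block of each object, and the X-vector is cleared as it is swept.

```python
def mark(x_vector, live_pointers):
    """Latch one X-vector bit for each live-object pointer."""
    for pointer in live_pointers:
        x_vector[pointer] = 1


def sweep(a_vector, s_vector, x_vector):
    """Free every unmarked object and return the new largest free run, in blocks."""
    i = 0
    while i < len(a_vector):
        if a_vector[i] == 0:
            i += 1
            continue
        end = i
        while s_vector[end] == 0:   # size information from the S-vector
            end += 1
        if x_vector[i] == 0:        # liveness status from the X-vector
            for j in range(i, end + 1):
                a_vector[j] = 0
            s_vector[end] = 0
        x_vector[i] = 0             # reset the mark for the next collection
        i = end + 1
    largest = run = 0
    for bit in a_vector:
        run = 0 if bit else run + 1
        largest = max(largest, run)
    return largest


# Three objects: blocks 0-2, blocks 4-5, and block 6. Only the first and last are live.
a_vector = [1, 1, 1, 0, 1, 1, 1, 0]
s_vector = [0, 0, 1, 0, 0, 1, 1, 0]
x_vector = [0] * 8
mark(x_vector, [0, 6])
largest = sweep(a_vector, s_vector, x_vector)
print(a_vector, s_vector, largest)
```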


Putting it All Together

[Figure: combined data path of the DMMU. Components: Complete Binary Tree (CBT), allocation bit-vector (A vector), S-Unit (size encoder), size bit-vector (S vector), Bit-Sweeper/X-Unit, auxiliary bit-vector (X-vector), enable signal generator, and Reallocation Status unit (RS-Unit). Signals: size, address pointer, allocation/deallocation output, size boundaries, starting_address, ending_address, live object pointer, reallocation status, GC_ack. Each connection is tagged with the step letters below.]

A. Steps required for allocation

B. Steps required for deallocation

C. Steps required for reallocation

D. Steps required for marking

E. Steps required for sweeping


Memory Usage

  • Most schemes encode size information in the objects themselves.

    • This is more efficient with large objects.

    • A bit-vector is more efficient with small objects.

  • If each object carries 8 bytes for size and 1 byte for marking, the bitmap scheme is more efficient when the average object size is < 384 bytes.

  • Average object size for C++ and Java programs: 101 bytes.
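The 384-byte break-even point can be reproduced with a small calculation. The slides give the per-object header cost (8 + 1 bytes) but not the bitmap granularity, so the values below are assumptions chosen because they reproduce the stated figure: three bits per block (one each in the A-, S-, and X-vectors) and 16-byte blocks.

```python
HEADER_BITS = (8 + 1) * 8  # from the slide: 8 bytes for size, 1 for marking
BITS_PER_BLOCK = 3         # assumption: one bit each in the A-, S-, and X-vectors
BLOCK_BYTES = 16           # assumption: not given in the slides


def bitmap_bits(object_bytes):
    """Bitmap bits consumed by one object, rounding its size up to whole blocks."""
    blocks = -(-object_bytes // BLOCK_BYTES)  # ceiling division
    return blocks * BITS_PER_BLOCK


# Object size at which the bitmap cost equals the in-object header cost.
break_even_bytes = HEADER_BITS * BLOCK_BYTES // BITS_PER_BLOCK
print(break_even_bytes, bitmap_bits(101), HEADER_BITS)
```

Under these assumptions, an average 101-byte object costs 21 bitmap bits against 72 header bits.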


Performance Gain

  • ALB miss penalty.

    • A bit-vector length of 512 bits (64 bytes) gives a 97% hit ratio.

    • This implies an ALB entry of 192 bytes (three 64-byte bit-vectors).

    • A 64-bit 100 MHz bus gives an 800 MB/s transfer rate.

    • => the miss penalty is 96 cycles (192 x 400 / 800).

  • With an ALB hit, it takes 2 cycles to allocate memory.

  • => the average hardware malloc time is 4.82 cycles.

  • Software malloc varies from 51 to 900 cycles, with an average of 192.

  • In an application that spends 30% of its time allocating, the speedup would be 41%.
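The figures on this slide can be checked with a few lines of arithmetic. The variable names are illustrative, and the 400 MHz CPU clock is an assumption read off the factor 400 in the miss-penalty expression. The 4.82-cycle average is reproduced by charging a hit 2 cycles and a miss 96 cycles, and the 41% figure follows from Amdahl's law applied to the 30% allocation share.

```python
HIT_RATIO = 0.97
HIT_CYCLES = 2
ENTRY_BYTES = 192
BUS_MB_PER_S = 800            # 64-bit bus at 100 MHz: 8 bytes x 100 MHz
CPU_MHZ = 400                 # assumption: taken from the "400" in 192x400/800
SOFTWARE_MALLOC_CYCLES = 192  # average software malloc time from the slide
ALLOC_FRACTION = 0.30         # share of run time spent allocating

# Time to move one ALB entry over the bus, expressed in CPU cycles.
miss_penalty = ENTRY_BYTES * CPU_MHZ / BUS_MB_PER_S

# Weighted average of the hit cost and the miss cost.
avg_hw_malloc = HIT_RATIO * HIT_CYCLES + (1 - HIT_RATIO) * miss_penalty

# Amdahl's law: only the allocation share of the run time gets faster.
new_time = (1 - ALLOC_FRACTION) + ALLOC_FRACTION * avg_hw_malloc / SOFTWARE_MALLOC_CYCLES
speedup = 1 / new_time

print(round(miss_penalty), round(avg_hw_malloc, 2), round((speedup - 1) * 100))
```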


Summary

  • Object-oriented applications spend a lot of their time allocating memory.

  • To allocate in hardware, we use a bit-vector based approach.

  • Allocation and deallocation are done combinationally, using a complete binary tree on top of the bit-vector.

  • This yields a speedup of > 40% on memory-intensive programs.

