Associative Mapping


The strict mapping restriction enforced by the direct mapping strategy can be relieved by allowing a memory block to be written to any cache block, i.e., the mapping function f is still:

f(q) ∈ {0, 1, ..., n-1}

But:

- the mapping strategy is much more flexible

and

- the identification problem (telling if a given memory block is in cache) becomes more complex

cse241 1

Suppose we stick with the previous basic model, viz.,

16-bit word

64 KW memory (128 KB)

16-word block

2 KW cache

Then the lower 4 bits of an address identify the word within a block.

But with associative mapping, the remaining 12 bits carry no mapping information; rather, they all must be stored by the cache blocks to identify which memory block is in the cache.

Address format: 12 bits (tag) | 4 bits (word in block)
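The address split above can be sketched in a few lines of Python. This is only an illustration under the slide's parameters (16-bit word addresses, 16-word blocks); the function name `split_associative` is hypothetical and not part of the original material.

```python
# Sketch of the associative-mapping address split (names are illustrative).
ADDR_BITS = 16                      # 64 KW memory -> 16-bit word addresses
WORD_BITS = 4                       # 16 words per block -> 4 word-in-block bits
TAG_BITS = ADDR_BITS - WORD_BITS    # the remaining 12 bits are all tag


def split_associative(addr):
    """Split a 16-bit word address into (tag, word_in_block)."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    word = addr & ((1 << WORD_BITS) - 1)   # low 4 bits
    tag = addr >> WORD_BITS                # high 12 bits
    return tag, word


tag, word = split_associative(0xABCD)
print(f"tag=0x{tag:03X} word=0x{word:X}")  # tag=0xABC word=0xD
```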


Associative Map

Mp block maps to any free cache block; cache tag is the upper address bits (all bits except the 4 word-in-block bits), here 12 bits.

Since any free cache block can be used, a block in the cache is replaced only if the cache is full.


Is Mp block q in cache?

Consider the tag:

Suppose we have 128 blocks in cache. We could implement the tag bits of cache by using a 128 x 12 RAM (generally, an (N x t) RAM, where N = # of cache blocks and t = # of bits required for the tag field).

Then we can search for a given tag field in the RAM to determine if the Mp block with that tag is in the cache.

Here, we use log2(N) bits as the RAM address (remember, the tag field is stored at some address in this RAM); the content is the tag value of the Mp block currently loaded in that cache block (you might see a logical problem here).

A better solution might be to use a (2^t x N)-bit RAM; this way we could use the t bits of the tag as the address whose contents would be the value of (label of) the cache block that Mp block is stored in if it is present. (You might see the same logical difficulty here.)


In order to see if a tag is in the tag RAM, the entire tag RAM must be searched for the tag.
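A minimal sketch of that sequential search, modeling the N x t tag RAM as a Python list indexed by cache block number. The function name `find_in_tag_ram` and the use of `None` for an unloaded block are assumptions made for illustration; a real RAM word always holds some bit pattern.

```python
# Software model of searching an (N x t) tag RAM one word at a time.
def find_in_tag_ram(tag_ram, tag):
    """Return the cache block index whose stored tag equals `tag`, else None."""
    for block_index, stored_tag in enumerate(tag_ram):
        if stored_tag == tag:
            return block_index      # found: Mp block is in this cache block
    return None                     # every word was examined without a match


N = 128                             # number of cache blocks
tag_ram = [None] * N                # None = nothing loaded (software convention)
tag_ram[5] = 0xABC
tag_ram[127] = 0x123

print(find_in_tag_ram(tag_ram, 0x123))  # 127, found only after scanning all 128 words
print(find_in_tag_ram(tag_ram, 0x777))  # None
```

The number of comparisons grows with N, which is the cost that motivates the alternative discussed next.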

An alternative solution is to use a CAM (content-addressable memory, or associative memory). Unlike a RAM, a CAM functions as follows:-

1. a search key is applied to the CAM

2. the CAM emits those addresses whose contents match the search key (in whole or in part, depending on the CAM design)

[Figure: a CAM with the search key as input and content match lines as output, 1 for each word in the CAM; a match line is set to 1 if the CAM contents at that word match the search key.]


CAM Operation

When a search key is presented to a CAM

-- all words in the CAM see the search key at exactly the same time

-- all words in the CAM set their “match” lines at exactly the same time

Thus, in an N-word by m-bit CAM, it takes a constant amount of time to find every occurrence of the search key (T(N) = O(1)).

This is significantly different from the best search time in a RAM.

Question: as a function of n, the size (in words) of a RAM, what is the fastest time to find:

a) any occurrence of a pattern and

b) all occurrences of a pattern?
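The CAM's behavior can be mimicked in software to make the match lines concrete. This is a functional model only: the hypothetical `cam_search` below loops over the words one after another, whereas the hardware compares all words at the same time. The `mask` parameter is an assumed way of expressing a partial match (1 = compare this bit).

```python
# Functional model of a CAM search; hardware does all comparisons in parallel.
def cam_search(contents, key, mask=None, width=12):
    """Return one match line (0 or 1) per CAM word."""
    if mask is None:
        mask = (1 << width) - 1     # default: compare every bit of the word
    return [1 if ((word ^ key) & mask) == 0 else 0 for word in contents]


cam = [0xABC, 0x123, 0xABC, 0xA00]
print(cam_search(cam, 0xABC))              # [1, 0, 1, 0]  whole-word match
print(cam_search(cam, 0xA00, mask=0xF00))  # [1, 0, 1, 1]  match on top 4 bits only
```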


A Trivial 1-bit CAM cell

[Figure: circuit of a 1-bit CAM cell, with the query bit, data bit, match bit, and initialization bits labeled.]


Associative Processors

Associative processors are machines whose Mp is dominated by CAM; because they can perform constant-time pattern matching (partial or whole pattern matches), they have important value in problems dominated by searching -- especially time-critical problems in which it is undesirable to have an increased search time merely because there are more entities to be searched.

-- Air Traffic Control (Goodyear STARAN project)


Block-set (set-associative) mapping

Purely associative cache mapping is flexible, but costly. On the one hand

- mapping an Mp block to any cache block can improve cache performance

but

- there is a penalty which has to be paid (in cache cost or in search time)

The standard alternative is as follows:-

- map Mp blocks to groups of cache blocks in a direct (modulo) fashion

but

- within a group of cache blocks, allow the Mp block to be loaded to any block

This is called n-way block-set associative mapping (n blocks per group).


Set-associative mapping

Mp block maps directly to a specific cache block set. Within the set, the Mp block is associatively mapped to one of the set’s blocks.


Suppose we have our 2 KW cache as previously, with 128 blocks of 16 words each, but suppose there are 4 blocks per set.

There are therefore 32 (2^5) sets. Thus the address partition is:-

Field              Bits   Purpose
Tag                7      Identifies which Mp block is in a cache block (cache tag bits)
Set (direct map)   5      Used to map the Mp blocks directly to the cache sets
Word in block      4      Identifies words within a block
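As a sketch of this 7/5/4 partition, the following Python splits a 16-bit word address into its three fields. The name `split_set_associative` is illustrative. The final check shows that the set field is the Mp block number modulo the number of sets, which is the direct (modulo) part of the mapping.

```python
# Sketch of the set-associative address split: 7-bit tag, 5-bit set, 4-bit word.
WORD_BITS = 4    # 16 words per block
SET_BITS = 5     # 128 blocks / 4 blocks per set = 32 sets
TAG_BITS = 7     # 16 - 5 - 4


def split_set_associative(addr):
    """Split a 16-bit word address into (tag, set_index, word_in_block)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


tag, set_index, word = split_set_associative(0xABCD)
print(tag, set_index, word)            # 85 28 13

block_number = 0xABCD >> WORD_BITS     # Mp block number (upper 12 bits)
print(block_number % 32 == set_index)  # True
```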


Valid Bits

It is possible that an Mp block resident in the cache will become invalid because the corresponding Mp block is updated by a source which is not the CPU (see DMA later).

To handle this situation, each cache block has a valid bit (not the same as a dirty bit); all valid bits are set to 0 on power up.

When a cache block is loaded from Mp, the valid bit is set; it stays set unless the Mp block is updated by another (non-CPU) device.
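The life cycle of a valid bit can be sketched as a small Python class. The class and method names here (`CacheBlock`, `load`, `external_write`, `hit`) are hypothetical, and the model tracks only the tag and the valid bit, not the cached data.

```python
# Minimal model of one cache block's valid bit (names are illustrative).
class CacheBlock:
    def __init__(self):
        self.valid = False      # valid bit is 0 on power up
        self.tag = None

    def load(self, tag):
        """Load an Mp block into this cache block; the valid bit is set."""
        self.tag = tag
        self.valid = True

    def external_write(self, tag):
        """A non-CPU device updated the Mp block with this tag."""
        if self.valid and self.tag == tag:
            self.valid = False  # the cached copy is now stale

    def hit(self, tag):
        """A tag match counts as a hit only if the block is valid."""
        return self.valid and self.tag == tag


blk = CacheBlock()
print(blk.hit(0x55))        # False: nothing loaded yet
blk.load(0x55)
print(blk.hit(0x55))        # True
blk.external_write(0x55)
print(blk.hit(0x55))        # False: tag still matches, but the block is invalid
```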


Replacement Algorithms

In a purely direct-mapped cache, the replacement algorithm is trivial:-

if Mp block q maps to cache block p, load q into p even if p contains another Mp block

However, if q can map to more than one cache block (N in the case of a purely associative cache, or the number of blocks per set in a block-set associative cache), then a decision must be made as to the replacement policy.

The following policy is obvious:-

If an Mp block maps to an empty cache block, use that cache block.


Oldest First

Suppose we have an n-way block-set associative cache (n blocks per set).

The oldest-first policy is:-

if there is an unused cache block in the set, load the Mp block into it

else

load the Mp block into the cache block in the set which has been in the cache the longest

This turns out not to be a particularly good policy because it does not take into consideration how recently a cache block was used.

In fact, the oldest loaded cache block in the set may be the most used cache block in the set.
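The weakness described here shows up in a small simulation of one 4-way set. This is a sketch under assumed names (`fifo_access`, and single-letter labels standing in for Mp blocks); the set is kept as a deque ordered from oldest load to newest.

```python
from collections import deque


# Oldest-first (FIFO) replacement for a single set; names are illustrative.
def fifo_access(set_blocks, ways, block):
    """Reference `block` in the set; return True on a hit, False on a miss."""
    if block in set_blocks:
        return True                 # a hit does not change the load order
    if len(set_blocks) == ways:
        set_blocks.popleft()        # set full: evict the block loaded longest ago
    set_blocks.append(block)
    return False


cache_set = deque()
refs = ["A", "B", "C", "D", "A", "A", "A", "E", "A"]
hits = [fifo_access(cache_set, 4, b) for b in refs]
print(hits)             # [False, False, False, False, True, True, True, False, False]
print(list(cache_set))  # ['C', 'D', 'E', 'A']
```

Block A is the most heavily used block, yet loading E evicts it because it was loaded first, and the next reference to A misses.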


Random Replacement

Suppose we have an n-way block-set associative cache (n blocks per set).

The random policy is:-

if there is an unused cache block in the set, load the Mp block into it

else

load the Mp block into a randomly chosen cache block in the set

Oddly enough, this policy turns out to be reasonably successful in practice.
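A minimal sketch of the random policy for one set follows. The function name `random_access` is hypothetical, and a seeded `random.Random` stands in for whatever source of randomness a hardware implementation would use.

```python
import random


# Random replacement for a single set; names are illustrative.
def random_access(set_blocks, ways, block, rng):
    """Reference `block` in the set; return True on a hit, False on a miss."""
    if block in set_blocks:
        return True
    if len(set_blocks) < ways:
        set_blocks.append(block)            # an unused cache block is available
    else:
        victim = rng.randrange(ways)        # pick any block in the set
        set_blocks[victim] = block
    return False


rng = random.Random(0)                      # seeded only to make runs repeatable
cache_set = []
for b in ["A", "B", "C", "D", "E"]:
    random_access(cache_set, 4, b, rng)
print(len(cache_set), "E" in cache_set)     # 4 True
```

Unlike the other policies, no per-block history has to be stored or updated.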


LRU (Least Recently Used)

Suppose we have an n-way block-set associative cache (n blocks per set).

The LRU policy is:-

if there is an unused cache block in the set, load the Mp block into it

else

load the Mp block into the cache block in the set which is the least recently used

LRU is usually the most effective of these policies, but it needs ongoing updating computations: the usage information must be updated on every cache reference.

A simple algorithm is as follows:-


LRU Algorithm

Suppose we have a four-block set. A 2-bit counter is associated with each block in the set. The algorithm is:-

On a cache hit,

- set the counter of the referenced block to 0

- increment by 1 the counters of blocks whose counter values were less than the original value of the referenced block

- leave the other counters alone

On a cache miss with the set not full

- the counter of the new block is set to 0

- increment all other counters by 1

On a cache miss with the set full

- the block with counter value = “11” (3) is removed

- the new block is loaded and its counter set to 0

- all other counters in the set are incremented by 1


LRU sample

Event   Block counter value   Comment
        b0  b1  b2  b3
init    0   0   0   0
miss    0   1   1   1         b0 loaded
miss    1   0   2   2         b1 loaded
miss    2   1   0   3         b2 loaded
miss    3   2   1   0         b3 loaded
hit     3   0   2   1         hit b1; blocks b2 and b3 inc by 1
hit     3   0   2   1         hit b1; no blocks had lower counts
miss    0   1   3   2         miss; load block b0; inc other blocks by 1
hit     0   1   3   2         hit b0
hit     1   2   0   3         hit b2 (inc those orig. lower)
miss    2   3   1   0         miss; load b3

Note that once the set is full, the counters are always distinct!
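The counter algorithm and the sample trace can be checked with a short simulation. This is a sketch, not a hardware description: the class name `LRUSet`, the use of `None` for an unused block, and the letters A to F standing in for Mp blocks are all assumptions made for illustration.

```python
# Counter-based LRU for one 4-block set, following the hit/miss rules above.
class LRUSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.tags = [None] * ways       # None = unused cache block
        self.counters = [0] * ways

    def access(self, tag):
        """Reference `tag`; return ("hit" or "miss", block index used)."""
        if tag in self.tags:
            i = self.tags.index(tag)
            old = self.counters[i]
            for j in range(self.ways):
                if self.counters[j] < old:      # only counters below the old value
                    self.counters[j] += 1
            self.counters[i] = 0
            return "hit", i
        if None in self.tags:
            i = self.tags.index(None)           # set not full: use an unused block
        else:
            i = self.counters.index(self.ways - 1)  # set full: counter 3 is the LRU
        for j in range(self.ways):
            if j != i:
                self.counters[j] += 1           # increment all other counters
        self.tags[i] = tag
        self.counters[i] = 0
        return "miss", i


s = LRUSet()
trace = []
for ref in ["A", "B", "C", "D", "B", "B", "E", "E", "C", "F"]:
    event, block = s.access(ref)
    trace.append((event, list(s.counters)))
    print(event, s.counters)
```

With this reference string the printed counters follow the rows of the sample table after the init row: E replaces the block in b0 and F replaces the block in b3, in each case the block whose counter had reached 3.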
