Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
By Sreemukha Kandlakunta and Phani Shashank Nagari

Outline
  • Base Line Design
  • Reducing Conflict Misses
    • Miss Caching
    • Victim Caching
  • Reducing Capacity and Compulsory Misses
    • Stream Buffers
    • Multi-way Stream Buffers

Base Line Design Contd..
  • The size of on-chip caches usually varies
  • High-speed technologies result in smaller on-chip caches
  • L1 caches are assumed to be direct-mapped
  • L1 cache line sizes: 16–32 B
  • L2 cache line sizes: 128–256 B

Parameters Assumed
  • Processor Speed: 1000 MIPS
  • L1 Inst and Data Cache
    • Size: 4 KB
    • Line Size: 16 B
  • L2 Inst and Data Cache
    • Size: 1 MB
    • Line Size: 128 B

Parameters Assumed Contd..
  • Miss Penalty
    • L1: 24 instruction times
    • L2: 320 instruction times
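
These penalties combine into average memory access time in the usual way. A minimal illustration, in instruction times; the miss rates below are hypothetical and not from the paper:

```latex
% AMAT with the penalties above; m_{L1}, m_{L2} are hypothetical miss rates.
\mathrm{AMAT} = t_{hit} + m_{L1}\left(24 + m_{L2}\cdot 320\right)
% e.g. with t_{hit}=1, m_{L1}=0.05, m_{L2}=0.10:
% \mathrm{AMAT} = 1 + 0.05\,(24 + 32) = 3.8 \text{ instruction times}
```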

Inferences
  • There is a potential performance loss in the memory hierarchy
  • Focus on improving the performance of the memory hierarchy rather than CPU performance
  • Hardware techniques are used to improve the performance of the baseline memory hierarchy

How Direct-Mapped Cache Works

[Figure: main memory lines, each labelled with a 2-bit tag and a 3-bit block number, mapped into a direct-mapped cache with 8 blocks]

  • How to search?
    • 00101, 01101, 10101, 11101 all map to block 101
  • How to identify?
    • Match the tag: tag 01 in block 001 means address 01001 is there
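
A minimal sketch of this lookup for the 5-bit addresses above (2 tag bits, 3 index bits); all names are illustrative, not from the paper:

```python
NUM_BLOCKS = 8          # direct-mapped cache with 8 blocks
INDEX_BITS = 3          # the low 3 address bits select the block

cache = [None] * NUM_BLOCKS        # each entry holds a tag (data omitted)

def access(addr: int) -> bool:
    """Return True on a hit. addr is a 5-bit block address, e.g. 0b01101."""
    index = addr & (NUM_BLOCKS - 1)   # 00101, 01101, 10101, 11101 -> block 101
    tag = addr >> INDEX_BITS          # the remaining high bits identify the line
    if cache[index] == tag:
        return True                   # tag matches: the block is present
    cache[index] = tag                # miss: the new line evicts the old one
    return False

access(0b01101)          # miss; installs tag 01 in block 101
print(access(0b01101))   # True: tag 01 now matches
print(access(0b11101))   # False: conflict miss, same block 101, different tag
```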

How Fully-Associative Cache Works

[Figure: the same main memory lines mapped into a fully-associative cache with 8 blocks]

  • Where to search?
    • Every block in the cache
  • Very expensive
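
For contrast, a minimal fully-associative sketch with LRU replacement. The dictionary lookup stands in for the parallel comparison against every block's tag, which is what makes the hardware expensive; illustrative only:

```python
from collections import OrderedDict

class FullyAssociativeCache:
    """A block can live anywhere; every tag is searched on each access."""
    def __init__(self, num_blocks: int = 8):
        self.num_blocks = num_blocks
        self.lines = OrderedDict()          # address -> data; order tracks recency

    def access(self, addr: int) -> bool:
        if addr in self.lines:              # in hardware: compare ALL tags at once
            self.lines.move_to_end(addr)    # hit: mark as most recently used
            return True
        if len(self.lines) >= self.num_blocks:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[addr] = None             # install the new line
        return False
```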

Cache Misses
  • Three kinds:
    • Instruction read miss: causes the most delay; the CPU has to wait until the instruction is fetched from DRAM
    • Data read miss: causes less delay; instructions not dependent on the missing data can continue executing until the data is returned from DRAM
    • Data write miss: causes the least delay; the write can be queued and the CPU can continue until the queue is full

Types of Misses
  • Conflict Misses
    • Reduced by caching: miss caches and victim caches
  • Compulsory Misses and Capacity Misses
    • Both are reduced by prefetching: stream buffers and multi-way stream buffers

Conflict Miss
  • Conflict misses are the misses that would not occur if the cache were fully associative with LRU replacement
  • If an item has been evicted from the cache and a later miss references that same item, the miss is a conflict miss

Conflict Misses Contd..
  • Conflict misses account for
    • 20–40% of overall D-M misses
    • 39% of L1 D$ misses
    • 29% of L1 I$ misses

Miss Caching
  • A small, fully-associative on-chip cache
  • On a miss, data is returned to both:
    • the direct-mapped cache
    • the small miss cache (where it replaces the LRU item)
  • The processor probes both the D-M cache and the miss cache
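
A minimal sketch of this mechanism, reusing the direct-mapped lookup sketched earlier; sizes and names are illustrative:

```python
from collections import OrderedDict

INDEX_BITS, NUM_BLOCKS, MISS_CACHE_ENTRIES = 3, 8, 4
dm_cache = [None] * NUM_BLOCKS     # direct-mapped L1: block index -> tag
miss_cache = OrderedDict()         # small fully-associative cache, LRU-ordered

def access(addr: int) -> str:
    index, tag = addr & (NUM_BLOCKS - 1), addr >> INDEX_BITS
    if dm_cache[index] == tag:             # both caches are probed on each access
        return "dm-hit"
    if addr in miss_cache:
        miss_cache.move_to_end(addr)       # hit in miss cache: no off-chip penalty
        dm_cache[index] = tag              # refill the direct-mapped line
        return "miss-cache-hit"
    dm_cache[index] = tag                  # true miss: data returned to BOTH caches
    if len(miss_cache) >= MISS_CACHE_ENTRIES:
        miss_cache.popitem(last=False)     # replace the LRU miss-cache entry
    miss_cache[addr] = True
    return "miss"
```
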
Observations
  • Eliminates the long off-chip miss penalty
  • More data conflict misses are removed than instruction conflict misses
    • Instructions within a procedure do not conflict as long as the procedure size is less than the cache size
    • If an instruction within the program calls a procedure that may be mapped elsewhere, a conflict arises: an instruction conflict

Miss Cache Performance
  • For a 4 KB D$:
    • A miss cache of 2 entries can remove 25% of D$ conflict misses, i.e. 13% of overall D$ misses
    • A miss cache of 4 entries can remove 36% of D$ conflict misses, i.e. 18% of overall D$ misses
  • Beyond 4 entries the improvement is minor

Victim Caching
  • Duplication of data wastes storage space in the miss cache
  • Instead, loads the F-A cache with the victim line from the D-M cache
  • When data misses in the D-M cache but hits in the victim cache, the contents are swapped
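
A minimal sketch of the swap behaviour; unlike the miss cache above, the buffer only ever holds lines not present in the D-M cache, so nothing is stored twice (names are illustrative):

```python
from collections import OrderedDict

NUM_BLOCKS, VICTIM_ENTRIES = 8, 4
dm_cache = [None] * NUM_BLOCKS     # block index -> full address of the resident line
victim_cache = OrderedDict()       # LRU-ordered evicted (victim) lines

def access(addr: int) -> str:
    index = addr & (NUM_BLOCKS - 1)
    if dm_cache[index] == addr:
        return "dm-hit"
    victim = dm_cache[index]               # the line about to be displaced
    dm_cache[index] = addr
    if addr in victim_cache:               # miss in D-M, hit in victim cache:
        del victim_cache[addr]             # swap the two lines
        if victim is not None:
            victim_cache[victim] = True
        return "victim-hit"
    if victim is not None:                 # true miss: only the victim is buffered
        if len(victim_cache) >= VICTIM_ENTRIES:
            victim_cache.popitem(last=False)
        victim_cache[victim] = True
    return "miss"
```
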
Victim Cache Performance
  • A victim cache of just one line performs better than a miss cache of 2 lines
  • Significant improvement in the performance of all the benchmark programs

Effect of D-M Cache Size on Victim Cache Performance
  • Smaller D-M caches benefit the most from the addition of a victim cache
  • As D-M cache size increases, the likelihood that conflict misses are removed by the victim cache decreases
  • As the percentage of conflict misses decreases, the percentage of those misses removed by the victim cache decreases

Effect of Line Size on Victim Cache Performance
  • As line size increases, the number of conflict misses increases
  • As a result, the percentage of misses removed by the victim cache increases

Victim Caches and L2 Caches
  • Victim caches are also useful for L2 caches due to their large line sizes
  • Using an L1 victim cache can also reduce the number of L2 conflict misses

Reducing Capacity and Compulsory Misses
  • Compulsory misses: the first reference to a piece of data
  • Capacity misses: due to insufficient cache size

Prefetching Algorithms
  • Prefetch always: an access to line i triggers a prefetch of line i+1
  • Prefetch on miss: a reference to line i triggers a prefetch of line i+1 iff line i missed
  • Tagged prefetch: a tag bit is set to 0 when a line is prefetched and to 1 when the line is first used; that first use triggers a prefetch of the next line
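
A minimal sketch contrasting the three policies on a stream of cache-line numbers; `cached` and `prefetch_tag` model one bit of state per line, and all names are illustrative:

```python
cached, prefetch_tag = set(), {}   # prefetch_tag[line] stays 0 until first use

def reference(line: int, policy: str) -> None:
    hit = line in cached
    if not hit:
        cached.add(line)                    # demand fetch on a miss
    if policy == "always":                  # prefetch i+1 on every access
        fetch_ahead = True
    elif policy == "on-miss":               # prefetch i+1 only if line i missed
        fetch_ahead = not hit
    else:                                   # "tagged": prefetch on a miss, or on the
        fetch_ahead = not hit or prefetch_tag.get(line) == 0   # first use of a prefetched line
        prefetch_tag[line] = 1              # tag flips 0 -> 1 once the line is used
    if fetch_ahead and line + 1 not in cached:
        cached.add(line + 1)
        prefetch_tag[line + 1] = 0          # prefetched lines arrive with tag 0
```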

Stream Buffers
  • Prefetched lines are placed in a buffer in order to avoid polluting the cache
  • Each entry consists of a tag, an available bit, and a data line
  • If a reference misses in the cache but hits in the buffer, the cache can be reloaded from the buffer
  • When a line is moved out of the stream buffer, the remaining entries shift up and the next successive line is fetched

Stream Buffer Mechanism Contd..
  • On a miss:
    • Prefetch successive lines
    • Enter the tag for each address into the stream buffer
    • Set the available bit to false
  • On return of the prefetched data:
    • Place the data in the entry with its tag
    • Set the available bit to true
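
A minimal sketch of one stream buffer as a FIFO whose head alone has a tag comparator; depth and names are illustrative:

```python
from collections import deque

DEPTH = 4
buffer = deque()                           # entries: (line_number, available_bit)

def start_stream(miss_line: int) -> None:
    """On a cache miss, restart the buffer prefetching successive lines."""
    buffer.clear()
    for i in range(1, DEPTH + 1):
        buffer.append((miss_line + i, False))   # tag entered, available bit false

def data_returned(line: int) -> None:
    """When a prefetch completes, set the available bit of its entry."""
    for i, (tag, _) in enumerate(buffer):
        if tag == line:
            buffer[i] = (tag, True)

def lookup(line: int) -> bool:
    """The cache missed; only the HEAD of the FIFO is compared."""
    if buffer and buffer[0] == (line, True):
        buffer.popleft()                        # line moves into the cache;
        buffer.append((line + DEPTH, False))    # entries shift up, next line fetched
        return True
    return False
```
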
Stream Buffer Performance
  • Most instruction references break the purely sequential access pattern by the time the 6th successive line is fetched
  • Data references break the pattern even sooner
  • As a result, stream buffers are better at removing I$ misses than D$ misses

Limitations of Stream Buffers
  • The stream buffers considered are FIFO queues
  • Only the head of the queue has a tag comparator
  • Elements must be removed strictly in sequence
  • Works only for sequential line misses
  • Fails on a non-sequential line miss

Multi-way Stream Buffers
  • A single stream buffer could remove 72% of I$ misses but only 25% of D$ misses
  • A multi-way stream buffer was simulated to improve the performance of stream buffers for data references
  • Consists of 4 stream buffers in parallel
  • On a miss, the least recently hit stream buffer is cleared and starts fetching from the miss address
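
A minimal sketch of the 4-way version, reusing the single-buffer FIFO above (available bits are set by the same data-return mechanism); on a miss in all four buffers, the least recently hit one is reallocated. Illustrative only:

```python
from collections import deque

WAYS, DEPTH = 4, 4
buffers = [deque() for _ in range(WAYS)]   # four stream buffers probed in parallel
last_hit = [0] * WAYS                      # timestamps for least-recently-hit choice
clock = 0

def multiway_lookup(line: int) -> bool:
    global clock
    clock += 1
    for w, buf in enumerate(buffers):      # check every buffer's head in parallel
        if buf and buf[0] == (line, True):
            last_hit[w] = clock
            buf.popleft()                  # hit: shift up and extend this stream
            buf.append((line + DEPTH, False))
            return True
    victim = last_hit.index(min(last_hit))
    buffers[victim].clear()                # miss everywhere: clear the least
    for i in range(1, DEPTH + 1):          # recently hit buffer and restart it
        buffers[victim].append((line + i, False))   # at the miss address
    return False
```
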
Observations
  • Performance on the instruction stream remains virtually unchanged
  • Significant improvement in performance on the data stream
  • Removes 43% of the D$ misses for the test programs, i.e. almost twice the performance of a single stream buffer

Performance Evaluation
  • Over the set of 6 benchmarks, on average only 2.5% of 4 KB D-M D$ misses that hit in a 4-entry victim cache also hit in a 4-way stream buffer, so the two techniques are largely complementary
  • The combination of stream buffers and victim caches reduces the L1 miss rate to less than half that of the baseline system
  • This results in an average 143% improvement in system performance for the 6 benchmarks

Future Enhancements
  • This study concentrated on applying these hardware techniques to L1 caches
  • Applying these techniques to L2 caches is an interesting area of future work
  • The performance of victim caching and stream buffers can be investigated for OS design and for multi-programming workloads

Conclusions
  • Miss caches remove tight conflicts where several addresses map to the same cache line
  • Victim caches improve on miss caching by saving the victim of the cache miss rather than duplicating the fetched line
  • Stream buffers prefetch the cache lines following a missed cache line
  • Multi-way stream buffers are a set of stream buffers that can do concurrent prefetches down several streams