
CSE 8383 - Advanced Computer Architecture

Week-2

Week of Jan 19, 2004

engr.smu.edu/~rewini/8383


Contents

  • Placement Policies (Quick Review)

  • Replacement Policies

    • FIFO, Random, Optimal, LRU, MRU

  • Cache Write Policies

  • Pipelines


Memory Hierarchy

[Figure: memory hierarchy — CPU registers, cache, main memory, secondary storage. Moving down the hierarchy, latency and capacity increase while speed, bandwidth, and cost per bit decrease.]


Placement Policies

  • How to map memory blocks (lines) to cache block frames (line frames)

[Figure: memory blocks (lines) are mapped to cache block frames (line frames).]


Placement Policies

  • Direct Mapping

  • Fully Associative

  • Set Associative


Example – Direct Mapping

  • Memory  4K blocks

  • Block size  16 words

  • Address size = log2(4K * 16) = 16 bits

  • Cache  128 blocks

Address format: Tag (5 bits) | Block frame (7 bits) | Word (4 bits)
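The direct-mapped address split above can be sketched in a few lines of Python (the 5/7/4 field widths come from the slide; the function name and the sample block numbers are my own):

```python
# Direct-mapped split of a 16-bit word address:
# tag (5 bits) | block frame (7 bits) | word (4 bits)

def split_direct(addr: int) -> tuple[int, int, int]:
    word = addr & 0xF            # low 4 bits: word within the block
    frame = (addr >> 4) & 0x7F   # next 7 bits: one of 128 cache frames
    tag = addr >> 11             # top 5 bits: which of the 32 memory
                                 # blocks sharing this frame is present
    return tag, frame, word

# Memory block 129 starts at word address 129 * 16:
print(split_direct(129 * 16))  # (1, 1, 0) — frame 1, tag 1
```

This matches the mapping on the next slide: blocks 1, 129, 257, ... all land in frame 1 and are told apart by the tag.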


Example – Direct Mapping

[Figure: direct mapping — memory blocks 0, 128, 256, ..., 3968 map to cache frame 0; blocks 1, 129, ..., to frame 1; ...; blocks 127, 255, ..., 4095 to frame 127. The 5-bit tag identifies which of the 32 candidate blocks currently occupies a frame.]


Example – Fully Associative

  • Memory  4K blocks

  • Block size  16 words

  • Address size = log2(4K * 16) = 16 bits

  • Cache  128 blocks

Address format: Tag (12 bits) | Word (4 bits)


Example – Fully Associative

[Figure: fully associative mapping — any of the 4096 memory blocks (identified by a 12-bit tag) can be placed in any of the 128 cache frames.]


Example – Set Associative

  • Memory  4K blocks

  • Block size  16 words

  • Address size = log2(4K * 16) = 16 bits

  • Cache  128 blocks

  • Num of blocks per set = 4

  • Number of sets = 32

Address format: Tag (7 bits) | Set (5 bits) | Word (4 bits)


Example – Set Associative

[Figure: set-associative mapping — the cache holds 32 sets of 4 frames each; memory blocks 0, 32, 64, ... map to set 0, blocks 1, 33, ... to set 1, ..., blocks 31, 63, ..., 4095 to set 31. The 7-bit tag identifies which of the 128 candidate blocks occupies a frame within its set.]


Comparison

  • Simplicity

  • Associative Search

  • Cache Utilization

  • Replacement


Group Exercise

The instruction set for your architecture has 40-bit addresses, with each addressable item being a byte. You elect to design a four-way set-associative cache with each of the four blocks in a set containing 64 bytes. Assume that you have 256 sets in the cache.

Show the format of the address.


Group Exercise (Cont.)

  • Consider the following sequence of addresses. (All are hex numbers)

  • 0E1B01AA05 0E1B01AA07 0E1B2FE305 0E1B4FFD8F 0E1B01AA0E

  • In your cache, what will be the tags in the set(s) that contain these references at the end of the sequence? Assume that the cache is initially flushed (empty).


Group Exercise (Cont.)

  • Address size = 40

  • Block size  64 bytes

  • Num of blocks per set = 4

  • Number of sets = 256

  • Cache  256*4 blocks

Address format: Tag (26 bits) | Set (8 bits) | Word (6 bits)


Group Exercise (Cont.)

  • 0E1B01AA05  0E1B01 1010 1010 0000 0101

  • 0E1B01AA07  0E1B01 1010 1010 0000 0111

  • 0E1B2FE305  0E1B2F 1110 0011 0000 0101

Group Exercise (Cont.)

  • 0E1B4FFD8F  0E1B4F 1111 1101 1000 1111

  • 0E1B01AA0E  0E1B01 1010 1010 0000 1110
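The exercise split can be checked mechanically. This is a sketch, not the lecture's code: the 26/8/6 field widths come from the slides, and the helper name is my own.

```python
# Split each 40-bit exercise address into tag / set / word:
# word = low 6 bits, set index = next 8 bits, tag = top 26 bits.

ADDRS = [0x0E1B01AA05, 0x0E1B01AA07, 0x0E1B2FE305,
         0x0E1B4FFD8F, 0x0E1B01AA0E]

def split(addr: int) -> tuple[int, int, int]:
    word = addr & 0x3F
    set_idx = (addr >> 6) & 0xFF
    tag = addr >> 14
    return tag, set_idx, word

for a in ADDRS:
    tag, s, w = split(a)
    print(f"{a:010X}: set {s:3d}  tag {tag:#09x}  word {w:2d}")
```

Running this shows that the first, second, and fifth references fall in the same block (set 168, tag 0x386C06), so only three distinct blocks are cached: set 168 holds tag 0x386C06, set 140 holds tag 0x386CBF, and set 246 holds tag 0x386D3F.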


Replacement Techniques

  • FIFO

  • LRU

  • MRU

  • Random

  • Optimal


Group Exercise

Suppose that your cache can hold only three blocks and the block requests are as follows:

7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1

Show the contents of the cache if the replacement policy is a) LRU, b) FIFO, c) Optimal
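One way to work the exercise is to simulate it. The sketch below is my own illustration (not the lecture's code): a 3-block cache driven by the request sequence under FIFO, LRU, and optimal (Belady) replacement.

```python
# Count misses for a 3-block cache under three replacement policies.
REQS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

def simulate(policy: str, capacity: int = 3) -> int:
    cache: list[int] = []   # kept in oldest-first order
    misses = 0
    for i, block in enumerate(REQS):
        if block in cache:
            if policy == "LRU":          # a hit refreshes recency
                cache.remove(block)
                cache.append(block)
            continue
        misses += 1
        if len(cache) == capacity:
            if policy in ("FIFO", "LRU"):
                victim = cache[0]        # oldest arrival / least recent
            else:                        # OPT: evict the block whose
                future = REQS[i + 1:]    # next use is farthest away
                victim = max(cache, key=lambda b: future.index(b)
                             if b in future else len(future))
            cache.remove(victim)
        cache.append(block)
    return misses

for p in ("FIFO", "LRU", "OPT"):
    print(p, simulate(p), "misses")  # FIFO 15, LRU 12, OPT 9
```

The miss counts (15 for FIFO, 12 for LRU, 9 for optimal) illustrate why OPT is the yardstick the other policies are measured against, even though it needs knowledge of future requests.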


Group Exercise (Cont.)

[Tables: step-by-step cache contents under FIFO and MRU replacement; the tabular layout was lost in the transcript.]

Group Exercise (Cont.)

[Tables: step-by-step cache contents under OPT and LRU replacement; the tabular layout was lost in the transcript. Both tables end with blocks 7, 0, 1 in the cache.]


Cache Write Policies

  • Cache Hit

    • Write Through

    • Write Back

  • Cache Miss

    • Write-allocate

    • Write-no-allocate


Read Policy -- Cache Miss

  • Missed block is brought to cache – required word forwarded immediately to the CPU

  • Missed block is entirely stored in the cache and the required word is then forwarded to the CPU


Pentium IV Two-Level Cache

[Figure: Processor  Cache Level 1 (L1)  Cache Level 2 (L2)  Main Memory]


Cache L1

Cache organization: Set-associative

Block size: 64 bytes

Cache L1 size: 8 KB

Number of blocks per set: Four

CPU addressing: Byte addressable
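The geometry implied by these parameters can be derived directly. This is my own arithmetic from the slide's figures, not part of the lecture:

```python
# Derived geometry of the L1 cache described above.
CACHE_BYTES = 8 * 1024   # 8 KB
BLOCK_BYTES = 64
WAYS = 4                 # blocks per set (four-way set-associative)

blocks = CACHE_BYTES // BLOCK_BYTES          # 128 blocks in total
sets = blocks // WAYS                        # 32 sets
offset_bits = BLOCK_BYTES.bit_length() - 1   # 6 bits pick a byte
index_bits = sets.bit_length() - 1           # 5 bits pick a set

print(blocks, sets, offset_bits, index_bits)  # 128 32 6 5
```

So a byte address splits into tag | 5-bit set index | 6-bit byte offset, mirroring the set-associative examples earlier in the lecture.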


CPU and Memory Interface

[Figure: CPU connected to main memory through the MAR (n address lines, addressing words 0 through 2^n − 1), the MDR (b data lines, one word of b bits), and an R/W control signal.]


Pipelining


Contents

  • Introduction

  • Linear Pipelines

  • Nonlinear pipelines


Basic Idea

  • Assembly Line

  • Divide the execution of a task among a number of stages

  • A task is divided into subtasks to be executed in sequence

  • Performance improvement compared to sequential execution


Pipeline

[Figure: a task is divided into subtasks 1, 2, ..., n; a stream of tasks then flows through the n-stage pipeline, one stage per subtask.]


5 Tasks on 4-Stage Pipeline

[Figure: space-time diagram — tasks 1 through 5 enter the 4-stage pipeline one time unit apart; all five complete after 8 time units.]


Speedup

[Figure: a stream of m tasks enters an n-stage pipeline; each stage takes time t.]

T(Seq) = n * m * t

T(Pipe) = n * t + (m - 1) * t

Speedup = (n * m) / (n + m - 1)
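A quick numeric check of the speedup formula, using the 4-stage / 5-task example from the earlier slide (the function name is my own):

```python
# Pipeline speedup: sequential time over pipelined time, in units of t.
def speedup(n: int, m: int) -> float:
    t_seq = n * m          # every task runs all n stages sequentially
    t_pipe = n + (m - 1)   # n cycles to fill, then one result per cycle
    return t_seq / t_pipe

print(speedup(4, 5))     # 2.5  (20 time units vs 8)
print(speedup(4, 1000))  # approaches n = 4 for long task streams
```

For a fixed number of stages n, the speedup tends to n as the task stream grows, which is the usual motivation for pipelining.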


Linear Pipeline

  • Processing Stages are linearly connected

  • Perform fixed function

  • Synchronous Pipeline

    • Clocked latches between Stage i and Stage i+1

    • Equal delays in all stages

  • Asynchronous Pipeline (Handshaking)


Latches

[Figure: stages S1, S2, S3 separated by clocked latches L1, L2.]

Slowest stage determines delay

Equal delays  clock period


Reservation Table

[Figure: reservation table — rows S1 through S4, columns are time steps; a mark in row Si, column j means stage Si is busy at time j.]


5 Tasks on 4 Stages

[Figure: reservation table for 5 tasks on stages S1 through S4 — task k occupies stage Si at time k + i − 1, so the last task leaves S4 at time 8.]
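The reservation table for a linear pipeline can be generated programmatically. This is my own illustration of the table the slide shows, using the rule that task k occupies stage i at time k + i − 1:

```python
# Build the space-time (reservation) table for m tasks on n stages.
def reservation_table(n_stages: int, m_tasks: int) -> list[list[int]]:
    total = n_stages + m_tasks - 1                 # total time steps
    table = [[0] * total for _ in range(n_stages)]
    for task in range(1, m_tasks + 1):
        for stage in range(1, n_stages + 1):
            table[stage - 1][task + stage - 2] = task   # time k + i - 1
    return table

for i, row in enumerate(reservation_table(4, 5), start=1):
    print(f"S{i}:", " ".join(str(x) if x else "." for x in row))
```

For 5 tasks on 4 stages this prints a staircase of task numbers spanning 8 time steps, matching the n + m − 1 term in the speedup derivation.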


Non Linear Pipelines

  • Variable functions

  • Feed-Forward

  • Feedback


3 Stages & 2 Functions

[Figure: a nonlinear pipeline with stages S1, S2, S3 supporting two functions, X and Y.]


Reservation Tables for X & Y

[Figure: two reservation tables over stages S1–S3, one for function X and one for function Y; the stage/time marks were lost in the transcript.]


Linear Instruction Pipelines

  • Assume the following instruction execution phases:

    • Fetch (F)

    • Decode (D)

    • Operand Fetch (O)

    • Execute (E)

    • Write results (W)


Pipeline Instruction Execution

[Figure: instructions flow through the stages F  D  O  E  W.]


Instruction Dependencies

  • Data Dependency

    (Operand is not ready yet)

  • Instruction Dependency

    (Branching)

    Will that Cause a Problem?

