slide1

Memory Arithmetic Unit Interface

Jason M. Meier

Justin S. Teller

Tom J. Keeley

slide2

Current Paradigm

[Figure: block diagram of the CPU, Memory Controller, and DRAM System, with a timeline of CPU (Task 1, Task 2), MEMORY CTRL, and MEMORY activity annotated "Done: Task 1"]

slide3

Active Pages Implementation

  • Used Configurable DRAM - RADRAM
  • Reconfigurable logic implements various memory functions
  • “Active Page” consists of a page of data and a set of associated functions
  • Works on individual DRAM chips
  • Processor-centric and Memory-centric partitioning

* Active Pages - Oskin, Chong, Sherwood – ISCA ’98
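The Active Page idea (a page of data plus a set of functions bound to it) can be modeled very loosely in software. The following is a minimal Python sketch under that assumption; the class and method names (`ActivePage`, `bind`, `invoke`) are hypothetical, and in the Active Pages work the functions run in reconfigurable logic beside the DRAM rather than as software callbacks.

```python
from typing import Callable, Dict, List


class ActivePage:
    """Hypothetical model: one page of data plus the functions bound to it."""

    def __init__(self, size: int) -> None:
        self.data: List[int] = [0] * size
        self.functions: Dict[str, Callable[[List[int]], object]] = {}

    def bind(self, name: str, fn: Callable[[List[int]], object]) -> None:
        # Associate a function with this page (stands in for configuring
        # the reconfigurable logic next to the page).
        self.functions[name] = fn

    def invoke(self, name: str) -> object:
        # Run the bound function on this page's own data; the caller only
        # names the function and never moves the data itself.
        return self.functions[name](self.data)


# Usage: bind a reduction and an in-place operation to one page.
page = ActivePage(4)
page.data[:] = [3, 1, 2, 0]
page.bind("sum", sum)
page.bind("sort", lambda d: d.sort())
total = page.invoke("sum")   # 6
page.invoke("sort")          # page.data is now [0, 1, 2, 3]
```

The point of the model is that the processor issues a function name, not loads and stores, which is the same division of labor the MAUI applies at the memory controller.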

slide4

MAUI Implementation

[Figure: block diagram of the CPU, Memory Controller, MAUI, MAU, and DRAM System, with a timeline of CPU (Task 1, Task 2), MEMORY CTRL/MAUI (Task 1), and MEMORY activity annotated "Done: Task 1"]

slide5

MAUI Instruction Set

[Figure: MAUI block diagram (MAU, MAUI, Memory Controller, DRAM System)]

MAUI_LD <m_rd>,offset(<cpu_rs>)

1) The CPU sends an MAU_LOAD register command to the MC (along with the register number and the address to read) across the front-side bus.

2) The MC interprets the command and places a Read command in the transaction queue.

3) The DRAM performs the read.

4) The result is stored in the appropriate register in the MAUI register file.

[Figure: MAUI_LD timeline with CPU (LOAD REG), MC/MAUI, and DRAM (R) rows; steps 1 to 4 are marked on the timeline and on the block diagram]
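The four MAUI_LD steps can be sketched as a tiny simulation. This is a minimal Python model under stated assumptions: DRAM is a word-addressed list, the MAUI register file has a hypothetical 8 entries, and `MemoryController`, `receive_maui_ld`, `service_next`, and `cpu_issue_maui_ld` are illustrative names, not part of the original design.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

NUM_MAUI_REGS = 8  # hypothetical register-file size


@dataclass
class MemoryController:
    dram: List[int]
    maui_regs: List[int] = field(default_factory=lambda: [0] * NUM_MAUI_REGS)
    # Each entry: (command, address, destination MAUI register)
    queue: Deque[Tuple[str, int, int]] = field(default_factory=deque)

    def receive_maui_ld(self, m_rd: int, address: int) -> None:
        # Step 2: interpret the command and place a Read in the transaction queue.
        self.queue.append(("R", address, m_rd))

    def service_next(self) -> None:
        # Step 3: the DRAM performs the oldest queued read.
        command, address, m_rd = self.queue.popleft()
        if command != "R":
            raise ValueError(f"unexpected command {command!r}")
        # Step 4: store the result in the MAUI register file.
        self.maui_regs[m_rd] = self.dram[address]


def cpu_issue_maui_ld(mc: MemoryController, m_rd: int, offset: int,
                      cpu_rs_value: int) -> None:
    # Step 1: the CPU forms offset(<cpu_rs>) and sends the register number
    # and the address across the front-side bus.
    mc.receive_maui_ld(m_rd, cpu_rs_value + offset)


# Usage: MAUI_LD m1, 4(cpu_rs) with cpu_rs holding 2 reads DRAM word 6.
mc = MemoryController(dram=list(range(100, 116)))
cpu_issue_maui_ld(mc, m_rd=1, offset=4, cpu_rs_value=2)
mc.service_next()
print(mc.maui_regs[1])  # 106
```

Note that the loaded value never crosses the front-side bus back to the CPU; it stays in the MAUI register file.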

slide6

MAUI Instruction Set II

MAUI_LDI <rd>,<cpu_rs>

1) The CPU sends an MAU_LOADI register command to the MC (along with the register number and the integer to store) across the front-side bus.

2) The MC interprets the command and places the integer in the appropriate register in the MAUI register file.

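[Figure: MAUI_LDI timeline with CPU (LOADI REG), MC/MAUI, and DRAM rows (the DRAM row is empty); steps 1 and 2 are marked on the MAUI block diagram (MAU, MAUI, Memory Controller, DRAM System)]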
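MAUI_LDI differs from MAUI_LD in that step 2 writes the integer directly into the register file, with no DRAM access. A minimal Python sketch of that decode step, assuming a hypothetical 8-entry register file and illustrative names (`MauiLdiCommand`, `MauiUnit`, `execute_ldi`):

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_MAUI_REGS = 8  # hypothetical register-file size


@dataclass(frozen=True)
class MauiLdiCommand:
    rd: int     # destination MAUI register number
    value: int  # integer copied from the CPU register <cpu_rs>


class MauiUnit:
    def __init__(self) -> None:
        self.regs: List[int] = [0] * NUM_MAUI_REGS
        self.transaction_queue: List[Tuple[str, int]] = []

    def execute_ldi(self, cmd: MauiLdiCommand) -> None:
        # Step 2: write the integer straight into the register file.
        # No Read or Write command is queued, so the DRAM is never touched.
        self.regs[cmd.rd] = cmd.value


# Usage: load the integer 2 into MAUI register 4.
maui = MauiUnit()
maui.execute_ldi(MauiLdiCommand(rd=4, value=2))
print(maui.regs[4], maui.transaction_queue)  # 2 []
```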

slide7

MAUI Instruction Set III

MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>

1) The CPU invalidates cached addresses that fall within the range of the destination array. Cached addresses within the range of the source arrays are written back if dirty.

2) The CPU sends an MAUI_ADD command to the MC (along with the register numbers) across the front-side bus.

3) The MC interprets the command; the MAUI adds the appropriate registers and places a Write command and the next two Read commands in the transaction queue.

4) Step 3 repeats for the length of the array.

[Figure: MAUI_ADD timeline with CPU (MAU_ADD), MC/MAUI, and DRAM (R and W commands) rows; steps 1 to 4 are marked on the MAUI block diagram (MAU, MAUI, Memory Controller, DRAM System)]
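The MAUI_ADD steps can be sketched as two functions: the CPU-side cache preparation (step 1) and the element-by-element add (steps 3 and 4). This is a simplified Python model under stated assumptions: one word per cache line, word-addressed memory, register operands passed as the base addresses they hold, and a purely sequential trace (the slides overlap the Write with the next two Reads in the queue). All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical CPU cache model: one word per line, address -> (value, dirty).
Cache = Dict[int, Tuple[int, bool]]


def prepare_cache(cache: Cache, memory: List[int],
                  rd: int, rs1: int, rs2: int, size: int) -> None:
    """Step 1: write back dirty source lines, then invalidate destination lines."""
    for base in (rs1, rs2):
        for addr in range(base, base + size):
            if addr in cache and cache[addr][1]:
                value = cache[addr][0]
                memory[addr] = value          # write back to DRAM
                cache[addr] = (value, False)  # line is now clean
    for addr in range(rd, rd + size):
        cache.pop(addr, None)                 # stale after the MAUI writes


def maui_add(memory: List[int], rd: int, rs1: int, rs2: int,
             size: int) -> List[Tuple[str, int]]:
    """Steps 3-4: add element by element, returning the DRAM command trace."""
    trace: List[Tuple[str, int]] = []
    for i in range(size):
        a = memory[rs1 + i]
        trace.append(("R", rs1 + i))
        b = memory[rs2 + i]
        trace.append(("R", rs2 + i))
        memory[rd + i] = a + b
        trace.append(("W", rd + i))
    return trace


# Usage: the CPU holds a dirty copy of word 5 (a source) and a copy of
# word 10 (a destination) before MAUI_ADD with rd=10, rs1=0, rs2=5, size 2.
memory = list(range(20))
cache: Cache = {5: (50, True), 10: (99, False)}
prepare_cache(cache, memory, rd=10, rs1=0, rs2=5, size=2)
trace = maui_add(memory, rd=10, rs1=0, rs2=5, size=2)
print(memory[10:12])  # [50, 7]
print(trace)  # [('R', 0), ('R', 5), ('W', 10), ('R', 1), ('R', 6), ('W', 11)]
print(cache)  # {5: (50, False)}
```

Writing back dirty source lines before invalidating destination lines matters when the ranges overlap: the MAUI must read the CPU's latest values, and the CPU must not later read stale destination data from its cache.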

slide9

Issues: Address Mapping

[Figure: Virtual Space mapped through the TLB to Physical Space]

Memory that is contiguous in virtual space may not be contiguous in physical space.

  • MAUI assumes consecutive addressing (size register)
  • MAUI operations that cross page boundaries must be split into separate operations for each page
  • The programmer will not know the mapping scheme
  • Result: all MAUI operations will need to be privileged instructions, accessed by programs through a system call.
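Splitting an operation at page boundaries can be sketched as follows. This minimal Python example assumes a 4 KiB page size and a plain dictionary standing in for the page table the OS would consult inside the system call; `split_by_page` is a hypothetical helper, not part of the MAUI design. An operation over several arrays would additionally need to be cut wherever any one of its arrays crosses a boundary.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes


def split_by_page(vaddr: int, length: int,
                  page_table: Dict[int, int]) -> List[Tuple[int, int]]:
    """Split a virtually contiguous range into (physical_address, length) chunks.

    page_table maps virtual page number -> physical frame number, standing in
    for the translation the TLB/OS would supply.
    """
    chunks: List[Tuple[int, int]] = []
    while length > 0:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        chunk = min(length, PAGE_SIZE - offset)  # stop at the page boundary
        paddr = page_table[vpn] * PAGE_SIZE + offset
        chunks.append((paddr, chunk))
        vaddr += chunk
        length -= chunk
    return chunks


# Usage: 200 bytes starting 96 bytes before the end of virtual page 0.
# Virtual pages 0 and 1 live in physical frames 7 and 3, so the range
# becomes two separate physical operations.
print(split_by_page(4000, 200, {0: 7, 1: 3}))  # [(32672, 96), (12288, 104)]
```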
slide10

Issues: Compiler Issues

  • The compiler will be responsible for deciding when MAUI instructions should be used.
  • This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn’t implemented in the MAUI.
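The compiler's decision can be sketched as a simple predicate over the three factors listed above. This is a hypothetical Python heuristic: the break-even size `MIN_MAUI_ELEMENTS` is an invented placeholder (the slides give no threshold), and a real compiler would estimate the boolean inputs from profiling or static analysis.

```python
from dataclasses import dataclass

# Hypothetical break-even point below which CPU execution is assumed cheaper.
MIN_MAUI_ELEMENTS = 1024


@dataclass(frozen=True)
class ArrayOpInfo:
    num_elements: int          # array length known (or estimated) at compile time
    likely_cached: bool        # operands probably already in the CPU cache
    used_by_non_maui_op: bool  # result feeds an operation the MAUI cannot run


def should_use_maui(op: ArrayOpInfo) -> bool:
    """Return True if the compiler should emit MAUI instructions for this op."""
    if op.num_elements < MIN_MAUI_ELEMENTS:
        return False  # too small to amortize the command and cache overhead
    if op.likely_cached:
        return False  # the CPU can work from the cache instead of DRAM
    if op.used_by_non_maui_op:
        return False  # the CPU will need the data soon anyway
    return True


print(should_use_maui(ArrayOpInfo(65536, False, False)))  # True
print(should_use_maui(ArrayOpInfo(65536, True, False)))   # False
```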
slide11

Issues: Task Interrupts

[Figure: task-interrupt timeline with CPU (Task 1, Task 2, Task 2), MEMORY CTRL/MAUI (Task 1, Task 2, Task 1), and MEMORY rows, beside the MAUI block diagram (MAU, MAUI, Memory Controller, DRAM System)]

slide12

Example: maui_add I

BIU: maui_ld r1, 0

MC/MAUI: maui_ld r1, 0

slide13

Example: maui_add II

BIU: maui_ld r2, 5

MC/MAUI: maui_ld r2, 5

slide14

Example: maui_add III

BIU: maui_ld r3, 10

MC/MAUI: maui_ld r3, 10

slide15

Example: maui_add IV

BIU: maui_ld r4, 2

MC/MAUI: maui_ld r4, 2

slide16

Example: maui_add V

BIU: maui_add r3, r1, r2

MC/MAUI: maui_add r3, r1, r2

Transaction Queue: R, 0; R, 5

slide17

Example: maui_add VI

BIU: Read 10

MC/MAUI: maui_add r3, r1, r2*

From Memory: D1[0]

slide18

Example: maui_add VII

BIU: Read 10

MC/MAUI: maui_add r3, r1, r2*

From Memory: D2[0]

slide19

Example: maui_add VIII

BIU: Read 10

MC/MAUI: maui_add r3, r1, r2*

Transaction Queue: R, 1; R, 6; W, 10, D1[0]+D2[0]

slide20

Example: maui_add IX

BIU: Write 6, D

MC/MAUI: maui_add r3, r1, r2*

From Memory: D1[1]

slide21

Example: maui_add X

BIU: Write 6, D

MC/MAUI: maui_add r3, r1, r2*

From Memory: D2[1]

slide22

Example: maui_add XI

MC/MAUI: Next Instruction

Transaction Queue: W, 11, D1[1]+D2[1]
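The walkthrough's transaction-queue contents can be regenerated from the register values loaded in the first four steps (r1 = 0, r2 = 5, r3 = 10, r4 = 2). A minimal Python sketch, assuming unit-stride, word-addressed arrays (as the reads R, 1 and R, 6 following R, 0 and R, 5 suggest) and that each batch is queued once the previous element's two operands have returned; `maui_add_queue_snapshots` is a hypothetical name. Under unit stride, the write for element 1 targets address 11.

```python
from typing import List, Tuple

Command = Tuple[str, int]  # ("R" or "W", address)


def maui_add_queue_snapshots(rd: int, rs1: int, rs2: int,
                             size: int) -> List[List[Command]]:
    """Transaction-queue contents after each batch of commands the MC issues."""
    # Decoding maui_add queues the reads for element 0.
    snapshots: List[List[Command]] = [[("R", rs1), ("R", rs2)]]
    for i in range(size):
        # Both operands for element i have returned: queue the reads for
        # element i + 1 (if any) together with the write for element i.
        batch: List[Command] = []
        if i + 1 < size:
            batch += [("R", rs1 + i + 1), ("R", rs2 + i + 1)]
        batch.append(("W", rd + i))
        snapshots.append(batch)
    return snapshots


# Register values from the example: r1 = 0, r2 = 5, r3 = 10, r4 = 2.
for snapshot in maui_add_queue_snapshots(rd=10, rs1=0, rs2=5, size=2):
    print(snapshot)
# [('R', 0), ('R', 5)]
# [('R', 1), ('R', 6), ('W', 10)]
# [('W', 11)]
```

Queuing the next two Reads alongside the current Write is what lets the DRAM fetch element i + 1 while element i is being stored.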

slide23

Advantages & Disadvantages

  • Advantages
    • Better performance for DRAM-latency-bound computations
    • Lower latency to DRAM than the CPU has
    • Reduced traffic on the front-side bus
    • Concurrent execution with the CPU
  • Disadvantages
    • The MAUI operates at a lower clock frequency than the CPU
    • Increased compiler complexity
    • Increased fabrication costs (more logic = more $$)
    • Recently used data may not be cached
slide24

Alternative Implementation

MAUI Occupies its Own Read & Write Bus

  • GOOD: Eliminates contention with the CPU for DRAM system resources.
  • GOOD: Creates a circular data flow, resulting in increased performance.
  • BAD: Needs a specialized triple-ported DRAM system, leading to increased production costs.

[Figure: CPU, Memory Controller, MAUI, MAU, and DRAM System, with a separate MAUI Read & Write Bus]

slide25

Test Setup

  • Simulated on SimpleScalar version 4.0
  • One set of test benches ran dual-array operations on both the MAUI and the CPU at four different array sizes. The trial was repeated for both shared and independent memory-access buses.
  • Found up to a 43% speedup!
results
Results

[Figure: chart of Total CPU Cycles]

slide27

Future Enhancements I

  • MAU Multi-tasking
  • More MAUs for Parallelism
  • Larger Register File
  • Small Cache

[Figure: block diagram with multiple MAUs, MAUI, Memory Controller, and DRAM System; multi-tasking timeline with CPU (Task 1, Task 2, Task 3, Task 3, Task 2), MEMORY CTRL/MAUI (Task 1), and MEMORY rows]

slide28

Future Enhancements II

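  • Better Pipelining
  • Larger Register File to Hold Intermediate Results

[Figure: MAUI block diagram (MAU, MAUI, Memory Controller, DRAM System) and a pipelined MAU_ADD timeline across the CPU, MC/MAUI, and DRAM rows, with eight Read (R) and four Write (W) commands in the DRAM row]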
