Memory Arithmetic Unit Interface

Jason M. Meier

Justin S. Teller

Tom J. Keeley


Current Paradigm

[Figure: timeline with rows CPU (Task 1, Task 2), MEMORY CTRL, and MEMORY, annotated "Done: Task 1"; block diagram of the CPU, Memory Controller, and DRAM System.]


Active Pages Implementation

  • Used Configurable DRAM - RADRAM

  • Reconfigurable logic implements various memory functions

  • “Active Page” consists of a page of data and a set of associated functions

  • Works on individual DRAM chips

  • Processor-centric and Memory-centric partitioning

* Active Pages - Oskin, Chong, Sherwood – ISCA ‘98


MAUI Implementation

[Figure: timeline with rows CPU (Task 1, Task 2), MEMORY CTRL/MAUI (Task 1), and MEMORY, annotated "Done: Task 1"; block diagram of the CPU, Memory Controller, MAUI, MAU, and DRAM System.]


MAUI Instruction Set

[Figure: block diagram of the MAU, MAUI, Memory Controller, and DRAM System.]

MAUI_LD <m_rd>,offset(<cpu_rs>)

1) CPU sends an MAU_LOAD register command to the MC (along with the reg # and address to read) across the front-side bus.

2) MC interprets command and places a Read command in the transaction queue.

3) DRAM performs read.

4) Result is stored in appropriate register in the MAUI register file.
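
The four steps above can be sketched as a toy software model. This is only an illustration, not the authors' hardware design: the class and function names, the register-file size, and the reading of offset(<cpu_rs>) as "offset plus the contents of CPU register cpu_rs" are all assumptions.

```python
from collections import deque


class MemoryController:
    """Toy memory controller with an attached MAUI register file."""

    def __init__(self, dram, num_regs=8):
        self.dram = dram                 # plain list standing in for DRAM
        self.maui_regs = [0] * num_regs  # MAUI register file (size assumed)
        self.queue = deque()             # transaction queue

    def receive_maui_load(self, m_rd, address):
        # Step 2: interpret the command and enqueue a Read transaction.
        self.queue.append(("R", address, m_rd))

    def service_queue(self):
        # Steps 3 and 4: DRAM performs each read and the result is
        # stored in the destination MAUI register.
        while self.queue:
            kind, address, m_rd = self.queue.popleft()
            if kind == "R":
                self.maui_regs[m_rd] = self.dram[address]


def maui_ld(mc, cpu_regs, m_rd, offset, cpu_rs):
    # Step 1: the CPU forms the address (assumed: offset + register value)
    # and sends the register number and address to the memory controller.
    mc.receive_maui_load(m_rd, offset + cpu_regs[cpu_rs])


dram = [10, 20, 30, 40]
cpu_regs = [0, 2]
mc = MemoryController(dram)
maui_ld(mc, cpu_regs, m_rd=1, offset=1, cpu_rs=1)  # address 1 + 2 = 3
mc.service_queue()
print(mc.maui_regs[1])  # 40
```

In this model the loaded value never travels back to the CPU; it stays in the MAUI register file on the memory-controller side.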

[Figure: timeline with rows CPU (LOAD REG), MC/MAUI, and DRAM (R), with markers for steps 1 to 4.]


MAUI Instruction Set II

MAUI_LDI <rd>,<cpu_rs>

1) CPU sends an MAU_LOADI register command to the MC (along with the reg # and integer to save) across the front-side bus.

2) MC interprets command and places integer in the appropriate register in the MAUI register file.
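
A minimal sketch of the two steps, to contrast with MAUI_LD: no DRAM transaction is needed because the integer travels with the command. The function name, argument layout, and register-file size are assumptions made for illustration.

```python
def maui_ldi(maui_regs, rd, cpu_regs, cpu_rs):
    """Write the integer held in CPU register cpu_rs into MAUI register rd."""
    # No Read is queued: the value arrives with the command itself.
    maui_regs[rd] = cpu_regs[cpu_rs]


maui_regs = [0] * 8      # register-file size is an assumption
cpu_regs = [0, 4096, 7]
maui_ldi(maui_regs, rd=3, cpu_regs=cpu_regs, cpu_rs=1)
print(maui_regs[3])  # 4096
```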

[Figure: timeline with rows CPU (LOADI REG), MC/MAUI, and DRAM, and a block diagram of the MAU, MAUI, Memory Controller, and DRAM System, with markers for steps 1 and 2.]


MAUI Instruction Set III

MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>

1) CPU invalidates addresses in the cache that fall within the range of the destination array. Addresses within the range of the source arrays are written back if dirty.

2) CPU sends an MAUI_ADD command to the MC (along with the reg #s) across the front-side bus.

3) MC interprets command, MAUI adds the appropriate registers and places a Write command and next two Read commands in the transaction queue.

4) Step 3 repeats for the length of the array.
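
Steps 3 and 4 can be sketched as a loop that records the DRAM transactions it issues. This is a simplified model under stated assumptions: the MAUI registers are taken to hold the base addresses of the destination and source arrays and the element count, element i of the destination is assumed to live at dst + i, and the cache maintenance of step 1 is left out. The function name and the trace format are hypothetical.

```python
def maui_add(dram, regs, rd, rs1, rs2, rsz):
    """Element-wise add of two DRAM arrays; returns the transactions issued."""
    dst, src1, src2, n = regs[rd], regs[rs1], regs[rs2], regs[rsz]
    trace = []
    for i in range(n):
        a = dram[src1 + i]          # Read first source element
        trace.append(("R", src1 + i))
        b = dram[src2 + i]          # Read second source element
        trace.append(("R", src2 + i))
        dram[dst + i] = a + b       # Write the sum to the destination
        trace.append(("W", dst + i))
    return trace


dram = [0] * 16
dram[0:2] = [1, 2]                  # first source array at address 0
dram[5:7] = [10, 20]                # second source array at address 5
regs = {1: 0, 2: 5, 3: 10, 4: 2}    # r1, r2 sources; r3 destination; r4 size
trace = maui_add(dram, regs, rd=3, rs1=1, rs2=2, rsz=4)
print(dram[10:12])  # [11, 22]
print(trace)
```

After the first pair of reads, each write is followed by the next two reads, which matches the Write-then-two-Reads pattern described in step 3.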

[Figure: timeline with rows CPU (MAU_ADD), MC/MAUI, and DRAM (W, R, R, W), and a block diagram of the MAU, MAUI, Memory Controller, and DRAM System, with markers for steps 1 to 4.]



Issues: Address Mapping

Memory that is contiguous in virtual space may not be contiguous in physical space.

  • MAUI assumes consecutive addressing (size register)

  • MAUI operations that cross page boundaries must be split into separate operations for each page

  • Programmer will not know mapping scheme

[Figure: virtual space mapped through the TLB to physical space.]

  • Result: All MAUI operations will need to be privileged instructions, accessed by programs through a system call.
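
The per-page splitting described above could look roughly like the sketch below, which a system call might run before issuing one MAUI operation per chunk. The 4096-byte page size, the function name, and the byte-granular ranges are assumptions; translating each chunk to a physical address is not shown.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def split_at_page_boundaries(vaddr, length, page_size=PAGE_SIZE):
    """Split [vaddr, vaddr + length) into chunks that each stay in one page."""
    chunks = []
    while length > 0:
        room = page_size - (vaddr % page_size)  # bytes left in this page
        n = min(room, length)
        chunks.append((vaddr, n))
        vaddr += n
        length -= n
    return chunks


print(split_at_page_boundaries(4000, 5000))
# [(4000, 96), (4096, 4096), (8192, 808)]
```

Each (start, length) pair stays within a single page, so the consecutive-addressing assumption holds for every chunk once its page has been translated.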


Issues: Compiler Issues

  • The compiler will be responsible for deciding when MAUI instructions should be used.

  • This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn’t implemented in the MAUI.
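
One way such a decision could be expressed is sketched below. The slides do not give a concrete rule, so the function name, its inputs, and the "larger than the cache" threshold are all assumptions chosen only to illustrate the three criteria.

```python
def should_use_maui(array_bytes, cache_bytes, likely_cached, consumer_in_maui):
    """Hypothetical compiler heuristic for emitting MAUI instructions."""
    if likely_cached:
        return False   # data already near the CPU; keep the work there
    if not consumer_in_maui:
        return False   # the next use needs an instruction the MAUI lacks
    # Assumed threshold: offload only arrays too large to fit in the cache.
    return array_bytes > cache_bytes


print(should_use_maui(1 << 20, 32 * 1024, False, True))   # True
print(should_use_maui(1 << 10, 32 * 1024, False, True))   # False
```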


Issues: Task Interrupts

[Figure: timeline with rows CPU (Task 1, Task 2, Task 2), MEMORY CTRL/MAUI (Task 1, Task 2, Task 1), and MEMORY; block diagram of the CPU, Memory Controller, MAUI, MAU, and DRAM System.]


Example: maui_add I

BIU

maui_ld r1, 0

Memory

maui_ld r1, 0

Transaction Queue

Memory Controller


Example: maui_add II

BIU

maui_ld r2, 5

Memory

maui_ld r2, 5

Transaction Queue

Memory Controller


Example: maui_add III

BIU

maui_ld r3, 10

Memory

maui_ld r3, 10

Transaction Queue

Memory Controller


Example: maui_add IV

BIU

maui_ld r4, 2

Memory

maui_ld r4, 2

Transaction Queue

Memory Controller


Example: maui_add V

BIU

maui_add r3, r1, r2

Memory

maui_add r3, r1, r2

Transaction Queue

R, 0

R, 5

Memory Controller


Example: maui_add VI

BIU

Read 10

Memory

maui_add r3, r1, r2*

Transaction Queue

D1[0]

Memory Controller


Example: maui_add VII

BIU

Read 10

Memory

maui_add r3, r1, r2*

Transaction Queue

D2[0]

Memory Controller


Example: maui_add VIII

BIU

Read 10

Memory

maui_add r3, r1, r2*

Transaction Queue

R, 1

R, 6

W,10, D1[0]+D2[0]

Memory Controller


Example: maui_add IX

BIU

Write 6, D

Memory

maui_add r3, r1, r2*

Transaction Queue

D1[1]

Memory Controller


Example: maui_add X

BIU

Write 6, D

Memory

maui_add r3, r1, r2*

Transaction Queue

D2[1]

Memory Controller


Example: maui_add XI

BIU

Memory

Next Instruction

Transaction Queue

W,10, D1[1]+D2[1]

Memory Controller


Advantages & Disadvantages

  • Advantages

  • Better performance for DRAM latency bound computations

  • Lower latency to DRAM compared to CPU

  • Reduced traffic on front-side bus

  • Concurrent execution

  • Disadvantages

  • MAUI operates at a lower clock frequency

  • Increased compiler complexity

  • Increased fabrication costs (More Logic = More $$)

  • Recently used data may not be cached


Alternative Implementation

MAUI Occupies its Own Read & Write Bus

  • GOOD: Eliminate contention with CPU for DRAM system resources.

  • GOOD: Create circular data flow resulting in increased performance.

  • BAD: Need specialized triple-ported DRAM system, leading to increased production costs.

[Figure: block diagram of the CPU, Memory Controller, MAUI, MAU, and DRAM System, with a separate MAUI Read & Write Bus.]


Test Setup

  • Simulated on SimpleScalar version 4.0

  • One set of test benches ran dual-array operations on both the MAUI and the CPU at four different array sizes. The trial was repeated for both shared and independent memory access buses.

  • Found up to a 43% speedup!


Results

Total CPU Cycles


Future Enhancements I

  • MAU Multi-tasking

  • More MAUs for Parallelism

  • Larger Register File

  • Small Cache

[Figure: timeline with rows CPU, MEMORY CTRL/MAUI, and MEMORY carrying task labels Task 1, Task 2, and Task 3; block diagram of the MAUs, MAUI, Memory Controller, and DRAM System.]


Future Enhancements II

  • Better Pipelining

  • Larger Register File to Hold Intermediate Results

[Figure: timeline with rows CPU (MAU_ADD), MC/MAUI, and DRAM (eight reads followed by four writes); block diagram of the MAU, MAUI, Memory Controller, and DRAM System.]

