
Memory Arithmetic Unit Interface

Jason M. Meier, Justin S. Teller, Tom J. Keeley

Presentation Transcript


  1. Memory Arithmetic Unit Interface. Jason M. Meier, Justin S. Teller, Tom J. Keeley

  2. Current Paradigm
     [Diagram labels: CPU; Done: Task 1; Memory Controller; DRAM System. Timeline rows: CPU (Task 1, Task 2), MEMORY CTRL, MEMORY.]

  3. Active Pages Implementation*
     • Used configurable DRAM (RADRAM)
     • Reconfigurable logic implements various memory functions
     • An "Active Page" consists of a page of data and a set of associated functions
     • Works on individual DRAM chips
     • Processor-centric and memory-centric partitioning
     * Active Pages: Oskin, Chong, Sherwood, ISCA '98

  4. MAUI Implementation
     [Diagram labels: CPU; Done: Task 1; MAU; MAUI; Memory Controller; DRAM System. Timeline rows: CPU (Task 1, Task 2), MEMORY CTRL/MAUI (Task 1), MEMORY.]

  5. MAUI Instruction Set
     MAUI_LD <m_rd>,offset(<cpu_rs>)
     1) CPU sends an MAU_LOAD register command to the memory controller (MC), along with the register number and the address to read, across the front-side bus.
     2) MC interprets the command and places a Read command in the transaction queue.
     3) DRAM performs the read.
     4) Result is stored in the appropriate register in the MAUI register file.
     [Diagram labels: MAU; MAUI; DRAM System; Memory Controller. Timing rows: CPU (LOAD REG), MC/MAUI, DRAM (R), annotated with steps 1 to 4.]

  6. MAUI Instruction Set II
     MAUI_LDI <rd>,<cpu_rs>
     1) CPU sends an MAU_LOADI register command to the MC, along with the register number and the integer to save, across the front-side bus.
     2) MC interprets the command and places the integer in the appropriate register in the MAUI register file.
     [Diagram labels: MAU; MAUI; DRAM System; Memory Controller. Timing rows: CPU (LOADI REG), MC/MAUI, DRAM, annotated with steps 1 and 2.]
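The two load instructions on slides 5 and 6 can be pictured with a small software model. This is only a sketch under stated assumptions: the class name `MauiModel`, the eight-entry register file, and the list standing in for DRAM are all hypothetical and do not come from the slides.

```python
from collections import deque


class MauiModel:
    """Toy model of the MC/MAUI handling MAUI_LD and MAUI_LDI (hypothetical)."""

    def __init__(self, dram, num_regs=8):
        self.dram = dram              # list standing in for DRAM words
        self.regs = [0] * num_regs    # MAUI register file (size is an assumption)
        self.queue = deque()          # memory controller transaction queue

    def maui_ldi(self, rd, value):
        # MAUI_LDI step 2: the MC places the integer directly in the register file.
        self.regs[rd] = value

    def maui_ld(self, rd, address):
        # MAUI_LD step 2: the MC places a Read command in the transaction queue.
        self.queue.append(("R", address, rd))

    def drain(self):
        # MAUI_LD steps 3 and 4: DRAM performs each queued read and the
        # result is stored in the destination MAUI register.
        while self.queue:
            _, address, rd = self.queue.popleft()
            self.regs[rd] = self.dram[address]
```

Note that in this model `maui_ldi` never touches the queue, while `maui_ld` only takes effect once the queued read has been serviced, which mirrors the difference between the two step lists.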

  7. MAUI Instruction Set III
     MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>
     1) CPU invalidates addresses in the cache that fall within the range of the destination array. Addresses within the range of the source arrays are written back if dirty.
     2) CPU sends an MAUI_ADD command to the MC, along with the register numbers, across the front-side bus.
     3) MC interprets the command; the MAUI adds the appropriate registers and places a Write command and the next two Read commands in the transaction queue.
     4) Step 3 repeats for the length of the array.
     [Diagram labels: CPU; MAU; MAUI; DRAM System; Memory Controller. Timing rows: CPU (MAU_ADD), MC/MAUI, DRAM (W R R W), annotated with steps 1 to 4.]
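Step 1 of MAUI_ADD, the cache preparation the CPU does before handing the operation to the memory controller, might look roughly like the sketch below. The 64-byte line size, the dictionary-based cache, and the function names are assumptions made for illustration; the slides do not describe the cache organization.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def lines_in_range(base, nbytes, line_size=LINE_SIZE):
    """Return the cache line numbers touched by [base, base + nbytes)."""
    if nbytes <= 0:
        return range(0)
    first = base // line_size
    last = (base + nbytes - 1) // line_size
    return range(first, last + 1)


def prepare_cache(cache, dst, src1, src2, nbytes):
    """Prepare a toy cache for an in-memory add.

    cache maps line number -> "clean" or "dirty" (hypothetical model).
    Dirty source lines are written back; destination lines are invalidated.
    Returns the list of line numbers that were written back.
    """
    written_back = []
    for src in (src1, src2):
        for line in lines_in_range(src, nbytes):
            if cache.get(line) == "dirty":
                written_back.append(line)   # memory now holds the latest data
                cache[line] = "clean"
    for line in lines_in_range(dst, nbytes):
        cache.pop(line, None)               # CPU must refetch the result later
    return written_back
```

Writing back before invalidating matters when a source and the destination overlap: the MAUI reads from DRAM, so any dirty source data has to reach memory first.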

  8. Issues: Read & Write Locks

  9. Issues: Address Mapping
     Memory that is contiguous in virtual space may not be contiguous in physical space.
     • MAUI assumes consecutive addressing (size register)
     • MAUI operations that cross page boundaries must be split into separate operations for each page
     • Programmer will not know the mapping scheme
     • Result: all MAUI operations will need to be privileged instructions, accessed by programs through a system call.
     [Diagram labels: Virtual Space; TLB; Physical Space.]
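The page-splitting requirement above can be sketched as a small helper that a system call handler might run before issuing MAUI operations. The 4096-byte page size and the function name are assumptions; translating each chunk to a physical address through the page tables is left out.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def split_at_page_boundaries(vaddr, nbytes, page_size=PAGE_SIZE):
    """Split [vaddr, vaddr + nbytes) into chunks that each stay within one page.

    Returns a list of (start_address, length) tuples in address order.
    """
    chunks = []
    while nbytes > 0:
        room = page_size - (vaddr % page_size)  # bytes left in the current page
        length = min(room, nbytes)
        chunks.append((vaddr, length))
        vaddr += length
        nbytes -= length
    return chunks
```

Each returned chunk is physically contiguous, so it can be issued as one MAUI operation with its own base address and size.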

  10. Issues: Compiler Issues
      • The compiler will be responsible for deciding when MAUI instructions should be used.
      • This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that is not implemented in the MAUI.
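One way to picture the compiler's decision is as a simple predicate over the three factors listed above. This is purely illustrative: the slides give no thresholds, so comparing the array size against the cache size is an assumption, as are the function and parameter names.

```python
def should_use_maui(array_bytes, cache_bytes, likely_cached, next_op_in_maui):
    """Hypothetical heuristic for offloading an array operation to the MAUI.

    array_bytes     -- size of the array being operated on
    cache_bytes     -- cache capacity (assumed threshold, not from the slides)
    likely_cached   -- True if the array is probably already in the cache
    next_op_in_maui -- True if the operation that consumes the result is
                       also implemented in the MAUI
    """
    if likely_cached:
        return False      # the CPU can work on cached data directly
    if not next_op_in_maui:
        return False      # the result would have to come back to the CPU anyway
    return array_bytes > cache_bytes
```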

  11. Issues: Task Interrupts
      [Diagram labels: CPU; MAU; MAUI; DRAM System; Memory Controller. Timeline rows: CPU (Task 1, Task 2, Task 2), MEMORY CTRL/MAUI (Task 1, Task 2, Task 1), MEMORY.]

  12. Example: maui_add I
      [Diagram, labels in slide order: BIU | maui_ld r1, 0 | Memory | maui_ld r1, 0 | Transaction Queue | Memory Controller]

  13. Example: maui_add II
      [Diagram, labels in slide order: BIU | maui_ld r2, 5 | Memory | maui_ld r2, 5 | Transaction Queue | Memory Controller]

  14. Example: maui_add III
      [Diagram, labels in slide order: BIU | maui_ld r3, 10 | Memory | maui_ld r3, 10 | Transaction Queue | Memory Controller]

  15. Example: maui_add IV
      [Diagram, labels in slide order: BIU | maui_ld r4, 2 | Memory | maui_ld r4, 2 | Transaction Queue | Memory Controller]

  16. Example: maui_add V
      [Diagram, labels in slide order: BIU | maui_add r3, r1, r2 | Memory | maui_add r3, r1, r2 | Transaction Queue | R, 0 | R, 5 | Memory Controller]

  17. Example: maui_add VI
      [Diagram, labels in slide order: BIU | Read 10 | Memory | maui_add r3, r1, r2* | Transaction Queue | D1[0] | Memory Controller]

  18. Example: maui_add VII
      [Diagram, labels in slide order: BIU | Read 10 | Memory | maui_add r3, r1, r2* | Transaction Queue | D2[0] | Memory Controller]

  19. Example: maui_add VIII
      [Diagram, labels in slide order: BIU | Read 10 | Memory | maui_add r3, r1, r2* | Transaction Queue | R, 1 | R, 6 | W, 10, D1[0]+D2[0] | Memory Controller]

  20. Example: maui_add IX
      [Diagram, labels in slide order: BIU | Write 6, D | Memory | maui_add r3, r1, r2* | Transaction Queue | D1[1] | Memory Controller]

  21. Example: maui_add X
      [Diagram, labels in slide order: BIU | Write 6, D | Memory | maui_add r3, r1, r2* | Transaction Queue | D2[1] | Memory Controller]

  22. Example: maui_add XI
      [Diagram, labels in slide order: BIU | Memory | Next Instruction | Transaction Queue | W, 10, D1[1]+D2[1] | Memory Controller]
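The walk-through on slides 12 to 22 can be approximated in software. The sketch below uses the register values from the example (source arrays at addresses 0 and 5, destination at 10, length 2) and records the DRAM transactions an element-wise add would generate. It is a simplification under stated assumptions: the function name is hypothetical, element i is assumed to be written to destination address + i, and the reads for the next element are not overlapped with the current write as the queue on slide 19 suggests.

```python
def maui_add_trace(mem, dst, src1, src2, size):
    """Perform an element-wise add on mem and return the DRAM transactions.

    mem is a list standing in for DRAM; dst, src1, src2 are base addresses
    and size is the element count (the values held in the MAUI registers).
    """
    trace = []
    for i in range(size):
        a = mem[src1 + i]
        trace.append(("R", src1 + i))
        b = mem[src2 + i]
        trace.append(("R", src2 + i))
        mem[dst + i] = a + b              # assumed: element i goes to dst + i
        trace.append(("W", dst + i, a + b))
    return trace


if __name__ == "__main__":
    memory = list(range(12))              # memory[k] == k, for easy checking
    for entry in maui_add_trace(memory, dst=10, src1=0, src2=5, size=2):
        print(entry)
```

Only the final command leaves the CPU; every read and write in the trace stays between the memory controller and DRAM, which is where the reduced front-side bus traffic comes from.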

  23. Advantages & Disadvantages
      • Advantages
        • Better performance for DRAM-latency-bound computations
        • Lower latency to DRAM compared to the CPU
        • Reduced traffic on the front-side bus
        • Concurrent execution
      • Disadvantages
        • MAUI operates at a lower clock frequency
        • Increased compiler complexity
        • Increased fabrication costs (More Logic = More $$)
        • Recently used data may not be cached

  24. Alternative Implementation: MAUI Occupies Its Own Read & Write Bus
      • Good: eliminates contention with the CPU for DRAM system resources
      • Good: creates circular data flow, resulting in increased performance
      • Bad: needs a specialized triple-ported DRAM system, leading to increased production costs
      [Diagram labels: CPU; MAU; MAUI; MAUI Read & Write Bus; Memory Controller; DRAM System.]

  25. Test Setup
      • Simulated on SimpleScalar version 4.0
      • One set of test benches with dual-array operations, run on both the MAUI and the CPU with four different array sizes. This trial was repeated for both shared and independent memory access buses.
      • Found up to a 43% speedup!

  26. Results
      [Chart: Total CPU Cycles]

  27. Future Enhancements I
      • MAU multi-tasking
      • More MAUs for parallelism
      • Larger register file
      • Small cache
      [Diagram labels: MAUs; MAUI; DRAM System; Memory Controller. Timeline rows: CPU (Task 1, Task 2, Task 3, Task 3, Task 2), MEMORY CTRL/MAUI (Task 1), MEMORY.]

  28. Future Enhancements II
      • Better pipelining
      • Larger register file to hold intermediate results
      [Diagram labels: MAU; MAUI; DRAM System; Memory Controller. Timing rows: CPU (MAU_ADD), MC/MAUI, DRAM (R R R R R R R R W W W W).]
