
SCIMA: Software Controlled Integrated Memory Architecture for HPC



Presentation Transcript


  1. SCIMA: Software Controlled Integrated Memory Architecture for HPC
     • Background
       • Memory wall problem
       • Conventional cache is not well suited to HPC
         • unwanted line conflicts
         • fixed size of Off-Chip Memory access
     • Solution: SCIMA (Software Controlled Integrated Memory Architecture)
       • strategy: software controllability
       • addressable On-Chip Memory in addition to conventional cache
       • On-Chip Memory and cache are reconfigurable
       • explicit data transfer between On-Chip Memory and Off-Chip Memory by page-load/page-store instructions
       • burst transfer and stride transfer are supported
     • Advantages of SCIMA
     • Overview of SCIMA (figure)
       • Schematic View: ALU, FPU, register, L1 cache, On-Chip Mem., NIA, Network, Memory (DRAM)
       • Address Space: On-Chip Mem., cache
     • Legend: Tb: CPU busy time; Tl: Latency stall time; Tt: Throughput stall time
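The explicit transfers listed above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the SCIMA instruction set: the names `OnChipMemory`, `page_load`, `page_store`, and `cache_words_fetched`, the argument order, and the 4-word cache line are all hypothetical. It shows a stride transfer packing one matrix column into on-chip memory, and counts how many words a line-based cache would fetch for the same accesses.

```python
from dataclasses import dataclass, field

LINE_WORDS = 4  # assumed cache line size in words (hypothetical value)


@dataclass
class OnChipMemory:
    """Software model of an addressable on-chip memory, measured in words."""
    size: int
    data: list = field(init=False)

    def __post_init__(self):
        self.data = [0.0] * self.size


def _check(length, start, block, stride, count, what):
    """Reject transfers that would fall outside a memory of `length` words."""
    if block < 1 or count < 1 or stride < block:
        raise ValueError("need block >= 1, count >= 1, stride >= block")
    if start < 0 or start + (count - 1) * stride + block > length:
        raise ValueError(f"{what} range out of bounds")


def page_load(ocm, dst, offchip, src, block, stride, count):
    """Copy `count` blocks of `block` words from off-chip memory, whose starts
    are `stride` words apart, and pack them contiguously at `ocm.data[dst]`.
    A burst transfer is the special case stride == block.
    Returns the number of words moved."""
    _check(len(offchip), src, block, stride, count, "off-chip")
    _check(len(ocm.data), dst, block, block, count, "on-chip")
    for i in range(count):
        s, d = src + i * stride, dst + i * block
        ocm.data[d:d + block] = offchip[s:s + block]
    return block * count


def page_store(ocm, src, offchip, dst, block, stride, count):
    """Inverse of page_load: scatter packed on-chip words back off-chip."""
    _check(len(offchip), dst, block, stride, count, "off-chip")
    _check(len(ocm.data), src, block, block, count, "on-chip")
    for i in range(count):
        s, d = src + i * block, dst + i * stride
        offchip[d:d + block] = ocm.data[s:s + block]
    return block * count


def cache_words_fetched(src, block, stride, count, line_words=LINE_WORDS):
    """Words a line-based cache would fetch for the same accesses,
    assuming every touched line is fetched whole exactly once."""
    lines = set()
    for i in range(count):
        start = src + i * stride
        for w in range(start, start + block):
            lines.add(w // line_words)
    return len(lines) * line_words


if __name__ == "__main__":
    n = 8  # an 8x8 row-major matrix stored off-chip
    offchip = [float(i) for i in range(n * n)]
    ocm = OnChipMemory(size=16)
    moved = page_load(ocm, 0, offchip, src=0, block=1, stride=n, count=n)
    print(ocm.data[:n])                               # column 0, packed
    print(moved, cache_words_fetched(0, 1, n, n))     # words moved vs. cache
```

In this toy setting the stride transfer moves 8 words for the column, while the modelled cache fetches 32 words because each access touches a different 4-word line. That gap is the "fixed size of Off-Chip Memory access" problem the slide attributes to conventional caches.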

  2. • SCIMA provides various data placement and utilization schemes according to the characteristics of data access
       • Data placement by consecutive-ness and reusability:
         • consecutive, not-reusable: use On-Chip Mem. as a stream buffer
         • consecutive, reusable: reserve On-Chip Mem. for reused data
         • stride, not-reusable: use On-Chip Mem. as a stream buffer
         • stride, reusable: reserve On-Chip Mem. for reused data
         • irregular, not-reusable: use cache
         • irregular, reusable: reserve On-Chip Mem. for reused data
       • Legend:
         • latency-stall reduction by burst transfer
         • latency & throughput-stall reduction by stride transfer
         • throughput-stall reduction by software controllability
       • Latency/Throughput stall is reduced for a wide variety of data access patterns
     SCIMA: Experimental Results
     • Evaluation Results (figures: FT and QCD, each at Throughput ratio = 2:1, Latency = 40 and at Throughput ratio = 8:1, Latency = 160, the future technology trend)
       • Throughput Ratio = Ratio between on-chip and off-chip memory throughput
       • Latency = Memory access latency for off-chip memory (latency for the first data)
     • SCIMA is robust to a large throughput ratio and long memory access latency caused by the current technology trend of the CPU-memory speed gap
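The placement scheme on this slide can be written as a small decision function. This is a sketch, not code from the SCIMA authors: the names `Pattern` and `placement` are hypothetical, and it assumes one reading of the flattened table in which reusable data is always placed in reserved On-Chip Memory, including the irregular case.

```python
from enum import Enum


class Pattern(Enum):
    """Consecutive-ness of a data access stream (names are illustrative)."""
    CONSECUTIVE = "consecutive"
    STRIDE = "stride"
    IRREGULAR = "irregular"


def placement(pattern: Pattern, reusable: bool) -> str:
    """Return the placement suggested for one array or data stream.

    Assumption: reusable data is reserved in On-Chip Memory regardless of
    its access pattern; only non-reusable irregular data goes to the cache.
    """
    if reusable:
        return "reserve On-Chip Mem. for reused data"
    if pattern is Pattern.IRREGULAR:
        return "use cache"
    # Non-reusable consecutive or stride data is streamed through
    # On-Chip Memory via burst or stride transfers.
    return "use On-Chip Mem. as a stream buffer"


if __name__ == "__main__":
    for p in Pattern:
        for r in (False, True):
            print(f"{p.value:12s} reusable={r!s:5s} -> {placement(p, r)}")
```

A compiler or programmer would apply such a rule per array, which is where the slide's "software controllability" comes in: the choice is made in software rather than by a fixed cache replacement policy.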
