
Design Issues for Memory Architecture in Embedded Systems



Presentation Transcript


  1. Design Issues for Memory Architecture in Embedded Systems
     Chia-Lin Yang
     Dept. of Computer Science and Information Engineering, National Taiwan University

  2. Memory Hierarchy in an Embedded System
     [Figure: block diagram of the memory hierarchy. Labels: CPU with I-cache (I-$) and D-cache (D-$); DSP with local memory or cache; DMA; IP cores with private (local) memory; shared memory; on-chip interconnection network; memory controller and DRAM memory banks; flash controller and flash-based storage (multi-banking flash memory chips).]

  3. On-Chip Memory Design Issue
     • Memory/data allocation
       • Decide the configuration of the on-chip memory architecture
       • Which data are allocated to on-chip memories?
         • Consider data access frequency and lifetime
       • How much on-chip memory is required?
       • How many DMAs are sufficient?
     • Private or shared memory?
       • Private memory: lower contention, but lower storage efficiency because of duplicated copies
       • Shared memory: higher contention, higher storage efficiency
     [Figure: CPU (I-$, D-$), DSP, and IP cores with private memory, plus shared memory, on an on-chip interconnection network.]
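The allocation question on this slide (which data go on chip, given access frequency) can be made concrete with a small greedy sketch. Everything here is an assumption for illustration: the `DataObject` type, the profiled `accesses` counts, and the "most accesses per byte first" heuristic are hypothetical and are not taken from the presentation. Data lifetime, which the slide also mentions, is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of one data object (array, buffer, table). */
typedef struct {
    const char *name;
    unsigned size_bytes;
    unsigned long accesses;   /* profiled access count (assumed input) */
    int on_chip;              /* set to 1 by allocate_on_chip() if placed on chip */
} DataObject;

/* Sort by access density (accesses per byte), highest first.
 * Cross-multiplication avoids a floating-point division. */
static int by_density_desc(const void *pa, const void *pb) {
    const DataObject *a = pa, *b = pb;
    unsigned long long lhs = (unsigned long long)a->accesses * b->size_bytes;
    unsigned long long rhs = (unsigned long long)b->accesses * a->size_bytes;
    return (lhs < rhs) - (lhs > rhs);
}

/* Greedily place the densest objects into on-chip memory.
 * Reorders objs[] and returns the number of on-chip bytes used. */
unsigned allocate_on_chip(DataObject *objs, size_t n, unsigned capacity_bytes) {
    unsigned used = 0;
    assert(objs != NULL || n == 0);
    qsort(objs, n, sizeof objs[0], by_density_desc);
    for (size_t i = 0; i < n; i++) {
        objs[i].on_chip = (objs[i].size_bytes <= capacity_bytes - used);
        if (objs[i].on_chip)
            used += objs[i].size_bytes;
    }
    return used;
}
```

Sweeping `capacity_bytes` over candidate sizes gives a rough feel for how much on-chip memory a workload can usefully fill, which is one way to approach the slide's sizing question.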

  4. On-Chip Memory Issue (cont’d)
     • Low-leakage memory
       • State-destroying technique (gated-Vdd)
       • State-preserving technique (drowsy mode)
     • When to turn a memory line into a low-leakage mode
       • Hardware-controlled cache
         • Periodically turn off a cache line, or
         • Turn off a cache line when it has not been accessed for a period of time
       • Software-managed addressable memory
         • Data lifetime analysis at compile time
     [Figure, state-destroying technique: SRAM cell with bitlines, Vdd, virtual Gnd, and a gated-Vdd control transistor to Gnd.]
     [Figure, state-preserving technique: drowsy SRAM line with drowsy bit, voltage controller (1V / 0.3V power line), word line driver, row decoder, wordline gate, and drowsy (set) / wake-up (reset) signals.]
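The hardware-controlled policy "turn off a cache line when it is not accessed for a period of time" can be sketched with one idle counter per line. This is only an illustration under assumed parameters: `NUM_LINES`, `DECAY_TICKS`, and the function names are hypothetical, and the sketch does not distinguish state-destroying from state-preserving modes (a state-destroying design would also need to refetch the line on wake-up).

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LINES   4   /* hypothetical cache size, in lines */
#define DECAY_TICKS 3   /* hypothetical idle threshold, in timer ticks */

typedef struct {
    unsigned idle_ticks[NUM_LINES];  /* ticks since the last access */
    bool low_leakage[NUM_LINES];     /* true while the line is in low-leakage mode */
} DecayCache;

void decay_init(DecayCache *c) {
    for (unsigned i = 0; i < NUM_LINES; i++) {
        c->idle_ticks[i] = 0;
        c->low_leakage[i] = false;
    }
}

/* Access a line: wake it if needed and restart its idle counter.
 * Returns true if the line had to be woken up. */
bool decay_access(DecayCache *c, unsigned line) {
    assert(line < NUM_LINES);
    bool was_low = c->low_leakage[line];
    c->low_leakage[line] = false;
    c->idle_ticks[line] = 0;
    return was_low;
}

/* Called on every timer tick: lines idle for DECAY_TICKS go to low-leakage mode. */
void decay_tick(DecayCache *c) {
    for (unsigned i = 0; i < NUM_LINES; i++) {
        if (!c->low_leakage[i] && ++c->idle_ticks[i] >= DECAY_TICKS)
            c->low_leakage[i] = true;
    }
}
```

The threshold trades leakage savings against wake-up penalties: a small `DECAY_TICKS` saves more leakage but wakes lines more often.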

  5. DRAM Management
     [Figure: the memory hierarchy block diagram of slide 2, repeated. Labels: general-purpose processor with I-cache and D-cache; DSP with local memory or cache; DMA; IP cores with private (local) memory; shared memory; on-chip interconnection network; memory controller and DRAM memory banks; flash controller and flash-based storage (multi-banking flash memory chips).]

  6. Memory Controller Design Issue
     • Challenges in memory controller design for MPSoCs
       • Concurrent main memory accesses with different access patterns
         • Multiple streams, random accesses, etc.
     • Limitations of conventional DRAM controllers
       • Unaware of DRAM status
       • Lack of control over the bandwidth allocation for different PEs
       • Significant access latencies due to fair scheduling policies
     [Figure: memory controller (request buffers, memory access scheduler) driving DRAM banks 0 to N; each bank has a row decoder, a column decoder, and sense amplifiers (row buffer).]

  7. Smart Memory Controller Design
     • Stream prefetching
       • Identify streams at runtime or compile time, and perform stream prefetching
     • Address pre-computation
       • Multimedia processing units usually have regular address patterns
       • 1-D linear address generator: audio codec
       • 2-D block-based address generator: MPEG-2 motion compensation, DCT
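The two address generators named on this slide can be sketched as plain functions. The names `linear_addr` and `block_addrs`, the byte-addressed layout, and the one-byte-per-pixel frame with a fixed `pitch` are assumptions made for the example; a real controller would implement the same arithmetic in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1-D linear generator: address of the i-th element of a stream
 * (e.g. consecutive audio samples), given a base and a byte stride. */
uint32_t linear_addr(uint32_t base, uint32_t stride, uint32_t i) {
    return base + i * stride;
}

/* 2-D block generator: addresses of a block_w x block_h block whose
 * top-left corner is (x, y) in a frame with `pitch` bytes per row and
 * one byte per pixel (assumed). Writes at most `cap` addresses into
 * out[] in row-major order and returns how many were written. */
size_t block_addrs(uint32_t base, uint32_t pitch, uint32_t x, uint32_t y,
                   uint32_t block_w, uint32_t block_h,
                   uint32_t *out, size_t cap) {
    size_t n = 0;
    assert(out != NULL);
    for (uint32_t r = 0; r < block_h; r++) {
        for (uint32_t c = 0; c < block_w; c++) {
            if (n == cap)
                return n;
            out[n++] = base + (y + r) * pitch + (x + c);
        }
    }
    return n;
}
```

Because both patterns are fully determined by a handful of parameters, the controller can compute upcoming addresses ahead of the processing unit's requests, which is what makes pre-computation possible.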

  8. Smart Memory Controller Design (cont’d)
     • Row buffer management
       • Close page policy
         • Precharge as soon as possible
         • Good for random accesses
       • Open page policy
         • Precharge as late as possible
         • Good for accesses with high locality
     • Close page or open page policy?
       • Different access patterns within and among tasks => dynamic row buffer management
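One way to picture dynamic row buffer management is a per-bank predictor that chooses between the two policies after every access. The sketch below is an assumption, not the scheme from the presentation: it uses a hypothetical 2-bit saturating counter, borrows the occupancies from the access-scheduling slide (precharge 3, activation 3, column access 1 cycle), and assumes an early precharge is hidden in idle time.

```c
#include <assert.h>
#include <stdbool.h>

enum { T_PRECHARGE = 3, T_ACTIVATE = 3, T_COLUMN = 1 };  /* cycles, assumed */

typedef struct {
    int open_row;  /* -1 when the bank is precharged (no row open) */
    int last_row;  /* row of the previous access, -1 if none */
    int counter;   /* 2-bit saturating counter: >= 2 predicts row reuse */
} Bank;

void bank_init(Bank *b) {
    b->open_row = -1;
    b->last_row = -1;
    b->counter = 2;
}

/* Serve one access and return the cycles on its critical path. */
int bank_access(Bank *b, int row) {
    int cycles;
    assert(row >= 0);
    if (b->open_row == row)
        cycles = T_COLUMN;                             /* row buffer hit */
    else if (b->open_row == -1)
        cycles = T_ACTIVATE + T_COLUMN;                /* bank already precharged */
    else
        cycles = T_PRECHARGE + T_ACTIVATE + T_COLUMN;  /* row conflict */

    /* Train the predictor on whether this access reused the previous row. */
    bool same_row = (b->last_row == row);
    if (same_row && b->counter < 3)
        b->counter++;
    if (!same_row && b->counter > 0)
        b->counter--;
    b->last_row = row;

    /* Open-page if reuse is predicted, otherwise close-page (precharge early). */
    b->open_row = (b->counter >= 2) ? row : -1;
    return cycles;
}
```

With these occupancies a pure open-page policy costs 1 cycle on a hit and 7 on a conflict, while a pure close-page policy always costs 4, so open page pays off only when more than half of the accesses hit the open row. The counter is a cheap way to track which side of that line a bank is on.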

  9. Smart Memory Controller Design (cont’d)
     • Memory access scheduling
       • Schedule accesses to different banks at the same time
         • Provide high utilization of DRAM bandwidth
       • Schedule accesses according to the state of DRAM
     [Figure: DRAM operation timeline over cycles 1 to 17 for the references (bank, row, column) = (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 3). DRAM operations: P = bank precharge (3-cycle occupancy), A = row activation (3-cycle occupancy), C = column access (1-cycle occupancy). With access scheduling: 17 DRAM cycles.]
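The benefit of scheduling according to DRAM state can be illustrated with a deliberately simplified model. The sketch assumes a single bank, fully serialized operations, and a bank that initially holds an unrelated open row; the function names and the "oldest row hit first, otherwise oldest request" rule are illustrative choices, not the presentation's scheduler. Because of these simplifications, the cycle counts it produces need not match the figure's timeline, and bank-level parallelism is not modeled.

```c
#include <assert.h>
#include <stddef.h>

enum { T_PRECHARGE = 3, T_ACTIVATE = 3, T_COLUMN = 1 };  /* occupancies from the slide */
#define MAX_REFS 64                                      /* hypothetical queue bound */

typedef struct { int bank, row, col; } Ref;  /* bank is ignored: single-bank model */

/* Serve one reference: a row change costs precharge + activation first. */
static int serve(int *open_row, int row) {
    int cycles = T_COLUMN;
    if (*open_row != row) {
        cycles += T_PRECHARGE + T_ACTIVATE;
        *open_row = row;
    }
    return cycles;
}

/* Total cycles when references are served strictly in arrival order. */
int cycles_in_order(const Ref *refs, size_t n, int initial_row) {
    int open_row = initial_row, total = 0;
    for (size_t i = 0; i < n; i++)
        total += serve(&open_row, refs[i].row);
    return total;
}

/* Total cycles when the oldest pending row hit is served first,
 * falling back to the oldest pending reference. */
int cycles_row_hit_first(const Ref *refs, size_t n, int initial_row) {
    int done[MAX_REFS] = {0};
    int open_row = initial_row, total = 0;
    assert(n <= MAX_REFS);
    for (size_t served = 0; served < n; served++) {
        size_t pick = n;
        for (size_t i = 0; i < n; i++) {
            if (done[i])
                continue;
            if (pick == n)
                pick = i;                   /* oldest pending reference */
            if (refs[i].row == open_row) {  /* oldest pending row hit wins */
                pick = i;
                break;
            }
        }
        done[pick] = 1;
        total += serve(&open_row, refs[pick].row);
    }
    return total;
}
```

For the four references on the slide, this model serves rows in the order 0, 1, 0, 1 when in order and 0, 0, 1, 1 when reordered, so the reordered schedule pays for two row changes instead of four.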

  10. Smart Memory Controller Design (cont’d)
     • Task-aware scheduling policy
       • Schedule the accesses of a task together
     [Figure: timelines of computation time and memory access time for task1 and task2, comparing memory contention (memory access interference between the tasks) with task-aware scheduling.]

  11. Power Management in DRAMs
     • DDR/DDR2 power management
       • Four different power modes
         • Active standby, active powerdown, precharge standby, precharge powerdown
       • State transition events
         • CKE: clock enable signal
           • CKE must be high to serve requests
         • Sense amplifiers with or without data
     [Figure: DDR power states. Active: 1.0x mW. Active standby: 0.2x mW (data in sense amplifiers). Precharge standby: 0.18x mW (data not in sense amplifiers). Active powerdown: 0.11x mW. Precharge powerdown: 0.04x mW (data not in sense amplifiers). Transitions between standby and powerdown states are labeled CKE low and CKE high.]
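The power states and CKE transitions on this slide map naturally onto a small state machine. The sketch assumes the figure's labels pair up as: active 1.0x, active standby 0.2x, precharge standby 0.18x, active powerdown 0.11x, precharge powerdown 0.04x, with CKE low moving a standby state to its matching powerdown state and CKE high moving it back. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACTIVE,
    ACTIVE_STANDBY,
    ACTIVE_POWERDOWN,
    PRECHARGE_STANDBY,
    PRECHARGE_POWERDOWN,
    NUM_STATES
} DdrState;

/* Power relative to the active state, as read from the slide's figure. */
static const double REL_POWER[NUM_STATES] = {1.0, 0.2, 0.11, 0.18, 0.04};

double relative_power(DdrState s) {
    assert(s >= ACTIVE && s < NUM_STATES);
    return REL_POWER[s];
}

/* React to the clock-enable signal (assumed transition structure). */
DdrState on_cke(DdrState s, bool cke_high) {
    switch (s) {
    case ACTIVE_STANDBY:      return cke_high ? s : ACTIVE_POWERDOWN;
    case PRECHARGE_STANDBY:   return cke_high ? s : PRECHARGE_POWERDOWN;
    case ACTIVE_POWERDOWN:    return cke_high ? ACTIVE_STANDBY : s;
    case PRECHARGE_POWERDOWN: return cke_high ? PRECHARGE_STANDBY : s;
    default:                  return s;  /* ACTIVE: CKE stays high while serving */
    }
}

/* CKE must be high to serve requests, so powerdown states cannot serve. */
bool can_serve(DdrState s) {
    return s != ACTIVE_POWERDOWN && s != PRECHARGE_POWERDOWN;
}
```

A controller model built on this would add the precharge and activate commands that move a bank between the active and precharge families of states.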

  12. DRAM Power Reduction Strategy
     • Open page vs. close page
       • Row buffer hit
         • Open page is more energy-efficient than close page
         • Reduces row access power
       • Row buffer miss
         • Open page is less energy-efficient than close page
         • Energy is wasted by staying in the active standby mode before the next row access
     • Close page or open page policy?
       • Different access patterns within and among tasks => dynamic row buffer management
     [Figure: DDR power states: active, active standby, precharge standby, active powerdown, precharge powerdown.]

  13. DRAM Power Reduction Strategy (cont’d)
     • Increasing the idle period
       • Schedule the operations to active banks first
       • Request batching
         • Clustering the requests in the memory controller
       • Memory access pattern reshaping
         • Compiler approach: array interleaving (fine-grain power-aware data allocation)
         • Run-time approach: popularity layout
     Sample code:
         for (i = 0; i < n; i++) {
             x += A[i] + B[i];
         }
     [Figure: temporal concentration of idle time. Array interleaving changes how the elements of A and B are laid out across memory devices D1 and D2.]
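Array interleaving can be sketched for the slide's sample loop by merging `A` and `B` into one array of pairs, so that `A[i]` and `B[i]` sit next to each other in memory. The `Pair` type and function names are hypothetical, and the sketch performs the transformation by hand, whereas the slide describes it as a compiler approach.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout: A and B are separate arrays, possibly in different
 * memory devices, so the loop keeps both devices busy. */
long sum_separate(const int *A, const int *B, size_t n) {
    long x = 0;
    for (size_t i = 0; i < n; i++)
        x += A[i] + B[i];
    return x;
}

/* Interleaved layout: element i of A and of B are adjacent. */
typedef struct { int a; int b; } Pair;

void interleave(const int *A, const int *B, Pair *AB, size_t n) {
    assert(AB != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        AB[i].a = A[i];
        AB[i].b = B[i];
    }
}

/* Same computation on the interleaved array: one contiguous stream. */
long sum_interleaved(const Pair *AB, size_t n) {
    long x = 0;
    for (size_t i = 0; i < n; i++)
        x += AB[i].a + AB[i].b;
    return x;
}
```

With the interleaved layout the loop walks a single contiguous region, so at any time its accesses concentrate on one device and the other can stay idle longer, which is the idle-period effect the slide is after.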

  14. NTU-CoSim for Memory System Evaluation
     Simulation Methodology
     [Figure: simulation platform. Labels: embedded processors (application running on an ISS with micro-architectural modeling), DMA, SRAM, and IP #1 to IP #N (RTL / behavior models), each behind a SystemC wrapper; interconnection modeled at the transactional level in SystemC; off-chip bus wrapper; DRAM controller; off-chip bus; DRAM-Sim.]

  15. Back-up Slides


  17. Shared/Private Memory Allocation
     • Customize the on-chip memory configuration by capturing the privately accessed and shared data across processors
     [Figure: hybrid memory architecture. PE 0 to PE 3 and data A to F, shown before and after allocation, with an interconnection network between the PEs and the shared data.]
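Capturing privately accessed versus shared data can be sketched as a classification over per-object access masks. The bitmask encoding (bit `k` set when PE `k` touches the object), the enum, and the function names are assumptions for illustration; how the masks are obtained, for example by profiling, is outside the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { DATA_UNUSED, DATA_PRIVATE, DATA_SHARED } Placement;

/* Classify a data object from the set of PEs that access it. */
Placement classify(uint32_t pe_mask) {
    if (pe_mask == 0)
        return DATA_UNUSED;
    /* Exactly one bit set means a single PE uses the object. */
    return (pe_mask & (pe_mask - 1)) == 0 ? DATA_PRIVATE : DATA_SHARED;
}

/* For a privately accessed object, return the index of the owning PE. */
int owner_pe(uint32_t pe_mask) {
    int pe = 0;
    assert(classify(pe_mask) == DATA_PRIVATE);
    while ((pe_mask & 1u) == 0) {
        pe_mask >>= 1;
        pe++;
    }
    return pe;
}
```

Objects classified as private would go to the owning PE's private memory without duplication, and only the shared ones would need space in the shared memory behind the interconnection network.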

  18. NTU-CoSim for Memory System Evaluation
     • System configuration
       1. CPU frequency, voltage
       2. Cache architecture (associativity, line size, capacity)
       3. Interconnection
       4. SDRAM configuration (RAS, CAS, etc.)
     • Features of NTU-CoSim
       • Cycle-accurate power/performance model
       • Detailed SDRAM model
       • Tunable simulation platform for easy design space exploration
     • Power/performance monitor tool
       • Power/performance breakdown
       • I/D cache miss rate
       • Component activation
       • Bus utilization/contention
     [Figure: design flow. Labels: Architecture Specification; HW/SW Partition; Parameterized IP (f, w); SW in C; RTL Level; Timed-Fun Level; HW/SW Co-Simulation.]

  19. Target Architecture
     [Figure: target architecture. Labels: DSP1 and DSP2 with DMA and MEM blocks; IP #1; IP #2; I/O 1; I/O 2; Embedded Processor1; Embedded Processor2; DMA; SRAM; on-chip bus; off-chip bus interface; DDR/SDRAM memory controller; off-chip bus; SDRAM.]

  20. Power Management Challenge
     • State transition overhead
       • Precharge powerdown to active standby ≈ 5 ns
       • Overhead is relatively small, but not negligible
         • Read ≈ 75 ns, write ≈ 85 ns
     [Figure: power versus time across the active, active standby, precharge standby, active powerdown, and precharge powerdown states, marking the 5 ns resynchronization overhead and the break-even time between idle power and low power.]
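The break-even time in the figure can be estimated with a simple energy balance. This is a sketch under stated assumptions, not the presentation's model: staying in standby for an idle period T costs `p_idle * T`, while powering down costs `p_low * (T - t_resync)` plus `p_resync * t_resync`, with the resynchronization assumed to fall inside the idle period. Setting the two equal gives `T = t_resync * (p_resync - p_low) / (p_idle - p_low)`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Idle period (ns) at which powering down costs the same energy as
 * staying in standby. Powers may be in any consistent unit. */
double break_even_ns(double p_idle, double p_low,
                     double p_resync, double t_resync_ns) {
    assert(p_idle > p_low);
    return t_resync_ns * (p_resync - p_low) / (p_idle - p_low);
}

/* Powering down saves energy only for idle periods beyond break-even. */
bool worth_powering_down(double idle_ns, double p_idle, double p_low,
                         double p_resync, double t_resync_ns) {
    return idle_ns > break_even_ns(p_idle, p_low, p_resync, t_resync_ns);
}
```

As an example, taking precharge standby (0.18x) as the idle state, precharge powerdown (0.04x) as the low-power state, and assuming the 5 ns resynchronization runs at active power (1.0x), the break-even idle period comes out at roughly 34 ns. That is of the same order as the read and write latencies on the slide, which is why the overhead cannot be ignored.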

  21. Flash-Based Storage in an Embedded System
     [Figure: the memory hierarchy block diagram of slide 2, repeated. Labels: general-purpose processor with I-cache and D-cache; DSP with local memory or cache; DMA; IP cores with private (local) memory; shared memory; on-chip interconnection network; memory controller and DRAM memory banks; flash controller and flash-based storage (multi-banking flash memory chips).]

  22. Challenges in Adopting Flash-Based Solid State Drive
     • Unique features of flash memory
       • Write-once, out-of-place update, garbage collection, limited write/erase cycles
     • The need to revisit designs that assume disk as the storage
       • e.g., virtual memory system
     • Reliability issue
       • As the technology shrinks to smaller geometries, new quality and reliability issues add to the existing flash memory reliability issues
       • Adopting ECC, wear leveling
     • Single flash chip bandwidth < disk bandwidth
       • Multiple flash chip system
     [Figure: flash memory controller between the host ("Host Inter" label) and the flash memory chips.]
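The write-once and out-of-place update properties can be sketched with a minimal page-level mapping table. The sizes, the `Ftl` type, and the function names are hypothetical; the sketch omits erase blocks, garbage collection, wear leveling, and ECC, and only shows why an update leaves an invalid page behind for a later garbage collection pass.

```c
#include <assert.h>

#define NUM_LOGICAL  4   /* hypothetical logical pages */
#define NUM_PHYSICAL 8   /* hypothetical physical pages */
#define UNMAPPED     (-1)

typedef enum { PAGE_FREE, PAGE_VALID, PAGE_INVALID } PageState;

typedef struct {
    int map[NUM_LOGICAL];           /* logical page -> physical page */
    PageState state[NUM_PHYSICAL];
    int next_free;                  /* next never-written physical page */
} Ftl;

void ftl_init(Ftl *f) {
    for (int i = 0; i < NUM_LOGICAL; i++)
        f->map[i] = UNMAPPED;
    for (int i = 0; i < NUM_PHYSICAL; i++)
        f->state[i] = PAGE_FREE;
    f->next_free = 0;
}

/* Write a logical page. A page cannot be overwritten in place, so the
 * data goes to a fresh physical page and the old copy becomes invalid.
 * Returns the physical page used, or -1 when no free page is left
 * (the point at which garbage collection would have to run). */
int ftl_write(Ftl *f, int lpn) {
    assert(lpn >= 0 && lpn < NUM_LOGICAL);
    if (f->next_free == NUM_PHYSICAL)
        return -1;
    if (f->map[lpn] != UNMAPPED)
        f->state[f->map[lpn]] = PAGE_INVALID;
    int ppn = f->next_free++;
    f->state[ppn] = PAGE_VALID;
    f->map[lpn] = ppn;
    return ppn;
}

/* Number of stale pages waiting to be reclaimed. */
int ftl_invalid_pages(const Ftl *f) {
    int n = 0;
    for (int i = 0; i < NUM_PHYSICAL; i++)
        n += (f->state[i] == PAGE_INVALID);
    return n;
}
```

Repeated updates to the same logical page consume fresh physical pages each time, which is the behavior that disk-oriented designs such as a virtual memory system do not anticipate.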

  23. Multiple flash chip system
     • Imbalance between host interface bandwidth and flash memory bus/chip bandwidth
       • Host interface: 3 Gb/s
       • Single flash chip: 10 MB/s
       • Flash memory bus: 33 MB/s
     • To increase the bandwidth of the flash subsystem
       • Parallel reads/writes
       • Data interleaving in each flash chip
     • Problem: data locality is destroyed
       • Inefficient garbage collection
       • Parallelized garbage collection
     [Figure: flash memory controller connected to the host ("Host Inter" label) and, over the flash memory bus, to the flash memory chips. Parallelized garbage collection: on each flash chip, read live data from the victim block into a buffer, write the live data to free blocks, then erase the block.]
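Parallel reads and writes across chips can be sketched as round-robin striping of logical pages, together with a quick sizing calculation from the slide's bandwidth figures. The `FlashAddr` type, the function names, and the round-robin rule are assumptions for illustration; the 375 MB/s figure below is simply 3 Gb/s divided by 8 and ignores any link encoding overhead.

```c
#include <assert.h>

typedef struct {
    unsigned chip;  /* which flash chip holds the page */
    unsigned page;  /* page index within that chip */
} FlashAddr;

/* Round-robin striping: consecutive logical pages land on different
 * chips, so a sequential transfer keeps several chips busy at once. */
FlashAddr stripe(unsigned lpn, unsigned num_chips) {
    FlashAddr a;
    assert(num_chips > 0);
    a.chip = lpn % num_chips;
    a.page = lpn / num_chips;
    return a;
}

/* Number of chips needed to reach a target bandwidth (rounded up). */
unsigned chips_needed(unsigned target_mb_s, unsigned chip_mb_s) {
    assert(chip_mb_s > 0);
    return (target_mb_s + chip_mb_s - 1) / chip_mb_s;
}
```

With 10 MB/s per chip, `chips_needed(33, 10)` gives 4 chips to saturate one 33 MB/s flash memory bus, and `chips_needed(375, 10)` gives 38 chips to approach the host interface rate, which would also require several buses. The same striping is what scatters related pages across chips and makes garbage collection less efficient, as the slide notes.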
