
SystemC Simulation Based Memory Controller Optimization

SystemC Simulation Based Memory Controller Optimization. Primary Author: Ashutosh Pandey. Secondary Author(s): Nitin Gupta, Amit Garg. Presenter: Ashutosh Pandey. Company/Organization: Synopsys. Agenda: Background; Challenges – System Level; Memory Controller Architecture – An Example

Presentation Transcript


  1. SystemC Simulation Based Memory Controller Optimization
     Primary Author: Ashutosh Pandey
     Secondary Author(s): Nitin Gupta, Amit Garg
     Presenter: Ashutosh Pandey
     Company/Organization: Synopsys

  2. Agenda
     • Background
     • Challenges – System Level
     • Memory Controller Architecture – An Example
     • Optimization & Configuration
     • Requirements
     • Methodology
     • A Case Study
     • Conclusion

  3. Background
     • SDRAM controllers are an integral piece of today's System on Chip (SoC)
     • SDRAM access performance is one of the primary bottlenecks
     • The Memory Controller is responsible for optimizing SDRAM accesses
       • Across the system
       • Optimizing JEDEC interface utilization

  4. Challenges – System Level
     • Early design-space and architecture exploration
     • System level optimization for targeted use cases
     • Interconnect configuration
     • Memory hierarchy (buffers, caches, on-chip and off-chip memories)
     • Memory architecture optimization
     • Meeting bandwidth/latency requirements for each application / master in the system
     • System level architecture and design for the targeted use cases and applications
     • SDRAM hardware architecture optimization

  5. Memory Controller Architecture – An Example
     [Block diagram: A Sample Memory Controller. Labels: AXI Port, AXI IF, Port Arbiter, Scheduler, Request Multiplexer, Command Queue, SDRAM Memory Access Controller, RdData / WrRsp Demux, JEDEC IF, Programmable interface]

  6. Optimization & Configuration – Requirements
     • System level visibility (end-to-end latency/throughput)
     • Memory access correlation with system traffic
     • Visualization and analysis of memory interface activity
     • Root cause analysis for various bottlenecks / limitations
     • SDRAM architecture exploration

  7. Methodology
     • Specify system constraints such as latency, throughput or utilization
     • Simulate and analyze constraint violations
     • Analyze system characteristics to identify bottleneck(s)
     • Investigate to identify the root cause of the problem
     • Re-configure the system to address bottleneck(s)
     • Re-run / re-analyze the refined configuration until constraints are satisfied
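The loop on this slide can be sketched in a few lines of Python. This is only an illustration: `simulate` is a hypothetical stand-in for a real SystemC simulation run, the lookup table reuses the CORE0 average latencies reported in the case study slides, and the 250 ns target is an assumed constraint that the slides do not state.

```python
# CORE0 average latencies (ns) reported in the case study, keyed by
# (arbitration scheme, SDRAM page size in KB). A real flow would obtain
# these from a simulation run rather than a lookup table.
REPORTED_LATENCY_NS = {
    ("round_robin", 1): 428.0,
    ("priority", 1): 310.0,
    ("priority", 2): 222.0,
}


def simulate(config):
    """Hypothetical stand-in for one simulation run; returns latency in ns."""
    return REPORTED_LATENCY_NS[config]


def refine(candidates, target_ns):
    """Try configurations in order until the latency constraint is met."""
    history = []
    for config in candidates:
        latency = simulate(config)
        history.append((config, latency))
        if latency <= target_ns:
            return config, history
    return None, history


candidates = [("round_robin", 1), ("priority", 1), ("priority", 2)]
chosen, history = refine(candidates, target_ns=250.0)  # assumed target
for config, latency in history:
    print(config, latency)
print("chosen:", chosen)
```

In practice the candidate list is not fixed up front; each new configuration comes out of the root cause analysis of the previous run.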

  8. Memory Controller Optimization – A Case Study
     [Block diagram: CORE0 and CORE1 connect over the AXI Bus to the MEMORY CONTROLLER (AXI PORT INTERFACE (XPI), ARBITER, SCHEDULER, SDRAM INTERFACE), which drives the SDRAM (DDR3)]
     • Objective:
       • Optimize the memory controller to achieve desired latencies for CORE0
       • Optimization of throughput and utilization is also possible

  9. System Level – Latency Analysis
     [Chart: Average Duration for Read Transaction. Cumulative average duration decomposed per component]

  10. System Level – Latency Analysis
     [Charts: CORE0 memory access latency; interconnect latencies; delay in different components of the Memory Controller for transactions from CORE0; delays for memory access for CORE0]
     • Analysis Result for Round-Robin Arbitration Scheme:
       • Average delay for CORE0 transactions in Memory Controller Arbiter is 262 ns. But Arbiter alone is causing a delay of 100 ns
       • Average SDRAM access delay for CORE0 is 72 ns

  11. Priority Based Arbitration for Memory Controller Arbiter
     [Chart: delay in different components of the Memory Controller for transactions from CORE0. CORE0 memory access latency reduces from 428 ns to ~310 ns]
     • Result for Priority Based Arbitration Scheme:
       • Average delay for CORE0 transactions reduces from 428 ns to 310 ns
       • Average SDRAM access delay for CORE0 is ~68 ns, while it was 72 ns previously. But it is still 22% of the total latency
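The difference between the two arbitration schemes can be illustrated with a toy grant model. The slides do not describe the arbiter implementation, so the functions below are hypothetical; the sketch assumes two ports that request on every slot, with port 0 standing in for CORE0.

```python
def round_robin_pick(requesting, last_granted, num_ports):
    """Grant the next requesting port after last_granted, wrapping around."""
    for offset in range(1, num_ports + 1):
        port = (last_granted + offset) % num_ports
        if port in requesting:
            return port
    return None


def priority_pick(requesting, priority_order):
    """Grant the first requesting port in a fixed priority order."""
    for port in priority_order:
        if port in requesting:
            return port
    return None


requesting = {0, 1}  # both ports request on every slot (assumption)

rr_grants = []
last = 1  # start so that port 0 is considered first
for _ in range(6):
    last = round_robin_pick(requesting, last, num_ports=2)
    rr_grants.append(last)

prio_grants = [priority_pick(requesting, [0, 1]) for _ in range(6)]

print("round robin:", rr_grants)
print("priority:   ", prio_grants)
```

Under round robin, port 0 gets every other slot; under fixed priority it is never kept waiting by port 1. The flip side is that a fixed priority order can starve the lower-priority port whenever the higher-priority one requests continuously, which is why the other masters' constraints need to be rechecked after such a change.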

  12. Memory Channel Utilization Analysis for CORE0
     [Charts: detailed memory channel utilization for the entire system; COMMAND and DATA PHASE utilization for CORE0 only]
     • HIT = 8.4 %
     • MISS = 12.315 %
     • For an optimum architecture, HIT % >> MISS %

  13. Initial Inferences
     • A large percentage of accesses result in page misses, causing:
       • Increased access latency
       • Inefficient usage of the JEDEC interface
       • Higher power consumption due to increased "precharges" and "activates"
     • Possible reasons for the inefficiency could be:
       • Mapping of application addresses to memory addresses
       • Page policy
       • Page crossovers
       • Rank crossovers

  14. Memory Channel Utilization Analysis for CORE0
     [Chart: COMMAND and DATA PHASE utilization for CORE0, with the command phase divided into command setup categories by reason for MISS]
     • Most MISSES are due to transactions to the same bank but different rows

  15. Refined Inferences
     • The reason for almost all page misses in this system is transactions to the same bank but a different page
     • Possible solutions:
       • Change the traffic (not always possible)
       • Use a memory with a bigger page size
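A small page hit/miss classifier shows why a bigger page can remove same-bank, different-row misses. Everything specific here is an assumption for illustration: an open-page policy, a bank:row:column address mapping with a 1 MB bank, and a traffic pattern that alternates between two addresses 1 KB apart. The slides do not give the actual mapping or traffic.

```python
def classify(addresses, page_size, bank_size=1 << 20):
    """Count page hits and misses under an open-page policy.

    Assumed mapping (hypothetical): the bank is selected by the upper
    address bits, the row by the page index inside the bank.
    """
    open_row = {}  # bank -> row currently open in that bank
    counts = {"hit": 0, "miss": 0}
    for addr in addresses:
        bank = addr // bank_size
        row = (addr % bank_size) // page_size
        if open_row.get(bank) == row:
            counts["hit"] += 1
        else:
            # Closed bank or a different row is open: the new row
            # has to be activated before the access can proceed.
            counts["miss"] += 1
            open_row[bank] = row
    return counts


traffic = [0x000, 0x400] * 4  # alternating accesses 1 KB apart

small = classify(traffic, page_size=1024)
large = classify(traffic, page_size=2048)
print("1 KB page:", small)
print("2 KB page:", large)
```

With 1 KB pages the two addresses fall in different rows of the same bank, so every access reopens a row. With 2 KB pages both addresses share one row and only the first access, which finds the bank closed, counts as a miss.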

  16. Increasing Page Size from 1KB to 2KB
     • Command setup phase reduces drastically, from 41.262% to 15%
     • Increased page size results in the desired performance
     • Zero page misses: all memory accesses result in a page hit

  17. Effect of Increasing Page Size on Overall Delay
     • CORE0 memory access latency reduces from 310 ns to ~222 ns
     • Delay for memory access for CORE0 reduces from 68 ns to 52 ns
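The relative gains of the two refinement steps follow directly from the latencies quoted on the slides. The helper name below is made up for this sketch; the numbers are the reported CORE0 averages.

```python
def reduction_percent(before_ns, after_ns):
    """Relative latency reduction in percent."""
    return (before_ns - after_ns) / before_ns * 100.0


# CORE0 average latencies (ns) as quoted on the slides.
arbitration_gain = reduction_percent(428, 310)  # round-robin -> priority
page_size_gain = reduction_percent(310, 222)    # 1 KB -> 2 KB page
overall_gain = reduction_percent(428, 222)

# Share of the total CORE0 latency spent in the SDRAM access itself
# after switching to priority arbitration (68 ns out of 310 ns).
sdram_share = 68 / 310 * 100.0

print(f"arbitration change: {arbitration_gain:.1f}%")
print(f"page size change:   {page_size_gain:.1f}%")
print(f"overall:            {overall_gain:.1f}%")
print(f"SDRAM share:        {sdram_share:.1f}%")
```

The two steps contribute reductions of similar size (about 28% each), and together they roughly halve the CORE0 latency. The 68 ns out of 310 ns share matches the 22% figure quoted for the priority-based configuration.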

  18. Conclusions
     • System level performance analysis allows detection of performance problems
     • Detailed data path visibility allows identification of hot-spots, e.g.:
       • Arbitration scheme
       • Hit/miss ratio of the SDRAM
     • Analyzing the hot-spots allows identification of root causes
     • Systematic refinement allows creation of an optimum architecture for targeted use cases

  19. Q&A
