
Graham Hellestrand, Mahdi Seddighnazhad, James Brogan (VaST Systems Technology Corporation)

Profiles in Power: Optimizing Real-Time Systems for Power as well as Speed (IPS), Response Latency and Cost


Presentation Transcript


  1. Profiles in Power: Optimizing Real-Time Systems for Power as well as Speed (IPS), Response Latency and Cost. Graham Hellestrand, Mahdi Seddighnazhad, James Brogan, VaST Systems Technology Corporation

  2. Wireless Trends • Key Focus: Low Cost, Power Reduction and Increased Features • Competitive positions must be maintained • Product complexity is increasing • Hardware growth • Software growth • Critical Program Schedules • Market windows must be hit • Revenue opportunities must be captured • Burden has moved to design and development CONFIDENTIAL

  3. The Metric: Power • Reducing power regardless of the effect on other optimization factors is of limited value. • Example: saving 50% power while incurring a 50% speed hit and/or failing to meet response latency specifications is likely to be unacceptable in the marketplace.

  4. Implications • Real-time software architecture and development need to be subject to rigorous optimization of an appropriate objective function, based on: • Power • Speed • Event response latencies (examples: interrupts, exceptions) • Cost, approximated by: • Cache sizes • Memory sizes and hierarchies
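
The slides do not give the objective function itself, so the following is only a minimal sketch of the idea on slides 3 and 4: a composite score over power, speed and cost, with response latency treated as a hard constraint. The `Candidate` fields, the `score` function, the weights and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate design point (all fields are illustrative assumptions)."""
    name: str
    power_mw: float      # average power
    speed_mips: float    # sustained instruction rate
    latency_us: float    # worst-case event response latency
    cost_units: float    # cost proxy, e.g. cache plus memory size


def score(c, baseline, latency_limit_us, weights=(0.4, 0.4, 0.2)):
    """Return a composite score (lower is better), or None if the
    candidate misses the response latency specification."""
    if c.latency_us > latency_limit_us:
        return None  # a missed deadline cannot be traded for power
    w_power, w_speed, w_cost = weights
    return (w_power * c.power_mw / baseline.power_mw
            + w_speed * baseline.speed_mips / c.speed_mips
            + w_cost * c.cost_units / baseline.cost_units)


baseline = Candidate("baseline", 100.0, 100.0, 50.0, 10.0)
half_power = Candidate("half power, half speed", 50.0, 50.0, 50.0, 10.0)
too_slow = Candidate("misses deadline", 40.0, 90.0, 150.0, 10.0)

for cand in (baseline, half_power, too_slow):
    print(cand.name, score(cand, baseline, latency_limit_us=100.0))
```

With these made-up weights, the design that saves 50% power at a 50% speed hit scores worse than the baseline, and the design that misses the latency limit is rejected outright, which is the point of slide 3.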

  5. System Architecture & Optimization • Software Architecture • Platform Architecture • Real-world interaction architecture • Processor µ-architecture • + Empirical experimentation

  6. Architecture Addresses the Whole System [Diagram labels: Architecture; Systems; Sub-systems; Platform; Virtual Prototype; Evaluation, Exploration; Structures; Buses & Bridges; Devices; VPMs & Peripheral Devices; Software: Applications, Middleware, Comms, Operating Systems, Device Drivers; Hardware: Behav., RTL, Physical; RF, Mechanical, Physical]

  7. Optimization effect: Software Architecture & Design. 1st Order Effect on system performance

  8. Software Architecture & Design (UML, Simulink, C, C++, …) • Monitor prototype internals: • Cache hits/misses • Bus transactions • Processor performance • Memory usage • Interrupt latency • Trigger hardware and software debuggers • Example usage: analyze processor and platform power • Make intelligent tradeoffs between power, performance and cost [Diagram labels: Architecture; VSP; Hardware; Software; Create, Compile, Assemble, Link, Load; VaST VSP; Debug + Monitor; SW IDE; HW]

  9. Optimization effect: Platform Architecture & Design. 1st Order Effect on system

  10. Typical 3G Cell Phone Controller: 3 processors, 12 buses, 10 bus bridges, 70 peripherals. VaST Virtual System Prototype (model) [Diagram labels: StarCore SC1400 Virtual Processor Model with D ROM and P ROM; ARM1176 (P1) and ARM1156 (P2) Virtual Processor Models, each with I Cache, D Cache and StdBus I/Fs; AHB buses; StdBus Bridges; Arb.; DRAM Ctrl; Shared Memory, P1 Memory, P2 Memory; Memory Blocks; UARTs; Console 1, Console 2; TIMERs; INTCs; P1 Devices, P2 Devices]

  11. Optimization effect: Real-world Interaction Architecture. 1st Order Effect on system

  12. Automotive Power-train Control • Engine control unit • Real-time Engine Monitoring • Igniting fuel under pressure at the wrong part of the cylinder stroke results in spectacular destruction of the engine (and maybe the experimenter)

  13. Optimization of: Processor µ-architecture. 2nd / 3rd Order Effect (apart from caches & buffering)

  14. Generic Single Pipeline Operation

  15. Pre-Silicon System Design Process

  16. System Development Process: Concurrent, Iterative S/W and H/W Development [Diagram labels: Business Requirements; Functional Requirements; CoMET System Level Design Tool; METeor; Executable System Architecture (VSP); Executable System Specification + Virtual System Platform; Software track and Hardware track, each with Translate, Architect and Test, Design and Test, Develop and Test; Integrate & CoVerify; Silicon Hardware Platform + Embedded System Software; Integrated & Optimized Final Product]

  17. Electronic System Design Process • System architecture → Virtual Prototype (timing accurate) • Software || Hardware design → Virtual System Prototypes (high speed) • Architecture: evaluate architectures of candidate designs using real software applications • Hardware development: develop behavioral-level executable specification and verify RTL • Software development: design, develop and debug software before silicon or hardware prototypes are available

  18. So what performance can we get from a timing-accurate VSP on a single-processor host? That is, how useful are these things?

  19. VSP Computation Performance: Multiple Independent Platforms [Diagram: three identical platforms, each labelled ARM926E VPM 1 (INST, DATA), CONFIG & CONTROL, GP INTC ARM, Bridges, GP MEM, GP TIMER, GP UART, GP CONSOLE]

  20. Results: Computational Performance Study • Platform-dominated study: As Virtual System Prototypes (VSPs), whose processors have software and data resident in cache, are switched into the simulation (pink line), the sharing of host cycles between the processor and the hardware (purple line) of each VSP stays in proportion for each additional VSP activated. • The frequent switching between VSPs, each with a processor and hardware that also share the host cycles, increases the simulation overhead (blue line).

  21. VSP with TLM Bus Matrix • Application software (Vocoder) on one SC1200: on INT, shuffles data from DRAM to MemBanks • Application software (Viterbi) on the other SC1200: on INT, shuffles data from DRAM to MemBanks • DMA Traffic Generators: AHB-like transactions every 300-500 cycles, approx. 60% utilization [Diagram labels: two SC1200 processors, each with DMA Master and Core Master; Slave; OCP Channel Wrapper; DRAM (2MB); AHB; 32-bit Bridges; Mem Bank 0 to Mem Bank 5 (512KB each)]

  22. Results: Bus Matrix Performance • Communications and computation sharing study: This is a multi-variable study measuring simulation performance of a system in which transactions of various sizes (1024, 64 and 4 bytes) are transmitted at a high rate over a complex switch to which two SC1200 processors are attached. Initially no processors are activated; each is then activated in turn. • The bar chart is best read as a sequence of 3 pairs (Transaction / Headroom (MIPS)), going into the slide. • As transactions become progressively smaller, the model has relatively more work to do to transmit and receive them. • The Headroom measure is the amount of host cycles available for further simulation. As more processors are activated and the transaction size is reduced, the available headroom diminishes.

  23. Study 4: VSP Interrupt Handling (Automotive Benchmark, Feb 2004) • Capability of a VSP under interrupt loads: This is a relatively simple experiment that shows the performance of a single-processor Virtual System Prototype under increasingly stressful rates of processing asynchronous events (interrupts). • Even at high interrupt rates (an interrupt every 3,750 cycles is equivalent to a 12-cylinder engine running at 20,000 RPM and producing an interrupt every 10 degrees of crank angle), the VPM is capable of simulating high software execution rates (4 MIPS) while handling the interrupts.
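
The interrupt-rate equivalence quoted on slide 23 can be checked with simple arithmetic. The sketch below is an assumption-laden consistency check, not a figure from the slides: it assumes a single stream of crank-angle interrupts (one per 10 degrees for the whole engine, not per cylinder), and the implied target clock frequency it derives is not stated anywhere in the presentation. The function names are hypothetical.

```python
def interrupts_per_second(rpm, degrees_per_interrupt):
    """Crank-angle interrupts per second for one interrupt stream."""
    # revolutions per second = rpm / 60; interrupts per revolution = 360 / degrees
    return rpm * 360.0 / (60.0 * degrees_per_interrupt)


def implied_clock_hz(rpm, degrees_per_interrupt, cycles_between_interrupts):
    """Target clock rate at which the interrupts arrive the given
    number of processor cycles apart (derived, not from the slides)."""
    return interrupts_per_second(rpm, degrees_per_interrupt) * cycles_between_interrupts


rate = interrupts_per_second(20_000, 10)
clock = implied_clock_hz(20_000, 10, 3_750)
print(f"{rate:.0f} interrupts/s, implied clock {clock / 1e6:.0f} MHz")
```

Under that single-stream assumption, 20,000 RPM with an interrupt every 10 degrees gives 12,000 interrupts per second, and a spacing of 3,750 cycles would correspond to a 45 MHz target clock. If the benchmark instead counted interrupts per cylinder, the implied clock would scale accordingly.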

  24. Back to Building Systems

  25. It is all about optimization, stupid! [Diagram labels: Specifications; H-type Respecifier; Very Smart System Instantiator; Virtual Prototypes: 32-bit MPU, Clock Gen., Serial Comms, Interrupt Controller, A2D Convert, RAM, ROM, Flash, General I/O, Bus Interface, DMA, Virtual bus, Interrupt Timer; Software; Physical Prototype: Physical, Mechanical, RF, ..; optimization targets: Power Consumption, Asynch-Signal Response Latency, Speed]

  26. Typical 2.5G Wireless Systems built using a Virtual System Prototype

  27. Virtual Prototyping: Mobile Handset Development • Full System Development: Architecture, Software, Hardware, I/F [Diagram labels: SG2; ARM Debugger; TeakLite Debugger; Virtual COM Port; I/Q Signals]

  28. Wireless VP Benefits • Early Design Feedback in Semiconductor Development Process • Enabled 1st Pass Silicon Success • Eliminated Costly 2nd Silicon • Provided Complete Software Development Environment 9 Months Prior to Silicon • Resulted in a Better Quality Product 5 Months Earlier Than Standard Development Process • Advanced Debugging • Multi-Core debugging • ARM926 (ADS 1.2) • TeakLite* (DSP group) • Complete system visibility • S-GOLD programmer model • Bus status & Interrupt behavior • System cycle count, monitors • I/O Test Bench Support • Open Model Extension [Diagram labels: SGOLD2 Architecture; ARM Debugger; TeakLite Debugger; Keypad Test Bench; Camera Input, Camera Test Bench; LCD Display (QCIF/CIF); Win32 Terminal for all Serial IO via Virtual COM Ports; Linux OS Execution + MPEG4 Encoding]

  29. Concurrent Bus Activity

  30. Optimizing for Power and Performance: Separated Functions

  31. General Form of Multi-Objective Optimization Equation • Characterize an objective function in terms of events directly measurable from the VSP. • Problem: the volume of data is huge, and some of it may be highly correlated with other data, leading to multiple counting and unreliable composite measures.

  32. A Simple Power Function for a Full Platform

  33. Resolving the Weights for the Power Function
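
The equations for slides 32 and 33 did not survive in the transcript. As a hedged illustration only, the sketch below assumes the simplest plausible form: platform power modelled as a weighted sum of event counts measured from the VSP, with the weights resolved by ordinary least squares against reference power figures. Both the linear form and the least-squares fit are assumptions, not the authors' stated method, and all names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: event counters may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(event_counts, measured_power):
    """Least-squares weights via the normal equations (A^T A) w = A^T p.
    event_counts holds one row of per-event counts for each run."""
    k = len(event_counts[0])
    ata = [[sum(row[i] * row[j] for row in event_counts) for j in range(k)]
           for i in range(k)]
    atp = [sum(row[i] * p for row, p in zip(event_counts, measured_power))
           for i in range(k)]
    return solve(ata, atp)


def estimate_power(weights, counts):
    """Evaluate the assumed linear power function for one run."""
    return sum(w * c for w, c in zip(weights, counts))


# Hypothetical runs: columns are (instructions, bus transactions), arbitrary units.
runs = [[1, 0], [0, 1], [1, 1], [2, 1]]
power = [2.0, 0.5, 2.5, 4.5]
weights = fit_weights(runs, power)
print(weights, estimate_power(weights, [3, 2]))
```

The singular-system check is where strongly correlated counters would show up in practice: if two event types always occur together, their individual weights cannot be separated from these measurements.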

  34. Single Task Working Set vs Cache Size Analysis
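
The chart for slide 34 is not in the transcript. To illustrate what a working-set versus cache-size analysis measures, the sketch below sweeps the capacity of a fully associative LRU cache over a synthetic address trace and reports the miss rate at each size. The cache organisation, the 32-byte line size and the trace are assumptions made for illustration; a real study would take the trace from the VSP.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_lines, line_bytes=32):
    """Miss rate of a fully associative LRU cache over an address trace."""
    cache = OrderedDict()   # keys are line numbers, ordered oldest first
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(addresses)


def sweep(addresses, sizes, line_bytes=32):
    """Miss rate for each candidate cache size (in lines)."""
    return {n: miss_rate(addresses, n, line_bytes) for n in sizes}


# Synthetic task: loops ten times over a working set of 8 cache lines.
trace = [i * 32 for i in range(8)] * 10
for lines, rate in sweep(trace, [4, 8, 16]).items():
    print(f"{lines:2d} lines: miss rate {rate:.2f}")
```

For this looping trace, a cache smaller than the working set misses on every access under LRU, while any cache that holds all 8 lines only takes the initial cold misses. Enlarging the cache beyond the working set buys nothing, which is the kind of knee such an analysis looks for when trading cache size against cost and power.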

  35. Linux Boot: Memory Hierarchy Analysis (I&D cache + bus + bus bridge + Mem (DDR | SDR))

  36. Replace Cache with Simple External Buffer for a Known Task Set

  37. The Message • System optimization needs a composite, complex optimization function of functions operating on a complete (model of a) system. The constituent functions include: • Power • Speed • Response deadline compliance • Cost • … • A rigorous scientific methodology is required for empirical experimentation.
