
Optimization Of Power Consumption For An ARM7-BASED Multimedia Handheld Device

Hoseok Chang, Wonchul Lee, Wonyong Sung. Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), Volume 5, 25-28 May 2003, pages V-105 to V-108. Presenter: Chin-Chi Hu.



Presentation Transcript


  1. Optimization Of Power Consumption For An ARM7-BASED Multimedia Handheld Device • Hoseok Chang, Wonchul Lee, Wonyong Sung • Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), Volume 5, 25-28 May 2003, pages V-105 to V-108 • Presenter: Chin-Chi Hu

  2. Abstract • We have developed a multimedia handheld educational device and optimized its current consumption, not only by employing several software optimization techniques but also by using a dynamic clock frequency scaling (DFS) scheme. Although the ARM7 CPU employed does not support operating voltage scaling, controlling the operating frequency helps reduce the current consumed during idle time and yields up to 25% power reduction at the system level. The CPU operating frequency is determined by profiling the multimedia program components, which include LZW (Lempel-Ziv-Welch) image decompression, MP3 audio decoding, CELP-based speech decoding, speech recognition and ADPCM. In particular, it is shown that the time for LZW decompression is proportional to the image size rather than to the size of the compressed file. After applying DFS, the CPU load becomes almost full, between 80% and 95%.

  3. What’s the problem? • Multitasking operating system and dynamic frequency scaling • analyze the current consumption of the system • Software optimization techniques • improve the software to reduce the number of instructions and clock cycles • CPU load estimation • the CPU load for executing each software component • Results and optimization

  4. Introduction • A low-power multimedia handheld device • powered by only two AA-size batteries • The DSP programs needed to be optimized • MP3 decoding • LZW (Lempel-Ziv-Welch) decompression • speech recognition • Aspects considered • ARM7-specific features • optimization of software components • lowering the CPU clock frequency • minimizes the idle time

  5. System architecture • Speaking Partner • ARM7TDMI 60 MHz CPU • 8 KB cache • graphic LCD controller • synchronous DRAM controller • IIS interface • 8 channels of 10-bit ADC • 128 KB NOR flash for system ROM • NAND flash and SMC (smart media card) for program ROM • SSFDC (solid state floppy disk card) and USB for read/write

  6. System architecture Speaking Partner

  7. Current consumption • The CPU draws some power even when its load is very small and it is mostly in the idle state • For power reduction, it is advantageous to use the lowest possible clock frequency • This requires estimating the minimum clock frequency that still allows a real-time implementation

  8. Current consumption • The figure shows that the dynamic frequency scaling scheme is more efficient than constant-frequency operation with an idle state when the load is low

  9. Current consumption • Current consumption at each hardware block (CPU load is 10%)

  10. Software optimization • The ARM7TDMI processor has characteristics well suited to implementing DSP algorithms • large number of registers • most instructions can be executed conditionally • 32-bit barrel shifter • block load and store instructions are supported • However, the ARM7TDMI processor has a relatively simple data path, in which the hardware multiplier is only 32×8 bits wide
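
One consequence of a 32×8 multiplier is that a multiply is processed 8 bits of the multiplier operand at a time, so small operands finish sooner. The sketch below models the early-termination rule as described in ARM's ARM7TDMI documentation (1 to 4 multiplier-array cycles depending on how many upper bits of the operand are all zeros or all ones). The function name `multiplier_array_cycles` is hypothetical, and the rule is an assumption taken from that documentation, not something stated on the slide.

```python
def multiplier_array_cycles(rs: int) -> int:
    """Number of 8-bit multiplier-array cycles for multiplier operand rs.

    Assumed rule: the multiply terminates early once the remaining upper
    bits of the 32-bit operand are all zeros or all ones.
    """
    rs &= 0xFFFFFFFF  # view the operand as a 32-bit two's-complement value
    for cycles, shift in ((1, 8), (2, 16), (3, 24)):
        upper = rs >> shift
        all_ones = (1 << (32 - shift)) - 1
        if upper == 0 or upper == all_ones:
            return cycles
    return 4


# A 16-bit coefficient needs at most 2 array cycles, a full 32-bit one needs 4.
print(multiplier_array_cycles(0x7FFF))      # 2
print(multiplier_array_cycles(0x7FFFFFFF))  # 4
```

Under this model, keeping coefficients within 16 bits bounds the multiplier-array cycles at two per multiply.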

  11. Software optimization • MP3 decoding algorithm • C-language-based high-level optimization • assembly-language-based low-level optimization • optimized using the conditional execution of the ARM7TDMI processor

  12. Software optimization • Block data transfer • is used to load (LDM) or store (STM) any subset of the currently visible registers from/to sequential memory • Example: transferring 15 32-bit registers • from registers to sequential memory with a block data transfer • 14S+2N+1I cycles • from registers to memory using individual store instructions (STR) • (1S+1N+1I)×15 cycles • S: sequential cycles • N: non-sequential cycles • I: internal cycles
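
The two cycle formulas quoted on this slide can be compared numerically. The sketch below is only an illustration: it assumes that every S, N and I cycle costs exactly one clock (zero-wait-state memory), which the slide does not state, and the helper name `cycles` is hypothetical.

```python
# Assumption: each S, N and I cycle costs one clock (zero-wait-state memory).
CYCLE_COST = {"S": 1, "N": 1, "I": 1}


def cycles(s: int, n: int, i: int, repeat: int = 1) -> int:
    """Total clocks for `repeat` instructions, each taking s S-, n N- and i I-cycles."""
    per_instruction = (
        s * CYCLE_COST["S"] + n * CYCLE_COST["N"] + i * CYCLE_COST["I"]
    )
    return per_instruction * repeat


# Formulas as quoted on the slide for 15 registers.
block_transfer = cycles(s=14, n=2, i=1)            # 14S + 2N + 1I
single_stores = cycles(s=1, n=1, i=1, repeat=15)   # (1S + 1N + 1I) * 15

print(block_transfer, single_stores)  # 17 45
```

With slower non-sequential accesses, the `CYCLE_COST` entries can be raised; the block transfer then gains even more, because it issues far fewer N cycles.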

  13. Software optimization

  14. Software optimization • Optimization for speech recognition • 16-bit multiplications instead of 32-bit multiplications • 8% reduction in cycle time • several software optimization techniques were employed • loop fusion • loop unrolling • post-increment/decrement conversion • total execution time is reduced to about 30 to 45%
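
Loop fusion and loop unrolling, two of the techniques listed above, can be sketched as follows. The functions and the energy/peak computation are hypothetical examples, not taken from the authors' speech recognition code, and Python is used only to show the loop structure; the cycle savings the slide reports would come from applying the same transformations in C or assembly on the ARM7TDMI.

```python
def energy_and_peak_separate(samples):
    """Baseline: two passes over the same data."""
    energy = 0
    for x in samples:
        energy += x * x
    peak = 0
    for x in samples:
        if abs(x) > peak:
            peak = abs(x)
    return energy, peak


def energy_and_peak_fused(samples):
    """Loop fusion: both results are computed in a single pass."""
    energy = 0
    peak = 0
    for x in samples:
        energy += x * x
        if abs(x) > peak:
            peak = abs(x)
    return energy, peak


def energy_unrolled(samples):
    """Loop unrolling by 4: fewer loop-control steps per sample."""
    energy = 0
    n = len(samples)
    limit = n - (n % 4)
    for i in range(0, limit, 4):
        energy += (samples[i] * samples[i]
                   + samples[i + 1] * samples[i + 1]
                   + samples[i + 2] * samples[i + 2]
                   + samples[i + 3] * samples[i + 3])
    for i in range(limit, n):  # leftover samples when n is not a multiple of 4
        energy += samples[i] * samples[i]
    return energy


frame = [3, -7, 2, 5, -1, 4]
print(energy_and_peak_fused(frame))  # (104, 7)
```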

  15. CPU load estimation • The load for MP3 decoding depends on the bit rate and the sampling frequency • CPU load at 60 MHz • 56 kbps, 22.05 kHz: 10% • 32 kbps, 22.05 kHz: 9.6% • 32 kbps, 16 kHz: 7% • The load for CELP decoding is almost constant • 18% of the 60 MHz CPU load
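
The measured loads translate directly into a minimum clock frequency: a component that loads a 60 MHz CPU by 10% needs about 6 MHz. The sketch below picks the lowest clock step that keeps the load under a ceiling. The list of available clock steps and the 95% ceiling are assumptions made for illustration (the slides do not list the frequencies the device supports), and the function names are hypothetical.

```python
REFERENCE_MHZ = 60.0

# Hypothetical clock steps; the real device's available frequencies are not given.
ASSUMED_STEPS_MHZ = (7.5, 15.0, 30.0, 60.0)


def required_mhz(load_percent_at_60mhz: float) -> float:
    """Clock frequency at which the component would load the CPU to 100%."""
    return load_percent_at_60mhz * REFERENCE_MHZ / 100.0


def pick_frequency(needed_mhz: float, steps=ASSUMED_STEPS_MHZ, max_load=0.95) -> float:
    """Lowest step whose resulting CPU load stays at or below max_load."""
    for step in sorted(steps):
        if needed_mhz <= step * max_load:
            return step
    raise ValueError("no available clock step is fast enough")


# Loads quoted on the slide (percent of a 60 MHz CPU).
for name, load in (("MP3 56 kbps / 22.05 kHz", 10.0), ("CELP", 18.0)):
    need = required_mhz(load)
    step = pick_frequency(need)
    print(f"{name}: needs {need:.1f} MHz, run at {step} MHz, load {need / step:.0%}")
```

With these assumed steps, MP3 decoding at 56 kbps would run at 7.5 MHz with an 80% load, which is in line with the 80% to 95% load range the abstract reports after applying DFS.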

  16. CPU load estimation • Figure: processing time of LZW according to the number of pixels • Figure: processing time of LZW according to the compressed data size
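
Why LZW decoding time follows the pixel count rather than the compressed size can be seen in a minimal decoder: every output byte has to be written, however few codes describe it. The sketch below is a simplified textbook LZW (unbounded dictionary, codes kept as a list of integers, no variable-width packing or clear codes), not the decoder used on the device; the function names and test images are hypothetical.

```python
def lzw_compress(data: bytes) -> list:
    """Textbook LZW encoder; returns a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Textbook LZW decoder; the work per code is the length of the string it emits."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]  # code defined by the entry being built
        else:
            raise ValueError("invalid LZW code")
        out += entry                          # one write per output byte
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)


# Two hypothetical 4096-pixel images: one flat, one with varied content.
flat = bytes(4096)
busy = bytes((i * 37) % 256 for i in range(4096))
for name, image in (("flat", flat), ("busy", busy)):
    codes = lzw_compress(image)
    restored = lzw_decompress(codes)
    print(name, "codes:", len(codes), "bytes written:", len(restored))
```

Both images decode to the same number of bytes even though the flat one compresses to far fewer codes, so the byte-copying work in the decoder is the same for both.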

  17. CPU load estimation • Execution time prediction of each software component

  18. Experimental result • 478 mA (optimized) / 542 mA (original) = 88.2%, i.e. an 11.8% reduction in current

  19. Experimental result • The clock frequency of the CPU is not changed here; changing it would be a more aggressive power optimization approach, at the cost of the delay for PLL relocking

  20. Conclusion • A dynamic frequency scaling scheme is employed to reduce the CPU power consumption, and it shows that a system power saving of 20% can be achieved • The power analysis shows that the current consumed by the DRAM is almost equal to that of the CPU core, which means that reducing cache misses is most important for lowering power consumption • The current can be further reduced, without any significant change in the power reduction algorithm, by employing a CPU that supports dynamic voltage scaling (such as Intel’s XScale)
