
Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning

Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning. ISLPED 2007. Gaurav Dhiman, Tajana Simunic Rosing. Department of Computer Science and Engineering, University of California, San Diego.





Presentation Transcript


  1. Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning. ISLPED 2007. Gaurav Dhiman, Tajana Simunic Rosing. Department of Computer Science and Engineering, University of California, San Diego

  2. Why Dynamic Voltage Frequency Scaling? • Power consumption is a critical issue in system design today • Mobile systems face battery-life issues • High-performance systems face heating issues • Dynamic Voltage Frequency Scaling (DVFS): • Dynamically scales the supply voltage level of the CPU to provide “just enough” circuit speed to process the workload • An effective system-level technique to reduce power consumption • Dynamic Power Management (DPM) is another popular system-level technique; however, the focus of this work is on DVFS

  3. Previous Work • Based on task level knowledge: • [Yao95],[Ishihara98],[Quan02] • Based on compiler/app. support: • [Azevedo02],[Hsu02],[Chung02] • Based on micro-architecture level support: • [Marculescu00],[Weissel02],[Choi04], [Choi05]

  4. Workload Characterization and Voltage-Frequency Selection • No hard task deadlines in a general-purpose system • Goal: maximize energy savings while minimizing performance delay • Key idea: • CPU-intensive tasks don’t benefit from scaling • Memory-intensive tasks are energy efficient at low v-f settings

  5. Workload Characterization and Voltage-Frequency Selection (contd.) • Three tasks, burn_loop (CPU-intensive), mem (memory-intensive), and combo (a mix), are run with static scaling • burn_loop is energy efficient at all settings • mem is energy efficient at the lowest v-f setting

  6. Measure CPU-intensiveness (µ) • CPI stack: CPI_avg = CPI_base + CPI_cache + CPI_tlb + CPI_branch + CPI_stall • Use the Performance Monitoring Unit (PMU) of the PXA27x to estimate the CPI stack components • µ = CPI_base / CPI_avg • High µ indicates high CPU-intensiveness, and vice versa
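A minimal sketch of the µ computation from estimated CPI-stack components; the numeric values in the comments are hypothetical, not real PXA27x counter readings:

```python
def mu(cpi_base, cpi_cache, cpi_tlb, cpi_branch, cpi_stall):
    """CPU-intensiveness: fraction of the average CPI spent on base execution."""
    cpi_avg = cpi_base + cpi_cache + cpi_tlb + cpi_branch + cpi_stall
    return cpi_base / cpi_avg

# CPU-bound task: stall components are small, so mu is close to 1
# Memory-bound task: cache/stall components dominate, so mu is close to 0
```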

  7. Dynamic Task Characterization • Dynamically estimate µ for every scheduler quantum and feed it to the online learning algorithm • The algorithm models the CPU-intensiveness of the task and accordingly selects the best-suited v-f setting • There is a theoretical guarantee on converging to the best available v-f setting

  8. Online Learning for Horse Racing • Each expert manages money for the race • The bettor selects the best-performing expert for investing his money • After the race, the bettor evaluates the performance of all experts

  9. Online Learning for DVFS • DVFS experts (working set): v-f setting 1, v-f setting 2, …, v-f setting n • The DVFS controller selects the best-performing expert • The selected expert’s v-f setting is applied to the CPU for the next scheduler quantum • At each scheduler tick, the controller evaluates the performance of all experts

  10. Controller Algorithm • Parameters: β ∈ (0, 1); initial weight vector w^1 for the N experts such that Σ_i w_i^1 = 1 • Do for t = 1, 2, 3, … (on each scheduler tick): • 1. Calculate µ • 2. Update the weight vector of the task: w_i^{t+1} = w_i^t · (1 − (1 − β) · l_i^t) • 3. Choose the expert with the highest probability factor in the normalized weight vector • 4. Apply the v-f setting corresponding to the selected expert to the CPU • 5. Reset and restart the PMU
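The controller loop above can be sketched as follows; the β value and the loss vectors used in the usage note are illustrative, not taken from the paper:

```python
BETA = 0.75  # learning parameter in (0, 1); the value here is illustrative

class DVFSController:
    """Sketch of the slide's controller: one expert per v-f setting."""

    def __init__(self, n_experts):
        # Initial weight vector: uniform, summing to 1
        self.w = [1.0 / n_experts] * n_experts

    def step(self, losses):
        """One scheduler tick: update weights from the experts' losses
        and return the index of the expert to apply next."""
        # Multiplicative update: w_i <- w_i * (1 - (1 - beta) * l_i)
        self.w = [w * (1.0 - (1.0 - BETA) * l) for w, l in zip(self.w, losses)]
        total = sum(self.w)
        probs = [w / total for w in self.w]
        # Choose the expert with the highest probability factor
        return max(range(len(probs)), key=probs.__getitem__)
```

Feeding the same loss vector repeatedly, e.g. `DVFSController(5).step([0.9, 0.6, 0.1, 0.5, 0.8])`, the controller settles on the lowest-loss expert (index 2), since low-loss experts keep the largest share of weight.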

  11. Evaluation of Experts (Loss Calculation) • Intuition: the best-suited frequency scales linearly with µ • Map task characteristics to the best-suited frequency using the µ-mapper, which divides the µ range [0, 1] among the experts (each expert i has a µ_mean), e.g. Expert1-5 = {100, 200, 300, 400, 500} MHz • Evaluate each expert against the best-suited frequency
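A sketch of one plausible loss calculation under the slide's linear-mapping intuition; the distance-based loss below is an illustrative simplification, not the paper's exact loss function:

```python
FREQS = [100, 200, 300, 400, 500]  # MHz; the slide's example working set

def expert_losses(mu, freqs=FREQS):
    """Loss of each expert = normalized distance of its frequency from
    the frequency the mu-mapper deems best (linear in mu)."""
    f_best = freqs[0] + mu * (freqs[-1] - freqs[0])
    span = float(freqs[-1] - freqs[0])
    return [abs(f - f_best) / span for f in freqs]
```

For a fully CPU-intensive task (µ = 1), the 500 MHz expert incurs zero loss; for a fully memory-bound task (µ = 0), the 100 MHz expert does.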

  12. What about Multi-tasking Systems? • Tasks with differing characteristics may execute together • The weight vector (w^t) characterizes an executing task • This information must be kept per task for accurate characterization • Solution: store the weight vector in a task-level structure
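A sketch of per-task weight storage; the paper keeps the vector in the kernel's task-level structure, while the dictionary keyed by PID below is a user-space stand-in:

```python
# Per-task weight vectors, keyed by PID. The paper stores the vector in a
# task-level kernel structure; this dictionary is a user-space stand-in.
task_weights = {}

def weights_for(pid, n_experts=4):
    """Fetch the incoming task's weight vector on a context switch,
    creating a fresh uniform vector the first time the task is seen."""
    if pid not in task_weights:
        task_weights[pid] = [1.0 / n_experts] * n_experts
    return task_weights[pid]
```

On a context switch the controller simply swaps in the incoming task's vector, so learning for one task never pollutes another's characterization.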

  13. Performance Bound on Controller • The performance of the scheme converges to that of the best-performing expert with successive scheduler ticks • Let N be the number of experts in the working set and T the total number of scheduler ticks • If l_i^t is the loss incurred by expert i for scheduler quantum t, the loss of the controller for that quantum is r^t · l^t, where r^t is the probability vector over the experts • Goal: minimize the net loss L_G − min_i L_i, where L_G = Σ_t r^t · l^t and L_i = Σ_t l_i^t • The net loss is bounded by √(2T ln N) + ln N • The average net loss per period therefore decreases at the rate O(√(ln N / T))
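A quick simulation, with hypothetical random losses, checking the stated bound empirically for the slide-10 multiplicative update; β and the loss distribution are illustrative:

```python
import math
import random

def hedge_regret(T, N, beta=0.9, seed=0):
    """Run the multiplicative-weight update on random losses and return
    the net loss L_G - min_i L_i alongside the sqrt(2 T ln N) + ln N bound."""
    rng = random.Random(seed)
    w = [1.0 / N] * N           # initial uniform weights
    L_G = 0.0                   # cumulative loss of the controller
    L = [0.0] * N               # cumulative loss of each expert
    for _ in range(T):
        losses = [rng.random() for _ in range(N)]
        total = sum(w)
        r = [wi / total for wi in w]            # probability vector r^t
        L_G += sum(ri * li for ri, li in zip(r, losses))
        for i in range(N):
            L[i] += losses[i]
            w[i] *= 1.0 - (1.0 - beta) * losses[i]
    return L_G - min(L), math.sqrt(2 * T * math.log(N)) + math.log(N)
```

For T = 1000 scheduler quanta and N = 5 experts, the measured net loss sits well below the bound, consistent with the convergence claim.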

  14. Implementation • Testbed: Intel PXA27x Development Platform running Linux 2.6.9 • Implemented as a Loadable Kernel Module (LKM) • Architecture: user space interacts with the DVFS LKM through the /proc file system; the Linux process manager notifies the LKM on task creation and scheduler ticks; the LKM reads the PMU and applies the selected v-f setting on the Intel PXA27x

  15. Experiments • Setup: • 1.25 samples/sec DAQ • Energy savings calculated using actual current measurements • Working set: 4 v-f setting experts • Workloads: qsort, djpeg, blowfish, dgzip

  16. Results: Single Task Environment

  17. Results: Frequency of Selection for qsort (figure; annotations mark the tradeoff between lower performance delay and higher energy savings)

  18. Results: Multi Task Environment

  19. Advantages of the scheme • Online learning algorithm: Provides theoretical guarantee on performance converging to that of the best performing expert. • Multi-Tasking systems: Works seamlessly across context switches. • User preference: Adapts energy savings/performance delay tradeoff with changes in user preference.

  20. Overhead • Process creation: measured with lat_proc from lmbench • 0% overhead • Context switch: measured with lat_ctx from lmbench • 3% overhead with 20 processes (the maximum supported by lat_ctx) • [Choi05] causes 100% overhead in context-switch times • Extremely lightweight implementation

  21. Conclusion • Designed and implemented a DVFS technique for general-purpose multi-tasking systems • Based on online learning, which provides a theoretical guarantee that overall performance converges to that of the best-performing expert • Provides user control over the desired energy/performance tradeoff and is extremely lightweight
