
Generic Adaptive Control

Contact: Joe Hellerstein, IBM Thomas J Watson Research Center, hellers@us.ibm.com. May 16, 2003. http://www.research.ibm.com/PM


Presentation Transcript


  1. Generic Adaptive Control Contact: Joe Hellerstein IBM Thomas J Watson Research Center hellers@us.ibm.com May 16, 2003 http://www.research.ibm.com/PM

  2. Participants
     • Research: Joe Bigus (ABLE), Markus Debusman (University of Applied Science, Wiesbaden, Germany), Yixin Diao, Frank Eskesen, Steve Froehlich, Joe Hellerstein, Alexander Keller, Xue Liu (Univ. of Illinois), Sujay Parekh, Lui Sha (Univ. of Illinois), Maheswaran Surendra (team lead), Dawn Tilbury (Univ. of Michigan)
     • DB2: Randy Horman, Matt Huras, Ed Lassettre, Sam Lightstone, Kevin Rose, Adam Storm
     • WebSphere: Carolyn Norton
     • HVWS: Noshir Wadia, Eric Ye
     • Server Group: Lisa Spainhower

  3. Example: Configuration & Optimization in WebSphere
     • Diagram: an Administrator configures the Web Servers and Application Servers that serve End Users
     • Knobs shown: URL Cache, EJB threads, JVM heap size, Servlet reload int, MaxClients, Number of Threads, DB Connections, KeepAlive Timeout, Fast response cache, MaxRequestsPerChild, ThreadsPerChild, Max simultan. requests, ListenBackLog
     • Challenges: skill shortage; multiple vendors, multiple standards; mapping policies to IT “knobs”

  4. Project Goals • Develop a formal basis for resource management problems with dynamics (especially policy enforcement) • Demonstrate the practical value of the approach • Evangelize the approach • Book, tutorials, classes • Methodology and tools

  5. Agenda • Basics of Control Theory • Regulating concurrent users in Lotus Notes: pole placement design • Regulating utilizations in Apache • Optimizing response times in Apache • Throttling DB2 utilities • DB2 self-tuning memory • Regulating service levels in a multi-tiered eCommerce system (HotRod) • Educational efforts (book, tutorials) • Summary

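  6. Control of Lotus Notes eMail Server • Diagram: a workload generator sends RPCs to the Lotus Notes Server; the Administrator supplies the Target Queue Length; the AutoTune Agent reads the Measured Queue Length and sets MaxUsers • Plot panels: Uncontrolled, K=.1, K=1, K=5 • Plot annotations: Slow, Better, Bad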

  7. System Identification: Estimate Transfer Function • Diagram: MaxUsers feeds both the Notes Server (Actual Queue Length) and a dynamic model (Predicted QL) • Plot: Predicted QL versus Observed QL, both axes 0 to 100
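The system identification step on this slide can be sketched as a least-squares fit. The first-order model form y(k+1) = a*y(k) + b*u(k), the function names fit_first_order and predict, and any data fed to them are assumptions for illustration; the slide does not state the model order or the estimation method actually used.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k].

    u: control input samples (e.g. MaxUsers)
    y: measured output samples (e.g. queue length)
    The first-order structure is an assumption of this sketch.
    """
    if len(u) != len(y) or len(y) < 3:
        raise ValueError("need equally long series with at least 3 samples")
    s_yy = s_yu = s_uu = s_yt = s_ut = 0.0
    for k in range(len(y) - 1):
        s_yy += y[k] * y[k]
        s_yu += y[k] * u[k]
        s_uu += u[k] * u[k]
        s_yt += y[k] * y[k + 1]
        s_ut += u[k] * y[k + 1]
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0.0:
        raise ValueError("input does not excite the system enough to identify it")
    a = (s_yt * s_uu - s_ut * s_yu) / det
    b = (s_ut * s_yy - s_yt * s_yu) / det
    return a, b


def predict(a, b, u, y0=0.0):
    """Simulate the fitted model so predicted and observed output can be compared."""
    y = [y0]
    for k in range(len(u) - 1):
        y.append(a * y[k] + b * u[k])
    return y
```

Plotting predict(a, b, u) against the measured series gives the kind of predicted-versus-observed comparison the slide shows; points close to the diagonal indicate that the assumed model structure is adequate.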

  8. Controller Design • Block diagram: Controller G(z) and Notes Server N(z) in the forward path, Sensor S(z) in a negative feedback path • H(z) = Closed Loop Transfer Function • Design for “poles” of H(z) • Simplified Integral Control Law • Plot panels: K=5, K=1
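A minimal sketch of how the gain K moves the closed-loop poles, under assumptions that are not on the slide: the server is modeled as N(z) = b/(z - a), the integral law is u(k+1) = u(k) + K*e(k) so that G(z) = K/(z - 1), and the sensor is S(z) = 1. The closed loop is then H(z) = K*b / ((z - 1)(z - a) + K*b). The values a = 0.4 and b = 0.2 used in the note below are hypothetical, and the function names are illustrative.

```python
import cmath


def closed_loop_poles(a, b, K):
    """Roots of z^2 - (1 + a) z + (a + K*b), the denominator of H(z)."""
    p = -(1.0 + a)
    q = a + K * b
    root = cmath.sqrt(p * p - 4.0 * q)
    return ((-p + root) / 2.0, (-p - root) / 2.0)


def simulate(a, b, K, target, steps):
    """Step response of the assumed model under integral control."""
    y, u = 0.0, 0.0
    history = []
    for _ in range(steps):
        history.append(y)
        error = target - y          # e(k) = target - measured output
        y_next = a * y + b * u      # assumed server model
        u = u + K * error           # integral control law
        y = y_next
    return history
```

With the assumed a = 0.4 and b = 0.2, closed_loop_poles shows a small gain leaving one pole close to 1 (slow convergence), a moderate gain pulling both poles inward, and a large gain pushing them outside the unit circle (instability). That is the qualitative pattern the K comparison on the slides illustrates, though the real model parameters will differ.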

  9. Control of Apache Server • Diagram: a workload generator sends Web Service requests to the Apache System; the Administrator exchanges Policies & Reports with the AutoTune Agent; the agent sets MaxClients and KeepAlive TO and reads CPU Utilization and Memory Utilization • Contribution: Multiple Input, Multiple Output

  10. Apache Control Enablements • Diagram components: Web Server (Master, Worker Procs, mod_controller), OS (procfs), External Controller, External RT Probe • mod_controller (close-up): Internal Controller, Get/Set interface, MaxClients, KeepAlive, SvcTime stats • Signals: GET/SET, KILL, SPAWN, CPU util, Mem util, RT info • Legend: HTTP, Inter-Process, Value flow, Shared Mem, Process

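  11. SISO vs. MIMO • Model Structure: The Transfer Function Relationship • Inputs KA and MC, outputs CPU and MEM of the Apache Server • Two SISO models: G11 relates KA to CPU and G22 relates MC to MEM; the cross terms are set to 0 • MIMO model: G11, G12, G21, and G22, with the contributions to each output summed • SISO approach assumes cross terms are negligible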

  12. Model Comparison • Plot panels: Model Prediction for Two SISO Models and for the MIMO Model; each shows CPU, MEM, KA, and MC against Time (s) • CPU: the SISO model fails because MC and KA both affect CPU; the MIMO model is able to capture this relationship • MEM: both models do a good job of predicting system response
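The structural difference between the two SISO models and the MIMO model can be sketched with a toy simulation. Everything numeric here is hypothetical: the first-order dynamics, the coefficients in A, B_MIMO, and B_SISO, and the function names are chosen only so that MC affects CPU (a cross term) while KA does not affect MEM. The MIMO model is simply given the same coefficients as the simulated system, so only the effect of dropping the cross terms is shown.

```python
def simulate(a, B, inputs):
    """Simulate cpu/mem outputs for a sequence of (KA, MC) input pairs.

    cpu(k+1) = a[0]*cpu(k) + B[0][0]*KA(k) + B[0][1]*MC(k)
    mem(k+1) = a[1]*mem(k) + B[1][0]*KA(k) + B[1][1]*MC(k)
    """
    cpu, mem = 0.0, 0.0
    outputs = []
    for ka, mc in inputs:
        outputs.append((cpu, mem))
        cpu, mem = (a[0] * cpu + B[0][0] * ka + B[0][1] * mc,
                    a[1] * mem + B[1][0] * ka + B[1][1] * mc)
    return outputs


def max_error(predicted, observed, index):
    """Largest absolute prediction error for output 0 (CPU) or 1 (MEM)."""
    return max(abs(p[index] - o[index]) for p, o in zip(predicted, observed))


A = (0.5, 0.5)                      # hypothetical output dynamics
B_MIMO = ((0.5, 0.3), (0.0, 0.4))   # includes the MC -> CPU cross term
B_SISO = ((0.5, 0.0), (0.0, 0.4))   # cross terms forced to zero
```

Simulating the same input sequence with B_MIMO and B_SISO and comparing with max_error reproduces the slide's observation in miniature: the CPU prediction degrades once the cross term is dropped, while the MEM prediction is unaffected.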

  13. Optimization of Apache Server • Diagram: a workload generator sends Web Service requests to the Apache System; the AutoTune Agent reads Response Time and sets MaxClients

  14. Apache Operation • Diagram: New Users open new connections, which wait in the TCP Accept queue until Apache, limited by MaxClients, accepts them; connections end on Close() or Timeout() • Heuristic: Find the smallest MaxClients that eliminates TCP queueing

  15. Impact of MaxClients • Plot: Response Time versus MaxClients, with the Apache Defaults marked

  16. AutoTune Using Fuzzy Rules
     • Diagram: Fuzzy Controller with a d/dt block, Fuzzification, Inference mechanism, Rule base, and Defuzzification
     • Fuzzification: convert numeric variables to linguistic variables; characterized by membership functions
     • Rule base: IF-THEN rules using linguistic variables
     • Inference mechanism: activate the fuzzy rules (IF); combine the rule actions (THEN)
     • Defuzzification: convert linguistic variables to numeric variables

  17. Constructing Fuzzy Rules
     • Decision making: increment direction and increment size
     • Plot: Response Time (RT) versus MaxClients, with Rules 1 to 4 marked on the curve
     • Rule 1: IF change-in-MaxClients is poslarge and change-in-RT is neglarge THEN next-change-in-MaxClients is poslarge
     • Rule 2: IF change-in-MaxClients is neglarge and change-in-RT is poslarge THEN next-change-in-MaxClients is poslarge
     • Rule 3: IF change-in-MaxClients is neglarge and change-in-RT is neglarge THEN next-change-in-MaxClients is neglarge
     • Rule 4: IF change-in-MaxClients is poslarge and change-in-RT is poslarge THEN next-change-in-MaxClients is neglarge
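The four rules can be sketched as a small fuzzy step function, reading every rule consequent as a change in MaxClients. The ramp-shaped membership functions, the normalization of both inputs to [-1, 1], min as the AND operator, the weighted-average defuzzification, and the step size are all assumptions of this sketch; the slides do not specify them.

```python
def mu_poslarge(x):
    """Degree to which a change normalized to [-1, 1] is 'poslarge' (assumed ramp)."""
    return min(max((x + 1.0) / 2.0, 0.0), 1.0)


def mu_neglarge(x):
    """Degree to which a normalized change is 'neglarge' (complement of poslarge)."""
    return 1.0 - mu_poslarge(x)


# (membership for change-in-MaxClients, membership for change-in-RT, output center)
RULES = [
    (mu_poslarge, mu_neglarge, +1.0),  # Rule 1: keep increasing
    (mu_neglarge, mu_poslarge, +1.0),  # Rule 2: reverse and increase
    (mu_neglarge, mu_neglarge, -1.0),  # Rule 3: keep decreasing
    (mu_poslarge, mu_poslarge, -1.0),  # Rule 4: reverse and decrease
]


def next_change(d_maxclients, d_rt, step=10.0):
    """Fuzzify both inputs, fire the four rules, and defuzzify to a numeric change."""
    weights = [min(m_mc(d_maxclients), m_rt(d_rt)) for m_mc, m_rt, _ in RULES]
    total = sum(weights)  # at least 0.5 because the memberships are complementary
    center = sum(w * c for w, (_, _, c) in zip(weights, RULES)) / total
    return step * center
```

The result is a hill-climbing search on the response-time curve: the sign of the output gives the increment direction and its magnitude gives the increment size, which shrinks as the observed changes become small.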

  18. AutoTune Controlling MaxClients on Apache • Plot annotations: Apache default, Optimized setting

  19. AutoTune Response to a new workload • Plot annotations: Old optimized setting, Workload changes, New optimized setting

  20. DB2 UDB Utilities Throttling (SMART Project) • Diagram: on the Server, the utilities (Backup, Restore, Re-Balance) run alongside the UDB Engine • Control signals: Target Utilization, Disk and CPU Utilizations, Sleep Delay

  21. Success Is: • Small Effect on User Throughput • High System Utilization • Plot: % Utilization versus Time, annotated “Gap due to reduced utilization in sleep periods” • Note: This is a longer-time averaged value than on slide 5.

  22. DB2 Throttling a Single Utility • Block diagram symbols: Workload, Utility, U, Y, a, b • Standard PI controller tries to reach E=0 • Assume: linear effect of throttling on Y • Diagram annotations: Parameters characterizing DB2, Control error, Max thruput from utility + workload, Thruput degradation
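A minimal sketch of a PI loop driving the control error to zero under the linear assumption stated on the slide. The incremental (velocity) form of the PI law, the gains, the clamping of the throttling level to [0, 1], the sign convention of the error, and the linear_plant function with its max_degradation parameter are all hypothetical choices for illustration, not the controller used in DB2.

```python
def pi_throttle(measure, target, kp, ki, steps):
    """Incremental PI loop: adjust throttling u in [0, 1] until measure(u) meets target.

    The error is taken as measured minus target, so a positive error
    (too much degradation) increases the throttling level.
    """
    u = 0.0
    prev_error = 0.0
    for _ in range(steps):
        error = measure(u) - target
        u += kp * (error - prev_error) + ki * error  # proportional + integral terms
        u = min(max(u, 0.0), 1.0)                    # throttling is a fraction
        prev_error = error
    return u


def linear_plant(u, max_degradation=0.5):
    """Hypothetical plant: throughput degradation falls linearly with throttling."""
    return max_degradation * (1.0 - u)
```

For example, pi_throttle(linear_plant, 0.1, 0.5, 0.5, 200) settles at the throttling level where the hypothetical degradation equals the 0.1 target. The integral term is what removes the steady-state error, which is why a PI controller can reach E=0 without knowing the plant parameters exactly.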

  23. Baseline Measurement: idling • Timeline: processes P1, P2, P3 over Time, with Control Points and the intervals Start1 to End1 and Start2 to End2 • “Start” is perf output after all Pi have read new control value. • “End” is from closest output to control change • Legend: “Loop” Throughput, “Other” (Sleep) Throughput

  24. Baseline Estimation • Over time, record sequence {(ti, pi, si)} • t = Time • p = Perf at time t • s = SleepPct at time t • Fit a “curve” to this data, to get model M • E.g., over some fixed time interval of the past • Plot: p versus s
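One way to read the curve-fitting step is a straight-line fit of p against s over a sliding time window. The linear form p = c0 + c1*s, the window handling, and the name fit_baseline_model are assumptions of this sketch; the slide only says that some curve is fitted to recent data.

```python
def fit_baseline_model(samples, now, window):
    """Fit p = c0 + c1*s to the (t, p, s) samples with now - window <= t <= now.

    samples: iterable of (t, p, s) tuples, as recorded over time.
    Returns the model M as the pair (c0, c1).
    """
    recent = [(p, s) for (t, p, s) in samples if now - window <= t <= now]
    n = len(recent)
    if n < 2:
        raise ValueError("need at least two samples inside the window")
    mean_p = sum(p for p, _ in recent) / n
    mean_s = sum(s for _, s in recent) / n
    var_s = sum((s - mean_s) ** 2 for _, s in recent)
    if var_s == 0.0:
        raise ValueError("need at least two distinct SleepPct values")
    c1 = sum((s - mean_s) * (p - mean_p) for p, s in recent) / var_s
    c0 = mean_p - c1 * mean_s
    return c0, c1
```

Restricting the fit to a fixed window lets the model follow workload changes, at the cost of needing enough variation in SleepPct inside the window to identify the slope.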

  25. Control with disturbance • Plot panels: Large Disturbance, Small Disturbance • Baseline estimation needs work • Cannot adjust to large workload change • Controller response still OK

  26. Dynamic Surge Protection • Systems can go from steady state … to overloaded without warning • Figure labels: Internet, “Few minutes later…”

  27. Resource Actions With Lead Times • Definition of lead time: • Delay from request to action taking effect • Examples • From provisioning a server to its servicing requests • From de-provisioning a server to its being returned to a free pool • From increasing the size of a buffer pool to the pool being filled with data

  28. Effect of Lead Times on WAS Provisioning • Plot annotation: Leadtime

  29. Benefits of Proactive Provisioning • Plot annotation: Leadtime
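The effect of a lead time, and the benefit of acting on a forecast, can be sketched with a toy capacity model. The function name unmet_demand, the linear-trend forecast, and the idea of measuring total unmet demand are assumptions for illustration and are not taken from the HVWS or WAS implementation.

```python
def unmet_demand(demand, lead, proactive):
    """Total demand that exceeds capacity when capacity requests take `lead` steps.

    Reactive: request capacity equal to the demand seen now.
    Proactive: request capacity for the demand forecast `lead` steps ahead,
    using a linear extrapolation of the last two observations.
    """
    capacity = [0.0] * (len(demand) + lead)  # capacity[t] is in effect at time t
    total = 0.0
    for t, d in enumerate(demand):
        if proactive and t > 0:
            requested = d + lead * (d - demand[t - 1])
        else:
            requested = d
        capacity[t + lead] = max(requested, 0.0)  # takes effect after the lead time
        total += max(d - capacity[t], 0.0)
    return total
```

On a steadily rising demand, the reactive policy is always one lead time behind, while the proactive policy only falls short during the initial lead time before its first forecast takes effect. With real, noisy demand the gain depends on forecast quality.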

  30. Autonomic Computing: Dynamic Surge Protection • Architecture diagram labels: Solution Manager; On-Line Capacity Planning, Adaptive Forecasting, On-Demand Actions; HVWS Performance Modeler, Controller, Forecaster; Monitor, Analyze, Plan, Execute, Knowledge; Sensors, Effectors, Element; Deployment Manager, Configuration Management, BOPS Monitoring • Managed system labels: Workload, WAS 5.0, DB2 v8.1, Application • Signals: #WAS, RT

  31. CeBit Press
     • Reuters: IBM: Software Can Predict Computer Demand
     • C/Net: IBM offers details on autonomic software
     • InfoWorld: IBM to show new autonomic suite at CeBIT
     • IDG News: IBM to show off new autonomic technology
     • InformationWeek: More Autonomic Capabilities From IBM
     • InternetNews: IBM Spruces Up Autonomic Computing Offerings
     • cw360.com: IBM to demo autonomic technology at CeBIT

  32. Control Theory Book • Feedback Control of Computing Systems • Wiley-Interscience • Intended audience • Computer scientists with minimal math background (geometric series) who want to apply techniques to practical problems • Control theorists looking for new applications • Status • 10 of 11 chapters at a “beta” level • Expected completion by end of June • Publication in 2004

  33. Table of Contents • Introduction (Qualitative control theory) • Model construction (statistics) • Z-Transforms and transfer functions (component models) • Block diagrams (system models) • First order systems • Higher order systems • State space models (multi-variate models) • Proportional control (feedback basics) • Other classical controllers (PID, tuning controllers) • State space feedback control (MIMO) • Advanced topics

  34. Progress Towards Project Goals • Develop/identify a formal approach • Control theory based • Demonstrate value • Lotus Notes – control w/o instabilities • Apache – simple way to optimize tuning parameters • DB2 Utilities Throttling • HotRod – handling resource actions with dead times • HotRod prototype – resource actions w/lead times • Evangelize • Feedback Control of Computing Systems, Wiley-Interscience • Tutorials: Almaden, Integrated Management, Stanford/Berkeley • Classes: Columbia?, University of Michigan? • AC toolkit integration

  35.
     • "Using Control Theory to Achieve Service Level Objectives in Performance Management," S Parekh, N Gandhi, JL Hellerstein, D Tilbury, TS Jayram, J Bigus, Real Time Systems Journal, 2002.
     • "Feedback Control of a Lotus Notes Server: Modeling and Control Design," N Gandhi, S Parekh, J Hellerstein, and DM Tilbury, American Control Conference, 2001. (Best paper in session.)
     • "An Introduction to Control Theory With Applications to Computer Science," JL Hellerstein and S Parekh, ACM Sigmetrics, 2001.
     • "Using MIMO Feedback Control to Enforce Policies for Interrelated Metrics With Application to the Apache Web Server," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, Network Operations and Management, 2002. (Best paper in conference.)
     • "MIMO Control of an Apache Web Server: Modeling and Controller Design," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, American Control Conference, 2002. (Best paper in session.)
     • "Using Fuzzy Control to Maximize Profits in Service Level Management," Y Diao, JL Hellerstein, S Parekh. Accepted to the IBM Systems Journal, 2002.
     • "A First-Principles Approach to Constructing Transfer Functions for Admission Control in Computing Systems," JL Hellerstein, Y Diao, and S Parekh, Conference on Decision and Control, 2002.
     • "Generic On-Line Discovery of Quantitative Models for Service Level Management," Y Diao, F Eskesen, S Froehlich, JL Hellerstein, A Keller, L Spainhower, and M Surendra, IFIP Symposium on Integrated Management, 2003.
     • "On-Line Response Time Optimization of An Apache Web Server," Yixin Diao, Xue Liu, Steve Froehlich, Joseph L Hellerstein, Sujay Parekh, and Lui Sha. To appear in International Workshop on Quality of Service, 2003.
     • http://www.research.ibm.com/PM
