
Presentation Transcript


  1. Advanced Data Processing Methods for Dependability Benchmarking and Log Analysis. BEHIND THE SCENE: A COLLECTION OF OBSERVATIONS NOT DESCRIBED IN OUR PAPERS. András Pataricza, pataric@mit.bme.hu

  2. Trends in IT
• Evolution: environment, specification, technology, adaptivity
• Drivers:
  • Run-time task allocation (virtualization)
  • Extends to cyber-physical systems
  • Functional: adaptive task ecosystem; context-sensitive, on-demand
  • Optimization of resource use: computational, communication, energy
• Traditional pre-operation-phase design moves to run-time
  • Off-line assessment -> run-time control configuration/parametrization + run-time assessment
• Assessment criteria: generalizable, reusable, parametrizable; covers all main aspects needed for operation decisions and control
• Information sources:
  • In vitro: benchmarking
  • In vivo: field measurement, log analysis
• Reusability constraints: different levels of detail, anytime algorithms, incremental algorithms, complexity, go continuous?

  3. Typical use case: control of infrastructure. Source: AMBER teaching material

  4. Self* Computing
• Controlled computing: autonomic, virtualization, cloud
• Self-* properties
• Emphasizes the control loop; relation to control theory and signal processing
• Obstacle: we deal with networks of (practically) black boxes!

  5. System Management as a Control Problem: control theory applied to IT infrastructures
• Monitoring (sensors): collects and stores data about the state of the infrastructure provided by the controlled plant (the service)
• Controller (decision making, a software component): applies a control policy, based on human expertise or automation, to pursue the control objective
• Actuator (provisioning): effectuates changes in the infrastructure the service is deployed on
• Roles are split between the supervised node and the monitoring/control node
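
A minimal Python sketch of this monitor/decide/actuate loop; the sensor and actuator hooks (read_metrics, apply_action) are hypothetical placeholders for the monitoring and provisioning layers, not any real product API.

```python
# Minimal sketch of the control loop on this slide: sensors feed the controller,
# the controller applies a control policy, the actuator changes the plant.
# read_metrics() and apply_action() are hypothetical stubs, not a real API.
import time
from typing import Callable, Dict

Metrics = Dict[str, float]

def control_loop(read_metrics: Callable[[], Metrics],
                 policy: Callable[[Metrics], str],
                 apply_action: Callable[[str], None],
                 period_s: float = 60.0) -> None:
    """One supervised-node loop: monitor -> decide -> actuate, once per period."""
    while True:
        state = read_metrics()     # Monitoring / sensors: observe the plant
        action = policy(state)     # Controller: decision making per control policy
        apply_action(action)       # Actuator: provisioning / reconfiguration
        time.sleep(period_s)
```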

  6. Performability Management
• QoS requirement as an objective metric (e.g. response time < 3 sec)
• The measured service metric is compared against a reference in a feedback loop (reference minus measurement)
• Decision making sets the reference value (e.g. 2.5 sec): "have some margin but do not overperform"
• Provisioning reconfigures the service provider the software component is deployed on
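
The decision-making box can be sketched as a simple set-point rule; the 2.5 s reference comes from the slide, while the action names and the "overperform" factor below are illustrative assumptions.

```python
# Sketch of the set-point logic on this slide: the objective is "response
# time < 3 s", the reference is set to 2.5 s to keep some margin, and the
# policy avoids overperforming by scaling in when far below the reference.
# The scale_out/scale_in action names are illustrative, not a real API.
def performability_policy(avg_response_time_s: float,
                          reference_s: float = 2.5,
                          overperform_factor: float = 0.5) -> str:
    if avg_response_time_s > reference_s:
        return "scale_out"        # error is positive: add capacity
    if avg_response_time_s < overperform_factor * reference_s:
        return "scale_in"         # well below the reference: release resources
    return "hold"                 # within the margin: do nothing
```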

  7. A Simple Performance Management Pattern
• A very common pattern: simplicity, platform support
• Control/rule design? ...that is practical
• Diagram: load hits a service running on a load-balanced cluster, backed by a spare pool or another service; the response is observed

  8. A Simple Performance Management Pattern, for the IT system management expert. Diagram: service on a load-balanced cluster with a spare pool or other service.

  9. A Simple Performance Management Pattern, for the control expert. Diagram: service on a load-balanced cluster with a spare pool or other service.

  10. Objective: Proactive Qualitative Performance Control. Predict the state, then decide on the action. Diagram: service on a load-balanced cluster with a spare pool or other service.

  11. Empirical dependability characterization
• Design time: modeling, analysis, testing -> design decisions (with validation)
• Operation: real-time data acquisition and monitoring -> operational decisions (with validation)
• Diagram: services (human, J2EE, Web, e-mail) and an IMS transaction

  12. Empirical dependability characterization. Design time: modeling, analysis, testing
• Challenges: incompleteness, environment sensitivity, change tolerance
• CORE ISSUE: from instance assessment to prediction
• Assessment criteria: generalizable, reusable, parametrizable
• KNOWLEDGE EXTRACTION

  13. Empirical dependability characterization. Operation: real-time data acquisition and monitoring
• Some challenges: threshold configuration, embedding diagnosis, embedding forecasting, over/under-monitoring

  14. Approach

  15. Workflow

  16. Examples: pilot components
• Apache ~ load balancer, UA (task)
• Tomcat (application-specific: platform-independent + implementation-dependent)
• Linux OS Agent (platform + task)
• MySQL ~ VI Agent, add-on (platform + task)

  17. Faults taken into account (side note, SRDS WS, 2008. 5. 10.)
• Source: qualitative dynamic modelling of Apache
• Separate work: representativeness. HOW TO GENERALIZE MEASUREMENTS?

  18. TPC-W Workload
• A standard benchmark for multi-tier systems; models an e-bookshop
• Customer behavioral models: 14 different web pages, varying load on the system, 3 standard workload mixes
• Highly non-deterministic: ABSOLUTELY INAPPROPRIATE AS A PLATFORM BENCHMARK
• Representativeness: synthetic vs. natural benchmarks

  19. The Problem of Over-Instrumentation
• Overly complex rule set/model: V&V? Maintenance? Control design?
• Only a few of the variables are significant w.r.t. a management goal
• "Control theory for IT" works do not tackle this
• Diagram: each service and deployed software component provides a large number of metrics
• WHAT TO MEASURE: measurable ≠ to be measured; this is the variable selection problem

  20. Design phase: measurement
• Objectives: at design time, all candidate control variables; at runtime, few (selection)
• Stress the system (scalability) to reveal operation domains and dynamics
• EDCC-5: Pintér G., Madeira H., Vieira M., Pataricza A. and Majzik I., "A Data Mining Approach to Identify Key Factors in Dependability Experiments"

  21. Component Metrics Gathered: database + IBM Tivoli Monitoring Agent
• Phenomenological service metrics: average response time, failed SQL statements %, number of active sessions, …
• "Causal" metrics: DB2 status, buffer pool hit ratio, average pool read/write time, average locks held, rows read/write rate, …
• Phenomenological resource metrics: average CPU usage, average disk I/O usage, …
• Number of database metrics: MySQL: 12, Oracle: 640, DB2: 880

  22. Qualitative State Definition for Prediction
• Coarse control: intuitively, an interval aggregate defines the state
• High-frequency "jitter": noise; lower-level means
• Aggregation interval: match the prediction horizon!
• Alternatively: explicitly filter out the "noise" (the same intent)
• Presented: amplitude filter, median filtering
• Pipeline: throughput data -> filtering -> classification into qualitative classes bounded at 0% - 25% - 45% - 100% (see the sketch below)
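
As a concrete illustration of the filtering-then-classification pipeline, the sketch below median-filters a throughput trace and bins it at the 25% and 45% boundaries from the slide; the kernel size and the state names ("low", "medium", "high") are assumptions, and SciPy's medfilt stands in for whatever filter was actually used.

```python
# Sketch of the pipeline on this slide: throughput data -> median filtering ->
# classification into qualitative states. The class boundaries (25%, 45%) are
# taken from the slide; the window length and state names are assumptions.
import numpy as np
from scipy.signal import medfilt

def qualitative_states(throughput_pct: np.ndarray,
                       kernel_size: int = 5) -> np.ndarray:
    """Map a noisy throughput trace (in % of maximum) to qualitative states."""
    smoothed = medfilt(throughput_pct, kernel_size=kernel_size)  # suppress jitter
    bounds = np.array([25.0, 45.0])                # classes: 0-25%, 25-45%, 45-100%
    labels = np.array(["low", "medium", "high"])   # assumed state names
    return labels[np.digitize(smoothed, bounds)]

# Example: a short trace with a spike that the median filter removes
trace = np.array([10, 12, 95, 11, 30, 33, 60, 62, 61, 58], dtype=float)
print(qualitative_states(trace))
```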

  23. Design phase: variable selection
• Objective: control variables, as few as possible, as many as needed
• mRMR (minimum Redundancy Maximum Relevance) feature selection
• Cf. the AUTONOMICS 2009 paper

  24. Variable Selection
• Workflow: the full dataset of 160+ metrics and the goal-metric dataset are both filtered, then variable selection (algorithm: mRMR) yields the control variables; 12 metrics were chosen
• Simple statistics are insufficient: signal processing is needed (a minimal mRMR sketch follows below)
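
The greedy sketch below shows the mRMR criterion (difference form) on already-discretized metrics, using scikit-learn's mutual_info_score; it only illustrates the relevance-minus-redundancy idea and is not the tooling used in the experiments.

```python
# Greedy mRMR (minimum Redundancy Maximum Relevance) sketch. It assumes the
# metrics and the goal metric have already been discretized into qualitative
# classes (see the previous slide); it illustrates the relevance-minus-
# redundancy criterion only, not the exact tooling used in the experiments.
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_select(X: np.ndarray, y: np.ndarray, k: int = 12) -> list:
    """Return indices of k columns of X selected by greedy mRMR against y."""
    n_features = X.shape[1]
    relevance = [mutual_info_score(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]          # start from the most relevant
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s])
                                  for s in selected])
            score = relevance[j] - redundancy       # mRMR, difference form
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```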

  25. Example Selected Metrics: Median Filtering
• First 7 of 12 ("value" decreases)
• Tomcat load/CPU always in the top 3: the bottleneck
• Cluster characterization: ongoing work

  26. Operation phase: measurement. Decide on the system state based on the samples.

  27. 1-Minute Prediction for Median Filtering
• Qualitative prediction accuracy: >90%
• (multiple runs; 4-hour validation set)
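
For reference, a qualitative accuracy figure of this kind can be computed as the share of validation samples whose predicted state matches the observed one; the state labels in the sketch below are assumptions carried over from the classification step.

```python
# Sketch of computing a qualitative prediction accuracy on a validation window:
# the fraction of samples whose predicted qualitative state equals the observed
# one. The labels are the assumed state names from the classification sketch.
import numpy as np

def qualitative_accuracy(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Share of samples with a correctly predicted qualitative state."""
    return float(np.mean(predicted == observed))

pred = np.array(["low", "medium", "medium", "high"])
obs  = np.array(["low", "medium", "high",   "high"])
print(f"accuracy: {qualitative_accuracy(pred, obs):.0%}")   # 75%
```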

  28. Operational domains?
• Normal operational state: internal relationships tend to be linear (with some "noise")
• Saturation (over-loaded): objective metrics behave linearly again, at the physical limits of the system
• Degrading state: the point of interest! Seemingly non-linear behaviour; mRMR metric selection works better for this specific case (see the sketch below)
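
To illustrate the linear-versus-degrading distinction, here is a minimal sketch that fits a linear load-to-metric relation on clearly normal samples and flags large residuals as potential degrading-domain points; the 20th-percentile "normal" cut and the 3-sigma threshold are assumptions for illustration only.

```python
# Sketch of the "operational domains" idea: fit a linear relation between load
# and a goal metric on clearly normal samples, then flag samples whose residual
# is much larger than the fit error as candidates for the degrading domain.
# The 20th-percentile cut and the 3-sigma threshold are illustrative assumptions.
import numpy as np

def flag_degrading(load: np.ndarray, metric: np.ndarray) -> np.ndarray:
    normal = load < np.percentile(load, 20)          # assume low load is normal
    slope, intercept = np.polyfit(load[normal], metric[normal], deg=1)
    residual = metric - (slope * load + intercept)   # deviation from linear fit
    sigma = residual[normal].std()
    return np.abs(residual) > 3.0 * sigma            # non-linear => suspicious
```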

  29. Minimum Mean Square Error: it should have been monotonically decreasing.

  30. Conclusions

  31. Concluding remarks
• Assessment for predictive system management needs SIGNAL PROCESSING (at the moment more than control theory)
  • Shannon's law is in there?
  • Asynchronous sampling problem
• Our experiment: design flaws
  • TPC-W: closed loop; result: coupling of workload and transfer characteristics
  • Too strong autocorrelation of client behavior
  • Methodology still valid
• Introducing dependability: "easy"
