Proactivity = Observation + Analysis + Knowledge extraction + Action planning?


Presentation Transcript


  1. Proactivity = Observation + Analysis + Knowledge extraction + Action planning? András Pataricza, Budapest University of Technology and Economics

  2. Contributors: Prof. G. Horváth (BME), I. Kocsis (BME), Z. Micskei (BME), K. Gáti (BME), Zs. Kocsis (IBM), I. Szombath (BME), and many others

  3. There will be nothing new in this lecture

  4. I learned the basics when I was so young

  5. But old professors are happy to have a new audience

  6. What can traditional signal processing do for proactivity? Proactive stance: builds on foreknowledge (intelligence) and creativity to anticipate the situation as an opportunity, regardless of how threatening or how bad it looks, and to influence the system constructively instead of merely reacting

  7. Reactivity vs. proactivity • Reactive control: „acting in response to a situation rather than creating or controlling it” • Proactive control: „controlling a situation rather than just responding to it after it has happened”

  8. Test environment

  9. Test configuration: virtual desktop infrastructure, ~a few tens of VMs per host, ~a few tens of hosts per cluster; vSphere monitoring and supervisory control • Objectives: VM-level SLA control, capacity planning, proactive migration • „CPU-ready” metric: the VM is ready to run, but lacks the resources to start

  10. Performance monitoring: detecting a possible problem at the VM or host level; it serves as a failure indicator as well

  11. Actions to prevent a performance issue: add limits to the neighbouring VMs

  12. Actions to prevent a performance issue: live-migrate the VM to another (underutilized) host

  13. Measured data (at a 20 s sampling rate)

  14. Aggregation over the population: statistical cluster behavior versus QoS over the VM population

  15. Mean of the goal VM metric (VM_CPU_READY) • The VM application is ready to run, but a lack of resources leads to a performance bottleneck and, in turn, an availability problem • VMware-recommended thresholds: at 5%, keep watching; at 10%, action is typically needed
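As a back-of-the-envelope illustration, the sketch below checks a series of CPU-ready samples against these two thresholds. The data and variable names are synthetic stand-ins; only the 5%/10% thresholds come from the slide:

```python
import numpy as np

# Synthetic VM_CPU_READY samples as fractions (20 s sampling, ~1 hour).
rng = np.random.default_rng(0)
cpu_ready = rng.exponential(scale=0.02, size=180)

# VMware-recommended thresholds from the slide:
#   5%  -> keep watching the VM, 10% -> action is typically needed
for level, threshold in [("watch", 0.05), ("act", 0.10)]:
    share = np.mean(cpu_ready > threshold)
    print(f"{level}: {share:.1%} of samples exceed {threshold:.0%}")
```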

  16. The two traps • Visual processing: you believe your eyes • Automated processing: you believe your computer

  17. Mean of the goal VM metric • Statistics: mean = 0.007 -> a good system • But only 2/3 of the samples are error-free -> a bad system • After eliminating the failure-free cases, the mean is 0.023, still below the threshold -> a good system • Visual inspection: lots of bad values -> this is a bad system
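The arithmetic of this trap is easy to reproduce. The sketch below builds a synthetic series tuned to mirror the slide's own figures (2/3 error-free samples, a conditional mean near 2.3%) and shows how each summary statistic tells a different story:

```python
import numpy as np

# Synthetic series mimicking the slide's figures: 2/3 of the samples are
# error-free (CPU-ready = 0), the rest sit around 2.3% CPU-ready.
rng = np.random.default_rng(1)
n = 3000
samples = np.where(rng.random(n) < 2 / 3, 0.0,
                   rng.normal(loc=0.023, scale=0.005, size=n).clip(min=0))

print("overall mean:", samples.mean())             # ~0.007 -> looks good
print("error-free share:", np.mean(samples == 0))  # only ~2/3 -> looks bad
print("mean of nonzero samples:",
      samples[samples > 0].mean())                 # ~0.023 -> "good" again
```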

  18. Host shared and used memory over time • Noisy… high-frequency components dominate • But they correlate (93%!) • YOU DON’T SEE IT
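A minimal sketch of the same effect on synthetic data: two traces share one high-frequency component plus independent noise, so the eye sees only noise while the correlation coefficient is high. The mixing weights are chosen to land near the slide's 93%; the real memory traces are not reproduced here:

```python
import numpy as np

# Two synthetic traces standing in for host shared and used memory:
# one shared high-frequency component, plus independent noise on each.
rng = np.random.default_rng(2)
common = rng.normal(size=1000)                 # shared high-frequency part
shared_mem = 40 + common + rng.normal(scale=0.28, size=1000)
used_mem = 70 + common + rng.normal(scale=0.28, size=1000)

# Plotted, both curves look like unrelated noise; the number disagrees.
r = np.corrcoef(shared_mem, used_mem)[0, 1]
print(f"Pearson correlation: {r:.2f}")         # lands near the slide's 93%
```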

  19. … and a host of more mundane observations • Computing power use = CPU use × CPU clock rate (const.) • Should be purely proportional • Correlation coefficient: 0.99998477434137 • Well visible, but numerically suppressed • Origin???

  20. Most important factor: the host CPU usage mean • Host CPU usage vs. the ratio of VMs with „bad” vCPU-ready values

  21. The battle plan

  22. Impacts of temporal resolution • Nyquist–Shannon sampling theorem: bandwidth = sampling frequency / 2 • Sampling period = 20 s -> sampling frequency = 0.05 Hz -> bandwidth = 0.025 Hz • Additionally: sampling clock jitter (SW sampling), clock skew (distributed system), Precision Time Protocol (PTP) (IEEE 1588-2008) • No fine-granular prediction
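The arithmetic spelled out (the 20 s period is from the slide; everything else follows from the theorem):

```python
# Nyquist-Shannon: the observable bandwidth is half the sampling frequency.
sampling_period_s = 20.0                          # one sample every 20 s
sampling_frequency_hz = 1.0 / sampling_period_s   # 0.05 Hz
bandwidth_hz = sampling_frequency_hz / 2.0        # 0.025 Hz

print(f"sampling frequency: {sampling_frequency_hz} Hz")
print(f"observable bandwidth: {bandwidth_hz} Hz")
print(f"fastest resolvable period: {1.0 / bandwidth_hz:.0f} s")  # 40 s
```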

  23. Proactivity • Proactivity needs: • Situation recognition based on historical experience: what is to be expected? • Identification of the principal factors: single factor / multiple factors; operation domains leading to failures; boundaries • Predictor design: high failure coverage; temporal lookahead sufficient for reaction • Design of the reaction

  24. Situations to be covered • Single VM: application demand > resources allocated • VM-host: overcommissioning, overload due to other VMs • VM-host-cluster

  25. Data preparation • Data cleaning • Data reduction

  26. Data reduction • Huge initial set of samples • Reduction: • Object sampling: representative measurement objects • Parameter selection/reduction (sketched below): aggregation, relevance, redundancy • Temporal: sampling, relevance
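One possible reading of parameter reduction by redundancy is dropping columns that are almost perfectly correlated with a column already kept. A minimal pandas sketch; the helper name, the 0.95 threshold, and the toy table are my own choices, not from the talk:

```python
import numpy as np
import pandas as pd

def drop_redundant(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Parameter reduction by redundancy: drop each column whose absolute
    correlation with an earlier, kept column exceeds the threshold."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=redundant)

# Tiny demo: cpu_mhz is a scaled copy of cpu, hence redundant.
rng = np.random.default_rng(4)
a = rng.normal(size=200)
df = pd.DataFrame({"cpu": a, "cpu_mhz": 2400 * a,
                   "mem": rng.normal(size=200)})
print(drop_redundant(df).columns.tolist())   # ['cpu', 'mem']
```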

  27. Object sampling: in pursuit of discovering fine-grained behavior and the reasons for outliers

  28. Subsample: ratio > 0 + random subsampling • For presentation purposes only: reduction of the sample size to 400 -> manageability • Real-life analysis: keep enough data to maintain a proper correlation with the operation
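A sketch of this two-step subsampling in pandas, on a synthetic table; the column name cpu_ready_ratio and the table contents are assumptions standing in for the study's real data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monitoring table; 'cpu_ready_ratio' plays
# the role of the "ratio" column on the slide.
rng = np.random.default_rng(5)
df = pd.DataFrame({"vm": np.arange(10_000) % 40,
                   "cpu_ready_ratio": rng.exponential(0.01, size=10_000)
                                      * (rng.random(10_000) < 0.3)})

# Keep only the interesting rows (ratio > 0), then draw a random
# subsample of 400 rows -- for presentation purposes only.
nonzero = df[df["cpu_ready_ratio"] > 0]
subsample = nonzero.sample(n=min(400, len(nonzero)), random_state=42)
print(len(df), "->", len(nonzero), "->", len(subsample), "rows")
```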

  29. Demo: visual data discovery with parallel (||) coordinates

  30. Visual multifactor analysis: visual analytics for an arbitrary number of factors • Inselberg, A.: Parallel Coordinates: Visual Multidimensional Geometry and Its Applications, Springer, 2009 • You can do much, much more: redundancy reduction, correlation analysis, clustering, data mining, approximation, optimization
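Parallel-coordinate plots are directly available in pandas. A minimal sketch with a hypothetical per-VM metric table (column names and values invented for illustration); each VM becomes one polyline across the metric axes:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical per-VM metrics; 'status' labels problematic vs healthy VMs.
df = pd.DataFrame({
    "cpu_usage": [0.2, 0.8, 0.9, 0.3],
    "mem_usage": [0.4, 0.7, 0.95, 0.35],
    "cpu_ready": [0.0, 0.06, 0.12, 0.01],
    "status":    ["ok", "watch", "act", "ok"],
})

# One polyline per VM across all metric axes -- any number of factors.
parallel_coordinates(df, class_column="status")
plt.show()
```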

  31. Prediction at the cluster level: what ratio of the VMs will become problematic?

  32. Pinpointed interval for one VM: the situation of interest; training time > prediction time

  33. One-minute prediction based on all data sources

  34. One-minute prediction and classification

  35. One-minute prediction with selected variables

  36. Classification error (simplest predictor) • The false alarm rate is low (dominant pattern) • Feature set selection is critical to detection • More is less (PROPER selection is needed – cf. PFARM 2010) • Case separation for different situations • Long-term prediction is hard (for automated reactions)
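The talk does not specify the predictor, so the sketch below uses an ordinary scikit-learn pipeline (univariate feature selection plus a shallow decision tree) only to make the two headline points concrete: the false alarm rate stays low because the "ok" class dominates, and detection hinges on selecting the right features. Data and model are stand-ins, not the PFARM setup:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for the monitoring features and the "problematic
# VM" label; only two of the twelve features actually matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "More is less": keep only the k most relevant features before training.
selector = SelectKBest(f_classif, k=4).fit(X_tr, y_tr)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(selector.transform(X_tr), y_tr)

tn, fp, fn, tp = confusion_matrix(
    y_te, clf.predict(selector.transform(X_te))).ravel()
print(f"false alarm rate: {fp / (fp + tn):.2%}")  # low: 'ok' class dominates
print(f"detection rate:   {tp / (tp + fn):.2%}")  # hinges on feature choice
```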

  37. Case study – Connectivity testing in Large Networks: in dynamic infrastructures, the active internode topology has to be discovered as well…

  38. Large Networks • Not known explicitly • Too complex for conventional algorithms • Social network graph: Yahoo! Instant Messenger friend-connectivity graph* (1.8M nodes, ~4M edges) • Serves as a model of Large Infrastructures • Typical power-law network: 75% of the friendships are related to 35% of the users • *ydata-yim-friends-graph-v1_0, Yahoo! Research Alliance Webscope program, http://research.yahoo.com/Academic_Relations

  39. Typical model: random graphs • [Figure: adjacency matrices of the Yahoo! Instant Messenger dataset and of a preferential attachment graph, in random node order and ordered by degree; the limit object is a graphon]

  40. Approximating edge density by subgraph sampling • Graph with 800 nodes and 320000 edges • Subgraph sampling method: random induced subgraph; take k random nodes, repeat n times • Example: sample size k = 35, repeated n = 20 times -> 2% error with only 4% of the graph examined • [Figure: relative error over sample size (k) and number of samples (n); white: error < 5%; a random induced subgraph with k = 4]
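A sketch of the scheme described above on a synthetic graph; networkx's G(n, m) random graph (here with 32,000 edges) merely stands in for the slide's graph, and the exact error varies run to run:

```python
import random
import networkx as nx

# Synthetic stand-in for the slide's graph: a G(n, m) random graph.
G = nx.gnm_random_graph(800, 32_000, seed=0)
true_density = nx.density(G)

def sampled_density(G, k=35, n=20, seed=0):
    """Estimate edge density from n random induced subgraphs of k nodes."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        nodes = rng.sample(list(G.nodes), k)
        estimates.append(nx.density(G.subgraph(nodes)))
    return sum(estimates) / n

est = sampled_density(G)                 # k = 35, n = 20 as on the slide
print(f"true density {true_density:.4f}, estimate {est:.4f}, "
      f"relative error {abs(est - true_density) / true_density:.1%}")
```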

  41. Neighborhood sampling: Fault Tolerant Services • Neighborhood sampling: take random root nodes and explore each neighborhood to a given depth (m) • The number of 3- and 4-cycles indicates possible redundancy (fault-tolerant domain) • A high-degree node has many substitute nodes (e.g. a load balancer) • The distributions approximated from the samples are very close to the real ones!
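A sketch of neighborhood sampling on a synthetic power-law graph; for brevity it counts only 3-cycles (triangles) per sampled neighborhood, while the slide also counts 4-cycles. The graph model, sample count, and depth are illustrative choices:

```python
import random
import networkx as nx

def neighborhood_redundancy(G, n_samples=50, depth=2, seed=0):
    """Neighborhood sampling: pick random root nodes, explore each
    neighborhood to the given depth, and count triangles (3-cycles)
    there as a cheap indicator of possible redundancy."""
    rng = random.Random(seed)
    counts = []
    for root in rng.sample(list(G.nodes), n_samples):
        ego = nx.ego_graph(G, root, radius=depth)   # induced neighborhood
        counts.append(sum(nx.triangles(ego).values()) // 3)
    return counts

# Power-law stand-in for a large infrastructure graph.
G = nx.barabasi_albert_graph(5_000, 3, seed=0)
counts = neighborhood_redundancy(G)
print(f"mean 3-cycles per sampled neighborhood: "
      f"{sum(counts) / len(counts):.1f}")
```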

  42. Summary: proactivity needs • Observations: all relevant cases (stress test) • Analysis: checking the input data; visual analysis; UNDERSTANDING; automated methods for calculation • Knowledge extraction: clustering (situation recognition); predictor (generalization) • Action planning: the principal factors defining a situation are indicative • Thank you for your attention
