Performance and Capacity with Analytics

Dan Kimball – Cloud Infrastructure Architect - VMware

Agenda
  • Introduction
  • What is Analytics?
  • Real-world examples
  • 3rd generation monitoring with analytics
  • Success stories
  • Bringing it all together
  • Closing remarks and Q&A

What is Analytics?

Analytics is the application of computer technology, operational research, and statistics to solve problems in business and industry.

A simple definition of analytics is "the science of analysis". A practical definition, however, would be that analytics is the process of developing optimal or realistic decision recommendations based on insights derived through the application of statistical models and analysis against existing and/or simulated future data.

Source: Wikipedia - http://en.wikipedia.org/wiki/Analytics

Real-world examples of Analytics

  • Clinical decision support systems
    • Experts use predictive analysis in health care primarily to determine which patients are at risk of developing certain conditions, such as diabetes, asthma, heart disease, and other lifetime illnesses.
  • Customer retention
    • With the number of competing services available, businesses need to focus efforts on maintaining continuous consumer satisfaction, rewarding consumer loyalty and minimizing customer attrition.
  • Fraud detection
    • Fraud is a big problem for many businesses and can be of various types: inaccurate credit applications, fraudulent transactions (both offline and online), identity thefts and false insurance claims.
  • Risk management
    • Risk management techniques aim to predict, and benefit from, a future scenario. The capital asset pricing model (CAPM) "predicts" the best portfolio to maximize return.
  • Underwriting
    • Many businesses have to account for risk exposure due to their different services and determine the cost needed to cover the risk. For example, auto insurance providers need to accurately determine the amount of premium to charge to cover each automobile and driver.
“1st Generation” Tools: Up/Down Checks, Floods of Alerts

1st Generation - Event-Centric, Hard-Threshold Based

3/4/08 16:45 Host 1 processingTimeServ The Processing Time Service Level on process… n/a n/a n/a
3/4/08 16:45 Host 1 Processor_Table 0 Processor 0 is at 87.0%. A CPU Bottleneck is….. n/a 0 Windows_System
3/4/08 16:44 Host 2 System_Table The number of hardware interrupts per second… n/a 0 Windows_System
3/4/08 16:30 Host 2 Processor_Table 1 Processor 1 is at 84.0%. A CPU Bottleneck is …. n/a 0 Windows_System
3/4/08 16:25 n/a responseTimeServ… The Response Time Service Level on Toadwor.. n/a n/a n/a
3/4/08 16:20 n/a processingTimeServ.. The Processing Time Service Level on Prospec.. n/a n/a n/a
3/4/08 16:08 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD A CPU Hog has been detected n/a OraSF Oracle
3/4/08 16:08 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD SQL with high I/O has been de.. n/a OraSF Oracle
3/4/08 14:40 n/a responseTimeServ… The Response Time Service Level on Siebel Sa.. n/a n/a n/a
3/4/08 14:20 n/a processingTimeServ.. The Processing Time Service Level on Siebel S. n/a n/a n/a
3/4/08 14:39 Host 3 Top_CPU_Table Process ‘siebsh.exe(svc-siebel, 6780)’: is cons.. n/a 0 Windows_System
3/4/08 14:39 Host 3 Top_CPU_Table Process ‘siebsh.exe(svc-siebel, 7940)’: is cons.. n/a 0 Windows_System
3/4/08 14:15 n/a responseTimeServ… The Response Time Service Level on Toadwor.. n/a n/a n/a
3/4/08 14:15 n/a processingTimeServ.. The Processing Time Service Level on Prospec.. n/a n/a n/a
3/4/08 13:55 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD A CPU Hog has been detected n/a OraSF Oracle

DATA FEEDS
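
To make the "event-centric, hard-threshold" idea concrete, here is a minimal sketch of how such a tool might turn samples into alerts. It is an illustration only, not the logic of any product shown on the slide: the `Sample` type, the `cpu_percent` metric name, and the 80% limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sample:
    timestamp: datetime
    host: str
    metric: str
    value: float


# Hypothetical fixed limits; the 80% CPU figure is assumed for illustration.
HARD_THRESHOLDS = {"cpu_percent": 80.0}


def hard_threshold_alerts(samples):
    """Emit one alert line for every sample that exceeds its fixed limit."""
    alerts = []
    for s in samples:
        limit = HARD_THRESHOLDS.get(s.metric)
        if limit is not None and s.value > limit:
            alerts.append(
                f"{s.timestamp:%m/%d/%y %H:%M} {s.host} {s.metric} is at {s.value:.1f}%"
            )
    return alerts


samples = [
    Sample(datetime(2008, 3, 4, 16, 30), "Host 2", "cpu_percent", 84.0),
    Sample(datetime(2008, 3, 4, 16, 40), "Host 2", "cpu_percent", 62.0),
    Sample(datetime(2008, 3, 4, 16, 45), "Host 1", "cpu_percent", 87.0),
]
alerts = hard_threshold_alerts(samples)
for line in alerts:
    print(line)
```

Every breach produces its own event with no notion of what is normal for that host, which is how a busy environment ends up with a list like the one above.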

“2nd Generation” Tools: Don’t Handle Change, Leading to False Positives

2nd Generation - Rudimentary Baselining, Rules/Templates, Charting

3/4/08 16:45 Host 1 processingTimeServ The Processing Time Service Level on process… n/a n/a n/a
3/4/08 16:45 Host 1 Processor_Table 0 Processor 0 is at 87.0%. A CPU Bottleneck is….. n/a 0 Windows_System
3/4/08 16:44 Host 2 System_Table The number of hardware interrupts per second… n/a 0 Windows_System
3/4/08 16:30 Host 2 Processor_Table 1 Processor 1 is at 84.0%. A CPU Bottleneck is …. n/a 0 Windows_System
3/4/08 16:25 n/a responseTimeServ… The Response Time Service Level on Toadwor.. n/a n/a n/a
3/4/08 16:20 n/a processingTimeServ.. The Processing Time Service Level on Prospec.. n/a n/a n/a
3/4/08 16:08 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD A CPU Hog has been detected n/a OraSF Oracle
3/4/08 16:08 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD SQL with high I/O has been de.. n/a OraSF Oracle
3/4/08 14:40 n/a responseTimeServ… The Response Time Service Level on Siebel Sa.. n/a n/a n/a
3/4/08 14:20 n/a processingTimeServ.. The Processing Time Service Level on Siebel S. n/a n/a n/a
3/4/08 14:39 Host 3 Top_CPU_Table Process ‘siebsh.exe(svc-siebel, 6780)’: is cons.. n/a 0 Windows_System
3/4/08 14:39 Host 3 Top_CPU_Table Process ‘siebsh.exe(svc-siebel, 7940)’: is cons.. n/a 0 Windows_System
3/4/08 14:15 n/a responseTimeServ… The Response Time Service Level on Toadwor.. n/a n/a n/a
3/4/08 14:15 n/a processingTimeServ.. The Processing Time Service Level on Prospec.. n/a n/a n/a
3/4/08 13:55 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD A CPU Hog has been detected n/a OraSF Oracle
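
The claim that rudimentary baselining turns change into false positives can be sketched in a few lines. This is an assumed, simplified model (a one-time baseline of mean plus or minus three standard deviations), not the method of any specific tool; the CPU values are made up for illustration.

```python
import statistics


def static_baseline(history, k=3.0):
    """Learn a fixed band once: mean +/- k population standard deviations."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread


# Made-up CPU % readings before and after a planned change (for example, a
# new release that legitimately raises steady-state load).
before_change = [40, 42, 41, 39, 43, 40, 41, 42]
after_change = [55, 57, 56, 54, 58, 55]

low, high = static_baseline(before_change)
flagged = [v for v in after_change if not (low <= v <= high)]
print(f"band=({low:.1f}, {high:.1f}) flagged {len(flagged)} of {len(after_change)}")
```

Because the band is never relearned, every healthy reading after the change is reported as abnormal until someone updates the template by hand.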

3rd generation monitoring with analytics – It’s here!

Dan Kimball – Cloud Infrastructure Architect - COE - VMware

Real-Time Performance Management

3rd Generation – Holistic, Real Time Analytics

Flexible INTEGRATION to many data sources

Enterprise SCALABILITY

I can put all my monitoring tools to good use and get better performance analytics.

Patented performance ANALYTICS

Powerful information DASHBOARDS

Smart Alert™ - Using Analytics to understand abnormalities across the application

App Data (e.g., Hyperic, SCOM)

User Experience (e.g., HP RUM, etc.)

Business Application

Smart Alert Generation (“When”)

SMART ALERT

vCenter (Private/Public Cloud)

Network Data (e.g., Ionix IPAM/PM, etc.)

Storage (EMC, NetApp, IBM)
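
The slide does not say how a Smart Alert decides when to fire, so the following is only a guess at the general shape of the idea: abnormalities from several data sources that land in the same time window are rolled up into one alert instead of many. The function name, the 15-minute window, and the two-source minimum are all assumptions made for illustration.

```python
from collections import defaultdict


def smart_alerts(abnormal_events, window_minutes=15, min_sources=2):
    """Group abnormal events by time window; raise one alert per window
    in which at least `min_sources` distinct data sources are abnormal."""
    buckets = defaultdict(set)
    for minute, source, metric in abnormal_events:
        buckets[minute // window_minutes].add((source, metric))

    alerts = []
    for bucket in sorted(buckets):
        sources = sorted({source for source, _ in buckets[bucket]})
        if len(sources) >= min_sources:
            alerts.append(
                {
                    "window_start": bucket * window_minutes,
                    "sources": sources,
                    "abnormal_metrics": len(buckets[bucket]),
                }
            )
    return alerts


# (minute offset, data source, metric) tuples; all values are hypothetical.
events = [
    (3, "storage", "latency_ms"),
    (5, "app", "db_connections"),
    (9, "user_experience", "response_time_ms"),
    (47, "network", "packet_loss"),
]
alerts = smart_alerts(events)
print(alerts)
```

In this example three abnormalities across storage, application, and user experience collapse into a single alert, while the isolated network blip raises nothing.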

Future State – Evolution of Learning and Predictive Analysis

Monitoring Server O/S Metrics – CPU, RAM, Disk, I/O, etc.

Monitoring App Layer Metrics – JVM, DB Connections, etc.

Monitoring Business Metrics

  • My brain is understanding the health of my body. Should I do anything?
  • Your Brain Understands Context:
    • If my heart rate and temperature are increasing I should go to the hospital
    • If I’m tired, rest more
    • If I tire easily, start exercising!

Muscular

Skeletal

Cardio Vascular

Nervous

Respiration

Heart Rate

Temperature

Monitoring User Experience Metrics

  • vCenter Operations is understanding the health of my enterprise by analyzing millions of measurements. Should I do anything?
  • vCenter Operations Understands Context:
    • Act based on urgency of emerging problems
    • Act based on real-time performance dashboards
    • Act based on long term correlations and trends
Data Agnostic Approach to Data Collection
  • Accepts any time series data (examples)
    • Server OS
    • Server App layer (e.g., IIS, Oracle, WebSphere, etc.)
    • Network
    • Storage
    • User Experience
    • Transactional
    • Business Data
    • Change Events
  • Minimal Required Fields (4)
    • Object Name, Metric Name, Value, Timestamp
  • Data Extraction - *not* an analytic question
    • No rules/templates to write and maintain
    • No thresholds or KPIs to figure out
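
The four minimal required fields listed above map naturally onto a small record type. The sketch below is one possible representation, assuming a hypothetical comma-separated feed; the `Measurement` class, the `parse_csv_line` helper, and the line format are illustrative and are not a documented input format of any product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One time-series data point with the four minimal fields."""
    object_name: str
    metric_name: str
    value: float
    timestamp: datetime


def parse_csv_line(line):
    """Parse 'object, metric, value, ISO-8601 timestamp' (a hypothetical feed format)."""
    object_name, metric_name, value, stamp = (field.strip() for field in line.split(","))
    return Measurement(object_name, metric_name, float(value), datetime.fromisoformat(stamp))


m = parse_csv_line("Host 1, cpu_percent, 87.0, 2008-03-04T16:45:00")
print(m)
```

Because nothing in the record says whether the value came from a server, a storage array, or a business system, the same structure can carry any of the data types listed above.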
Learn Normal Behavior and Identify Abnormalities
  • Doesn’t assume IT data has a normal bell-shaped distribution
  • Sophisticated Analytics – 9 different algorithms working together
  • Learns your dynamic ranges of “Normal” without templates
  • Learns patterns of behavior and identifies abnormalities

GRAY BAR: Upper and lower band of the dynamic threshold (“Normal”)
BLUE LINE: Metric’s current value
RED BAR: Breached dynamic threshold (“Abnormal”)
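
One simple way to learn a dynamic range without assuming a bell-shaped distribution is to take percentile bands per hour of day. The sketch below shows only that single, swapped-in technique; it is not the patented analytics or any of the nine algorithms mentioned on the slide, and the sample values and the 5th/95th percentile choice are assumptions.

```python
import math
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile of a sorted, non-empty list (0 < p <= 100)."""
    rank = max(1, math.ceil(len(sorted_values) * p / 100))
    return sorted_values[rank - 1]


def learn_bands(history, low_p=5, high_p=95):
    """history: iterable of (hour_of_day, value). Returns {hour: (low, high)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        ordered = sorted(values)
        bands[hour] = (percentile(ordered, low_p), percentile(ordered, high_p))
    return bands


def is_abnormal(bands, hour, value):
    low, high = bands[hour]
    return value < low or value > high


# Made-up CPU % history: a nightly batch window at 02:00 and a quiet 14:00.
history = [(2, v) for v in (88, 90, 92, 91, 89, 93, 90, 94, 87, 91)]
history += [(14, v) for v in (22, 25, 24, 23, 26, 21, 24, 25, 23, 22)]

bands = learn_bands(history)
print(bands[2], bands[14])
print(is_abnormal(bands, 2, 90))   # busy, but normal for the batch window
print(is_abnormal(bands, 14, 60))  # modest load, but abnormal for mid-afternoon
```

A fixed 80% limit would do the opposite on both readings: alert on the routine 90% at 02:00 and stay silent on the unusual 60% at 14:00.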

Actual Build

Standard Build

New Build

Understanding Progressive Change
  • Type: Unplanned, Uncontrolled
    • User Changes
    • Unapproved Admin Change
    • Exploits
    • Shadow IT
  • Origin: End Users, Developers, Suppliers

80,000 CIs

  • Type: Planned, Controlled
    • Updates and fixes
    • Infrastructure changes
    • Component patches

Use Cases

The Role of Operations Management

Ensure and Restore Service Levels

Optimize for Efficiency and Cost

Utilization / forecast

Slow performance

Problem

Maintenance

Reclaim capacity

Rollback change

Config issue

Orchestrate changes

Reactive

Proactive

Business benefits delivered by 3rd generation monitoring

Comprehensive Visibility

Intelligent Automation

Proactive Management

vCenter Operations Management Suite

  • Higher QoS
  • Fewer Incidents
  • Tool Consolidation
  • Compliance
  • Faster MTTR
  • Improved Collaboration
  • Resource Utilization

“Troubleshooting time reduced by 50%”

“Notified the storage team before they were even aware of an issue.”

“We’ll be able to reduce our monitoring tools from over 300 to about 30.”

TUI Infotec

Maximus

Kaiser Permanente

Customer Success: IT Operations

Solve performance issues before end-users are affected and reduce total alerts

  • Before
    • 400 critical alerts/hour
    • End-user complaints alerted IT to the problem
    • End-users impacted (avg. 2 hours/outage)
    • 12 Level-2 engineers on bridge call to address problem
  • After
    • 20 alerts/MONTH
    • 3 hours’ advance warning of slowdown, with root cause
    • NO end-user impact
    • 1 Level-2 engineer and 1 DBA to address problems

Learn Normal

Smart Alerting

Root Cause

Bringing it all together

Focused Solutions
  • Performance and Capacity analytics with root cause analysis
  • Configuration, Change, Compliance Management with Patching
  • Application Dependency Mapping
Deeper performance and capacity management for the Cloud
  • Overview
    • Gain performance and capacity management across the Enterprise
    • Cover every silo of the environment
    • Break down the silos in the organization
    • Reduce overall MTTR/MTTI
    • Keep an eye on your cloud service providers
    • Reclaim precious compute resources
    • Gain unprecedented visibility into how your infrastructure behaves

Service Owner

Performance and Capacity for VDI
  • Overview
    • End-to-end monitoring of infrastructure
    • Includes PCoIP performance monitoring
    • Desktop, Pool and User Contexts
    • Self-Learning performance analytics
    • Automated alerts
    • Remediation guidance
  • Benefits
    • Get to root cause quickly; Reduce MTTI
    • Respond proactively before support calls
    • Remediate quickly and accurately
    • Improve resource utilization by identifying over-provisioned hardware and tracking down bottlenecks
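
As a rough illustration of the last benefit, over-provisioning can be spotted by comparing what a machine was allocated with the peak it actually used. The sketch below is a simple heuristic under stated assumptions, not how any product performs the analysis: the VM names, the sample values, and the 30% cut-off are all hypothetical.

```python
def over_provisioned(vms, peak_fraction=0.3):
    """Return names of VMs whose observed peak CPU use stays below
    `peak_fraction` of their allocated vCPUs."""
    flagged = []
    for name, allocated_vcpus, used_vcpu_samples in vms:
        peak = max(used_vcpu_samples)  # highest vCPU demand seen in the period
        if peak < peak_fraction * allocated_vcpus:
            flagged.append(name)
    return flagged


# (name, allocated vCPUs, observed vCPU usage samples); all values are made up.
vms = [
    ("vdi-pool-a-01", 8, [0.9, 1.4, 1.1, 2.0]),
    ("vdi-pool-a-02", 4, [2.5, 3.6, 3.1, 3.9]),
]
candidates = over_provisioned(vms)
print(candidates)
```

Using the peak rather than the average is a deliberately conservative choice here, so that a desktop with short bursts of heavy use is not marked for reclamation.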

Thank you for your time!

Additional reading material:

Quantifying Information Data Loss through Data Aggregation

http://www.vmware.com/files/pdf/vcenter/VMware-vCenter-Operations-Quantifying-Information-Loss-Data-Aggregation-WP-EN.pdf

How Normal is Your Data:

http://www.vmware.com/files/pdf/vcenter/VMware-vCenter-Operations-How-Normal-Is-Your-Data-WP-EN.pdf