Did Something Change? Using Statistical Techniques to Interpret Service and Resource Metrics
Frank Bereznay
Abstract
In a perfect world, one would always know the answer to that question. Unfortunately, nobody works in a perfect world. This paper / presentation will explore statistical techniques used to look for deviations in metrics that are due to assignable causes, as opposed to the period-to-period variation that is normally present. Hypothesis Testing, Statistical Process Control, Multivariate Adaptive Statistical Filtering, and Analysis of Variance will be compared and contrasted. SAS code will be used to perform the analysis. Exploratory analysis techniques will be used to build populations for analysis purposes.
Hypothesis Testing
Create a null hypothesis: the average message rate is 15 per minute.
Create an alternative hypothesis that contradicts the null hypothesis: the average message rate is not 15 per minute.
Hypothesis Testing
Calculation of the t statistic with 9 (N-1) degrees of freedom.
The official statement is:
At a 95% confidence level, the data is insufficient for us to state that the mean of the population is not 15 for the 24-hour period being examined.
Important point: the contrary is not necessarily true.
This does not prove in any way that the population mean is 15.
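The t-statistic calculation above can be sketched in a few lines. This is a minimal illustration in Python rather than the SAS used in the talk; the ten message-rate samples are hypothetical, and the function name `t_statistic` is an assumption made for this example. The critical value 2.262 is the standard two-sided 95% table value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (table value).
T_CRIT_95_DF9 = 2.262

def t_statistic(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(N))."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - mu0) / standard_error

# Hypothetical data: ten per-minute message rates (N = 10, so 9 degrees of freedom).
rates = [14, 16, 15, 13, 17, 15, 14, 16, 18, 15]

t = t_statistic(rates, mu0=15)
reject_null = abs(t) > T_CRIT_95_DF9

print(f"t = {t:.3f}, reject null hypothesis: {reject_null}")
```

With these made-up samples the statistic falls well inside the critical value, so the null hypothesis is not rejected. As the slide stresses, that is not evidence that the mean is exactly 15.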
Hypothesis Testing
The underlying population does not need to be normally distributed.
The population must be randomly sampled.
Statistical Process Control
W. Edwards Deming
SPC is conceptually similar to Hypothesis Testing, but computationally different.
No a priori data point is needed.
Data is sub-grouped for calculation purposes.
SPC and Hypothesis Testing can produce different results for the same set of data.
Sample Order Output
Statistical Process Control
The data does not need to be normally distributed.
Proper sub-grouping of the data is fundamental to the technique.
The sampling plan must be random and cover the boundaries of the population being examined.
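To make the role of sub-grouping concrete, here is a minimal sketch of an X-bar control chart in Python (a stand-in for the SAS used in the talk). It assumes the classic range-based limits, center ± A2 × average range, where A2 is the standard tabulated constant for the subgroup size. The slides do not say which chart type was used, so this choice, the function names, and the data are all assumptions.

```python
from statistics import mean

# Standard X-bar chart constants A2, keyed by subgroup size.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}

def xbar_limits(subgroups):
    """Return (LCL, center line, UCL) from equal-sized baseline subgroups."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    center = mean(mean(g) for g in subgroups)          # grand mean of subgroup means
    average_range = mean(max(g) - min(g) for g in subgroups)
    half_width = A2[n] * average_range
    return center - half_width, center, center + half_width

def out_of_control(subgroups, limits):
    """Indexes of subgroups whose mean falls outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, g in enumerate(subgroups) if not lcl <= mean(g) <= ucl]

# Hypothetical baseline: three subgroups of five measurements each.
baseline = [[10, 12, 11, 13, 14], [11, 13, 12, 12, 12], [9, 11, 10, 12, 13]]
limits = xbar_limits(baseline)

# Hypothetical new subgroups to check against the baseline limits.
new_data = [[11, 12, 12, 13, 11], [14, 15, 14, 15, 16]]
flagged = out_of_control(new_data, limits)
print(limits, flagged)
```

Note that no hypothesized mean is supplied: the limits come entirely from the sub-grouped baseline, which is the "no a priori data point" property mentioned above.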
Multivariate Adaptive Statistical Filtering (MASF)
Subject of a 1995 CMG paper of the same name.
A practitioner’s approach to creating a statistical detection technique that addresses the unique challenges of the interval-driven time series datasets used by Computer Resource Management Professionals.
Why MASF?
SPC comes from manufacturing, where processes are repeatable:
Filling a bottle with wine.
Manufacturing a roll of paper.
Commercial computer workloads are generally not repeatable processes (and that is an understatement!).
MASF – Aggregation Policies
Increases the number of samples per collection period.
Tuesday thru Thursday
MASF Summary
Ideally 10 to 20 points per reference set.
Longer-term datasets are subject to time series influences, which distort the metrics.
This technique should be included in every Resource Management Specialist’s toolkit!
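The reference-set idea can be sketched as follows, in Python rather than the SAS used in the talk. The sketch assumes one reference set per (day, hour) slot and limits of mean ± k standard deviations with k = 3. The slides do not state the limit rule, so that rule, the function names, and the CPU-busy figures are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_reference_sets(history):
    """Group (day, hour, value) samples into one reference set per (day, hour)."""
    sets = defaultdict(list)
    for day, hour, value in history:
        sets[(day, hour)].append(value)
    return dict(sets)

def masf_limits(reference_sets, k=3.0):
    """Assumed rule: lower/upper limit = mean -/+ k standard deviations."""
    limits = {}
    for key, values in reference_sets.items():
        if len(values) < 2:
            continue  # a standard deviation needs at least two points
        m, s = mean(values), stdev(values)
        limits[key] = (m - k * s, m + k * s)
    return limits

def find_exceptions(limits, observations):
    """Observations that fall outside the limits of their own time slot."""
    exceptions = []
    for day, hour, value in observations:
        if (day, hour) in limits:
            low, high = limits[(day, hour)]
            if not low <= value <= high:
                exceptions.append((day, hour, value))
    return exceptions

# Hypothetical reference set: ten CPU-busy readings for Tuesday, 10:00 hour.
history = [("Tue", 10, v) for v in (40, 42, 41, 39, 43, 40, 41, 42, 38, 44)]
limits = masf_limits(build_reference_sets(history))

# Hypothetical new readings to evaluate against that slot.
exceptions = find_exceptions(limits, [("Tue", 10, 45), ("Tue", 10, 55)])
print(limits, exceptions)
```

Each time slot is judged only against its own history, which is how the approach copes with workloads that are not repeatable from hour to hour.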
Analysis of Variance (ANOVA)
Best explained by why it was developed.
Agricultural work in the early 1900s to improve crop yields.
A plot of land was divided into multiple areas, each subjected to a different treatment.
The test was developed to compare the effects of these treatments on crop yield.
ANOVA
Failing to reject the null hypothesis: can’t prove any mean is different – end of test.
Accepting the alternative hypothesis has an interesting twist.
One or more of the means are different – but which one(s)?
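A one-way ANOVA reduces to comparing the variation between treatment means with the variation within treatments. The sketch below computes the F statistic in Python (a stand-in for the SAS used in the talk) on three hypothetical treatments of five observations each. The function name `one_way_anova` and the data are assumptions; 3.89 is the standard 5% table value of F for (2, 12) degrees of freedom.

```python
from statistics import mean

# 5% critical value of F with (2, 12) degrees of freedom (table value).
F_CRIT_05_DF_2_12 = 3.89

def one_way_anova(groups):
    """Return (F statistic, between-group df, within-group df)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three treatments, five observations each.
treatments = [
    [10, 11, 9, 10, 10],
    [10, 12, 11, 11, 11],
    [15, 16, 14, 15, 15],
]

f_stat, df_between, df_within = one_way_anova(treatments)
means_differ = f_stat > F_CRIT_05_DF_2_12
print(f"F({df_between}, {df_within}) = {f_stat:.1f}, means differ: {means_differ}")
```

A large F only says that at least one mean differs. It does not say which one, which is the twist noted above.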
ANOVA
John Tukey developed a technique to group the means of an ANOVA test when the alternative hypothesis is accepted.
We now have a way to take a set of multiple data populations and segment them into like groups.
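One common form of Tukey's technique is the honestly significant difference (HSD): two means are treated as different when they are further apart than q × sqrt(MSW / n), where MSW is the within-group mean square and n is the per-group sample size. The Python sketch below (a stand-in for the SAS used in the talk) assumes equal group sizes and hypothetical data. The value 3.77 is the 5% studentized-range table value for 3 groups and 12 degrees of freedom. The greedy grouping step is a simplification added for this example, not Tukey's full procedure.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# 5% studentized range critical value for 3 groups, 12 within-group df (table value).
Q_CRIT_05_K3_DF12 = 3.77

def tukey_hsd(groups, q_crit):
    """groups: dict of name -> equal-sized samples. Returns (hsd, means, different pairs)."""
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    df_within = sum(len(values) - 1 for values in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    hsd = q_crit * sqrt(ss_within / df_within / n)
    means = {name: mean(values) for name, values in groups.items()}
    different = {
        frozenset((a, b))
        for a, b in combinations(groups, 2)
        if abs(means[a] - means[b]) > hsd
    }
    return hsd, means, different

def like_groups(means, different):
    """Simplified grouping: walk the sorted means, starting a new group
    when a mean differs significantly from the first member of the current group."""
    ordered = sorted(means, key=means.get)
    clusters = [[ordered[0]]]
    for name in ordered[1:]:
        if frozenset((clusters[-1][0], name)) in different:
            clusters.append([name])
        else:
            clusters[-1].append(name)
    return clusters

# Hypothetical treatments A, B and C, five observations each.
data = {
    "A": [10, 11, 9, 10, 10],
    "B": [10, 12, 11, 11, 11],
    "C": [15, 16, 14, 15, 15],
}
hsd, means, different = tukey_hsd(data, Q_CRIT_05_K3_DF12)
clusters = like_groups(means, different)
print(f"HSD = {hsd:.3f}, like groups: {clusters}")
```

Here A and B end up in one group and C in another, which is the kind of segmentation into like groups described above.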
Mon Tue Wed Thur Fri
ANOVA
Sufficient data is needed to obtain 6 to 10 observations for each treatment.
Need to be sensitive to correlated data.
The sampling plan must be random and cover the boundaries of the population being examined.
ANOVA
Comparing data from multiple days to see if it is the same or different.
Use it as a clustering technique to build aggregated data groups for a MASF analysis.
Multiple-factor ANOVAs can look at multiple treatments (factors) at the same time, for example day of week and hour of day.
A very powerful tool that should be in everybody’s toolkit!
Midrange Server Example
The MASF technique will be used to look for deviations.
The first three weeks will serve as the reference set for examining the fourth week’s data.
ANOVA will be used to create Aggregation Policies to cluster the hourly data.
Midrange Server Example
Identified two non-overlapping groups:
Monday and Friday.
Tuesday, Wednesday and Thursday.
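The effect of this aggregation policy on reference-set size can be sketched as follows, in Python rather than the SAS used in the talk. The day-to-group mapping mirrors the two groups identified above. The helper name `pooled_reference_sets` and the readings are hypothetical, and only a single hour of the day is shown.

```python
from collections import defaultdict

# Aggregation policy taken from the two groups identified by the ANOVA grouping.
AGGREGATION_POLICY = {
    "Mon": "Mon/Fri", "Fri": "Mon/Fri",
    "Tue": "Tue-Thu", "Wed": "Tue-Thu", "Thu": "Tue-Thu",
}

def pooled_reference_sets(samples, policy=AGGREGATION_POLICY):
    """Pool (day, hour, value) samples into one reference set per (day group, hour)."""
    sets = defaultdict(list)
    for day, hour, value in samples:
        sets[(policy[day], hour)].append(value)
    return dict(sets)

# Hypothetical readings for the 10:00 hour across three reference weeks.
reference_weeks = [
    (day, 10, 40.0 + week)
    for week in range(3)
    for day in ("Mon", "Tue", "Wed", "Thu", "Fri")
]

sets = pooled_reference_sets(reference_weeks)
sizes = {key: len(values) for key, values in sets.items()}
print(sizes)  # {('Mon/Fri', 10): 6, ('Tue-Thu', 10): 9}
```

Without pooling, three weeks of data give only three points per day and hour. Pooling like days raises that to six and nine, closer to the 10 to 20 points suggested for a reference set.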
Summary
Pick up these tools at your nearest CMG meeting. They do take some getting used to, but are worth the learning curve.
Hypothesis Testing, Statistical Process Control, MASF and ANOVA
Be very wary of your data.
The time series data we routinely work with is a very complicated multi-dimensional dataset.
Get to know your data. The better you know the data, the better you know your workload.
Summary
I. Trubin’s CMG papers on application of MASF and variance-based statistical detection techniques:
2001 – Exception Detection System, Based on Statistical Process Control Concept.
2002 – Global and Application Levels Exception Detection System, Based on MASF Technique.
2003 – Disk Subsystem Capacity Management, Based on Business Drivers, I/O Performance Metrics and MASF.
2004 – Mainframe Global and Workload Levels Statistical Exception Detection System, Based on MASF.
2005 – Capturing Workload Pathology by Statistical Exception Detection System.