Statistics Review

Statistics Review – Part I Topics • Z-values • Confidence Intervals • Hypothesis Testing • Paired Tests • T-tests • F-tests

Statistics References References used in class slides: • Sullivan III, Michael. Statistics: Informed Decisions Using Data, Pearson Education, 2004. • Gitlow, et al. Six Sigma for Green Belts and Champions, Prentice Hall, 2004.

Sampling and the Normal Distribution Relative frequency histograms that are symmetric and bell-shaped are said to have the shape of a normal curve.

Sampling and the Normal Distribution If a continuous random variable is normally distributed or has a normal probability distribution, then a relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric).

Sampling and the Normal Distribution • Suppose that the mean normal sugar level in the population is μ₀ = 9.7 mmol/L with std. dev. σ = 2.0 mmol/L, and you want to see whether diabetics have an increased blood sugar level • A sample of n = 64 individuals with diabetes has mean x̄ = 13.7 mmol/L with std. dev. s = 2.0 mmol/L • How do you compare these values? • Standardize!

Sampling and the Normal Distribution Reading z-scores

Sampling and the Normal Distribution • Standardization: • Uses Z-tables to evaluate sample means • Puts samples on the same scale • Subtract the mean and divide by the standard deviation (for a sample mean, the standard deviation of the sample mean, σ/√n)
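The standardization step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `z_score_of_mean` is made up, and it assumes a sample mean is being standardized, so the divisor is the standard deviation of the sample mean (σ/√n). The inputs are the blood-sugar figures quoted in the slides (population mean 9.7 mmol/L, standard deviation 2.0 mmol/L, n = 64, sample mean 13.7 mmol/L).

```python
import math

def z_score_of_mean(sample_mean, pop_mean, pop_sd, n):
    """Standardize a sample mean: subtract the mean, divide by sigma / sqrt(n)."""
    standard_error = pop_sd / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - pop_mean) / standard_error

# Blood-sugar figures from the slides; the function name is hypothetical.
z = z_score_of_mean(13.7, 9.7, 2.0, 64)
print(z)  # roughly 16 standard errors above the population mean
```

A z-score this large lies far beyond the range of a typical Z-table, which suggests the diabetic sample mean is very unlikely under the normal-population mean.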

Sampling and the Normal Distribution • Why do we standardize? • Enables the comparison of populations/samples using a standardized set of values • Recall: Z = (x − μ) / σ

Sampling and the Normal Distribution The table gives the area under the standard normal curve for values to the left of a specified Z-score, z₀, as shown in the figure.

Sampling and the Normal Distribution • Population Mean=10, Standard Deviation=5 • What is the likelihood of a sample (n=16) having a mean greater than 12 (standard deviation = 5)? • What is the likelihood of a sample (n=16) having a mean of less than 8 (standard deviation = 5)?
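One way to work the two questions above is sketched below, using Python's standard-library `statistics.NormalDist` in place of a printed Z-table. It assumes the sample mean is approximately normal with standard deviation σ/√n = 5/√16 = 1.25; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10, 5, 16
se = sigma / math.sqrt(n)      # standard deviation of the sample mean: 1.25
std_normal = NormalDist()      # standard normal, mean 0 and sd 1

z_high = (12 - mu) / se        # z = 1.6
z_low = (8 - mu) / se          # z = -1.6

p_greater_than_12 = 1 - std_normal.cdf(z_high)  # area to the right of z = 1.6
p_less_than_8 = std_normal.cdf(z_low)           # area to the left of z = -1.6

print(round(p_greater_than_12, 4), round(p_less_than_8, 4))  # 0.0548 0.0548
```

The two answers match because 12 and 8 are equally far from the mean and the normal curve is symmetric.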

Sampling and the Normal Distribution Notation for the Probability of a Standard Normal Random Variable: • P(a < Z < b) represents the probability that a standard normal random variable is between a and b. • P(Z > a) represents the probability that a standard normal random variable is greater than a. • P(Z < a) represents the probability that a standard normal random variable is less than a.
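The three notations map directly onto the left-tail area that a Z-table reports. The helper names below (`p_less`, `p_greater`, `p_between`) are hypothetical and only serve to illustrate the relationships.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal random variable

def p_less(a):
    """P(Z < a): the area to the left of a, as read from a Z-table."""
    return Z.cdf(a)

def p_greater(a):
    """P(Z > a): one minus the area to the left of a."""
    return 1 - Z.cdf(a)

def p_between(a, b):
    """P(a < Z < b): area to the left of b minus area to the left of a."""
    return Z.cdf(b) - Z.cdf(a)

print(round(p_between(-1.96, 1.96), 2))  # 0.95
```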

Sampling and the Normal Distribution • Before using Z-tables, assess whether the data are approximately normally distributed • Two common checks: • Histogram • Normal probability plot
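A normal probability plot pairs each ordered observation with the z-score expected at that position if the data were normal; points close to a straight line suggest approximate normality. The sketch below computes those pairs and their linear correlation. It assumes the plotting position f = (i − 0.375) / (n + 0.25), which is one common convention and not necessarily the one used in class, and the sample data are made up for illustration.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (observed value, expected z-score) pairs for a normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        f = (i - 0.375) / (n + 0.25)  # assumed plotting position for the i-th value
        points.append((x, NormalDist().inv_cdf(f)))
    return points

def correlation(points):
    """Linear correlation between observed values and expected z-scores."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

sample = [4.1, 4.8, 5.0, 5.3, 5.6, 5.9, 6.1, 6.4, 6.9, 7.7]  # made-up data
points = normal_plot_points(sample)
r = correlation(points)
print(round(r, 3))  # close to 1 when the points are nearly linear
```

Plotting the pairs (for example with a charting library) gives the picture; the correlation is only a numeric companion to the visual check.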

Sampling and the Normal Distribution Normal Probability Plots:

Sampling and the Normal Distribution Normal Probability Plots: Fat pencil test to detect normality

Sampling and the Normal Distribution Shapes of Normal Probability Plots:

Sampling and the Normal Distribution Normal Probability Plots vs Box plots:

Sampling and the Normal Distribution • If the data are “approximately” normally distributed, use Z-tables to determine the likelihood of events

Sampling and the Normal Distribution • Can also “flip” Z-scores to determine the ‘highest’ or ‘lowest’ acceptable sample mean
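"Flipping" a z-score means starting from an area and solving x̄ = μ + z·σ/√n for the sample mean. The sketch below is illustrative: it reuses the earlier population figures (mean 10, standard deviation 5, n = 16) and assumes a 5% cutoff in each tail, which the slides do not specify. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_mean_cutoff(pop_mean, pop_sd, n, area_to_left):
    """Sample mean that has the given area to its left under the sampling distribution."""
    z = NormalDist().inv_cdf(area_to_left)  # z-score for the requested left-tail area
    return pop_mean + z * pop_sd / math.sqrt(n)

highest = sample_mean_cutoff(10, 5, 16, 0.95)  # only 5% of sample means exceed this
lowest = sample_mean_cutoff(10, 5, 16, 0.05)   # only 5% of sample means fall below this
print(round(lowest, 3), round(highest, 3))
```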

Confidence Intervals • Point estimate: value of a statistic that estimates the value of the parameter. • Confidence interval estimate: interval of numbers along with a probability that the interval contains the unknown parameter. • Level of confidence: a probability that represents the percentage of intervals that will contain the parameter if a large number of repeated samples are obtained.

Confidence Intervals • A 95% level of confidence means that if 100 confidence intervals were constructed, each based on a different sample from the same population, we would expect 95 of the intervals to contain the population mean. • The construction of a confidence interval for the population mean depends upon three factors: • The point estimate of the population mean • The level of confidence • The standard deviation of the sample mean: σ/√n

Confidence Intervals If a simple random sample of size n is drawn from a population that is normally distributed, or if the sample size is large, the distribution of the sample mean will be normal with: mean μ_x̄ = μ and standard deviation σ_x̄ = σ/√n

Confidence Intervals 95% of all sample means are in the interval: μ − 1.96·σ/√n < x̄ < μ + 1.96·σ/√n. With a little algebraic manipulation, we can rewrite this inequality and obtain: x̄ − 1.96·σ/√n < μ < x̄ + 1.96·σ/√n

Confidence Intervals • Steps to constructing a confidence interval: • Verify normality if n ≤ 30. • Determine α/2, x̄, and σ. • Find the z-score for α/2. • Calculate the lower and upper bounds: x̄ ± z·σ/√n.
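The steps above can be sketched as a short function. This is an illustration under stated assumptions: σ is treated as known, normality has already been checked, and the example inputs (x̄ = 13.7, σ = 2.0, n = 64) are borrowed from the blood-sugar slide purely for demonstration. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with area alpha/2 to its right
    margin = z * sigma / math.sqrt(n)        # z times the sd of the sample mean
    return x_bar - margin, x_bar + margin

lower, upper = z_confidence_interval(13.7, 2.0, 64)
print(round(lower, 2), round(upper, 2))  # 13.21 14.19
```

Raising the confidence level widens the interval, while a larger sample size narrows it.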

Confidence Intervals Histogram for z

Confidence Intervals Histogram for t

Confidence Intervals Properties of the t Distribution
1. The t distribution is different for different values of n.
2. The t distribution is centered at 0 and is symmetric about 0.
3. The area under the curve is 1. The area under the curve to the right of 0 = the area under the curve to the left of 0 = 1/2.
4. As t increases and decreases without bound, the graph approaches, but never equals, zero.
5. The area in the tails of the t distribution is a little greater than the area in the tails of the standard normal distribution. This is because using s as an estimate of σ introduces more variability into the t statistic.
6. As the sample size n increases, the density curve of t approaches the standard normal density curve. This occurs because the values of s approach the value of σ, by the law of large numbers.

Confidence Intervals EXAMPLE: Finding t-values Find the t-value such that the area under the t distribution to the right of the t-value is 0.20, assuming 10 degrees of freedom. Hint: find t_0.20 with 10 degrees of freedom.
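In class this value is read from a t-table. As a cross-check, the sketch below recovers it numerically with only the standard library: it integrates the t density with Simpson's rule and searches for the cutoff by bisection. This is a swapped-in numerical approach, not the table lookup the slide intends, and all function names are hypothetical. It assumes a right-tail area below 0.5, so the answer is positive.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 for the left half plus Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(right_tail_area, df):
    """Bisection search for t with P(T > t) = right_tail_area (assumed < 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > right_tail_area:
            lo = mid  # tail still too large, so the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2

t_value = t_critical(0.20, 10)
print(round(t_value, 3))  # 0.879
```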

Confidence Intervals EXAMPLE: Finding Chi-Square Values Find the chi-square values that separate the middle 95% of the distribution from the 2.5% in each tail. Assume 18 degrees of freedom.
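The two chi-square values are normally read from a table. The sketch below recovers them numerically with the standard library, integrating the chi-square density with Simpson's rule and bisecting on the area to the left. It is a swapped-in numerical check rather than the table lookup, the function names are hypothetical, and it assumes more than 2 degrees of freedom so the density is zero at the origin.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density; assumes df > 2, so the density is 0 at x = 0."""
    if x <= 0:
        return 0.0
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))

def chi2_cdf(x, df, steps=2000):
    """Area to the left of x, by Simpson's rule on [0, x]."""
    h = x / steps
    total = chi2_pdf(0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3

def chi2_value(area_to_left, df):
    """Bisection search for the value with the given area to its left."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < area_to_left:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lower = chi2_value(0.025, 18)  # 2.5% of the area lies to the left
upper = chi2_value(0.975, 18)  # 2.5% of the area lies to the right
print(round(lower, 3), round(upper, 3))  # about 8.231 and 31.526
```

Unlike the z and t cutoffs, the two values are not symmetric about the center because the chi-square distribution is skewed to the right.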

Confidence Intervals EXAMPLE: Constructing a Confidence Interval about a Population Standard Deviation

Hypothesis Testing Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations. Selecting Hypothesis Testing methods – see next slides.

Hypothesis Testing The null hypothesis, denoted H0 (read “H-naught”), is a statement to be tested. The null hypothesis is assumed true until evidence indicates otherwise. In this chapter, it will be a statement regarding the value of a population parameter. The alternative hypothesis, denoted H1 (read “H-one”), is a claim to be tested. We are trying to find evidence for the alternative hypothesis. In this chapter, it will be a claim regarding the value of a population parameter.

Hypothesis Testing There are three ways to set up the null and alternative hypotheses:
1. Equal versus not equal (two-tailed test) H0: parameter = some value H1: parameter ≠ some value
2. Equal versus less than (left-tailed test) H0: parameter = some value H1: parameter < some value
3. Equal versus greater than (right-tailed test) H0: parameter = some value H1: parameter > some value
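The choice of tail determines where the rejection region sits. The sketch below illustrates this for a standardized (z) test statistic. It assumes a significance level of α = 0.05 and a critical-value decision rule, neither of which is stated on this slide, and the function name `rejects_null` is hypothetical.

```python
from statistics import NormalDist

def rejects_null(z, tail, alpha=0.05):
    """Return True if the test statistic z falls in the rejection region."""
    Z = NormalDist()
    if tail == "two":    # H1: parameter != value, alpha split across both tails
        return abs(z) > Z.inv_cdf(1 - alpha / 2)
    if tail == "left":   # H1: parameter < value, all of alpha in the left tail
        return z < Z.inv_cdf(alpha)
    if tail == "right":  # H1: parameter > value, all of alpha in the right tail
        return z > Z.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left', or 'right'")

print(rejects_null(1.7, "two"))    # False: 1.7 is inside the two-tailed cutoffs
print(rejects_null(1.7, "right"))  # True: 1.7 exceeds the right-tailed cutoff
```

The same statistic can lead to different decisions because a two-tailed test places only α/2 in each tail, which pushes the cutoff further from zero.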

Hypothesis Testing THREE WAYS TO STRUCTURE THE HYPOTHESIS TEST:

Hypothesis Testing • Two-tailed test
