## Statistics Review – Part I

**Statistics Review – Part I**

Topics
• Z-values
• Confidence Intervals
• Hypothesis Testing
• Paired Tests
• T-tests
• F-tests

**Statistics References**

References used in class slides:
• Sullivan III, Michael. Statistics: Informed Decisions Using Data, Pearson Education, 2004.
• Gitlow, et al. Six Sigma for Green Belts and Champions, Prentice Hall, 2004.

**Sampling and the Normal Distribution**

Relative frequency histograms that are symmetric and bell-shaped are said to have the shape of a normal curve.

If a continuous random variable is normally distributed (has a normal probability distribution), then a relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric).

• Suppose the mean normal blood sugar level in the population is $\mu_0$ = 9.7 mmol/L with standard deviation $\sigma$ = 2.0 mmol/L, and you want to see whether diabetics have an increased blood sugar level.
• A sample of n = 64 individuals with diabetes has a mean of 13.7 mmol/L with standard deviation 2.0 mmol/L.
• How do you compare these values?
• Standardize!

[Figure: Reading z-scores]

Standardization:
• Using Z-tables to evaluate sample means
• Puts samples on the same scale
• Subtract the mean and divide by the standard deviation (for a sample mean, $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$)

Why do we standardize?
• Enables the comparison of populations/samples using a standardized set of values

The table gives the area under the standard normal curve for values to the left of a specified Z-score, $z_0$, as shown in the figure.

• Population mean = 10, standard deviation = 5
• What is the likelihood of a sample (n = 16) having a mean greater than 12 (standard deviation = 5)?
• What is the likelihood of a sample (n = 16) having a mean of less than 8 (standard deviation = 5)?

Notation for the probability of a standard normal random variable:
• P(a < Z < b) represents the probability that a standard normal random variable is between a and b.
• P(Z > a) represents the probability that a standard normal random variable is greater than a.
• P(Z < a) represents the probability that a standard normal random variable is less than a.

Before using Z-tables, assess whether the data are normally distributed. Ways to check:
• Histogram
• Probability plot

[Figures: normal probability plots; fat pencil test to detect normality; shapes of normal probability plots; normal probability plots vs. box plots]

• If the data are approximately normally distributed, use Z-tables to determine the likelihood of events.
• Z-scores can also be "flipped" to determine the highest or lowest acceptable sample mean.

**Confidence Intervals**

• Point estimate: the value of a statistic that estimates the value of the parameter.
• Confidence interval estimate: an interval of numbers along with a probability that the interval contains the unknown parameter.
• Level of confidence: a probability that represents the percentage of intervals that will contain the parameter if a large number of repeated samples are obtained.
• A 95% level of confidence means that if 100 confidence intervals were constructed, each based on a different sample from the same population, we would expect 95 of the intervals to contain the population mean.
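The repeated-sampling reading of a 95% level of confidence can be checked with a small simulation. The sketch below is illustrative only. It assumes a normally distributed population with the mean of 10, standard deviation of 5, and sample size of 16 from the earlier sampling example, and it treats the population standard deviation as known. The function name `coverage`, the number of trials, and the seed are arbitrary choices, not part of the slides.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=10.0, sigma=5.0, n=16, level=0.95, trials=2000, seed=0):
    """Fraction of z-intervals (sigma assumed known) that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Critical z-value for the chosen level, roughly 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5  # z times the std. dev. of the sample mean
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        # Count the interval if it captures the true population mean.
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should land near 0.95 without matching it exactly, since it is itself an estimate based on a finite number of simulated samples.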
The construction of a confidence interval for the population mean depends on three factors:
• The point estimate of the population mean
• The level of confidence
• The standard deviation of the sample mean, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

If a simple random sample is drawn from a normally distributed population, or the sample size is large, the distribution of the sample mean will be (approximately) normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

95% of all sample means are in the interval $\mu - 1.96\,\sigma/\sqrt{n} < \bar{x} < \mu + 1.96\,\sigma/\sqrt{n}$. With a little algebraic manipulation, we can rewrite this inequality and obtain $\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}$.

Steps to constructing a confidence interval:
1. Verify normality if n ≤ 30.
2. Determine $\alpha/2$, $\bar{x}$, and $\sigma$.
3. Find the z-score for $\alpha/2$.
4. Calculate the upper and lower bounds.

[Figures: histogram for z; histogram for t]

Properties of the t Distribution
1. The t distribution is different for different values of n.
2. The t distribution is centered at 0 and is symmetric about 0.
3. The area under the curve is 1. The area under the curve to the right of 0 equals the area under the curve to the left of 0, which is 1/2.
4. As t increases or decreases without bound, the graph approaches, but never equals, zero.
5. The area in the tails of the t distribution is a little greater than the area in the tails of the standard normal distribution. This is because using s as an estimate of $\sigma$ introduces more variability into the t statistic.
6. As the sample size n increases, the density curve of t approaches the standard normal density curve. This occurs because the values of s approach the value of $\sigma$ by the law of large numbers.

EXAMPLE: Finding t-values
Find the t-value such that the area under the t distribution to the right of the t-value is 0.2, assuming 10 degrees of freedom.
Hint: find $t_{0.20}$ with 10 degrees of freedom.

EXAMPLE: Finding Chi-Square Values
Find the chi-square values that separate the middle 95% of the distribution from the 2.5% in each tail. Assume 18 degrees of freedom.

EXAMPLE: Constructing a Confidence Interval about a Population Standard Deviation

**Hypothesis Testing**

Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations. For selecting hypothesis testing methods, see the next slides.

The null hypothesis, denoted $H_0$ (read "H-naught"), is a statement to be tested. The null hypothesis is assumed true until evidence indicates otherwise. In this chapter, it will be a statement regarding the value of a population parameter.

The alternative hypothesis, denoted $H_1$ (read "H-one"), is a claim to be tested. We are trying to find evidence for the alternative hypothesis. In this chapter, it will be a claim regarding the value of a population parameter.

There are three ways to set up the null and alternative hypotheses:
1. Equal versus not equal (two-tailed test). $H_0$: parameter = some value; $H_1$: parameter ≠ some value
2. Equal versus less than (left-tailed test). $H_0$: parameter = some value; $H_1$: parameter < some value
3. Equal versus greater than (right-tailed test). $H_0$: parameter = some value; $H_1$: parameter > some value

Three ways to structure the hypothesis test:
• Two-tailed test
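The three setups differ only in which tail area of the standard normal curve counts as evidence against $H_0$. The sketch below is a minimal illustration, not a method taken from the slides. It assumes a test about a population mean with known standard deviation, and it summarizes the evidence as a tail probability (a p-value), which the slides have not introduced at this point. The function name `z_test` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, tail="two"):
    """Return (z, tail_probability) for a test of H0: mu = mu0, sigma known.

    tail: "two"   for H1: mu != mu0
          "left"  for H1: mu <  mu0
          "right" for H1: mu >  mu0
    """
    # Standardize the sample mean using the std. dev. of the sample mean.
    z = (xbar - mu0) / (sigma / n ** 0.5)
    phi = NormalDist().cdf  # area to the left of z under the standard normal
    if tail == "left":
        p = phi(z)
    elif tail == "right":
        p = 1 - phi(z)
    elif tail == "two":
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, p


# Blood sugar example from the slides: mu0 = 9.7, sample mean 13.7,
# standard deviation 2.0, n = 64, testing for an increase (right-tailed).
z, p = z_test(13.7, 9.7, 2.0, 64, tail="right")
print(f"z = {z:.2f}, tail probability = {p:.4g}")

# Sampling example from the slides: population mean 10, sd 5, n = 16.
print(z_test(12, 10, 5, 16, tail="right"))
print(z_test(8, 10, 5, 16, tail="left"))
```

A small tail probability means the observed sample mean would be unusual if $H_0$ were true. The cutoff for "small" is a separate choice that this sketch leaves to the reader.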