
Mean and Standard Deviation



Presentation Transcript


  1. Mean and Standard Deviation
     Another type of numerical summary for a data set
     • Mean: The mean of a set of n observations is the arithmetic average; it is the sum of the observations divided by the number of observations, n. (p. 227)
     • Formula: $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{\text{sum of observations}}{n}$, where n = the sample size.
     • Calculate the mean for Sosa’s home runs.
     • Data: 15, 10, 33, 25, 36, 40, 36, 66
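
The mean calculation for the Sosa data can be checked with a few lines of Python. This is only a sketch; the variable names (`sosa`, `total`, `n`, `x_bar`) are illustrative and not from the slides.

```python
# Sosa's home run counts, as listed on the slide.
sosa = [15, 10, 33, 25, 36, 40, 36, 66]

total = sum(sosa)    # sum of observations: 261
n = len(sosa)        # sample size: 8
x_bar = total / n    # mean = 261 / 8 = 32.625

print(f"sum = {total}, n = {n}, mean = {x_bar}")
```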

  2. Measures of Spread or Variability
     • Data set A: 0 1 2 2 3 3 3 4 4 5 6, n = 11
     • Data set B: 1 2 2 3 3 3 3 3 4 5 5, n = 11
     • For both, the medians and quartiles are equal (M = 3, Q1 = 2, Q3 = 4), and the means are nearly equal (3 for A; 34/11 ≈ 3.09 for B as listed). Which is more spread out?
     [Figure: dot plots of Data set A (axis 0 to 6) and Data set B (axis 1 to 5)]
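
One way to answer the question numerically is to compare the range and the sample standard deviation of the two data sets. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import stdev

# Data sets A and B as listed on the slide.
a = [0, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5]

range_a = max(a) - min(a)   # 6
range_b = max(b) - min(b)   # 4

s_a = stdev(a)              # sample standard deviation, about 1.73
s_b = stdev(b)              # sample standard deviation, about 1.22

print(f"A: range = {range_a}, s = {s_a:.2f}")
print(f"B: range = {range_b}, s = {s_b:.2f}")
```

Both measures are larger for Data set A, so A is the more spread out of the two, even though the medians and quartiles match.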

  3. Variance and Standard Deviation
     • Variance: The modified average of the squares of the deviations of the observations from their mean; "modified" because the sum is divided by n – 1 rather than n. This is denoted by $s^2$. (p. 227)
     • Standard Deviation: The positive square root of the variance. This is denoted by s. (p. 227)
     • Variance: $s^2 = \dfrac{\text{sum of squared deviations from the mean}}{n - 1}$
     • Find the variance and the standard deviation for Sosa’s home runs.
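
The exercise can be worked step by step in Python, following the definition above: deviations from the mean, their squares, the sum divided by n – 1, and then the square root. This is a sketch with illustrative variable names; the last line cross-checks against `statistics.variance`, which also divides by n – 1.

```python
from math import sqrt
from statistics import variance

sosa = [15, 10, 33, 25, 36, 40, 36, 66]
n = len(sosa)
x_bar = sum(sosa) / n                          # 32.625

deviations = [x - x_bar for x in sosa]         # each observation minus the mean
sum_sq = sum(d ** 2 for d in deviations)       # sum of squared deviations: 2071.875

s2 = sum_sq / (n - 1)                          # variance, about 295.98
s = sqrt(s2)                                   # standard deviation, about 17.20

print(f"variance = {s2:.2f}, standard deviation = {s:.2f}")

# Cross-check with the standard library's sample variance.
assert abs(variance(sosa) - s2) < 1e-9
```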

  4. Why Standard Deviation?
     • When using the mean for center, the standard deviation makes sense as a measure of spread.
     • It is a kind of average deviation of the observations from their mean.
     • Inter-quartile range = IQR = Q3 – Q1. This is the measure of spread to use when the 5-number summary describes the data.
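
A small sketch of the 5-number summary and IQR for the Sosa data follows. The quartile convention used here (Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd) is an assumption; textbooks and software packages use several slightly different rules. The function name `five_number_summary` is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule (an assumed convention)."""
    xs = sorted(data)
    half = len(xs) // 2          # size of each half; the middle value is excluded when n is odd
    q1 = median(xs[:half])       # median of the lower half
    q3 = median(xs[-half:])      # median of the upper half
    return xs[0], q1, median(xs), q3, xs[-1]

sosa = [15, 10, 33, 25, 36, 40, 36, 66]
minimum, q1, m, q3, maximum = five_number_summary(sosa)
iqr = q3 - q1                    # 38 - 20 = 18

print(f"min={minimum}, Q1={q1}, M={m}, Q3={q3}, max={maximum}, IQR={iqr}")
```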

  5. Choosing a Numerical Summary (p. 232)
     Recalculating the statistics for Sosa and Maris without the high “outlier”, we get:

                     Mean    s      Min  Q1   M=Q2   Q3   Max  IQR
     Sosa (All)      32.63   17.20  10   20   34.5   38   66   18
     Sosa (-66)      27.86   11.54  10   15   33     36   40   21
     Maris (All)     24.6    16.02  5    13   23     33   61   20
     Maris (-61)     20.6    10.97  5    13   19.5   28   39   15

     Which changed more, mean and standard deviation or 5-number summary (median and IQR)?
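
The Sosa comparison can be reproduced with a short script that summarizes the data with and without the value 66. This is a sketch under one assumption: quartiles are taken as medians of the lower and upper halves of the sorted data, a convention chosen here, not stated on the slide. The helper names `quartiles` and `summarize` are illustrative. Maris's data are not listed in the transcript, so only Sosa is shown.

```python
from statistics import mean, median, stdev

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])

def summarize(data):
    """Mean, sample standard deviation, median, and IQR for one data set."""
    q1, q3 = quartiles(data)
    return {"mean": mean(data), "s": stdev(data), "median": median(data), "IQR": q3 - q1}

sosa_all = [15, 10, 33, 25, 36, 40, 36, 66]
sosa_trimmed = [x for x in sosa_all if x != 66]   # drop the high outlier

with_outlier = summarize(sosa_all)
without_outlier = summarize(sosa_trimmed)

# Show how much each statistic moves when the outlier is removed.
for key in with_outlier:
    change = without_outlier[key] - with_outlier[key]
    print(f"{key:>6}: {with_outlier[key]:6.2f} -> {without_outlier[key]:6.2f} (change {change:+.2f})")
```

For Sosa, the mean drops by about 4.8 and the standard deviation by about 5.7, while the median moves by only 1.5.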

  6. Mean and Standard Deviation:
     • The mean and standard deviation are strongly affected by outliers or by the long tail of a skewed distribution.
     • They are an appropriate numerical summary when the data are roughly symmetric (not skewed) and outliers are not present.
     • We will work with the mean and standard deviation in Chapter 13, normal distributions.

  7. Five-number Summary:
     • The quartiles are not strongly influenced by outliers or skewed data.
     • More appropriate than $\bar{x}$ and s when the data have a skewed distribution or when outliers are present.
     • Salary data are usually skewed right (Billionaires92 data set in DoStat), so median income is often reported rather than mean income.
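
The point about right-skewed salary data can be illustrated with a toy example. The salaries below are hypothetical values made up for this sketch (in thousands of dollars); they are not taken from the Billionaires92 data set or any real source.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars: six typical values and one very large one.
salaries = [32, 38, 41, 45, 52, 60, 950]

avg = mean(salaries)      # pulled far upward by the single large salary
mid = median(salaries)    # the middle value, unaffected by how large the top salary is

print(f"mean = {avg}, median = {mid}")
```

Here the mean (174) is larger than every salary but one, while the median (45) still describes a typical value, which is why the median is the usual choice for skewed data.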
