Measures of Central Tendency: Describing the Shape and Center of Data

This chapter discusses descriptive statistics, numerical methods, and measures of central tendency, including mean, median, and mode. It also covers measures of variation, percentiles, quartiles, and box-and-whiskers displays.

hunterb

Presentation Transcript


  1. Chapter 3 Descriptive Statistics: Numerical Methods

  2. Descriptive Statistics • 3.1 Describing Central Tendency • 3.2 Measures of Variation • 3.3 Percentiles, Quartiles and Box-and-Whiskers Displays

  3. Describing Central Tendency • In addition to describing the shape of a distribution, we want to describe the data set’s central tendency • A measure of central tendency represents the center or middle of the data • “Center” means typical or regular in this setting.

  4. Parameters and Statistics • A population parameter is a number calculated from all the population measurements that describes some aspect of the population • A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample

  5. Measures of Central Tendency • Mean, $\mu$: the average or expected value • Median, Md: the value of the middle point of the ordered measurements • Mode, Mo: the most frequent value

  6. The Mean • Population: $X_1, X_2, \ldots, X_N$, with population mean $\mu$ • Sample: $x_1, x_2, \ldots, x_n$, with sample mean $\bar{x}$

  7. The Sample Mean • For a sample of size n, the sample mean is defined as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$ • It is a point estimate (a one-number estimate) of the population mean $\mu$ • It is the value to expect, on average and in the long run

  8. Example 3.1: The Car Mileage Case • Sample mean for the first five car mileages from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1 • $\bar{x} = \frac{30.8 + 31.7 + 30.1 + 31.6 + 32.1}{5} = \frac{156.3}{5} = 31.26$

  9. The Median • The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it • If the number of measurements is odd, the median is the middlemost measurement in the ordering, that is, the ((n+1)/2)th value in the ordered list • If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering, that is, the average of the (n/2)th and (n/2 + 1)th values in the ordered list

  10. Example: Car Mileage Case • Example 3.1: First five observations from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1 • In order: 30.1, 30.8, 31.6, 31.7, 32.1 • There is an odd number of measurements (n = 5), so the median is the middle one, the third value in the ordered list: 31.6
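
The mean and median of the car mileage example can be checked with a few lines of Python. This is only a minimal sketch using the standard-library `statistics` module; the variable names (`mileages`, `sample_mean`, `median`) are assumptions made for illustration and do not come from the slides.

```python
import statistics

# First five car mileages from Table 3.1, as listed on the slides
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]

# Sample mean: sum of the measurements divided by the sample size n
sample_mean = sum(mileages) / len(mileages)

# Median: middle value of the ordered measurements (n = 5 is odd,
# so it is the third value in the sorted list)
ordered = sorted(mileages)
median = statistics.median(mileages)

print(round(sample_mean, 2))
print(ordered)
print(median)
```

Rounding the mean to two decimals guards against floating-point noise in the sum; the median needs no rounding because it is one of the original measurements.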

  11. The Mode • The mode Mo of a population or sample of measurements is the measurement that occurs most frequently • Modes are the values that are observed “most typically” • Sometimes two or more values share the highest frequency • If there are two modes, the data are bimodal • If there are more than two modes, the data are multimodal • When data are in classes, the class with the highest frequency is the modal class • The modal class is the tallest box in the histogram
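
As a rough illustration of unimodal versus bimodal data, the sketch below counts frequencies and reports every value that ties for the highest count. Both data sets are hypothetical (they are not from the slides), and `statistics.multimode` requires Python 3.8 or later.

```python
from collections import Counter
import statistics

# Hypothetical data sets, chosen only to illustrate the definitions
unimodal = [5, 7, 7, 8]          # 7 occurs most often
bimodal = [1, 2, 2, 3, 3, 4]     # 2 and 3 tie for the highest frequency

# Frequency of each value, most frequent first
print(Counter(bimodal).most_common())

# multimode returns a list of all values that share the highest frequency
modes_unimodal = statistics.multimode(unimodal)
modes_bimodal = statistics.multimode(bimodal)

print(modes_unimodal)
print(modes_bimodal)
```

Returning a list rather than a single value keeps the bimodal and multimodal cases visible instead of silently picking one mode.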

  12. Suggested Exercise • Page 122 3.3, 3.4

  13. Mean Vs Median • Data set: 1, 2, 3, 4; mean = 2.5, median = 2.5 • Data set: 1, 2, 3, 4, 100; mean = 22, median = 3

  14. Mean Vs Median • Compared with the mean, the median is resistant to extreme values: a single very large or very small measurement can shift the mean substantially while leaving the median almost unchanged.
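
The two data sets from the Mean Vs Median slides can be rerun directly. The snippet below is a small sketch using the standard-library `statistics` module; the names `small` and `with_outlier` are assumptions for illustration.

```python
import statistics

small = [1, 2, 3, 4]
with_outlier = small + [100]   # same data plus one extreme value

# Adding the extreme value moves the mean from 2.5 to 22,
# while the median only moves from 2.5 to 3
for data in (small, with_outlier):
    print(data, statistics.mean(data), statistics.median(data))
```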

  15. Measures of Variation • Data set 1: 4, 5, 6, 7, 8 • Mean = 6 • Data set 2: 1, 4, 6, 8, 11 • Mean = 6

  16. Data set 1: 4, 5, 6, 7, 8; mean = 6 • Data set 2: 1, 4, 6, 8, 11; mean = 6

  17. Measures of Variation • Knowing the measures of central tendency is not enough • Both of the distributions below have identical measures of central tendency

  18. Measures of Variation • Range: largest minus the smallest measurement • Variance: the average of the squared deviations of all the population measurements from the population mean • Standard deviation: the square root of the variance

  19. The Range • Largest minus smallest • Measures the interval spanned by all the data • For Figure 3.13, largest repair time is 5 and smallest is 3 • Range is 5 – 3 = 2 days

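  20. Variance • For a population of size N, the population variance $\sigma^2$ is: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$ • For a sample of size n, the sample variance $s^2$ is: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$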

  21. Standard Deviation • Population standard deviation ($\sigma$): $\sigma = \sqrt{\sigma^2}$ • Sample standard deviation (s): $s = \sqrt{s^2}$

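  22. Example: Chris’s Class Sizes This Semester • Data points for a population are: 60, 41, 15, 30, 34 • Mean is $\mu = 36$ • Variance is: $\sigma^2 = \frac{(60-36)^2 + (41-36)^2 + (15-36)^2 + (30-36)^2 + (34-36)^2}{5} = \frac{1082}{5} = 216.4$ • Standard deviation is: $\sigma = \sqrt{216.4} \approx 14.71$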

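  23. Example: Sample Variance and Standard Deviation • Example 3.7: sample data for first five car mileages from Table 3.1 are 30.8, 31.7, 30.1, 31.6, 32.1 • The sample mean is 31.26 • The sample variance is $s^2 = \frac{(30.8-31.26)^2 + (31.7-31.26)^2 + (30.1-31.26)^2 + (31.6-31.26)^2 + (32.1-31.26)^2}{5-1} = \frac{2.572}{4} = 0.643$ • The sample standard deviation is $s = \sqrt{0.643} \approx 0.8019$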
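
The difference between dividing by N and dividing by n - 1 is easy to see in code. The sketch below writes both definitions out by hand and applies them to the two examples from the slides; the function names `population_variance` and `sample_variance` are assumptions for illustration, not names used in the chapter.

```python
import math

def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

class_sizes = [60, 41, 15, 30, 34]           # treated as a population
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]    # treated as a sample

sigma_sq = population_variance(class_sizes)
s_sq = sample_variance(mileages)

# The standard deviation is the square root of the variance in both cases
print(round(sigma_sq, 1), round(math.sqrt(sigma_sq), 2))
print(round(s_sq, 3), round(math.sqrt(s_sq), 4))
```

The standard library offers the same calculations as `statistics.pvariance` and `statistics.variance`; they are spelled out here so that the two denominators are visible.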

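  24. An alternative formula for the sample variance • $s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$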

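  25. Data points are: 60, 41, 15, 30, 34 • Mean is 36 • Sample variance is: $s^2 = \frac{(60-36)^2 + (41-36)^2 + (15-36)^2 + (30-36)^2 + (34-36)^2}{5-1} = \frac{1082}{4} = 270.5$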
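
The slide's formula did not survive in this transcript, so the sketch below assumes the "alternative formula" is the usual computational shortcut, in which the sum of squares minus the squared sum over n is divided by n - 1. It compares that shortcut with the definitional result from `statistics.variance` for the five data points above. The variable names are illustrative.

```python
import statistics

data = [60, 41, 15, 30, 34]
n = len(data)

sum_x = sum(data)                      # sum of the measurements
sum_x_sq = sum(x * x for x in data)    # sum of the squared measurements

# Assumed shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
shortcut = (sum_x_sq - sum_x ** 2 / n) / (n - 1)

# Definitional form: squared deviations from the mean, divided by n - 1
definition = statistics.variance(data)

print(sum_x, sum_x_sq)
print(shortcut, definition)
```

The shortcut avoids computing each deviation from the mean, which is convenient by hand, although with floating-point data the definitional form is generally the numerically safer choice.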

  26. Percentiles, Quartiles, and Box-and-Whiskers Displays For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100-p) percent of the measurements fall at or above the value • The first quartile Q1 is the 25th percentile • The second quartile (or median) is the 50th percentile • The third quartile Q3 is the 75th percentile • The interquartile range IQR is Q3 - Q1

  27. Example: Quartiles • 20 customer satisfaction ratings: 1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10 • For Q1: i = (25/100)(20) = 5, so Q1 = (7+8)/2 = 7.5 • Md = (8+8)/2 = 8 • For Q3: i = (75/100)(20) = 15, so Q3 = (9+9)/2 = 9 • IQR = Q3 – Q1 = 9 – 7.5 = 1.5 • Inner fences: (5.25, 11.25)

  28. Calculating Percentiles • Arrange the measurements in increasing order • Calculate the index i = (p/100)n, where p is the percentile to find • (a) If i is not an integer, round up: the next integer greater than i gives the position of the pth percentile in the ordered list • (b) If i is an integer, the pth percentile is the average of the measurements in positions i and i+1
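
The index rule above translates into a short function. This is a sketch under a few assumptions: `percentile` is a hypothetical helper name, p is taken to be a whole number strictly between 0 and 100, and the integer test uses exact integer arithmetic so that floating-point rounding cannot misclassify i. Other software often uses different interpolation rules, so results may not match library defaults.

```python
def percentile(data, p):
    """pth percentile by the slide's index rule (integer p, 0 < p < 100)."""
    ordered = sorted(data)                 # step 1: increasing order
    n = len(ordered)
    # i = (p / 100) * n, split into whole part and remainder using integers
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # (a) i is not an integer: round up to the next position (1-based)
        return ordered[whole]              # 1-based position whole + 1
    # (b) i is an integer: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

# The 20 customer satisfaction ratings from the quartiles example
ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]

q1 = percentile(ratings, 25)
md = percentile(ratings, 50)
q3 = percentile(ratings, 75)
print(q1, md, q3, q3 - q1)
```

With these ratings the function gives the same quartiles as the worked example (7.5, 8 and 9), so the interquartile range is 1.5.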

  29. Percentile Example (p=10th Percentile) • i = (10/100)(12) = 1.2 • Not an integer, so round up to 2 • The 10th percentile is the measurement in the second position: 11,070 • For Q1: i = (25/100)(12) = 3

  30. Percentile Example (p=25th Percentile) • i = (25/100)(12) = 3 • Integer, so average the values in positions 3 and 4 • The 25th percentile is (18,211 + 26,817)/2 = 22,514

  31. Five Number Summary • The smallest measurement • The first quartile, Q1 • The median, Md • The third quartile, Q3 • The largest measurement • Displayed visually using a box-and-whiskers plot

  32. Box-and-Whiskers Plots • The box plots the: • first quartile, Q1 • median, Md • third quartile, Q3 • inner fences • outer fences

  33. Box-and-Whiskers Plots Continued • IQR = Q3 – Q1 • Inner fences: located 1.5 × IQR away from the quartiles: • Q1 – (1.5 × IQR) • Q3 + (1.5 × IQR) • For the customer satisfaction ratings: (7.5 – 1.5 × 1.5, 9 + 1.5 × 1.5) = (5.25, 11.25) • Outer fences: located 3 × IQR away from the quartiles: • Q1 – (3 × IQR) • Q3 + (3 × IQR)

  34. Box-and-Whiskers Plots Continued • The “whiskers” are dashed lines that plot the range of the data • A dashed line drawn from the box below Q1 down to the smallest measurement • Another dashed line drawn from the box above Q3 up to the largest measurement

  35. Box-and-Whiskers Plots Continued

  36. Outliers • Outliers are measurements that are very different from other measurements • They are either much larger or much smaller than most of the other measurements • Outliers lie beyond the fences of the box-and-whiskers plot: less than Q1 – (1.5 × IQR) or greater than Q3 + (1.5 × IQR) • Measurements between the inner and outer fences are mild outliers • Measurements beyond the outer fences are severe outliers
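
To make the fence rules concrete, the sketch below applies them to the 20 customer satisfaction ratings from the quartiles example, taking Q1 = 7.5 and Q3 = 9 from that slide. The `classify` helper is hypothetical. How to treat a measurement that falls exactly on a fence is an assumption made here (it is not counted as beyond the fence), since the slides do not say.

```python
ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]

# Quartiles taken from the worked quartiles example
q1, q3 = 7.5, 9.0
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fences

def classify(x):
    """Label a measurement by where it falls relative to the fences."""
    # Assumption: a value exactly on a fence is not "beyond" it
    if x < outer[0] or x > outer[1]:
        return "severe"
    if x < inner[0] or x > inner[1]:
        return "mild"
    return "not an outlier"

mild = [x for x in ratings if classify(x) == "mild"]
severe = [x for x in ratings if classify(x) == "severe"]

print(inner, outer)
print(mild, severe)
```

Under these assumptions the rating 1 falls below the lower outer fence, while 3 and the two 5s fall between the lower outer and inner fences.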
