
Chapter 3

Statistics for Business (Env). Chapter 3. Descriptive Statistics: Numerical Methods. Descriptive Statistics. 3.1 Describing Central Tendency 3.2 Measures of Variation 3.3 Percentiles, Quartiles and Box-and-Whiskers Displays 3.4 Covariance, Correlation, and the Least Squares Line

nakia

Presentation Transcript


  1. Statistics for Business (Env) Chapter 3 Descriptive Statistics: Numerical Methods

  2. Descriptive Statistics • 3.1 Describing Central Tendency • 3.2 Measures of Variation • 3.3 Percentiles, Quartiles and Box-and-Whiskers Displays • 3.4 Covariance, Correlation, and the Least Squares Line • 3.5 Weighted Means and Grouped Data (Optional) • 3.6 The Geometric Mean (Optional)

  3. Describing Central Tendency • In addition to describing the shape of a distribution, we want to describe the data set’s central tendency • A measure of central tendency represents the center or middle of the data • It is the value that is most typical or most representative of the entire data set

  4. Parameters and Statistics • A population parameter is a number calculated from all the population measurements that describes some aspect of the population • A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample

  5. Measures of Central Tendency • Mean, µ: the average or expected value • Median, Md: the value of the middle point of the ordered measurements • Mode, Mo: the most frequent value

  6. The Mean • Population: X1, X2, …, XN, with population mean µ • Sample: x1, x2, …, xn, with sample mean x̄

  7. The Sample Mean For a sample of size n, the sample mean is defined as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$ • It is a point estimate of the population mean µ • It is the value to expect, on average and in the long run • It is also the amount each member gets when the total is distributed equally within the sample

  8. Mean as the balance point for a distribution Data: 2, 2, 6, 10 mean=(2+2+6+10)/4=5

  9. Data: 3, 6, 6, 9, 11 mean=(3+6+6+9+11)/5=7 What will happen to the mean if we add one more number to the data?
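
The two mean calculations on slides 8 and 9, and the question about adding one more number, can be checked with a short Python sketch. The extra values tried below (13, 7 and 1) are illustrative choices and do not come from the slides.

```python
from statistics import mean

slide8 = [2, 2, 6, 10]
slide9 = [3, 6, 6, 9, 11]


def mean_after_adding(data, extra):
    """Return the mean of the data once one more value is included."""
    return mean(data + [extra])


print(mean(slide8))  # mean of the slide 8 data
print(mean(slide9))  # mean of the slide 9 data

# Hypothetical extra values: above, equal to, and below the current mean.
for extra in (13, 7, 1):
    print(extra, mean_after_adding(slide9, extra))
```

A value above the current mean pulls the mean up, a value below pulls it down, and a value equal to the mean leaves it unchanged.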

  10. The Median The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it • If the number of measurements is odd, the median is the middlemost measurement in the ordering • If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering

  11. Data: 3, 5, 8, 10, 11 median=8

  12. Data: 3, 3, 4, 5, 7, 8 median=(4+5)/2=4.5

  13. Data: 1, 2, 2, 3, 4, 4, 4, 4, 4, 5 median=4??

  14. Example for non-integer data • Example 3.1: First five observations from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1 • In order: 30.1, 30.8, 31.6, 31.7, 32.1 • There is an odd number of observations, so the median is the one in the middle, 31.6
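
The odd/even rule from slide 10 can be written out directly. The function name `median_by_rule` is a hypothetical helper for illustration; Python's standard `statistics.median` applies the same rule.

```python
def median_by_rule(values):
    """Median using the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middlemost measurement
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middlemost


print(median_by_rule([3, 5, 8, 10, 11]))               # slide 11 data
print(median_by_rule([3, 3, 4, 5, 7, 8]))              # slide 12 data
print(median_by_rule([30.8, 31.7, 30.1, 31.6, 32.1]))  # slide 14 data, given unsorted
```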

  15. Data: 2, 2, 2, 3, 3, 12 mean=4 median=(2+3)/2=2.5

  16. The Mode The mode Mo of a population or sample of measurements is the measurement that occurs most frequently • Modes are the values that are observed “most typically” • Sometimes the highest frequency occurs at two or more values • If there are two modes, the data are bimodal • If there are more than two modes, the data are multimodal • When data are in classes, the class with the highest frequency is the modal class • The modal class is the tallest box in the histogram
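
A minimal sketch of finding one or several modes, using `statistics.multimode` from the Python standard library (Python 3.8 or later). The helper name `describe_modes` and the first data set are assumptions for illustration; the second data set is the one from slide 15.

```python
from statistics import multimode


def describe_modes(values):
    """Return the list of most frequent values and a label for how many there are."""
    modes = multimode(values)
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


print(describe_modes([1, 2, 2, 3, 3, 4]))   # hypothetical data with two modes
print(describe_modes([2, 2, 2, 3, 3, 12]))  # slide 15 data
```

Note that when every value occurs exactly once, `multimode` returns all of them, so the label is not meaningful in that case.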

  17. Histogram Describing the 50 Mileages

  18. Selecting a measure of Central Tendency • Usually the mean is a good measure, because it uses every score in the distribution. • There are some extreme cases in which the mean is not representative (or calculable). Then the mode and the median are used.

  19. Mean=(10+11*4+12*3+13+100)/10=20.3 Mode=11 Median=(11+12)/2=11.5
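
The effect of the extreme score on slide 19 can be reproduced as a sketch. The list of scores is rebuilt from the slide's own formula, reading 11*4 as four scores of 11 and 12*3 as three scores of 12.

```python
from statistics import mean, median, mode

# Scores reconstructed from the formula on slide 19.
scores = [10] + [11] * 4 + [12] * 3 + [13, 100]

print(mean(scores))    # pulled far upward by the single score of 100
print(median(scores))  # average of the 5th and 6th ordered scores
print(mode(scores))    # most frequent score

# Without the extreme score, the mean falls back among the other values.
without_extreme = [s for s in scores if s != 100]
print(round(mean(without_extreme), 2))
```

The median and the mode barely notice the score of 100, which is why they are preferred when a distribution contains extreme values.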

  20. Open-ended distributions • A distribution is said to be open-ended when there is no upper limit (or lower limit) for one of the categories • Mean: not computable • Median = (12+13)/2 = 12.5 • Mode: not meaningful

  21. Measures of Variation/variability • Knowing the measures of central tendency is not enough • Both of the distributions below have identical measures of central tendency

  22. Measures of Variation • Range: the largest minus the smallest measurement • Variance: the average of the squared deviations of all the population measurements from the population mean • Standard deviation: the square root of the variance • They provide quantitative measures of the degree to which data in a distribution are spread out or clustered together.

  23. Range for discrete & continuous data • For discrete data, the range is the distance between the largest score (Xmax) and the smallest score (Xmin) in the distribution: range = Xmax − Xmin. • For continuous data, you must also take into account the real limits of the maximum and minimum X values: range = URL of Xmax − LRL of Xmin, where URL is the upper real limit and LRL is the lower real limit.
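
The two range definitions can be sketched as follows. The sample scores and the `unit` parameter (the measurement unit, so that real limits lie half a unit on either side of a score) are assumptions for illustration.

```python
def range_discrete(scores):
    """Range for discrete data: largest score minus smallest score."""
    return max(scores) - min(scores)


def range_continuous(scores, unit=1.0):
    """Range for continuous data: upper real limit of Xmax minus lower real limit of Xmin."""
    half = unit / 2
    return (max(scores) + half) - (min(scores) - half)


scores = [3, 7, 12, 8]           # hypothetical scores measured to the nearest whole number
print(range_discrete(scores))    # 12 - 3
print(range_continuous(scores))  # 12.5 - 2.5
```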

  24. Population Variance and Standard Deviation • The population variance (σ²) is the average of the squared deviations of the individual population measurements from the population mean (µ) • The population standard deviation (σ) is the positive square root of the population variance

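  25. Variance • For a population of size N, the population variance σ² is: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$ • For a sample of size n, the sample variance s² is: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$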

  26. Sample variability tends to underestimate the population value, which is why the sample variance divides the sum of squared deviations by n − 1 rather than by n

  27. Standard Deviation • Population standard deviation (σ): $\sigma = \sqrt{\sigma^2}$ • Sample standard deviation (s): $s = \sqrt{s^2}$

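  28. Example: Sample Variance and Standard Deviation • Data points are: 60, 41, 15, 30, 34 • Mean is 180/5 = 36 • Variance is: $s^2 = \frac{(60-36)^2 + (41-36)^2 + (15-36)^2 + (30-36)^2 + (34-36)^2}{5-1} = \frac{1082}{4} = 270.5$ • Standard deviation is: $s = \sqrt{270.5} \approx 16.45$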
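
The example on slide 28 can be recomputed step by step. The function names are hypothetical helpers; the population version is included only to show the effect of dividing by N instead of n − 1 on the same five numbers.

```python
from math import sqrt

data = [60, 41, 15, 30, 34]


def sum_of_squared_deviations(values):
    """Sum of squared distances of each value from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def sample_variance(values):
    """Divide by n - 1, treating the values as a sample."""
    return sum_of_squared_deviations(values) / (len(values) - 1)


def population_variance(values):
    """Divide by N, treating the values as the whole population."""
    return sum_of_squared_deviations(values) / len(values)


print(sum_of_squared_deviations(data))  # 576 + 25 + 441 + 36 + 4
print(sample_variance(data))
print(sqrt(sample_variance(data)))
print(population_variance(data))
```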

  30. Percentiles & Quartiles For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100-p) percent of the measurements fall at or above the value • The first quartile Q1 is the 25th percentile • The second quartile (or median) is the 50th percentile • The third quartile Q3 is the 75th percentile • The interquartile range IQR is Q3 - Q1
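
The percentile definition can be checked against data. This sketch uses the 20 customer satisfaction ratings and the value Q1 = 7.5 from the quartile example on slide 36; the two helper names are assumptions for illustration.

```python
ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]


def percent_at_or_below(values, cutoff):
    """Percentage of measurements that are less than or equal to the cutoff."""
    return 100 * sum(v <= cutoff for v in values) / len(values)


def percent_at_or_above(values, cutoff):
    """Percentage of measurements that are greater than or equal to the cutoff."""
    return 100 * sum(v >= cutoff for v in values) / len(values)


q1 = 7.5
print(percent_at_or_below(ratings, q1))  # share at or below the first quartile
print(percent_at_or_above(ratings, q1))  # share at or above the first quartile
```

When the cutoff coincides with repeated data values, such as a rating of 9 here, both percentages can exceed p and 100 − p, so the definition is best read as "at least".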

  31. Cumulative percentages & percentiles • 30% of the individuals have been accumulated by the time you reach the top of the interval for X=2. • X=2 means that the measurement was somewhere between the real limits of 1.5 and 2.5.

  32. What is the 95th percentile? (Answer: X = 4.5.) What is the percentile rank for X = 3.5? (Answer: 70%.) What is the 50th percentile? What is the percentile rank for X = 4? These values are not given directly, but they can be estimated by a standard procedure known as interpolation.

  33. Using the following distribution of scores, we will use interpolation to find the 50th percentile:

  34. For the scores, the width of the interval is 5 points. For the percentages, the width is 50 points. The value of 50% is located 10 points from the top of the percentage interval. As a fraction of the whole interval, this is 10 out of 50, or 1/5 of the total interval. The same fraction of the score interval is 1/5 × 5 = 1 point, so the 50th percentile lies 1 point below the top of the score interval: X = 8.5.
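
The interpolation steps can be sketched as a small function. The distribution table itself is not reproduced in this transcript, so the interval end points used below (real limits 4.5 and 9.5, cumulative percentages 10% and 60%) are an assumption inferred from the stated widths and the answer of 8.5.

```python
def interpolate_score(target_pct, low, high):
    """Linear interpolation between two (cumulative percent, real limit) points.

    Works from the top of the interval downward, as the slide does.
    """
    low_pct, low_score = low
    high_pct, high_score = high
    fraction_from_top = (high_pct - target_pct) / (high_pct - low_pct)
    return high_score - fraction_from_top * (high_score - low_score)


# Assumed interval: 10% accumulated at X = 4.5 and 60% accumulated at X = 9.5.
print(interpolate_score(50, low=(10, 4.5), high=(60, 9.5)))
```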

  35. USING INTERPOLATION TO FIND THE MEDIAN • Answer: X = 3.70 is the median • Notice that this is exactly the same answer we obtained using the graphic method of interpolation in Figure 3.7
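
A sketch of an interpolated median for tied scores follows. It assumes the data are the ten scores from slide 13 (1, 2, 2, 3, 4, 4, 4, 4, 4, 5), which do reproduce the answer of 3.70, and that each score X stands for the interval from X − 0.5 to X + 0.5. The function name is a hypothetical helper.

```python
def interpolated_median(scores, width=1.0):
    """Median by interpolation inside the interval that contains the 50% point."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = ordered[n // 2]               # score whose interval holds the 50% point
    lower_limit = middle - width / 2       # lower real limit of that interval
    below = sum(1 for s in ordered if s < middle)    # scores below the interval
    inside = sum(1 for s in ordered if s == middle)  # scores tied within the interval
    # Go far enough into the interval to accumulate n/2 scores in total.
    return lower_limit + width * (n / 2 - below) / inside


scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]  # assumed to be the data behind this slide
print(round(interpolated_median(scores), 2))
```

Four scores lie below 3.5, so one of the five tied scores of 4 is needed to reach 50%, which places the median 1/5 of the way into the interval. Python's `statistics.median_grouped` performs the same kind of calculation.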

  36. Example: Quartiles • A slightly different way to find the quartiles (without using interpolation) • 20 customer satisfaction ratings: 1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10 • Md = (8+8)/2 = 8 • Q1 = (7+8)/2 = 7.5 • Q3 = (9+9)/2 = 9 • IQR = Q3 − Q1 = 9 − 7.5 = 1.5
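
The quartiles in this example are the medians of the lower and upper halves of the ordered ratings. The sketch below follows that approach; leaving the middle value out of both halves when the count is odd is an assumed convention, since the slide only shows an even count.

```python
def median_of(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_by_halves(values):
    """Return (Q1, Md, Q3) as the medians of the lower half, all data, and the upper half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumed convention: for odd n, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median_of(lower), median_of(ordered), median_of(upper)


ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]
q1, md, q3 = quartiles_by_halves(ratings)
print(q1, md, q3, q3 - q1)
```

Other conventions, including the interpolation method from the earlier slides and the defaults of common software packages, can give slightly different quartiles for the same data.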

  37. Five-Number Summary in Descriptive Statistics • The smallest measurement • The first quartile, Q1 • The median, Md • The third quartile, Q3 • The largest measurement • Displayed visually using a box-and-whiskers plot

  38. Box-and-whisker plots A box and whisker plot (sometimes called a boxplot) is a graph that presents information from a five-number summary. It does not show a distribution in as much detail as a stem and leaf plot or histogram does, but is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set.

  39. Outliers • Outliers are measurements that are very different from other measurements • They are either much larger or much smaller than most of the other measurements • Outliers lie beyond the fences of the box-and-whiskers plot • Measurements between the inner and outer fences are mild outliers • Measurements beyond the outer fences are severe outliers

  40. Box-and-Whiskers Plots • The box plots the: • first quartile, Q1 • median, Md • third quartile, Q3 • inner fences • outer fences From: Business Statistics in Practice, 5th Edition, Bowerman, O’Connell, Murphree

  41. Box-and-Whiskers Plots Continued • Inner fences • Located 1.5 × IQR away from the quartiles: • Q1 − (1.5 × IQR) • Q3 + (1.5 × IQR) • Outer fences • Located 3 × IQR away from the quartiles: • Q1 − (3 × IQR) • Q3 + (3 × IQR) From: Business Statistics in Practice, 5th Edition, Bowerman, O’Connell, Murphree
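
The fence formulas and the mild/severe distinction from slide 39 can be sketched as follows, using Q1 = 7.5 and Q3 = 9 from the customer satisfaction example on slide 36. The function names are hypothetical, and treating a value that sits exactly on a fence as not beyond it is an assumption.

```python
def fences(q1, q3):
    """Return the inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer


def classify(value, q1, q3):
    """Label a measurement as a severe outlier, a mild outlier, or neither."""
    inner, outer = fences(q1, q3)
    if value < outer[0] or value > outer[1]:
        return "severe outlier"   # beyond the outer fences
    if value < inner[0] or value > inner[1]:
        return "mild outlier"     # between the inner and outer fences
    return "not an outlier"


q1, q3 = 7.5, 9
print(fences(q1, q3))
for rating in (1, 3, 5, 7, 10):
    print(rating, classify(rating, q1, q3))
```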

  42. Box-and-Whiskers Plots Continued • The “whiskers” are dashed lines that plot the range of the data • A dashed line drawn from the box below Q1 down to the smallest measurement between the inner fences • Another dashed line drawn from the box above Q3 up to the largest measurement between the inner fences
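
A minimal sketch of locating the whisker ends, again using the 20 customer satisfaction ratings with Q1 = 7.5 and Q3 = 9 from slide 36. The function name is a hypothetical helper, and counting a value exactly on an inner fence as inside is an assumption.

```python
def whisker_ends(values, q1, q3):
    """Smallest and largest measurements that lie between the inner fences."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]
print(whisker_ends(ratings, q1=7.5, q3=9))
```

The ratings of 1, 3 and 5 fall below the lower inner fence of 5.25, so the lower whisker stops at 7 and those ratings are plotted individually as outliers.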
