1 / 50

Chapter 3

3-1. Chapter 3. Data Description. Outline. 3-2. 3-1 Introduction 3-2 Measures of Central Tendency 3-3 Measures of Variation 3-4 Measures of Position 3-5 Exploratory Data Analysis. Objectives. 3-3.

Download Presentation

Chapter 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 3-1 Chapter 3 Data Description

  2. Outline 3-2 • 3-1 Introduction • 3-2 Measures of Central Tendency • 3-3 Measures of Variation • 3-4 Measures of Position • 3-5 Exploratory Data Analysis

  3. Objectives 3-3 • Summarize data using the measures of central tendency, such as the mean, median, mode, and midrange. • Describe data using the measures of variation, such as the range, variance, and standard deviation.

  4. Objectives 3-4 • Identify the position of a data value in a data set using various measures of position, such as percentiles, deciles and quartiles.

  5. 3-2 Measures of Central Tendency 3-6 • Astatisticis a characteristic or measure obtained by using the data values from a sample. • Aparameteris a characteristic or measure obtained by using the data values from a specific population.

  6. 3-2 The Mean (arithmetic average) 3-7 • Themeanis defined to be the sum of the data values divided by the total number of values. • We will compute two means: one for the sample and one for a finite population of values. • The mean, in most cases, is not an actual data value.

  7. 3-2 The Sample Mean 3-9 T h e s y m b o l X r e p r e s e n t s t h e s a m p l e m e a n . X i s r e a d a s " X - b a r " . T h e G r e e k s y m b o l i s r e a d a s " s i g m a " a n d i t m e a n s " t o s u m " .  X X . . . + X + + X = 1 2 n n X  . = n

  8. 3-2 The Sample Mean - Example 3-10 T h e a g e s i n w e e k s o f a r a n d o m s a m p l e o f s i x k i t t e n s a t a n a n i m a l s h e l t e r a r e 3 , 8 , 5 , 1 2 , 1 4 , a n d 1 2 . F i n d t h e a v e r a g e a g e o f t h i s s a m p l e . T h e s a m p l e m e a n i s  X 3 + 8 + 5 + 1 2 + 1 4 + 1 2 X = = n 6 5 4 = 9 w e e k s . = 6

  9. 3-2 The Population Mean 3-11 T h e G r e e k s y m b o l r e p r e s e n t s t h e p o p u l a t i o n m m e a n . T h e s y m b o l i s r e a d a s " m u " . m N i s t h e s i z e o f t h e f i n i t e p o p u l a t i o n . X X . . . + X + + = m 1 2 N N X  . = N

  10. A s m a l l c o m p a n y c o n s i s t s o f t h e o w n e r , t h e m a n a g e r , t h e s a l e s p e r s o n , a n d t w o t e c h n i c i a n s . T h e s a l a r i e s a r e l i s t e d a s $ 5 0 , , , , , a n d r e s p e c t i v e l y A s s u m e t h i s i s t h e p o p u l a t i o n T h e n t h e p o p u l a t i o n m e a n w i l l b e X N 3-2 The Population Mean-Example 3-12 0 0 0 2 0 0 0 0 1 2 0 0 0 , 9 , 0 0 0 9 , 0 0 0 . ( . )  = m 5 0 , 0 0 0 + 2 0 , 0 0 0 + 1 2 , 0 0 0 + 9 , 0 0 0 + 9 , 0 0 0 = 5 = $ 2 0 , 0 0 0 .

  11. 3-2 The Median 3-20 • When a data set is ordered, it is called adata array. • Themedianis defined to be the midpoint of the data array. • The symbol used to denote the median isMD.

  12. 3-2 The Median - Example 3-21 • The weights (in pounds) of seven army recruits are 180, 201, 220, 191, 219, 209, and 186. Find the median. • Arrange the data in order and select the middle point.

  13. 3-2 The Median - Example 3-22 • Data array:180, 186, 191,201, 209, 219, 220. • The median,MD = 201.

  14. 3-2 The Median 3-23 • In the previous example, there was an odd number of values in the data set. In this case it is easy to select the middle number in the data array.

  15. 3-2 The Median 3-24 • When there is aneven numberof values in the data set, the median is obtained by taking theaverage of the two middle numbers.

  16. 3-2 The Median -Example 3-25 • Six customers purchased the following number of magazines: 1, 7, 3, 2, 3, 4. Find the median. • Arrange the data in order and compute the middle point. • Data array: 1, 2, 3, 3, 4, 7. • The median, MD = (3 + 3)/2 = 3.

  17. 3-2 The Median -Example 3-26 • The ages of 10 college students are: 18, 24, 20, 35, 19, 23, 26, 23, 19, 20. Find the median. • Arrange the data in order and compute the middle point.

  18. 3-2 The Median -Example 3-27 • Data array:18, 19, 19, 20,20,23, 23, 24, 26, 35. • The median,MD = (20 + 23)/2 = 21.5.

  19. 3-2 The Mode 3-38 • Themodeis defined to be the value that occurs most often in a data set. • A data set can have more than one mode. • A data set is said to have no mode if all values occur with equal frequency.

  20. 3-2 The Mode -Examples 3-39 • The following data represent the duration (in days) of U.S. space shuttle voyages for the years 1992-94. Find the mode. • Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11. • Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14.Mode = 8.

  21. 3-2 The Mode -Examples 3-40 • Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode. • Data set: 2, 3, 5, 7, 8, 10. • There isno modesince each data value occurs equally with a frequency of one.

  22. 3-2 The Mode -Examples 3-41 • Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distance, in feet, is given below. Find the mode. • Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26. • There aretwo modes (bimodal). The values are 18 and 24. Why?

  23. 3-2 The Midrange 3-45 • The midrangeis found by adding the lowest and highest values in the data set and dividing by 2. • The midrange is a rough estimate of the middle value of the data. • The symbol that is used to represent the midrange isMR.

  24. 3-2 The Midrange -Example 3-46 • Last winter, the city of Brownsville, Minnesota, reported the following number of water-line breaks per month. The data is as follows: 2, 3, 6, 8, 4, 1. Find the midrange.MR = (1 + 8) / 2 = 4.5. • Note:Extreme values influence the midrange and thus may not be a typical description of the middle.

  25. 3-2 The Weighted Mean 3-47 • Theweighted meanis used when the values in a data set are not all equally represented. • Theweighted meanof a variable Xisfound by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.

  26. 3-2 The Weighted Mean 3-48 T h e w e i g h t e d m e a n w X w X . . . w X w X + + +  X = = 1 1 2 2 n n w w . . . w w + + +  1 2 n w h e r e w , w , . . . , w a r e t h e w e i g h t s 1 2 n f o r t h e v a l u e s X , X , . . . , X . 1 2 n

  27. 3-3 Measures of Variation - Range 3-53 • The rangeis defined to be the highest value minus the lowest value. The symbol R is used for the range. • R = highest value – lowest value. • Extremely large or extremely small data values can drastically affect the range.

  28. T h e v a r i a n c e i s t h e a v e r a g e o f t h e s q u a r e s o f t h e d i s t a n c e e a c h v a l u e i s f r o m t h e m e a n . T h e s y m b o l f o r t h e p o p u l a t i o n v a r i a n c e i s ( i s t h e G r e e k l o w e r c a s e l e t t e r s i g m a ) 2 s s m s = m 3-3 Measures of Variation - Population Variance 3-54 ( X ) - 2  , w h e r e 2 = N X i n d i v i d u a l v a l u e = p o p u l a t i o n m e a n N = p o p u l a t i o n s i z e

  29. 3-3 Measures of Variation - Population Standard Deviation 3-55 T h e s t a n d a r d d e v i a t i o n i s t h e s q u a r e r o o t o f t h e v a r i a n c e . ( X ) 2 - m  = . 2 s s = N

  30. 3-3 Measures of Variation -Example 3-56 • Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance.

  31. 3-3 Measures of Variation -Example 3-56 • Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance. • Themean = (10 + 60 + 50 + 30 + 40 + 20)/6 = 210/6 = 35. • Thevariance  2= 1750/6 = 291.67. See next slide for computations.

  32. m m X 2 X - ( X - ) 1 0 - 2 5 6 2 5 6 0 + 2 5 6 2 5 6 0 + 2 5 6 2 5 5 0 + 1 5 2 2 5 5 0 + 1 5 2 2 5 3 0 - 5 2 5 3 0 - 5 2 5 4 0 + 5 2 5 2 0 - 1 5 2 2 5 3-3 Measures of Variation -Example 3-57 X 2 m m X – ( X – ) 1 0 - 2 5 6 2 5 4 0 + 5 2 5 2 0 - 1 5 2 2 5 2 1 0 1 7 5 0 2 1 0 1 7 5 0

  33. The unbias ed estimat or of the population variance o r the samp le varianc e is a statistic whose valu e approxim ates the expected v alue of a population variance. It is deno ted by s , where 2 - å ( X X ) 2 = s , and 2 - n 1 X = sample mean n = sample size 3-3 Measures of Variation - Sample Variance 3-58

  34. 3-3 Measures of Variation - Sample Standard Deviation 3-59 The sample standard deviation is the squ are root of t he sample variance. - å ( X X ) 2 = s = s 2 - n 1

  35. s 2 3-3Shortcut Formulafor the Sample Variance and the Standard Deviation 3-60 - å å ( X ) / n 2 X 2 = - n 1 - å å ( X ) / n X 2 2 s = - n 1

  36. 3-3 Sample Variance -Example 3-61 • Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.

  37. 3-3 Sample Variance -Example 3-61 • Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14. • X = 16 + 19 + 15 + 15 + 14 = 79. • X2 = 162 + 192 + 152 + 152 + 142 = 1263.

  38. 3-3 Sample Variance -Example 3-62    X ( ) / n X 2 2 s = 2  n 1 1263  (79) / 5 2  = 3.7 4  s = 3.7 1 . 9

  39. s s = × × CVar 100% or CVar = 100%. m X 3-3 Coefficient of Variation 3-67 • The coefficient of variation is defined to be the standard deviation divided by the mean. The result is expressed as a percentage.

  40. For samples : - X X = z . s For population s : - m X = z . s 3-4 Measures of Position — z-score 3-72 • The z score represents the number of standard deviations a data value falls above or below the mean.

  41. 3-4 z-score - Example 3-73 • A student scored 65 on a statistics exam that had a mean of 50 and a standard deviation of 10. Compute the z-score. • z = (65 – 50)/10 = 1.5. • That is, the score of 65 is 1.5 standard deviationsabovethe mean. • Above -since the z-score is positive.

  42. 3-4 Measures of Position - Percentiles 3-74 • Percentiles divide the distribution into 100 groups. • ThePkpercentile is defined to be that numerical value such that at most k% of the values are smaller than Pk and at most (100 – k)% are larger than Pk in an ordered data set.

  43. 3-4 Percentile Formula 3-75 • The percentile corresponding to a given value (X) is computed by using the formula: number of values below X + 0.5  Percentile  100% total number of values

  44. 3-4 Percentiles - Example 3-76 • A teacher gives a 20-point test to 10 students. Find the percentile rank of a score of 12. Scores: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10. • Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. • Percentile = [(6 + 0.5)/10](100%) = 65th percentile. Student did better than 65% of the class.

  45. 3-4 Percentiles - Finding the value Corresponding to a Given Percentile 3-77 • Procedure:Let p be the percentile and n the sample size. • Step 1:Arrange the data in order. • Step 2:Compute c = (np)/100. • Step 3:If c is not a whole number, round up to the next whole number. If cis a whole number, use the value halfway between cand c+1.

  46. 3-4 Percentiles - Finding the value Corresponding to a Given Percentile 3-78 • Step 4:The value of c is the position value of the required percentile. • Example: Find the value of the 25th percentile for the following data set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. • Note: the data set is already ordered. • n = 10, p = 25, so c = (1025)/100 = 2.5. Hence round up to c = 3.

  47. 3-4 Percentiles - Finding the value Corresponding to a Given Percentile 3-79 • Thus, the value of the 25th percentile is the value X = 5. • Find the 80th percentile. • c = (1080)/100 = 8. Thus the value of the 80th percentile is the average of the 8th and 9th values. Thus, the 80th percentile for the data set is (15 + 18)/2 = 16.5.

  48. 3-4 Special Percentiles - Deciles and Quartiles 3-80 • Decilesdivide the data set into 10 groups. • Deciles are denoted by D1, D2, …, D9 with the corresponding percentiles being P10, P20, …, P90 • Quartilesdivide the data set into 4 groups.

  49. 3-4 Special Percentiles - Deciles and Quartiles 3-81 • Quartiles are denoted by Q1, Q2, and Q3 with the corresponding percentiles being P25, P50, and P75. • The median is the same as P50 or Q2.

  50. 3-4 Outliers and the Interquartile Range (IQR) 3-82 • Anoutlieris an extremely high or an extremely low data value when compared with the rest of the data values. • TheInterquartile Range, IQR = Q3 – Q1.

More Related