
Descriptive Statistics


ailish

Presentation Transcript


  1. Descriptive Statistics

  2. The everyday notions of central tendency • Usual • Customary • Most • Standard • Expected • Normal • Ordinary • Medium • Commonplace • (Source: John Allen Paulos, “Stories vs. Statistics,” NY Times, 10/24/2010)

  3. Overview • What are descriptive statistics? • A bit of terminology/notation • Measures of Central Tendency • Mean, Mode, Median • Measures of Variability • Ranges, Standard Deviations • The Normal Curve

  4. Terminology/Notation • A data distribution = A set of data/scores (the whole thing) • 1, 2, 4, 7 • X = A raw, single score (e.g., 2 from above) • ∑ = Summation (added up) • ∑X = 14 (each individual score added up) • n = sample size (distribution size, or number of scores) • n = 4 (from above)
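As a minimal sketch of the notation above, the same quantities can be computed in Python. The variable names `scores`, `sum_x`, and `n` are illustrative choices and do not come from the slides.

```python
# The example distribution from the slide
scores = [1, 2, 4, 7]

sum_x = sum(scores)  # ∑X: every individual score added up
n = len(scores)      # n: sample size (number of scores)

print(sum_x, n)
```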

  5. Descriptive Statistics • Descriptive statistics are the side of statistics we most often use in our everyday lives • Realize that most observations/data are too “large” for a human to take in and comprehend – we must “reduce” them • How can we summarize what we see? • Example – Grades/Registrar

  6. Descriptive Statistics • Descriptive statistics = describing the data • n = 50, a test score of 83% • Where does it fit in the class?? Making sense out of chaos

  7. Descriptive Statistics • Transform a set of numbers or observations into indices that describe or characterize the data • “Summary statistics” • A large group of statistics that are used in all research manuscripts • Even the most complex statistical tests and studies start with descriptive statistics

  8. Descriptive Statistics (concept map) • Measurement Scales: Nominal, Ordinal, Interval, Ratio • Relationship: Scatterplot, Correlation, Regression • Graphic Portrayals: Frequencies, Histograms, Bar graphs, Normal distribution • Variability: Range, Standard deviation, Standardized scores • Central Tendency: Mean, Median, Mode

  9. Descriptive Statistics • Descriptive statistics usually accomplish two major goals: • 1) Describe the central location of the data • 2) Describe how the data are dispersed about that point • In other words, they provide: • 1) Measures of Central Tendency • 2) Measures of Variability

  10. Measure of Central Tendency • What SINGLE summary value best describes the CENTRAL location of an entire distribution? • Mode: which value occurs most often • Median: the value above and below which 50% of the cases fall (the middle; 50th percentile) • Mean: mathematical balance point; arithmetic/mathematical average

  11. Mode • Most frequent occurrence • What if data were? • 17, 19, 20, 20, 22, 23, 25, 28 • 17, 19, 20, 20, 22, 23, 23, 28 • Problem: set of numbers can be bimodal, or trimodal, depending on the scores • Not a stable measure • Ex. 17, 19, 20, 22, 23, 28, 28
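The three data sets on the mode slide can be checked with a short Python sketch. It assumes Python 3.8+ for `statistics.multimode`; the variable names are illustrative only.

```python
from statistics import multimode

one_mode = [17, 19, 20, 20, 22, 23, 25, 28]
two_modes = [17, 19, 20, 20, 22, 23, 23, 28]
shifted_mode = [17, 19, 20, 22, 23, 28, 28]

# multimode returns every value tied for the highest frequency
print(multimode(one_mode))      # [20]
print(multimode(two_modes))     # [20, 23] (bimodal)
print(multimode(shifted_mode))  # [28]
```

Changing a single score moves the mode from 20 to 28, which is what "not a stable measure" refers to.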

  12. Median • Rank numbers, pick middle one • What if data were…? • 17, 19, 20, 23, 23, 28 • Solution: add up two middle scores, divide by 2 (=21.5) • Best measure in asymmetrical distribution (i.e. skewed), not sensitive to extreme scores • Ex. 17, 19, 20, 23, 23, 428
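A small Python sketch of the median slide's two data sets, with the mean added for comparison. The comparison with the mean is an addition here, not part of the slide.

```python
from statistics import mean, median

even_count = [17, 19, 20, 23, 23, 28]
with_extreme = [17, 19, 20, 23, 23, 428]

# With an even number of scores, median averages the two middle ones
print(median(even_count))    # 21.5
print(median(with_extreme))  # 21.5, unchanged by the extreme score

# The mean, by contrast, is pulled toward the extreme score
print(mean(even_count))
print(mean(with_extreme))
```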

  13. Mean = X̄ • Add up the numbers and divide by the sample size (the number of numbers!) • Try this one… • 2, 3, 5, 6, 9 • (2 + 3 + 5 + 6 + 9) / 5 = 25 / 5 = 5 • (Usually) the best measure of the three: it uses the most information (all values from the distribution contribute)

  14. Characteristics of the Mean • Balance point • Point around which deviations sum to zero • Deviation = X – X̄ • For instance, if scores are 2, 3, 5, 6, 9 • Mean is 5 • Sum of deviations: (-3) + (-2) + 0 + 1 + 4 = 0 • ∑ (X – X̄) = 0
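The balance-point property can be verified directly. This is a minimal sketch with illustrative variable names, using the scores from the slide.

```python
scores = [2, 3, 5, 6, 9]

mean = sum(scores) / len(scores)         # 25 / 5 = 5.0
deviations = [x - mean for x in scores]  # each score minus the mean

print(deviations)       # [-3.0, -2.0, 0.0, 1.0, 4.0]
print(sum(deviations))  # 0.0
```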

  15. Characteristics of the Mean • Affected by extreme scores • Example 1 • Scores 7, 11, 11, 14, 17 • Mean = 12, Mode and Median = 11 • Example 2 • Scores 7, 11, 11, 14, 170 • Mean = 42.6, Mode & Median = 11
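The two examples above can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
from statistics import mean, median, mode

example_1 = [7, 11, 11, 14, 17]
example_2 = [7, 11, 11, 14, 170]  # same scores, one extreme value

for scores in (example_1, example_2):
    # Only the mean changes when 17 becomes 170
    print(mean(scores), median(scores), mode(scores))
```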

  16. Characteristics of the Mean • Balance point • Affected by extreme scores • Appropriate for use with interval or ratio scales of measurement • More stable than Median or Mode when multiple samples drawn from the same population • Basis for inferential stats

  17. Guidelines to Choose Measure of Central Tendency • Mean is preferred because it is the basis of inferential statistics • Median may be better for skewed data • Distribution of wealth in the US – ex. annual household income in Washington state for 2000: mean=$76,818; median=$42,024 • Mode to describe average of nominal data (eye color, hair color, etc…)

  18. Normal Distribution • (Figure: y-axis = frequency, how often a score occurs; x-axis = scores)

  19. MLB batting averages over 3-year span (min. 100 AB) Mean = 0.267 n = 1291

  20. Normal Distribution • “Normal” distribution indicates the data are perfectly symmetrical • (Figure labels: Mode, Median, Mean; x-axis = scores)

  21. Positively skewed distribution • (Figure labels: Mode, Median, Mean; x-axis = scores)

  22. NFL Salaries 2011

  23. Negatively skewed distribution • (Figure labels: Mode, Median, Mean; x-axis = scores)

  24. Relationship among the MCT & shape of distribution

  25. Alaska’s average elevation of 1900 feet is less than that of Kansas. Nothing in that average suggests the 16 highest mountains in the United States are in Alaska. Averages mislead, don’t they? Grab Bag, Pantagraph, 08/03/2000

  26. Variability • Measures of dispersion or spread • “The only thing constant is variation.”

  27. The notions of variability • Unusual • Peculiar • Strange • Original • Extreme • Special • Unlike • Deviant • Dissimilar • Different • (Source: John Allen Paulos, “Stories vs. Statistics,” NY Times, 10/24/2010)

  28. Variability defined • Measures of Central Tendency provide a summary level of the data • Variability recognizes that scores vary across individual cases • i.e., the mean or median may not be an actual score in your distribution • Variability quantifies the spread of performance • How scores vary around mean/mode/median

  29. To describe a distribution • 1) Measure of Central Tendency • Mean, Mode, Median • 2) Measure of Variability • Multiple measures • Range, Interquartile range, Semi-Interquartile Range • Standard Deviation

  30. Range • Range = Difference between low/high score • # of hours spent watching TV/week • 2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20 • Range = (Max - Min) Score • 20 - 2 = 18 • Very susceptible to outliers • Doesn’t indicate anything about variability around the mean/central point
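A short Python sketch of the range calculation on the TV-hours data. The extra value of 60 hours is hypothetical and is added only to illustrate the sensitivity to outliers.

```python
tv_hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

# Range = max score minus min score
score_range = max(tv_hours) - min(tv_hours)
print(score_range)  # 18

# Hypothetical outlier (not in the slides): one person watching 60 hours
with_outlier = tv_hours + [60]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)  # 58
```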

  31. Semi-Interquartile range • What is a quartile?? • Divide sample into 4 parts of equal size • Q1 , Q2 , Q3 = Quartile Points • Interquartile Range = Q3 - Q1 • Difference between highest and lowest quartile • SIQR = IQR / 2 • Related to the Median…prevents outliers from overly skewing measure • For ordinal data or skewed interval/ratio
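As a rough sketch, the quartile points, IQR, and SIQR can be computed with `statistics.quantiles` (Python 3.8+). Two assumptions apply: the slide gives no data for this topic, so the TV-hours data from the range slide is reused here, and the function's default "exclusive" method is only one of several quartile conventions, so other software may give slightly different cut points.

```python
from statistics import quantiles

# Assumption: reusing the TV-hours data from the range slide
tv_hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

# n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(tv_hours, n=4)

iqr = q3 - q1   # interquartile range
siqr = iqr / 2  # semi-interquartile range

print(q1, q2, q3)  # 7.0 9.0 14.25
print(iqr, siqr)   # 7.25 3.625
```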

  32. BMD and walking • Quartiles based on miles walked/week • Source: Krall et al, 1994, Walking is related to bone density and rates of bone loss. AJSM, 96:20-26

  33. Notes: Skewed Distribution? 95th Percentile? 50th Percentile vs Median?

  34. Standard Deviation • “Variation itself is nature's only irreducible essence.” (Stephen Jay Gould) • Most commonly accepted measure of spread • Compute the deviations of all numbers from the mean • Square each of the deviations and THEN sum them • Divide by the number of deviations • Finally, take the square root
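The steps above can be collected into one formula. Here n is the number of scores and X̄ the mean, as in the notation slide; writing the result as SD is a notational choice made here. Dividing by n follows the slide, while many textbooks divide by n - 1 when estimating from a sample.

```latex
SD = \sqrt{\frac{\sum (X - \bar{X})^{2}}{n}}
```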

  35. Standard Deviation • Distribution = 1, 3, 5, 7 • X̄ = 16 / 4 = 4 • 1) Compute deviations = -3, -1, 1, 3 • 2) Square deviations = 9, 1, 1, 9 • 3) Sum squared deviations = 20 • 4) Divide by n = 20 / 4 = 5 • 5) Take square root = √5 ≈ 2.24
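The five steps can be followed line by line in Python. This is a minimal sketch that divides by n as the slide does, so it matches `statistics.pstdev` (the population form) rather than `statistics.stdev`, which divides by n - 1. Variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

scores = [1, 3, 5, 7]
n = len(scores)
mean = sum(scores) / n                   # 16 / 4 = 4.0

deviations = [x - mean for x in scores]  # 1) -3, -1, 1, 3
squared = [d ** 2 for d in deviations]   # 2) 9, 1, 1, 9
sum_squared = sum(squared)               # 3) 20
variance = sum_squared / n               # 4) 20 / 4 = 5
sd = sqrt(variance)                      # 5) square root of 5

print(round(sd, 2))      # 2.24
print(pstdev(scores))    # same value from the standard library
```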

  36. Key points about SD • SD small → data clustered around the mean • SD large → data scattered from the mean • Affected by extreme scores (just like mean)…oftentimes called “outliers” • Consistent (more stable) across samples from the same population • Just like the mean - so it works well with inferential stats (where repeated samples are taken)

  37. SD Example • Three NFL quarterbacks with similar QB ratings in 2006: • Matt Hasselbeck (SEA) = 76.0 • Rex Grossman (CHI) = 73.9 • Brett Favre (GB) = 72.7 • Note: QB rating involves a complex formula accounting for passing attempts, completions, yards, touchdowns, and interceptions…100+ is considered outstanding & 70-80 is average • All appear to have had very similar, somewhat mediocre seasons as QB’s

  38. SD Example • Let’s look at the SD of their game-by-game QB ratings: • Matt Hasselbeck (SEA) = 29.97 • Rex Grossman (CHI) = 47.60 • Brett Favre (GB) = 27.81 • Grossman had, by far, the most variability (i.e. inconsistency) in his game-by-game performances…is this good or bad?

  39. Clinical Use of SD

  40. SD and the normal curve • The following concepts are critical to your understanding of how descriptive statistics works • Remember – a “normal” curve is perfectly symmetrical. This is not typical, but usually data are almost normal…

  41. SD and the normal curve • About 68% of scores fall within 1 SD of the mean • X̄ = 70, SD = 10 • 34.1% of scores fall between 60 and 70, and 34.1% between 70 and 80

  42. The standard deviation and the normal curve • About 68% of scores fall between 60 and 80 • X̄ = 70, SD = 10 • 34% of scores fall between 60 and 70, and 34% between 70 and 80

  43. The standard deviation and the normal curve • About 95% of scores fall within 2 SD of the mean • X̄ = 70, SD = 10 • Areas marked on each side of the mean, moving outward: 34.1%, 13.6% • Axis: 50, 60, 70, 80, 90

  44. The standard deviation and the normal curve • About 95% of scores fall between 50 and 90 • X̄ = 70, SD = 10 • Areas marked on each side of the mean, moving outward: 34.1%, 13.6% • Axis: 50, 60, 70, 80, 90

  45. The standard deviation and the normal curve • About 99.7% of scores fall within 3 SD of the mean • X̄ = 70, SD = 10 • Areas marked on each side of the mean, moving outward: 34.1%, 13.6%, 2.3% • Axis: 40, 50, 60, 70, 80, 90, 100

  46. The standard deviation and the normal curve • About 99.7% of scores fall between 40 and 100 • X̄ = 70, SD = 10 • Areas marked on each side of the mean, moving outward: 34.1%, 13.6%, 2.3% • Axis: 40, 50, 60, 70, 80, 90, 100

  47. What about X̄ = 70, SD = 5? • What approximate percentage of scores fall between 65 & 75? • …1 SD below + 1 SD above = 68% • What range includes about 99.7% of all scores? • …3 SD below to 3 SD above = 55 to 85
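Both answers can be checked with `statistics.NormalDist` (Python 3.8+). The sketch assumes the scores follow an exactly normal distribution with mean 70 and SD 5; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 70, 5
scores = NormalDist(mu=mean, sigma=sd)

# Share of scores between 65 and 75 (1 SD either side of the mean)
within_1sd = scores.cdf(75) - scores.cdf(65)
print(round(within_1sd, 3))  # 0.683

# Range covering about 99.7% of scores: 3 SD either side of the mean
low, high = mean - 3 * sd, mean + 3 * sd
print(low, high)             # 55 85
within_3sd = scores.cdf(high) - scores.cdf(low)
print(round(within_3sd, 3))  # 0.997
```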

  48. Interpreting The Normal Table • Area under Normal Curve • Specific SD values (z) include certain percentages of the scores • Values of Special Interest • 1.96 SD = 47.5% of scores on each side of the mean (47.5 + 47.5 = 95%) • 2.58 SD = 49.5% of scores on each side of the mean (49.5 + 49.5 = 99%) • i.e., 95% of scores fall within 1.96 standard deviations of the mean (1.96 above and 1.96 below)
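The table values above can be reproduced from the standard normal distribution (mean 0, SD 1). This is a sketch using `statistics.NormalDist` (Python 3.8+); the helper name `area_mean_to_z` is hypothetical and is introduced only for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def area_mean_to_z(z):
    """Area under the curve between the mean and z SDs above it."""
    return standard_normal.cdf(z) - 0.5

area_196 = area_mean_to_z(1.96)
area_258 = area_mean_to_z(2.58)

print(round(area_196, 3), round(2 * area_196, 2))  # 0.475 0.95
print(round(area_258, 3), round(2 * area_258, 2))  # 0.495 0.99

# Going the other way: the z value that leaves 2.5% in the upper tail
z_95 = standard_normal.inv_cdf(0.975)
print(round(z_95, 2))  # 1.96
```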

  49. IQ • 68% have an IQ between 85 and 115 • X̄ = 100, SD = 15 • Areas marked on each side of the mean, moving outward: 34.1%, 13.6%, 2.3% • Axis: 55, 70, 85, 100, 115, 130, 145

  50. MLB players’ batting averages over a 3-year span (min. 100 at bats) ~95% of players have an average between 0.196 and 0.337
