# Data Analysis for Description

##### Presentation Transcript

1. Data Analysis for Description Research Methods for Public Administrators Dr. Gail Johnson Dr. G. Johnson, www.ResearchDemystified.org

2. Simple But Concrete • The Children’s Defense Fund reports on each day in America: • Four children are killed by abuse or neglect • Five children or teens commit suicide • Eight children or teens are killed by firearms • Seventy-five babies die before their 1st birthday • http://www.childrensdefense.org/child-research-data-publications/each-day-in-america.html

3. Simple But Concrete • A million seconds = about 11 ½ days • A billion seconds = about 32 years • A trillion seconds = about 32,000 years
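
The conversions on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the 365.25-day year is an assumption made here for illustration and is not stated on the slide.

```python
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def seconds_to_days(seconds):
    """Convert a number of seconds into days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds into years."""
    return seconds_to_days(seconds) / DAYS_PER_YEAR


million_days = seconds_to_days(1_000_000)
billion_years = seconds_to_years(1_000_000_000)
trillion_years = seconds_to_years(1_000_000_000_000)

print(f"A million seconds is about {million_days:.1f} days")
print(f"A billion seconds is about {billion_years:.1f} years")
print(f"A trillion seconds is about {trillion_years:,.0f} years")
```

The slide's figures are rounded, which is appropriate when the goal is to make large numbers concrete.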

4. Simple But Concrete • A \$700 billion bailout translates into a \$2,333 IOU from every person in the U.S. • Or, using a different metric, it comes to \$45 per week for each person in the U.S. • Going one step further, it comes out to \$6 a day • Framing: are you willing to pay \$6 a day to have a functioning financial system? Read more: http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqek0mRZ

5. Going Too Far? • Six dollars a day is also 25 cents an hour, or less than half a penny a minute. • Framing: Would you be willing to pay less than half a penny a minute? • Key Point: Does the comparison point make a difference in what you would be willing to pay? • Read more: http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqf9HSQ9
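
The reframing in the last two slides is the same total divided by different units. The sketch below assumes a U.S. population of 300 million, which is what the \$2,333 figure implies but which the slides do not state; the unrounded results differ slightly from the slides' rounded \$6 a day and 25 cents an hour.

```python
BAILOUT_DOLLARS = 700_000_000_000
US_POPULATION = 300_000_000  # assumption: implied by the $2,333 per-person figure

per_person = BAILOUT_DOLLARS / US_POPULATION
per_week = per_person / 52
per_day = per_person / 365
per_hour = per_day / 24
cents_per_minute = per_hour / 60 * 100

print(f"Per person: ${per_person:,.0f}")
print(f"Per week:   ${per_week:.2f}")
print(f"Per day:    ${per_day:.2f}")
print(f"Per hour:   ${per_hour:.2f}")
print(f"Per minute: {cents_per_minute:.2f} cents")
```

Every line describes the same cost; only the comparison point changes.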

6. Common Descriptive Analysis • Counts: how many • Decennial census • Percents • Women earned 77% of what men earned in 2006, up from 59% in 1970 • Parts of a whole • Percents (75%) and proportions (.75 or three-quarters)

7. Common Descriptive Analysis • But be mindful of “bigger pie” distortions when working with percents and proportions • If the pie grows much faster than the slice, the slice will appear relatively smaller as a percent even though it still grew • Best example is the budget deficit as a percent of GDP: if GDP grows much faster than the budget deficit, the deficit will appear smaller as a percent of GDP even though it has also grown.
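
A small numeric illustration of the "bigger pie" effect follows. All figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical figures in billions of dollars (not real budget data).
deficit_earlier, gdp_earlier = 100, 1_000
deficit_later, gdp_later = 120, 1_500

share_earlier = deficit_earlier / gdp_earlier  # 0.10
share_later = deficit_later / gdp_later        # 0.08

print(f"Deficit grew from {deficit_earlier} to {deficit_later}")
print(f"Share of GDP fell from {share_earlier:.0%} to {share_later:.0%}")
```

The slice grew by 20 percent, yet its share of the pie shrank because the pie grew by 50 percent.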

8. Common Descriptive Analysis • Rates: counts of occurrences standardized to a common base • Deaths of infants per 100,000 births • Crop yields per acre • Crime rates • Rates provide an apples-to-apples comparison between places of different size or populations
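
To show why rates allow an apples-to-apples comparison, here is a minimal sketch with two hypothetical cities. The city names, counts, and populations are invented for illustration.

```python
def rate_per(count, population, base=100_000):
    """Standardize a raw count to occurrences per `base` people."""
    return count / population * base


# Hypothetical data: (number of crimes, population)
cities = {
    "Smallville": (500, 100_000),
    "Bigtown": (2_000, 1_000_000),
}

rates = {name: rate_per(crimes, pop) for name, (crimes, pop) in cities.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f} crimes per 100,000 residents")
```

Bigtown has four times as many crimes in raw counts, but Smallville has the higher rate once population is taken into account.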

9. Common Descriptive Analysis • Ratio: numbers presented in relationship to each other • Student to teacher ratio: 15:1 • Divide the number of students by the number of teachers • 1,500 students and 45 teachers equals a student to teacher ratio of about 33 to 1 (1,500 divided by 45)
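
The ratio calculation from this slide, written out as a short sketch:

```python
students = 1_500
teachers = 45

# Divide the number of students by the number of teachers.
ratio = students / teachers

print(f"Student to teacher ratio: about {ratio:.0f} to 1")
```

The exact quotient is 33.3, so the reported ratio is a rounded value.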

10. Common Descriptive Analysis • Rates of change • Percentage change from one time period to the other • For example: The budget increased 23% from FY 2006 to FY 2007. Three Steps: • Divide the newest value by the oldest value • Subtract 1 • Multiply by 100 to get the percentage change

12. Common Descriptive Analysis • Rates of change: applied • What was the rate of change in the 1992 budget deficit as compared to 1980? • Divide the 1992 budget deficit (\$290 billion) by the 1980 budget deficit (\$73.8 billion) = 3.93 • 3.93 - 1 = 2.93 • 2.93 x 100 = 293 percent • The budget deficit in current dollars (meaning not adjusted for inflation) increased 293 percent.
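
The three steps for a percentage change can be wrapped in a small function. This is a sketch using the deficit figures from the slide; the function name is chosen here for illustration.

```python
def percent_change(oldest, newest):
    """Percentage change: divide newest by oldest, subtract 1, multiply by 100."""
    return (newest / oldest - 1) * 100


deficit_1980 = 73.8   # billions of current dollars
deficit_1992 = 290.0  # billions of current dollars

change = percent_change(deficit_1980, deficit_1992)
print(f"The deficit increased about {change:.0f} percent")
```

Computing without rounding at the intermediate step gives 292.95, which rounds to the same 293 percent shown on the slide.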

13. Common Descriptive Analysis • Frequency Distributions • Number and percents of a single variable

14. In The News: Women Now Are Majority of College Graduates

15. Interpretation? • How would you interpret these percentages in the comparative trend analysis? • Are you surprised by the changes over time? • Why or why not?

16. Frequency and Percent Distributions • Survey data: analyzed by distributions • How many men and women are in the program? Distribution of Respondents by Gender:

    | Gender | Number | Percent |
    |--------|--------|---------|
    | Male   | 100    | 33%     |
    | Female | 200    | 67%     |
    | Total  | 300    |         |

17. Frequency and Percent Distributions • How many men and women are in the program? Write-up: Of the 300 people in this program, 67% are women and 33% are men.
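
A frequency and percent distribution like the one in this write-up can be produced by counting responses. The sketch below rebuilds the 100 men and 200 women as a list of individual responses, which is an assumption about how the raw survey data would look.

```python
from collections import Counter

# Assumed raw data: one entry per respondent.
responses = ["Male"] * 100 + ["Female"] * 200

counts = Counter(responses)
total = sum(counts.values())
percents = {gender: round(100 * n / total) for gender, n in counts.items()}

for gender in counts:
    print(f"{gender}: {counts[gender]} ({percents[gender]}%)")
print(f"Total: {total}")
```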

18. Different Analysis Tools For Different Situations • Frequency/percent distributions make sense when working with nominal and ordinal data • But frequency/percent distributions for interval/ratio data can result in a ridiculously long table that is impossible to interpret • If I ask 500 people how many years they lived in an area, I can get a wide range of answers. • For this type of data, I would then look at means, medians, modes to describe that variable.

19. Describing Distributions • Central tendency • Means, Medians, Modes • How similar are the characteristics? • Example: Use when we want to describe the similarity of the ages of a group of people. • Dispersion • Range, standard deviation • How dissimilar are the characteristics? • Example: how much variation in the ages?

20. Measures of Central Tendency • The 3-Ms: Mean, Median, Mode. • Mode: most frequent response. • Median: mid-point of the distribution • Mean: arithmetic average.

21. Basic Concepts Revisited • Levels of Measurement • Nominal Level Data: names, categories • E.g., gender, religion, state, country • Ordinal Level Data: data with an order, going from low to high • E.g., highest educational degree, income categories, agree-disagree scales • Interval Level Data: numbers with equal intervals but no true zero point • E.g., IQ scores, GRE scores, temperature in Fahrenheit or Celsius • Ratio Level Data: real numbers with a true zero point • E.g., age, weight, income

22. Which Measure of Central Tendency to Use? Depends on the type of data you have: • Nominal data: mode • Ordinal data: mode and median • Interval/ratio: mode, median and mean

23. For Interval Or Ratio Data: Which One To Use? • Concept of the Normal Distribution, also called the bell-shaped curve • In a normal distribution, the mean, median and mode should be very similar • Use mean if distribution is normal • Use median if distribution is not normal

24. Normal Distribution: Bell-Shaped Curve (figure: bell curve centered on the mean) http://en.wikipedia.org/wiki/Normal_distribution

25. Office contributions • \$10, \$1, \$.50, \$.25, \$.25. • The mean is \$2.40 (add up and divide by 5) • The median is \$.50 (the mid-point of this distribution) • The mode is \$.25 (the most frequently reported contribution) • Best description of contributions is the median.
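
The three measures for the office contributions can be computed with Python's standard `statistics` module. This is a minimal sketch using the five amounts from the slide.

```python
from statistics import mean, median, mode

contributions = [10, 1, 0.50, 0.25, 0.25]  # dollars

contrib_mean = mean(contributions)
contrib_median = median(contributions)
contrib_mode = mode(contributions)

print(f"Mean:   ${contrib_mean:.2f}")
print(f"Median: ${contrib_median:.2f}")
print(f"Mode:   ${contrib_mode:.2f}")
```

The single \$10 contribution pulls the mean well above what most people gave, which is why the median describes this group better.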

26. Salaries • Assume that you had 11 teachers. 10 teachers earned \$21,000 per year and one earned \$1,000,000. • What would be the best measure to describe this data?

27. Salaries • The average salary would be \$110,000. • The median and mode are both \$21,000. • The curve would be positively skewed, i.e. the mean is higher than the mode and median • The median would do the best job at describing the center of the salaries
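
The salary example can be verified the same way. The sketch below builds the list of 11 salaries described on the previous slide and compares the mean with the median as a simple check on the direction of skew.

```python
from statistics import mean, median, mode

# 10 teachers at $21,000 and one at $1,000,000.
salaries = [21_000] * 10 + [1_000_000]

salary_mean = mean(salaries)
salary_median = median(salaries)
salary_mode = mode(salaries)

# A mean well above the median suggests a positive (right) skew.
positively_skewed = salary_mean > salary_median

print(f"Mean:   ${salary_mean:,.0f}")
print(f"Median: ${salary_median:,.0f}")
print(f"Mode:   ${salary_mode:,.0f}")
print(f"Positively skewed: {positively_skewed}")
```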

28. Skewed Data • Negative skew: The mass of the distribution is concentrated on the right of the figure. It has relatively few low values. The distribution is said to be left-skewed. • Positive skew: The mass of the distribution is concentrated on the left of the figure. It has relatively few high values. The distribution is said to be right-skewed. The \$1 million salary pulls the average up. Wikipedia: http://en.wikipedia.org/wiki/Skewness

29. Skewed Distributions: Negative and Positive http://en.wikipedia.org/wiki/File:Skewness_Statistics.svg

30. Using Means With Survey Data? • Survey data is typically coded using numbers: • Gender: Male is coded 1 • Female is coded 2 • It is faster and less error-prone to code variables using numbers • But the computer could treat these as numbers and will compute a mean if asked • How would you interpret a mean for gender of 1.6? Or a mean for religion of 2.8?

31. Do Not Use Means With Nominal Data • Gender (and religion) are nominal variables and should only be reported in terms of distributions: • Frequency distribution: 10 men and 12 women • Percentage distribution: 45% men and 55% women
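
The point can be seen by running both calculations on the same coded data. This sketch assumes the coding from the previous slide (1 = male, 2 = female) and the 10 men and 12 women from this one.

```python
from collections import Counter
from statistics import mean

LABELS = {1: "men", 2: "women"}  # coding assumed from the previous slide
codes = [1] * 10 + [2] * 12

# The computer will happily compute this, but it describes no respondent.
meaningless_mean = mean(codes)

# A distribution is the appropriate summary for a nominal variable.
counts = Counter(LABELS[code] for code in codes)
percents = {label: round(100 * n / len(codes)) for label, n in counts.items()}

print(f"Mean of gender codes: {meaningless_mean:.2f} (not interpretable)")
print(f"Frequency distribution: {dict(counts)}")
print(f"Percentage distribution: {percents}")
```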

32. Using Means With Survey Data? • Scales (very satisfied <-> very dissatisfied) are ordinal scales • But they are coded into the computer using numbers • 5 for very satisfied <-> 1 for very dissatisfied • The computer will compute a mean if asked: • The mean was 3.8 for job satisfaction. • The mean satisfaction with faculty performance was 4.2 on a scale from 1-5 • Grade-point averages are an example of means based on an ordinal scale (A-F on a scale of 0-4)

33. Using Means With Ordinal Data? • There is disagreement in the field, partly based on academic discipline, about whether to use means with ordinal data. • Things like GPA or faculty ratings are often shown as means • It is often helpful for researchers to look at the means initially when working with a lot of data; they are looking for unusually high or low means. • It is also true that sometimes it is easier to show the means than the percentage distribution for every variable

34. Washington Employee Survey

35. Using Means With Ordinal Data? • But most people are more familiar with polling results, which report percent distributions. • We tend to see something like 55% report supporting cap and trade legislation rather than a mean of 3.4 on a scale of 5 (for) to 1 (against). • The decision about whether means or percent distributions are used to report ordinal data should reflect audience preference and ease of audience understanding, not an ideological stance

36. Measures of Dispersion • Used with Interval and Ratio Data • Simple Description: The Range • Reported salaries ranged from \$21,000 to \$1,000,000 • Ages in the group ranged from 18 to 32 • Standard Deviation • Measures the dispersion in terms of the distance from the mean • Small standard deviation: not much dispersion • Large standard deviation: lots of dispersion

37. Standard Deviation • Normal Distribution: Bell-shaped curve • About 68% of the values fall within 1 standard deviation of the mean • About 95% of the values fall within 2 standard deviations of the mean

38. Normal Distribution (figure: bell curve marking the mean and the standard deviations that span 95% of the distribution)

39. Applying the Standard Deviation • Average test score = 60. • The standard deviation is 10. • Therefore, if the scores are normally distributed, about 95% of the scores are between 40 and 80. • Calculation (mean plus or minus 2 standard deviations): • 60 + 20 = 80 and 60 - 20 = 40.
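
The 68% and 95% rules of thumb can be checked against a normal distribution with the slide's mean and standard deviation. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and assumes the test scores are normally distributed.

```python
from statistics import NormalDist

mean_score = 60
sd = 10

# Mean plus or minus 2 standard deviations.
low, high = mean_score - 2 * sd, mean_score + 2 * sd

scores = NormalDist(mu=mean_score, sigma=sd)
share_within_1sd = scores.cdf(mean_score + sd) - scores.cdf(mean_score - sd)
share_within_2sd = scores.cdf(high) - scores.cdf(low)

print(f"Range for 2 standard deviations: {low} to {high}")
print(f"Share within 1 SD: {share_within_1sd:.1%}")
print(f"Share within 2 SD: {share_within_2sd:.1%}")
```

The exact shares are slightly above 68% and 95%, which is why those figures are best read as approximations.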

40. Standard Deviation with Means • The Standard Deviation is used with interval/ratio level data • Typically, standard deviations are presented with means so the reader can tell whether there is a lot or a little variation in the distribution. • Note: the standard deviation is sometimes used in other statistical calculations, such as z-scores and confidence intervals

41. Describing Two Variables Simultaneously • Cross-tabulations (cross tabs, contingency tables) • Used when working with nominal and ordinal data • It provides great detail

42. Describing Two Variables Simultaneously Detail about the race and gender of the 233 people in the workplace:

43. Describing Race and Gender • Write-up: Of the 233 employees, the greatest proportion are white women (31%) followed by white men (21%). Fifteen percent of the employees are black men and 11% are black women; 14% are men of other race identity and 6% are women of other race identity.

44. Describing Two Variables Simultaneously Comparison of Means • Used when one variable is nominal or ordinal, and the second variable is at the interval/ratio level of measurement. • Examples: • Men in the MPA program have a GPA of 3.2 as compared to 3.0 for women. • The mean overall citizen satisfaction score is 4.2 this year as compared to 3.5 last year. • Mean salary for women was \$35,000 as compared to \$38,000 for men last year.
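
A comparison of means groups an interval/ratio variable by a nominal one and averages within each group. The individual salary records below are hypothetical, chosen so that the group means match the \$35,000 and \$38,000 in the slide's example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gender, salary in dollars)
records = [
    ("women", 33_000), ("women", 35_000), ("women", 37_000),
    ("men", 36_000), ("men", 38_000), ("men", 40_000),
]

# Collect salaries by group, then average each group.
salaries_by_group = defaultdict(list)
for gender, salary in records:
    salaries_by_group[gender].append(salary)

group_means = {gender: mean(values) for gender, values in salaries_by_group.items()}

for gender, avg in group_means.items():
    print(f"Mean salary for {gender}: ${avg:,.0f}")
```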

45. Key Points • These simple descriptive analysis techniques can be effective: • They illuminate, provide feedback, inform and might persuade. • The math is generally straightforward. • Descriptive data is generally easier for many people to understand than more complex statistics (stay tuned). • Complex statistics are not inherently better!

46. The Tough Question • If descriptive data is distorted, it tends to be in the way things are being counted and measured. • The math is usually correct. • Example: The federal debt is often presented just in terms of percent of debt held by the public but the total debt includes money borrowed from other government funds. • As a result, the debt looks smaller than it actually is.

47. The Tough Question • If descriptive data is distorted, it tends to be in the way things are being counted and measured. The math is usually correct • Example: Health insurance profits look different when calculated as a percent of corporate revenue than when calculated as a percent of all spending on health care. • Profits will look smaller when presented as a percent of all health care spending, which is larger than just corporate insurance revenue.
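
The effect of the chosen denominator can be illustrated with invented numbers. None of the figures below are real health care data; they only show how the same profit yields two different percentages.

```python
# Hypothetical figures in billions of dollars.
insurer_profit = 10
insurer_revenue = 200
total_health_spending = 2_000

share_of_revenue = insurer_profit / insurer_revenue
share_of_all_spending = insurer_profit / total_health_spending

print(f"Profit as a percent of insurer revenue: {share_of_revenue:.1%}")
print(f"Profit as a percent of all health spending: {share_of_all_spending:.1%}")
```

Both percentages are computed correctly; the difference comes entirely from what is being counted in the denominator.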

48. The Tough Question • Always ask: what exactly is being measured and counted? • Consider whether there are other ways of counting and other ways of doing the analysis that might yield different results (or create different perceptions). • Do the choices reflect a political agenda?

49. Creative Commons • This powerpoint is meant to be used and shared with attribution • Please provide feedback • If you make changes, please share freely and send me a copy of changes: • Johnsong62@gmail.com • Visit www.creativecommons.org for more information