Inferential Statistics

Inferential Statistics Research Methods for Public Administrators Dr. Gail Johnson Dr. G. Johnson, www.researchdemystified.org

Welcome to Inferential Statistics • This is a companion to Sampling Demystified • It could be argued that this should follow that chapter • If the results are not statistically significant, no further analysis is warranted • But some people find inferential statistics overwhelming so I saved it for last • There is much that can be done with descriptive data analysis but it gets overshadowed by the fancier statistics of regression and inference.

Welcome to Inferential Statistics • Used when working with data from random samples • Used when researchers want to infer conclusions about a population based on results from a randomly selected sample from that population • Hence the term “Inferential” • Jargon term: generalizability

Inferential Statistics: A Powerful Analytical Tool • Enables researchers to: • Estimate population proportions • Estimate population mean • Estimate sampling error • Estimate confidence intervals • Test for statistical significance

Confidence Revisited • Estimate the population mean or proportion based on the sample survey • Confidence level: the social science standard is 95% • We are 95% certain that the true population value lies within a specified range around our estimate • The width of that range is the precision of the estimate • A 90% confidence level is the lowest level that should be used • In some cases, researchers might want to raise the bar to 99%, to be very, very certain

Confidence Revisited • Confidence interval: this is the range that is likely to contain the true mean • The social science standard for the confidence interval is plus or minus 5% • Sampling error is the analogous term when working with proportions, as with survey data • Sometimes called the margin of error

Sampling Error: Revisited • Most familiar in polling data: • Big national surveys use a 95 percent confidence level • Typically, results are within a margin of error of +/- 3% • That means that if they had surveyed everyone, the researchers are 95% certain that the results would be within +/- 3% of the results from the survey.

Sampling Error: Revisited • 11/15/09 Poll: Views on Cap and Trade. • There's a proposed system called "cap and trade." The government would issue permits limiting the amount of greenhouse gases companies can put out. Companies that did not use all their permits could sell them to other companies. The idea is that many companies would find ways to put out less greenhouse gases, because that would be cheaper than buying permits. Would you support or oppose this system? • Results: Support 53%, Oppose 42%

Sampling Error: Revisited • The sampling error is plus or minus 3 percent • If they had surveyed everyone: • The real percentage supporting cap and trade would be between 50% and 56% • The real percentage opposing cap and trade would be between 39% and 45%
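
A minimal sketch of where a margin of error such as plus or minus 3 percent comes from, using the standard large-sample formula for a proportion. The sample size of 1,000 is an assumption for illustration only, since the slides do not report how many people the poll surveyed, and the function names are made up for this example.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Large-sample margin of error for a sample proportion p with n respondents."""
    # z is about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


def likely_range(p, moe):
    """Range that is likely to contain the true population proportion."""
    return (p - moe, p + moe)


# Assumed sample size (not given in the slides)
n = 1000
moe = margin_of_error(0.53, n)
print(f"Margin of error with n={n}: +/- {moe:.1%}")

# Using the +/- 3 points reported for the poll
low, high = likely_range(0.53, 0.03)
print(f"Support: {low:.0%} to {high:.0%}")
low, high = likely_range(0.42, 0.03)
print(f"Oppose: {low:.0%} to {high:.0%}")
```

Under these assumptions, a sample of about 1,000 respondents gives a margin of error close to 3 points, which is why national polls of that size so often report that figure.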

Sampling Error: Revisited • Sampling error provides a likely range for the true proportion in the population • If the ranges overlap, then there is no discernible difference in the views: “too close to call”
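
The "too close to call" rule can be sketched as a simple overlap check. The 53/42 figures and the 3-point sampling error come from the cap and trade poll; the 48/46 pair and the helper names are hypothetical.

```python
def likely_range(percent, margin):
    """Likely range for the true percentage, given the sampling error."""
    return (percent - margin, percent + margin)


def ranges_overlap(a, b):
    """True when two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Cap and trade poll: 53% support, 42% oppose, +/- 3 points
support = likely_range(53, 3)
oppose = likely_range(42, 3)
print("Cap and trade too close to call?", ranges_overlap(support, oppose))

# Hypothetical closer result: 48% versus 46%, +/- 3 points
print("48 vs 46 too close to call?", ranges_overlap(likely_range(48, 3), likely_range(46, 3)))
```

In the poll, the ranges (50 to 56 and 39 to 45) do not overlap, so support really does appear to exceed opposition.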

Statistical Significance • When working with random sample data, the big question is: • How likely is it that these results are a fairly accurate reflection of the larger population from which the sample was taken? • Put another way: are these results just a quirk of chance?

Statistical Significance • Statisticians have provided researchers with analytical techniques to estimate how likely it is that the researchers have gotten the results they see in their analysis of sample data by chance. • These techniques are called tests of statistical significance.

Statistical Significance • We do not need to understand calculus in order to understand how to interpret tests of statistical significance • We just have to have faith that the statisticians have figured out the correct theories and that computers have been programmed to give correct results. • I Believe, I Believe!

Statistical Significance • The logic will seem familiar. • Researchers set a standard for how much risk they are willing to take that the observed results are due to random chance • The social science convention is to set the alpha level (the cutoff for the p value) at .05. • They run the statistical significance test. • If the p value comes in at .05 or less, the researchers conclude that there is little probability (5 percent or less) of getting results like these by chance alone.

Another Way To Understand Statistical Significance • If there were really no difference in the population and I took 100 random samples from it, only about 5 out of 100 would show results as extreme as the ones I have gotten. • It is unlikely, therefore, that I would have gotten such unusual results by chance alone. • I am willing to take the risk and conclude that my sample results fairly accurately capture what is true in the larger population from which the sample was selected.
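
The "5 out of 100 samples" idea can be checked with a small simulation. This is a sketch under stated assumptions, not a method from the slides: both groups are drawn from the same made-up population (mean $50,000, standard deviation $10,000), and a simple z-test with a known standard deviation stands in for the tests covered later.

```python
import math
import random
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b, sd):
    """Two-sided p value for a difference in means (z-test, known population SD)."""
    se = sd * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_alarm_rate(trials=2000, n=50, alpha=0.05, seed=1):
    """Share of samples that look 'significant' when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the SAME population, so any difference is chance
        a = [rng.gauss(50_000, 10_000) for _ in range(n)]
        b = [rng.gauss(50_000, 10_000) for _ in range(n)]
        if two_sided_p(a, b, sd=10_000) <= alpha:
            hits += 1
    return hits / trials


print(f"Share of chance-only samples with p <= .05: {false_alarm_rate():.3f}")
```

The share should land near .05: when there is truly no difference, roughly 5 samples in 100 still cross the .05 line by chance.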

How much risk? • It All Depends! • The standard is .05 or less, meaning I accept a 5% risk of concluding there is a difference when the results are really due to chance • I could raise the bar and set the standard at .01 or less, meaning I accept only a 1% risk • I could lower the bar and set the standard at .10, meaning I accept a 10% risk

Statistical Significance: The Logic of Hypothesis Testing • Research Hypothesis: • Women and men earn different salaries. • Null Hypothesis: • There is no difference between women's and men's salaries. • Remember: the null hypothesis is always one of “no difference”

Steps In The Process • Collect salary data from a random sample of men and women across the U.S. • Analyze the data • There is a $5,000 difference • Because I am working with random sample data, I have to determine whether this $5,000 difference is the result of chance • In the jargon: is this difference statistically significant?

Testing for Statistical Significance: • Testing against the Null Hypothesis: • What is the probability of getting a $5,000 difference in my sample results if there really is no difference in the population from which the sample was drawn? • I set the alpha level (the cutoff for the p value) at .05. • I run the test for statistical significance.

Testing for Statistical Significance: • If the p value from the test is .05 or less, I reject the null hypothesis • This means that the probability of getting the $5,000 difference when there really is no difference in the population is 5% or less. I am willing to take the risk and therefore I reject the null hypothesis. • I conclude that there is a $5,000 difference in salaries between men and women, and that difference is statistically significant.

Testing for Statistical Significance: • If the p value from the test is more than .05, there is too great a chance that the results do not reflect the population. • This $5,000 difference might be due to random chance. • I would conclude that this salary difference is not statistically significant.
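
The question "what is the probability of getting a difference this large if there really is no difference?" can be approximated directly by shuffling the group labels many times. This is a permutation test, a stand-in technique chosen because it needs no statistical library; it is not the t-test described later in these slides. The salary figures are made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, shuffles=5000, seed=0):
    """Approximate two-sided p value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(shuffles):
        # Under the null hypothesis the group labels are interchangeable
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    return as_extreme / shuffles


# Hypothetical salaries in thousands of dollars (not from the slides)
men = [41, 36, 44, 39, 35, 42, 38, 40, 37, 43]
women = [35, 32, 39, 34, 31, 37, 36, 33, 30, 38]

alpha = 0.05
p = permutation_p_value(men, women)
print(f"Observed difference: {mean(men) - mean(women):.1f} thousand")
print(f"p value: {p:.4f}")
print("Reject the null hypothesis" if p <= alpha else "Fail to reject the null hypothesis")
```

The decision rule in the last line is the same one the slides describe: compare the p value with the alpha level set in advance.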

Remember: • A statistical significance test is nothing more than a determination of the probability of getting the results the researchers got by chance.

Common Tests for Statistical Significance • Chi Square: nominal and ordinal data • T-tests: DV: interval/ratio data; IV: nominal/ordinal with 2 categories • ANOVA: DV: interval/ratio data; IV: nominal/ordinal with 3+ categories • F-tests: interval/ratio data

Statistical Significance • There are 100+ kinds of tests for statistical significance. • Good news! They all get interpreted the same way. • If researchers set the probability level at .05: • Then anything that is .05 or less is statistically significant. • And anything that is more than .05 is not statistically significant.

Test for Statistical Significance: Chi Square • Use with crosstabs • Chi Square is based on a mathematical formula that compares the actual data with how the data would have looked if there were no difference. • The more difference there is, the more likely that the results will be statistically significant.

Chi Square • If there were no difference in attitudes based on gender (which is our null hypothesis), we would expect our crosstab to show results similar to this:
          For   Against
Men        50      50
Women      50      50

Chi Square • But what if our respondents actually reported this way:
          For   Against
Men        75      25
Women      25      75
• Clearly, there is a difference in attitudes based on gender.
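
A minimal sketch of the chi-square arithmetic for the two crosstabs above, treating the cell values as counts of respondents. The function name is made up for this example, and the calculation applies no continuity correction, so statistical packages that apply one may report slightly different values.

```python
import math


def chi_square_2x2(table):
    """Chi-square statistic and p value for a 2x2 crosstab (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the p value has a closed form
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


no_difference = [[50, 50], [50, 50]]   # rows: men, women; columns: for, against
big_difference = [[75, 25], [25, 75]]

for name, table in (("No difference", no_difference), ("Big difference", big_difference)):
    chi2, p = chi_square_2x2(table)
    print(f"{name}: chi-square = {chi2:.1f}, p = {p:.3g}")
```

The first table matches its expected counts exactly, so the statistic is 0. In the second table every cell is 25 away from its expected count of 50, which produces a large statistic and a very small p value.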

Example: Gender and Gun Law • Are views on gun permit laws different based on gender? • Results: it appears that women are somewhat more likely (89%) to favor a gun permit law than men (77%). • But are these results statistically significant? • The computer calculates a p value of .001 • Conclusion?

Example: Gender and Abortion Attitudes • Are views on abortion for any reason different based on gender? • 48 percent of men favor abortion for any reason as compared to 49 percent of women. • But are these results statistically significant? • The computer calculates a p value of .78 • Conclusion?

Statistical Significance: T-Tests • Used with means, comparison of means • Single Mean: • Interval/ratio data where you are comparing to a known population mean • Paired Means: • before and after design • Independent Means: • comparing 2 means • For t-tests: the dependent variable must be interval or ratio level data.

Testing a Hypothesis about a Single Mean: • Research hypothesis: There is a difference in average hours worked as compared to “40.” • Null: not different from 40 • Results: Average number of hours = 42. • T-test (p value) = .000 (a p value too small to show at three decimals, not zero) • Interpretation?

Interpretation Process • In this case, you are comparing the actual result against the assumption that the norm is 40 hours. • How likely is it to get 42 hours if the real average in the population is 40? • The p value is less than .05 • It is very unlikely you would have gotten these results by chance alone, so you reject the null hypothesis. • Conclusion: the average number of hours worked is 42 and these results are statistically significant.
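
A rough sketch of the single-mean calculation. The 42 and 40 come from the slides; the standard deviation of 10 hours and the sample size of 400 are assumptions, since the slides do not report them. With a sample this large the normal curve is used here as an approximation to the t distribution, which the standard library does not provide.

```python
import math
from statistics import NormalDist


def one_sample_test(sample_mean, null_mean, sample_sd, n):
    """Test statistic and two-sided p value (normal approximation, large n)."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - null_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value


# 42 and 40 come from the slides; the SD of 10 hours and n of 400 are assumed
t, p = one_sample_test(sample_mean=42, null_mean=40, sample_sd=10, n=400)
print(f"t = {t:.2f}, p = {p:.3f}")
```

Under these assumed values the p value is so small that it prints as 0.000 at three decimals, which is how a result like ".000" shows up in software output.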

Independent T-Test: Gender and Income • Is there a difference in men’s and women’s income? • The research hypothesis is that there is a difference in salaries. • The null hypothesis is that there is no difference: • Technically: The groups are independent or there is no difference in the population means for these 2 groups.

Independent T-Test: Gender and Income • We collect the data and compare means • We run an independent t-test • Note: this test can only be used with a nominal independent variable with two values, like gender, and an interval/ratio level dependent variable • Results: • Mean for men: $38,000 • Mean for women: $33,000 • T-test (p value) = .001 • Interpretation?

F-Tests with Analysis of Variance • Used when researchers have an independent variable with more than 2 categories • Examples: • Religion (Christian, Jewish, Muslim, Buddhist, None) • Marital status (single, married, divorced) • Education (HS, College, Graduate Degree)

Example: Working The Statistical Significance Logic • Is there a difference in income based on whether one has a high school degree or less, has some college or a completed bachelor’s degree, or has a graduate degree? • Your Research Hypothesis is? • Your Null Hypothesis is?

Results: Education and Income • HS or less: $29,225 • College: $46,764 • Graduate: $62,275 • But are these results statistically significant? • F-test (p value) = .001 • Your Conclusion?
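
A minimal sketch of how the F statistic in an analysis of variance is built: variation between the group means divided by variation within the groups. The incomes below are made up (the slides report only group averages), and turning F into a p value requires the F distribution, which is left to statistical software here.

```python
from statistics import mean


def anova_f_statistic(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical incomes in thousands of dollars (not from the slides)
hs_or_less = [27, 29, 31, 30]
college = [44, 47, 49, 46]
graduate = [60, 63, 65, 61]

f = anova_f_statistic([hs_or_less, college, graduate])
print(f"F = {f:.1f}")
```

A large F means the group averages differ by much more than the spread inside each group would lead one to expect by chance.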

But There Is Potential For Error • Type I and Type II Errors • Type I Error: • This occurs when the null hypothesis is rejected even though it is actually true. • “There really is no difference in salaries in the population but we concluded that there was a statistically significant difference.” • In very large samples, small differences will be found to be statistically significant.

But There Is Potential For Error: at Least a 5% Chance • Type II Error: • This occurs when researchers fail to reject the null hypothesis even though it is false. • “There really is a difference in salaries in the population but we concluded there was no statistically significant difference in salaries between men and women.”

No Way To Avoid Error When Working With Random Sample Data • To avoid a Type I error, the researchers may want to make it harder to reject the null hypothesis • So they will raise the bar and set the alpha level at .01 rather than .05 • But by doing so, they have increased the likelihood of making a Type II error

No Way To Avoid Error When Working With Random Sample Data • To avoid a Type II error, the researchers may want to make it easier to reject the null hypothesis • So they will lower the bar and set the alpha level at .10 rather than .05 • Or they will increase sample size • But by lowering the bar to make it easier to reject the null hypothesis, they will increase the likelihood of making a Type I error.
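
The trade-off between the two kinds of error can be sketched with a textbook calculation for a two-sided z-test comparing two group means. Everything numeric here is assumed for illustration: a real $5,000 difference, a known standard deviation of $15,000, and equal group sizes of 50 or 200.

```python
import math
from statistics import NormalDist


def type_ii_error_rate(true_diff, sd, n_per_group, alpha):
    """Chance of missing a real difference (two-sided z-test, known SD)."""
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)
    critical = z.inv_cdf(1 - alpha / 2)
    shift = true_diff / se
    # Power: probability that the test statistic lands in either rejection region
    power = (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)
    return 1 - power


# Assumed for illustration: a real $5,000 difference, SD of $15,000
for alpha in (0.10, 0.05, 0.01):
    for n in (50, 200):
        beta = type_ii_error_rate(5_000, 15_000, n, alpha)
        print(f"alpha={alpha:.2f}  n per group={n:3d}  Type II risk={beta:.2f}")
```

In this sketch, moving alpha from .05 to .01 raises the Type II risk, moving it to .10 lowers that risk, and a larger sample lowers the Type II risk at every alpha level without changing the Type I risk.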

Which Error Is Worse? It Depends • Generally, social scientists feel that it is worse to make a Type I error than a Type II error. • It is more problematic to conclude there is a difference or an impact when there really isn’t any. • For example, concluding that a drug has a statistically significant positive impact when the results are just a Type I error is a problem.

Which One Is Worse? Type I and Type II • As a program manager, you may feel that it is worse to make a Type II error. • In this case, the null hypothesis of “No difference” would not be rejected. • The risk is that “No statistically significant differences were found” might turn into a conclusion that the program did not work. • But technically, all that should be concluded is that the researchers “failed to reject the null hypothesis.” • The program may actually make a difference that the researchers failed to detect.

More Statistical Significance Concepts • One-tailed test: used whenever the hypothesis specifies a direction. • Men will earn more than women • We are concerned with only one tail of the normal curve. • Easier to reject a null hypothesis.

More Statistical Significance Concepts • Two-tailed test: used when the research question does not specify a direction. • The salaries of men and women are different • Generally the default on statistical software packages. • Generally the more “conservative” measure: harder to reject a null hypothesis.
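
A small sketch of why a one-tailed test makes it easier to reject the null hypothesis. The test statistic of 1.8 is hypothetical, and the normal curve is used for simplicity.

```python
from statistics import NormalDist


def one_tailed_p(z):
    """p value when the hypothesis predicts a positive difference."""
    return 1 - NormalDist().cdf(z)


def two_tailed_p(z):
    """p value when the hypothesis predicts a difference in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z = 1.8  # hypothetical test statistic
alpha = 0.05
for label, p in (("One-tailed", one_tailed_p(z)), ("Two-tailed", two_tailed_p(z))):
    verdict = "significant" if p <= alpha else "not significant"
    print(f"{label}: p = {p:.3f} ({verdict})")
```

The same result is significant at .05 with the one-tailed test but not with the two-tailed test, because the two-tailed p value is twice as large.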

Statistical Significance Does Not Mean Meaningful Or Important • They surveyed 3,000 people, selected randomly across the U.S. • 87% of those with a private physician reported being satisfied • 85% of those with an HMO physician reported being satisfied. • These results were statistically significant. • Are they meaningfully different?
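
A sketch of how sample size alone can move a small gap across the .05 line, using a standard two-proportion z-test. The 87% and 85% figures come from the slide; the group sizes of 500 and 5,000 are hypothetical, since the slide does not say how the respondents divided between private and HMO physicians.

```python
import math
from statistics import NormalDist


def two_proportion_p(p1, n1, p2, n2):
    """Two-sided p value for a difference between two sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# 87% and 85% come from the slide; the group sizes are assumed
for n in (500, 5000):
    p = two_proportion_p(0.87, n, 0.85, n)
    print(f"n per group = {n}: p = {p:.3f}")
```

The gap is 2 percentage points in both runs. Only the sample size changes, yet the larger sample produces a statistically significant result, which is why significance by itself says nothing about whether a difference matters.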

Statistical Significance Does Not Mean Meaningful Or Important • Statistical significance has a narrow meaning and is based on mathematics • Although the researchers do decide on the alpha level they will set as the criterion for whether the results are statistically significant • “Meaningful” or “important” is a judgment call. • But remember: “significance” is a word owned by statisticians, so only use it when you are talking about tests for statistical significance.

Statistical Significance Does Not Mean • The results are meaningful or important. • The relationship is strong or weak. • That design errors have been eliminated. • A test result of .001 rather than .049 is not stronger or better in any sense other than that there is a lower probability the results are due to random chance.

Statistical Significance Does Not Mean • That non-sampling errors have been eliminated. • Poorly worded survey questions, error-prone data entry, low response rates, systematic bias in respondents, etc. have to be acknowledged as limitations of the study even if the results are reported as statistically significant.

Over-attachment To Statistical Significance Tests • “Unfortunately, researchers often place undue emphasis on significance tests….Perhaps it is because they have spent so much time in courses learning to use significance tests, that many researchers give the tests an undue emphasis in their research.” --Phillip Shively, p. 172
