
Probability and Statistical Inference Gehlbach: Chapter 8

Objective of Statistical Analysis. To answer research questions using observed data, using data reduction and analyzing variability. To make an inference about a population based on information contained in a sample from that population. To provide an associated measure of how good the inference is.


Presentation Transcript


    1. Probability and Statistical Inference Gehlbach: Chapter 8

    3. Basic Concepts of Statistics

    4. General Approach to Statistical Analysis

    5. Outline: Probability (definition, probability laws, random variables, probability distributions); Statistical Inference (definition, sample vs. population, sampling variability, sampling problems, Central Limit Theorem, hypothesis testing, test statistics, P-value calculation, errors in inference, P-value adjustments, confidence intervals)

    6. We disagree with Stephen: a working understanding of P-values is not difficult to come by. For the most part, statistics and clinical research can work well together. Good collaborations result when researchers have some knowledge of design and analysis issues.

    7. Probability

    8. Probability and the P-value You need to understand what a P-value means P-value represents a probabilistic statement Need to understand concept of probability distributions More on P-values later

    9. Definition of Probability An experiment is any process by which an observation is made. An event (E or Ei) is any outcome of an experiment. The sample space (S) is the set of all possible outcomes of an experiment. Probability: a measure defined on the sample space S; in the simplest case it is empirically estimated as (# times event occurs) / (total # trials). E.g., Pr(red car) = (# red cars seen) / (total # cars). Probability is the basis for statistical inference.
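The empirical estimate above can be sketched in a few lines of Python. The die-rolling experiment and the function name are illustrative, not from the slides:

```python
import random

def empirical_probability(event, run_trial, n):
    """Estimate P(event) as (# times event occurs) / (total # trials)."""
    return sum(1 for _ in range(n) if event(run_trial())) / n

random.seed(42)
# Experiment: roll a fair six-sided die; event: the roll shows a 6.
est = empirical_probability(lambda roll: roll == 6,
                            lambda: random.randint(1, 6),
                            100_000)
print(est)  # close to the true probability 1/6
```

With 100,000 trials the estimate settles near 1/6, illustrating how relative frequency approximates probability.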

    10. Axiomatic Probability (laying down “the laws”) For any sample space S containing events E1, E2, E3, …, we assign a number, P(Ei), called the probability of Ei, such that: 0 ≤ P(Ei) ≤ 1; P(S) = 1; and if E1, E2, E3, … are pairwise mutually exclusive events in S, then P(E1 ∪ E2 ∪ E3 ∪ …) = P(E1) + P(E2) + P(E3) + …

    11. Union and Intersection: Venn Diagrams

    12. Laws of Probability (the sequel) Let Eᶜ (“E complement”) be the set of outcomes in S not in E; then P(Eᶜ) = 1 − P(E). P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2). The conditional probability of E1 given E2 has occurred: P(E1 | E2) = P(E1 ∩ E2) / P(E2). Events E1 and E2 are independent if P(E1 ∩ E2) = P(E1)P(E2).
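A quick way to check these laws is to enumerate a small sample space exhaustively. The two-dice example below is illustrative, not from the slides:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs from two fair dice (36 equally likely outcomes).
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

E1 = lambda s: s[0] % 2 == 0     # first die is even
E2 = lambda s: s[0] + s[1] == 7  # the dice sum to 7

# Complement law: P(E1 complement) = 1 - P(E1)
assert P(lambda s: not E1(s)) == 1 - P(E1)
# Addition law: P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)
union = P(lambda s: E1(s) or E2(s))
assert union == P(E1) + P(E2) - P(lambda s: E1(s) and E2(s))
# Independence: P(E1 and E2) = P(E1) * P(E2) holds for this pair
assert P(lambda s: E1(s) and E2(s)) == P(E1) * P(E2)
print(union)  # 7/12
```

Exact fractions avoid floating-point noise, so each law can be verified with equality rather than approximation.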

    13. Conditional Probability Restrict yourself to a “subspace” of the sample space

    14. Conditional Probability Examples Categorical data analysis: the odds ratio is a ratio of the odds of two conditional probabilities. Survival analysis: conditional probabilities of the form P(alive at time t1 + t2 | survived to time t1).

    15. Random Variables (where the math begins) A random variable is a (set) function with domain S and range the real numbers (i.e., a real-valued function defined over a sample space). E.g., tossing a fair coin: let X = 1 if heads, X = 0 if tails; P(X = 0) = P(X = 1) = ½. Many times the random variable of interest will be the realized value of the experiment (e.g., if X is the b-segment PSV from RDS). Random variables have probability distributions.

    16. Probability Distributions Two types: Discrete distributions (and discrete random variables) are represented by a finite (or countable) number of values, with P(X = x) = p(x). Continuous distributions (and random variables) are represented by a real-valued interval, with P(x1 < X < x2) = F(x2) − F(x1).

    17. Expected Value & Variance Random variables are typically described using two quantities: Expected value = E(X) (the mean, usually “µ”) and Variance = V(X) (usually “σ²”). Discrete case: E(X) = Σ x·p(x), V(X) = Σ (x − µ)²·p(x). Continuous case: E(X) = ∫ x·f(x) dx, V(X) = ∫ (x − µ)²·f(x) dx.
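The discrete-case formulas translate directly into code. The fair-die example below is an illustrative sketch, not from the slides:

```python
def expected_value(pmf):
    """E(X) = sum of x * p(x) over a discrete distribution given as {x: p(x)}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E[(X - mu)^2] = sum of (x - mu)^2 * p(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die
mu, var = expected_value(die), variance(die)
print(mu, var)  # 3.5 and 35/12 (about 2.9167)
```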

    18. Discrete Distribution Example Binomial: Experiment consists of n identical trials Each trial has only 2 outcomes: success (S) or failure (F) P(S) = p for a single trial; P(F) = 1-p = q Trials are independent R.V. X = the number of successes in n trials
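The binomial setup above has a closed-form pmf, P(X = k) = C(n, k)·p^k·q^(n−k), which Python's `math.comb` makes easy to evaluate. The parameters n = 10, p = 0.3 below are arbitrary choices for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1-p)**(n-k): k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                 # probabilities sum to 1
mean = sum(k * pk for k, pk in enumerate(pmf))   # E(X) = n * p = 3.0
print(total, mean)
```

The two printed checks confirm the axioms and the binomial mean formula E(X) = np.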

    19. Continuous Distribution Example Normal (Gaussian): The normal distribution is defined by its probability density function, given as f(x) = (1/(σ√(2π))) · exp(−(x − µ)² / (2σ²)), for parameters µ and σ, where σ > 0.
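The density can be transcribed directly; `normal_pdf` is an illustrative name, not from the slides:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

print(normal_pdf(0.0))  # peak of the standard normal, 1/sqrt(2*pi) ~ 0.3989
```

The curve is symmetric about µ, so normal_pdf(µ + d) equals normal_pdf(µ − d) for any d.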

    22. Statistical Inference

    23. Statistical Inference Is there a difference in the population? You do not know the population, just the sample you collected. Develop a probability model. Infer characteristics of a population from a sample. How likely is it that the sample data support the null hypothesis?

    24. Statistical Inference Mean = ?

    25. Definition of Inference Infer a conclusion/estimate about a population based on a sample from the population. If you collect data from the whole population, you don’t need to infer anything. Inference = conducting hypothesis tests (for P-values) and estimating 95% CIs.

    26. Sample vs. Population (example) “The primary sample [involved] students in the 3rd through 5th grades in a community bordering a major urban center in North Carolina… The sampling frame for the study was all third through fifth-grade students attending the seven public elementary schools in the community (n=2,033). From the sampling frame, school district evaluation staff generated a random sample of 700 students.” Source: Bowen, NK. (2006) Psychometric properties of Elementary School Success Profile for Children. Social Work Research, 30(1), p. 53.

    27. Philosophy of Science Idea: We posit a paradigm and attempt to falsify that paradigm. Science progresses faster via attempting to falsify a paradigm than attempting to corroborate a paradigm. (Thomas S. Kuhn. 1970. The Structure of Scientific Revolutions. University of Chicago Press.)

    28. Philosophy of Science Easier to collect evidence to contradict something than to prove truth? The fastest way to progress in science under a paradigm of falsification is through perturbation experiments. In epidemiology, we are often unable to do perturbation experiments, so it becomes a process of accumulating evidence. Statistical testing provides a rigorous, data-driven framework for falsifying hypotheses.

    29. What is Statistical Inference? A generalization made about a larger group or population from the study of a sample of that population. Sampling variability: repeat your study (sample) over and over again. Results from each sample would be different.

    30. Sampling Variability Mean = ?

    31. Sampling Variability Mean = ?

    32. Sampling Problems Low Response Rate Refusals to Participate Attrition

    33. Low Response Rate Response rate = % of the targeted sample that supply the requested information. Statistical inferences extend only to individuals who are similar to completers. Low response rate ≠ nonresponse bias, but it is a possible symptom.

    34. Low Response Rate (examples) “One hundred six of the 360 questionnaires were returned, a response rate of 29%.” Source: Nordquist, G. (2006) Patient insurance status and do-not-resuscitate orders: Survival of the richest? Journal of Sociology & Social Welfare, 33(1), p. 81. “At the 7th week, we sent a follow-up letter to thank the respondents and to remind the nonrespondents to complete and return their questionnaires. The follow-up letter generated 66 additional usable responses.” Source: Zhao JJ, Truell AD, Alexander MW, Hill IB. (2006) Less success than meets the eye? The impact of Master of Business Administration education on graduates’ careers. Journal of Education for Business, 81(5), p. 263. “The response rate, however, was below our expectation. We used 2 procedures to explore issues related to non-response bias. First, there were several identical items that we used in both the onsite and mailback surveys. We compared the responses of the non-respondents to those of respondents for [both surveys]. No significant differences between respondents and non-respondents were observed. We then conducted a follow-up telephone survey of non-respondents to test for potential non-response bias as well as to explore reasons why they had not returned their survey instruments…” Source: Kyle GT, Mowen AJ, Absher JD, Havitz ME. (2006) Commitment to public leisure service providers: A conceptual and psychometric analysis. Journal of Leisure Research, 38(1), 86-87.

    35. Refusals to Participate Similar kind of problem to having low response rates Statistical inferences may extend only to those who agreed to participate, not to all asked to participate Compare those who agree to refusals

    36. Refusals to Participate (example) “Participants were 38 children aged between 7 and 9 years. Children were from working- or middle-class backgrounds, and were drawn from 2 primary schools in the north of England. Letters were sent to the parents of all children between 7 and 9 in both schools seeking consent to participate in the study. Around 40% of the parents approached agreed for their children to take part.” Source: Meins E, Fernyhough C, Johnson F, Lidstone J. (2006) Mind-mindedness in children: Individual differences in internal-state talk in middle childhood. British Journal of Developmental Psychology, 24(1), p. 184.

    37. Attrition Individuals who drop out before study’s end (not an issue for every study design) Differences between those who drop out and those who stay in are called Attrition bias. Conduct follow-up study on dropouts Compare baseline data

    38. Attrition (example) “…Of the 251 men who completed an assigned intervention, about a fifth (19%) failed to return for a 1-month assessment and more than half (54%) for a 3-month assessment… Conclusions also cannot be generalized beyond the sample [partly because] attrition in the evaluation study was relatively high and it was not random. Therefore, findings cannot be generalized to those least likely to complete intervention sessions or follow-up assessments.” Source: Williams ML, Bowen AM, Timpson SC, Ross MW, Atkinson JS. (2006) HIV prevention and street-based male sex workers: An evaluation of brief interventions. AIDS Education & Prevention, 18(3), pp.207-214. “The 171 participants who did not return for their two follow-up visits represent a significant attrition rate (34%). A comparison of demographic and baseline measures indicated that [those who stayed in the study versus those who did not] differed on age, BMI, when diagnosed, language, ethnicity, HbA1c, PCS, MCS and symptoms of depression (CES-D).” Source: Maljanian R, Grey N, Staff I, Conroy L. (2005) Intensive telephone follow-up to a hospital-based disease management model for patients with diabetes mellitus. Disease Management, 8(1), p. 18.

    39. Back to Inference….

    40. Motivation Typically you want to see if there are differences between groups (e.g., treatment vs. control). Approach this by looking at the “typical” or “average” difference between groups. Thus we look at differences in central tendency to quantify group differences. Test whether two sample means are different (assuming the same variance) in an experiment.

    42. Central Limit Theorem The CLT states that regardless of the distribution of the original data, the average of the data is approximately Normally distributed as the sample size grows. Why such a big deal? It allows hypothesis tests (P-values) and CIs to be computed.

    43. Central Limit Theorem If a random sample is drawn from a population, a statistic (like the sample average) follows a distribution called a “sampling distribution”. CLT tells us the sampling distribution of the average is a Normal distribution, regardless of the distribution of the original observations, as the sample size increases.
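A simulation makes the CLT concrete. The sketch below (exponential data, sample size 50, 2,000 repeated samples; all choices illustrative, not from the slides) shows the sampling distribution of the average centering on the population mean with spread σ/√n:

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    # Average of n draws from a decidedly non-Normal (exponential) distribution.
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Sampling distribution of the average: repeat the study 2000 times at n = 50.
means = [sample_mean(50) for _ in range(2000)]
print(statistics.fmean(means))  # near the population mean, 1.0
print(statistics.stdev(means))  # near sigma / sqrt(n) = 1 / sqrt(50) ~ 0.141
```

Even though single exponential draws are highly skewed, a histogram of `means` would look approximately Normal, which is exactly what the CLT promises.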

    45. What is the P-value? The P-value represents the probability of getting a test statistic as extreme as, or more extreme than, the one observed, under the null hypothesis. That is, the P-value is the chance of obtaining your data results under the assumption that your null hypothesis is true. If this probability is low (say p < 0.05), then you conclude your data results do not support the null being true and “reject the null hypothesis.”

    46. Hypothesis Testing & P-value P-value is: Pr(observed data results | null hypothesis is true) If P-value is low, then conclude null hypothesis is not true and reject the null (“in data we trust”) How low is low?

    47. Statistical Significance If the P-value is as small as or smaller than the pre-determined Type I error (size) α, we say that the data are statistically significant at level α. What value of α is typically assumed?

    50. Why P-value < 0.05? This arbitrary cutoff has evolved over time largely by precedent. In legal matters, courts typically require statistical significance at the 5% level.

    51. The P-value The P-value lies on a continuum of evidence against the null hypothesis; it is not just a dichotomous indicator of significance. Would you change your standard-of-care surgical procedure for p = 0.049999 vs. p = 0.050001?

    52. Gehlbach’s beefs with P-value Size of P-value does not indicate the [clinical] importance of the result Results may be statistically significant but practically unimportant Differences not statistically significant are not necessarily unimportant ***

    53. Any difference can become statistically significant if N is large enough Even if there is statistical significance is there clinical significance?

    54. Controversy around HT and P-value “A methodological culprit responsible for spurious theoretical conclusions” (Meehl, 1967; see Greenwald et al, 1996) “The p-value is a measure of the credibility of the null hypothesis. The smaller the P-value is, the less likely one feels the null hypothesis can be true.”

    55. HT and p-value “It cannot be denied that many journal editors and investigators use P-value < 0.05 as a yardstick for the publishability of a result.” “This is unfortunate because not only P-value, but also the sample size and magnitude of a physically important difference determine the quality of an experimental finding.”

    56. HT and p-value “[We] endorse the reporting of estimation statistics (such as effect sizes, variabilities, and confidence intervals) for all important hypothesis tests.” Greenwald et al (1996)

    57. Test Statistics Each hypothesis test has an associated test statistic. A test statistic measures compatibility between the null hypothesis and the data. A test statistic is a random variable with a certain distribution. A test statistic is used to calculate probability (P-value) for the test of significance.

    58. How a P-value is calculated A data summary statistic is estimated (like the sample mean) A “test” statistic is calculated which relates the data summary statistic to the null hypothesis about the population parameter (the population mean) The observed/calculated test statistic is compared to what is expected under the null hypothesis using the Sampling Distribution of the test statistic The Probability of finding the observed test statistic (or more extreme) is calculated (this is the P-value)

    59. Hypothesis Testing Set up a null and alternative hypothesis Calculate test statistic Calculate the P-value for the test statistic Based on P-value make a decision to reject or fail to reject the null hypothesis Make your conclusion
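The steps above can be sketched as a large-sample, one-sample z-test. Everything here is illustrative (simulated data, H0: µ = 0, Normal approximation via `math.erf`) rather than a procedure from the slides:

```python
import math
import random
import statistics

def z_test_pvalue(data, mu0):
    """Two-sided one-sample z-test of H0: mean = mu0 (large-sample approximation)."""
    n = len(data)
    xbar = statistics.fmean(data)               # 1. data summary statistic
    se = statistics.stdev(data) / math.sqrt(n)
    z = (xbar - mu0) / se                       # 2. test statistic
    # 3. compare to the standard Normal sampling distribution:
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p                                 # 4. P(|Z| >= |z|) under H0

random.seed(1)
data = [random.gauss(0.3, 1.0) for _ in range(100)]  # true mean is 0.3
z, p = z_test_pvalue(data, mu0=0.0)
print(z, p)
```

The final step (reject or fail to reject) compares `p` to the chosen α, e.g. `p < 0.05`.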

    60. Errors in Statistical Inference

    61. The Four Possible Outcomes in Hypothesis Testing Fail to reject a true null: correct decision. Reject a true null: Type I error (probability α). Fail to reject a false null: Type II error (probability β). Reject a false null: correct decision (power = 1 − β).


    63. Type I Errors Rejecting the null hypothesis when it is actually true; the probability of a Type I error is the significance level α.

    64. Type II Errors Failing to reject the null hypothesis when it is actually false; the probability of a Type II error is β, and power = 1 − β.

    65. P-value adjustments

    66. P-value adjustments Sometimes adjustments for multiple testing are made. Bonferroni: adjusted α = α / (# of tests), where α is usually 0.05 (the P-value cutoff). Bonferroni is a common (but conservative) adjustment; many others exist.

    67. P-value adjustments (example) “An alpha of .05 was used for all statistical tests. The Bonferroni correction was used, however, to reduce the chance of committing a Type I error. Therefore, given that five statistical tests were conducted, the adjusted alpha used to reject the null hypothesis was .05/5 or alpha = .01.” Source: Cumming-McCann A. (2005) An investigation of rehabilitation counselor characteristics, white racial attitudes, and self-reported multicultural counseling competencies. Rehabilitation Counseling Bulletin, 48(3), 170-171.

    70. Confidence Intervals
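As a sketch of a large-sample 95% confidence interval for a mean, x̄ ± 1.96 · s/√n (simulated data; all names and parameters illustrative, not from the slides):

```python
import math
import random
import statistics

def mean_ci95(data):
    """Large-sample 95% CI for the mean: xbar +/- 1.96 * s / sqrt(n)."""
    xbar = statistics.fmean(data)
    half = 1.96 * statistics.stdev(data) / math.sqrt(len(data))
    return xbar - half, xbar + half

random.seed(2)
data = [random.gauss(10, 2) for _ in range(200)]  # true mean 10, sd 2
lo, hi = mean_ci95(data)
print(lo, hi)
```

If the study were repeated many times, about 95% of intervals built this way would contain the true population mean.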

    73. Bayesian vs. Classical Inference There are 2 main camps of statistical inference: Frequentist (classical) statistical inference and Bayesian statistical inference. Bayesian inference incorporates “past knowledge” about the probability of events using “prior probabilities.” The Bayesian paradigm assumes the parameters of interest follow a statistical distribution of their own; Frequentist inference assumes parameters are fixed. Statistical inference is then performed to ascertain the “posterior probabilities” of outcomes, which depend on: the data and the assumed prior probabilities.
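The prior-to-posterior update can be illustrated with a standard conjugate pair (a Beta prior on a success probability updated with binomial data); this example is a sketch and is not specific to these slides:

```python
# Beta(a, b) prior for a success probability p, updated with binomial data:
# by conjugacy, the posterior is Beta(a + successes, b + failures).
def beta_binomial_posterior(a, b, successes, failures):
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)

# "Past knowledge": a uniform prior, Beta(1, 1).
# The data: 7 successes and 3 failures in 10 trials.
a_post, b_post = beta_binomial_posterior(1, 1, 7, 3)
print(beta_mean(a_post, b_post))  # posterior mean = 8/12 ~ 0.667
```

The posterior mean (0.667) sits between the prior mean (0.5) and the observed proportion (0.7), showing how the prior and the data jointly determine the inference.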

    74. Schedule
