Lecture #7

Comparisons between two means: • Research questions about two separate or independent groups • Research questions about two dependent or correlated groups

Univariate vs. Multivariate • Univariate analysis usually refers to one predictor variable and one outcome variable • Is gender a predictor of pneumonia? • Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously. • After adjusting for age, is gender a predictor of pneumonia?

Difference vs. Association • Some tests are designed to assess whether there are statistically significant differences between groups. • Is there a statistically significant difference between the age of patients with and without pneumonia? • Some tests are designed to assess whether there are statistically significant associations between variables. • Is the age of the patient associated with the number of days in the hospital?

Unmatched vs. Matched • Some statistical tests are designed to assess groups that are unmatched or independent. • Is the admission systolic blood pressure different between men and women? • Some statistical tests are designed to assess groups that are matched or data that are paired. • Is the systolic blood pressure different between admission and discharge?

Hypothesis testing (Review) • In any hypothesis testing situation, we first need to define the null hypothesis. It is “under the null hypothesis” that we work out how our test statistic is distributed. • Knowing how the test statistic is distributed lets us use the corresponding probability distribution to judge how unusual the observed value is. • We start by assuming that the null hypothesis is true. This gives us the acceptance and rejection regions, defined at our chosen significance level, for the distribution “under the null hypothesis.”

Hypothesis testing (Review) • Note: The null hypothesis, H0, and the alternative hypothesis, Ha, should be mutually exclusive and exhaustive. • 2 types of errors. • 1. Type I error: reject a true null hypothesis. (Its probability is commonly called α.) • 2. Type II error: fail to reject a false null. (Its probability is commonly called β.) • Consider a jury's hypothesis: • H0: The defendant is innocent. • Ha: The defendant is guilty. • Therefore: • Type I error would entail a false conviction of an innocent person. • Type II error would entail letting a guilty person go free.

Hypothesis Testing (Review) • Significance tests are used to decide whether to reject the null hypothesis. • This is done by studying the sampling distribution for a statistic. • If the probability of observing a result at least as extreme as yours, assuming the null is true, is < .05, reject the null. • If that probability is ≥ .05, fail to reject the null. (Failing to reject is not the same as showing the null is true.) • There are many kinds of significance tests for different kinds of statistics. Today we’re going to discuss t-tests.

2-Sample T-Tests Independent t-test Dependent t-test Picking the correct test

t-test example • We are interested in whether caffeine consumption improves people’s happiness. • We randomly assign 25 people to drink decaf and 25 people to drink regular coffee. • Subsequently we measure how happy people are. • Note: The independent variable is categorical (you’re in one group or the other), and there are only two groups. • The dependent variable is continuous—we measure how happy people are on a continuous metric.

t-test example (cont) • Let’s say we find that the control group has a mean score of 3 (SD =1) and the experimental group has a mean score of 3.2 (SD = .9). • Thus, there is a .20 difference between the two groups. [3.2 – 3.0 = .2] • Two possibilities • The .2 difference between groups is due to sampling error, not a real effect of caffeine. In other words, the two samples are drawn from populations with identical means and variances. • The .2 difference between groups is due to the effect of caffeine, not sampling error. In other words, the two samples are drawn from populations with different means (and maybe different variances).

Under the null hypothesis: • The population for the control group and the population for the experimental group have identical means and variances. • The two samples may or may not have identical means and variances because of sampling error; hence, one sample mean might be .2 points higher than the other.

t-test example (cont) • We need to know how likely it is that we would observe a difference of .20 or higher if the null hypothesis is true. • How can we do this? • We can construct a sampling distribution of mean differences, assuming the null hypothesis is true. • We can use this distribution to determine how large a mean difference we should expect to observe by chance when the population mean difference is zero.
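The idea of a sampling distribution of mean differences can be sketched with a small simulation. This is only an illustration under stated assumptions: both groups are drawn from the same normal population (mean 3, SD 1, values chosen to echo the example), with 25 people per group; the function name, the seed, and the number of repetitions are arbitrary choices, not part of the lecture.

```python
import random
import statistics

def simulate_null_differences(n_per_group=25, sd=1.0, n_sims=10_000, seed=0):
    """Repeatedly draw two samples from the SAME normal population
    (so the null hypothesis is true by construction) and return the
    differences between the two sample means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        control = [rng.gauss(3.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(3.0, sd) for _ in range(n_per_group)]
        diffs.append(statistics.fmean(treated) - statistics.fmean(control))
    return diffs

diffs = simulate_null_differences()

# Spread of the simulated differences: an empirical standard error of the difference.
spread = statistics.stdev(diffs)

# How often chance alone produces a difference of .20 or higher.
share_large = sum(d >= 0.20 for d in diffs) / len(diffs)

print(f"SD of simulated mean differences: {spread:.3f}")
print(f"Share of differences >= .20:      {share_large:.3f}")
```

Under these assumptions a difference of .20 or more turns up in a sizeable share of the simulated experiments, which is why an observed .20 difference on its own is weak evidence for a caffeine effect.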

Assumptions: 2-Sample T-Test • Data in each group follow a normal distribution. • For pooled test, the variances for each group are equal. • The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).

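Indep t-test: formula • t = [(xbar1 – xbar2) – (μ1 – μ2)] / s(xbar1 – xbar2) • xbar1 – xbar2: actual difference observed between the sample means. • μ1 – μ2: difference expected under the null hypothesis. (For our purposes, always zero.) • s(xbar1 – xbar2): Standard Error of the Difference (between the means) • difference expected between sample means • how much we expect the sample means to differ purely by chance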
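As a rough worked example, the statistic can be computed for the coffee study from its summary numbers (means 3.2 and 3.0, SDs .9 and 1, 25 people per group). The sketch assumes the pooled (equal-variance) form of the standard error, which is the usual textbook choice for the pooled test; the lecture does not spell out the standard-error formula, and the function name `pooled_t` is made up for illustration.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic using a pooled variance estimate.
    Assumes equal population variances; the null difference is taken as zero."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t = ((mean1 - mean2) - 0.0) / se_diff
    return t, se_diff, df

# Experimental (regular coffee) group first, control (decaf) group second.
t, se_diff, df = pooled_t(3.2, 0.9, 25, 3.0, 1.0, 25)
print(f"SE of difference = {se_diff:.3f}, t({df}) = {t:.2f}")
```

With these numbers t comes out near 0.74 on 48 degrees of freedom, well short of the two-tailed .05 critical value of roughly 2.01, so the .2 difference is consistent with sampling error.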

Ind. t-test: Example

Hypothesis Testing Steps (Ind. t) 1. Comparing xbar1 and xbar2, μ and σ unknown. 2. H0: μ1 – μ2 = 0; HA: μ1 – μ2 ≠ 0 3. α = .05, df = n1 + n2 – 2 = 5 + 5 – 2 = 8, tcritical = 2.306 4. tcalculated = -1.947 5. Fail to reject H0. • The research hypothesis was not supported. • The weight of women in sororities (xbar = 111) does not differ significantly from that of other women (xbar = 127), t(8) = -1.947, n.s. (not needed if using SPSS)

Ind. t-test: Example (SPSS)

Steps of Hypothesis Testing • In an ideal research setting, we define a strategy to follow this order: • 1. Formulate hypothesis • 2. Figure out what test statistic will test this hypothesis • 3. Collect data • 4. Perform the statistical test • 5. Reject or fail to reject the null hypothesis. • 4 elements in any hypothesis test. • 1. A null hypothesis, H0 • 2. An alternative hypothesis, Ha • 3. A test statistic (how it is calculated and how it is distributed) • 4. A rejection region (you want to be in the rejection region!)

Caution on hypothesis testing • In the jury example, the social choice is what constitutes an acceptable risk, that is, what probability of Type I error to tolerate. • Generally, we refer to alpha (α) as the significance level, which is the upper limit we set on the probability of Type I error. • In most statistical situations, by convention we set α = 0.05. But in the criminal case, society may choose α = 0.001 or even α = 0.0001, accepting a higher probability of letting the guilty go free so as not to falsely convict an innocent person. • For a fixed sample size and effect, if one reduces the probability of Type I error, one by necessity increases the probability of Type II error.
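The trade-off between the two error rates can be illustrated numerically. The sketch below is a simplification, not the lecture's method: it uses a one-sided z-test (normal approximation) instead of a t-test, and the true difference (0.5) and standard error (0.25) are made-up values chosen only to show the pattern.

```python
from statistics import NormalDist

def type_ii_error(alpha, true_diff, se):
    """Approximate beta for a one-sided z-test of H0: diff = 0 vs Ha: diff > 0.
    beta = P(test statistic falls below the critical value | true_diff is real)."""
    z = NormalDist()                 # standard normal
    z_crit = z.inv_cdf(1 - alpha)    # rejection threshold on the z scale
    return z.cdf(z_crit - true_diff / se)

# Hypothetical effect and standard error, held fixed while alpha shrinks.
betas = {}
for alpha in (0.05, 0.001, 0.0001):
    betas[alpha] = type_ii_error(alpha, true_diff=0.5, se=0.25)
    print(f"alpha = {alpha:<7} beta = {betas[alpha]:.3f}")
```

Holding the effect and standard error fixed, each stricter α pushes the critical value further out and β rises with it, which is the jury's dilemma in numbers.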

What happens if samples aren’t independent? That is, they are “dependent” or “correlated”?

Ways Pairing Can Occur • When subjects in one group are “matched” with a similar subject in the second group. • When subjects serve as their own control by receiving both of two different treatments. • When, in “before and after” studies, the same subjects are measured twice.

What is the effect of alcohol on useful consciousness? • Ten male subjects were taken to a simulated altitude of 25,000 ft and given tasks to perform. • For each, the time (in seconds) at which “useful consciousness” ended was recorded. • 3 days later, the experiment was repeated one hour after the subjects ingested 0.5 cm3 of 100-proof whiskey per pound of body weight.

What is the effect of alcohol on useful consciousness? H0: D = 0 vs. Ha: D > 0
Paired T for NoAlcohol - Alcohol
             N    Mean   StDev  SE Mean
NoAlcohol   10   546.6   238.8     75.5
Alcohol     10   351.0   210.9     66.7
Difference  10   195.6   230.5     72.9
95% CI for mean difference: (30.7, 360.5)
T-Test of mean difference = 0 (vs > 0): T-Value = 2.68  P-Value = 0.013

What is the effect of time on memory recall? • 8 people were given 10 minutes to memorize a list of 20 nonsense words. • Each was asked to list as many words as he or she could remember after 1 hour and again after 24 hours.

What is the effect of time on memory recall?
Paired T for 1hour - 24hour
            N    Mean   StDev  SE Mean
1hour       8   12.75    3.69     1.31
24hour      8    9.13    3.52     1.25
Difference  8   3.625   2.066    0.730
95% CI for mean difference: (1.897, 5.353)
T-Test of mean difference = 0 (vs not > 0): T-Value = 4.96  P-Value = 0.001

Do males earn higher average starting salaries than females? (in $1,000s)
                 Males  Females
                    22       20
                    29       28
                    80       78
                    35       32
Sample Average:  $41.5    $39.5
Real question is whether males and females in the same job earn different average salaries. Better to compare the difference in salaries in “pairs” of males and females.

Paired Study Salaries (in $1,000s)
Job         Males  Females  Difference = M - F
Non-Profit     22       20                 2.0
Education      29       28                 1.0
Doctor         80       78                 2.0
Scientist      35       32                 3.0
Averages     41.5     39.5                 2.0
P-value = How likely is it that a paired sample would have a difference as large as $2,000 if the true difference were 0? The problem reduces to a one-sample t-test on the differences!

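The Paired-T Test Statistic • If: • there are n pairs • and the differences are normally distributed Then: The test statistic, which follows a t-distribution with n-1 degrees of freedom, gives us our p-value: t = dbar / (s_d / √n), where dbar is the mean of the n differences and s_d is their standard deviation.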

The Paired-T Confidence Interval • If: • there are n pairs • and the differences are normally distributed Then: The confidence interval, with t* taken from the t-distribution with n-1 d.f., estimates the actual population difference: dbar ± t* × (s_d / √n), where dbar is the mean of the n differences and s_d is their standard deviation.
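The paired statistic and interval can be worked through by hand for the four salary pairs from the paired study. This is a minimal sketch: the critical value 3.182 is the standard two-tailed .05 table value for 3 degrees of freedom, hard-coded here as an assumption so that no statistics library is needed.

```python
import math
import statistics

# Starting salaries in $1,000s, paired by job (Non-Profit, Education, Doctor, Scientist).
males = [22, 29, 80, 35]
females = [20, 28, 78, 32]

diffs = [m - f for m, f in zip(males, females)]   # one difference per pair
n = len(diffs)

mean_d = statistics.fmean(diffs)      # mean of the differences
sd_d = statistics.stdev(diffs)        # sample SD of the differences
se_d = sd_d / math.sqrt(n)            # standard error of the mean difference
t_stat = mean_d / se_d                # paired t with n - 1 degrees of freedom

T_CRIT_DF3 = 3.182                    # assumed table value: two-tailed .05, df = 3
ci = (mean_d - T_CRIT_DF3 * se_d, mean_d + T_CRIT_DF3 * se_d)

print(f"mean diff = {mean_d:.3f}, SD = {sd_d:.3f}, SE = {se_d:.3f}")
print(f"t({n - 1}) = {t_stat:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the whole calculation runs on the single column of differences, it is exactly a one-sample t-test against zero.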

Data analyzed as Paired T
Paired T for M - F
            N   Mean  StDev  SE Mean
M           4   41.5   26.2     13.1
F           4   39.5   26.1     13.1
Difference  4  2.000  0.816    0.408
95% CI for mean difference: (0.701, 3.299)
T-Test of mean difference = 0 (vs not = 0): T-Value = 4.90  P-Value = 0.016
P = 0.016. Reject null. Sufficient evidence to conclude that average starting salaries differ between males and females.

Now, Data analyzed as 2-Sample T
Two sample T for M vs F
   N  Mean  StDev  SE Mean
M  4  41.5   26.2       13
F  4  39.5   26.1       13
95% CI for µ M - µ F: (-43, 47)
T-Test µ M = µ F (vs not =): T = 0.11  P = 0.92  DF = 6
P = 0.92. Do not reject null. Insufficient evidence to conclude that average starting salaries differ between males and females.

What happened? • The p-value from the two-sample t-test is just plain wrong. (Assumptions not met: the two samples are paired by job, not independent.) • By pairing, we removed or “blocked out” the extra variability in the data due to differences in jobs, thereby focusing directly on the differences in salaries. • The paired t-test is more “powerful” because the paired design reduces the variability in the data.
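The gap between the two analyses comes down to the standard error in the denominator, which the sketch below computes both ways for the same salary data. The two-sample version here uses the unpooled standard error (each group's variance divided by its size); that choice is an assumption, though it reproduces the T = 0.11 reported in the two-sample output.

```python
import math
import statistics

# Starting salaries in $1,000s; position i in each list is the same job.
males = [22, 29, 80, 35]
females = [20, 28, 78, 32]
n = len(males)

mean_diff = statistics.fmean(males) - statistics.fmean(females)

# Two-sample view: job-to-job variation stays inside each group's variance.
se_two_sample = math.sqrt(
    statistics.variance(males) / n + statistics.variance(females) / n
)
t_two_sample = mean_diff / se_two_sample

# Paired view: differencing within each job removes the job-to-job variation.
diffs = [m - f for m, f in zip(males, females)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_paired

print(f"two-sample: SE = {se_two_sample:.2f}, t = {t_two_sample:.2f}")
print(f"paired:     SE = {se_paired:.2f}, t = {t_paired:.2f}")
```

The numerator is the same $2,000 in both cases; only the yardstick changes, from about 18.5 when job differences are left in to about 0.41 once they are differenced out.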

Example 3 • You have a very important psychological question: Is apple pie preferred over pecan pie? You give 9 people a slice of apple pie and a slice of pecan pie. Since you are such a skilled researcher you present the slices of pie in a counterbalanced order across subjects. You measure the number of grams of apple and pecan pie that each person eats.

What test? Related Samples t-test 1. Level of significance? 2. State IV, levels of IV, and DV? 3. One-tailed or two-tailed test? 4. Hypotheses? 5. Critical value? df = 8, tcrit = +1.860

Example Cont.

Steps 6 & 7 • t obs = -1.095 • t obs < t crit, therefore we fail to reject the null hypothesis. • The data do not show that apple pie is preferred over pecan pie.

Paired Samples Test
Pair 1: APPLE - PECAN
Paired Differences:
  Mean: -10.00
  Std. Deviation: 27.39
  Std. Error Mean: 9.13
  95% Confidence Interval of the Difference: Lower -31.05, Upper 11.05
t = -1.095, df = 8, Sig. (2-tailed) = .305
