11.1: Significance Tests

• The second type of statistical inference.
• Assesses the evidence provided by the data in favor of some claim about the population.
• Asks how likely an observed outcome would be if the claim being tested were true.
• A formal procedure for comparing observed data with a hypothesis whose truth we want to assess. In other words, we test a claim!
• We begin with the unrealistic assumption that we know the population standard deviation.

Ha/Ho!
STEP 1: P = Identify your Parameter
STEP 2: H = State your Hypotheses
• Make a claim and ask whether the data give evidence against it.
• What we hope to find evidence for becomes Ha, the alternative hypothesis.
• The statement of equivalency always becomes Ho, the null hypothesis: the effect is not present in the population. The test looks for evidence against Ho.
• Ho and Ha always refer to some population, so they must be written in terms of a population parameter.
• Ha can be one-sided (the parameter is greater than, or less than, the value in Ho) or two-sided (the parameter differs from the value in Ho).

Some hypothesis notes…
• Start by stating the alternative hypothesis, since this is the effect we hope to find evidence for. Then set up the null hypothesis as the statement that the hoped-for effect is not present.
• If you do not have a specific direction firmly in mind in advance, use a two-sided alternative.

Vehicle accidents can result in serious injuries to drivers and passengers. When they do, someone usually calls 911. Police, firefighters, and paramedics respond to these calls as fast as possible. A city decides to record response times to all accidents involving life-threatening emergencies, and finds the mean response time to be 6.7 minutes with a standard deviation of 2 minutes. The city manager tells them to “do better” next year. At the end of the next year, the manager selects an SRS of 400 calls and examines the response times. For this sample, the mean response time was 6.48 minutes. Do these data provide good evidence that response times have decreased since last year?

Exploring “good evidence”
• x-bar = 6.61: This result could easily occur just by chance when the population mean is 6.7. Not good evidence of a decrease in response time.
• x-bar = 6.48: This result is much farther from the population mean; an observed value this small would rarely occur by chance if the true population mean is 6.7. Good evidence of a decrease in response time.
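To put numbers on “good evidence”, the sketch below computes how often a sample mean at or below each value would occur if the population mean were still 6.7. It assumes the figures from the response-time example (standard deviation 2 minutes, SRS of 400 calls) and a Normal sampling distribution for x-bar. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 6.7    # population mean if nothing has changed (minutes)
sigma = 2.0   # population standard deviation, assumed known (minutes)
n = 400       # number of calls in the sample

# Sampling distribution of x-bar when the true mean is still 6.7
sampling_dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))

for x_bar in (6.61, 6.48):
    # Probability of a sample mean this small or smaller, by chance alone
    prob = sampling_dist.cdf(x_bar)
    print(f"P(x-bar <= {x_bar}) = {prob:.4f}")
```

Under these assumptions the first probability is about 18% and the second is about 1.4%, which is why only the second result counts as good evidence of a decrease.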

Assumptions/Test Statistics
• STEP 3: Assumptions: SRS, Normality, Independence
• STEP 4: Test Statistic
The test is based on a statistic that compares the value of the parameter (as stated in Ho) with an estimate of the parameter from the sample data. Values of the estimate far from the parameter value in the direction of Ha give evidence against Ho.
z = (estimate - hypothesized value) / (standard deviation of the estimate)
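For a test about a population mean with known population standard deviation, the estimate is x-bar and its standard deviation is the population standard deviation divided by the square root of n. The sketch below applies that to the response-time example. The function name `z_statistic` and its parameter names are made up for illustration.

```python
from math import sqrt

def z_statistic(x_bar, mu_0, sigma, n):
    """One-sample z statistic for a mean, assuming sigma is known."""
    # Standard deviation of the estimate x-bar
    std_dev_of_estimate = sigma / sqrt(n)
    # (estimate - hypothesized value) / (standard deviation of the estimate)
    return (x_bar - mu_0) / std_dev_of_estimate

# Response-time example: Ho says the mean is 6.7 minutes
z = z_statistic(x_bar=6.48, mu_0=6.7, sigma=2, n=400)
print(round(z, 2))
```

A negative z means the sample mean fell below the hypothesized value, which is the direction of Ha in the response-time example.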

We measure the strength of the evidence against Ho with a probability, the p-value, which we find from our z-score.
• P-value = the probability, computed assuming Ho is true, of a result at least as far out as what we actually got.
• A quantitative measure of just how unlikely a given finding is, assuming Ho is true.
• The lower the p-value, the stronger the evidence against Ho: the observed value is unlikely to occur by chance alone.
• Large p-values fail to give evidence against Ho.
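One way to turn a z-score into a p-value is sketched below, using the standard Normal distribution from Python's standard library. “At least as far out” depends on the direction of Ha, so the helper takes the alternative as an argument. The function name `p_value` and the labels "less", "greater", and "two-sided" are assumptions chosen for this illustration.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for a z statistic under the given alternative hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":        # Ha: parameter < hypothesized value
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: parameter > hypothesized value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: parameter != hypothesized value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Same z-score, three different alternatives
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(-2.2, alt), 4))
```

The same z-score can give strong evidence under one alternative and none under another, which is why the hypotheses are stated before looking at the data.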

The significance level (alpha) is the value to which we compare our p-value in order to determine significance.
• “Statistically significant” = not likely to happen by chance alone when Ho is true.
• If p-value < alpha, the result is statistically significant at level alpha.
• The p-value allows us to assess significance at any level we choose.
• If you are going to draw a conclusion based on statistical significance, then the level should be chosen BEFORE the data are produced.

STEP 5: Conclusion
• If the mean response time were still 6.7 minutes, there would be only about a 1.4% chance that the manager would obtain a sample of 400 calls with a mean response time of 6.48 minutes or less.
• The small p-value provides STRONG evidence AGAINST Ho and in favor of the alternative Ha, so we conclude that the mean response time appears to be less than 6.7 minutes.

Does the job satisfaction of assembly workers differ when their work is machine-paced rather than self-paced? One study chose 18 subjects at random from a group of women who worked at assembling electronic devices. Half of them were assigned at random to each of two groups. Both groups did similar assembly work, but one work setup allowed workers to pace themselves and the other featured an assembly line that moved at fixed time intervals so that the workers were paced by machine. After 2 weeks, all subjects took a test of job satisfaction. Then they switched work setups and took the test again after two more weeks. The response variable is the difference in scores, self-paced minus machine-paced. The authors of the study want to know if the two work conditions have different levels of job satisfaction. For the data from the 18 workers, take the conditions (SRS, Normality, Independence) as met, with x-bar = 17 and population standard deviation = 60.

Do all Steps: PHATC

Values as far from 0 as x-bar = 17 would happen 23% of the time when the true population mean is 0 (Ho). An outcome that would occur so often when Ho is true is not good evidence against Ho.
• Simple terms: Fail to reject Ho! The p-value is too big.
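The 23% figure can be checked with a short calculation. The sketch below assumes n = 18 (the number of workers the data come from), x-bar = 17, population standard deviation 60, and a two-sided alternative, since the study asks only whether satisfaction differs. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 17    # mean difference in scores (self-paced minus machine-paced)
mu_0 = 0      # Ho: no difference in job satisfaction
sigma = 60    # population standard deviation, assumed known
n = 18        # number of workers

z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided: count results at least this far from 0 in either direction
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p:.2f}")
```

With these inputs z is about 1.20 and the p-value is about 0.23.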

Using Significance Tests • Widely used in reporting the results of research in applied science, industry, and legal proceedings • Some products require significant evidence of effectiveness and safety • Statistical significance is valued because it points to an effect that is unlikely to occur simply by chance

Same problem…different question! Sulfur compounds cause “off-odors” in wine, so winemakers want to know the odor threshold, the lowest concentration of a compound that the human nose can detect. The odor threshold for dimethyl sulfide (DMS) in trained wine tasters is about 25 micrograms per liter of wine. The untrained noses of consumers may be less sensitive, however. Here are the DMS odor thresholds for 10 untrained students:
31 31 43 36 23 34 32 30 20 24
Assume that the standard deviation of the odor threshold for untrained noses is known to be 7. Are you convinced that the mean odor threshold for beginning students is higher than the published threshold, 25 micrograms per liter of wine? Carry out an appropriate significance test, and then state your conclusions clearly in complete English sentences.
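As a way to check the hand calculation for steps 4 and 5, the sketch below computes x-bar from the ten thresholds and then the z statistic and one-sided p-value. It assumes Ho: mean = 25 and Ha: mean > 25, with the standard deviation of 7 taken as known. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# DMS odor thresholds for the 10 untrained students
thresholds = [31, 31, 43, 36, 23, 34, 32, 30, 20, 24]

mu_0 = 25   # published threshold for trained tasters (Ho value)
sigma = 7   # standard deviation, assumed known
n = len(thresholds)

x_bar = mean(thresholds)
z = (x_bar - mu_0) / (sigma / sqrt(n))

# One-sided: Ha says the mean threshold is HIGHER than 25
p = 1 - NormalDist().cdf(z)

print(f"x-bar = {x_bar}, z = {z:.2f}, p-value = {p:.4f}")
```

The conclusion in words, and the check of the conditions, still has to be written out as part of the PHATC steps.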

At the bakery where you work, loaves of bread are supposed to weigh 1 pound. From experience, the weights of loaves produced at the bakery follow a Normal distribution with standard deviation sigma = 0.13 pounds. You believe that new personnel are producing loaves that are heavier than 1 pound. As supervisor of Quality Control, you want to test your claim at the 5% significance level (alpha = 0.05). You weigh 20 loaves and obtain a mean weight of 1.05 pounds.
1. Identify the population and parameter of interest. State your null and alternative hypotheses.
2. Identify the statistical procedure you should use. Then state and verify the conditions required for using this procedure.
3. Calculate the test statistic and the P-value. Illustrate using a graph.
4. State your conclusions clearly in complete sentences.
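The calculation in part 3 and the comparison with the significance level can be sketched as follows. This assumes Ho: mean = 1 pound, Ha: mean > 1 pound, a known standard deviation of 0.13 pounds, and alpha = 0.05 (reading the level in the problem as a 5% significance level). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 1.0     # target loaf weight in pounds (Ho value)
sigma = 0.13   # standard deviation of loaf weights, assumed known
n = 20         # loaves weighed
x_bar = 1.05   # observed mean weight in pounds
alpha = 0.05   # significance level, chosen before collecting data

z = (x_bar - mu_0) / (sigma / sqrt(n))

# One-sided: Ha says the loaves are HEAVIER than 1 pound
p = 1 - NormalDist().cdf(z)

significant = p < alpha
print(f"z = {z:.2f}, p-value = {p:.4f}, significant at {alpha}: {significant}")
```

Note that the same p-value would not be significant at alpha = 0.01, which is why the level has to be fixed in advance.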
