1. The Standard Error of the Mean Psychology 203
2. We rarely rely on single data points to tell us anything about the population, perhaps because most experimental effects are not that clear and we probably don’t know the natural variation of the population on the measure we are using, although this is not always the case (e.g. diagnostic or selection tests). When we don’t know the population values we take samples to estimate the population mean and SD
We use differences between samples to make decisions regarding the impact of our experimental manipulation
Every sample we take will be different - rarely will two samples be the same.
Yet all samples only provide estimates (information) about the population from whence they came (e.g. Means and standard deviations).
Often samples we take will be similar but sometimes they are very different even if the samples come from the same population.
Our question is “Do the samples come from the same or different populations?”
Let’s look at means generated from samples of different sizes
3. What the previous slides show There is more variability when the sample size is small
Larger differences between means are observed with small samples
Simply looking at mean differences does not give us the information we need to draw inferences about the presence or absence of an experimental effect – we need to know something about probability
We need to know what the distribution of mean differences is and therefore what the expected frequency is of those mean differences amongst the set of possible mean differences generated from repeated sampling
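The point about sample size and variability can be checked with a small simulation. This is only a sketch: the population values (mean 100, SD 15), the sample sizes (5 and 50) and the function name are made up for illustration and do not come from the slides.

```python
import random
import statistics

def spread_of_sample_means(n, reps=2000, mu=100.0, sigma=15.0, seed=1):
    """Draw `reps` random samples of size n from a normal population
    (mu and sigma are illustrative values) and return the standard
    deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(n=5)
spread_large = spread_of_sample_means(n=50)
print(f"n = 5:  SD of sample means = {spread_small:.2f}")
print(f"n = 50: SD of sample means = {spread_large:.2f}")
```

The means from the small samples should be noticeably more spread out than those from the large samples, which is why large mean differences turn up more often by chance when n is small.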
4. What would we expect to happen by chance alone? If there is no experimental effect
repeated sampling should yield roughly 50% overestimates and 50% underestimates of the true mean if the samples come from a common underlying population
If there is an experimental effect
If the samples come from different populations, the proportion of scores falling above and below the mean of the “standard” population will change over time as we take more samples.
5. What affects the ability to detect an effect? The greater the difference (or the stronger the experimental effect) the quicker the changes in the proportions falling above and below the means will be observed with repeated sampling
The size of the sample also will impact on the speed with which changes in the proportions falling above and below the mean will be observed if there is an experimental effect.
Big samples more accurately estimate the population mean therefore sample means will “bounce around” less.
Differences of equal size, but calculated from different sample sizes, have different expected frequencies
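To see why the same difference has a different expected frequency at different sample sizes, here is a rough sketch. It assumes two independent samples of equal size n drawn from the same normal population with a known SD, so that the standard error of the difference between the two means is σ√(2/n). The numbers (a 5-point difference, σ = 15) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def prob_difference_at_least(d, n, sigma):
    """Two-tailed probability of a difference between two sample means
    of at least d, when both samples (size n each) come from the same
    normal population with standard deviation sigma."""
    se_diff = sigma * math.sqrt(2 / n)  # standard error of the difference
    return 2 * (1 - NormalDist().cdf(d / se_diff))

p_small = prob_difference_at_least(d=5, n=10, sigma=15)
p_large = prob_difference_at_least(d=5, n=100, sigma=15)
print(f"n = 10 per group:  P(difference >= 5) = {p_small:.3f}")
print(f"n = 100 per group: P(difference >= 5) = {p_large:.3f}")
```

Under these assumptions a 5-point difference is quite common by chance with 10 people per group but rare with 100 per group.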
6. How does this happen? Extreme scores exert a bigger impact in small samples making the means of small samples more variable.
The consequence is that it is harder to know whether a sample difference is due to error or to the experiment when the sample size is small, as big differences are more likely to be observed
We need to know - given the range of possible sample means:
How likely is it that large differences in sample means across groups occurred by chance (sometimes referred to as sampling error)?
7. Group Sampling, Probability and Statistical Tests We don’t usually try to answer this directly but rely on statistical tests to answer the question “do these groups differ?”
But how do the tests work?
Tests tell us how likely it is that an observed set of data departs from expectation.
With tests of mean differences (e.g., t-test, ANOVA) the test examines how likely it is that an observed difference between sample means would have occurred if the data had all been drawn from the same underlying population
This is a test of the null hypothesis that the samples were taken from the same population.
8. Observed differences When does a difference make a difference?
Are all differences the same?
Two sets of means can be different by the same magnitude but they can be interpreted very differently depending on the sample sizes.
Sometimes deciding whether a difference is important needs to be assessed with respect to previous research and theory.
Before proceeding further let’s look at some sample distributions of means
9. Calculating the SEM It has been shown that the standard deviation of the sample means is:
SEM = σ / √n, where σ is the population standard deviation and n is the sample size
SEM = Standard Error of the Mean
Not to be confused with SEM = Standard Error of Measurement
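In practice σ is usually unknown, so the SEM is estimated from the sample itself as s / √n, where s is the sample standard deviation. A minimal sketch of that calculation follows; the eight scores and the function name are invented for illustration.

```python
import math
import statistics

def estimated_sem(scores):
    """Estimate the standard error of the mean as s / sqrt(n),
    using the sample SD (n - 1 denominator) in place of sigma."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

scores = [24, 27, 31, 29, 26, 30, 28, 25]  # hypothetical data
print(f"mean = {statistics.fmean(scores):.2f}")
print(f"estimated SEM = {estimated_sem(scores):.3f}")
```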
10. What is the SEM The SEM is a standard deviation
It is therefore a measure of spread (variability) around a mean
We know then that if we take a random sample there is about a 68% chance that the sample mean will lie within one SEM of the true population mean.
The sample mean plus or minus the value of the SEM gives us the approximate 68% confidence interval
The sample mean plus or minus 1.96 * SEM gives us the 95% confidence interval of where the true population mean lies
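The 68% and 95% figures can be checked by simulation: draw many samples, build the interval around each sample mean, and count how often it contains the true mean. The sketch below assumes a normal population with known σ; the population values, sample size and function name are illustrative only.

```python
import math
import random
import statistics

def coverage(z, n=25, reps=4000, mu=50.0, sigma=10.0, seed=7):
    """Proportion of intervals (sample mean +/- z * SEM) that contain
    the true mean mu, over `reps` random samples of size n."""
    rng = random.Random(seed)
    sem = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        m = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if m - z * sem <= mu <= m + z * sem:
            hits += 1
    return hits / reps

cover_1sem = coverage(z=1.0)
cover_196 = coverage(z=1.96)
print(f"mean +/- 1 SEM covers the true mean in {cover_1sem:.1%} of samples")
print(f"mean +/- 1.96 SEM covers the true mean in {cover_196:.1%} of samples")
```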
11. Standard Error of the Mean The SEM IS affected by sample size as the formula suggests.
Consequently as the sample gets larger our estimate of the true mean gets better and we can be more confident about where the true mean lies.
The practical consequence of this is that, with large samples, even small differences can appear “statistically odd”, denoting a probable experimental effect rather than sampling error
Statistical significance aside, we have to ask what does the result mean, theoretically, practically, clinically?
12. Sampling Distribution of the Mean (SEM) Summary Repeated sampling from a population produces a distribution of means which is bell shaped.
The properties of the distribution are that most samples yield means close to the true population mean but they vary.
Plotting the means from repeated samples would produce a sampling distribution of means that is “normal.”
Normal distributions have certain properties that are useful.
For one thing we can specify a criterion defining “oddness” or, if you like, difference. This is sometimes referred to as statistical significance.
13. Beginning to use tests: An Hypothetical Example The case of Sizewell B (UK)
Nuclear Power Plant in Cumberland
A number of children developed Leukaemia in the local area
An apparently large number of cancers in the local area seemed to indicate something was amiss
A court case was set up by parents to seek compensation for the illnesses and loss of life.
14. Establishing Cause and Effect Parents had to establish the link
The defense involved challenging the notion that the cluster was caused by something other than chance.
Their argument was that chance alone could account for the cluster of observed terminal illness.
Chance alone can produce apparently meaningful patterns in data.
The problem is to identify what outcomes are reasonable by chance and what goes beyond chance.
This is one of the fundamental problems for inferential statistics.
15. Hypothesis Testing Is intended to help researchers differentiate real and random patterns
In discussing hypothesis testing you will realise it is a logical process that uses
If you understand these concepts this topic is easy!
16. The Logic of Hypothesis Testing Hypothesis testing is a procedure that uses sample data to evaluate an hypothesis about a population parameter (e.g., mean, Std Deviation)
17. Steps in Hypothesis Testing I State the hypothesis – Living close to a nuclear power plant causes cancer
Before selecting the sample we use the hypothesis to help identify what the sample should be.
If we don’t know the incidence of cancer in the population we need to identify the cancer base rate or find some other indication of likelihood of developing cancer
18. Steps in Hypothesis Testing II We obtain a random sample of individuals living close to nuclear power plants (not just one)
Compare the obtained sample data with the prediction that was made from the hypothesis
The problem with this example is that there are lots of things we didn’t control that might lead to alternative explanations of the results (e.g. social class, exposure to other toxins)
19. A treatment is administered to a sample and the treated sample can be compared with the original, untreated population. In the power plant example suppose we collected data on some biological index, X, known to be a precursor to cancer. The “quasi-independent treatment variable” is “lives near plant” vs “lives far away”
21. Step 1: State the hypothesis State the hypothesis about the unknown population
The null hypothesis (H0) – Being exposed to background radiation equivalent to that found close to a power station has no effect on X (the measure of cancer precursor).
The alternative hypothesis (H1) – Being exposed to background radiation equivalent to that found close to a power station WILL have an effect on X (the measure of cancer precursor).
Note at this stage we have not said whether X will go up or down.
We need to distinguish between directional and non-directional hypotheses
22. The Decision Criterion
23. The Decision Criterion
24. The Decision Criterion
25. Definition Alpha Level – (Level of significance) is a probability value used to define very unlikely outcomes if the Null Hypothesis is true.
Critical Region – Boundaries determined by the alpha level. If sample data fall in the critical region, H0 is rejected in favour of H1
Boundaries for the critical region are defined by the Z score location. With α = .05 the boundaries separate the extreme 5% from the middle 95%
Because the extreme 5% is split across the two tails of the distribution, Z = ±1.96
For α = .01, Z = ±2.58; for α = .001, Z = ±3.30
26. The locations of the critical region boundaries for three different levels of significance: α = .05, α = .01, and α = .001.
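The critical boundaries need not be memorised; they can be read off the unit normal distribution. A short sketch using Python's standard library follows (the function name is our own). Note that for α = .001 the computed boundary is about 3.29, which is commonly rounded up to 3.30.

```python
from statistics import NormalDist

def critical_z(alpha):
    """Two-tailed critical value: alpha is split equally between the
    two tails, so we look up the (1 - alpha/2) quantile of the unit
    normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Z = +/-{critical_z(alpha):.2f}")
```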
27. Step 3: Collect data and compute sample statistics Obtain random sample and measure sample on key dependent variable
Note the data should be collected after the hypothesis has been generated (retro-fitting is not good practice)
Summarise the data (mean)
Compare the sample mean (say it’s 29) with the null hypothesis (mean µ = 26, standard deviation σ = 4)
28. Compare Mean to Null Hypothesis
29. Make a Decision
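The comparison and decision steps can be sketched as a z test. The sample mean (29), null mean (26) and standard deviation (4) are the values given above; the sample sizes are NOT given in the slides, so n = 16 and n = 4 are assumptions used purely for illustration, as is the function name.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of a sample mean against the null value mu0,
    assuming the population SD sigma is known."""
    sem = sigma / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - mu0) / sem         # distance from mu0 in SEM units
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, p < alpha                # True means "reject H0"

# n is hypothetical in both calls
z16, p16, reject16 = z_test(29, 26, 4, n=16)
z4, p4, reject4 = z_test(29, 26, 4, n=4)
print(f"n = 16: z = {z16:.2f}, p = {p16:.4f}, reject H0: {reject16}")
print(f"n = 4:  z = {z4:.2f}, p = {p4:.4f}, reject H0: {reject4}")
```

The same 3-point difference falls in the critical region with n = 16 but not with n = 4, because the SEM is twice as large in the smaller sample.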
30. Rejection vs Proof Logically it is easier to demonstrate that a universal (population) hypothesis is false than to demonstrate that it is true.
For example is the statement “all swans are white” tested if the next swan we see is white?
Or, if the next swan is black, is this a better test?
In experimental research we assume no effect then show that this assumption is (probably) incorrect
31. Uncertainty and Error in Hypothesis Testing There is always a chance that the selected sample can mislead us (however small)
A Type I error occurs when H0 is rejected when it is true.
The alpha level (α = .05) is the probability of making a Type I error.
A Type II error occurs when we fail to reject H0 when it is really false. Its probability is denoted β.
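Both error rates can be estimated by simulation. In this sketch we run the z test many times, first when H0 is true (giving the Type I error rate) and then when the true mean has shifted (the misses give the Type II error rate). The null values µ = 26 and σ = 4 follow the running example; the sample size of 16, the shifted mean of 28 and the function name are hypothetical.

```python
import math
import random
import statistics

def rejection_rate(true_mu, mu0=26.0, sigma=4.0, n=16,
                   z_crit=1.96, reps=5000, seed=11):
    """Proportion of simulated samples (drawn from a population with
    mean true_mu) whose z score falls in the two-tailed critical region."""
    rng = random.Random(seed)
    sem = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        m = statistics.fmean([rng.gauss(true_mu, sigma) for _ in range(n)])
        if abs((m - mu0) / sem) > z_crit:
            rejections += 1
    return rejections / reps

type_1_rate = rejection_rate(true_mu=26.0)      # H0 true: rejections are errors
type_2_rate = 1 - rejection_rate(true_mu=28.0)  # H0 false: misses are errors
print(f"Type I error rate  = {type_1_rate:.3f}")
print(f"Type II error rate = {type_2_rate:.3f}")
```

The Type I rate should sit close to the chosen alpha, whereas the Type II rate depends on how big the true effect is and on the sample size.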
32. Assumptions of the z test Random Sampling – important with respect to equality and representativeness.
Independent Observations in the sample – if this is violated the standard error is underestimated and the Type I error rate may be a lot larger than you think.
σ is unchanged by the treatment. We assume the treatment adds a constant to every score. It is perfectly possible, however, that there is a person-by-treatment interaction.
Normal Sampling Distribution – to evaluate hypotheses with z scores we use a unit normal table to identify the critical region.
35. A sample is selected, then the sample mean is computed and placed in a frequency distribution. This process is repeated over and over until all the possible random samples are obtained and the complete set of sample means is in the distribution.
36. An example of a typical distribution of sample means. Each of the small boxes represents the mean obtained for one sample.
42. The distribution of sample means for random samples of size (a) n = 1, (b) n = 4, and (c) n = 100 obtained from a normal population with µ = 80 and σ = 20. Notice that the size of the standard error decreases as the sample size increases.
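The standard errors for the three panels follow directly from SEM = σ / √n with σ = 20. A quick check (the function name is our own):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for a known population SD."""
    return sigma / math.sqrt(n)

for n in (1, 4, 100):
    print(f"n = {n:>3}: standard error = {standard_error(20.0, n)}")
# n =   1: standard error = 20.0
# n =   4: standard error = 10.0
# n = 100: standard error = 2.0
```

Quadrupling the sample size only halves the standard error, because n enters the formula through its square root.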
46. 95% Confidence Interval
47. 90% Confidence Interval
48. 80% Confidence Interval
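The three confidence levels differ only in the Z multiplier applied to the SEM. The sketch below computes all three intervals for the running example (sample mean 29, σ = 4); the sample size of 16 is an assumption, since the slides do not state one, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level):
    """Interval sample_mean +/- z * SEM, where z cuts off the middle
    `level` proportion of the unit normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# n = 16 is a hypothetical sample size
intervals = {
    level: confidence_interval(29.0, 4.0, 16, level)
    for level in (0.95, 0.90, 0.80)
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} confidence interval: {low:.2f} to {high:.2f}")
```

Lower confidence levels give narrower intervals: we gain precision but accept a greater risk that the interval misses the true mean.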
52. Using probability to evaluate a treatment effect.