Psychology 10 Analysis of Psychological Data February 26, 2014
The Plan for Today • The law of large numbers. • The frequentist approach to probability. • Areas under the normal curve. • Rules for combining probabilities. • The binomial distribution. • Introducing the idea of the sampling distribution.
Probability distributions • A probability distribution is the set of values that a random variable could take on, if we were to observe it… • …together with the long-run relative frequencies or probabilities of those values.
The law of large numbers • A phenomenon known as “the law of large numbers” states that if a random process is repeated a large number of times, the proportion of times that a particular event occurs will approach the probability of that event. • If we were to toss our unfair coin a very large number of times, the proportion of “heads” outcomes would approach .7.
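The law of large numbers can be watched in action with a small simulation. The sketch below is only an illustration: the function name `running_proportion`, the seed, and the number of tosses are assumptions made for this example, and the only value taken from the slide is the unfair coin's P(heads) = .7.

```python
import random


def running_proportion(p_heads, n_tosses, seed=0):
    """Simulate coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        if rng.random() < p_heads:  # this toss comes up heads
            heads += 1
        proportions.append(heads / toss)
    return proportions


props = running_proportion(0.7, 100_000)

# The proportion wanders early on and settles near .7 as tosses accumulate.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
```

The early proportions can be well away from .7; it is only the long-run proportion that the law describes.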
Frequentist approach to probability • That idea leads to the frequentist approach to probability. • Frequentists define probability as long-run relative frequency. • Note that the frequentist definition and the analytical approach to probability converge. • For example, if you were to draw a card with replacement a very large number of times, in the long run the proportion of times you draw the Jack of Hearts will approach 1/52.
Another type of random variable • Imagine that we are going to observe the heights of adult white males. • I can imagine that there would be a lot of them around 69.5 inches tall. • As I think about values further from 69.5 inches, I expect that they would occur less frequently.
Continuous random variables • I have just drawn a probability distribution. • But how many possible values are there? Infinitely many. • If I gave each of those values a positive probability, the probabilities wouldn’t sum to 1.0. • Hence, the probability of a continuous random variable taking on any particular value must be zero.
Continuous random variables (cont.) • We can still draw pictures of the relative likelihoods of values. • But events must be defined in terms of ranges of those values. • For example, we can look at this picture and estimate that the probability of observing an adult white male who is taller than 69.5 inches is about ½.
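A short numerical sketch of the same two ideas: a range of values has a positive probability, while the probability of a range shrinks toward zero as the range narrows to a single value. The mean of 69.5 inches comes from the slides; treating height as normal and the sd of 2.8 inches are assumptions made for this example, as is the helper name `interval_probability`.

```python
from statistics import NormalDist

# Assumed model: normal heights, mean 69.5 (from the slide), sd 2.8 (assumed).
heights = NormalDist(mu=69.5, sigma=2.8)


def interval_probability(dist, low, high):
    """Probability that a draw from dist falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)


# An event defined as a range: taller than 69.5 inches.
p_taller = 1 - heights.cdf(69.5)
print("P(height > 69.5):", p_taller)

# Ranges centered on 69.5 that get narrower and narrower.
widths = (1.0, 0.1, 0.01, 0.001)
narrowing = [interval_probability(heights, 69.5 - w, 69.5 + w) for w in widths]
for w, p in zip(widths, narrowing):
    print("within", w, "inch of 69.5:", p)
```

Because 69.5 is the mean of a symmetric curve, the first probability is one half whatever sd is assumed.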
Another continuous random variable • The uniform distribution. • The relative likelihood of the possible values is a rectangle. • The heights of curves like these (for continuous random variables) are not probabilities; probabilities are areas under the curve. • The technical term for such a curve is probability density function (or pdf for short).
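One way to see that the height of a pdf is not a probability is a uniform distribution on a short interval, where the height is greater than 1. The interval from 0 to 0.5 and the function names below are assumptions chosen for this sketch.

```python
def uniform_pdf(x, low, high):
    """Height of the uniform density at x (a density, not a probability)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def uniform_prob(a, b, low, high):
    """P(a < X < b) for X uniform on [low, high]: the area of a rectangle."""
    left, right = max(a, low), min(b, high)
    return max(right - left, 0.0) / (high - low)


density_height = uniform_pdf(0.25, 0.0, 0.5)       # height is 2, which cannot be a probability
p_range = uniform_prob(0.1, 0.2, 0.0, 0.5)         # area over a range of values
p_everything = uniform_prob(-1.0, 2.0, 0.0, 0.5)   # the total area is 1

print(density_height, p_range, p_everything)
```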
The normal probability density function • How to draw a normal curve. • How to find probabilities associated with particular ranges of values under the normal curve. • Using the unit normal table.
Normal probabilities • Here are a few probabilities for the normal distribution that come up frequently. You should commit those in bold to memory: • The probability that a random draw will be within 1 sd of the mean is about .68. • The probability that a random draw will be more than 1.645 sd above the mean is about .05. • The probability that a random draw will be more than 1.96 sd above the mean is about .025. • The probability that a random draw will be more than 2.576 sd above the mean is about .005.
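These values can be checked without a printed table. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for the unit normal table; the variable names are assumptions made for this example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

# Area within 1 sd of the mean (about .68).
within_1sd = z.cdf(1) - z.cdf(-1)
print("within 1 sd:", round(within_1sd, 4))

# Upper-tail areas beyond the three cutoffs from the slide.
upper_tail = {cutoff: 1 - z.cdf(cutoff) for cutoff in (1.645, 1.96, 2.576)}
for cutoff, area in upper_tail.items():
    print("above", cutoff, ":", round(area, 4))
```

Because the curve is symmetric, doubling each upper-tail area gives the probability of a draw that far from the mean in either direction.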
Some terms related to probability • An event is a well-defined outcome of a random process. • Examples: coin = heads, exam score > 79. • Two events are mutually exclusive if they cannot both occur. • A coin cannot be both H and T on one toss. • Two events are independent if knowing something about one event tells you nothing about the other.
Rules for combining probabilities • The addition rule: • If two events (A and B) are mutually exclusive, then P(A or B) = P(A) + P(B). • Example: What is the probability that a single roll of a fair die will be 1 or 2? • Note that a single roll cannot be both 1 and 2, so the events are mutually exclusive. • P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 1/3.
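The die example can be verified by listing the outcomes. This is a minimal sketch using exact fractions; the dictionary `die` and the variable names are assumptions made for this example.

```python
from fractions import Fraction

# A fair die: six faces, each with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Addition rule for the mutually exclusive events "roll a 1" and "roll a 2".
p_by_rule = die[1] + die[2]

# Direct check: add up the probability of every face that counts as "1 or 2".
p_by_counting = sum(p for face, p in die.items() if face in (1, 2))

print(p_by_rule, p_by_counting)  # both print 1/3
```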
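The addition rule (cont.) • What is the probability that a randomly observed adult male will be over 6 feet tall or less than 5 feet 6 inches tall? • The z-scores below treat height as normal with mean 69.2 inches and sd 2.8 inches. • Over 6 feet (72 inches): (72 – 69.2) / 2.8 = 1.0. The area above 1.0 in a standard normal curve is .1587. • Under 5 feet 6 inches (66 inches): (66 – 69.2) / 2.8 = -1.14. The area below -1.14 is about .1271. • One man cannot be both, so the events are mutually exclusive and the areas add: .1587 + .1271 = .2858.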
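The same calculation can be sketched in code. The mean of 69.2 and sd of 2.8 are the values that appear in the slide's z-scores; using `statistics.NormalDist` in place of the printed table is an assumption of this example. The result is close to .2858 but not identical, because the table lookup rounds z to -1.14 while the code does not round.

```python
from statistics import NormalDist

# Height model implied by the z-scores on the slide: mean 69.2, sd 2.8 inches.
heights = NormalDist(mu=69.2, sigma=2.8)

p_over_6ft = 1 - heights.cdf(72)   # taller than 72 inches
p_under_5ft6 = heights.cdf(66)     # shorter than 66 inches

# The two events are mutually exclusive, so the addition rule applies.
p_either = p_over_6ft + p_under_5ft6

print(round(p_over_6ft, 4), round(p_under_5ft6, 4), round(p_either, 4))
```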
The multiplication rule • The probability of two independent events A and B both occurring is P(A) × P(B). • Events are independent if knowing about the outcome of one tells us nothing at all about the outcome of the other.
Independent events • Are these independent? • Ethnicity and eye color? • No • Age and annual income? • No • One coin toss and a second coin toss? • Yes • One randomly chosen IQ score and a second randomly chosen IQ score? • Yes
The multiplication rule (cont.) • So I cannot (without more tools) answer a question like “What is the probability that a randomly observed person will be Caucasian and blue eyed?” • But I can answer questions like “What is the probability that two randomly observed IQs are both > 120?” (If I know the distribution of IQ.)
The multiplication rule (cont.) • Many IQ tests are designed to have mean = 100, sd = 15. • P(one IQ is > 120)? • (120 – 100) / 15 = 1.33. • Area > 1.33 = .0918. • So the probability of two independent scores both being above 120 is .0918 × .0918 = .0084.
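The IQ calculation can be sketched the same way. The mean of 100 and sd of 15 come from the slide; `statistics.NormalDist` stands in for the table and the variable names are assumptions. The single-score probability comes out slightly below .0918 because the code does not round z to 1.33.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Probability that one randomly observed IQ exceeds 120.
p_one = 1 - iq.cdf(120)

# Multiplication rule: two independent scores both above 120.
p_both = p_one * p_one

print(round(p_one, 4), round(p_both, 4))
```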
The multiplication rule (cont.) • BUT: What is the probability that the husband and wife in a randomly observed married couple will both have IQs above 120? • We cannot say, because the two events are not independent.
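A simulation can show why the multiplication rule fails here. Everything about the dependence in this sketch is hypothetical: the correlation `RHO = 0.5` between spouses' IQs is a made-up value chosen only for illustration, not an estimate from data, and the function name `simulate_couples` is likewise an assumption. The point is only that when scores are related, the proportion of couples with both scores above 120 no longer matches P(A) × P(B).

```python
import math
import random
from statistics import NormalDist

MEAN, SD, CUTOFF = 100, 15, 120
RHO = 0.5  # HYPOTHETICAL correlation between spouses, for illustration only


def simulate_couples(n_couples, rho, seed=0):
    """Proportion of simulated couples in which both IQs exceed CUTOFF."""
    rng = random.Random(seed)
    both = 0
    for _ in range(n_couples):
        z1 = rng.gauss(0, 1)
        # z2 is standard normal and has correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        if MEAN + SD * z1 > CUTOFF and MEAN + SD * z2 > CUTOFF:
            both += 1
    return both / n_couples


p_single = 1 - NormalDist(MEAN, SD).cdf(CUTOFF)
p_if_independent = p_single ** 2                  # what the multiplication rule gives
p_uncorrelated = simulate_couples(100_000, 0.0)   # simulated independent pairs
p_correlated = simulate_couples(100_000, RHO)     # simulated dependent pairs

print(p_if_independent, p_uncorrelated, p_correlated)
```

With independent pairs the simulated proportion sits near the product; with the hypothetical correlation it is clearly larger, so the product would understate the probability.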
Why do we care about probability? • Probability is concerned with making statements about what will happen in the world, given that certain things are true. • Inferential statistics is concerned with making statements about what is true in the world, given what has happened. • Those are opposite interests. • Nevertheless, probability is a crucial tool for inferential statistics.
Introducing the concept of the sampling distribution • (Coin tossing exercise.)
In-class exercises • Three probability problems: • What is the probability that a single draw from a standard normal distribution will be greater than 0.24? • What is the probability that a single draw from a normal distribution with a mean of 60 and a standard deviation of 8 will be between 59 and 62? • What is the probability that both of those events will occur in two independent draws?