Outline • Basic Idea • Different types of probability • Definitions and Rules • Conditional and Joint probabilities • Essentials of understanding stats • Discrete and Continuous probability distributions • Density • Permutations • A visit to the Binomial distribution • The Bayesian approach
The Problem with Probabilities • Can be very hard to grasp • e.g. Monty Hall problem • TV show “Let’s make a deal” • 3 closed doors, behind 1 is a prize (others have “goats”) • Select a door • Monty Hall opens one of the remaining doors that does NOT contain a prize • Now allowed to keep your original door or switch to the other one • Does it make a difference if you switch? • http://www.stat.sc.edu/~west/javahtml/LetsMakeaDeal.html
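One way to check the Monty Hall answer is to enumerate every equally likely combination of prize door and first pick. The sketch below assumes the rules as stated on the slide (the host always opens a goat door that is not your pick); the function name is illustrative only.

```python
from fractions import Fraction
from itertools import product


def monty_hall_win_probability(switch: bool) -> Fraction:
    """Exact win probability over all equally likely (prize, pick) pairs."""
    doors = (0, 1, 2)
    wins = 0
    total = 0
    for prize, pick in product(doors, repeat=2):
        # The host opens a goat door that is not the contestant's pick.
        # When two such doors exist, which one he opens does not change the result.
        opened = next(d for d in doors if d != pick and d != prize)
        if switch:
            final = next(d for d in doors if d != pick and d != opened)
        else:
            final = pick
        wins += final == prize
        total += 1
    return Fraction(wins, total)


stay = monty_hall_win_probability(switch=False)
swap = monty_hall_win_probability(switch=True)
print(stay, swap)  # 1/3 2/3
```

Switching wins whenever the first pick was a goat, which happens 2 times out of 3.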
Properties of probabilities • 0 ≤ p(A) ≤ 1 • 0 = never happens • 1 = always happens • A priori definition • p(A) = (number of events classifiable as A) / (total number of classifiable events) • A posteriori definition • p(A) = (number of times A occurred) / (total number of occurrences)
Properties of probabilities • So: • p(A)= nA/N = number of events belonging to subset A out of the total possible (which includes A). • If 6 movies are playing at the theater and 5 are crappy but 1 is not so crappy what is the probability that I will be disappointed? • 5/6 or p = .8333
Probability in Perspective • Analytic view • The common approach: if there are 4 bad movies and one good one I have an 80% chance in selecting a bad one • Fisher • Relative Frequency view • Refers to the long run of events: the probability is the limit of chance i.e. in a hypothetical infinite number of movie weekends I will select a bad movie about 80% of the time • Neyman-Pearson • Subjective view • Probability is akin to a statement of belief and subjective e.g. I always seem to pick a good one • Bayesian
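The relative frequency view can be illustrated with a small simulation. This is only a sketch: it assumes each weekend is an independent pick with an 80% chance of a bad movie, and the function name and fixed seed are illustrative choices.

```python
import random


def bad_movie_frequency(n_weekends: int, p_bad: float = 0.8, seed: int = 0) -> float:
    """Proportion of simulated weekends on which a bad movie is picked."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bad = sum(rng.random() < p_bad for _ in range(n_weekends))
    return bad / n_weekends


# The observed proportion should settle near 0.8 as the number of weekends grows.
for n in (10, 1_000, 100_000):
    print(n, bad_movie_frequency(n))
```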
Some definitions • Mutually exclusive • both events cannot occur simultaneously • A and !A together = impossible • Exhaustive sets • set includes all possible events • the sum of probabilities of all the events in the set = 1
Some definitions • Equal likelihood: roll a fair die; each time the likelihood of 1-6 is the same; whichever one we get, we could just as easily have gotten another • Counterexample: put the numbers 1-7 in a hat. What’s the probability of even vs. odd? • Independent events: • occurrence of one event has no effect on the probability of occurrence of the other
Laws of probability: Addition • The question of Or • If A & B are mutually exclusive • p(A or B) = p(A) + p(B) • Probability of getting a grape or lemon skittle in a bag of 60 pieces where there are 15 strawberry, 13 grape, 12 orange, 8 lemon, 12 lime? • p(G) = 13/60 p(L) = 8/60 • 13/60 + 8/60 = 21/60 = .35 or a 35% chance we’ll get one of those two flavors when we open the bag and pick one out
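The Skittles arithmetic can be reproduced with exact fractions. A minimal sketch using the counts from the slide; the variable names are illustrative.

```python
from fractions import Fraction

# Flavor counts from the slide; they sum to 60 pieces.
bag = {"strawberry": 15, "grape": 13, "orange": 12, "lemon": 8, "lime": 12}
total = sum(bag.values())

p_grape = Fraction(bag["grape"], total)
p_lemon = Fraction(bag["lemon"], total)

# A single piece cannot be both grape and lemon, so the probabilities add.
p_grape_or_lemon = p_grape + p_lemon
print(p_grape_or_lemon, float(p_grape_or_lemon))  # 7/20 0.35
```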
Laws of probability: Multiplication • The question of And • If A & B are independent • p(A and B) = p(A)p(B) • p(A and B and C) = p(A)p(B)p(C) • Probability of drawing a grape and then a lemon in two draws from the bag (putting the grape back after the first draw) • p(Grape)*p(Lemon) = 13/60*8/60 = ~.0289
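The same bag gives a quick check of the multiplication law. A minimal sketch, assuming the grape is returned to the bag so the two draws are independent:

```python
from fractions import Fraction

p_grape = Fraction(13, 60)
p_lemon = Fraction(8, 60)

# Independent draws (with replacement): multiply the two probabilities.
p_grape_then_lemon = p_grape * p_lemon
print(p_grape_then_lemon, round(float(p_grape_then_lemon), 4))  # 13/450 0.0289
```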
Conditional Probabilities and Joint Events • Conditional probability • One where you are looking for the probability of some event with some sort of information in hand • e.g. the probability of having a boy given that you had a girl already. • Joint probability • Probability of the co-occurrence of events • E.g. the probability that you have a boy and a girl for children, i.e. a combination of events • In this case the conditional would be higher because if we knew there was already a girl that means they’re of child-rearing age, able to have kids, possibly interested in having more etc.
Conditional probabilities • If events are not independent then: • p(X|Y) = probability that X happens given that Y happens • The probability of X “conditional on” Y • p(A and B) = p(A)*p(B|A) • Stress and sleep relationship conditioned on gender • Little relation for females, negative relation for males • The observed p-value at the heart of hypothesis testing is a conditional probability • p(Data|H0)
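To see p(A and B) = p(A)*p(B|A) at work, the Skittles example can be rerun without putting the grape back. This variation is an assumption added for illustration; it is not worked on the slides.

```python
from fractions import Fraction

# Assumed setup: the 60-piece bag from the earlier example, drawing WITHOUT replacement.
p_grape_first = Fraction(13, 60)

# Once a grape is removed, 59 pieces remain and all 8 lemons are still in the bag.
p_lemon_given_grape = Fraction(8, 59)

p_grape_and_lemon = p_grape_first * p_lemon_given_grape
print(p_grape_and_lemon, round(float(p_grape_and_lemon), 4))  # 26/885 0.0294
```

The result is slightly larger than the with-replacement value because the second draw is made from a smaller bag.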
Joint probability • When dealing with independent events, we can just use the multiplicative law. • Joint probabilities are of particular interest in classification problems and understanding multivariate relationships • E.g. Bivariate and multivariate normal distributions
Simpson’s paradox • Success rates of a particular therapy • What’s wrong with this picture? • Is the treatment a success?
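The paradox is easiest to see with numbers. The counts below are made up purely for illustration (they are not the data from the slide's figure): the therapy does better within each severity group yet worse overall, because most treated patients are severe cases.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented for illustration: (successes, patients).
treated = {"mild": (18, 20), "severe": (48, 80)}
control = {"mild": (68, 80), "severe": (10, 20)}


def rate(successes: int, patients: int) -> Fraction:
    return Fraction(successes, patients)


def overall(groups: dict) -> Fraction:
    """Success rate after pooling all severity groups together."""
    return Fraction(sum(s for s, _ in groups.values()),
                    sum(n for _, n in groups.values()))


for severity in ("mild", "severe"):
    print(severity, float(rate(*treated[severity])), float(rate(*control[severity])))
print("overall", float(overall(treated)), float(overall(control)))
```

With these counts the treated group wins in both rows but loses in the pooled comparison, which is why the grouping variable has to be considered before judging the treatment.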
Discrete probability distribution • Involves the distribution for a variable that takes on only a few values • Common example would be the Likert scale
Continuous probability distribution • We often deal with continuous probability distributions in inference, the most famous of which is the normal distribution • The height of the curve is known as the density • We expect values near the ‘hump’ to be more common
Permutations • Counting is a key part of understanding probability (e.g. we can’t tell how often something occurs if we don’t know how many events occur in general). • Some complexity arises when we consider whether we track the order and whether events are able to be placed back for future selection. • How many ways can a set of N units be ordered? • Factorial: N! • Permutations of size k taken from N objects: N!/(N − k)! • Ordered, without replacement • There are 5 songs on your top list, you want to hear any two of them in sequence. How many ordered pairs of songs can you create? In this case ab != ba, i.e. each ordering counts • 5!/(5 − 2)! = 5 × 4 = 20
Permutations • Combinations: finding the number of combinations of k objects you can choose from a set of n objects • Unordered, without replacement • In this case, any pair considered will not be considered again • i.e. ab = ba • From our previous example, there are now only 10 unique pairs to be considered • The combination described above will come back into play as we discuss the binomial
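The song counts can be checked with Python's standard library. A minimal sketch, assuming Python 3.8+ for `math.perm` and `math.comb`; the song labels are placeholders.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

songs = "abcde"  # placeholder labels for the 5 songs

print(factorial(5))  # 120 ways to order all 5 songs
print(perm(5, 2))    # 20 ordered pairs (ab != ba)
print(comb(5, 2))    # 10 unordered pairs (ab = ba)

# Listing the pairs explicitly gives the same counts.
ordered_pairs = list(permutations(songs, 2))
unordered_pairs = list(combinations(songs, 2))
print(len(ordered_pairs), len(unordered_pairs))  # 20 10
```

Each unordered pair corresponds to 2! = 2 ordered pairs, which is why 20 shrinks to 10.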
The Binomial • Bernoulli trials = 2 mutually exclusive outcomes • Distribution of outcomes • Order of items does not matter • Only the probability of various outcomes in terms of e.g. numbers of heads and tails • N = # trials = 3
Coin toss • How many possible outcomes of the 3 coin tosses are there? • List them out: HHH HHT HTT TTT TTH THH THT HTH • Now condense them ignoring order • e.g. HTT = THT = flips result in only 1 heads • What is the probability of 0 heads, 1 heads, 2 heads, 3 heads?
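The listing and condensing steps can be done mechanically. A short sketch, assuming a fair coin so that all sequences are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered sequence of 3 tosses.
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]
print(len(outcomes))  # 8

# Condense by ignoring order: keep only the number of heads.
heads_counts = Counter(o.count("H") for o in outcomes)
probabilities = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(heads_counts.items())
}
for heads, prob in probabilities.items():
    print(heads, prob)  # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```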
Distribution of outcomes • Now how about 10 coin flips? • That’d be a lot of work writing out all the possibilities. • What’s another way to find the probability of coin flips? • Use the formula for combinations
Binomial distribution • Find a probability for an event using: • p(r) = C(N, r) p^r q^(N−r) • N = number of trials • r = number of ‘successes’ • p = probability of ‘success’ on any trial • q = 1-p (probability of ‘failure’) • C(N, r) = the number of combinations of N things taken r at a time
So if I want to know the probability of getting 9 heads out of 10 coin flips, in any order (e.g. H,H,H,H,H,H,H,H,H,T): • p(9) = C(10, 9)(.5^9)(.5^1) • 10(.001953)(.5) = .0098 ≈ .01
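The calculation above can be wrapped in a small function. This is a sketch rather than a library routine; the name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)


p_nine_heads = binomial_pmf(9, 10, 0.5)
print(round(p_nine_heads, 4))  # 0.0098
```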
Using these probabilities • What is the probability of getting 4 or fewer heads in 10 coin tosses? • Addition • p(4 or less) = p(4) + p(3) + p(2) + p(1) + p(0) = • .205 + .117 + .044 + .010 + .001 = • p = .377 • About 38% chance of getting 4 or fewer heads on 10 flips
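The sum of the five terms can be verified directly. A minimal sketch that redefines the illustrative `binomial_pmf` helper so it runs on its own:

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Outcomes 0..4 heads are mutually exclusive, so their probabilities add.
terms = [binomial_pmf(r, 10, 0.5) for r in range(5)]
p_four_or_fewer = sum(terms)
print([round(t, 3) for t in reversed(terms)])  # [0.205, 0.117, 0.044, 0.01, 0.001]
print(round(p_four_or_fewer, 3))  # 0.377
```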
Test a Hypothesis • Now take it out a step. • Suppose you were giving some sort of treatment to depressed individuals and assumed the treatment could work or not work, and in general would have a 50/50 chance of doing so if it wasn’t anything special (i.e. just a placebo). Then it worked 9 times out of 10 administrations. • Would you think there was something special going on or that it was just a chance occurrence based on what was expected? • p = p(9) + p(10) = .011
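The tail probability for the treatment example follows the same pattern. A sketch assuming the 50/50 placebo model described above; the helper is illustrative and redefined here so the block is self-contained.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Probability of a result at least as extreme as 9 successes out of 10
# if the treatment were no better than a coin flip.
p_nine_or_more = binomial_pmf(9, 10, 0.5) + binomial_pmf(10, 10, 0.5)
print(round(p_nine_or_more, 3))  # 0.011
```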
Not just 50/50 • Not every two-outcome situation has equal probabilities associated with each option • There are two parameters we are concerned with when considering a binomial distribution • 1. p = the probability of a success. (q is 1-p) • 2. N = the number of (Bernoulli) trials • More info about binomial distribution • μ = Np • σ² = Npq • In R • Rcmdr (Distribution menu) • ?pbinom (command line) • Approximately “normal” curve when: • p is close to 0.5 • If not then “skewed” distribution • N large • If not then not as representative a distribution
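The mean and variance formulas can be checked against the distribution itself. A sketch using p = .8 and N = 10; the helper name is illustrative.

```python
from math import comb, isclose


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 10, 0.8
q = 1 - p

# Mean and variance computed from the full distribution.
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
variance = sum((r - mean) ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))

print(round(mean, 4), round(variance, 4))  # 8.0 1.6
print(isclose(mean, n * p), isclose(variance, n * p * q))  # True True
```

With p this far from 0.5 and N this small, most of the probability piles up near 8 to 10 successes, so the distribution is skewed rather than approximately normal.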
Examples • Small N p = .8 N = 10
Bayesian Probability • Thomas Bayes (c. 1702 –1761) • The Bayesian approach involves weighing the probability of an event by prior experience/knowledge, and as such fits in well with accumulation of knowledge that is science. • As new evidence presents itself, we will revise our previous assessment of the likelihood of some event • Prior probability • Initial assessment • Posterior probability • Revised estimate
Bayesian Probability • With regard to hypothesis testing: • p(H0) = probability of the null hypothesis • p(D|H0) = the observed p-value we’re used to seeing, i.e. the probability of the data given the null hypothesis • p(H1) = probability of an alternative • p(D|H1) = probability of the data given the alternative hypothesis
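Bayes' theorem combines these four quantities into a posterior: p(H0|D) = p(D|H0)p(H0) / [p(D|H0)p(H0) + p(D|H1)p(H1)]. The sketch below assumes H0 and H1 are the only two hypotheses under consideration. The specific numbers are hypothetical choices for illustration: equal priors, H0 as a success probability of .5, H1 as a success probability of .8, and data of exactly 9 successes in 10 trials.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def posterior_h0(prior_h0: float, like_h0: float, like_h1: float) -> float:
    """p(H0|D) when H0 and H1 are the only two hypotheses considered."""
    prior_h1 = 1 - prior_h0
    evidence = like_h0 * prior_h0 + like_h1 * prior_h1
    return like_h0 * prior_h0 / evidence


# Hypothetical inputs (assumptions, not values from the slides).
like_h0 = binomial_pmf(9, 10, 0.5)  # p(D|H0)
like_h1 = binomial_pmf(9, 10, 0.8)  # p(D|H1)
post = posterior_h0(prior_h0=0.5, like_h0=like_h0, like_h1=like_h1)
print(post)
```

Changing `prior_h0` shows how the revised estimate depends on the initial assessment as well as on the data.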
Empirical Bayes method in statistics • Bayesian statistics is becoming more common in a variety of disciplines • Advantages: all the probabilities regarding hypothesis testing make sense; interval estimates etc. mean what we think they mean, which is not the case in null hypothesis testing • Disadvantage: if the priors are not well thought out, could lead to erroneous conclusions • Why don’t we see more of it? • You actually have to think of not only ‘non-nil’ hypotheses but perhaps several viable competing hypotheses, and this entails: • Actually knowing prior research very well • Not being lazy with regard to the ‘null’, which now becomes any other hypothesis • We will return with examples regarding proportions and means later in the semester.
Summary • While it seems second nature to assess probabilities, it’s actually not an easy process in the scientific realm • Knowing exactly what our probability regards and what it does not is the basis for inferring from a sample to the population • Not knowing what the probability entails results in much of the misinformed approach you see in statistics in the behavioral sciences