
Probability & Statistics Review


Presentation Transcript


  1. Probability & Statistics Review Slide derived from class material by Prof. Carlos Guestrin, Carnegie Mellon University

  2. The Big Picture Probability Model Data Estimation/learning But how to specify a model?

  3. Graphical Models • How to specify the model? • What are the variables of interest? • What are their ranges? • How likely their combinations are? • You need to specify a joint probability distribution • But in a compact way • Exploit local structure in the domain • Today: we will cover some concepts that formalize the above statements

  4. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Independence • Examples • Moments

  5. Sample Space and Events • S: sample space, sometimes denoted Ω (omega) • The space that includes all possible outcomes; thus P(S) = 1 • Example: if you toss a coin twice, S = {HH, HT, TH, TT} • Null event ∅: the empty set, which contains no outcomes; thus P(∅) = 0 • Event: a subset of S • "First toss is head" = {HH, HT}

  6. Probability Measure • Defined over S s.t. • 0 ≤ P(A) ≤ 1 for every event A in S • P(S) = 1 • If A and B are disjoint (mutually exclusive), then • P(A ∪ B) = P(A) + P(B), since A ∩ B = ∅ and P(A ∩ B) = 0 • If A and B are not disjoint: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
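
A minimal sketch of these axioms in code, using the two-coin-toss sample space from the previous slide; the events A and B below are illustrative assumptions:

```python
# Probability measure over the two-coin-toss sample space, with events
# represented as Python sets of outcomes.
from fractions import Fraction

S = {"HH", "HT", "TH", "TT"}           # sample space, all outcomes equally likely

def P(event):
    """Probability of an event (a subset of S) under the uniform measure."""
    return Fraction(len(event), len(S))

A = {"HH", "HT"}                       # first toss is head
B = {"HH", "TH"}                       # second toss is head

assert P(S) == 1                       # P(S) = 1
assert P(set()) == 0                   # P(empty event) = 0
# Inclusion-exclusion for non-disjoint events:
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))                        # 3/4
```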

  7. Visualization • [Figure: Venn diagram of two overlapping events A and B within S] • Using this picture, we can go on and define conditional probability

  8. Conditional Probability • P(A|B) = probability of event A given event B • We are interested in the probability of event A when we know that event B has already occurred • P(A|B) = P(A ∩ B) / P(B) • In the Venn diagram, P(A|B) is the fraction of B's area covered by A ∩ B

  9. Example: • The probability that it is Friday and that a student is absent is 0.03. Since there are 5 school days in a week, the probability that it is Friday is 0.2. What is the probability that a student is absent, given that today is Friday? • Solution: P(Absent | Friday) = P(Friday ∩ Absent) / P(Friday) = 0.03 / 0.2 = 0.15
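
The slide's computation as a quick script (the variable names are just illustrative):

```python
# Conditional probability as a ratio of joint to marginal probability.
p_friday_and_absent = 0.03    # P(Friday and Absent), given on the slide
p_friday = 0.2                # P(Friday): 1 of 5 school days

p_absent_given_friday = p_friday_and_absent / p_friday
print(p_absent_given_friday)  # 0.15
```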

  10. Other Examples • Probability that the USD will be weak given that the gold price increases. • Probability that there will be a tsunami given that there is an earthquake. • Probability that…..

  11. Rule of Total Probability • If events B_1, …, B_n partition the sample space S (they are disjoint and cover S), then P(A) = Σ_i P(A ∩ B_i) = Σ_i P(A | B_i) P(B_i) • [Figure: event A overlapping a partition B_1, …, B_7 of S]

  12. From Events to Random Variables • A concise way of specifying attributes of outcomes • Modeling students (Grade and Intelligence): • Ω = all possible students • What are the events? • Grade_A = all students with grade A • Grade_B = all students with grade B • Intelligence_High = … with high intelligence • Very cumbersome • We need "functions" that map from Ω to an attribute space

  13. Random Variables • [Figure: sample space Ω, with random variables I (Intelligence: high, low) and G (Grade: A+, A, B) mapping each student to attribute values]

  14. Random Variables • [Figure: the same mapping from Ω to Intelligence and Grade] • P(I = high) = P({all students whose intelligence is high})

  15. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Independence • Examples • Moments

  16. Joint Probability Distribution • Random variables encodes attributes • Not all possible combination of attributes are equally likely • Joint probability distributions quantify this • P( X= x, Y= y) = P(x, y) • How probable is it to observe these two attributes together? • Generalizes to N-RVs • How can we manipulate Joint probability distributions?

  17. Chain Rule • Always true: • P(x, y, z) = P(x) P(y | x) P(z | x, y) • = P(z) P(y | z) P(x | y, z) • = …
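
A small numerical check of the chain rule, on a made-up joint distribution over three binary variables (all numbers below are assumptions for illustration):

```python
# Verify P(x,y,z) = P(x) P(y|x) P(z|x,y) for every assignment of a
# small invented joint distribution over binary X, Y, Z.
from itertools import product

joint = {(x, y, z): 0.125 for x, y, z in product([0, 1], repeat=3)}
joint[(0, 0, 0)], joint[(1, 1, 1)] = 0.2, 0.05   # non-uniform; still sums to 1

def P(**fixed):
    """Marginal probability of the partial assignment in `fixed` (keys 'x','y','z')."""
    return sum(p for xyz, p in joint.items()
               if all(xyz["xyz".index(k)] == v for k, v in fixed.items()))

for x, y, z in product([0, 1], repeat=3):
    p_chain = P(x=x) * (P(x=x, y=y) / P(x=x)) * (joint[(x, y, z)] / P(x=x, y=y))
    assert abs(joint[(x, y, z)] - p_chain) < 1e-12
print("chain rule holds for every assignment")
```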

  18. Conditional Probability • For events {X = x} and {Y = y}: P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) • But we will always write it this way: P(x | y) = P(x, y) / P(y)

  19. Marginalization • We know P(X, Y); what is P(X = x)? • We can use the law of total probability — why? Because the events {Y = y} partition the sample space • P(x) = Σ_y P(x, y) = Σ_y P(x | y) P(y) • [Figure: the partition diagram from slide 11]
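
A sketch of marginalization on an invented joint table for intelligence and grade (the numbers are assumptions, chosen so that P(I = high) comes out to the 0.7 used two slides below):

```python
# P(X = x) = sum over y of P(X = x, Y = y): sum out the variable you
# don't care about.
joint = {("high", "A"): 0.4,  ("high", "B"): 0.3,
         ("low",  "A"): 0.15, ("low",  "B"): 0.15}

p_intelligence = {}
for (i, g), p in joint.items():
    p_intelligence[i] = p_intelligence.get(i, 0.0) + p   # sum over grades

print(p_intelligence)   # P(high) = 0.7, P(low) = 0.3
```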

  20. Marginalization Cont. • Another example: with more variables, sum out all the ones you don't care about • P(x) = Σ_y Σ_z P(x, y, z)

  21. Bayes Rule: in Practice • Suppose we know that P(smart) = 0.7 • If we also know that the student's grade is A+, how does this affect our belief about his intelligence? • Bayes rule: P(smart | A+) = P(A+ | smart) P(smart) / P(A+)
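
A hedged sketch of this update: the prior P(smart) = 0.7 is from the slide, but the likelihoods P(A+ | intelligence) are invented here purely for illustration:

```python
# Bayes rule: P(smart | A+) = P(A+ | smart) P(smart) / P(A+).
p_smart = 0.7                    # prior, from the slide
p_aplus_given_smart = 0.4        # assumed likelihood
p_aplus_given_not_smart = 0.1    # assumed likelihood

# Law of total probability gives the evidence term P(A+).
p_aplus = (p_aplus_given_smart * p_smart
           + p_aplus_given_not_smart * (1 - p_smart))
p_smart_given_aplus = p_aplus_given_smart * p_smart / p_aplus
print(round(p_smart_given_aplus, 3))   # 0.903: seeing A+ strengthens our belief
```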

  22. Bayes Rule cont. • You can condition on more variables • P(x | y, z) = P(y | x, z) P(x | z) / P(y | z)

  23. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Independence • Examples • Moments

  24. Independence • X is independent of Y means that knowing Y does not change our belief about X • P(X | Y = y) = P(X) • P(X = x, Y = y) = P(X = x) P(Y = y) • Why is this true? Because the first statement should hold for all x, y • Independence is symmetric and written as X ⊥ Y • Example: the probability that the USD will be weak given that it's raining today
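
A quick way to test independence numerically: check the factorization P(x, y) = P(x) P(y) for every cell of a joint table (the table below is an invented example that does factorize):

```python
# X is independent of Y iff the joint equals the product of marginals
# for all (x, y).
joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}

p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(independent)   # True: this table factorizes, so X is independent of Y
```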

  25. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Examples • Moments

  26. Monty Hall Problem • You're given the choice of three doors: Behind one door is a car; behind the others, goats. • You pick a door, say No. 1 • The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. • Do you want to pick door No. 2 instead?

  27. Monty Hall Problem: Bayes Rule • C_i: the car is behind door i, i = 1, 2, 3 • H_ij: the host opens door j after you pick door i • P(H_ij | C_k): • = 0 if i = j (the host never opens the door you picked) • = 0 if j = k (the host never opens the door with the car) • = 1/2 if i = k (you picked the winner door; the host chooses one of the two goat doors at random) • = 1 if i ≠ k and j ≠ k (the host must open the only remaining goat door)

  28. Monty Hall Problem: Bayes Rule cont. • WLOG, let i = 1, j = 3 (you pick door 1, the host opens door 3) • By the law of total probability: P(H_13) = Σ_k P(H_13 | C_k) P(C_k) = 1/2 · 1/3 + 1 · 1/3 + 0 · 1/3 = 1/2

  29. Monty Hall Problem: Bayes Rule cont. • P(C_1 | H_13) = P(H_13 | C_1) P(C_1) / P(H_13) = (1/2 · 1/3) / (1/2) = 1/3 • P(C_2 | H_13) = P(H_13 | C_2) P(C_2) / P(H_13) = (1 · 1/3) / (1/2) = 2/3

  30. Monty Hall Problem: Bayes Rule cont. • You should switch!
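
A Monte Carlo check of the result (a sketch; it numbers the doors 0-2 instead of 1-3):

```python
# Simulate Monty Hall: switching should win about 2/3 of the time,
# staying about 1/3.
import random

def trial(switch):
    car = random.randrange(3)
    pick = random.randrange(3)
    # Host opens a random door that is neither the pick nor the car.
    host = random.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        pick = next(d for d in range(3) if d != pick and d != host)
    return pick == car

n = 100_000
print(sum(trial(True) for _ in range(n)) / n)    # ~0.667
print(sum(trial(False) for _ in range(n)) / n)   # ~0.333
```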

  31. Brain Teaser Question: A rat is trapped in a maze. Initially, it has to choose one of two directions. If it goes to the right, it will wander around the maze for 3 minutes and then return to its initial position. If it goes to the left, then with probability 1/3 it will leave the maze after 2 minutes of traveling, and with probability 2/3 it will return to its initial position after 5 minutes of traveling. Assuming the rat is at all times equally likely to go left or right, what is the expected number of minutes it will be trapped in the maze?
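
One way to set this up (a sketch, not necessarily the intended classroom solution): let E be the expected time in the maze. Conditioning on the first choice, E = 1/2 (3 + E) + 1/2 [1/3 · 2 + 2/3 (5 + E)], which solves to E = 21 minutes. A simulation agrees:

```python
# Monte Carlo estimate of the rat's expected time in the maze.
import random

def escape_time():
    t = 0.0
    while True:
        if random.random() < 0.5:        # goes right: wanders 3 min, returns
            t += 3
        elif random.random() < 1 / 3:    # goes left and departs after 2 min
            return t + 2
        else:                            # goes left, returns after 5 min
            t += 5

n = 200_000
print(sum(escape_time() for _ in range(n)) / n)   # ~21.0
```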

  32. Continuous Random Variables • What if X is continuous? • Describe it with a probability density function (pdf) instead of a probability mass function (pmf) • A pdf is any non-negative function f(x) that describes the probability density in terms of the input variable x

  33. PDF • Properties of a pdf f(x): • f(x) ≥ 0 for all x, and ∫ f(x) dx = 1 over the whole range of X • Actual probabilities are obtained by integrating the pdf • E.g., the probability of X being between 0 and 1 is P(0 ≤ X ≤ 1) = ∫_0^1 f(x) dx
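
A numerical illustration of "probability = integral of the pdf"; the slide leaves the density unspecified, so the standard normal pdf is assumed here:

```python
# Trapezoidal integration of the standard normal pdf over [0, 1].
import math

def f(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

n = 10_000
xs = [i / n for i in range(n + 1)]
prob = sum((f(a) + f(b)) / 2 * (1 / n) for a, b in zip(xs, xs[1:]))
print(round(prob, 4))   # 0.3413, i.e. P(0 <= X <= 1) for X ~ N(0, 1)
```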

  34. Cumulative Distribution Function • F(x) = P(X ≤ x) • Discrete RVs: F(x) = Σ_{x' ≤ x} P(X = x') • Continuous RVs: F(x) = ∫_{−∞}^{x} f(t) dt

  35. Moments • Mean (Expectation): μ = E[X] • Discrete RVs: E[X] = Σ_x x P(X = x) • Continuous RVs: E[X] = ∫ x f(x) dx • Variance: Var(X) = E[(X − μ)^2] • Discrete RVs: Var(X) = Σ_x (x − μ)^2 P(X = x) • Continuous RVs: Var(X) = ∫ (x − μ)^2 f(x) dx
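
The discrete formulas, computed for an assumed example (a fair six-sided die):

```python
# Mean and variance of a discrete RV directly from its pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}   # fair die

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
print(mean, var)   # 3.5 and 35/12 ~= 2.9167
```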

  36. Properties of Moments • Mean • E[aX + b] = a E[X] + b, and E[X + Y] = E[X] + E[Y] • If X and Y are independent, E[XY] = E[X] E[Y] • Variance • Var(aX + b) = a^2 Var(X) • If X and Y are independent, Var(X + Y) = Var(X) + Var(Y)
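
A brute-force check of the independence identities above, for two independent fair dice (an assumed example; the joint is uniform over the 36 outcomes):

```python
# Verify E[XY] = E[X]E[Y] and Var(X+Y) = Var(X)+Var(Y) for independent X, Y.
from itertools import product

vals = range(1, 7)
def E(f):
    return sum(f(x, y) / 36 for x, y in product(vals, vals))

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
print(abs(E(lambda x, y: x * y) - ex * ey) < 1e-12)   # True

def var(f, m):
    return E(lambda x, y: (f(x, y) - m) ** 2)

v_sum = var(lambda x, y: x + y, ex + ey)
v_x, v_y = var(lambda x, y: x, ex), var(lambda x, y: y, ey)
print(abs(v_sum - (v_x + v_y)) < 1e-12)               # True
```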

  37. Examples of Useful Distributions • Uniform • Normal or Gaussian • Exponential • Poisson • Bernoulli • Binomial

  38. What’s Next? Probability Model Data Estimation/learning

  39. Statistical Inference • Given observations from a model: • What (conditional) independence assumptions hold? • Structure learning • If you know the family of the model (e.g., multinomial), what are the values of the parameters? Maximum Likelihood Estimation (MLE), Bayesian estimation • Parameter learning

  40. Self-Study & Presentation • Point Estimator • Minimum Mean Square Error • Maximum Likelihood • Interval Estimation • Confidence Interval • Regression Analysis • Fundamental Theory • Chebyshev’s Inequality and the Weak/Strong Law of Large Numbers • Central Limit Theorem
