
Probability, part 2 (Ch. 13. Uncertainty)



  1. Probability, part 2 (Ch. 13. Uncertainty) Sept. 9, 2004 Jahwan Kim, AIPR Lab, Div. of CS, KAIST

  2. Contents • Random Variables • Random variables and random vectors • Random Process • Joint Probability, Marginalization • Conditional Probability • Bayes formula • Puzzling examples • Independence Jahwan Kim, Probability

  3. Random Variables • Suppose a probability P is defined on S. • Let V be either R or R^n. • Suppose X is a nice function on S with values in V. • A “nice function” in practical terms is any function you can think of. • To be precise, nice functions are measurable functions. • Then we may define a new probability PX on V as follows: PX(A) = P(X^(-1)(A)), for any nice A ⊆ V. • That is, any function X on a probability space defines a new probability PX, defined on V. • This function X is called a random variable with values in V. Jahwan Kim, Probability
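A minimal sketch of this definition in Python; the two-coin-flip sample space and the "number of heads" function are hypothetical examples, not from the slides:

```python
# Sketch: a random variable X as a function on a finite sample space S.
# Hypothetical example: two fair coin flips, X = number of heads.
from itertools import product

# Probability P on S = {HH, HT, TH, TT}, each outcome with probability 1/4
P = {omega: 0.25 for omega in product("HT", repeat=2)}

def X(omega):
    # X: S -> R, the number of heads in the outcome
    return omega.count("H")

def PX(A):
    # PX(A) = P(X^(-1)(A)) for a subset A of the values of X
    return sum(p for omega, p in P.items() if X(omega) in A)

print(PX({0}), PX({1}), PX({0, 1}))   # 0.25 0.5 0.75
```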

  4. Random Variables • PX(A) is the probability that X ∈ A. • Sometimes, we may reserve the term random variable only for those with values in R. • Typically, random variables with values in R^n are called random vectors. • Random vectors may be considered as a collection of random variables. • The pdf of PX is called the pdf of the random variable/vector X. • Since PX is a probability on R or R^n, most of the time it will have a pdf. Jahwan Kim, Probability

  5. Random Variables: Continuous and Discrete • When the values of X are discrete, X is called a discrete random variable. When X has a pdf that is a true function, and not a generalized function (i.e., a distribution), X is called a continuous random variable. • There exist random variables that are neither discrete nor continuous, but they are not so interesting in general. • So we consider only discrete or continuous random variables. • A discrete random variable is completely determined by its probability mass function. A continuous random variable is completely determined by its pdf. • Examples? (See the sketch below.) Jahwan Kim, Probability
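As one possible answer to the "Examples?" prompt, a small sketch with one discrete and one continuous random variable; the binomial and exponential choices are illustrative assumptions:

```python
# Sketch: a discrete and a continuous random variable (hypothetical choices).
import math

# Discrete: X = number of heads in 3 fair coin flips; pmf on {0, 1, 2, 3}
pmf = {k: math.comb(3, k) * 0.5**3 for k in range(4)}
print(pmf)            # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Continuous: X ~ Exponential(rate 2); pdf f(x) = 2 e^(-2x) for x >= 0
def pdf(x):
    return 2.0 * math.exp(-2.0 * x) if x >= 0 else 0.0
print(pdf(0.5))       # ~0.7358
```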

  6. Random Process • A random variable with values in R^∞ is called a random process. • A random process is an infinite collection of random variables. • Examples: • Xt = (strength observed by a sensor at time t), a temporal process. • X(x,y) = brightness of an image at location (x,y), a spatial process. • We will not consider random processes in this course. Jahwan Kim, Probability

  7. Joint Probability • Let X = (X1,…,Xn) be an n-dimensional random vector. • Each Xi is a random variable. • The probability PX is called the joint probability of X1,…,Xn. • The joint probability contains all the information necessary to reason about X1,…,Xn. • Do we know the probability of each Xi when we know PX? And conversely, when we know the probabilities of all Xi, do we know PX? Jahwan Kim, Probability

  8. Marginalization • The answer to the first question is YES. • Suppose X = (X1, X2) is a discrete random variable and suppose we know PX. Then P(X1 = x) = P(X1 = x and X2 ∈ S2) = PX(X ∈ {x} × S2) = PX(X ∈ ∪j {(x, sj)}) = Σj PX(x, sj), where S2 = {s1,…,sN} is the sample space for X2. The last summation is called a marginalization of X w.r.t. X2, or summing out X2. Jahwan Kim, Probability
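A small sketch of summing out X2 from a discrete joint table; the joint values below are made up for illustration:

```python
# Sketch: marginalizing a discrete joint probability table.
# Hypothetical joint PX over X1 in {0, 1} and X2 in {a, b, c}.
PX = {
    (0, "a"): 0.10, (0, "b"): 0.20, (0, "c"): 0.10,
    (1, "a"): 0.05, (1, "b"): 0.25, (1, "c"): 0.30,
}

def P_X1(x):
    # P(X1 = x) = sum_j PX(x, s_j): sum out X2
    return sum(p for (x1, _x2), p in PX.items() if x1 == x)

print(P_X1(0), P_X1(1))   # 0.4 0.6
```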

  9. Marginalization for Continuous Random Variables • Similarly, let X = (X1, X2) be a continuous random variable and suppose we know its pdf fX. Then for any event A for X1, P(X1 ∈ A) = PX(X ∈ A × R) = ∫_(A×R) fX(x1, x2) dx1 dx2 = ∫_A (∫_R fX(x1, x2) dx2) dx1 = ∫_A g(x1) dx1, where g(x1) = ∫_R fX(x1, x2) dx2. • In other words, knowing the pdf fX of X, we can recover the pdf of X1 by integrating fX w.r.t. all components of X other than X1. • Such integration is called marginalization or integrating out variables. Jahwan Kim, Probability
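The same idea numerically: integrate out x2 on a grid. The joint pdf used here (two independent standard normals) is an assumed example, so the recovered marginal should match the standard normal density:

```python
# Sketch: recovering the pdf of X1 by numerically integrating out x2.
import numpy as np

def f_joint(x1, x2):
    # Hypothetical joint pdf: two independent standard normals
    return np.exp(-0.5 * (x1**2 + x2**2)) / (2.0 * np.pi)

x2 = np.linspace(-8.0, 8.0, 4001)   # grid covering essentially all of R for x2
dx = x2[1] - x2[0]

def g(x1):
    # g(x1) = integral over R of f(x1, x2) dx2, approximated by a Riemann sum
    return float(np.sum(f_joint(x1, x2) * dx))

print(g(0.0))                        # ~0.3989
print(1.0 / np.sqrt(2.0 * np.pi))    # exact standard normal density at 0
```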

  10. Joint Probability and Marginalization • Thus, knowing the joint probability of (X1,…,Xn), we can find the probability of any Xi via marginalization. • What about the converse? Namely, if we know the probabilities of all Xi, can we recover the joint probability? Jahwan Kim, Probability

  11. Complexity of Joint Probability • Suppose X1,…,Xn are all discrete random variables with the same sample space S of size N. • Knowing the probability of Xi ⇔ knowing the probability mass function of Xi ⇔ knowing a table of size N. • Therefore, knowing the probabilities of each of X1,…,Xn ⇔ knowing a table of size nN. • But the sample space for the joint probability of X1,…,Xn is S^n, whose size is N^n. Therefore, knowing the joint probability of X1,…,Xn ⇔ knowing a table of size N^n. • Thus the joint probability contains much, much more information than all its marginalizations together. Jahwan Kim, Probability
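A quick arithmetic check of the two table sizes, with hypothetical N and n:

```python
# Sketch: table sizes for all marginals vs. the joint (hypothetical N, n).
N, n = 10, 8
marginals = n * N          # one table of size N per variable
joint = N ** n             # one entry per element of S^n
print(marginals, joint)    # 80 vs 100000000
```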

  12. Conditional Probability • Suppose X1,…,Xn represent the state of nature. • Sometimes we make observations, say X1 = x. • Our knowledge about the state of nature necessarily changes after the observation. • This is reflected in the language of probability by conditional probability. • P(A|B) denotes the probability of event A given that we know event B occurred, and is called the conditional probability of A given B. • Similarly, for two random variables X and Y, when Y is fixed, we have a new random variable X|Y. Jahwan Kim, Probability

  13. Conditional Probability • When B is observed, it defines the new probability P(·|B). However, P(A|·) with A fixed does NOT define a probability. • Question: Let A, A′ be mutually disjoint, and B, B′ be mutually disjoint. Does P(A ∪ A′|B) = P(A|B) + P(A′|B)? Does P(A|B ∪ B′) = P(A|B) + P(A|B′)? (The first always holds; the second does not in general; see the sketch below.) Jahwan Kim, Probability
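A sketch checking both identities on a fair six-sided die; the particular events A, A′, B, B′ are hypothetical and chosen to expose the failure of the second one:

```python
# Sketch: additivity of conditional probability in the event vs. the condition.
from fractions import Fraction

S = set(range(1, 7))                       # fair die, uniform probability
def P(E):
    return Fraction(len(E & S), len(S))

def P_cond(A, B):                          # P(A|B) = P(A ∩ B) / P(B)
    return P(A & B) / P(B)

A, A2 = {1, 2}, {3}                        # disjoint events A, A'
B, B2 = {1, 3, 5}, {2, 4, 6}               # disjoint events B, B'

print(P_cond(A | A2, B), P_cond(A, B) + P_cond(A2, B))   # 2/3 = 2/3 (holds)
print(P_cond(A, B | B2), P_cond(A, B) + P_cond(A, B2))   # 1/3 != 2/3 (fails)
```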

  14. Conditional Probability: Formulae • Formula for conditional probability: P(A|B) = P(A ∩ B) / P(B). • Product formula: P(A,B) = P(A|B) P(B). (P(A ∩ B) is usually denoted by P(A,B).) • P(A|B)P(B) = P(A,B) = P(B|A)P(A), therefore P(A|B) = P(B|A)P(A) / P(B), which is the Bayes (inversion) formula. Jahwan Kim, Probability
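A small worked instance of the Bayes formula; the disease/test numbers are invented for illustration and are not from the slides:

```python
# Sketch: Bayes formula P(A|B) = P(B|A)P(A)/P(B) with hypothetical numbers.
P_D = 0.01                  # prior P(A): disease
P_pos_given_D = 0.95        # P(B|A): test positive given disease
P_pos_given_notD = 0.05     # P(B|not A): false positive rate

P_pos = P_pos_given_D * P_D + P_pos_given_notD * (1 - P_D)   # P(B)
P_D_given_pos = P_pos_given_D * P_D / P_pos                  # P(A|B)
print(round(P_D_given_pos, 4))   # ~0.161: still small despite a positive test
```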

  15. Conditional pdf • Let f be the joint pdf for X, Y, and let fY be the pdf for Y. Then the pdf of X|Y is given by fX|Y=y(x) = f(x,y) / fY(y). • Namely, the conditional pdf is just the joint pdf renormalized (so that it integrates to 1); see the sketch below. Jahwan Kim, Probability
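A numerical sketch of the renormalization: take the slice f(x, y0) of an assumed joint pdf (a bivariate normal with correlation 0.5), divide by fY(y0), and check that the result integrates to 1:

```python
# Sketch: conditional pdf as a renormalized slice of the joint pdf.
import numpy as np

x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]
y0 = 1.0                                       # condition on Y = y0

def f_joint(x, y):
    # Hypothetical joint pdf: standard bivariate normal with correlation 0.5
    rho = 0.5
    z = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
    return np.exp(-0.5 * z) / (2.0 * np.pi * np.sqrt(1.0 - rho**2))

slice_ = f_joint(x, y0)                        # f(x, y0) as a function of x
f_Y_y0 = np.sum(slice_ * dx)                   # fY(y0) by integrating out x
f_cond = slice_ / f_Y_y0                       # fX|Y=y0(x)
print(np.sum(f_cond * dx))                     # ~1.0: it is a pdf again
```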

  16. More on Bayes Formula • Although simply obtained, the Bayes formula is one of the key ingredients of modern probabilistic inference. • Some people take this formula as an axiom and derive other results from it. • P(A|B) ∝ P(B|A)P(A); 1/P(B) is the constant that makes the entries P(B|A)P(A) into a probability. • For random variables X and Y, P(Y|X) = P(X|Y)P(Y)/P(X) ∝ P(X|Y)P(Y), i.e., proportional regardless of Y, since P(X) does not depend on Y. • In fact, P(X) can be computed as follows: P(X) = Σy P(X, Y=y) = Σy P(X|Y=y) P(Y=y) (the marginalization formula with conditional probability). Jahwan Kim, Probability
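A sketch of the last bullet: compute P(X) by summing out Y, then normalize P(X|Y)P(Y) into the posterior P(Y|X). The prior and likelihood values are hypothetical:

```python
# Sketch: P(Y|X=x) ∝ P(X=x|Y)P(Y), with P(X=x) obtained by summing out Y.
prior = {"y1": 0.5, "y2": 0.3, "y3": 0.2}            # P(Y), hypothetical
likelihood = {"y1": 0.10, "y2": 0.40, "y3": 0.70}    # P(X=x | Y) for observed x

unnormalized = {y: likelihood[y] * prior[y] for y in prior}
P_x = sum(unnormalized.values())                     # P(X=x) = Σy P(X=x|Y=y)P(Y=y)
posterior = {y: v / P_x for y, v in unnormalized.items()}
print(P_x)          # 0.31
print(posterior)    # {'y1': ~0.161, 'y2': ~0.387, 'y3': ~0.452}
```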

  17. Conditional Probability: Examples • p. 475 of the textbook: Cavity, Toothache, and Catch. • p. 483: Wumpus world. • Puzzling example, from Minka’s tutorial: My neighbor has two children. Assuming that the gender of a child is like a coin flip, it is most likely, a priori, that my neighbor has one boy and one girl, with probability 1/2. The other possibilities (two boys or two girls) have probabilities 1/4 and 1/4. Suppose I ask him whether he has any boys, and he says yes. What is the probability that the other child is a girl? By the above reasoning, it is twice as likely for him to have one boy and one girl than two boys, so it is twice as likely that the other child is a girl than a boy. So the probability is 2/3. Bayes' rule will give the same result. Suppose instead that I happen to see one of his children run by, and it is a boy. What is the probability that the other child is a girl? Now Bayes' rule gives 1/2, because observing the outcome of one coin has no effect on the other. If you don't believe this, draw a tree describing the possible states of the world and the possible observations, along with the probabilities of each leaf. Condition on the observed event by setting all contradictory leaf probabilities to zero and renormalizing the nonzero leaves. The two cases have two different trees and thus two different answers. (Both trees are enumerated in the sketch below.) Jahwan Kim, Probability
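The two trees from Minka's example, enumerated explicitly; since every leaf has equal probability, it is enough to count leaves:

```python
# Sketch: enumerating the two-child puzzle (Minka's example).
from itertools import product
from fractions import Fraction

worlds = list(product("BG", repeat=2))     # (older, younger), each world 1/4

# Case 1: we asked "do you have any boys?" and the answer was yes.
have_boy = [w for w in worlds if "B" in w]
p_other_girl = Fraction(sum("G" in w for w in have_boy), len(have_boy))
print(p_other_girl)                        # 2/3

# Case 2: we happened to see one child run by, and it was a boy.
# Each world splits into two equally likely observations (see child 0 or child 1).
obs = [(w, i) for w in worlds for i in range(2)]          # 8 leaves, each 1/8
saw_boy = [(w, i) for (w, i) in obs if w[i] == "B"]
p_other_girl2 = Fraction(sum(w[1 - i] == "G" for (w, i) in saw_boy), len(saw_boy))
print(p_other_girl2)                       # 1/2
```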

  18. Conditional Probability: Puzzling Examples • Why such a discrepancy? • From Minka’s tutorial: This seems like a paradox because it seems that in both cases we could condition on the fact that "at least one child is a boy." But that is not correct. • Moral of the story: In probabilistic reasoning, it matters not only what the conclusion is, but also what the query is. Jahwan Kim, Probability

  19. Conditional Probability: Puzzling Examples • In fact, many such examples exist. The most famous is the following ‘Three Prisoners’ Paradox’ (from J. Pearl, Probabilistic Reasoning in Intelligent Systems): Three prisoners A, B, and C have been tried; their verdict will be read and their sentences executed tomorrow morning. Only one of them will be declared guilty and hanged, while the other two will be acquitted. Prisoner A asks the prison guard in the middle of the night: “Please give this letter to one of B and C, the one who will be released. You and I know at least one of them will be freed.” After the guard does as A asked, A asks him again: “To whom did you give my letter? You can tell me, because it won’t give me any clue about my own status.” The guard says, “I gave it to B.” Now A thinks: “Before my conversation with the guard, my chances of being executed were 33%. Now, since one of C and me will be executed, the chances are 50%. What did I do wrong?” (See the enumeration sketch below.) Jahwan Kim, Probability
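An enumeration of the three prisoners paradox. One modeling assumption is made explicit: when both B and C will be freed, the guard names either of them with probability 1/2.

```python
# Sketch: enumerating the three prisoners paradox.
# The condemned prisoner is A, B, or C with probability 1/3 each.
from fractions import Fraction

third, half = Fraction(1, 3), Fraction(1, 2)
leaves = [                                    # (condemned, guard_names, probability)
    ("A", "B", third * half),                 # A condemned: guard may name B or C
    ("A", "C", third * half),
    ("B", "C", third),                        # B condemned: guard must name C
    ("C", "B", third),                        # C condemned: guard must name B
]

# Condition on the observation: the guard said "B".
said_B = [(c, g, p) for (c, g, p) in leaves if g == "B"]
total = sum(p for (_, _, p) in said_B)
p_A_condemned = sum(p for (c, _, p) in said_B if c == "A") / total
print(p_A_condemned)                          # 1/3: A's chances did not change
```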

  20. Conditional Probability: Puzzling Examples • The Thousand Prisoner Problem: You are one of one thousand prisoners awaiting sentence. You know that only one of you has been condemned. By some luck, you find a list of 998 prisoners marked ‘innocent’. Your name is not among them. Should your chance of dying increase from 1/1000 to 1/2? • These examples all illustrate the difference between logical inference and probabilistic inference. For a more in-depth discussion, see Pearl’s monograph. Jahwan Kim, Probability

  21. Independence • There are cases where knowing all marginals is equivalent to knowing the joint. • Two events A and B are said to be independent iff P(A,B) = P(A)P(B). Equivalently, A and B are independent ⇔ P(A,B) = P(A)P(B) ⇔ P(A|B) = P(A). • Two random variables X and Y are independent iff P(X,Y) = P(X)P(Y). • Let fX, fY be the (marginal) pdfs of X, Y, respectively, and let f be the joint pdf of X, Y. Then X and Y are independent ⇔ f(x,y) = fX(x) fY(y). Jahwan Kim, Probability
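A small sketch of the independence check for discrete random variables; the marginals pX, pY are hypothetical, and the joint is built as their product, so the check should succeed:

```python
# Sketch: checking P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y.
from itertools import product

pX = {0: 0.3, 1: 0.7}                                          # hypothetical marginals
pY = {"a": 0.4, "b": 0.6}
joint = {(x, y): pX[x] * pY[y] for x, y in product(pX, pY)}    # built to be independent

def marginal_X(x):
    return sum(p for (x1, _), p in joint.items() if x1 == x)

def marginal_Y(y):
    return sum(p for (_, y1), p in joint.items() if y1 == y)

independent = all(
    abs(joint[(x, y)] - marginal_X(x) * marginal_Y(y)) < 1e-12
    for x, y in joint
)
print(independent)   # True
```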

  22. Independence and Complexity • Let X1,…,Xn be discrete random variables with the same sample space S of size N. • If they are mutually independent, then knowing the joint probability of X1,…,Xn ⇔ knowing the (marginal) probabilities of each of X1,…,Xn ⇔ knowing a table of size nN. • Namely, when random variables are independent, their complexity is much reduced. Jahwan Kim, Probability
