
Probability Basics


Presentation Transcript


  1. Probability Basics Dr. Yan Liu Department of Biomedical, Industrial & Human Factors Engineering Wright State University

  2. Introduction • Let A be an uncertain event with possible outcomes A1, A2, …, An • e.g. A = “Flipping a coin”, A1 = {Head}, A2 = {Tail} • The sample space S of event A is the collection of all possible outcomes of A, i.e. S = A1 ∪ A2 ∪ … ∪ An (Ai is the ith outcome) • e.g. A = “Flipping a coin”, S = {Head, Tail} • “∪” is called union; A1 ∪ A2 means either A1 or A2 or both of them happen • “∩” is called intersection; A1 ∩ B1 means both A1 and B1 happen, so A1 ∩ B1 is sometimes read as “A1 and B1”

  3. Introduction (Cont.) [Venn diagrams of S with regions Ai, A1 ∪ A2, and A1 ∩ B1] • Probabilities can be visually represented using Venn diagrams; mathematically, Pr(Ai) = area of Ai / total area of S, so 0 ≤ Pr(Ai) ≤ 1 • Probabilities must add up if two outcomes cannot occur at the same time: Pr(A1 ∪ A2) = Pr(A1) + Pr(A2)

  4. Introduction (Cont.) [Venn diagrams: A1, A2, …, An partitioning S; overlapping A1 and B1] • If A1, A2, …, An are all the possible outcomes of event A and no two of them can occur at the same time, their probabilities must sum to 1: Pr(A1) + Pr(A2) + … + Pr(An) = 1; A1, A2, …, An are said to be collectively exhaustive and mutually exclusive • If two outcomes A1 and B1 can occur at the same time, then the probability of either A1 or B1 or both happening equals the sum of their individual probabilities minus the probability of both happening at the same time: Pr(A1 ∪ B1) = Pr(A1) + Pr(B1) − Pr(A1 ∩ B1)
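The addition rule for overlapping outcomes can be checked on a small hand-built example (not from the slides): rolling a fair die, with A1 = “roll is even” and B1 = “roll is at least 4”.

```python
# Checking Pr(A1 ∪ B1) = Pr(A1) + Pr(B1) - Pr(A1 ∩ B1)
# on a fair six-sided die (illustrative example).
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space
A1 = {2, 4, 6}           # roll is even
B1 = {4, 5, 6}           # roll is at least 4

def pr(event):
    """Pr(event) = |event| / |S| for equally likely outcomes."""
    return Fraction(len(event), len(S))

# A1 and B1 can occur together (rolls 4 and 6), so the general rule applies
lhs = pr(A1 | B1)                      # direct count of the union
rhs = pr(A1) + pr(B1) - pr(A1 & B1)    # inclusion-exclusion
print(lhs, rhs)  # 2/3 2/3
```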

  5. Basic Probability Rules • Conditional Probability • The conditional probability of an outcome B in relationship to an outcome A is the probability that B occurs given that A has already occurred: Pr(B|A) = Pr(A ∩ B) / Pr(A) • Informally, conditioning on an event coincides with reducing the total sample space to the conditioning event • e.g. A: Dow Jones up; B: stock price up; B ∩ A: stock price up and Dow Jones up [Venn diagram of B ∩ A within S]

  6. The probability of drawing the ace of spades from a deck of 52 cards is 1/52. However, if you know I have an ace in my hand, what is the probability of it being the ace of spades? There are four aces, so Pr(Ace of Spades | Ace) = (1/52) / (4/52) = 1/4
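The card example can be verified by direct counting; a sketch in which the deck is built explicitly (rank/suit names are illustrative):

```python
# Pr(Ace of Spades | Ace) by counting: conditioning shrinks the
# sample space from the 52-card deck to the four aces.
from fractions import Fraction

ranks = list(range(1, 14))          # 1 = ace, ..., 13 = king
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

aces = [c for c in deck if c[0] == 1]
ace_of_spades = [c for c in deck if c == (1, "spades")]

# Unconditional probability over the whole deck
p_unconditional = Fraction(len(ace_of_spades), len(deck))
# Conditional probability over the reduced sample space (the aces)
p_conditional = Fraction(len(ace_of_spades), len(aces))

print(p_unconditional, p_conditional)  # 1/52 1/4
```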

  7. Basic Probability Rules (Cont.) • Multiplicative Rule • Calculating the probability of two outcomes happening at the same time: Pr(Ai ∩ Bj) = Pr(Ai|Bj) Pr(Bj) = Pr(Bj|Ai) Pr(Ai) (for i = 1, 2, …, n; j = 1, 2, …, m) • Events A (with outcomes A1, …, An) and B (with outcomes B1, …, Bm) are independent if and only if information about A does not provide any information about B and vice versa. Mathematically, Pr(Bj|Ai) = Pr(Bj) and Pr(Ai ∩ Bj) = Pr(Ai) Pr(Bj) (for i = 1, 2, …, n; j = 1, 2, …, m) • No arrow between two chance nodes in an influence diagram implies independence between the associated events • Dependence between A and B does not mean causation; it only means information about A helps in determining the likelihood of outcomes of B

  8. Basic Probability Rules (Cont.) • Events A (with outcomes A1, …, An) and B (with outcomes B1, …, Bm) are conditionally independent given C (with outcomes C1, …, Cp) if and only if, once C is known, any knowledge about A does not provide more information about B and vice versa. Mathematically, Pr(Bj|Ai, Ck) = Pr(Bj|Ck) and Pr(Ai ∩ Bj|Ck) = Pr(Ai|Ck) Pr(Bj|Ck) (for i = 1, …, n; j = 1, …, m; k = 1, …, p) • [Influence diagrams: C pointing to both A and B, with no arrow between A and B, represents conditional independence of A and B given C]

  9. Events A, B, and C all have two possible outcomes: the events happen or do not happen • When C happens, we observe Pr(A) = 0.9, Pr(B) = 0.9, Pr(B|A) = 0.9, Pr(A|B) = 0.9, Pr(A∩B) = 0.81 • Because Pr(A|B) = Pr(A) and Pr(B|A) = Pr(B), A and B are independent when C happens • When C does not happen, we observe Pr(A) = 0.1, Pr(B) = 0.1, Pr(B|A) = 0.1, Pr(A|B) = 0.1, Pr(A∩B) = 0.01 • Because Pr(A|B) = Pr(A) and Pr(B|A) = Pr(B), A and B are also independent when C does not happen • Conclusion: A and B are independent given C
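The factorization behind this conclusion can be checked numerically; a sketch using the probabilities from the example:

```python
# Conditional independence check: Pr(A ∩ B | C) = Pr(A | C) * Pr(B | C)
# must hold in each condition of C, using the numbers from the slide.
p_a_given_c, p_b_given_c, p_ab_given_c = 0.9, 0.9, 0.81
p_a_given_notc, p_b_given_notc, p_ab_given_notc = 0.1, 0.1, 0.01

# Given C happens: joint probability should factor into the product
given_c = abs(p_ab_given_c - p_a_given_c * p_b_given_c) < 1e-12
# Given C does not happen: same factorization should hold
given_not_c = abs(p_ab_given_notc - p_a_given_notc * p_b_given_notc) < 1e-12

print(given_c and given_not_c)  # True: A and B are independent given C
```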

  10. Law of Total Probability [Venn diagram: S partitioned into B1, B2, B3, with Ai split into Ai∩B1, Ai∩B2, Ai∩B3] • If B1, B2, …, Bn are mutually exclusive and collectively exhaustive, then Pr(Ai) = Pr(Ai ∩ B1) + Pr(Ai ∩ B2) + … + Pr(Ai ∩ Bn) = Pr(Ai|B1) Pr(B1) + Pr(Ai|B2) Pr(B2) + … + Pr(Ai|Bn) Pr(Bn)

  11. Oil Example • An oil company is considering a site for an exploratory well. If the rock strata underlying the site are characterized by what geologists call a “dome” structure, the chances of finding oil are somewhat greater than if no dome structure exists. The probability of a dome structure is Pr(Dome) = 0.6. The conditional probabilities of finding oil at this site are as follows: Pr(Dry|Dome) = 0.6, Pr(Low|Dome) = 0.25, Pr(High|Dome) = 0.15; Pr(Dry|No Dome) = 0.85, Pr(Low|No Dome) = 0.125, Pr(High|No Dome) = 0.025 • Find Pr(Dry), Pr(Low), Pr(High)

  12. Probability Tree of the Oil Example [Tree: Dome (0.6) branches to Dry (0.6), Low (0.25), High (0.15); No Dome (0.4) branches to Dry (0.85), Low (0.125), High (0.025)] • Pr(Dry) = Pr(Dry ∩ Dome) + Pr(Dry ∩ No Dome); Pr(Dry ∩ Dome) = Pr(Dry|Dome) Pr(Dome) = 0.6∙0.6 = 0.36; Pr(Dry ∩ No Dome) = Pr(Dry|No Dome) Pr(No Dome) = 0.85∙0.4 = 0.34; so Pr(Dry) = 0.36 + 0.34 = 0.70 • Pr(Low) = Pr(Low ∩ Dome) + Pr(Low ∩ No Dome) = Pr(Low|Dome)∙Pr(Dome) + Pr(Low|No Dome)∙Pr(No Dome) = 0.25∙0.6 + 0.125∙0.4 = 0.20 • Pr(High) = Pr(High ∩ Dome) + Pr(High ∩ No Dome) = Pr(High|Dome)∙Pr(Dome) + Pr(High|No Dome)∙Pr(No Dome) = 0.15∙0.6 + 0.025∙0.4 = 0.10
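The law-of-total-probability calculation above can be written as a short sketch, summing each branch of the tree:

```python
# Marginal probabilities for the oil example:
# Pr(result) = sum over structures of Pr(result | structure) * Pr(structure)
prior = {"Dome": 0.6, "No Dome": 0.4}
cond = {  # Pr(result | structure), from the slide
    "Dome":    {"Dry": 0.6,  "Low": 0.25,  "High": 0.15},
    "No Dome": {"Dry": 0.85, "Low": 0.125, "High": 0.025},
}

marginal = {
    result: sum(cond[s][result] * prior[s] for s in prior)
    for result in ("Dry", "Low", "High")
}
print({k: round(v, 3) for k, v in marginal.items()})
# {'Dry': 0.7, 'Low': 0.2, 'High': 0.1}
```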

  13. Bayes Theorem • From the multiplicative rule it follows that: Pr(Ai ∩ Bj) = Pr(Ai|Bj) Pr(Bj) = Pr(Bj|Ai) Pr(Ai) (Eq. 1) • Dividing the LHS and RHS of Eq. 1 by Pr(Ai) yields: Pr(Bj|Ai) = Pr(Ai|Bj) Pr(Bj) / Pr(Ai) (Eq. 2) • From the law of total probability it follows that: Pr(Ai) = Pr(Ai|B1) Pr(B1) + Pr(Ai|B2) Pr(B2) + … + Pr(Ai|Bn) Pr(Bn) (Eq. 3) • Substituting Eq. 3 into Eq. 2 yields one of the most well-known theorems in probability theory, Bayes Theorem: Pr(Bj|Ai) = Pr(Ai|Bj) Pr(Bj) / [Pr(Ai|B1) Pr(B1) + … + Pr(Ai|Bn) Pr(Bn)] (Eq. 4), where B1, B2, …, Bn are mutually exclusive and collectively exhaustive • Pr(Bj): prior probability (it does not take into account any information about A) • Pr(Bj|Ai): posterior probability (it is derived from the specific outcome of A)

  14. Oil Example (Cont.) • Flip the Tree Using Bayes Theorem [Flipped tree: Dry (0.7) branches to Dome (?) and No Dome (?); Low (0.2) branches to Dome (?) and No Dome (?); High (0.1) branches to Dome (?) and No Dome (?)]
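The (?) posterior probabilities in the flipped tree follow from Bayes Theorem (Eq. 4); a sketch using the numbers given in the example:

```python
# Flipping the probability tree with Bayes' theorem:
# Pr(structure | result) = Pr(result | structure) Pr(structure) / Pr(result)
prior = {"Dome": 0.6, "No Dome": 0.4}
cond = {
    "Dome":    {"Dry": 0.6,  "Low": 0.25,  "High": 0.15},
    "No Dome": {"Dry": 0.85, "Low": 0.125, "High": 0.025},
}

def posterior(structure, result):
    """Pr(structure | result) via Bayes' theorem."""
    numerator = cond[structure][result] * prior[structure]
    denominator = sum(cond[s][result] * prior[s] for s in prior)
    return numerator / denominator

for result in ("Dry", "Low", "High"):
    print(result, round(posterior("Dome", result), 3))
# Dry 0.514
# Low 0.75
# High 0.9
```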

  15. Uncertain Quantities • Random Variable (rv) • A rv is a rule that associates a number with each outcome in the sample space S of a statistical experiment (a process that generates a set of results or outcomes, each with some possibility of occurring) • Consider an experiment in which batteries are examined until a good one is obtained. Let G and B denote a good battery and a bad battery, respectively: S = {G, BG, BBG, BBBG, …} • Define X = the number of batteries examined until the experiment terminates. Then X(G) = 1, X(BG) = 2, X(BBG) = 3, … • However, the argument of the random variable function is usually omitted. Hence, one writes Pr(X = 2) = Pr(the first battery is bad and the second is good) • Note: the above statement is only correct with the above definition of X. Therefore, you should always specifically describe the definition of a rv before using it.

  16. Uncertain Quantities (Cont.) • A rv is usually denoted with a capital letter (such as X) and the specific value it takes is usually represented with a lowercase letter (such as x) • A rv is discrete if its possible values either constitute a finite set or can be listed in an infinite sequence in which there is a first element, a second element, etc. • e.g. the number of heads you get after you flip a coin 3 times • A rv is continuous if its set of possible values consists of an entire interval on the number line • e.g. the failure time of a component

  17. Discrete Probability Distributions • The probability distribution for a discrete rv can be expressed in two ways: probability mass function (PMF) and cumulative distribution function (CDF) • The PMF of X lists the probabilities of each possible discrete outcome, and these probabilities sum to 1 • e.g. X = the number of heads you get after flipping a coin three times: Pr(X=0) = 1/8, Pr(X=1) = 3/8, Pr(X=2) = 3/8, Pr(X=3) = 1/8 [bar chart of Pr(X=x) against x]

  18. Discrete Probability Distributions (Cont.) • The CDF of X at a specific value x gives the probability that X ≤ x: F(x) = Pr(X ≤ x), the sum of the PMF over all values up to x [plots: PMF of X and CDF of X in the coin-flipping example] • In decision analysis, the PMF and CDF are referred to as the risk profile and cumulative risk profile, respectively
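The coin-flipping PMF and CDF can be computed exactly; a sketch treating X as Binomial(3, 1/2), which matches the three-fair-flips setup:

```python
# PMF and CDF of X = number of heads in three fair coin flips.
from fractions import Fraction
from math import comb

n, p = 3, Fraction(1, 2)

def pmf(x):
    """Pr(X = x) for a Binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(x):
    """F(x) = Pr(X <= x): sum the PMF over all values up to x."""
    return sum(pmf(k) for k in range(0, x + 1))

print([str(pmf(x)) for x in range(4)])  # ['1/8', '3/8', '3/8', '1/8']
print([str(cdf(x)) for x in range(4)])  # ['1/8', '1/2', '7/8', '1']
```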

  19. Expected Value • We know a rv X has many possible outcomes. However, if you had to give a “best guess” for X, what number would you give? The expected value of X, E(X), is usually used as the “best guess” • Interpretation of Expected Value • If you were able to observe the outcomes of X a large number of times, the calculated average of these observations would be close to E(X) • If X can take on any value in the set {x1, x2, …, xn}, then E(X) = x1 Pr(X=x1) + x2 Pr(X=x2) + … + xn Pr(X=xn)

  20. Expected Value (Cont.) • If Y = g(X), then E(Y) = g(x1) Pr(X=x1) + g(x2) Pr(X=x2) + … + g(xn) Pr(X=xn) • If Y = aX + b, where a and b are constants, then E(Y) = a E(X) + b • If Z = aX + bY, where a and b are constants, then E(Z) = a E(X) + b E(Y)

  21. e.g. X = the number of heads you get after flipping a coin three times: E(X) = 0∙0.125 + 1∙0.375 + 2∙0.375 + 3∙0.125 = 1.5 • Y = g(X) = X²: E(Y) = 0∙0.125 + 1∙0.375 + 4∙0.375 + 9∙0.125 = 3.0
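These sums map directly onto a couple of lines of code; a sketch using the PMF from the coin-flip example:

```python
# Expected values for X = number of heads in three fair coin flips.
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# E(X) = sum of x * Pr(X = x)
e_x = sum(x * p for x, p in pmf.items())

# E(g(X)) = sum of g(x) * Pr(X = x), here with g(x) = x**2
e_x2 = sum(x**2 * p for x, p in pmf.items())

print(e_x, e_x2)  # 1.5 3.0
```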

  22. Variance and Standard Deviation • The variance of X, var(X), is a measure of its statistical dispersion, indicating how its possible values are spread around its expected value: var(X) = E[(X − E(X))²] = E(X²) − E²(X) • The standard deviation of X, σX, is the square root of var(X) • If Y = aX + b, where a and b are constants, then var(Y) = a² var(X) • If Z = aX + bY, where a and b are constants and X and Y are independent of each other, then var(Z) = a² var(X) + b² var(Y)

  23. e.g. X = the number of heads you get after flipping a coin three times; var(X) = ? σX = ? • E(X²) = 0∙1/8 + 1∙3/8 + 4∙3/8 + 9∙1/8 = 3.0 and E²(X) = 1.5² = 2.25; therefore, var(X) = 3.0 − 2.25 = 0.75 and σX = √var(X) = √0.75 = 0.866 • Y = 3X; var(Y) = ? σY = ? • var(Y) = 3²∙var(X) = 9∙0.75 = 6.75 and σY = 3∙√var(X) = 3∙√0.75 = 2.598
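The same numbers fall out of a short sketch using var(X) = E(X²) − E²(X) and var(aX) = a² var(X):

```python
# Variance and standard deviation for the coin-flip example.
from math import sqrt

pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

e_x = sum(x * p for x, p in pmf.items())      # E(X)  = 1.5
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E(X²) = 3.0
var_x = e_x2 - e_x**2                         # 3.0 - 2.25 = 0.75
sd_x = sqrt(var_x)                            # ≈ 0.866

# For Y = 3X: var(Y) = 3² var(X), σY = 3 σX
var_y = 3**2 * var_x                          # 6.75
sd_y = 3 * sd_x                               # ≈ 2.598

print(var_x, round(sd_x, 3), var_y, round(sd_y, 3))
# 0.75 0.866 6.75 2.598
```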

  24. Continuous Probability Distribution • The probability distribution of a continuous rv can also be expressed in two ways: probability density function (PDF) and cumulative distribution function (CDF) • The PDF of X is a function f(x) such that, for any two numbers a and b (a < b), Pr(a ≤ X ≤ b) = ∫ab f(x) dx, with f(x) ≥ 0 and the integral of f(x) over the whole range of X equal to 1

  25. Continuous Probability Distribution (Cont.) • The CDF of X is F(x*) = Pr(X ≤ x*) = ∫−∞x* f(x) dx [plots: PDF f(x) with the area up to x* shaded, and CDF F(x) showing the value F(x*)]

  26. Continuous Probability Distribution (Cont.) • Expected Value: E(X) = ∫ x f(x) dx • Variance: var(X) = ∫ (x − E(X))² f(x) dx • Examples of Theoretical Density Functions • Normal Distribution • Exponential Distribution • Beta Distribution • Note: for this course, you are NOT required to do calculus to find the probabilities, expected values, and variances. However, you DO need to know how to look up probabilities in the tables in the appendices of the textbook and how to use the formulas of some theoretical probability distributions to calculate expected values and variances

  27. Measures of Dependency • Covariance • A measure of the extent to which two random variables vary together linearly: cov(X,Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X) E(Y) • Positive (negative) covariance indicates that higher than average values of one variable tend to be paired with higher (lower) than average values of the other variable • If X and Y are independent, then cov(X,Y) = 0 • However, cov(X,Y) = 0 does not imply that X and Y are independent (a nonlinear relationship can exist that still results in a zero covariance) • Useful Properties • Correlation: cor(X,Y) = cov(X,Y) / (σX σY) • The unit of measurement of cov(X,Y) is that of X times that of Y, while cor(X,Y) is dimensionless
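Covariance and correlation can be computed from a joint PMF; a sketch on a small hand-built joint distribution (the numbers are illustrative, not from the slides):

```python
# Covariance and correlation from a joint PMF of (X, Y).
from math import sqrt

# Pr(X = x, Y = y): an illustrative positively associated pair
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

# cov(X, Y) = E(XY) - E(X) E(Y)
cov = e_xy - e_x * e_y

var_x = sum(x**2 * p for (x, y), p in joint.items()) - e_x**2
var_y = sum(y**2 * p for (x, y), p in joint.items()) - e_y**2

# cor(X, Y) = cov(X, Y) / (σX σY): dimensionless, between -1 and 1
cor = cov / (sqrt(var_x) * sqrt(var_y))
print(round(cov, 3), round(cor, 3))  # 0.15 0.6
```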

  28. Exercise • Use the given probabilities to construct a probability table for events A and B • Now use the table to find the requested quantities [the specific probabilities and quantities were shown as images and are not preserved in the transcript]

  29. Probability Table [table over events A and B with entries 0.476, 0.204, 0.3136, and 0.0064, obtained via the multiplicative rule and the law of total probability; the full table layout was shown as an image]

  30. [Solution steps applying the multiplicative rule and the law of total probability; the equations were shown as images and are not preserved in the transcript]
