Chapter 7 Random Variables and Discrete Probability Distributions
Random Variables… • A random variable is a function or rule that assigns a number to each outcome of an experiment. Basically it is just a symbol that represents the outcome of an experiment. • X = number of heads when the experiment is flipping a coin 20 times. • C = the daily change in a stock price. • R = the number of miles per gallon you get on your auto during a family vacation. • Y = the amount of medication in a blood pressure pill. • V = the speed of an auto registered on a radar detector used on I-20
Two Types of Random Variables… • Discrete Random Variable: usually count data [Number of] • One that takes on a countable number of values. This means you can sit down and list all possible outcomes without missing any, although it might take you an infinite amount of time. • X = sum of the values on the roll of two dice: X has to be 2, 3, 4, …, or 12. • Y = number of accidents on the UTA campus during a week: Y has to be 0, 1, 2, 3, …, up to some “real big number”. • Continuous Random Variable: usually measurement data [time, weight, distance, etc.] • One that takes on an uncountable number of values. This means you can never list all possible outcomes even if you had an infinite amount of time. • X = time it takes you to drive home from class: X > 0. It might be 30.1 minutes measured to the nearest tenth, but in reality the actual time is something like 30.10000001… minutes. • Exercise: try to list all possible numbers between 0 and 1.
Probability Distributions… • A probability distribution (density function) is a table, formula, or graph that describes the values of a random variable and the probability associated with these values. • – Discrete Probability Distribution, (this chapter) • X = outcome of rolling one die • – Continuous Probability Distribution (Chapter 8)
Discrete Probability Notation… • An upper-case letter will represent the name of the random variable, usually X. • Its lower-case counterpart, x, will represent the value of the random variable. • The probability that the random variable X will equal x is: • P(X = x) or more simply P(x) • X = number of heads in 10 flips of coin • P(X = 5) = P(5) = probability of 5 heads (x) in 10 flips
Discrete Probability Distributions… • Probabilities, P(x), associated with discrete random variables have the following properties: • $0 \le P(x) \le 1$ for every value x • $\sum_{\text{all } x} P(x) = 1$
Developing Discrete Probability Distributions • Probability distributions can be estimated from relative frequencies. Consider the discrete (countable) number of televisions per household (X) from US survey data (Example 7.1). • Each probability is a relative frequency, a count divided by the total, e.g. 1,218 ÷ 101,501 = 0.012. • Likewise, P(X = 4) = P(4) = 0.076 = 7.6%.
Questions you might want answered • E.g. what is the probability there is at least one television but no more than three in any given household? “at least one television but no more than three” P(1 ≤ X ≤ 3) = P(1) + P(2) + P(3) = .319 + .374 + .191 = .884
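The addition above can be sketched in a few lines of Python. This is only an illustration: the dictionary holds just the probabilities quoted on these slides, and the helper name `prob_between` is made up for this example.

```python
# Probabilities quoted on the slides for the number of televisions per household.
# Only the values needed here are included; this is not the full distribution.
tv_probs = {1: 0.319, 2: 0.374, 3: 0.191, 4: 0.076}


def prob_between(dist, low, high):
    """Return P(low <= X <= high) by adding the individual probabilities."""
    return sum(prob for x, prob in dist.items() if low <= x <= high)


answer = prob_between(tv_probs, 1, 3)
print(round(answer, 3))  # 0.884
```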
Developing Discrete Probability Distributions • Techniques covered in the Probability Chapter can be used to develop probability distributions. For example, a mutual fund salesperson knows that there is a 20% chance of closing a sale on each call she makes. • What is the probability distribution of the number of sales if she plans to call three customers? • Random Variable = X = # Sales Made in 3 Attempts • Let S denote the event of closing a sale, with P(S) = .20 • Thus S^C (the complement of S) is the event of not closing a sale, and P(S^C) = .80 • It seems reasonable to assume that the sales calls are independent.
Sample Space: List of all possible outcomes (S = sale, S^C = no sale, listed in call order) • S S S : (.2)*(.2)*(.2) = 0.008, so P(3) = .008 • S S S^C : (.2)*(.2)*(.8) = 0.032 • S S^C S : (.2)*(.8)*(.2) = 0.032 • S^C S S : (.8)*(.2)*(.2) = 0.032, so P(2) = .032 + .032 + .032 = .096 (Additive Law) • S S^C S^C : (.2)*(.8)*(.8) = 0.128 • S^C S S^C : (.8)*(.2)*(.8) = 0.128 • S^C S^C S : (.8)*(.8)*(.2) = 0.128, so P(1) = .128 + .128 + .128 = .384 (Additive Law) • S^C S^C S^C : (.8)*(.8)*(.8) = 0.512, so P(0) = .512 • NOTE: writing S1, S2, S3 for a sale on calls 1, 2, 3: P(S1 S2 S3) = P(S1) * P(S2 | S1) * P(S3 | S1 S2) (Multiplication Rule) • = P(S1) * P(S2) * P(S3) (because the calls are assumed independent) • = (.2)*(.2)*(.2) = 0.008
Another Approach: Tree Diagram • Developing a Probability Distribution… • [Tree diagram with three levels: Sales Call 1, Sales Call 2, Sales Call 3. Every branch is either S with P(S) = .2 or S^C with P(S^C) = .8. The path S, S, S^C, with probability (.2)(.2)(.8) = .032, is highlighted as one of the paths that make up P(X=2).] • X : P(x) • 3 : (.2)³ = .008 • 2 : 3(.032) = .096 • 1 : 3(.128) = .384 • 0 : (.8)³ = .512
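The enumeration behind the sample-space list and the tree diagram can be reproduced programmatically. The sketch below assumes, as the slides do, that the three calls are independent with P(S) = .2; the function name `sales_distribution` is invented for this illustration.

```python
from collections import defaultdict
from itertools import product

P_SALE = 0.2   # P(S) from the example
N_CALLS = 3    # number of customers called


def sales_distribution(p=P_SALE, n=N_CALLS):
    """Build P(X = x) by walking every path of the tree diagram."""
    dist = defaultdict(float)
    # Each outcome is a tuple such as (True, True, False), meaning S, S, S^C.
    for outcome in product([True, False], repeat=n):
        path_prob = 1.0
        for sale in outcome:
            # Multiplication rule for independent calls.
            path_prob *= p if sale else (1 - p)
        # Additive law: paths with the same number of sales are added.
        dist[sum(outcome)] += path_prob
    return dict(dist)


dist = sales_distribution()
for x in sorted(dist):
    print(x, round(dist[x], 3))
```

Running it lists x = 0, 1, 2, 3 with probabilities .512, .384, .096 and .008, matching the table.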
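Final Discrete Probability Distribution • For the sales example: P(0) = .512, P(1) = .384, P(2) = .096, P(3) = .008 • The mean of a discrete random variable is the weighted average of all of its values. The weights are the probabilities. This parameter is also called the expected value of X and is represented by E(X): $E(X) = \mu = \sum_{\text{all } x} x\,P(x)$ • The variance is $V(X) = \sigma^2 = \sum_{\text{all } x} (x-\mu)^2 P(x)$ • The standard deviation is $\sigma = \sqrt{\sigma^2}$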
Computing Mean, Variance, and Std. Dev. for Discrete Random Variable • Using P(0) = .512, P(1) = .384, P(2) = .096, P(3) = .008 from the sales example: • Mean = 0*(.512) + 1*(.384) + 2*(.096) + 3*(.008) • = 0.6 • Variance = (0-0.6)²*(.512) + (1-0.6)²*(.384) • + (2-0.6)²*(.096) + (3-0.6)²*(.008) • = .1843 + .0614 + .1882 + .0461 = .480 • Std. Dev. = SQRT(.480) = .693 • We are as smart as the goddess of statistics now, since we know the true mean, variance, and standard deviation of the population.
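As a cross-check, the weighted-average definitions can be applied directly in Python. The sketch uses the sales distribution with X counting sales, so P(0) = .512, P(1) = .384, P(2) = .096 and P(3) = .008; the helper names are chosen for this illustration only.

```python
from math import sqrt

# Distribution of X = number of sales in 3 calls, from the sample-space slide.
sales = {0: 0.512, 1: 0.384, 2: 0.096, 3: 0.008}


def mean(dist):
    """E(X): each value weighted by its probability."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """V(X): probability-weighted squared deviations from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


mu = mean(sales)
var = variance(sales)
sd = sqrt(var)
print(round(mu, 3), round(var, 3), round(sd, 3))  # 0.6 0.48 0.693
```

The results agree with the binomial shortcuts n*p = 3(.2) = 0.6 and n*p*(1-p) = 3(.2)(.8) = 0.48.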
Laws of Expected Value… “Useful to know” • 1. E(c) = c • The expected value of a constant (c) is just the value of the constant. • 2. E(X + c) = E(X) + c • The expected value of a random variable plus a constant is the expected value of the random variable plus the constant. • 3. E(cX) = cE(X) • The expected value of a constant times a random variable is the constant times the expected value of the random variable.
Laws of Expected Value… “Useful to know” • E(c1X1 + c2X2 + c3X3 + c4X4 + c5X5) • = c1E(X1) + c2E(X2) + c3E(X3) + c4E(X4) + c5E(X5) • Example: what is the expected weight of a surgical pack containing 5 components? It is the sum of the expected weights of the components [maybe we could weigh the pack to determine if one of the components is missing]. • This law holds whether or not the random variables are independent; independence matters only for the corresponding law of variance.
Laws of Variance… • 1. V(c) = 0 • The variance of a constant (c) is zero. • 2. V(X + c) = V(X) • The variance of a random variable plus a constant is just the variance of the random variable. • 3. V(cX) = c²V(X) • The variance of a constant times a random variable is the constant squared times the variance of the random variable.
Example: You weigh all 30,000 students • Random Variable: X = a student's weight • Mean(X) = X-Bar = 160 lbs • Variance(X) = s² = 900 lbs² • StdDev(X) = s = 30 lbs • You now discover that the scale reported every student's weight 5 lbs too heavy. The students' real weights (Y) should have been Y = X - 5. What are the mean and variance of the students' REAL weights? • Mean(Y) = Mean(X) - 5 = 160 - 5 = 155 lbs • Variance(Y) = Variance(X) = 900 lbs² • StdDev(Y) = SQRT(900) = 30 lbs
Example: You measure the height of all 30,000 students • Random Variable: X = a student's height in “Feet” • Mean(X) = X-Bar = 5.8 feet • Variance(X) = s² = 0.09 feet² • StdDev(X) = s = 0.3 feet • You now discover that the President wanted the students' heights measured in “Inches” and not “Feet”. The students' heights in “Inches” (Y) should have been Y = 12*X. What are the mean and variance of the students' heights in inches? • Mean(Y) = 12*Mean(X) = 12*5.8 = 69.6 inches • Variance(Y) = 12²*Variance(X) = 144*(.09) = 12.96 inches² • StdDev(Y) = SQRT(12.96) = 3.6 inches
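The shift and rescale laws used in the two examples above can be checked numerically on any small distribution. The sketch below uses a made-up three-point height distribution (hypothetical values, not the survey data from the slides) and a hypothetical helper `transform` that builds the distribution of Y = c*X + d.

```python
from math import isclose

# Hypothetical heights in feet with their probabilities (illustration only).
heights_ft = {5.5: 0.25, 5.8: 0.50, 6.1: 0.25}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def transform(dist, c, d):
    """Distribution of Y = c*X + d; probabilities move with their values."""
    out = {}
    for x, p in dist.items():
        y = c * x + d
        out[y] = out.get(y, 0.0) + p
    return out


inches = transform(heights_ft, 12, 0)    # Y = 12*X, feet to inches
shifted = transform(heights_ft, 1, -5)   # Y = X - 5, a pure shift

# E(cX) = cE(X) and V(cX) = c^2 V(X)
print(isclose(mean(inches), 12 * mean(heights_ft)))
print(isclose(variance(inches), 12 ** 2 * variance(heights_ft)))
# E(X + c) = E(X) + c and V(X + c) = V(X)
print(isclose(mean(shifted), mean(heights_ft) - 5))
print(isclose(variance(shifted), variance(heights_ft)))
```

All four comparisons print True: shifting changes only the mean, while rescaling by 12 multiplies the mean by 12 and the variance by 144.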
Laws… • We can derive laws of expected value and variance for the sum of two independent random variables as follows… • E(X + Y) = E(X) + E(Y) • V(X + Y) = V(X) + V(Y) • X = weight of right shoes: Mean(X) = .5 lbs and Var(X) = .0004 lbs² • Y = weight of left shoes: Mean(Y) = .5 lbs and Var(Y) = .0004 lbs² • What are the mean and variance of the weight of a “Pair” of shoes, P = X + Y? • E(P) = E(X + Y) = E(X) + E(Y) = .5 + .5 = 1.0 lbs • V(P) = V(X + Y) = V(X) + V(Y) = .0004 + .0004 = .0008 lbs² • NOTE: this assumes the weights of the right and left shoes are independent. • Question: how could you determine the mean and variance of the weight of an automobile after you make all the parts but before you assemble the automobile?
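One way to see the two sum laws at work is to build the exact distribution of a sum of two independent variables and compare. The sketch below uses two fair dice (the two-dice variable mentioned earlier in the chapter) rather than the shoe weights, and uses exact fractions so the comparison is not blurred by rounding; `sum_of_independent` is a name invented here.

```python
from fractions import Fraction
from itertools import product

# One fair die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def sum_of_independent(a, b):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(a.items(), b.items()):
        # Independence lets us multiply the two probabilities.
        out[x + y] = out.get(x + y, 0) + px * py
    return out


two_dice = sum_of_independent(die, die)

print(mean(two_dice) == mean(die) + mean(die))              # E(X+Y) = E(X) + E(Y)
print(variance(two_dice) == variance(die) + variance(die))  # V(X+Y) = V(X) + V(Y)
```

Both lines print True; the mean of the sum is 7 and its variance is 35/6.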
Binomial Distribution… 2 parameters [n and p] • The binomial distribution is the probability distribution that results from doing a “binomial experiment”. Binomial experiments have the following properties: • Fixed number of trials, represented as n. • Each trial has two possible outcomes, a “success” and a “failure”. • P(success)=p (and thus: P(failure)=1–p), for all trials. • The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.
Success and Failure… • …are just labels for a binomial experiment; there is no value judgment implied. You may define either one of the 2 possible outcomes as “Success”. • For example, a coin flip will result in either heads or tails. If we define “heads” as success, then necessarily “tails” is considered a failure (inasmuch as we are attempting to have the coin land heads up). • Other potential examples of binomial random variables: • A firecracker pops or fails to pop • A patient gets an infection during an operation or does not get an infection
Binomial Random Variable… • The random variable of a binomial experiment is defined as the number of successes, X, in the n trials, where the probability of success on a single trial is p. • E.g. flip a fair coin 10 times… • 1) Fixed number of trials n=10 • 2) Each trial has two possible outcomes {heads (success), tails (failure)} • 3) P(success)= 0.50; P(failure)=1–0.50 = 0.50 • 4) The trials are independent (i.e. the outcome of heads on the first flip will have no impact on subsequent coin flips). • Hence flipping a coin ten times is a binomial experiment since all conditions were met.
Binomial Distribution [formula] • The binomial random variable (# of successes in n trials) can take on values 0, 1, 2, …, n. Thus, it's a discrete random variable. • Once we know a random variable is binomial, we can calculate the probability associated with each value of the random variable from the binomial distribution: • $P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$ for x = 0, 1, 2, …, n • where x = # successes and n-x = # failures
Ways to Calculate Binomial Probabilities • Use the binomial distribution formula [not a good approach unless n is fairly small] • Use the binomial tables at the back of most stat books [not real good unless your specific value of “n” and “p” happen to be included in the tables] • Approximate the binomial probabilities from some other distributional form (normal) [no need to do this now that we have access to various statistical software that will do it for us] • Use Excel stat function “=BINOMDIST(x,n,p,false)” which will return the individual probability. Replace false with true and you will get the sum of the binomial probabilities from 0 up to x.
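For readers without Excel, the same two quantities can be computed with the Python standard library. The function below is a sketch that mirrors the four-argument shape of the spreadsheet call described above; the name `binom_dist` is made up for this illustration, and the formula it applies is the standard binomial one, $P(x) = \frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}$.

```python
from math import comb


def binom_dist(x, n, p, cumulative):
    """Individual P(X = x) when cumulative is False, else P(X <= x)."""

    def pmf(k):
        # comb(n, k) counts the arrangements of k successes among n trials.
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


# Ten flips of a fair coin, the binomial experiment used as an example earlier.
print(round(binom_dist(5, 10, 0.5, False), 4))  # 0.2461
print(round(binom_dist(5, 10, 0.5, True), 4))   # 0.623
```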
Problem: Pat Statsdud… • Pat Statsdud failed to study for the next stat exam. Pat’s exam strategy is to rely on luck for the next quiz. The quiz consists of 10 multiple-choice questions (n=10). Each question has five possible answers, only one of which is correct (p=0.2). Pat plans to guess the answer to each question. • What is the probability that Pat gets no answers correct? • P(X=0) = P(0) = • What is the probability that Pat gets two answers correct? • P(X=2) = P(2) =
Pat Statsdud… • n=10, and P(success) = .20 • What is the probability that Pat gets no answers correct? • I.e. # success, x, = 0; hence we want to know P(x=0) • $P(0) = \frac{10!}{0!\,10!}(.2)^0(.8)^{10} = (.8)^{10} = .1074$ • Pat has about an 11% chance of getting no answers correct using the guessing strategy.
Pat Statsdud… • n=10, and P(success) = .20 • What is the probability that Pat gets two answers correct? • I.e. # success, x, = 2; hence we want to know P(x=2) • $P(2) = \frac{10!}{2!\,8!}(.2)^2(.8)^8 = 45(.04)(.1678) = .3020$ • Pat has about a 30% chance of getting exactly two answers correct using the guessing strategy.
Cumulative Probability… • “Find the probability that Pat fails the quiz” • If a grade on the quiz is less than 50% (i.e. fewer than 5 questions correct out of 10), that's considered a failed quiz. • P(fail quiz) = P(X ≤ 4) = P(0)+P(1)+P(2)+P(3)+P(4) • This is called a cumulative probability, that is, P(X ≤ x) • Note: Calculating all these individual probabilities would be tedious and time consuming; however, the binomial tables at the back of the book give you the cumulative probabilities [n=10, p=0.2, x=4]
Pat Statsdud… • Calculate Individual Probabilities and Add Up! • P(X ≤ 4) = P(0) + P(1) + P(2) + P(3) + P(4) • We already know P(0) = .1074 and P(2) = .3020. Using the binomial formula to calculate the others: • P(1) = .2684, P(3) = .2013, and P(4) = .0881 • Hence P(X ≤ 4) = .1074 + .2684 + .3020 + .2013 + .0881 = .9672 • OR • Use the binomial tables at the back of the book for n=10, p=0.2, and x=4 (next slide)
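The "calculate and add up" route is quick to reproduce in code. This sketch assumes n = 10 and p = 0.2 as stated in the problem; the variable names are chosen for illustration.

```python
from math import comb

N_QUESTIONS = 10   # n, from the problem statement
P_CORRECT = 0.2    # p, one correct answer out of five choices

# Individual binomial probabilities P(0) through P(4).
probs = [
    comb(N_QUESTIONS, x) * P_CORRECT ** x * (1 - P_CORRECT) ** (N_QUESTIONS - x)
    for x in range(5)
]

for x, prob in enumerate(probs):
    print(f"P({x}) = {prob:.4f}")

p_fail = sum(probs)  # P(X <= 4), the probability that Pat fails
print(f"P(X <= 4) = {p_fail:.4f}")
```

The printed values are .1074, .2684, .3020, .2013 and .0881, and they add to .9672.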
Binomial Table… • “What is the probability that Pat fails the quiz”? • i.e. what is P(X ≤ 4), given P(success) = .20 and n=10 ? P(X ≤ 4) = .967
Binomial Table… • “What is the probability that Pat gets no answers correct?” • i.e. what is P(X = 0), given P(success) = .20 and n=10 ? P(X = 0) = P(X ≤ 0) = .107
Binomial Table… • “What is the probability that Pat gets two answers correct?” • i.e. what is P(X = 2), given P(success) = .20 and n=10 ? P(X = 2) = P(X≤2) – P(X≤1) = .678 – .376 = .302 remember, the table shows cumulative probabilities…
=BINOMDIST() Excel Function… • There is a binomial distribution function in Excel that can also be used to calculate these probabilities. For example: • What is the probability that Pat gets two answers correct? • =BINOMDIST(2, 10, .2, FALSE), where the arguments are, in order: # successes, # trials, P(success), and a flag (TRUE: cumulative prob.; FALSE: individual prob.) • Result: P(X=2) = .3020
=BINOMDIST() Excel Function… • What is the probability that Pat fails the quiz? • =BINOMDIST(4, 10, .2, TRUE), where the arguments are, in order: # successes, # trials, P(success), and cumulative (TRUE returns P(X ≤ x)) • Result: P(X≤4) = .9672
Binomial Distribution… • As you might expect, statisticians have determined formulas for the mean, variance, and standard deviation of a binomial random variable. They are: • $\mu = np$ • $\sigma^2 = np(1-p)$ • $\sigma = \sqrt{np(1-p)}$ • Previous example: n=10, p=0.2 • μ = n*p = 10*0.2 = 2 • σ² = n*p*(1-p) = 10*0.2*0.8 = 1.6 • σ = SQRT(1.6) = 1.26
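These shortcut formulas can be checked against the general definitions of mean and variance for a discrete random variable. The sketch below builds the full binomial distribution for n = 10, p = 0.2 and compares the weighted sums with n*p and n*p*(1-p); the names are illustrative.

```python
from math import comb, isclose, sqrt

n, p = 10, 0.2  # Pat Statsdud's quiz

# Full binomial distribution: P(x) for x = 0, 1, ..., n.
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# General definitions: probability-weighted sums over all values.
mu = sum(x * prob for x, prob in pmf.items())
var = sum((x - mu) ** 2 * prob for x, prob in pmf.items())

print(isclose(mu, n * p))             # True
print(isclose(var, n * p * (1 - p)))  # True
print(round(sqrt(var), 2))            # 1.26
```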
Poisson Distribution… 1 parameter [μ] • Named for Simeon Poisson, the Poisson distribution is a discrete probability distribution and refers to the number of events (a.k.a. successes) within a specific time period or region of space. For example: • The number of cars arriving at a service station in 1 hour. (The interval of time is 1 hour.) • The number of flaws in a bolt of cloth. (The specific region is a bolt of cloth.) • The number of accidents in 1 day on a particular stretch of highway. (The interval is defined by both time, 1 day, and space, the particular stretch of highway.)
Poisson Probability Distribution… • The probability that a Poisson random variable assumes a value of x is given by: • $P(x) = \frac{e^{-\mu}\mu^{x}}{x!}$ for x = 0, 1, 2, … • Note: μ is the only parameter [tell me μ and I can calculate the probabilities] • and e is the base of the natural logarithm (approximately 2.71828).
Example 7.12… • The number of typographical errors in new editions of textbooks varies considerably from book to book. After some analysis, an instructor concludes that the number of errors is Poisson distributed with a mean of 1.5 typos per 100 pages. The instructor randomly selects 100 pages of a new book. What is the probability that there are no typos? • That is, what is P(X=0) given that μ = 1.5? • $P(0) = \frac{e^{-1.5}(1.5)^0}{0!} = e^{-1.5} = .2231$ • “There is about a 22% chance of finding zero errors”
Poisson Distribution… • As mentioned on the Poisson experiment slide: • The probability of a success is proportional to the size of the interval • Thus, knowing an error rate of 1.5 typos per 100 pages, we can determine a mean value for a 400 page book as: • μ = 1.5(4) = 6 typos / 400 pages.
Example 7.13… • For a 400 page book, what is the probability that there are no typos? • $P(X=0) = \frac{e^{-6}\,6^0}{0!} = e^{-6} = .0025$ • “there is a very small chance there are no typos”
Example 7.13… • For a 400 page book, what is the probability that there are five or fewer typos? • P(X≤5) = P(0) + P(1) + … + P(5) • This is rather tedious to solve manually. A better alternative is to refer to Table 2 in Appendix B… • …k=5, μ=6, and P(X ≤ k) = .446 • “there is about a 45% chance there are 5 or fewer typos”
Example 7.13… • …Excel is an even better alternative:
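The Poisson answers in Examples 7.12 and 7.13 can also be reproduced with a few lines of Python. This is a sketch using only the standard library and the standard Poisson formula $P(x) = e^{-\mu}\mu^x / x!$; the function names `poisson_pmf` and `poisson_cdf` are invented for this illustration.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)


def poisson_cdf(k, mu):
    """P(X <= k): add the individual probabilities from 0 up to k."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))


print(round(poisson_pmf(0, 1.5), 4))  # 0.2231  no typos in 100 pages
print(round(poisson_pmf(0, 6), 4))    # 0.0025  no typos in 400 pages
print(round(poisson_cdf(5, 6), 4))    # 0.4457  five or fewer typos in 400 pages
```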
Poisson Practice • The number of infections [X] in a hospital each week has been shown to follow a Poisson distribution with a mean of 3.0 infections per week. Calculate the following probabilities. • P(X = 0) = • P(X < 4) = • P(X > 9) = • If you found 9 infections next week, what would you say?