
Chapter 6: Probability Distributions


Section 6.1: How Can We Summarize Possible Outcomes and Their Probabilities?

- Random variable
- Probability distributions for discrete random variables
- Mean of a probability distribution
- Summarizing the spread of a probability distribution
- Probability distribution for continuous random variables

- The numerical values that a variable assumes are the result of some random phenomenon:
- Selecting a random sample from a population
or
- Performing a randomized experiment

- A random variable is a numerical measurement of the outcome of a random phenomenon.

- Use letters near the end of the alphabet, such as x, to symbolize
- Variables
- A particular value of the random variable

- Use a capital letter, such as X, to refer to the random variable itself.
Example: Flip a coin three times

- X=number of heads in the 3 flips; defines the random variable
- x=2; represents a possible value of the random variable

- The probability distribution of a random variable specifies its possible values and their probabilities.
Note: It is the randomness of the variable that allows us to specify probabilities for the outcomes

- A discrete random variable X has separate values (such as 0, 1, 2, …) as its possible outcomes
- Its probability distribution assigns a probability P(x) to each possible value x:
- For each x, the probability P(x) falls between 0 and 1
- The sum of the probabilities for all the possible x values equals 1

- What is the estimated probability of at least three home runs?
P(3)+P(4)+P(5)=0.13+0.03+0.01=0.17

- The mean of a probability distribution for a discrete random variable is
µ = Σ x·P(x)
where the sum is taken over all possible values of x.

- The mean of a probability distribution is denoted by the parameter, µ.
- The mean is a weighted average; values of x that are more likely receive greater weight P(x)

- The mean of a probability distribution of a random variable X is also called the expected value of X.
- The expected value reflects not what we’ll observe in a single observation, but rather what we expect for the average in a long run of observations.
- It is not unusual for the expected value of a random variable to equal a number that is NOT a possible outcome.

- Find the mean of this probability distribution.

The mean:

µ = 0(0.23) + 1(0.38) + 2(0.22) + 3(0.13) + 4(0.03) + 5(0.01) = 1.38
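The two conditions on P(x) and the weighted-average mean can be checked with a short Python sketch. The probabilities are the home-run values from the worked example above; the function names `is_valid_distribution` and `mean` are illustrative choices, not part of the original slides.

```python
import math

# Home-run distribution from the worked example: value x -> probability P(x).
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}


def is_valid_distribution(dist, tol=1e-9):
    """Check that every P(x) lies in [0, 1] and that the P(x) sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return in_range and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)


def mean(dist):
    """Weighted average of the values x, using P(x) as the weights."""
    return sum(x * p for x, p in dist.items())


valid = is_valid_distribution(home_runs)
mu = mean(home_runs)
# "At least three" means x = 3, 4, or 5.
p_at_least_3 = sum(p for x, p in home_runs.items() if x >= 3)

print(valid, round(mu, 2), round(p_at_least_3, 2))
```

The mean of 1.38 is not itself a possible number of home runs, which illustrates the earlier point about expected values.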

The standard deviation of a probability distribution, denoted by the parameter, σ, measures its spread.

- Larger values of σ correspond to greater spread.
- Roughly, σ describes how far the random variable falls, on the average, from the mean of its distribution
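The slides describe σ only in words. As a hedged sketch, the standard definition for a discrete distribution, σ = √(Σ (x − µ)²·P(x)), is assumed here (it is not stated in the slides) and applied to the home-run distribution from the mean example.

```python
import math

# Home-run distribution from the mean example: value x -> probability P(x).
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}

# Mean: weighted average of the values.
mu = sum(x * p for x, p in home_runs.items())

# Variance: weighted average of squared distances from the mean
# (standard textbook definition, assumed rather than taken from the slides).
variance = sum((x - mu) ** 2 * p for x, p in home_runs.items())
sigma = math.sqrt(variance)

print(round(mu, 2), round(sigma, 2))
```

A σ of roughly 1.1 says that the number of home runs typically falls about one home run away from the mean of 1.38.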

- A continuous random variable has an infinite continuum of possible values in an interval.
- Examples are: time, age and size measures such as height and weight.
- Continuous variables are measured in a discrete manner because of rounding.

- A continuous random variable has possible values that form an interval.
- Its probability distribution is specified by a curve.
- Each interval has probability between 0 and 1.
- The interval containing all possible values has probability equal to 1.


Section 6.2: How Can We Find Probabilities for Bell-Shaped Distributions?

- Normal Distribution
- 68-95-99.7 Rule for normal distributions
- Z-Scores and the Standard Normal Distribution
- The Standard Normal Table: Finding Probabilities
- Using the TI-calculator: find probabilities

- Using the Standard Normal Table in Reverse
- Using the TI-calculator: find z-scores
- Probabilities for Normally Distributed Random Variables
- Percentiles for Normally Distributed Random Variables
- Using Z-scores to Compare Distributions

The normal distribution is symmetric, bell-shaped and characterized by its mean µ and standard deviation σ.

- The normal distribution is the most important distribution in statistics
- Many variables have approximately normal distributions
- Approximates many discrete distributions well when there are a large number of possible outcomes
- Many statistical methods use it even when the data are not bell shaped

- Normal distributions are
- Bell shaped
- Symmetric around the mean

- The mean (µ) and the standard deviation (σ) completely describe the density curve
- Increasing/decreasing µ moves the curve along the horizontal axis
- Increasing/decreasing σ controls the spread of the curve

- Within what interval do almost all of the men’s heights fall? Women’s height?

- 68% of the observations fall within one standard deviation of the mean
- 95% of the observations fall within two standard deviations of the mean
- 99.7% of the observations fall within three standard deviations of the mean

- Heights of adult women
- can be approximated by a normal distribution
- µ = 65 inches; σ = 3.5 inches

- 68-95-99.7 Rule for women’s heights
- 68% are between 61.5 and 68.5 inches
[ µ ± σ = 65 ± 3.5 ]

- 95% are between 58 and 72 inches
[ µ ± 2σ = 65 ± 2(3.5) = 65 ± 7 ]

- 99.7% are between 54.5 and 75.5 inches
[ µ ± 3σ = 65 ± 3(3.5) = 65 ± 10.5 ]
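As a rough check of the 68-95-99.7 rule for women's heights, the sketch below uses `statistics.NormalDist` from the Python standard library with the µ = 65 and σ = 3.5 given above. The variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 65.0, 3.5          # women's heights in inches, from the example
heights = NormalDist(mu=MU, sigma=SIGMA)

intervals = {}
for k in (1, 2, 3):
    low, high = MU - k * SIGMA, MU + k * SIGMA
    # Probability of falling within k standard deviations of the mean.
    prob = heights.cdf(high) - heights.cdf(low)
    intervals[k] = (low, high, prob)
    print(f"within {k} sd: {low} to {high} inches, probability {prob:.4f}")
```

The computed probabilities are close to, but not exactly, 68%, 95% and 99.7%, since the rule quotes rounded values.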

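- What proportion of women are less than 69 inches tall?
- By the 68-95-99.7 rule, 68% of heights fall within one standard deviation of the mean (between 61.5 and 68.5 inches), which leaves 16% in each tail.
- So about 16% + 68% = 84% of women are shorter than 68.5 inches, which is roughly 69 inches.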

- The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean
- A negative (positive) z-score indicates that the value is below (above) the mean
- z-scores can be used to calculate the probabilities of a normal random variable using the normal tables in the back of the book

- A standard normal distribution has mean µ=0 and standard deviation σ=1
- When a random variable has a normal distribution and its values are converted to z-scores by subtracting the mean and dividing by the standard deviation, the z-scores have the standard normal distribution.
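The standardization described above (subtract the mean, divide by the standard deviation) can be sketched in a few lines of Python. The helper name `z_score` is an illustrative choice; the heights are taken from the women's-height example with µ = 65 and σ = 3.5.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    return (x - mu) / sigma


# Women's heights: mu = 65 inches, sigma = 3.5 inches.
z_tall = z_score(68.5, 65, 3.5)    # one standard deviation above the mean
z_short = z_score(58.0, 65, 3.5)   # two standard deviations below the mean

print(z_tall, z_short)
```

A positive result means the value is above the mean and a negative result means it is below, matching the sign convention in the bullets.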

Table A enables us to find normal probabilities

- It tabulates the normal cumulative probabilities falling below the point +z
To use the table:

- Find the corresponding z-score
- Look up the closest standardized score (z) in the table.
- First column gives z to the first decimal place
- First row gives the second decimal place of z

- The corresponding probability found in the body of the table gives the probability of falling below the z-score

- Find the probability that a normal random variable takes a value less than 1.43 standard deviations above µ: P(z<1.43) = .9236

TI calculator: normalcdf(-1E99,1.43,0,1) = .9236

- Find the probability that a normal random variable takes a value greater than 1.43 standard deviations above µ: P(z>1.43) = 1-.9236 = .0764

TI calculator: normalcdf(1.43,1E99,0,1) = .0764

- Find the probability that a normal random variable assumes a value within 1.43 standard deviations of µ
- Probability below 1.43σ = .9236
- Probability below -1.43σ = 1-.9236 = .0764
- P(-1.43<z<1.43) = .9236-.0764 = .8472

TI calculator: normalcdf(-1.43,1.43,0,1) = .8472

To calculate the cumulative probability

- 2nd DISTR; 2:normalcdf(lower bound, upper bound,mean,sd)
- Use –1E99 for negative infinity and 1E99 for positive infinity

- Find probability to the left of -1.64
- P(z<-1.64) = normalcdf(-1E99,-1.64,0,1) = .0505

- Find probability to the right of 1.56
- P(z>1.56) = normalcdf(1.56,1E99,0,1) = .0594

- Find probability between -.50 and 2.25
- P(-.5<z<2.25) = normalcdf(-.5,2.25,0,1) = .6793
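For readers without a TI calculator, the same three kinds of probability (left tail, right tail, between two values) can be sketched with Python's `statistics.NormalDist`. The helper names are illustrative and are not TI or textbook functions; results may differ from Table A values in the fourth decimal place because the table rounds.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def prob_below(z):
    """Cumulative probability to the left of z."""
    return Z.cdf(z)


def prob_above(z):
    """Probability to the right of z (complement of the cumulative probability)."""
    return 1.0 - Z.cdf(z)


def prob_between(lower, upper):
    """Probability between two z-scores: difference of cumulative probabilities."""
    return Z.cdf(upper) - Z.cdf(lower)


left = prob_below(-1.64)
right = prob_above(1.56)
middle = prob_between(-0.50, 2.25)

print(round(left, 4), round(right, 4), round(middle, 4))
```

No stand-in for infinity such as 1E99 is needed here, because the cumulative probability is computed directly.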

- To solve some of our problems, we will need to find the value of z that corresponds to a certain normal cumulative probability
- To do so, we use Table A in reverse
- Rather than locating z in the first column (value of z up to one decimal) and the first row (second decimal of z) and reading off the probability:
- Find the probability in the body of the table
- The z-score is given by the corresponding values in the first column and row

- Example: Find the value of z for a cumulative probability of 0.025.
- Look up the cumulative probability of 0.025 in the body of Table A.
- A cumulative probability of 0.025 corresponds to z = -1.96.
- Thus, the probability that a normal random variable falls at least 1.96 standard deviations below the mean is 0.025.

- Example: Find the value of z for a cumulative probability of 0.975.
- Look up the cumulative probability of 0.975 in the body of Table A.
- A cumulative probability of 0.975 corresponds to z = 1.96.
- Thus, the probability that a normal random variable takes a value no more than 1.96 standard deviations above the mean is 0.975.

- 2nd DISTR; 3:invNorm(percentile,mean,sd); Enter
- Percentile is the probability under the curve from negative infinity to the z-score

- The probability that a standard normal random variable assumes a value that is ≤ z is 0.975. What is z? invNorm(.975,0,1) = 1.96
- The probability that a standard normal random variable assumes a value that is > z is 0.025. What is z? invNorm(1-.025,0,1) = invNorm(.975,0,1) = 1.96
- The probability that a standard normal random variable assumes a value that is ≥ z is 0.881. What is z? invNorm(1-.881,0,1) = -1.18
- The probability that a standard normal random variable assumes a value that is < z is 0.119. What is z? invNorm(.119,0,1) = -1.18

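- Find the z-score z such that the probability within z standard deviations of the mean is 0.50.
- The middle 50% leaves 25% in each tail, so z is the 75th percentile and -z is the 25th percentile:
- invNorm(.75,0,1) = .67
- invNorm(.25,0,1) = -.67
- Probability = P(-.67<Z<.67) = .5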
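Going from a cumulative probability back to a z-score can be sketched with `NormalDist.inv_cdf` from the Python standard library, which takes the left-tail probability just as the percentile argument described above does. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(Z <= z) = 0.975
z_below_975 = Z.inv_cdf(0.975)

# P(Z >= z) = 0.881, so the left-tail probability is 1 - 0.881 = 0.119
z_above_881 = Z.inv_cdf(1 - 0.881)

# Middle 50%: 25% in each tail, so the cutoffs are the 25th and 75th percentiles
z_lower, z_upper = Z.inv_cdf(0.25), Z.inv_cdf(0.75)

print(round(z_below_975, 2), round(z_above_881, 2),
      round(z_lower, 2), round(z_upper, 2))
```

When the problem gives a right-tail probability, it is converted to a left-tail probability first, which mirrors the `1-.881` step in the calculator examples.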

- State the problem in terms of the observed random variable X, i.e., P(X<x)
- Standardize X to restate the problem in terms of a standard normal variable Z
- Draw a picture to show the desired probability under the standard normal curve
- Find the area under the standard normal curve using Table A

- Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure less than 100?
- P(X<100) = P(Z<(100-120)/20) = P(Z<-1) = .1587
- normalcdf(-1E99,100,120,20) = .1587
- 15.9% of adults have systolic blood pressure less than 100

- Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure greater than 100?
- P(X>100) = 1 – P(X<100)
- P(X>100)= 1-.1587=.8413
- normalcdf(100,1E99,120,20) = .8413
- 84.1% of adults have systolic blood pressure greater than 100

- Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure greater than 133?
- P(X>133) = 1 – P(X<133)
- P(X>133)= 1-.7422=.2578
- normalcdf(133,1E99,120,20) = .2578
- 25.8% of adults have systolic blood pressure greater than 133

- Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure between 100 and 133?
- P(100<X<133) = P(X<133)-P(X<100)
- normalcdf(100,133,120,20) = .5835
- 58% of adults have systolic blood pressure between 100 and 133

- Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What is the 1st quartile?
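- P(X<x)=.25, find x:
- Look up .25 in the body of Table A to find z= -0.67
- Solve equation to find x:
x = µ + zσ = 120 + (-0.67)(20) = 106.6

- Check:
- P(X<106.6) = P(Z<-0.67) = 0.25
- TI calculator: invNorm(.25,120,20) = 106.5 (the calculator uses the unrounded z = -0.6745; the table value z = -0.67 gives 106.6)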

- Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. 10% of adults have systolic blood pressure above what level?
- P(X>x)=.10, find x.
- P(X>x)=1-P(X<x)
- Look up 1-0.1=0.9 in the body of Table A to find z=1.28
- Solve equation to find x:
x = µ + zσ = 120 + 1.28(20) = 145.6

- Check:
- P(X>145.6) = P(Z>1.28) = 0.10
- TI calculator: invNorm(.9,120,20) = 145.6
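The blood-pressure questions above can be worked in one place with a short Python sketch, assuming the stated model (normal with µ = 120 and σ = 20). It uses `statistics.NormalDist` from the standard library rather than Table A, so values may differ slightly from table-based answers.

```python
from statistics import NormalDist

# Adult systolic blood pressure model from the examples.
bp = NormalDist(mu=120, sigma=20)

below_100 = bp.cdf(100)                  # P(X < 100)
above_100 = 1 - bp.cdf(100)              # P(X > 100)
above_133 = 1 - bp.cdf(133)              # P(X > 133)
between = bp.cdf(133) - bp.cdf(100)      # P(100 < X < 133)

first_quartile = bp.inv_cdf(0.25)        # x with P(X < x) = 0.25
top_10_cutoff = bp.inv_cdf(1 - 0.10)     # x with P(X > x) = 0.10

print(round(below_100, 4), round(above_100, 4),
      round(above_133, 4), round(between, 4))
print(round(first_quartile, 1), round(top_10_cutoff, 1))
```

Passing µ and σ to `NormalDist` removes the need to standardize by hand; the percentile questions are the reverse lookup, x = µ + zσ, done in one call.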

Z-scores can be used to compare observations from different normal distributions

- Example:
- You score 650 on the SAT, which has µ = 500 and σ = 100, and 30 on the ACT, which has µ = 21.0 and σ = 4.7. On which test did you perform better?

- Compare z-scores
SAT: z = (650 - 500)/100 = 1.50
ACT: z = (30 - 21.0)/4.7 = 1.91

- Since your z-score is greater for the ACT, you performed better on this exam
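The SAT versus ACT comparison can be sketched in Python using the means and standard deviations given in the example. The helper name `z_score` is an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    return (x - mu) / sigma


z_sat = z_score(650, mu=500, sigma=100)
z_act = z_score(30, mu=21.0, sigma=4.7)

better = "ACT" if z_act > z_sat else "SAT"
print(round(z_sat, 2), round(z_act, 2), better)
```

Because z-scores are unit-free, scores on the two tests become directly comparable even though the raw scales differ.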


Section 6.3: How Can We Find Probabilities When Each Observation Has Two Possible Outcomes?

- The Binomial Distribution
- Conditions for a Binomial Distribution
- Probabilities for a Binomial Distribution
- Factorials
- Examples using Binomial Distribution
- Do the Binomial Conditions Apply?
- Mean and Standard Deviation of the Binomial Distribution
- Normal Approximation to the Binomial

- Each observation is binary: it has one of two possible outcomes.
- Examples:
- Accept, or decline an offer from a bank for a credit card.
- Have, or do not have, health insurance.
- Vote yes or no on a referendum.

- Each of n trials has two possible outcomes: “success” or “failure”.
- Each trial has the same probability of success, denoted by p.
- The n trials are independent.
- The binomial random variable X is the number of successes in the n trials.

- Denote the probability of success on a trial by p.
- For n independent trials, the probability of x successes equals:
P(x) = [n! / (x!(n-x)!)] p^x (1-p)^(n-x), x = 0, 1, 2, …, n

Rules for factorials:

- n!=n*(n-1)*(n-2)…2*1
- 1!=1
- 0!=1
For example,

- 4!=4*3*2*1=24

- John Doe claims to possess ESP.
- An experiment is conducted:
- A person in one room picks one of the integers 1, 2, 3, 4, 5 at random.
- In another room, John Doe identifies the number he believes was picked.
- Three trials are performed for the experiment.
- Doe got the correct answer twice.

If John Doe does not actually have ESP and is actually guessing the number, what is the probability that he’d make a correct guess on two of the three trials?

- The three ways John Doe could make two correct guesses in three trials are: SSF, SFS, and FSS.
- Each of these has probability: (0.2)²(0.8) = 0.032.
- The total probability of two correct guesses is 3(0.032)=0.096.

- The probability of exactly 2 correct guesses is the binomial probability with n = 3 trials, x = 2 correct guesses and p = 0.2 probability of a correct guess.

2nd VARS

0:binompdf(n,p,x)

binompdf(3,.2,2) = 0.096
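The ESP calculation can be reproduced with a short Python sketch of the standard binomial probability formula, P(x) = [n!/(x!(n−x)!)] p^x (1−p)^(n−x). The function name `binomial_pmf` is an illustrative choice; `math.comb` from the standard library supplies the count of orderings (SSF, SFS, FSS).

```python
import math


def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n independent trials."""
    ways = math.comb(n, x)               # n! / (x! (n - x)!)
    return ways * p**x * (1 - p) ** (n - x)


# John Doe: n = 3 trials, p = 0.2 chance of a correct guess, x = 2 correct.
ways_two_of_three = math.comb(3, 2)
p_two_correct = binomial_pmf(3, 0.2, 2)

print(ways_two_of_three, round(p_two_correct, 3))
```

The argument order (n, p, x) follows the calculator function so the two can be compared directly.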

- 1000 employees, 50% Female
- None of the 10 employees chosen for management training were female.

- The probability that no females are chosen is:
- binompdf(10,.5,0) = 9.765625E-4 ≈ 0.001
- It is very unlikely (about one chance in a thousand) that none of the 10 selected for management training would be female if the employees were chosen randomly.

- Before using the binomial distribution, check that its three conditions apply:
- Binary data (success or failure).
- The same probability of success for each trial (denoted by p).
- Independent trials.

- The data are binary (male, female).
- If employees are selected randomly, the probability of selecting a female on a given trial is 0.50.
- With random sampling of 10 employees from a large population, the outcome of one trial does not depend on the outcome of another trial.

- The binomial probability distribution for n trials with probability p of success on each trial has mean µ and standard deviation σ given by:
µ = np, σ = √(np(1-p))

- Data:
- 262 police car stops in Philadelphia in 1997.
- 207 of the drivers stopped were African-American.
- In 1997, Philadelphia’s population was 42.2% African-American.
- Does the number of African-Americans stopped suggest possible bias, being higher than we would expect (other things being equal, such as the rate of violating traffic laws)?

- Assume:
- 262 car stops represent n = 262 trials.
- Successive police car stops are independent.
- P(driver is African-American) is p = 0.422.

- Calculate the mean and standard deviation of this binomial distribution:
µ = np = 262(0.422) ≈ 110.6
σ = √(np(1-p)) = √(262(0.422)(0.578)) ≈ 8.0

- Recall: Empirical Rule
- When a distribution is bell-shaped, close to 100% of the observations fall within 3 standard deviations of the mean.

- If there is no racial profiling, we would not be surprised if between about 87 and 135 of the 262 drivers stopped were African-American.
- The actual number stopped (207) is well above these values.
- The number of African-Americans stopped is too high, even taking into account random variation.
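The reasoning in this example can be sketched in Python under the stated assumptions (n = 262 independent stops, p = 0.422), using the standard binomial results µ = np and σ = √(np(1−p)). The variable names are illustrative.

```python
import math

n, p = 262, 0.422        # number of stops, assumed probability per stop
observed = 207           # African-American drivers actually stopped

mu = n * p                               # expected count under the assumptions
sigma = math.sqrt(n * p * (1 - p))       # binomial standard deviation

# Nearly all outcomes of a bell-shaped distribution fall within 3 sigma.
low, high = mu - 3 * sigma, mu + 3 * sigma

# How many standard deviations above the mean is the observed count?
z_observed = (observed - mu) / sigma

print(round(mu, 1), round(sigma, 1), round(low), round(high), round(z_observed, 1))
```

The observed count sits far outside the three-standard-deviation range, which is what the conclusion above relies on; the limitation about the assumed p still applies.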

- Limitation of the analysis:
- Different people do different amounts of driving, so we don’t really know that 42.2% of the potential stops were African-American.

- The binomial distribution can be well approximated by the normal distribution when the expected number of successes, np, and the expected number of failures, n(1-p) are both at least 15.
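The rule of thumb above can be sketched as a small Python check. The function name `normal_approx_ok` and the `threshold` parameter are illustrative; the three (n, p) pairs are taken from the examples in this section.

```python
def normal_approx_ok(n, p, threshold=15):
    """True when np and n(1-p) are both at least the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


police_stops = normal_approx_ok(262, 0.422)   # np = 110.6, n(1-p) = 151.4
esp_trials = normal_approx_ok(3, 0.2)         # np = 0.6, far too few trials
training = normal_approx_ok(10, 0.5)          # np = 5, still below 15

print(police_stops, esp_trials, training)
```

Only the police-stop example meets the condition, which supports using the bell-shaped reasoning there but not for the small-n examples.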