Understanding Probability Distributions: Normal Distribution & Z-scores

Recitation 3

The Normal Distribution

Probability Distributions
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.

Distributions fit with different types of variables:
Discrete variables: take on a countable number of values
-the number of job classifications in an agency
-the number of employees in a department
-the number of training sessions
Continuous variables: take on an uncountable (or very large) range of numerical values
-temperature
-pressure
-height, weight, time
-dollars (budgets, income): not strictly continuous, but they can take so many closely spaced values that you may as well treat them as continuous

Visualized Discrete vs. Continuous
With a discrete variable you can go to the histogram and ask "what is the probability of a 3?" and read off the answer, .4. With a continuous variable, the probability of any single exact value is effectively zero, so you have to ask about a range, such as between 1.1 and 2.9, and take the area under the curve over that range (integrate).
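The contrast can be sketched in Python. The discrete table below is hypothetical (only P(3) = .4 comes from the slide), and the continuous variable is assumed to be normal with mean 2 and SD 1 purely for illustration.

```python
import math

# Hypothetical discrete distribution; only P(3) = 0.4 is taken from the slide.
pmf = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1}

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    """Area under the normal curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Discrete: read the probability of a single value straight off the table.
p_three = pmf[3]

# Continuous: ask about a range and take the area (assumed mean 2, SD 1).
p_range = prob_between(1.1, 2.9, mu=2.0, sigma=1.0)

print(f"P(X = 3) = {p_three}")
print(f"P(1.1 < X < 2.9) = {p_range:.4f}")
```

The discrete answer is a lookup; the continuous answer is a difference of two cumulative areas, which is what integrating over the range amounts to.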

Real life normal distribution

The Normal Distribution Characteristics
-continuous variables only
-the bell curve shape is familiar
-most values cluster around the mean mu
-as values fall at a greater distance from the mean, their likelihood of occurring shrinks
-its shape is completely determined by its mean and its standard deviation
-the height of the curve is greatest at the mean (where probability of occurrence is highest)

-68.26% of values fall within 1 standard deviation of the mean in either direction
-95.44% of values fall within 2 standard deviations of the mean in either direction
-99.74% of values fall within 3 standard deviations of the mean in either direction
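These shares can be checked numerically. The sketch below uses only Python's standard library; the function name is an illustrative choice. The exact values (68.27%, 95.45%, 99.73%) differ in the last digit from the slide figures, which come from doubling four-digit z table entries.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
```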

z scores
• The number of standard deviations a score of interest lies away from the mean in a normal distribution
• It is used to convert raw data into their associated probability of occurrence with reference to the mean
• The score we are interested in is X. To find the z score of X, subtract the mean mu from it, then divide by the standard deviation sigma: z = (X - mu) / sigma. The result is how many SD’s the score is from the mean
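As a minimal sketch, the conversion is one line of arithmetic. The function name and the sample scores below are illustrative, not from the slides.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Illustrative values: a distribution with mean 100 and SD 10.
print(z_score(110, 100, 10))  # one SD above the mean
print(z_score(85, 100, 10))   # one and a half SDs below the mean
```

A positive result means the score is above the mean, a negative result means it is below.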

z scores
• The z score itself equals the number of SD’s (sigma) that a score of interest (X) is from the mean (mu) in a normal distribution
• A data value X one standard deviation above the mean has a z score of 1
• A data value X 2 SD’s above the mean has a z score of 2
• The probability associated with a z score of 1 is 0.3413 (see the blue oval): (68.26/2)=34.13% of the data values lie between the mean and 1 SD above it

z scores
• The area for 1 SD below the mean is the same in magnitude (0.3413), but the z score is negative: -1.0
• Thus, the region between the mean and a z score of -1.0 contains 34.13% of the data
• i.e. just over one third of the data fall between mu and 1 SD below it

Example:
• What is the likelihood that a value falls between the mean and a z score of 2.0?
It is equal to 95.44/2=47.72% (meaning just over 47% of the data fall between mu and 2 SD above it)

The normal distribution table
• Displays the proportion of data values falling between the mean mu and each z score
• The first 2 digits of the z score are in the far left column
• The third digit is on the top row
• The associated probability is where they meet

Locating z score for 1.0

Example
What percent of the data lies between mu and 1.33 SD above it?

Locate 1.33 on the z table (row 1.3, column .03)
The answer is that 40.824% of the data fall between the mean and 1.33 SD from the mean
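The table lookups so far can be reproduced without a printed table. This is a sketch using the standard library's error function; the function name is an illustrative choice.

```python
import math

def area_mean_to_z(z):
    """Proportion of a normal distribution between the mean and a given z score.

    The curve is symmetric, so the sign of z does not change the area.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

# The three z scores used in the slides so far.
for z in (1.0, 1.33, 2.0):
    print(f"z = {z}: area from mean = {area_mean_to_z(z):.4f}")
```

Rounded to four places, these match the z table entries used above (.3413, .4082, .4772).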

Application Example: The police chief is reviewing the academy’s exam scores. The police department’s entrance exam has a normal distribution with a mean of 100 and SD of 10. Someone scored 119.2 on the exam. Is this a good score?

Solution
-Another way of asking this is: what is the probability that a random applicant takes the test and scores 119.2 or better?
-If the probability is high, then it is an average or mediocre score; if the probability is low, then it is an exceptional score
-Step 1: convert the test score to a z score using the formula: (119.2-100)/10=1.92

Solution
-Step 2: Look up the z score of 1.92 in the z table. (The z score is how many standard deviations the score is from the mean; it is positive, so the score is above the mean.)

Solution
-Step 3: The value here is .4726
-But you’re not done. What you just found is the area between the mean and the score (.4726); the half of the curve below the mean is another .50
-We also need to add in the part of the curve shaded in green, or all of the scores under the mean: 0.5+0.4726=0.9726. So 97.26 is the percentile, or in other words, 97.26% of the scores fall below this score
-The probability that a randomly selected individual will get this score or better is 1-0.9726=.0274
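The three steps can be sketched end to end in Python. The mean, SD, and score come from the example; the helper function and variable names are illustrative.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mean, sd, score = 100.0, 10.0, 119.2

z = (score - mean) / sd                 # Step 1: convert to a z score
below = normal_cdf(score, mean, sd)     # Steps 2-3: 0.5 plus the table area
at_or_above = 1.0 - below               # chance of this score or better

print(f"z score: {z:.2f}")
print(f"share of scores below: {below:.4f}")
print(f"probability of this score or better: {at_or_above:.4f}")
```

The cumulative function already includes the lower half of the curve, so there is no separate step of adding 0.5.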

Tips
• Always draw a picture; it helps you reason through your answer
• The z curve is symmetric, so if your z score were -1.92, the region between it and the mean would still contain ~47.3% of the data.

The Binomial Distribution The last section, I promise.

A gem from the reading


Binomial Distribution Definition
• The probability an event will occur a specified number of times within a specified number of trials
• Examples:
• mail will be delivered before a certain time every day this week
• equipment in a factory remains operational in a 10 day period
• This is a DISCRETE distribution that deals with the likelihood of observing a certain number of events in a set number of repeated trials

A Bernoulli process
• The Binomial distribution can be used when the process is Bernoulli
• Bernoulli characteristics:
• The outcome of a trial is either a success or a failure
• The outcomes are mutually exclusive
• The probability of a success is constant from trial to trial
• One trial’s probability of success is not affected by the trial before it (INDEPENDENCE)
• Examples of independent events: multiple coin tosses; a fire occurring in a community isn’t affected by whether one happened the night before

When looking at Bernoulli Events
• You can calculate their probability with the binomial distribution
• Examples of Bernoulli events:
• A coin flip is either heads or not heads
• A crime is either solved or not solved
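A Bernoulli process is easy to simulate, which can make the two key properties (constant probability, independent trials) concrete. This is a sketch only; the seed and the number of trials are arbitrary choices, not values from the slides.

```python
import random

def bernoulli_trials(n, p, seed=0):
    """Simulate n independent trials, each a success with the same probability p."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    # Each draw ignores all earlier draws, so the trials are independent.
    return [rng.random() < p for _ in range(n)]

# Coin flips: success means heads, with p = 0.5 on every flip.
flips = bernoulli_trials(n=10_000, p=0.5)
share_heads = sum(flips) / len(flips)

print(f"share of heads in {len(flips)} flips: {share_heads:.3f}")
```

Over many trials the share of successes settles close to p, which is what a constant per-trial probability implies.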

To calculate a probability using the Binomial Distribution you need
• n=number of trials
• r=number of successes
• p=probability that the event will be a success
• q=(1-p), the probability that the event will be a failure

Breaking down the formula
The binomial formula is: P(r) = C(n, r) * p^r * q^(n-r)
C(n, r) is a combination; it is read, “a combination of n THINGS taken r at a time.” The formula is: C(n, r) = n! / (r! (n-r)!)

Example We flip a coin three times, and we want to know the probability of getting three heads

Step 1
• Define n, p, r, and q
n (number of trials)=3
r (successes)=3 [number of heads]
p (probability of getting a heads on a flip)=0.5
q (1-p)=0.5
Now fill in the formula: (3!/3!0!) * 0.5^3 * 0.5^0 = 0.125

Important when solving
• 0!=1
• Any number raised to the power of 0 = 1
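The coin example can be checked with a short Python sketch. The function name is illustrative; `math.comb` (Python 3.8+) computes the combination n! / (r! (n-r)!).

```python
import math

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n Bernoulli trials with success probability p."""
    q = 1.0 - p
    return math.comb(n, r) * p**r * q**(n - r)

# The two special cases that matter when r equals n or 0.
print(math.factorial(0))  # 0! is 1
print(0.5 ** 0)           # anything to the power 0 is 1

# Three heads in three flips of a fair coin.
print(binomial_probability(n=3, r=3, p=0.5))
```

With n = r = 3 the combination is 1 and the q term is 0.5^0 = 1, so the result reduces to 0.5^3 = 0.125.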

Example 2 A Public works department has been charged with discrimination. Last year, 40% of people who passed the civil service exam were minorities (eligible to be hired). From this group, Public works hired 10 people, and 2 were minorities. What is the probability that if Public works DID NOT discriminate it still would have hired 2 or fewer minorities? (assuming everyone had the same probability of getting hired)

Step 1
• Identify n, p, r, and q
• n (number of trials=number of people hired)=10
• r (successes)=2 [number of hired minorities]
• p (probability that any one hire is a minority=share of minorities in the pool)=0.4
• q (1-p)=0.6

Step 2
Reason through the problem. It asks the likelihood that Public Works hired 2 or fewer minorities. Thus, we need to calculate the binomial probability for 2, 1, and 0 minority hires.

Step 3
Set up the probability calculations. Two minorities:
n (number of trials=number of people hired)=10
r (successes)=2 [number of hired minorities]
p (share of minorities in the pool)=0.4
q (1-p)=0.6
(10!/2!8!) * 0.4^2 * 0.6^8 = 0.12

Formula reminder
Binomial: P(r) = C(n, r) * p^r * q^(n-r)
Combinations: C(n, r) = n! / (r! (n-r)!)

Step 4
Repeat for one minority hired:
n (number of trials=number of people hired)=10
r (successes)=1 [number of hired minorities]
p (share of minorities in the pool)=0.4
q (1-p)=0.6
(10!/1!9!) * 0.4^1 * 0.6^9 = 0.04

Step 5
Repeat for no minorities hired at all:
n (number of trials=number of people hired)=10
r (successes)=0 [number of hired minorities]
p (share of minorities in the pool)=0.4
q (1-p)=0.6
(10!/0!10!) * 0.4^0 * 0.6^10 = 0.006

Step 6
• Add these probabilities together:
• 0.12+0.04+0.006=0.166
• The likelihood of hiring 2 or fewer minorities by chance is about 17% (0.167 if the individual terms are not rounded first)
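The whole calculation can be sketched in a few lines of Python. The values n = 10 and p = 0.4 come from the example; the function names are illustrative.

```python
import math

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n Bernoulli trials with success probability p."""
    return math.comb(n, r) * p**r * (1.0 - p)**(n - r)

def at_most(n, r_max, p):
    """Probability of r_max or fewer successes: the sum of the individual terms."""
    return sum(binomial_probability(n, r, p) for r in range(r_max + 1))

n_hired, p_minority = 10, 0.4

# Individual terms for 2, 1, and 0 minority hires (Steps 3 to 5).
for r in (2, 1, 0):
    print(f"P({r} minority hires) = {binomial_probability(n_hired, r, p_minority):.4f}")

# Step 6: add them together.
total = at_most(n_hired, 2, p_minority)
print(f"P(2 or fewer) = {total:.4f}")
```

Summing the unrounded terms gives a slightly larger total than adding the rounded slide values, which is why the final figure is closer to 17% than 16%.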
