Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) 02-250-01 Lecture 4
A Quick Review • The entire area under the normal curve can be considered to be a proportion of 1.00 • A proportion of .50 lies to the left of the mean, and a proportion of .50 lies to the right of the mean
Area Under the Normal Distribution and Z-Scores Normal Distribution with z-score points of reference:
Properties of Area Under the Normal Distribution • Since the normal curve is a bell shape, the proportion of scores between whole z-scores is not equal • For example, .3413 of the scores lie between the z-scores of 0 (the mean) and 1 (or -1), while only .1359 of the scores lie between the z-scores of 1 and 2 (or -1 and -2)
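Properties of Area Under the Normal Distribution [Figure: normal curve with the z-axis marked at -3, -2, -1, 0, +1, +2, +3] • Area between z = 0 and z = ±1: .3413 on each side • Area between z = ±1 and z = ±2: .1359 on each side • Area between z = ±2 and z = ±3: .0215 on each side • Area beyond z = ±3: .0013 in each tail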
Properties of Area Under the Normal Distribution • Z-scores*: Proportion under the curve • -1 to +1: .6826 (.3413 + .3413) • -2 to +2: .9544 • -3 to +3: .9974 • -4 to +4: 1.0000 • *Z-scores are expressed in standard deviation units, i.e., a z-score of -1 represents one standard deviation below (to the left of) the mean
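The proportions in this table can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`); the helper name `area_between` is made up for illustration. Because the slide values are sums of table entries rounded to four decimals, the computed values can differ from them in the fourth decimal place.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_between(z_low, z_high):
    """Proportion of the area under the standard normal curve between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


# Areas between whole z-scores, and the symmetric ranges from the table.
print(f"0 to 1: {area_between(0, 1):.4f}")
print(f"1 to 2: {area_between(1, 2):.4f}")
for k in (1, 2, 3, 4):
    print(f"-{k} to +{k}: {area_between(-k, k):.4f}")
```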
Normal Distribution Example • A study of 2500 University of Windsor students showed that the average amount of sleep lost in the week prior to writing a statistics exam (in hours) was normally distributed with μ = 7.79 and σ = 1.75 (don’t worry, this isn’t real data!) • This distribution is shown with the abscissa (x-axis) marked in raw score and z-score units:
Normal Distribution Example [Figure: the same normal curve and areas, with the x-axis marked in raw-score and z-score units] • X = 2.54, 4.29, 6.04, 7.79, 9.54, 11.29, 13.04 • Z = -3, -2, -1, 0, +1, +2, +3
Example cont. • We can see from this diagram that 34.13% of U of W students lost between 6.04 and 7.79 hours of sleep in the week prior to a stats test (between z=-1 and z=0) • 13.59% of students lost between 9.54 and 11.29 hours of sleep in that week (between z=+1 and z=+2) • 49.87% of students lost between 2.54 & 7.79 hours of sleep (between z=-3 and z=0) (.0215+.1359+.3413 = .4987 = 49.87%)
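The three percentages above can be reproduced by converting each raw score to a z-score and taking the area between them. This is only a sketch: it assumes Python 3.8 or later, treats 7.79 and 1.75 as the mean and standard deviation of the sleep-loss distribution, and uses hypothetical helper names (`z_score`, `proportion_between`). Computed values may differ from the slide in the fourth decimal because the slide adds rounded table entries.

```python
from statistics import NormalDist

MEAN_HOURS = 7.79  # mean hours of sleep lost (from the slide)
SD_HOURS = 1.75    # standard deviation in hours (from the slide)

sleep_lost = NormalDist(mu=MEAN_HOURS, sigma=SD_HOURS)


def z_score(x, mean=MEAN_HOURS, sd=SD_HOURS):
    """Convert a raw score to standard deviation units."""
    return (x - mean) / sd


def proportion_between(low, high):
    """Proportion of students who lost between `low` and `high` hours of sleep."""
    return sleep_lost.cdf(high) - sleep_lost.cdf(low)


for low, high in [(6.04, 7.79), (9.54, 11.29), (2.54, 7.79)]:
    print(
        f"{low} to {high} hours "
        f"(z = {z_score(low):+.2f} to {z_score(high):+.2f}): "
        f"{proportion_between(low, high):.4f}"
    )
```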
Properties of Area Under the Normal Distribution • The symbol z_α is used to denote the z-score having area α (alpha) to its right under the normal curve • The proportion of area under the curve between the mean and a z-score can be found with the help of a table (Table E.10, Howell, p. 452) and a little math… • In this example, we want to know the area between the mean and z = 0.20: • Look under the column “mean to z” at z = 0.20 • The proportion = .0793 • Therefore, .0793 (or almost 8%) is the proportion of data scores between the mean and the score that has a z-score of 0.20
Example cont. • This means that the region between the mean and z = 0.20 has an area under the curve of .0793: [Figure: normal curve with .0793 shaded between z = 0 and z = 0.20, and .4207 in the tail beyond z = 0.20]
Example cont. • Since half of the normal distribution has an area of .5000, we can determine the area beyond z = .20 by subtracting the area from the mean to z = .20 from .5000: • Area beyond z=.20 = .5000 - .0793 • Area beyond z=.20 = .4207 • (Note: If you look at the “smaller portion” in the table, you will see it’s .4207)
Example cont. • Since the normal curve is symmetrical, the area between the mean and z = -.20 is equal to the area between the mean and z = +.20: [Figure: normal curve with .0793 on each side of the mean between z = -0.20 and z = +0.20, and .4207 in each tail]
Normal Distribution Table • Table E.10 has 3 columns: • Mean to z • Larger portion • Smaller portion
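The three columns can be reproduced numerically for any z-score. The sketch below is an illustration, not a copy of Howell's table: it assumes Python 3.8 or later, and `table_row` is a hypothetical helper name. It takes the larger portion to be the area in the body of the curve and the smaller portion to be the area in the tail, and it relies on symmetry for negative z-scores.

```python
from statistics import NormalDist


def table_row(z):
    """Return (mean_to_z, larger_portion, smaller_portion) for a z-score."""
    z = abs(z)  # the curve is symmetrical, so -z and +z share a row
    larger = NormalDist().cdf(z)  # area in the body of the curve
    smaller = 1.0 - larger        # area in the tail beyond z
    mean_to_z = larger - 0.5      # area between the mean and z
    return mean_to_z, larger, smaller


mean_to_z, larger, smaller = table_row(0.20)
print(f"mean to z: {mean_to_z:.4f}")
print(f"larger portion: {larger:.4f}")
print(f"smaller portion: {smaller:.4f}")
```

Note that the mean-to-z and smaller-portion values for any row add up to .5000, which is the subtraction used in the z = .20 example.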
A Couple of Notes • 1) Always report proportions (area under the curve) to four decimal places. This means that if you report an area as a percentage, it will have two decimal places (e.g., .7943 = 79.43%) • 2) When using Table E.10, be careful not to confuse z=.20 with z=.02 (this is a common mistake) • 3) Remember that a negative z value has the same proportion under the curve as the positive z value because the normal distribution is symmetrical • 4) When working on z-score problems, it is highly recommended that you draw a normal distribution and plot the mean, x, and their corresponding z-scores
Another Example! • We often want to know what the area between two scores is, as in this example: • Assume that the marks in this class are normally distributed with μ = 69.5 and σ = 7.4. What proportion of students have marks between 50 and 80?
Example: Area Between 2 Scores • 1) Calculate the z-scores for the X values (50 & 80): z = (50 - 69.5)/7.4 = -19.5/7.4 = -2.64 and z = (80 - 69.5)/7.4 = 10.5/7.4 = 1.42 • 2) Find the proportions between the mean and both z-scores (consult Table E.10): for z = -2.64, the proportion between the mean and z is .4959; for z = 1.42, the proportion between the mean and z is .4222
Example: Area Between 2 Scores • 3) Because the two scores lie on opposite sides of the mean, add these proportions together to find your answer: .4959 + .4222 = .9181 • This means that 91.81% of students have Stats marks between 50 and 80
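The three steps can be sketched in code. This is a hedged illustration assuming Python 3.8 or later and taking 69.5 and 7.4 as the mean and standard deviation of the marks; `mean_to_z` is a hypothetical helper that mimics the table's "mean to z" column. The z-scores are rounded to two decimals to match a table lookup, and adding the two areas is valid only because the scores fall on opposite sides of the mean.

```python
from statistics import NormalDist

MEAN_MARK = 69.5  # class mean (from the slide)
SD_MARK = 7.4     # standard deviation (from the slide)


def mean_to_z(z):
    """Area between the mean and z, like the table's 'mean to z' column."""
    return abs(NormalDist().cdf(z) - 0.5)


# Step 1: z-scores, rounded to two decimals as in a table lookup.
z_low = round((50 - MEAN_MARK) / SD_MARK, 2)
z_high = round((80 - MEAN_MARK) / SD_MARK, 2)

# Steps 2 and 3: look up each area, then add them.
proportion = mean_to_z(z_low) + mean_to_z(z_high)
print(f"z = {z_low} and z = {z_high}, proportion between: {proportion:.4f}")
```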
Smaller and Larger Portions • Smaller portion = proportion in the tail • Larger portion = proportion in the body • Using the same data (μ = 69.5 and σ = 7.4) we can calculate areas using the Smaller and Larger Portion columns in the Normal Distribution table: • Find the proportion of students who have stats marks of less than 80.6 • z = (80.6 - 69.5)/7.4 = +1.50
Larger Portion • Area below z = +1.5 = 0.9332 • This means that 93.32% of students had a mark of 80.6 or less in this class
Smaller Portion • Find the proportion of students who have marks of 76.93 or better: • z = (76.93 - 69.5)/7.4 = 1.00 • Area in smaller portion = .1587 • This means that 15.87% of students in this class had a mark of 76.93 or better
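Both portion lookups can be sketched as follows, assuming Python 3.8 or later and a mean of 69.5 with a standard deviation of 7.4 for the marks. The function names are hypothetical, and the z-scores are rounded to two decimals so the results line up with a printed table rather than with an exact calculation.

```python
from statistics import NormalDist

MEAN_MARK = 69.5
SD_MARK = 7.4


def table_z(x, mean=MEAN_MARK, sd=SD_MARK):
    """z-score rounded to two decimals, as it would be looked up in a table."""
    return round((x - mean) / sd, 2)


def proportion_below(x):
    """Area to the left of x (the larger portion when x is above the mean)."""
    return NormalDist().cdf(table_z(x))


def proportion_at_or_above(x):
    """Area to the right of x (the smaller portion when x is above the mean)."""
    return 1.0 - NormalDist().cdf(table_z(x))


print(f"Below 80.6: {proportion_below(80.6):.4f}")
print(f"76.93 or better: {proportion_at_or_above(76.93):.4f}")
```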
Converting Back to X • Assume μ = 30 and σ = 5. What raw scores correspond to z = -1.00 and z = +1.50? • X = zσ + μ, so X = (-1.00)(5) + 30 = 25 and X = (+1.50)(5) + 30 = 37.5
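Converting a z-score back to a raw score reverses the z-score formula: X = zσ + μ, the same form used later in the birth-weight example. A minimal sketch, assuming the two values on this slide are a mean of 30 and a standard deviation of 5; `raw_score` is a hypothetical helper name.

```python
def raw_score(z, mean, sd):
    """Convert a z-score back to a raw score: X = z * sd + mean."""
    return z * sd + mean


print(raw_score(-1.00, mean=30, sd=5))  # 25.0
print(raw_score(1.50, mean=30, sd=5))   # 37.5
```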
Proportion • What proportion of scores lie between z=-1.00 and z=+1.50? • Area from mean to z=-1.00 = .3413 • Area from mean to z=+1.50 = .4332 • Add them together to get the proportion that lies between these two z-scores: .3413+.4332 = .7745
Finding the Number of Observations • In this example, if we know the sample size (e.g., n = 212), we can calculate how many people lie between z = -1.00 and z = +1.50: • Area between z = -1.00 and z = +1.50 = .7745 (see the last slide) • Multiply the proportion by n: (.7745)(212) = 164.19 • Approximately 164 people
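The multiplication can be sketched in a few lines, assuming Python 3.8 or later. The variable names are illustrative, and the result is an expected count, so it is rounded to a whole number of people at the end.

```python
from statistics import NormalDist

SAMPLE_SIZE = 212  # n from the slide

standard_normal = NormalDist()

# Proportion of scores between z = -1.00 and z = +1.50.
proportion = standard_normal.cdf(1.50) - standard_normal.cdf(-1.00)

# Expected number of people in that range.
expected_count = proportion * SAMPLE_SIZE

print(f"Proportion: {proportion:.4f}")
print(f"Expected count: {expected_count:.2f} (about {round(expected_count)} people)")
```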
And a Little More • Finally, we can find a z-score from the table if we know the proportion of scores (i.e., we can work backwards): • Suppose the birth weight of newborns (in pounds) is normally distributed with μ = 7.73 and σ = 0.83 • What birth weight identifies the top (heaviest) 10% of newborns?
Example cont. • Look at Table E.10 and find the z-score that identifies the top proportion of .1000: look in the smaller portion column (the tail) [Figure: normal curve with the upper tail of area .1000 shaded, beyond an unknown z = ?]
Example cont. • Looking in the smaller portion column, we find that • z=1.28 has an area of .1003 • z=1.29 has an area of .0985 • Which do we pick? • Pick the one that is closest to an area of .1000: this is z=1.28
Example cont. • Now solve for X: X = (1.28)(0.83) + 7.73 = 1.06 + 7.73 = 8.79 • So any weight equal to or greater than 8.79 pounds is in the top 10% of birth weights
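Working backwards from a proportion to a score can also be done with an inverse lookup instead of scanning the table. The sketch below assumes Python 3.8 or later and a mean of 7.73 pounds with a standard deviation of 0.83. It uses `NormalDist.inv_cdf`, which returns an unrounded z-score, so the cut-off can differ slightly from the table-based 8.79 before rounding.

```python
from statistics import NormalDist

MEAN_WEIGHT = 7.73  # pounds (from the slide)
SD_WEIGHT = 0.83    # pounds (from the slide)
TOP_PROPORTION = 0.10

# The z-score with .1000 in the tail has .9000 of the area below it.
z_cutoff = NormalDist().inv_cdf(1.0 - TOP_PROPORTION)

# Convert back to a raw score: X = z * sd + mean.
weight_cutoff = z_cutoff * SD_WEIGHT + MEAN_WEIGHT

print(f"z = {z_cutoff:.2f}")
print(f"Cut-off weight = {weight_cutoff:.2f} pounds")
```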
Probability • Everything that can possibly happen has some likelihood of happening: probability is a measure of that likelihood • Probability: The quantitative expression of likelihood of occurrence
Probability • Probability is a ratio of frequencies • The numerator (top) is the frequency of the outcome of interest • The denominator (bottom) is the frequency of all possible outcomes
Coin Toss Example • If a fair* coin is tossed in the air, it can land on either heads or tails • This means a coin has 2 possible outcomes • If we want to know the probability of tossing a fair* coin and having it land on heads, we calculate as follows: *Note: fair means a normal coin, one that is not weighted differently
Coin Toss • Probability = (frequency of interest) / (frequency of all possible outcomes) • For a coin toss, this is 1/2 • The probability of the coin landing on heads is: p(heads) = ½, or p(heads) = .5
Another Example • Suppose there are 90 students in a class, 59 of them are women and 31 are men • If one of the students is chosen at random, the probability of choosing a woman is: p(woman) = 59/90
More Probability • If the entire class was women (i.e., there were no male students), the probability of choosing a woman would be 90/90 = 1.00 • If the entire class was men, the probability of choosing a woman would be 0/90 = 0.00
More Probability • As a numerical value, probabilities can range from 0.00 to 1.00 • The numerator can range from a minimum of 0 to a maximum equal to the denominator
Express Yourself! • Probability can be expressed as a fraction, e.g., p(woman) = 59/90 • Or as a decimal fraction: p(woman) = .6556 • Probabilities are not usually expressed as percentages (e.g., 65.56%), although they often are in popular media
Probability cont. • Even if we do not know the actual observed frequencies (e.g., the number of women), probabilities can be determined theoretically • Without throwing a die, we can deduce the probability of landing on a 5
Die Example cont. • We know the die has 6 sides - 6 possible outcomes • We are only interested in one side (the 5), so the probability of landing on a 5 is: p(5) = 1/6 = 0.1667
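The ratio definition of probability used in the coin, classroom, and die examples can be sketched with exact fractions. The function name `probability` is hypothetical; the range check mirrors the rule that the numerator runs from 0 up to the denominator.

```python
from fractions import Fraction


def probability(frequency_of_interest, all_possible_outcomes):
    """Probability as a ratio of frequencies, kept as an exact fraction."""
    if all_possible_outcomes <= 0:
        raise ValueError("there must be at least one possible outcome")
    if not 0 <= frequency_of_interest <= all_possible_outcomes:
        raise ValueError("numerator must be between 0 and the denominator")
    return Fraction(frequency_of_interest, all_possible_outcomes)


p_heads = probability(1, 2)    # fair coin
p_woman = probability(59, 90)  # 59 women in a class of 90
p_five = probability(1, 6)     # one face of a six-sided die

for label, p in [("heads", p_heads), ("woman", p_woman), ("5", p_five)]:
    print(f"p({label}) = {p} = {float(p):.4f}")
```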
Probability and the Normal Distribution • The normal distribution can be thought of as a probability distribution. Here’s how: • We know (from Table E.10) the proportion of scores that fall above or below a given z score • If you were to randomly pick a score from a sample of scores, what is the probability that you would pick a score that has a corresponding z score of .40 or greater?
Probability and the Normal Distribution • The proportion of scores above or below a given z score is the same as the probability of selecting a score above or below the z score • e.g., the probability of selecting a score from a normal distribution that has a z score of .40 or greater is .3446 (the area in the smaller portion of z = .40)
Example #1 • Suppose people’s scores on a personality test are normally distributed with a mean of 50 and a population standard deviation of 10. • If you were to pick a person completely at random, what is the probability that you would pick someone with a score on this personality test that is higher than 60?
Example #1 • Step #1: Write down what you know • Step #2: What do you want to find? • Step #3: Draw the normal distribution, write in the mean, standard deviation, and the X and shade the area you are looking for
Example #1, Step #3 [Figure: normal curve with the x-axis marked X: 20 30 40 50 60 70 80, and the area above X = 60 shaded]
Example #1 • Step #4: Calculate z score(s): z = (X - μ)/σ = (60 - 50)/10 = 1.00 • Step #5: Use Table E.10 to find the probability of selecting a score in your shaded area • Here we want p(X > 60), or equivalently p(z > 1.00) • Look up the smaller portion of z = 1.00
Example #1 • Step #6: Interpret: • The probability of picking someone at random who has a personality test score of 60 or greater is .1587
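Steps 4 to 6 can be sketched as a small function, assuming Python 3.8 or later; `probability_above` is a hypothetical name. Because the normal distribution is continuous, "higher than 60" and "60 or greater" give the same probability. The last line repeats the earlier z = .40 illustration on the standard normal curve.

```python
from statistics import NormalDist


def probability_above(x, mean, sd):
    """Probability of randomly selecting a score above x from a normal distribution."""
    z = (x - mean) / sd              # Step 4: convert X to a z-score
    return 1.0 - NormalDist().cdf(z)  # Step 5: area in the tail beyond z


# Personality test: mean 50, population standard deviation 10.
p_over_60 = probability_above(60, mean=50, sd=10)
print(f"p(X > 60) = {p_over_60:.4f}")

# Earlier illustration: a z-score of .40 or greater.
p_z_over_040 = probability_above(0.40, mean=0, sd=1)
print(f"p(z > .40) = {p_z_over_040:.4f}")
```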
Example #2 • Length of time spent waiting in line to buy tickets at the movies is normally distributed with a mean of 12 minutes and a population standard deviation of 3 minutes. • If you go to see a movie, what is the probability that you will wait in line to buy tickets for between 7.5 and 15 minutes?
Example #2 • Step #1: Write down what you know • Step #2: What do you want to find? • Step #3: Draw the normal distribution, write in the mean, standard deviation, and both X scores and shade the area you are looking for