More About Confidence Intervals

Presentation 10 More About Confidence Intervals

Types of CIs in Chapter 12
A. 1 mean
B. Difference between 2 independent means
C. Difference between 2 paired means
D. Difference between 2 proportions
In cases B, C and D we are interested in comparing 2 populations with regard to a parameter. There are two possible ways to get the samples from the two populations:
• Independent samples: the data from one sample tell us nothing about the data in the other sample (cases B and D).
• Paired data: a natural pairing exists between the two samples, e.g. "before and after" studies, studies on twins, etc. (case C).
The basic formula for a CI remains the same: Estimate ± Multiplier × Standard Error of the Estimate

Recognize the Situation
• The biggest challenge that most of you face at this point is reading a problem and deciding which kind of confidence interval is required. So, I will make it very clear how to do so, and then we will get some practice.
• First, identify the response variable and determine what type of variable (categorical or quantitative) it is.
• If it is categorical, we are dealing with proportions. From there, you should be able to determine whether we are looking at just one proportion or at the difference between two proportions.
• If the variable of interest is quantitative, we are dealing with means. If it is just one mean, you are all set. If we are looking at the difference between two means, you need to determine whether the samples are paired or independent.
• Recognizing what you need to do is half the battle. Once you have accomplished that, it is just a matter of putting the right pieces together. Every confidence interval requires a sample estimate, a multiplier, and a standard error, and you should have the right formulas written down for each type of CI. Once you have made the correct diagnosis, just plug in and have fun with the calculations!
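The decision steps above can be written as a small lookup. This is a minimal sketch: the function name `choose_ci` and its arguments are hypothetical, and it assumes you have already decided the response variable's type, the number of populations, and (for two means) whether the data are paired.

```python
def choose_ci(response_type: str, n_groups: int, paired: bool = False) -> str:
    """Suggest a CI type by following the 'recognize the situation' steps.

    response_type: "categorical" or "quantitative" (the response variable's type)
    n_groups:      1 for a single population, 2 when comparing two populations
    paired:        only used when comparing two quantitative samples
    """
    if n_groups not in (1, 2):
        raise ValueError("n_groups must be 1 or 2")
    if response_type == "categorical":
        # Categorical response: we are dealing with proportions.
        return "1 proportion" if n_groups == 1 else "difference between 2 proportions"
    if response_type == "quantitative":
        # Quantitative response: we are dealing with means.
        if n_groups == 1:
            return "1 mean"
        # Two means: decide whether the samples are paired or independent.
        if paired:
            return "difference between 2 paired means"
        return "difference between 2 independent means"
    raise ValueError("response_type must be 'categorical' or 'quantitative'")


# Blue eyes, men vs. women: categorical response, two groups.
print(choose_ci("categorical", 2))  # difference between 2 proportions
```

The hard part, deciding the variable type and whether samples are paired, is still your judgment; the sketch only encodes what follows from it.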

Data Table Based on Each Observation
• Example 1: John records the number of blue-eyed individuals in a sample of 60 men and 60 women. Construct an appropriate confidence interval for the difference between men and women with respect to blue eyes.
Think about what variables are recorded for each subject. In this case we have gender and eye color for each subject, both of which are categorical variables. When we want to compare a categorical response variable (blue eyes) across the 2 levels of a categorical predictor variable (gender), we want a confidence interval for the difference between 2 proportions.
Note: Means would make NO SENSE here. You can't take the mean of a categorical variable!
Construct a 95% CI for the difference in the proportion of men and women who have blue eyes.

Example 2: John recorded the heights of 50 randomly chosen redwood trees in State College. He is interested in estimating the average height of redwood trees in State College. The data consist of a single quantitative variable (height) measured for each tree. An appropriate CI might be a 95% CI for the mean height of redwood trees. That is a CI for 1 mean.
Note: If height had been replaced by a categorical variable (e.g. tree taller than 200 ft: Yes/No), then a confidence interval for 1 proportion would have been appropriate.

Examples: Independent vs. Paired Data
Independent data: Occur when the observations are not related in any way. For example, take a random sample of 50 males and 50 females and record their SAT scores. The scores of the first female and the first male are NOT related. The observations are independent.
Paired data: Occur when the observations are paired. For example, select 50 random subjects to participate in a diet study and record their weights before and after. The weight before is paired with the weight after for each individual. Paired data occur when there are either repeated measurements on the same unit (e.g. before and after some treatment) or units that are naturally paired (e.g. twins, husband and wife, etc.).

Structure of Paired and Independent Data
• Independent data: A random sample of 400 apples is taken off the shelf at a grocery store. The apples are classified as yellow or red, and the amount of vitamin C in each apple is recorded. What type of CI makes sense here? A CI for the difference in the mean amount of vitamin C between yellow and red apples. That is a CI for the difference between 2 independent means.

Structure of Paired and Independent Data
• Paired data: A random sample of 200 patients is administered a new cholesterol drug. The patients' cholesterol is recorded before and after taking the drug. What type of CI makes sense here? A CI for the mean decrease in cholesterol. That is a CI for 1 mean based on the pairwise differences (decrease in cholesterol).

Practice…
• Twenty-five people have their blood pressure measured in the morning and again in the afternoon. The data will be used to determine whether blood pressure increases during the day. (Independent / Paired)
• What is the difference in average ages at which teachers and plumbers retire? (Independent / Paired)
• A sample of 100 students at a university was asked how many hours a week they spent studying and how many they spent socializing. The difference was computed for each student. (Independent / Paired)
• What is the difference in average salaries for high school graduates and college graduates? (Independent / Paired)
• Students are asked their actual weight and their ideal weight in order to determine how far they are from their "goal". (Independent / Paired)

General Format of a CI
• In Chapter 10 we saw how to create a confidence interval for a proportion. Recall that a β% C.I. for a population proportion p is
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where $\hat{p}$ is the sample proportion (the statistic), n is the sample size, and the z* multiplier depends on the desired confidence level, β%, and is obtained from the standard normal tables. More specifically, z* is such that P(-z*<Z<z*) = β%.
• In general, the format of a CI for a parameter is
Sample Estimate ± Multiplier × Standard Error of the Sample Estimate
• In the following, we will see what the appropriate sample statistic is, what its standard error is, and how to obtain the multiplier for each of the situations.
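As a sketch of the general format, the snippet below computes z* directly instead of reading it from a table, using `statistics.NormalDist` from Python's standard library (3.8+). The helper names `z_star`, `general_ci` and `one_proportion_ci` are illustrative, not from the course text.

```python
from statistics import NormalDist
from typing import Tuple


def z_star(level: float) -> float:
    """Return z* such that P(-z* < Z < z*) = level for a standard normal Z."""
    # A central area of `level` leaves (1 - level) / 2 in each tail,
    # so z* is the (1 + level) / 2 quantile of the standard normal.
    return NormalDist().inv_cdf((1 + level) / 2)


def general_ci(estimate: float, multiplier: float, std_error: float) -> Tuple[float, float]:
    """Sample Estimate ± Multiplier x Standard Error of the Sample Estimate."""
    margin = multiplier * std_error
    return (estimate - margin, estimate + margin)


def one_proportion_ci(successes: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """β% CI for a population proportion p: p_hat ± z* sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    std_error = (p_hat * (1 - p_hat) / n) ** 0.5
    return general_ci(p_hat, z_star(level), std_error)


print(round(z_star(0.95), 2))  # 1.96
print(round(z_star(0.99), 2))  # 2.58
```

The 99% value agrees with the z* = 2.58 used later for the difference between two proportions.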

CI for One Mean
• Here is the case where we want to make inference about the population mean µ of a quantitative random variable.
• The sample statistic used in this case is the sample mean $\bar{x}$.
• The standard error of the sample mean is $\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}$, where s is the sample standard deviation and n is the sample size.
• It remains to specify the "Multiplier" in the general form of a CI. To do so we need to introduce some further distribution theory.
• In Chapter 9 we saw that if we have a sample from a population with mean µ and standard deviation σ, then under some conditions $\bar{x}$ is (approximately) normal with mean µ and standard deviation σ/√n. Equivalently,
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
has (approximately) a standard normal distribution.
• If σ were known, we could use this result to create a CI for µ. However, usually this is not the case.

CI for One Mean
• Replacing σ with s, we have that
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
has (approximately) a t-distribution with n − 1 degrees of freedom if one of the following conditions is true:
  - the distribution of the random variable of interest is bell-shaped (in practice, for small samples the data should show no extreme skewness or outliers);
  - the random variable is not bell-shaped, but a large random sample is measured, n ≥ 30.
• Some properties of the t-distribution:
  - There are infinitely many t-distributions, each characterized by one parameter, the degrees of freedom (df).
  - The degrees of freedom are positive integers, e.g. 1, 2, …
  - Random variables with a t-distribution are continuous.
  - The density curve of a t-distribution is symmetric, bell-shaped and centered at zero (similar to the standard normal curve).
  - As the degrees of freedom increase, the variance of the t random variable decreases, i.e. the density curve is less spread out, and it approaches the standard normal density. (This implies that the density curve of a t-distribution is more spread out than the standard normal curve.)
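To see the last property numerically, this sketch evaluates the standard t density formula with `math.gamma` and compares it with the standard normal density at the center (x = 0) and in the tail (x = 3). The function names are illustrative; `math.gamma` overflows for very large df (roughly above 340), so the sketch is meant for moderate df only.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with `df` degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# As df grows, the t peak rises and the tail thins toward the normal curve:
# lower center and heavier tails mean a more spread-out density.
for df in (1, 5, 30, 100):
    print(f"df={df:>3}: center {t_pdf(0, df):.4f}, tail {t_pdf(3, df):.4f}")
print(f"normal : center {normal_pdf(0):.4f}, tail {normal_pdf(3):.4f}")
```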

CI for One Mean
• Based on these results, the multiplier for the confidence interval of µ is the value in the t-distribution with df = n − 1 such that the area between −(multiplier) and (multiplier) is equal to the desired confidence level.
• The multiplier in this case is denoted t*.
• We can easily obtain values of the multiplier from Table A2. Some examples:
  - n = 41 (i.e. df = 40), confidence level 95%: t* = 2.02.
  - n = 10 (i.e. df = 9), confidence level 99%: t* = 3.25.
Summary: Steps to obtain a CI for µ
1. Check that the condition is satisfied, i.e. bell-shaped population or n ≥ 30.
2. Calculate $\bar{x}$ and $\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}$.
3. Based on the required confidence level, β%, and the degrees of freedom (n − 1), use Table A2 to get the multiplier t*.
4. The β% CI for µ is $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$.
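Steps 2 to 4 translate into a few lines of Python. Python's standard library has no t-distribution quantile function, so this sketch assumes t* is passed in from Table A2; if SciPy is available, `scipy.stats.t.ppf((1 + level) / 2, df)` returns the same value. The helper names are illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple


def one_mean_ci(xbar: float, s: float, n: int, t_star: float) -> Tuple[float, float]:
    """β% CI for µ: xbar ± t* · s/√n, with t* taken from Table A2 at df = n - 1."""
    std_error = s / math.sqrt(n)  # standard error of the sample mean
    margin = t_star * std_error
    return (xbar - margin, xbar + margin)


def one_mean_ci_from_data(data: Sequence[float], t_star: float) -> Tuple[float, float]:
    """Same interval, computed from the raw observations."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return one_mean_ci(xbar, s, len(data), t_star)


# Black bear summary statistics (n = 64), t* = 2.00 from the df = 60 row.
print(one_mean_ci(210, 25, 64, 2.00))  # (203.75, 216.25)
```

Step 1, checking the condition, is left to the reader because "bell-shaped" is a judgment made from a plot of the data.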

Special Case of CI for 1 Mean: CI for Paired Data
• Consider the example where we are interested in the difference in mean blood pressure before and after exercise.
• We are interested in estimating µ1 − µ2, where µ1 is the mean blood pressure before exercise and µ2 is the mean blood pressure after exercise.
• For each person we have two measurements, resulting in two samples: the "before" sample (the values of blood pressure before exercise) and the "after" sample (the values of blood pressure after exercise).
• However, we are interested only in the difference between the "before" measurement and the "after" measurement. So, for each pair of values we compute their difference, resulting in one sample of differences. Then, using the sample of differences, we can create a C.I. for the population mean of the differences using the same procedure as the CI for one mean!
• Let µd be the population mean of the differences and $\bar{d}$ the sample mean of the differences; then µd = µ1 − µ2, and the standard error of $\bar{d}$ is $\frac{s_d}{\sqrt{n}}$, where n is the number of pairs.
• The CI for µd is $\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$, with t* from the t-distribution with df = n − 1, where sd is the sample standard deviation of the differences.
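The paired procedure is just the one-mean procedure applied to the differences. This is a minimal sketch with a hypothetical `paired_ci` helper; as with the one-mean case, t* is assumed to come from Table A2 with df = n − 1, where n is the number of pairs.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_ci(before: Sequence[float], after: Sequence[float], t_star: float) -> Tuple[float, float]:
    """CI for µd, the population mean of the (before - after) differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per pair turns the two samples into a single sample.
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)                 # number of pairs; use df = n - 1 for t*
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    margin = t_star * s_d / math.sqrt(n)
    return (d_bar - margin, d_bar + margin)
```

Passing the two lists in the opposite order (after, before) estimates µ2 − µ1 instead: the interval has the same width and flips sign.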

Difference Between Two Means (Independent Samples)
Steps to obtain a CI for µ1 − µ2 (difference between 2 population means):
1. Check that the following conditions are valid:
   • The two samples are independent.
   • Each sample either comes from a bell-shaped population or has size ≥ 30.
2. Calculate the sample statistic $\bar{x}_1 - \bar{x}_2$ and its standard error
$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
where n1, n2 are the sizes of the two samples and $s_1^2, s_2^2$ are the variances of the two samples.
3. The multiplier for the confidence interval is a t-multiplier (t*), and the df are approximately equal to the lesser of n1 − 1 and n2 − 1.
4. The β% CI for µ1 − µ2 is
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
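Steps 2 to 4 for two independent means can be sketched as follows, again assuming t* is looked up in a table using the conservative df. The helper names `conservative_df` and `two_means_ci` are hypothetical.

```python
import math
from typing import Tuple


def conservative_df(n1: int, n2: int) -> int:
    """Approximate df for the t* multiplier: the lesser of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def two_means_ci(xbar1: float, s1: float, n1: int,
                 xbar2: float, s2: float, n2: int,
                 t_star: float) -> Tuple[float, float]:
    """β% CI for µ1 - µ2 from two independent samples."""
    estimate = xbar1 - xbar2
    # s1 ** 2 and s2 ** 2 are the two sample variances.
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin = t_star * std_error
    return (estimate - margin, estimate + margin)
```

Using the smaller df gives a slightly larger t*, so the interval is conservative (a bit wider). Statistical software usually computes the Welch-Satterthwaite df instead, which is never smaller than this value.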

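Difference Between Two Proportions (Independent Samples)
Steps to obtain a CI for p1 − p2 (difference between 2 population proportions):
1. Check that the following conditions are valid:
   • The two samples are independent.
   • All the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$ are at least 5 and preferably at least 10.
2. Calculate the sample statistic $\hat{p}_1 - \hat{p}_2$ and its standard error
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
where n1, n2 are the sizes of the two samples and $\hat{p}_1, \hat{p}_2$ are the sample proportions in the two samples.
3. The multiplier for the confidence interval is a z-multiplier (z*), as in the one-sample case, i.e. P(-z*<Z<z*) = β%.
4. The β% CI for p1 − p2 is
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$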
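The four steps for two proportions fit in one function. This sketch uses the hypothetical name `two_proportions_ci`, takes counts of "successes" x1, x2, and computes z* with the standard library's `statistics.NormalDist`. The `min_count` threshold defaults to the preferred 10 and can be lowered to 5.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportions_ci(x1: int, n1: int, x2: int, n2: int,
                       level: float = 0.95, min_count: float = 10) -> Tuple[float, float]:
    """β% CI for p1 - p2 from two independent samples (x = number of 'successes')."""
    # Step 1: n1*p1 = x1, n1*(1 - p1) = n1 - x1, and likewise for sample 2.
    counts = (x1, n1 - x1, x2, n2 - x2)
    if min(counts) < min_count:
        raise ValueError(f"condition failed: smallest count {min(counts)} < {min_count}")
    # Step 2: the estimate and its standard error.
    p1, p2 = x1 / n1, x2 / n2
    estimate = p1 - p2
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Step 3: z* such that P(-z* < Z < z*) = level.
    z = NormalDist().inv_cdf((1 + level) / 2)
    # Step 4: the interval.
    return (estimate - z * std_error, estimate + z * std_error)
```

For the pro-choice data in Example 2 (180 of 300 women, 80 of 200 men), `two_proportions_ci(180, 300, 80, 200, level=0.99)` gives approximately (0.085, 0.315), matching the hand calculation up to rounding of z*.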

Table of CI Types

Conditions Necessary for Confidence Intervals
• 1 mean or difference between paired means: the population (of differences, for paired data) is normal (bell-shaped) or n ≥ 30.
• Difference between 2 independent means: at least one of the above conditions must hold for BOTH samples, and the two samples must be independent.
• Difference between 2 proportions: $n_1\hat{p}_1$ and $n_1(1-\hat{p}_1)$ AND $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$ must all be greater than or equal to 10.
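The checklist above can be expressed as three small predicates. This is a sketch with hypothetical function names; whether a population looks bell-shaped is a judgment you make from the data (for example from a histogram), so it is passed in as a boolean rather than computed.

```python
def one_sample_condition(n: int, bell_shaped: bool) -> bool:
    """1 mean, or paired means (applied to the differences): bell-shaped or n >= 30."""
    return bell_shaped or n >= 30


def two_independent_means_condition(n1: int, bell1: bool, n2: int, bell2: bool,
                                    independent: bool) -> bool:
    """The one-sample condition must hold for BOTH samples, and they must be independent."""
    return independent and one_sample_condition(n1, bell1) and one_sample_condition(n2, bell2)


def two_proportions_condition(x1: int, n1: int, x2: int, n2: int) -> bool:
    """n1*p1, n1*(1-p1), n2*p2, n2*(1-p2) all >= 10, where x = number of successes."""
    # n1 * (x1 / n1) is exactly x1, so the integer counts can be compared directly.
    return min(x1, n1 - x1, x2, n2 - x2) >= 10
```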

Example 1
Veronica records the weights of 64 adult black bears trapped in New York in the fall of 2002. The sample mean weight was 210 lbs, with a standard deviation of 25 lbs. Construct a 95% confidence interval for the mean weight of adult black bears.
• The parameter of interest is μ, the population mean weight of adult black bears.
• Conditions: the sample size is greater than 30, n = 64 > 30.
• The standard error is s/√n = 25/√64 = 3.125.
• The multiplier is a t*. Use Table A.2 in your text with df = n − 1 = 63 and CI level = 95%. Note: if the table does not list the exact df, use the next LOWER df that it does list. So with df = 60, we get t* = 2.00.
• 95% CI for μ: 210 ± 2(3.125) = 210 ± 6.25 = (203.75, 216.25) ≈ (203.8, 216.3)
• Interpretation: We are 95% confident that the mean weight of adult black bears is between 203.8 and 216.3 lbs.

Example 2
Margaret conducts a study to determine the difference in opinion between men and women on abortion. She randomly asks 200 men and 300 women whether they are pro-life or pro-choice; 80 men and 180 women say they are pro-choice. Construct a 99% confidence interval for the difference in the proportion of men and women who are pro-choice.
• The parameter of interest is pf − pm.
• Conditions: the sample proportions are $\hat{p}_f = 180/300 = .60$ and $\hat{p}_m = 80/200 = .40$, so all quantities $n_f\hat{p}_f = 180$, $n_f(1-\hat{p}_f) = 120$, $n_m\hat{p}_m = 80$ and $n_m(1-\hat{p}_m) = 120$ are greater than 10.
• The estimate is .60 − .40 = .20, with standard error $\sqrt{\frac{.6(.4)}{300} + \frac{.4(.6)}{200}} = \sqrt{.002} \approx .0447$.
• For a 99% confidence level, z* = 2.58.
• The 99% CI for pf − pm is .20 ± 2.58(.0447) = (.085, .315).
• Interpretation: We are 99% confident that the proportion of females who are pro-choice is between 8.5 and 31.5 percentage points greater than the proportion of males who are pro-choice.

Identifying the C.I.
For each example below, decide which type of confidence interval should be calculated.
• We want to estimate the difference between the heights of smokers and non-smokers at PSU.
• We want to calculate an interval that contains the fraction of all PSU students who are right-handed.
• We want to capture the difference between the proportions of smokers and non-smokers at PSU who have two or more tattoos.
• We want to estimate the daily sugar intake (in grams) of adult Americans.
