
Objectives (BPS chapter 24)


druce

Presentation Transcript


  1. Objectives (BPS chapter 24) Inference for regression • Conditions for regression inference • Estimating the parameters • Using technology • Testing the hypothesis of no linear relationship • Testing lack of correlation • Confidence intervals for the regression slope • Inference about prediction • Checking the conditions for inference

  2. The data in a scatterplot are a random sample from a population that may exhibit a linear relationship between x and y. Different sample → different plot. Now we want to describe the population mean response μy as a function of the explanatory variable x: μy = α + βx. We also want to assess whether the observed relationship is statistically significant (not entirely explained by chance events due to random sampling).

  3. The regression model. The least-squares regression line ŷ = a + bx is a mathematical model of the form "sample data = fit + residual." For each data point in the sample, the residual is the difference (y − ŷ). At the population level, the model becomes yi = (α + βxi) + εi, with residuals εi independent and normally distributed N(0, σ). The population mean response μy is μy = α + βx.

  4. μy = α + βx. The intercept α, the slope β, and the standard deviation σ of y are the unknown parameters of the regression model. We rely on the random sample data to provide unbiased estimates of these parameters. • The value of ŷ from the least-squares regression line is really a prediction of the mean value of y (μy) for a given value of x. • The least-squares regression line [ŷ = a + bx] obtained from sample data is the best estimate of the true population regression line [μy = α + βx]. • ŷ is an unbiased estimate of the mean response μy; a is an unbiased estimate of the intercept α; b is an unbiased estimate of the slope β.
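The sample estimates a and b come from the least-squares formulas b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a = ȳ − b·x̄. A minimal Python sketch of those formulas, assuming the data arrive as two equal-length lists; the function name `least_squares` and the toy data are hypothetical and not part of the slides:

```python
from statistics import mean


def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, n >= 2")
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # estimate of the population slope beta
    a = y_bar - b * x_bar    # estimate of the population intercept alpha
    return a, b


# Hypothetical toy data (not from the slides)
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
a, b = least_squares(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 0.50 + 0.80x
```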

  5. Conditions for inference • The observations are independent. • The relationship is indeed linear. • The standard deviation of y, σ, is the same for all values of x. • The response y varies normally around its mean.

  6. For any fixed x, the responses y follow a Normal distribution with standard deviation σ. Regression assumes equal variance of y (σ is the same for all values of x). The population standard deviation σ for y at any given value of x represents the spread of the normal distribution of the εi around the mean μy. The regression standard error, s, for n sample data points is calculated from the residuals (yi − ŷi): $s = \sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)}$. s is the estimate of the regression standard deviation σ (strictly, s² is an unbiased estimate of σ²).
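The regression standard error can be computed directly from the residuals, dividing the sum of squared residuals by n − 2 because two parameters (a and b) were estimated from the data. A minimal sketch, assuming a and b are already known; the function name and the toy data (whose least-squares line works out by hand to ŷ = 0.5 + 0.8x) are hypothetical:

```python
import math


def regression_standard_error(x, y, a, b):
    """s = sqrt(sum of squared residuals / (n - 2)) for the line a + b*x."""
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - 2))


# Hypothetical toy data; its least-squares line is y-hat = 0.5 + 0.8x
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
s = regression_standard_error(x, y, a=0.5, b=0.8)
print(round(s, 4))  # sqrt(1.8 / 2) = sqrt(0.9), about 0.9487
```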

  7. Confidence interval for the slope β. Estimating the regression parameter β for the slope is a case of one-sample inference with σ unknown. Hence we rely on t distributions. The standard error of the slope b is $SE_b = s / \sqrt{\sum (x_i - \bar{x})^2}$ (s is the regression standard error). Thus, a level C confidence interval for the slope β is: estimate ± t* SE(estimate), that is, b ± t* SEb, where t* is the critical value for the t(df = n − 2) density curve with C% of the area between −t* and +t*.
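The slope interval needs SEb = s / √Σ(xi − x̄)² and a t critical value. Python's standard library has no t quantile function, so this sketch takes t* as an argument (here 4.303, the t table value for 95% confidence with df = 2). The function names and the toy data (x = 1, 2, 3, 4 and y = 1, 3, 2, 4, for which b = 0.8 and s = √0.9) are hypothetical:

```python
import math


def slope_standard_error(s, x):
    """SE_b = s / sqrt(sum of (x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return s / math.sqrt(sxx)


def slope_confidence_interval(b, se_b, t_star):
    """Level C interval b +/- t* SE_b; t_star must match C and df = n - 2."""
    margin = t_star * se_b
    return b - margin, b + margin


# Hypothetical toy data: n = 4, b = 0.8, s = sqrt(0.9)
x = [1, 2, 3, 4]
se_b = slope_standard_error(math.sqrt(0.9), x)  # sqrt(0.9 / 5), about 0.4243
# t* = 4.303 is the t table value for 95% confidence with df = n - 2 = 2
low, high = slope_confidence_interval(0.8, se_b, t_star=4.303)
print(f"({low:.3f}, {high:.3f})")
```

Because this toy interval contains 0, these data are consistent with β = 0 at the 95% level.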

  8. Testing the hypothesis of no relationship. To test for the existence of a significant relationship, we can test whether the parameter for the slope β is significantly different from zero using a one-sample t test procedure. The standard error of the slope b is $SE_b = s / \sqrt{\sum (x_i - \bar{x})^2}$. We test the hypotheses H0: β = 0 versus Ha: β ≠ 0, β > 0, or β < 0 (two- or one-sided). We calculate t = b/SEb, which has the t(n − 2) distribution when H0 is true, and use it to find the P-value of the test.
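Finding the P-value requires the t(n − 2) distribution. Statistical software reports it directly; as a self-contained stand-in, this sketch integrates the t density numerically with Simpson's rule (a technique chosen here for illustration, not the method the slides or MINITAB use). The function names and the toy values (b = 0.8, SEb = √0.18, n = 4) are hypothetical:

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) for T ~ t(df), by Simpson's rule on [0, |t_stat|].

    steps must be even for Simpson's rule.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t_stat|)
    return 1 - 2 * area


# Hypothetical toy values: b = 0.8, SE_b = sqrt(0.18), n = 4
t_stat = 0.8 / math.sqrt(0.18)
print(round(t_stat, 3), round(two_sided_p_value(t_stat, df=4 - 2), 3))
```

For a one-sided alternative, halve the two-sided P-value when t falls in the direction of Ha.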

  9. Testing for lack of correlation. The regression slope b and the correlation coefficient r are related (b = r·sy/sx), so b = 0 ⇔ r = 0. Similarly, the population slope β is related to the population correlation coefficient ρ, and β = 0 ⇔ ρ = 0. Thus, testing the hypothesis H0: β = 0 is the same as testing the hypothesis of no correlation between x and y in the population from which our data were drawn.
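The equivalence can be checked numerically: the correlation test statistic t = r√(n − 2) / √(1 − r²) equals b/SEb computed from the same data. A minimal sketch, assuming r² equals the R-Sq value printed in the bonus example's MINITAB output later in this presentation (true for regression on a single explanatory variable) and that r takes the sign of the slope; the function name and toy values are hypothetical:

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Bonus example: R-Sq = 49.1% and the slope is positive, so r = +sqrt(0.491)
t_from_r = correlation_t(math.sqrt(0.491), n=6)
print(round(t_from_r, 2))  # 1.96, matching T = 1.96 for Years in the output

# Hypothetical toy data (x = 1..4, y = 1, 3, 2, 4): r = 0.8, b = 0.8, SE_b = sqrt(0.18)
print(math.isclose(correlation_t(0.8, n=4), 0.8 / math.sqrt(0.18)))  # True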

  10. Inference about prediction. One use of regression is for predicting the value of y, ŷ, for any value of x within the range of data tested: ŷ = a + bx. But the regression equation depends on the particular sample drawn, so more reliable predictions require statistical inference. To estimate an individual response y for a given value of x, we use a prediction interval. If we randomly sampled many times, the values of y obtained for a particular x would vary around the mean response μy, with deviations following N(0, σ).

  11. The level C prediction interval for a single observation on y when x takes the value x* is ŷ ± t* SEŷ, where t* is the critical value for the t distribution with n − 2 df and $SE_{\hat{y}} = s\sqrt{1 + 1/n + (x^* - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. [Figure: 95% prediction interval for ŷ] The prediction interval represents mainly the error from the normal distribution of the residuals εi. Graphically, the series of prediction intervals for the whole range of x values is shown as a continuous band on either side of ŷ.
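The prediction interval formula uses SEŷ = s√(1 + 1/n + (x* − x̄)² / Σ(xi − x̄)²). A minimal sketch, assuming the fitted a, b, s and the observed x values are available; t* is passed in (4.303 is the 95% t table value for df = 2), and the function name and toy data are hypothetical:

```python
import math


def prediction_interval(a, b, s, x, x_star, t_star):
    """Level C prediction interval for one new y at x = x_star.

    t_star must be the t critical value for the chosen C with df = n - 2.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    y_hat = a + b * x_star
    se_pred = s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - t_star * se_pred, y_hat + t_star * se_pred


# Hypothetical toy data (x = 1..4, y = 1, 3, 2, 4): a = 0.5, b = 0.8, s = sqrt(0.9)
x = [1, 2, 3, 4]
for x_star in (2.5, 4.0):
    low, high = prediction_interval(0.5, 0.8, math.sqrt(0.9), x, x_star, t_star=4.303)
    print(x_star, round(low, 2), round(high, 2))
```

The interval at x* = 4.0 comes out wider than at x* = x̄ = 2.5, because the (x* − x̄)² term grows as x* moves away from x̄.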

  12. Confidence interval for μy. We may also want to predict the population mean value of y, μy, for any value of x within the range of data tested. Using inference, we calculate a level C confidence interval for the population mean μy of all responses y when x takes the value x*: ŷ ± t* SEμ̂. This interval is centered on ŷ, the unbiased estimate of μy. The true value of the population mean μy at a given value of x will be within our confidence interval in C% of all intervals calculated from many different random samples.

  13. The level C confidence interval for the mean response μy at a given value x* of x is centered on ŷ (unbiased estimate of μy): ŷ ± t* SEμ̂, where t* is the critical value for the t distribution with n − 2 df and $SE_{\hat{\mu}} = s\sqrt{1/n + (x^* - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. [Figure: 95% confidence interval for μy] A separate confidence interval is calculated for μy at each of the values that x takes. Graphically, the series of confidence intervals for the whole range of x values is shown as a continuous band on either side of ŷ.
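The mean-response standard error drops the leading 1 under the square root: SEμ̂ = s√(1/n + (x* − x̄)² / Σ(xi − x̄)²). Evaluating the interval at several x values traces the band described above. A minimal sketch with a hypothetical function name and toy data, assuming t* = 4.303 (95% confidence, df = 2):

```python
import math


def mean_response_ci(a, b, s, x, x_star, t_star):
    """Level C confidence interval for the mean response mu_y at x = x_star.

    t_star must be the t critical value for the chosen C with df = n - 2.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    y_hat = a + b * x_star
    # Same as the prediction-interval SE but without the leading 1
    se_mean = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - t_star * se_mean, y_hat + t_star * se_mean


# Hypothetical toy data (x = 1..4, y = 1, 3, 2, 4): a = 0.5, b = 0.8, s = sqrt(0.9)
x = [1, 2, 3, 4]
for x_star in x:  # one interval per observed x value traces the band
    low, high = mean_response_ci(0.5, 0.8, math.sqrt(0.9), x, x_star, t_star=4.303)
    print(f"x = {x_star}: ({low:.2f}, {high:.2f})")
```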

  14. • The confidence interval for μy contains, with C% confidence, the population mean μy of all responses at a particular value of x. • The prediction interval contains C% of all the individual values taken by y at a particular value of x. [Figure: least-squares regression line with the 95% prediction interval for ŷ and the 95% confidence interval for μy] Estimating μy uses a narrower interval than estimating an individual in the population because the sampling distribution of ŷ is narrower than the population distribution of individual responses.
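The two standard errors are linked: squaring their formulas shows SEŷ² = s² + SEμ̂², so the prediction interval is always the wider one. A sketch that rebuilds both intervals at Years = 5 from the printed MINITAB values (S, Fit, SE Fit) in the bonus example later in this presentation, assuming t* = 2.776 from a t table for 95% confidence with df = 4:

```python
import math

# Values read from the MINITAB output in the bonus example
s = 4.50291      # S, the regression standard error
fit = 11.50      # Fit at Years = 5
se_fit = 2.45    # SE Fit, the standard error for the mean response
t_star = 2.776   # t table value for 95% confidence, df = n - 2 = 4

se_pred = math.sqrt(s ** 2 + se_fit ** 2)  # SE for a single new observation
ci = (fit - t_star * se_fit, fit + t_star * se_fit)
pi = (fit - t_star * se_pred, fit + t_star * se_pred)
print("95% CI:", tuple(round(v, 2) for v in ci))
print("95% PI:", tuple(round(v, 2) for v in pi))
```

The results agree with MINITAB's (4.71, 18.30) and (−2.72, 25.73) up to rounding of the printed inputs.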

  15. Residuals are randomly scattered → good! Curved pattern → the relationship is not linear. Change in variability across the plot → σ is not equal for all values of x.

  16. Example. The annual bonuses (in $1000s) of six randomly selected employees and their years of service were recorded. We wish to analyze the relationship between the two variables. The data were analyzed using MINITAB; the output is shown below.

      Predictor    Coef   SE Coef      T      P
      Constant    0.933     4.192   0.22  0.835
      Years       2.114     1.076   1.96  0.121

      S = 4.50291   R-Sq = 49.1%   R-Sq(adj) = 36.4%

      Predicted Values for New Observations
      New Obs    Fit   SE Fit          95% CI           95% PI
      1        11.50     2.45   (4.71, 18.30)   (-2.72, 25.73)

      Values of Predictors for New Observations
      New Obs   Years
      1          5.00

  17. Example a. What is the equation of the least-squares regression line? b. Calculate the 95% confidence interval for the true slope coefficient. c. Based on the above output, at the 0.05 level of significance, test whether the slope β is significantly different from zero. → The test is not significant (P = 0.121 > 0.05); fail to reject the null hypothesis.
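Parts a to c can be worked directly from the printed coefficients. A sketch, assuming t* = 2.776 (t table, 95% confidence, df = 6 − 2 = 4); variable names are illustrative:

```python
# Coefficients read from the MINITAB output in the bonus example
a, b = 0.933, 2.114   # Constant and Years coefficients
se_b = 1.076          # SE Coef for Years
p_value = 0.121       # P for Years (two-sided test of H0: beta = 0)
t_star = 2.776        # t table value, 95% confidence, df = 6 - 2 = 4

# a. Least-squares line
print(f"bonus-hat = {a} + {b} * years")

# b. 95% confidence interval for the slope beta
low, high = b - t_star * se_b, b + t_star * se_b
print(f"95% CI for beta: ({low:.3f}, {high:.3f})")

# c. Test H0: beta = 0 vs Ha: beta != 0 at the 0.05 level
t_stat = b / se_b
print(f"t = {t_stat:.2f}, P = {p_value}:",
      "reject H0" if p_value < 0.05 else "fail to reject H0")
```

The interval for β contains 0, which agrees with the non-significant test in part c.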

  18. Example d. What is the predicted annual bonus of an employee with 5 years of service? e. What is the value of the residual for the data value (5, 17)? f. Construct a 95% prediction interval for the bonus of a single employee with 7 years of service.
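Parts d and e need only the fitted line; bonuses are in thousands of dollars. A sketch using the printed coefficients:

```python
a, b = 0.933, 2.114  # coefficients from the MINITAB output

# d. Predicted bonus (in $1000s) at 5 years of service
y_hat_5 = a + b * 5
print(f"predicted bonus at 5 years: {y_hat_5:.3f} thousand dollars")

# e. Residual for the observed point (5, 17): observed minus predicted
residual = 17 - y_hat_5
print(f"residual: {residual:.3f}")
```

The predicted value matches MINITAB's Fit of 11.50 at Years = 5.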

  19. Example g. Construct a 95% confidence interval for the mean bonus when years of service is 7.
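The 7-year intervals need x̄ and Σ(xi − x̄)², which the output does not print. A sketch, assuming both can be backed out from the printed standard errors via SEb = s/√Sxx and SE(Constant) = s√(1/n + x̄²/Sxx) (an approach chosen here, not one the slides prescribe). Because the inputs are rounded, the results are approximate, and t* = 2.776 is the t table value for 95% confidence with df = 4:

```python
import math

# Values read from the MINITAB output in the bonus example
n = 6
a, b = 0.933, 2.114        # Constant and Years coefficients
se_a, se_b = 4.192, 1.076  # SE Coef for Constant and Years
s = 4.50291                # regression standard error S
t_star = 2.776             # t table value, 95% confidence, df = n - 2 = 4

# Back out Sxx = sum((x_i - x_bar)^2) and x_bar (approximate, due to rounding):
#   SE_b = s / sqrt(Sxx)               ->  Sxx = (s / SE_b)^2
#   SE_a = s * sqrt(1/n + x_bar^2/Sxx) ->  x_bar = sqrt(((SE_a/s)^2 - 1/n) * Sxx)
# Taking the positive root, since years of service cannot be negative.
sxx = (s / se_b) ** 2
x_bar = math.sqrt(((se_a / s) ** 2 - 1 / n) * sxx)

# Sanity check: the implied SE Fit at Years = 5 should be close to the printed 2.45
se_fit_5 = s * math.sqrt(1 / n + (5 - x_bar) ** 2 / sxx)

x_star = 7
y_hat = a + b * x_star
se_mean = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)      # for mu_y
se_pred = s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)  # for one y

pi = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)  # part f
ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)  # part g
print(f"x_bar ~ {x_bar:.2f}, Sxx ~ {sxx:.2f}, SE Fit(5) ~ {se_fit_5:.2f}")
print(f"y-hat(7) = {y_hat:.3f}")
print("f. 95% PI:", tuple(round(v, 2) for v in pi))
print("g. 95% CI:", tuple(round(v, 2) for v in ci))
```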
