1 / 31

Topic 6: Estimation and Prediction of Y h

Topic 6: Estimation and Prediction of Y h. Outline. Estimation and inference of E(Y h ) Prediction of a new observation Construction of a confidence band for the entire regression line. Estimation of E(Y h ).

shadi
Download Presentation

Topic 6: Estimation and Prediction of Y h

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Topic 6: Estimation and Prediction of Yh

  2. Outline • Estimation and inference of E(Yh) • Prediction of a new observation • Construction of a confidence band for the entire regression line

  3. Estimation of E(Yh) • E(Yh) = μh =β0+ β1Xh, the mean value of Y for the subpopulation with X=Xh • We will estimate E(Yh) by • KNNL use for this estimate, see equation (2.28) on pp 52

  4. Theory for Estimation of E(Yh) • is Normal with mean μh and variance • The Normality is a consequence of the fact that b0 + b1Xh is a linear combination of Yi’s • See KNNL pp 52-54 for details

  5. Application of the Theory • We estimate σ2( ) by • It then follows that • Details for confidence intervals and significance tests are consequences

  6. 95% Confidence Interval for E(Yh) • ± tcs( ) where tc = t(.975, n-2) • NOTE: significance tests can be constructed but they are rarely used in practice

  7. Toluca Company Example (pg 19) • Manufactures refrigeration equipment • One replacement part manufactured in lots of varying sizes • Company wants to determine the optimum lot size • To do this, company needs to first describe the relationship between work hours and lot size

  8. Scatterplot w/ regr line

  9. SAS CODE ***Generating the data set***; data toluca; infile ‘../data/CH01TA01.txt'; input lotsize hours; data other; size=65; output; size=100; output; data toluca1; set toluca other; proc print data=toluca1; run;

  10. SAS CODE ***Generating the confidence intervals for all values of X in the data set***; proc reg data=toluca1; model hours=size/clm; id lotsize; run; clm option generates confidence intervals for the mean

  11. Notes • Standard error affected by how far Xh is from (see Figure 2.6) • Recall teeter-totter idea…a change in the slope has bigger impact on Y as you move away from

  12. Prediction of Yh(new) • Want to predict value for a new observation at X=Xh • Model: Yh(new) = β0 + β1Xh +  • Since E(e)=0 same value as for E(Yh) • Prediction interval, however, relies heavily on assumption that e are Normally distributed Note!!

  13. Prediction of Yh(new) • Var(Yh(new))=Var( )+Var(ξ ) • Then follows that

  14. Notes • Procedure can be modified for the mean of m observations at X=Xh (see 2.39a and 239b on page 60) • Standard error affected by how far Xh is from (see Figure 2.6)

  15. SAS CODE ***Generating the prediction intervals for all values of X in data set***; proc reg data=toluca1; model hours=lotsize/cli; id lotsize; run; cli option generates prediction interval for a new observation

  16. These are wrong…same as before. Does not include variability about regression line

  17. Notes • The standard error(Std Error Mean Predict)given in this output is the standard error of not s(pred) • The prediction interval is correct and wider than the previous confidence interval

  18. Notes • To get correct standard error need to add the variance about the regression line

  19. Confidence band for regression line • ± Ws( ) where W2=2F(1-α; 2, n-2) • This gives combined “confidence intervals” for all Xh • Boundary values of confidence bands define a hyperbola • Will be wider at Xh than single CI

  20. Confidence band for regression line • Theory comes from the joint confidence region for (β0, β1 ) which is an ellipse (Stat 524) • We can find an alpha for tc that gives the same results • We find W2 and then find the alpha for tc that will give W = tc

  21. SAS CODE data a1; n=25; alpha=.10; dfn=2; dfd=n-2; tsingle=tinv(1-alpha/2,dfd); w2=2*finv(1-alpha,dfn,dfd); w=sqrt(w2); alphat=2*(1-probt(w,dfd)); t_c=tinv(1-alphat/2,dfd); output; proc print data=a1; run;

  22. SAS OUTPUT Used for 90% confidence band Used for single 90% CI

  23. SAS CODE symbol1 v=circle i=rlclm97; proc gplot data=toluca; plot hours*lotsize; run;

  24. Estimation of E(Yh) and Prediction of Yh = b0 + b1Xh

  25. SAS CODE symbol1 v=circle i=rlclm95; proc gplot data=toluca; plot hours*lotsize; symbol1 v=circle i=rlcli95; proc gplot data=toluca; plot hours*lotsize; run; Confidence intervals Prediction intervals

  26. Confidence band

  27. Confidence intervals

  28. Prediction intervals

  29. Background Reading • Program topic6.sas has the code for the various plots and calculations; • Sections 2.7, 2.8, and 2.9

More Related