
BOOTSTRAPPING LINEAR MODELS



  1. INTRODUCTION TO BOOTSTRAPPING LINEAR MODELS V & R 6.6 Stat 6601 Presentation Presented by: Xiao Li (Winnie) Wenlai Wang Ke Xu Nov. 17, 2004

  2. Bootstrapping Linear Models 11/17/2004 Preview of the Presentation • Introduction to Bootstrap • Data and Modeling • Methods on Bootstrapping LM • Results • Issues and Discussion • Summary

  3. Bootstrapping Linear Models 11/17/2004 What is Bootstrapping? • Invented by Bradley Efron and further developed by Efron and Tibshirani • A method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample • A method for assessing the trustworthiness of a statistic (a generalization of the standard deviation)
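  To make the resampling idea concrete before turning to regression, here is a minimal sketch in base R that bootstraps the standard error of a sample mean (the data here are simulated purely for illustration):

  set.seed(1)
  x <- rnorm(30, mean = 5, sd = 2)      # a small made-up sample
  R <- 999                              # number of bootstrap replicates
  boot.means <- replicate(R, mean(sample(x, replace = TRUE)))
  sd(boot.means)                        # bootstrap estimate of SE of the mean
  sd(x) / sqrt(length(x))               # classical SE, for comparison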

  4. Bootstrapping Linear Models 11/17/2004 Why use Bootstrapping? • Start with 2 questions: • What estimator should be used? • Having chosen an estimator, how accurate is it? • Linear model with normal random errors having constant variance → least squares • Generalized non-normal errors and non-constant variance → ???

  5. Bootstrapping Linear Models 11/17/2004 The Mammals Data • A data frame with average brain and body weights for 62 species of land mammals • “body”: body weight in kg • “brain”: brain weight in g • “name”: common name of species

  6. Bootstrapping Linear Models 11/17/2004 Data and Model Linear Regression Model: y_j = β_0 + β_1 x_j + ε_j, where j = 1, …, n and the error ε_j is considered random; y = log(brain weight), x = log(body weight)

  7. Bootstrapping Linear Models 11/17/2004 Summary of Original Fit (see code on next slide)

  Residuals:
       Min       1Q   Median       3Q      Max
  -1.71550 -0.49228 -0.06162  0.43597  1.94829

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  2.13479    0.09604   22.23   <2e-16 ***
  log(body)    0.75169    0.02846   26.41   <2e-16 ***

  Residual standard error: 0.6943 on 60 degrees of freedom
  Multiple R-squared: 0.9208, Adjusted R-squared: 0.9195
  F-statistic: 697.4 on 1 and 60 DF, p-value: < 2.2e-16

  8. Bootstrapping Linear Models 11/17/2004 Code for Original Modeling

  library(MASS)                     # provides the mammals data
  library(boot)
  op <- par(mfrow = c(1, 2))
  data(mammals)
  plot(mammals$body, mammals$brain, main = 'Original Data',
       xlab = 'body weight', ylab = 'brain weight', col = 'brown')   # raw data
  plot(log(mammals$body), log(mammals$brain), main = 'Log-Transformed Data',
       xlab = 'log body weight', ylab = 'log brain weight', col = 'brown')   # log-log data
  mammal <- data.frame(log(mammals$body), log(mammals$brain))
  dimnames(mammal) <- list(1:62, c("body", "brain"))
  attach(mammal)
  log.fit <- lm(brain ~ body, data = mammal)
  summary(log.fit)
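  As a quick cross-check against the bootstrap intervals reported on slides 12 and 16, the classical normal-theory intervals can be pulled straight from the fit above:

  confint(log.fit)   # 95% t-based confidence intervals for intercept and slope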

  9. Bootstrapping Linear Models 11/17/2004 Two Methods • Case-based resampling: randomly sample pairs (x_j, y_j) with replacement • No assumption about variance homogeneity • Design fixes the information content of a sample • Model-based resampling: resample the residuals • Assumes the model is correct with homoscedastic errors • Resampling model has the same “design” as the data

  10. Bootstrapping Linear Models 11/17/2004 Case-Based Resample Algorithm For r = 1, …, R, • sample indices i_1*, …, i_n* randomly with replacement from {1, 2, …, n} • for j = 1, …, n, set x_j* = x_{i_j*} and y_j* = y_{i_j*}, then • fit least squares regression to (x_1*, y_1*), …, (x_n*, y_n*), giving estimates β_0*, β_1*, s*²
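  The algorithm can also be sketched by hand without the boot package; this assumes the mammal data frame built in the slide-8 code (log-scale body and brain columns):

  set.seed(1)
  R <- 999
  case.coef <- matrix(NA_real_, R, 2)             # one (intercept, slope) row per replicate
  for (r in 1:R) {
    idx <- sample(nrow(mammal), replace = TRUE)   # step 1: resample row indices
    case.coef[r, ] <- coef(lm(brain ~ body, data = mammal[idx, ]))   # steps 2-3: refit
  }
  apply(case.coef, 2, sd)                         # bootstrap SEs of intercept and slope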

  11. Bootstrapping Linear Models 11/17/2004 Model-Based Resample Algorithm For r = 1, …, R, • for j = 1, …, n, • set x_j* = x_j • randomly sample ε_j* from the centered residuals r_1, …, r_n, then • set y_j* = ŷ_j + ε_j* • fit least squares regression to (x_1*, y_1*), …, (x_n*, y_n*), giving estimates β_0*, β_1*, s*²
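  A matching by-hand sketch, again assuming the mammal data frame from slide 8 (plain centered residuals are used here; Davison and Hinkley also describe leverage-adjusted residuals):

  set.seed(1)
  fit <- lm(brain ~ body, data = mammal)
  res <- resid(fit) - mean(resid(fit))            # centered residuals
  R <- 999
  mod.coef <- matrix(NA_real_, R, 2)
  for (r in 1:R) {
    d <- mammal
    d$brain <- fitted(fit) + sample(res, replace = TRUE)   # y* = fitted + resampled residual
    mod.coef[r, ] <- coef(lm(brain ~ body, data = d))      # design (x values) held fixed
  }
  apply(mod.coef, 2, sd)                          # bootstrap SEs of intercept and slope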

  12. Compare Bootstrapping Linear Models 11/17/2004 Case-Based Bootstrap Output:

  ORDINARY NONPARAMETRIC BOOTSTRAP

  Bootstrap Statistics:
        original          bias     std. error
  t1*  2.134789  -0.0022155790     0.08708311
  t2*  0.751686   0.0001295280     0.02277497

  BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
  Intervals:
  Level     Normal             Percentile          BCa
  95%   (1.966, 2.308)     (1.963, 2.310)     (1.974, 2.318)
  95%   (0.7069, 0.7962)   (0.7082, 0.7954)   (0.7080, 0.7953)
  Calculations and Intervals on Original Scale

  13. Bootstrapping Linear Models 11/17/2004 Case-Based Bootstrap Bootstrap Distribution Plots for Intercept and Slope

  14. Bootstrapping Linear Models 11/17/2004 Case-Based Bootstrap Standardized Jackknife-after-Bootstrap Plots for Intercept and Slope

  15. Bootstrapping Linear Models 11/17/2004 Code for Case-Based

  # Case-Based Resampling (the design is RANDOM: x values are resampled with the cases)
  fit.case <- function(data) coef(lm(log(data$brain) ~ log(data$body)))
  mam.case <- function(data, i) fit.case(data[i, ])
  mam.case.boot <- boot(mammals, mam.case, R = 999)
  mam.case.boot
  boot.ci(mam.case.boot, type = c("norm", "perc", "bca"))              # intercept
  boot.ci(mam.case.boot, index = 2, type = c("norm", "perc", "bca"))   # slope
  plot(mam.case.boot)
  plot(mam.case.boot, index = 2)
  jack.after.boot(mam.case.boot)
  jack.after.boot(mam.case.boot, index = 2)

  16. Compare Bootstrapping Linear Models 11/17/2004 Model-Based Bootstrap Output:

  ORDINARY NONPARAMETRIC BOOTSTRAP

  Bootstrap Statistics:
        original          bias     std. error
  t1*  2.134789   0.0049756072     0.09424796
  t2*  0.751686  -0.0006573983     0.02719809

  BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
  Intervals:
  Level     Normal             Percentile          BCa
  95%   (1.945, 2.315)     (1.948, 2.322)     (1.941, 2.316)
  95%   (0.6990, 0.8057)   (0.6982, 0.8062)   (0.6987, 0.8077)
  Calculations and Intervals on Original Scale

  17. Bootstrapping Linear Models 11/17/2004 Model-Based Bootstrap Bootstrap Distribution Plots for Intercept and Slope

  18. Bootstrapping Linear Models 11/17/2004 Model-Based Bootstrap Standardized Jackknife-after-Bootstrap Plots for Intercept and Slope

  19. Bootstrapping Linear Models 11/17/2004 Code for Model-Based

  # Model-Based Resampling (resample residuals; the design is FIXED: x values stay put)
  fit.res <- lm(brain ~ body, data = mammal)
  mam.res.data <- data.frame(mammal, res = resid(fit.res), fitted = fitted(fit.res))
  mam.res <- function(data, i) {
    d <- data
    d$brain <- d$fitted + d$res[i]
    coef(update(fit.res, data = d))
  }
  fit.res.boot <- boot(mam.res.data, mam.res, R = 999)
  fit.res.boot
  boot.ci(fit.res.boot, type = c("norm", "perc", "bca"))              # intercept
  boot.ci(fit.res.boot, index = 2, type = c("norm", "perc", "bca"))   # slope
  plot(fit.res.boot)
  plot(fit.res.boot, index = 2)
  jack.after.boot(fit.res.boot)
  jack.after.boot(fit.res.boot, index = 2)

  20. Bootstrapping Linear Models 11/17/2004 Comparisons and Discussion

  21. Bootstrapping Linear Models 11/17/2004 Case-Based vs. Model-Based • Model-based resampling enforces the assumption that errors are identically distributed by resampling the residuals from a common distribution • If the model is not specified correctly (i.e., unmodeled nonlinearity, non-constant error variance, or outliers), these attributes do not carry over to the bootstrap samples • The effects of outliers are clear in the case-based resampling, but not in the model-based
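  To make the comparison concrete, the two sets of bootstrap standard errors can be placed side by side; this assumes the mam.case.boot and fit.res.boot objects created in the slide-15 and slide-19 code (boot stores the replicates in the $t matrix):

  rbind(case.based  = apply(mam.case.boot$t, 2, sd),
        model.based = apply(fit.res.boot$t, 2, sd))   # columns: intercept, slope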

  22. FAIL Bootstrapping Linear Models 11/17/2004 When Might Bootstrapping Fail? • Incomplete data • The simple bootstrap assumes that missing data are not problematic • Multiple imputation may be needed beforehand • Dependent data • Resampling imposes mutual independence on the Yj, so their joint distribution is misrepresented when the data are dependent • Outliers and influential cases • Remove/correct obvious outliers • Avoid letting the simulations depend on particular observations

  23. Bootstrapping Linear Models 11/17/2004 Review & More Resampling • Resampling techniques are powerful tools for: -- estimating SD from small samples -- handling statistics whose SD is not easily determined • Bootstrapping involves: -- taking ‘new’ random samples with replacement from the original data -- calculating the bootstrap SD and statistical tests from the bootstrap replicates of the statistic • More resampling techniques: -- Jackknife resampling -- Cross-validation

  24. Bootstrapping Linear Models 11/17/2004 SUMMARY • Introduction to Bootstrap • Data and Modeling • Methods on Bootstrapping LM • Results and Comparisons • Issues and Discussion

  25. Bootstrapping Linear Models 11/17/2004 References
  • Anderson, B. “Resampling and Regression.” McMaster University. http://socserv.mcmaster.ca/anderson
  • Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, pp. 256-273. Cambridge University Press.
  • Efron, B. and Gong, G. (February 1983), “A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation,” The American Statistician.
  • Holmes, S. “Introduction to the Bootstrap.” Stanford University. http://wwwstat.stanford.edu/~susan/courses/s208/
  • Venables, W.N. and Ripley, B.D. (2002), Modern Applied Statistics with S, 4th ed., pp. 163-165. Springer.

  26. Thank You Applause Please Bootstrapping Linear Models 11/17/2004

  27. Bootstrapping Linear Models 11/17/2004 Extra Stuff… • Jackknife resampling takes new samples of the data by omitting each case individually and recalculating the statistic each time (see the sketch after this slide) • Resamples the data by leaving out a single observation at a time • The number of jackknife samples used equals the number of cases in the original sample • Works well for robust estimators of location, but not for SD • Cross-validation randomly splits the sample into two groups, comparing the model results from one sample to the results from the other • The 1st subset is used to estimate the statistical model (screening/training sample) • Findings are then tested on the second subset (confirmatory/test sample)
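  Minimal sketches of both techniques, assuming the mammal data frame from slide 8 (the half-split and the seed are arbitrary choices for illustration):

  # Jackknife: drop each case once, refit, and combine the leave-one-out slopes
  n <- nrow(mammal)
  jack.slope <- sapply(1:n, function(i)
    coef(lm(brain ~ body, data = mammal[-i, ]))["body"])     # slope without case i
  sqrt((n - 1) / n * sum((jack.slope - mean(jack.slope))^2)) # jackknife SE of slope

  # Cross-validation: fit on a random half, test on the other half
  set.seed(1)
  train <- sample(n, n %/% 2)                    # screening/training sample
  cv.fit <- lm(brain ~ body, data = mammal[train, ])
  pred <- predict(cv.fit, newdata = mammal[-train, ])
  mean((mammal$brain[-train] - pred)^2)          # test-set mean squared error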
