

  1. BSTA 670 – Statistical Computing Lecture 16: Resampling Methods (Lecture 2 of 2)

  2. Bias of Estimator • One measure of accuracy of an estimator is the SD. • Another measure of accuracy is bias, the difference between the expected value of the estimator and the quantity being estimated.

  3. Bias of an Estimator The bias of an estimator is the difference between the expected value of the estimator and the quantity being estimated. Both the bootstrap and the jackknife can be used to provide estimates of the bias of an estimator.

  4. Bootstrap Estimate of Bias of Estimator Bootstrap estimate of bias: bias_B = θ̂*(·) − θ̂, where θ̂*(·) = (1/B) Σ_b θ̂*(b) is the average of the B bootstrap replications and θ̂ is the estimate computed from the original sample.

  5. Bootstrap Estimate of Bias of Estimator Look at the ratio of the estimated bias to the SE (bias/SE) to assess the impact of the bias. If this ratio is small, less than about 0.25, the bias can typically be ignored. A more stringent criterion is needed if CIs are desired.

  6. Bootstrap Estimate of Bias of Estimator Increase B to a large value, say no less than 1000, to assess bias. Alternative: use a “better bootstrap bias estimate”. (See Efron and Tibshirani, 1993.)
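A minimal sketch (not from the lecture) of the bootstrap bias estimate and the bias/SE ratio for a generic statistic; the function name boot_bias, the placeholder stat, and the simulated data are illustrative assumptions.

# Minimal sketch: bootstrap estimate of bias and the bias/SE ratio.
boot_bias <- function(x, stat, B = 1000) {
  theta_hat  <- stat(x)                                        # estimate from the original sample
  theta_star <- replicate(B, stat(sample(x, replace = TRUE)))  # B bootstrap replications
  bias  <- mean(theta_star) - theta_hat                        # bootstrap bias estimate
  se    <- sd(theta_star)                                      # bootstrap SE estimate
  c(bias = bias, se = se, ratio = bias / se)                   # ratio < ~0.25 => bias usually ignorable
}

# Example: bias of the plug-in (maximum-likelihood) variance estimator
set.seed(1)
x <- rnorm(30)
boot_bias(x, function(z) mean((z - mean(z))^2))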

  7. “Better bootstrap bias estimate” Consider a “plug-in estimate” of the parameter: θ = t(F) is estimated by the same function of the empirical distribution, θ̂ = t(F̂). For a bootstrap sample x*, let P*_j be the proportion of bootstrap elements that equal x_j, and let P* = (P*_1, …, P*_n) be the resampling vector of these proportions.

  8. “Better bootstrap bias estimate” The observed value of the statistic can be represented as θ̂ = T(P0), where P0 = (1/n, …, 1/n); here, each point occurs exactly once in the sample out of n. Consider generating B bootstrap samples with resampling vectors P*(1), …, P*(B). Let P̄* = (1/B) Σ_b P*(b). The bootstrap bias estimate can then be written as bias_B = θ̂*(·) − T(P0). The “better bootstrap bias estimate” is θ̂*(·) − T(P̄*).

  9. “Better bootstrap bias estimate” Both the bootstrap bias estimate and the better bootstrap bias estimate converge, as B goes to infinity, to the “true bias”. The better bootstrap bias estimate converges significantly faster.
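A minimal sketch of the “better” bootstrap bias estimate, assuming the statistic is a plug-in statistic that can be written as a function T(P) of the resampling vector. The choice T(P) = (Σ P_j x_j)², i.e. the plug-in estimate of the squared mean, and the simulated data are illustrative assumptions, not from the lecture.

# Minimal sketch: plain vs. "better" bootstrap bias estimate for a plug-in statistic T(P).
set.seed(1)
x <- rnorm(25)
n <- length(x)
B <- 200
TP <- function(P, x) (sum(P * x))^2          # plug-in statistic as a function of the weights P

P0    <- rep(1 / n, n)                       # resampling vector of the original sample
Pstar <- matrix(0, B, n)                     # P*(b): proportion of each x_j in bootstrap sample b
theta_star <- numeric(B)
for (b in 1:B) {
  idx <- sample(1:n, replace = TRUE)
  Pstar[b, ] <- tabulate(idx, nbins = n) / n
  theta_star[b] <- TP(Pstar[b, ], x)
}
Pbar <- colMeans(Pstar)                      # average resampling vector

bias_plain  <- mean(theta_star) - TP(P0, x)    # usual bootstrap bias estimate
bias_better <- mean(theta_star) - TP(Pbar, x)  # "better" bootstrap bias estimate
c(plain = bias_plain, better = bias_better)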

  10. Jackknife The jackknife was the first of the computer-based methods for estimating standard errors and biases. It was first proposed in 1956 by Maurice Quenouille as a method for bias reduction. In 1958, John Tukey proposed a jackknife estimate of the standard error.

  11. Jackknife Estimate of Bias Have a data set x = (x_1, …, x_n). The ith jackknife sample, denoted x_(i), is defined to be the original data set x with the ith point removed: x_(i) = (x_1, …, x_{i−1}, x_{i+1}, …, x_n).

  12. Jackknife Estimate of Bias Say the statistic of interest is θ̂ = s(x). The statistic evaluated at x_(i), denoted θ̂_(i) = s(x_(i)), is the ith jackknife replication of the statistic of interest. The jackknife estimate of bias is: bias_jack = (n − 1) (θ̂_(·) − θ̂), where θ̂_(·) = (1/n) Σ_i θ̂_(i).

  13. Jackknife Estimate of Bias Terminology: the bias-corrected jackknife estimate is θ̂_jack = θ̂ − bias_jack. A little algebra shows that this is θ̂_jack = n θ̂ − (n − 1) θ̂_(·), the average of the pseudovalues n θ̂ − (n − 1) θ̂_(i) (these pseudovalues appear in the R example below).

  14. Jackknife Estimate of Bias Jackknife estimate of bias requires n computations, rather than B for the bootstrap.
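A minimal sketch (not from the slides) of a generic jackknife bias function; the name jack_bias, the placeholder stat, and the simulated data are illustrative assumptions.

# Minimal sketch: jackknife estimate of bias for a generic statistic `stat`.
jack_bias <- function(x, stat) {
  n <- length(x)
  theta_hat <- stat(x)
  theta_i <- sapply(1:n, function(i) stat(x[-i]))   # ith jackknife replication
  (n - 1) * (mean(theta_i) - theta_hat)             # jackknife bias estimate
}

# Example: bias of the plug-in variance estimator (true bias is -sigma^2/n)
set.seed(1)
x <- rnorm(30)
jack_bias(x, function(z) mean((z - mean(z))^2))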

  15. Jackknife Estimate of the SE The jackknife estimate of the SE is se_jack = [ ((n − 1)/n) Σ_i (θ̂_(i) − θ̂_(·))² ]^(1/2). Note that the 1/(n − 1) factor in the bootstrap estimate of the SE is replaced by (n − 1)/n, which is much larger. Why? The jackknife deviations θ̂_(i) − θ̂_(·) tend to be much smaller than the bootstrap deviations θ̂*(b) − θ̂*(·). This is due to the fact that the jackknife samples are much more similar to the original sample than the bootstrap samples are.

  16. Jackknife Estimate of the SE The factor (n − 1)/n is what is needed to make se_jack exactly equal to the unbiased estimate of the SE of the mean when the statistic is the sample mean. The jackknife fails badly if the statistic of interest is not “smooth”, meaning that small changes in the data result in only small changes in the statistic (similar to a differentiability requirement). An example where the jackknife fails is the median.
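An illustrative sketch (not from the lecture) of this failure: the jackknife SE of the median compared with the bootstrap SE on simulated data. Because the leave-one-out medians take only a few distinct values, the jackknife SE of the median is inconsistent.

# Illustrative sketch: jackknife SE vs. bootstrap SE of the median.
set.seed(1)
x <- rnorm(99)
n <- length(x)

med_i   <- sapply(1:n, function(i) median(x[-i]))              # jackknife replications
se_jack <- sqrt((n - 1) / n * sum((med_i - mean(med_i))^2))    # jackknife SE

med_b   <- replicate(2000, median(sample(x, replace = TRUE)))  # bootstrap replications
se_boot <- sd(med_b)                                           # bootstrap SE

c(jackknife = se_jack, bootstrap = se_boot)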

  17. Example of Bootstrap & Jackknife in R Estimate the variance, confidence intervals, and bias of the coefficient of variation (CV) for grades from an introductory biostatistics class.
> # Estimate Variance of CV using Jackknife
>
> # Data - Grades from class (final + lab + homework)
> x<-c(60.05,66.24,66.45,69.44,72.36,74.56,75.22,85.08,85.44,
+ 85.51,85.52,86.07,86.81,87.01,87.82,88.04,88.05,88.25,
+ 88.26,88.47,88.52,88.68,89.29,89.55,89.86,89.86,90.00,
+ 90.12,90.24,90.52,90.52,90.62,91.51,91.65,91.68,91.93,
+ 91.97,92.06,92.12,92.18,92.29,92.82,93.04,93.27,93.48,
+ 93.58,93.86,93.92,94.03,94.05,94.13,94.97,96.26,96.42,
+ 96.61,96.68,97.20,99.37,99.46,99.91)

  18. Example of Bootstrap & Jackknife in R
> # CV function
> cv<-function(x) sqrt(var(x))/mean(x)
>
> print(cv(x)) # Print CV of original data
[1] 0.09224178
>
> # Jackknife estimator loop
> # pseudo = vector of pseudovalues
> pseudo<-numeric(length(x))
> for(i in 1:length(x)) {
+ xjack<-x[-i]
+ pseudo[i]<- length(x)*cv(x) - (length(x)-1)*cv(xjack)
+ }
>
> mean(pseudo)
[1] 0.09317745
> var(pseudo)
[1] 0.01339053
> hist(pseudo)

  19. Example of Bootstrap & Jackknife in R [Figure: histogram of the jackknife pseudovalues, produced by hist(pseudo)]

  20. Example of Bootstrap & Jackknife in R
> varjack<-var(pseudo)/length(x)
>
> # 95% CI for cv(x)
>
> LL<-mean(pseudo) - qt(0.975,length(x)-1)*sqrt(var(pseudo)/length(x))
> UL<-mean(pseudo) + qt(0.975,length(x)-1)*sqrt(var(pseudo)/length(x))
>
> print( paste("cv of original data: ", cv(x)))
[1] "cv of original data: 0.0922417837594268"
> print( paste("Variance of cv: ", varjack))
[1] "Variance of cv: 0.000223175555145362"
> print( paste("95% Confidence interval for cv based on jackknife: ", LL, " , ", UL))
[1] "95% Confidence interval for cv based on jackknife: 0.06328445853 , 0.12307044411"

  21. Example of Bootstrap & Jackknife in R
> # Now use bootstrap to obtain variance of cv and percentile CI
> B<-1000
> n<-length(x)
>
> cvb<-numeric(B)   # pre-allocate one slot per bootstrap replication
> for(i in 1:B){
+ xb<-sample(x,replace=T)
+ cvb[i]<-cv(xb)
+ }
>
> meanboot<-mean(cvb)
> varboot<-var(cvb)
> hist(cvb)

  22. Example of Bootstrap & Jackknife in R [Figure: histogram of the bootstrap CV replications, produced by hist(cvb)]

  23. Example of Bootstrap & Jackknife in R
> # Percentile Method 95% Confidence Interval
> LL<-quantile(cvb,0.025)
> UL<-quantile(cvb,0.975)
> print( paste("95% Percentile Method Confidence interval for cv: ", LL, " , ", UL))
[1] "95% Percentile Method Confidence interval for cv: 0.06227141977 , 0.11743313880"
>
> # Get Bootstrap 95% CI assuming Normality
> # cv_hat +- 1.96*sqrt(varboot)
> # But, we should check to see if there is a bias in the bootstrap estimate.
> # If so, we can adjust for it.
> bias<- mean(cvb) - cv(x)
> print(bias)
[1] -0.001535666
>
> # Bootstrap bias-corrected estimate of the cv
> print( cv(x)-bias )
[1] 0.09377745
>
> LL<- (cv(x) - bias) - 1.96*sqrt(varboot)
> UL<- (cv(x) - bias) + 1.96*sqrt(varboot)
> print( paste("95% Normal Method Bootstrap Confidence interval for cv: ", LL, " , ", UL))
[1] "95% Normal Method Bootstrap Confidence interval for cv: 0.06608547795 , 0.12146942173"

  24. Bootstrapping Regression Models • General (linear or nonlinear) regression model: Y_i = f(X_i, β) + ε_i • f is a known function or form and β is a vector of parameters. • Interested in estimating a SD or CI of a function of the parameters.

  25. Bootstrapping Regression Models There are three methods commonly used for bootstrapping (others also exist): • Method 1: Resample or bootstrap the original observations to generate bootstrap parameter estimates that can be used to obtain bootstrap CIs.
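A minimal sketch of Method 1 (resampling whole observations, or cases) for a simple linear model; the data are simulated purely for illustration.

# Minimal sketch: case-resampling bootstrap for the slope of a simple linear model.
set.seed(1)
n <- 50
X <- runif(n); Y <- 2 + 3 * X + rnorm(n)
dat <- data.frame(X, Y)

B <- 1000
slope_star <- numeric(B)
for (b in 1:B) {
  idx <- sample(1:n, replace = TRUE)                  # resample whole (X, Y) pairs
  slope_star[b] <- coef(lm(Y ~ X, data = dat[idx, ]))[2]
}
quantile(slope_star, c(0.025, 0.975))                 # percentile CI for the slope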

  26. Bootstrapping Regression Models • Method 2: Resample the residuals of the original observations (e_i = y_i − ŷ_i). Compute y*_i = ŷ_i + e*_i and refit the model to (x_i, y*_i). Determine the function of interest of the refitted parameter estimates. Repeat B times, etc. Notes on Method 2: ► Assumes the errors are identically distributed ► The impact of a high-leverage outlier may be lost
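A minimal sketch of Method 2 (resampling residuals), again on simulated data.

# Minimal sketch: residual-resampling bootstrap; the design X is held fixed.
set.seed(1)
n <- 50
X <- runif(n); Y <- 2 + 3 * X + rnorm(n)
fit  <- lm(Y ~ X)
yhat <- fitted(fit); e <- resid(fit)

B <- 1000
slope_star <- numeric(B)
for (b in 1:B) {
  Ystar <- yhat + sample(e, replace = TRUE)           # keep X fixed, resample residuals
  slope_star[b] <- coef(lm(Ystar ~ X))[2]
}
quantile(slope_star, c(0.025, 0.975))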

  27. Bootstrapping Regression Models • Method 3: Estimate the residual variance from the sample. For each i = 1, …, n, generate a residual ε*_i ~ N(0, estimated residual variance). Construct Y*_i = f(X_i, β̂) + ε*_i, refit the model, and determine the function of interest of the refitted estimates. Repeat B times, etc. Use this method only if we believe the residual errors are Normally distributed.
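A minimal sketch of Method 3 on the same simulated data; it differs from the Method 2 sketch only in that the errors are drawn from a Normal distribution with the estimated residual variance.

# Minimal sketch: parametric (Normal-error) bootstrap for the slope.
set.seed(1)
n <- 50
X <- runif(n); Y <- 2 + 3 * X + rnorm(n)
fit  <- lm(Y ~ X); yhat <- fitted(fit)
sigma_hat <- summary(fit)$sigma                       # estimated residual SD

B <- 1000
slope_star <- numeric(B)
for (b in 1:B) {
  Ystar <- yhat + rnorm(n, sd = sigma_hat)            # parametric (Normal) errors
  slope_star[b] <- coef(lm(Ystar ~ X))[2]
}
quantile(slope_star, c(0.025, 0.975))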

  28. Bootstrapping Regression Models In R, the “bootstrap” and “boot” libraries can be used for bootstrapping, including bootstrapping regression models.
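A sketch of the same case-resampling regression bootstrap using the boot package; boot() repeatedly calls the user-supplied statistic with the data and a vector of resampled indices. The data here are simulated for illustration.

# Sketch: case-resampling bootstrap of a regression slope with the boot package.
library(boot)

set.seed(1)
n <- 50
dat <- data.frame(X = runif(n))
dat$Y <- 2 + 3 * dat$X + rnorm(n)

slope_fun <- function(data, indices) coef(lm(Y ~ X, data = data[indices, ]))[2]
b <- boot(data = dat, statistic = slope_fun, R = 1000)
b                                  # prints the bootstrap bias and SE of the slope
boot.ci(b, type = "perc")          # percentile confidence interval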

  29. Cross-validation Residuals (the errors in the fitted model) do not provide a reasonable indication of how well the model or prediction rule will perform when used to make predictions from new data not included in the development of the model. There are various ways to overcome this problem. The most simplistic approach is to not use the entire data set for training (modeling). The portion not used is then used to examine the performance of the model or prediction rules.

  30. Cross-validation This is the basic idea of cross validation, which is a model evaluation method that is superior to residuals. In cross-validation, the original data set is partitioned into smaller data sets. The analysis is performed on a single subset, with the results validated against the remaining subsets. The subset used for the analysis is called the “training” set and the other subsets are called “validation” sets (or “testing” sets).

  31. Three Cross-validation Methods Holdout method: The data set is RANDOMLY separated into two sets, the training set and the testing set. The model is fit to the training set only, usually about 2/3 of the data. Then, the model based on the training set is used to predict the outcomes for the data in the testing set. The “mean absolute test set error”, or another function of the error, is used to evaluate the model.
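A minimal sketch of the holdout method with a random 2/3 training / 1/3 testing split, evaluated by the mean absolute test-set error; the data are simulated for illustration.

# Minimal sketch: holdout cross-validation for a linear model.
set.seed(1)
n <- 150
X <- runif(n); Y <- 2 + 3 * X + rnorm(n)
dat <- data.frame(X, Y)

train_idx <- sample(1:n, size = round(2 * n / 3))     # random 2/3 for training
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit  <- lm(Y ~ X, data = train)                       # fit on the training set only
pred <- predict(fit, newdata = test)                  # predict the held-out outcomes
mean(abs(test$Y - pred))                              # mean absolute test-set error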

  32. Three Cross-validation Methods Holdout method: The holdout method is preferable to the residual approach, since evaluating the model residuals does not provide an indication of how well the model will perform for new observations. However, the error evaluation typically has a high variance, and it may depend heavily on the split between the training and testing data sets.

  33. Three Cross-validation Methods K-fold cross-validation method: The data set is divided into k subsets. The holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. The average error across all k trials is computed.
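A minimal sketch of k-fold cross-validation (k = 10) for a linear model, again on simulated data.

# Minimal sketch: k-fold cross-validation with the mean absolute error.
set.seed(1)
n <- 150
X <- runif(n); Y <- 2 + 3 * X + rnorm(n)
dat <- data.frame(X, Y)

k <- 10
fold <- sample(rep(1:k, length.out = n))              # randomly assign each point to a fold
cv_err <- numeric(k)
for (j in 1:k) {
  train <- dat[fold != j, ]                           # k-1 folds form the training set
  test  <- dat[fold == j, ]                           # the remaining fold is the test set
  pred  <- predict(lm(Y ~ X, data = train), newdata = test)
  cv_err[j] <- mean(abs(test$Y - pred))
}
mean(cv_err)                                          # average error across the k folds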

  34. Three Cross-validation Methods K-fold cross-validation method: This approach is less sensitive to the data split than the holdout method. Every data point gets to be in a test set exactly once, and gets to be in a training set k-1 times. The variance of the resulting estimate is reduced as k is increased. A disadvantage of this method: the training has to be completed k times, meaning it takes k times as much computation time.

  35. Three Cross-validation Methods K-fold cross-validation method: A variant of the K-fold cross-validation method: Randomly divide the data into a test and training set k different times. Advantage: You can independently choose how large each test set is and how many trials you average over.

  36. Three Cross-validation Methods Leave-one-out cross validation: This is a K-fold cross validation with K equal to N, the number of data points. The model is trained N separate times, each time on all the data except for one point. A prediction is made for the point left out. The average error is computed and used to evaluate the model.
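A minimal sketch of leave-one-out cross-validation on simulated data; it is simply the k-fold procedure with one observation per fold.

# Minimal sketch: leave-one-out cross-validation (k-fold with k = n).
set.seed(1)
n <- 60
X <- runif(n); Y <- 2 + 3 * X + rnorm(n)
dat <- data.frame(X, Y)

loo_err <- sapply(1:n, function(i) {
  fit <- lm(Y ~ X, data = dat[-i, ])                  # train on all but observation i
  abs(dat$Y[i] - predict(fit, newdata = dat[i, , drop = FALSE]))
})
mean(loo_err)                                         # average leave-one-out error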

  37. Cross validation is not the same as a jackknife • Both are resampling procedures, with the major difference lying in their applications. • Cross validation is used for model validation (and model selection). • Jackknife is used for variance and bias estimation.
