1 / 51

Don't Be Loopy: Re-Sampling and Simulation the SAS® Way

Don't Be Loopy: Re-Sampling and Simulation the SAS® Way. David L. Cassell Design Pathways Corvallis, OR. Introduction. Bootstrapping Jackknifing Cross-validation Simulations Monte Carlo … and on and on…. First, the BAD WAY. The typical bootstrap code – a huge macro loop Slow Awkward

lindsey
Download Presentation

Don't Be Loopy: Re-Sampling and Simulation the SAS® Way

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Don't Be Loopy: Re-Sampling and Simulation the SAS® Way David L. Cassell Design Pathways Corvallis, OR

  2. Introduction Bootstrapping Jackknifing Cross-validation Simulations Monte Carlo … and on and on… David L. Cassell, Design Pathways

  3. First, the BAD WAY The typical bootstrap code – a huge macro loop Slow Awkward Very complex code Log-filling Output-clogging Did I mention ‘slow’? David L. Cassell, Design Pathways

  4. The typical BAD bootstrap code – a huge macro loop %do i = 1 %to &REPS ; %* steps to generate one data set; %* the proc to do the analysis; %* some way of appending the new results; %end; %* a proc to compute the bootstrap estimates; %mend; David L. Cassell, Design Pathways

  5. Interlude – What is a bootstrap? Types of Re-sampling: Random draws Designed subsets Exchange labels David L. Cassell, Design Pathways

  6. Interlude – What is a bootstrap? Want to approximate sampling distribution Simple: SRS with replacement from original sample Non-parametric (mostly) Want: bias, std error, CI, or … Assumptions: exchangeability, … David L. Cassell, Design Pathways

  7. Interlude – What is a bootstrap? We’ll start with the simple bootstrap Get a URS sample of size N Compute your statistic Repeat B=1000 or 10,000 or … times Look at the behavior of your B values David L. Cassell, Design Pathways

  8. Interlude – What is a bootstrap? Warning: do not forget exchangeability! The simple / naïve bootstrap doesn’t work right on: Time series data Repeated measures data Survey sample data Data with analytic weights . . . . David L. Cassell, Design Pathways

  9. Interlude – What is a bootstrap? A common approach is the bootstrap percentile interval: Take your B values from before Pull the 2.5th and 97.5th percentiles to get a 95% percentile interval as your CI David L. Cassell, Design Pathways

  10. The typical BAD bootstrap code – a huge macro loop %macro bootie ( input=, reps= ); %do i = 1 %to &REPS ; %* steps to generate one data set; %* the proc to do the analysis; %* some way of appending the new results; %end; %* a proc to compute the bootstrap estimates; %mend; David L. Cassell, Design Pathways

  11. A Better Bootstrap 1. Generate ALL of the bootstrap samples as one data set 2. Use the same proc as before, but use by-processing 3. Use the same computations to get the bootstrap estimates David L. Cassell, Design Pathways

  12. A Better Bootstrap proc surveyselect data=YourData out=outboot seed=30459584 method=urs samprate=1 outhits rep=1000; run; David L. Cassell, Design Pathways

  13. A Better Bootstrap proc univariate data=outboot; var x; by Replicate; output out=out1 q1=q1 median=med q3=q3; run; data out2; set out1; trimean = (q1 + 2*med + q3) / 4; run; David L. Cassell, Design Pathways

  14. A Better Bootstrap proc univariate data=out2; var trimean; output out=final pctlpts=2.5, 97.5 pctlpre=ci; run; David L. Cassell, Design Pathways

  15. A Better Bootstrap – More sasfile YourData load; proc surveyselect data=YourData out=outboot seed=30459584 method=urs samprate=1 outhits rep=1000; run; sasfile YourData close; David L. Cassell, Design Pathways

  16. A Better Bootstrap – More ods listing close; proc univariate data=outboot; var x; by Replicate; output out=out1 q1=q1 median=med q3=q3; run; ods listing; David L. Cassell, Design Pathways

  17. A Better Bootstrap – ODS OUTPUT ods output Modes=modal; proc univariate data=outboot modes; var YourVariable; by Replicate; run; ods output close; David L. Cassell, Design Pathways

  18. Case Resampling Simple bootstrap, as before Apply to: PROC REG, PROC LOGISTIC, …. The approach can be criticized on several grounds David L. Cassell, Design Pathways

  19. Case Resampling data test; x=1; y=45; output; do x = 2 to 29; y = 3*x + 6*rannor(1234); output; end; x=30; y=45; output; run; David L. Cassell, Design Pathways

  20. y 90 80 70 60 50 40 30 20 10 0 0 10 20 30 x Case Resampling David L. Cassell, Design Pathways

  21. Case Resampling ods listing close; proc surveyselect data=temp1 out=boot1 seed=38474 method=urs samprate=1 outhits rep=1000; run; proc reg data=boot1 outest=est1(drop=_:); model y=x; by replicate; run; ods listing; David L. Cassell, Design Pathways

  22. Case Resampling proc univariate data=est1; var x; output out=final pctlpts=2.5, 97.5 pctlpre=ci; run; David L. Cassell, Design Pathways

  23. Case Resampling proc robustreg data=temp1 method=MM; model y=x; run; David L. Cassell, Design Pathways

  24. Case Resampling PROC REG (1.74, 2.80) bootstrap (case resampling) (1.65, 2.90) PROC ROBUSTREG (2.39, 3.13) David L. Cassell, Design Pathways

  25. Resampling residuals Fit the model Bootstrap sample for the residuals Add the randomly resampled e to Y-hat Fit the model for each of the B reps Compute bootstrap estimates David L. Cassell, Design Pathways

  26. Resampling residuals 1 perform the regression, get Y-hat and e 2 split the data 3 copy the FIT data set repeatedly 4 URS sample of residuals for each replicate 5 merge residuals with records 6 fit the model on each replicate 7 compute bootstrap estimates David L. Cassell, Design Pathways

  27. Resampling residuals proc reg data=test; model y=x; output out=out1 p=yhat r=res; run; David L. Cassell, Design Pathways

  28. Resampling residuals data fit(keep=yhat x order) resid(keep=res); set out1; order+1; run; David L. Cassell, Design Pathways

  29. Resampling residuals proc surveyselect data=fit out=outfit method=srs samprate=1 rep=1000; run; David L. Cassell, Design Pathways

  30. Resampling residuals data outres2; do replicate = 1 to 1000; do order = 1 to numrecs; p = ceil( numrecs * ranuni(394747373) ); set resid nobs=numrecs point=p; output; end; end; stop; run; David L. Cassell, Design Pathways

  31. Resampling residuals data prepped; merge outfit outres2; by replicate order; new_y = yhat + res; run; David L. Cassell, Design Pathways

  32. Resampling residuals proc reg data=prepped outest=est1( drop=_: ); model new_y = x; by replicate; run; David L. Cassell, Design Pathways

  33. Resampling residuals proc univariate data=est1; var x; output out=final pctlpts=2.5, 97.5 pctlpre=ci; run; David L. Cassell, Design Pathways

  34. ?The? Bootstrap? Simple bootstrap Residual resampling Parametric bootstrap Smooth bootstrap Wild bootstrap Double bootstrap Various ‘adjusted’ bootstraps . . . . . David L. Cassell, Design Pathways

  35. The Jackknife Non-parametric N systematic samples of size N-1 Less general than the bootstrap Easier to apply to complex sampling schemes David L. Cassell, Design Pathways

  36. The Jackknife data outb; do replicate = 1 to numrecs; do rec = 1 to numrecs; set test nobs=numrecs point=rec; if replicate ^= rec then output; end; end; stop; run; David L. Cassell, Design Pathways

  37. The Jackknife ods listing close; proc univariate data=outb; var y; by replicate; output out=outall kurtosis=curt; run; ods listing; David L. Cassell, Design Pathways

  38. The Jackknife proc univariate data=outall; var curt; output out=final mean=jmean std=jstd; run; David L. Cassell, Design Pathways

  39. Randomization Tests Resampling plan Re-label the data points randomly Compare against original Random subset of full permutation test David L. Cassell, Design Pathways

  40. Cross-Validation Another type of resampling plan K replicate samples Each sample uses (K-1)/K to model and 1/K for testing David L. Cassell, Design Pathways

  41. Cross-Validation LOOCV – Leave-One-Out Cross-Validation K-fold Cross-Validation Random K-fold Cross-Validation David L. Cassell, Design Pathways

  42. Random K-Fold Cross-Validation %let K=10; %let rate= %sysevalf( (&K-1) / &K ); proc surveyselect data=temp1 out=xv seed=495857 samprate=&RATE outall rep=&K ; run; data xv; set xv; if selected then new_y=y; run; David L. Cassell, Design Pathways

  43. Random K-Fold Cross-Validation proc reg data=xv; model new_y=x; by replicate; output out=out1(where=(new_y=.)) p=yhat; run; David L. Cassell, Design Pathways

  44. Random K-Fold Cross-Validation data out2; set out1; d=y-yhat; absd=abs(d); run; proc summary data=out2; var d absd; output out=out3 std(d)=rmse mean(absd)=mae; run; David L. Cassell, Design Pathways

  45. Monte Carlo Simulations Sample from theoretical distributions Sample from population of data points David L. Cassell, Design Pathways

  46. Simulations proc surveyselect data=largefile out=process_set seed=45884743 method=srs sampsize=1000; run; data processor; array{5,5} a1-a25; set process_set; . . . . . run; David L. Cassell, Design Pathways

  47. Simulations proc plan seed=4958584; factors replicate=100 ordered SiteNo = 30 of 200 / noprint; output out=plan9; run; David L. Cassell, Design Pathways

  48. CONCLUSIONS Cassell’s “7 Habits of Highly Effective SAS-ers” • KNOW YOUR PROBLEM • USE THE RIGHT TOOL • FEWER STEPS GET YOU FARTHER • STAY TALL AND THIN • TOO MUCH OF A GOOD THING IS BAD • SKIP THE EXPENSIVE STUFF • SHARPEN THE SAW David L. Cassell, Design Pathways

  49. CONCLUSIONS SAS is great at resampling and simulations. You just have to code it in SAS instead of something else! Don’t run 5003 steps when 3 steps will do it. Don’t assume everything is a macro problem. David L. Cassell, Design Pathways

  50. CONCLUSIONS Resampling methods and simulations do not solve all your problems. Use your brain before you use your keyboard. David L. Cassell, Design Pathways

More Related