A Casual Tutorial on Sample Size Planning for Multiple Regression Models

D. Keith Williams, M.P.H., Ph.D., Department of Biostatistics

Presentation Transcript


  1. A Casual Tutorial on Sample Size Planning for Multiple Regression Models. D. Keith Williams, M.P.H., Ph.D., Department of Biostatistics

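  2. [Figure] Area = 0.16 at a noncentrality value of 1.00

  3. [Figure] Area = 0.47 at a noncentrality value of 2.00

  4. [Figure] Area = 0.81 at a noncentrality value of 3.00

  5. [Figure] Area = 0.955 at a noncentrality value of 3.87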

  6. Buzzwords • Beta (β) = P(Type II error) = P(conclude the experimental groups are the same when they really are different) • Power = 1 - β = P(conclude the experimental groups are different when they really are!)

  7. The Noncentrality Parameter: Two-Group t-test

  8. An Example Scenario • Alpha = 0.05, sigma = 2 • |mu1 - mu2| = 2, that is, a two-unit difference in population means • Propose n1 = 10 and n2 = 10
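The formula from slide 7 did not survive in this transcript. A reconstruction that is consistent with the scenario above and with the value 2.236 reported on slide 10, assuming the usual equal-variance two-group setup (the symbol δ is assumed notation, not necessarily the slide's):

```latex
\delta = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
       = \frac{2}{2\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}}
       \approx 2.236
```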

  9. Rejection region for a two-tailed t-test, alpha = 0.05, df = 18

  10. Noncentrality value = 2.236; critical value = ±2.101. In Table B.5 (alpha = 0.05, df = 18) the noncentrality value falls between 2.0 and 3.0, so power is between 0.47 and 0.81. The SAS calculation gives 0.56195.
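The power quoted on this slide can be cross-checked without SAS. The following is a minimal Python sketch (not the author's SAS code) that assumes the standard representation of a noncentral t variable, T = (Z + delta) / sqrt(V / df), and integrates numerically over V. The function names, the integration limit, and the step count are illustrative choices; the critical value 2.101 is taken from the slide.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_pdf(v, df):
    """Chi-square density, computed on the log scale for stability."""
    if v <= 0.0:
        return 0.0
    half = df / 2.0
    log_pdf = ((half - 1.0) * math.log(v) - v / 2.0
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)

def two_sided_t_power(delta, crit, df, upper=200.0, steps=20000):
    """P(|T| > crit) for a noncentral t with noncentrality delta.

    Uses T = (Z + delta) / sqrt(V / df), V ~ chi-square(df), and
    integrates over V with the trapezoidal rule (assumed settings).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        scaled = crit * math.sqrt(v / df)
        # Probability of landing in either rejection tail, given V = v.
        reject = (1.0 - normal_cdf(scaled - delta)) + normal_cdf(-scaled - delta)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * reject * chi2_pdf(v, df)
    return total * h

# Scenario from slide 8: sigma = 2, |mu1 - mu2| = 2, n1 = n2 = 10.
sigma, diff, n1, n2 = 2.0, 2.0, 10, 10
delta = diff / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))
power = two_sided_t_power(delta, crit=2.101, df=n1 + n2 - 2)
print(round(delta, 3), round(power, 3))
```

The printed power should land close to the SAS value on the slide, and between the two table entries that bracket it.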

  11. The Key Point of the Review • One conjectures the difference in means to estimate power in studies that compare means. • In regression models, one conjectures the difference in R-square between a model that includes predictors of interest and a model without these predictors.

  12. Regression Power and Sample Size • Power for specific predictors in the presence of other covariates in a model. • More complex to conceptualize than testing differences among means.

  13. Example Data Set

  14. The Hypothetical Scenario: A model with 4 terms. Predictors for PSA of interest that we choose to power: • SVI • c_volume. Two covariates to be included: cpen, gleason

  15. Approaches in Estimating the Parameters to Calculate Power Plan A • Complete specification of the parts for the expression:

  16. Details • The full model • We want to power the test that a model with these 2 predictors is statistically better than a model excluding them. • The reduced model

  17. Full Model. Note the predictors of interest.

  18. Reduced Model. Note the R-square difference: 0.45 - 0.34 = 0.11

  19. proc power;
        multreg
          model = fixed
          alpha = .05
          nfullpredictors = 4
          ntestpredictors = 2
          rsqfull = 0.45
          rsqdiff = 0.11
          ntotal = 97 80 70 60 50 40
          power = .;
        plot x=n min=40 max=100 key=oncurves
             yopts=(ref=0.8 .977 crossref=yes);
      run;

      The POWER Procedure
      Type III F Test in Multiple Regression

      Fixed Scenario Elements
      Method                                Exact
      Model                                 Fixed X
      Number of Predictors in Full Model    4
      Number of Test Predictors             2
      Alpha                                 0.05
      R-square of Full Model                0.45
      Difference in R-square                0.11

      Computed Power
      Index    N Total    Power
      1        97         0.979
      2        80         0.949
      3        70         0.916
      4        60         0.864
      5        50         0.787
      6        40         0.677
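For readers without SAS, the power column above can be approximated with a short Python sketch. It is not PROC POWER itself: it assumes the fixed-X noncentrality lambda = N * rsqdiff / (1 - rsqfull) with degrees of freedom (2, N - p - 1), and it exploits the fact that with exactly 2 test predictors the F tail areas reduce to finite sums, so no statistics library is needed. The function name and the iteration cap are illustrative.

```python
import math

def multreg_power_two_test_predictors(n_total, n_full, rsq_full, rsq_diff,
                                      alpha=0.05):
    """Fixed-X power of the F test for exactly 2 test predictors.

    Assumes lambda = N * rsq_diff / (1 - rsq_full) and degrees of
    freedom (2, N - n_full - 1). Not valid for other numbers of
    test predictors.
    """
    b = (n_total - n_full - 1) / 2.0        # half the denominator df
    x0 = 1.0 - alpha ** (1.0 / b)           # critical value on the beta scale
    lam = n_total * rsq_diff / (1.0 - rsq_full)
    pois = math.exp(-lam / 2.0)             # Poisson(lambda / 2) weight, j = 0
    term, partial_sum, power = 1.0, 1.0, 0.0
    for j in range(500):                    # cap chosen generously (assumption)
        # alpha * partial_sum is the tail area of the j-th mixture component.
        power += pois * alpha * partial_sum
        pois *= (lam / 2.0) / (j + 1)
        term *= (b + j) / (j + 1) * x0
        partial_sum += term
    return power

# Same scenario as the PROC POWER call: 4 predictors, 2 tested.
for n in (97, 80, 70, 60, 50, 40):
    print(n, round(multreg_power_two_test_predictors(n, 4, 0.45, 0.11), 3))
```

The printed values can be compared against the Computed Power table; agreement to about three decimals is what the assumed formula should give.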

  20. Great, but I don’t have a dataset

  21. Use the Correlation Matrix

  22. Piece 1 Correlation of Y with all Predictors

  23. Piece 2 Correlation of All Predictors with Each Other

  24. Piece 3 Correlation of Y with Reduced Model Predictors

  25. Piece 4 Correlation of All Reduced Predictors with Each Other

  26. Matrix Arithmetic with Correlation Matrix
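The matrix expression from this slide was not captured. Judging from the PROC IML code on slide 30, the arithmetic is presumably the standard identity for R-square computed from correlations. The symbols below are assumed notation: r_yx is Piece 1, R_xx is Piece 2, r_yz is Piece 3, and R_zz is Piece 4.

```latex
R^2_{\text{full}} = \mathbf{r}_{yx}^{\top}\,\mathbf{R}_{xx}^{-1}\,\mathbf{r}_{yx},
\qquad
R^2_{\text{red}} = \mathbf{r}_{yz}^{\top}\,\mathbf{R}_{zz}^{-1}\,\mathbf{r}_{yz},
\qquad
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{red}}
```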

  27. Hold on, we will find out how to do this arithmetic later

  28. Different R-square Reductions

      proc power;
        multreg
          model = fixed
          alpha = .05
          nfullpredictors = 4
          ntestpredictors = 2
          rsqfull = 0.45
          rsqdiff = 0.11 .10 .09 .08
          ntotal = 97 80 70 60 50 40
          power = .;
        plot x=n min=40 max=100 key=oncurves
             yopts=(ref=0.8 .977 crossref=yes);
      run;

  29. Matrix Arithmetic with Compound Correlation Matrix

  30. proc iml;
        %let phi = 0.35;
        %let rx = 0.2;
        phi_yx_full = {&phi, &phi, .2, .2};
        rxx_full = {1   &rx &rx &rx,
                    &rx 1   &rx &rx,
                    &rx &rx 1   &rx,
                    &rx &rx &rx 1};
        phi_yx_red = {&rx, &rx};
        rxx_red = {1   &rx,
                   &rx 1};
        r2_full = (phi_yx_full)` * (rxx_full**(-1)) * (phi_yx_full);
        r2_red = phi_yx_red` * rxx_red**(-1) * phi_yx_red;
        r2diff = r2_full - r2_red;
        partial = (r2diff / (1 - r2_red))**.5;
        print r2_full r2_red r2diff partial;
      run; quit;

      R2_FULL      R2_RED       R2DIFF       PARTIAL
      0.2171875    0.0666667    0.1505208    0.4015873
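The same matrix arithmetic can be sketched in plain Python as a cross-check of the IML output. This is an illustrative translation, not the author's code: the helper names (`solve`, `r_square`, `compound`) are hypothetical, and the linear system is solved by Gaussian elimination instead of forming an explicit inverse.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def r_square(r_yx, r_xx):
    """R-square = r_yx' * inverse(r_xx) * r_yx."""
    weights = solve(r_xx, r_yx)
    return sum(r * w for r, w in zip(r_yx, weights))

def compound(k, rho):
    """k-by-k correlation matrix with a common off-diagonal value rho."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]

# Values from the IML code: phi = 0.35 for the two predictors of interest,
# 0.2 for the two covariates, and 0.2 between all pairs of predictors.
r2_full = r_square([0.35, 0.35, 0.2, 0.2], compound(4, 0.2))
r2_red = r_square([0.2, 0.2], compound(2, 0.2))
r2_diff = r2_full - r2_red
partial = math.sqrt(r2_diff / (1.0 - r2_red))
print(r2_full, r2_red, r2_diff, partial)
```

The four printed numbers should match the R2_FULL, R2_RED, R2DIFF, and PARTIAL values in the IML output up to rounding.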

  31. The POWER Procedure
      Type III F Test in Multiple Regression

      Fixed Scenario Elements
      Method                                Exact
      Model                                 Fixed X
      Number of Predictors in Full Model    4
      Number of Test Predictors             2
      Alpha                                 0.05
      R-square of Full Model                0.22

      Computed Power
      Index    R-square Diff    N Total    Power
      1        0.15             40         0.659
      2        0.15             50         0.770
      3        0.15             60         0.850
      4        0.15             70         0.905
      5        0.16             40         0.689
      6        0.16             50         0.798
      7        0.16             60         0.873
      8        0.16             70         0.923

      proc power;
        multreg
          model = fixed
          alpha = .05
          nfullpredictors = 4
          ntestpredictors = 2
          rsqfull = 0.22
          rsqdiff = 0.15 .16
          ntotal = 40 50 60 70
          power = .;
        plot x=n min=40 max=100 key=oncurves
             yopts=(ref=0.8 crossref=yes);
      run;

  32. Plan B • Specify the typical value of the multiple partial correlation coefficient between Y and X. • The multiple partial correlation coefficient describes the overall relationship between Y and 2 or more predictors, controlling for still other variables.

  33. Using Our Example • Say that we conjecture a value for the partial correlation between our Y and the X's of interest. • For our example this value was 0.408. Recall the R-square difference between the full and reduced models.
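The expression from this slide is missing in the transcript. The value 0.408 is consistent with the formula used in the PROC IML code on slide 30, applied to the R-square values from slides 17 and 18 (the symbol ρ is assumed notation):

```latex
\rho_{\text{partial}}
  = \sqrt{\frac{R^2_{\text{full}} - R^2_{\text{red}}}{1 - R^2_{\text{red}}}}
  = \sqrt{\frac{0.45 - 0.34}{1 - 0.34}}
  = \sqrt{\frac{0.11}{0.66}}
  \approx 0.408
```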

  34. The POWER Procedure
      Type III F Test in Multiple Regression

      Fixed Scenario Elements
      Method                                Exact
      Model                                 Fixed X
      Number of Predictors in Full Model    4
      Number of Test Predictors             2
      Alpha                                 0.05

      Computed Power
      Index    Partial Corr    N Total    Power
      1        0.408           97         0.979
      2        0.408           80         0.949
      3        0.408           60         0.864
      4        0.408           50         0.787
      5        0.408           40         0.677
      6        0.350           97         0.910
      7        0.350           80         0.843
      8        0.350           60         0.713
      9        0.350           50         0.623
      10       0.350           40         0.514

      proc power;
        multreg
          model = fixed
          alpha = .05
          nfullpredictors = 4
          ntestpredictors = 2
          partialcorr = .408 .35
          ntotal = 97 80 60 50 40
          power = .;
        plot x=n min=40 max=100 key=oncurves
             yopts=(ref=.8 .85 .977 crossref=yes);
      run;

      Note: the rule of thumb n = 4*10 = 40 underpowers the study.

  35. Plan C: Use the Table from Gatsonis and Sampson (1989)

  36. U: the number of predictors of interest = 2. p: the total number of predictors in the model = 4. N = table value + p + 1. For 80% power, N = 72 + 4 + 1 = 77.

  37. Proc Power and the Table

      The POWER Procedure
      Type III F Test in Multiple Regression

      Fixed Scenario Elements
      Method                                Exact
      Model                                 Random X
      Number of Predictors in Full Model    4
      Number of Test Predictors             2
      Alpha                                 0.05
      Total Sample Size                     77

      Computed Power
      Index    Partial Corr    Power
      1        0.35            0.802
      2        0.40            0.908

      proc power;
        multreg
          model = random
          alpha = .05
          nfullpredictors = 4
          ntestpredictors = 2
          partialcorr = .35 .40
          ntotal = 77
          power = .;
        plot x=n min=60 max=120 key=oncurves
             yopts=(ref=.8 .90 crossref=yes);
      run;

  38. Comments • Power and sample size planning is 'tricky.' • The rule of thumb of n = 10 for each predictor will almost always underpower a study. • Plan A or B using the matrix multiplication is likely the best. One can specify regular correlations instead of partial correlations. • This talk was developed with fixed effects (model=fixed); arguably one should plan for random effects (model=random) unless the study is an experiment. SAS can easily calculate this. The Gatsonis and Sampson tables provide power for random-effect settings (the n's are usually close).

  39. Further Work for Somebody • A corresponding multiple logistic regression approach, that is, powering more than one predictor of interest with additional covariates in the model.

  40. An Algorithm for Estimating Power and Sample Size for Logistic Models with One or More Independent Variables of Interest. Jay Northern; D. Keith Williams, PhD; Zoran Bursac, PhD. Joint Statistical Meetings, Denver, CO, August 3 - August 7, 2008

  41. Background • Existing tools are based on the Hsieh, Block, and Larsen (1998) paper and the Agresti (1996) text. • PASS • %powerlog macro

  42. Macro Details • Fit the full and the reduced model. • In the reduced model one can exclude one or more covariates of interest in order to test them simultaneously in the presence of other covariates. • Perform the likelihood ratio test with the appropriate chi-square critical value, based on the correct number of degrees of freedom.
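The last step above, the likelihood ratio test, can be sketched as follows. This is not the macro described on the slide: it only illustrates the test itself, assuming the log-likelihoods of the fitted full and reduced models are already available. The function names are hypothetical, the log-likelihood values are made up for illustration, and the chi-square tail uses a closed form that holds only when the number of tested covariates (the degrees of freedom) is even.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("this sketch handles even, positive df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return the LR statistic and its p-value for two nested models.

    df is the number of covariates dropped from the full model.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(-100.0, -104.0, df=2)
print(stat, p_value)
```

Power could then be estimated as the fraction of simulated data sets in which this test rejects, although the transcript does not say whether the macro works that way.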
