A Casual Tutorial on Sample Size Planning for Multiple Regression Models

D. Keith Williams M.P.H. Ph.D.

Department of Biostatistics






Buzzwords

  • Beta (β) = P(Type II error) = P(Conclude the experimental groups are the same when they really are different)

  • Power = 1 - β = P(Conclude the experimental groups are different when they really are!)



An Example Scenario

  • Alpha = 0.05, sigma = 2

  • |mu1 - mu2| = 2, that is, a two-unit difference in the population means

  • Propose n1 = 10 and n2 = 10



Noncentrality value = 2.236; critical value = ±2.101

Table B.5 (alpha = 0.05, df = 18) brackets this noncentrality value between its entries for 2.0 and 3.0.

The power is therefore between 0.47 and 0.81; the SAS calculation gives 0.56195.
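The noncentrality value on this slide can be checked by hand. The following is a sketch that assumes the usual pooled two-sample t test with a common standard deviation; the symbols δ, ν, and T′ are notation introduced here, not taken from the slide.

```latex
\[
\delta = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
       = \frac{2}{2\sqrt{\frac{1}{10} + \frac{1}{10}}}
       = \frac{1}{\sqrt{0.2}} \approx 2.236,
\qquad
\nu = n_1 + n_2 - 2 = 18
\]
\[
\text{Power} = P\bigl(|T'_{\nu,\delta}| > t_{0.975,\,18}\bigr),
\qquad t_{0.975,\,18} \approx 2.101
\]
```

Here T′ denotes a noncentral t variable with ν degrees of freedom and noncentrality δ.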


The Key Point of the Review

  • One conjectures the difference in means to estimate power in studies that compare means.

  • In regression models, one conjectures the difference in R-square between a model that includes predictors of interest and a model without these predictors.
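For reference, the conjectured R-square difference enters the test through the usual partial F statistic. This is the standard textbook form, written with assumed symbols: N is the total sample size, p the number of predictors in the full model, and u the number of predictors being tested.

```latex
\[
F = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right)/u}
         {\left(1 - R^2_{\text{full}}\right)/(N - p - 1)}
\;\sim\; F_{u,\;N-p-1} \quad \text{under } H_0
\]
```

A larger conjectured difference in R-square, or a larger N, pushes this statistic away from its null distribution, which is what raises the power.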


Regression Power and Sample Size

  • Power for specific predictors in the presence of other covariates in a model.

  • More complex to conceptualize than testing differences among means.



The Hypothetical Scenario: A Model with 4 Terms

Predictors of PSA that we choose to power:

  • SVI

  • c_volume

Two covariates to be included:

  • cpen

  • gleason


Approaches in Estimating the Parameters to Calculate Power: Plan A

  • Complete specification of the parts for the expression: R-square (full model) - R-square (reduced model)


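Details

The full model: PSA regressed on SVI, c_volume, cpen, and gleason.

We want to power the test that a model with these 2 predictors (SVI and c_volume) is statistically better than a model excluding them.

The reduced model: PSA regressed on cpen and gleason only.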
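Written out for this scenario, the two models might look as follows. This is a reconstruction from the predictor list, since the slide's own equations did not survive in the transcript; the β and ε symbols and the coefficient numbering are assumed notation.

```latex
\begin{align*}
\text{Full:}    \quad & \text{PSA} = \beta_0 + \beta_1\,\text{SVI} + \beta_2\,\text{c\_volume}
                        + \beta_3\,\text{cpen} + \beta_4\,\text{gleason} + \varepsilon \\
\text{Reduced:} \quad & \text{PSA} = \beta_0 + \beta_3\,\text{cpen} + \beta_4\,\text{gleason} + \varepsilon \\
H_0:            \quad & \beta_1 = \beta_2 = 0
\end{align*}
```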


Full Model

Note: R-square of the full model = 0.45; the predictors of interest are SVI and c_volume.

Reduced Model

Note: R-square of the reduced model = 0.34.

R-square difference: 0.45 - 0.34 = 0.11


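proc power;
  multreg
    model = fixed
    alpha = 0.05
    nfullpredictors = 4                /* predictors in the full model */
    ntestpredictors = 2                /* predictors of interest being tested */
    rsqfull = 0.45                     /* R-square of the full model */
    rsqdiff = 0.11                     /* full minus reduced R-square */
    ntotal = 97 80 70 60 50 40         /* candidate total sample sizes */
    power = .;                         /* solve for power */
  plot x=n min=40 max=100
    key = oncurves
    yopts = (ref=0.8 0.977 crossref=yes);
run;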

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.45
Difference in R-square                0.11

Computed Power

Index    N Total    Power
    1         97    0.979
    2         80    0.949
    3         70    0.916
    4         60    0.864
    5         50    0.787
    6         40    0.677
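For readers without SAS, the fixed-X calculation can be sketched in plain Python. This is a minimal sketch, not the PROC POWER implementation: it assumes the noncentrality parameter is N × (R-square difference) / (1 - R-square of the full model) with N - p - 1 denominator degrees of freedom, which agrees with the F(2, 92, 19.4) shown later in this deck for N = 97. All function names are hypothetical, and the noncentral F distribution is evaluated as a Poisson mixture of incomplete beta functions using only the standard library.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(f, df1, df2, ncp):
    """CDF of the (non)central F: Poisson(ncp/2) mixture of incomplete betas."""
    y = df1 * f / (df1 * f + df2)
    if ncp <= 0.0:
        return betai(df1 / 2.0, df2 / 2.0, y)
    half = ncp / 2.0
    total = 0.0
    for j in range(2000):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        total += weight * betai(df1 / 2.0 + j, df2 / 2.0, y)
        if j > half and weight < 1e-16:
            break
    return total


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of the central F, found by bisection."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, df1, df2, 0.0) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if ncf_cdf(mid, df1, df2, 0.0) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def multreg_power(n_total, n_full, n_test, rsq_full, rsq_diff, alpha=0.05):
    """Power of the F test for n_test predictors (assumed fixed-X formula)."""
    df1 = n_test
    df2 = n_total - n_full - 1
    ncp = n_total * rsq_diff / (1.0 - rsq_full)
    return 1.0 - ncf_cdf(f_critical(alpha, df1, df2), df1, df2, ncp)


if __name__ == "__main__":
    for n in (97, 80, 70, 60, 50, 40):
        print(n, round(multreg_power(n, 4, 2, 0.45, 0.11), 3))
```

Running the loop with the slide's inputs gives values that can be compared against the Computed Power column above.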




Piece 1: Correlation of Y with All Predictors

Piece 2: Correlation of All Predictors with Each Other

Piece 3: Correlation of Y with Reduced Model Predictors

Piece 4: Correlation of All Reduced Predictors with Each Other
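In matrix form, the four pieces combine into the two R-square values. The symbols below are assumed notation: each r vector holds the correlations of Y with the predictors (Pieces 1 and 3) and each R matrix holds the correlations among the predictors (Pieces 2 and 4).

```latex
\[
R^2_{\text{full}} = \mathbf{r}_{yx,\text{full}}^{\top}\,\mathbf{R}_{xx,\text{full}}^{-1}\,\mathbf{r}_{yx,\text{full}},
\qquad
R^2_{\text{red}} = \mathbf{r}_{yx,\text{red}}^{\top}\,\mathbf{R}_{xx,\text{red}}^{-1}\,\mathbf{r}_{yx,\text{red}}
\]
\[
R^2_{\text{diff}} = R^2_{\text{full}} - R^2_{\text{red}}
\]
```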




Different R-square Reductions

proc power;
  multreg
    model = fixed
    alpha = 0.05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11 0.10 0.09 0.08      /* several conjectured R-square differences */
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts = (ref=0.8 0.977 crossref=yes);
run;



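proc iml;
  %let phi = 0.35;   /* correlation of Y with each predictor of interest */
  %let rx = 0.2;     /* common correlation among the predictors */

  /* Full model: correlations of Y with the 4 predictors, and among the predictors */
  phi_yx_full = {&phi, &phi, .2, .2};
  rxx_full = {1   &rx &rx &rx,
              &rx 1   &rx &rx,
              &rx &rx 1   &rx,
              &rx &rx &rx 1};

  /* Reduced model: the 2 covariates only */
  phi_yx_red = {&rx, &rx};
  rxx_red = {1   &rx,
             &rx 1};

  /* R-square = r` * inverse(Rxx) * r */
  r2_full = phi_yx_full` * (rxx_full**(-1)) * phi_yx_full;
  r2_red = phi_yx_red` * (rxx_red**(-1)) * phi_yx_red;
  r2diff = r2_full - r2_red;
  partial = (r2diff / (1 - r2_red))**0.5;
  print r2_full r2_red r2diff partial;
quit;

R2_FULL      R2_RED       R2DIFF       PARTIAL
0.2171875    0.0666667    0.1505208    0.4015873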
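The same matrix arithmetic can be sketched without SAS/IML. The Python below is an illustrative equivalent, not the author's code: the helper names (`solve`, `r_squared`, `exchangeable`) are hypothetical, and the linear system is solved by plain Gauss-Jordan elimination so that only the standard library is needed.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def r_squared(r_yx, r_xx):
    """R-square = r_yx' * inverse(r_xx) * r_yx."""
    weights = solve(r_xx, r_yx)
    return sum(r * w for r, w in zip(r_yx, weights))


def exchangeable(size, rho):
    """Correlation matrix with 1 on the diagonal and rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(size)] for i in range(size)]


phi, rx = 0.35, 0.2  # same conjectured correlations as the IML listing

r2_full = r_squared([phi, phi, 0.2, 0.2], exchangeable(4, rx))
r2_red = r_squared([rx, rx], exchangeable(2, rx))
r2_diff = r2_full - r2_red
partial = math.sqrt(r2_diff / (1.0 - r2_red))

print(round(r2_full, 7), round(r2_red, 7), round(r2_diff, 7), round(partial, 7))
```

Changing `phi` or `rx` shows how sensitive the R-square difference is to the conjectured correlations before anything is passed to a power calculation.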


The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.22

Computed Power

Index    R-square Diff    N Total    Power
    1             0.15         40    0.659
    2             0.15         50    0.770
    3             0.15         60    0.850
    4             0.15         70    0.905
    5             0.16         40    0.689
    6             0.16         50    0.798
    7             0.16         60    0.873
    8             0.16         70    0.923

proc power;
  multreg
    model = fixed
    alpha = 0.05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.22                     /* rounded R2_FULL from the matrix calculation */
    rsqdiff = 0.15 0.16                /* values bracketing R2DIFF */
    ntotal = 40 50 60 70
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts = (ref=0.8 crossref=yes);
run;


Plan B

  • Specify the typical value of the multiple partial correlation coefficient between Y and X.

  • The multiple partial correlation coefficient describes the overall relationship between Y and 2 or more predictors, controlling for still other variables.


Using Our Example

  • Say that we conjecture the partial correlation between our Y and the X’s of interest: sqrt(R-square difference / (1 - R-square of the reduced model)).

  • For our example this value was sqrt(0.11 / (1 - 0.34)) = 0.408.

Recall the R-square difference between the full and reduced models.
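The link between Plan A inputs (two R-square values) and the Plan B input (one partial correlation) can be sketched as below. The function names are hypothetical, and the noncentrality expression assumes the fixed-X form N × (R-square difference) / (1 - R-square of the full model), which reduces to a function of the partial correlation alone.

```python
import math


def partial_corr(rsq_full, rsq_reduced):
    """Multiple partial correlation implied by full and reduced R-square."""
    return math.sqrt((rsq_full - rsq_reduced) / (1.0 - rsq_reduced))


def rsq_diff_from_partial(partial, rsq_reduced):
    """Invert the relation: R-square difference implied by a partial correlation."""
    return partial ** 2 * (1.0 - rsq_reduced)


def noncentrality_from_partial(n_total, partial):
    """Assumed fixed-X noncentrality, written in terms of the partial correlation."""
    return n_total * partial ** 2 / (1.0 - partial ** 2)


rho = partial_corr(0.45, 0.34)
print(round(rho, 3))                                   # 0.408
print(round(noncentrality_from_partial(97, rho), 1))   # 19.4
```

Because the noncentrality depends only on N and the partial correlation, Plan B needs one conjectured number where Plan A needs two.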


The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05

Computed Power

Index    Partial Corr    N Total    Power
    1           0.408         97    0.979
    2           0.408         80    0.949
    3           0.408         60    0.864
    4           0.408         50    0.787
    5           0.408         40    0.677
    6           0.350         97    0.910
    7           0.350         80    0.843
    8           0.350         60    0.713
    9           0.350         50    0.623
   10           0.350         40    0.514

proc power;
  multreg
    model = fixed
    alpha = 0.05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = 0.408 0.35           /* conjectured multiple partial correlations */
    ntotal = 97 80 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts = (ref=0.8 0.85 0.977 crossref=yes);
run;

Note: n = 4 × 10 = 40 (10 per predictor) underpowers the study.


Plan C: Use the Table from Gatsonis and Sampson (1989)

U: the number of predictors of interest = 2

p: the total number of predictors in the model = 4

N = table value + p + 1

For 80% power: N = 72 + 4 + 1 = 77


Proc Power and the Table

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                Exact
Model                                 Random X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
Total Sample Size                     77

Computed Power

Index    Partial Corr    Power
    1            0.35    0.802
    2            0.40    0.908

proc power;
  multreg
    model = random                     /* random X, as in the Gatsonis and Sampson table */
    alpha = 0.05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = 0.35 0.40
    ntotal = 77
    power = .;
  plot x=n min=60 max=120
    key = oncurves
    yopts = (ref=0.8 0.90 crossref=yes);
run;


Comments

  • Power and sample size planning is tricky.

  • The rule of thumb of n = 10 for each predictor will almost always underpower a study.

  • Plan A or B using the matrix multiplication is likely the best. One can specify regular correlations instead of partial correlations.

  • This talk was developed with fixed effects (MODEL=FIXED). Arguably one should plan for random effects unless the study is an experiment; SAS can easily calculate this (MODEL=RANDOM). The Gatsonis and Sampson tables provide power for random-effect settings. (The resulting n’s are usually close.)


Further Work for Somebody

  • A corresponding multiple logistic regression approach, that is, powering more than one predictor of interest with additional covariates in the model.


An Algorithm for Estimating Power and Sample Size for Logistic Models with One or More Independent Variables of Interest

Jay Northern

D. Keith Williams, PhD

Zoran Bursac, PhD

Joint Statistical Meetings, Denver, CO August 3 – August 7, 2008


Background

  • Existing tools are based on the Hsieh, Bloch, and Larsen (1998) paper and the Agresti (1996) text.

    • PASS

    • %powerlog macro


Macro Details

  • Fit the full and the reduced model

    • In the reduced model one can exclude one or more covariates of interest in order to test them simultaneously in the presence of other covariates

  • Perform the likelihood ratio test with appropriate chi-square critical value based on correct number of degrees of freedom
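The likelihood ratio test in the last step has the standard large-sample form below. The symbols are assumed notation: ℓ is the maximized log-likelihood of each fitted logistic model and q is the number of covariates of interest excluded from the reduced model.

```latex
\[
G = -2\left(\ell_{\text{reduced}} - \ell_{\text{full}}\right)
\;\overset{\text{approx.}}{\sim}\; \chi^2_{q} \quad \text{under } H_0,
\qquad
\text{reject } H_0 \text{ when } G > \chi^2_{1-\alpha,\,q}
\]
```

Testing two covariates of interest simultaneously therefore uses q = 2 degrees of freedom rather than two separate one-degree-of-freedom tests.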


Results


End


Plan C: Exchangeable Matrix in Plan A


Full Correlation Matrix


The Correlation of Y with All X’s: Full Model


Correlation Matrix of X’s: Full Model


The Correlation of Y with All X’s: Reduced Model


Correlation Matrix of X’s


Regular Correlations Versus Partial Correlations


Correlation Matrix

[Annotated correlation matrix. Labels: reduced Rxy; full Rxy; X’s of interest; covariates in reduced model Rxx.]



The Gold Standard Approach: Some Matrix Algebra

[Matrix expression not preserved in the transcript; the surviving value is 0.35.]



The Calculations

Power = 0.97


The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    7
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.250568
R-square of Reduced Model             0.111111

Computed Power

Index    N Total    Power
    1         50    0.753
    2         60    0.836
    3         70    0.894
    4         80    0.933
    5         97    0.970

proc power;
  multreg
    model = fixed
    alpha = 0.05
    nfullpredictors = 7
    ntestpredictors = 2
    rsqfull = 0.2505682
    rsquarereduced = 0.1111111         /* the output above reports this value as the reduced-model R-square */
    ntotal = 50 60 70 80 97
    power = .;
  plot x=n min=60 max=100
    key = oncurves
    yopts = (ref=0.8 0.85 0.9 0.95 crossref=yes);
run;




Calculations

The number of predictors of interest: 2

The total number of predictors in the model: 4


Approaches in Estimating the Parameters to Calculate Power: Plan A

  • Complete specification of the parts for the expression:

R-square of the full model = 0.45

R-square of the reduced model = 0.34




[Figure: central F(2, 92) and noncentral F(2, 92, 19.4) densities. Critical value for alpha = .05: 3.07. Noncentrality parameter: 19.4. Total area in blue: power = 0.97.]

