Effective Research Design & Grant Planning: Power & Precision

Power (and Precision): Effective Research Design Planning for Grant Proposals & More
Walt Stroup, Ph.D.
Professor & Chair, Department of Statistics
University of Nebraska, Lincoln
SSP Core Facility

Outline for Talk
• What is “Power Analysis”? Why should I do it?
• Essential Background
• A Word about Software
• Decisions that Affect Power – several examples
• Latest Thinking
• Final Thoughts

Power and Precision Defined
• Precision, a.k.a. “Margin of Error”
  • In most cases, the standard error of the relevant estimate
• Power
  • Prob { reject H0 given H0 false }
  • Prob { research hypothesis statistically significant }
• Power analysis
  • essentially, “If I do the study this way, power = ?”
• Sample size estimation
  • How many observations are required to achieve a given power?

What’s involved in Power Analysis
• WHAT IT’S NOT:
  • “Painting by numbers...”
• IF IT’S DONE RIGHT
  • Power analysis should be
    • a comprehensive conversation to plan the study
    • a “dress rehearsal” for the statistical analysis once the data are collected

Why do a Power Analysis?
• For NIH Grant Proposal
  • because it’s required
• For many other grant proposals
  • because it gives you a competitive edge
• Other reasons
  • practical: increases chance of success; reduces “we don’t have time to do it right, but lots of time to do it over” syndrome
  • ethical

Ethical???
• Last Ph.D. in U.S. Senate
• Irritant to doctrinaire left and right
• Keynote address to 1997 American Stat. Assoc.
“... we can continue to make policy based on ‘data-free ideology’ or we can inform policy where possible by competent inquiry...”
late U.S. Senator Daniel Patrick Moynihan

Ethical
• Results of your study may affect policy
• Well-conceived research means
  • better information
  • greater chance of sound decisions
• Poorly-conceived research
  • lost opportunity
  • deprives policy-makers of information that might have been useful
  • or worse: bad information misinforms or misleads the public

What affects Power & Precision?
• A short statistics lesson
  • What goes into computing test statistics
  • What test statistics are supposed to tell us
  • A bit about the distribution of test statistics
  • Central and non-central t, F, and chi-square (mostly F)

What goes into a test statistic?
Research hypothesis: the motivation for the study
Assumed not true unless the data show compelling evidence otherwise
Research hypothesis: HA; opposite: H0

What goes into a test statistic?
• Visualize using F
  • But same basic principles for t, chi-square, etc.
• F is the ratio of variation attributable to the factor under study vs. variation attributable to noise
[Annotations on the F ratio: N of obs; effect size; variance of noise (i.e. among obs)]

When H0 True – i.e. no trt effect

When H0 false (i.e. Research HA true)

What affects Power?
[Annotations on the F ratio: N of obs; effect size; variance of noise (i.e. among obs)]
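To make the three ingredients concrete, here is a sketch for one simple case: a balanced one-way layout. That layout is an illustrative assumption, and the symbols are introduced only for this sketch: t treatments, n observations per treatment, treatment means \mu_i with average \bar{\mu}, and noise variance \sigma^2.

```latex
F = \frac{MS_{\text{trt}}}{MS_{\text{error}}},
\qquad
\frac{E[MS_{\text{trt}}]}{E[MS_{\text{error}}]}
  = 1 + \frac{n \sum_{i=1}^{t} (\mu_i - \bar{\mu})^2}{(t-1)\,\sigma^2}
```

More observations (n) or larger differences among the means push the ratio up. More noise (\sigma^2) pulls it back toward 1, the value expected when H0 is true.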

What should be in a conversation about Power?
• Effect size: what is the minimum that matters?
• Variance: how much “noise” in the response variable (range? distribution? count? pct?)
• Practical Constraints
• Design: same N can produce varying Power
[Annotations on the F ratio: N of obs; effect size; variance of noise (i.e. among obs)]

About Software (part I)
• Canned Software
  • lots of it
  • Xiang and Zhou working on report
  • “painting by numbers”
• Simulation
  • most accurate; not constrained by canned scenarios
  • you can see what will happen if you actually do this...
• “Exemplary data set” + modeling software
  • nearly as accurate as simulation
  • “dress rehearsal” for actual analysis
  • MIXED, GLIMMIX, NLMIXED: if you can model it, you can do power analysis

Design Decisions – Some Examples
• Main Idea: For the same amount of effort, or $$$, or # observations, power and precision can be quite different
• Power analysis objective: Work smarter, not harder
• Simple example – design of regression study
  • From STAT 412 exercise

Treatment Design Exercise
• Class was asked to predict Bounce Height of a basketball from Drop Height and to see if the relationship changes depending on floor surface
• Decision: What drop heights to use???

Objectives and Operating Definitions
• Recall objective: does the drop height : bounce height relationship change with floor surface?
• Operating definition: [formula from the original slide not reproduced]

Consequences of Drop Height Decisions
• Should we use fewer drop heights & more obs per drop height, or vice versa?
[Table from Stat 412 Avery archive]

Simulation
• CRD example: 3 treatments, 5 reps / treatment
• Suspected effect size: 6-10% relative to control, whose mean is known to be ~ 100
• Standard deviation: 10 considered “reasonable”
• Simulate 1000 experiments
  • Reject H0: equal trt means 228 times
  • power = 0.228 at alpha=0.05
  • Ctl mean ranked correctly 820 times
  • (intermediate mean ranked correctly 589 times)
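The talk does not show how this simulation was programmed, so the following is only a minimal sketch of one way to do it in a single DATA step. It assumes treatment means of 100, 94, and 90 (a 6% and a 10% reduction from the control) and normally distributed errors. The seed and the names simpower, fcrit, sstrt, and sse are hypothetical choices made for illustration.

```sas
/* Sketch: simulate the CRD (3 trts, 5 reps/trt, sd = 10) 1000 times */
/* Assumed trt means 100, 94, 90; the seed is arbitrary              */
data simpower;
  call streaminit(12345);
  array mu[3] _temporary_ (100 94 90);
  array ybar[3] _temporary_;
  nrep  = 5;
  sd    = 10;
  alpha = 0.05;
  dendf = 3*nrep - 3;
  fcrit = finv(1 - alpha, 2, dendf);       /* central F critical value */
  do sim = 1 to 1000;
    sse   = 0;
    grand = 0;
    do trt = 1 to 3;
      tot  = 0;
      tot2 = 0;
      do rep = 1 to nrep;
        y    = rand('normal', mu[trt], sd);
        tot  = tot + y;
        tot2 = tot2 + y*y;
      end;
      ybar[trt] = tot / nrep;
      sse   = sse + (tot2 - tot*tot/nrep); /* within-trt sum of squares */
      grand = grand + tot;
    end;
    grand = grand / (3*nrep);              /* grand mean */
    sstrt = 0;
    do trt = 1 to 3;
      sstrt = sstrt + nrep*(ybar[trt] - grand)**2;
    end;
    f      = (sstrt/2) / (sse/dendf);      /* one-way ANOVA F, 2 and 12 df */
    reject = (f > fcrit);
    output;
  end;
  keep sim f reject;
run;

/* mean of REJECT = proportion of simulated experiments that reject H0 */
proc means data=simpower n mean;
  var reject;
run;
```

The mean of reject is the simulated power. Under these assumptions it should land in the neighborhood of the 0.228 reported above, varying somewhat from seed to seed.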

“Exemplary Data”
• Many software packages for power & sample size
  • e.g., SAS PROC POWER
  • for FIXED effect models only
• “Exemplary Data” more general
  • Especially (but not only) when “Mixed Model Issues”
    • random effects
    • split-plot structure
    • errors potentially correlated: longitudinal or spatial data
    • any other non-standard model structure
• Methods use PROC MIXED or GLIMMIX
  • adapted from Stroup (2002, JABES)
  • Chapter 12, SAS for Mixed Models (Littell et al., 2006)

“Exemplary Data” - Computing Power using SAS
• create data set like proposed design
• run PROC GLIMMIX (or MIXED) with variance fixed
• noncentrality parameter (written $\lambda$ here): $\lambda = (F \text{ computed by GLIMMIX}) \times \operatorname{rank}(K)$ [or chi-sq with GLM]
  • use GLIMMIX to compute $\lambda$
• critical F ($F_{crit}$) is the value s.t. $P\{F(\operatorname{rank}(K), \nu, 0) > F_{crit}\} = \alpha$ [or chi-square]
• Power $= P\{F[\operatorname{rank}(K), \nu, \lambda] > F_{crit}\}$
• SAS functions can compute $F_{crit}$ & Power

Compute Power with GLIMMIX – CRD example

    /* step 1 - create data set with same structure as proposed design
       use MU (expected mean) instead of observed Y_ij values */
    /* this example shows power for 5, 10, and 15 e.u. per trt */
    data crdpwrx1;
      input trt mu;
      do n=5 to 15 by 5;
        do eu=1 to n;
          output;
        end;
      end;
    cards;
    1 100
    2 94
    3 90
    ;

Compute Power with GLIMMIX – CRD example

    /* step 2 - use PROC GLIMMIX to compute non-centrality parameters
       for ANOVA tests & contrasts
       ODS statements output them to new data sets */
    proc sort data=crdpwrx1;
      by n;
    proc glimmix data=crdpwrx1;
      by n;
      class trt;
      model mu=trt;
      parms (100) / hold=1;
      contrast 'et1 v et2' trt 0 1 -1;
      contrast 'c vs et' trt 2 -1 -1;
      ods output tests3=b;
      ods output contrasts=c;
    run;

    /* step 3: combine ANOVA & contrast n-c parameter data sets
       use SAS functions PROBF and FINV to compute power */
    data power;
      set b c;
      alpha=0.05;
      ncparm=numdf*fvalue;
      fcrit=finv(1-alpha,numdf,dendf,0);
      power=1-probf(fcrit,numdf,dendf,ncparm);
    proc print;
    run;

Note close agreement of Simulated Power (0.228) and “exemplary data” power (0.224)

Output for 5 e.u. per trt:

    Obs  Effect  Label      DF  DenDF  alpha  nc       fcrit    power
    1    trt                2   12     0.05   2.53333  3.88529  0.22361
    2            et1 v et2  1   12     0.05   0.40000  4.74723  0.08980
    3            c vs et    1   12     0.05   2.13333  4.74723  0.26978
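The noncentrality values in this output can be checked by hand. The sketch below uses the standard balanced one-way formulas, with n = 5 e.u. per trt, \sigma^2 = 100 (the value held in PARMS), means 100, 94, 90 (so \bar{\mu} \approx 94.667), and contrast coefficients c_i. The symbol \lambda is notation for this sketch only.

```latex
\begin{align*}
\lambda_{\text{trt}}
  &= \frac{n \sum_i (\mu_i - \bar{\mu})^2}{\sigma^2}
   = \frac{5\,(5.333^2 + 0.667^2 + 4.667^2)}{100}
   = \frac{5 \times 50.667}{100} \approx 2.533 \\
\lambda_{\text{contrast}}
  &= \frac{\left(\sum_i c_i \mu_i\right)^2}{\sigma^2 \sum_i c_i^2 / n} \\
\lambda_{\text{et1 v et2}}
  &= \frac{(94 - 90)^2}{100 \times 2 / 5} = \frac{16}{40} = 0.400 \\
\lambda_{\text{c vs et}}
  &= \frac{(2 \times 100 - 94 - 90)^2}{100 \times 6 / 5} = \frac{256}{120} \approx 2.133
\end{align*}
```

All three agree with the nc column, which is what makes the exemplary-data run a quick check on the effect sizes and variance you meant to specify.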

More Advanced Example
• Plots in 8 x 3 grid
• Main variation along 8 “rows”
• 3 x 2 treatment design
• Alternative designs
  • randomized complete block (4 blocks, size 6)
  • incomplete block (8 blocks, size 3)
  • split plot
• RCBD “easy” but ignores natural variation

Picture the 8 x 3 Grid
[Figure: 8 x 3 grid, labeled “Gradient”]
e.g. 8 schools, gradient is “SES”, 3 classrooms each

SAS Programs to Compare 8 x 3 Design
Split-Plot

    data a;
      input bloc trtmnt @@;
      do s_plot=1 to 3;
        input dose @@;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
    cards;
    1 1 1 2 3
    1 2 1 2 3
    2 1 1 2 3
    2 2 1 2 3
    3 1 1 2 3
    3 2 1 2 3
    4 1 1 2 3
    4 2 1 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=bloc trtmnt|dose;
      random trtmnt / subject=bloc;
      parms (4) (6) / hold=1,2;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

8 x 3 – Incomplete Block

    data a;
      input bloc @@;
      do eu=1 to 3;
        input trtmnt dose @@;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
    cards;
    1 1 1 1 2 1 3
    2 1 1 1 2 2 2
    3 1 1 1 3 2 3
    4 1 1 2 1 2 2
    5 1 2 1 3 2 2
    6 1 2 2 1 2 3
    7 1 3 2 1 2 3
    8 2 1 2 2 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=trtmnt|dose;
      random intercept / subject=bloc;
      parms (4) (6) / hold=1,2;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

8 x 3 Example - RCBD

    data a;
      input trtmnt dose @@;
      do bloc=1 to 4;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
    cards;
    1 1 1 2 1 3
    2 1 2 2 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=bloc trtmnt|dose;
      parms (10) / hold=1;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

How did designs compare?
• Suppose the main objective is to compare regression over 3 levels of dose: do they differ by treatment? (similar to basketball experiment)
• Operating definition is thus H0: dose regression coefficients equal
• Power for Randomized Block: 0.66
• Power for Incomplete Block: 0.85
• Power for Split-Plot: 0.85
• Same # observations – you can work smarter

But what if I don’t know Trt Effect Size or Variance?
• “How can I do a power analysis? If I knew the effect size and the variance I wouldn’t have to do the study.”
• What trt effect size is NOT: it is NOT the effect size you are going to observe
• It is somewhere between
  • what current knowledge suggests is a reasonable expectation
  • minimum difference that would be considered “important” or “meaningful”

And Variance??
• Know thy relevant background / Do thy homework
• Literature search: what have others working with similar subjects reported as variance?
• Pilot study
• Educated guess
  • range you’d expect 95% of likely obs? divide it by 4
  • most extreme values you can plausibly imagine? divide range by 6
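The two “educated guess” rules rest on the normal distribution: about 95% of observations fall within ±2 standard deviations of the mean and nearly all within ±3. As a sketch, writing \hat{\sigma} for the guessed standard deviation (notation introduced here, not from the slide):

```latex
\hat{\sigma} \approx \frac{\text{range covering 95\% of likely obs}}{4},
\qquad
\hat{\sigma} \approx \frac{\text{most extreme plausible max} - \text{most extreme plausible min}}{6}
```

For a purely hypothetical illustration, if you expect 95% of observations to fall between 80 and 120, the first rule gives (120 - 80)/4 = 10. Square the result if the software asks for a variance rather than a standard deviation.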

Hierarchical Linear Models
• From Bovaird (10-27-2006) seminar
  • 2 treatments
  • 20 classrooms / trt
  • 25 students / classroom
  • 4 years
  • reasonable ideas of classroom(trt), student(classroom*trt), within student variances as well as effect size
• Implement via exemplary data + GLIMMIX

Categorical Data?
• Example: Binary data
  • “Standard” has success probability of 0.25
  • “New & Improved” hope to increase to 0.30
• Have N subjects at each of L locations
• For sake of argument, suppose we have
  • 900 subjects / location
  • 10 locations

Power for GLMs
• 2 treatments
• P{favorable outcome}
  • for trt 1 p=0.30; for trt 2 p=0.25
• power if n1=300; n2=600

    /* exemplary data */
    data a;
      input trt y n;
    datalines;
    1 90 300
    2 150 600
    ;
    proc glimmix;
      class trt;
      model y/n=trt / chisq;
      ods output tests3=pwr;
    run;
    data power;
      set pwr;
      alpha=0.05;
      ncparm=numdf*chisq;
      crit=cinv(1-alpha,numdf,0);
      power=1-probchi(crit,numdf,ncparm);
    proc print;
    run;

Power for GLMM
• Same trt and sample size per location as before
• 10 locations
• Var(Location)=0.25; Var(Trt*Loc)=0.125
  • Variance Components: variation in log(OddsRatio)
• Power?

    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
    datalines;
    1 90 300
    2 150 600
    ;
    proc glimmix data=a initglm;
      class trt loc;
      model y/n = trt / oddsratio;
      random intercept trt / subject=loc;
      random _residual_;
      parms (0.25) (0.125) (1) / hold=1,2,3;
      ods output tests3=pwr;
    run;

GLMM Power Analysis Results
• Gives you expected Conf Limits for # Locations & N / Loc contemplated
• Gives you the power of the test of TRT effect on prob(favorable)

GLMM Power: Impact of Sample Size?
• N of subjects per trt per location?
• N of Locations?
• Three cases
  • n=300/600, 10 loc
  • n=600/1200, 10 loc
  • n=300/600, 20 loc

    /* case 1: n=300/600, 10 loc */
    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
    datalines;
    1 90 300
    2 150 600
    ;
    /* case 2: n=600/1200, 10 loc */
    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
    datalines;
    1 180 600
    2 300 1200
    ;
    /* case 3: n=300/600, 20 loc */
    data a;
      input trt y n;
      do loc=1 to 20;
        output;
      end;
    datalines;
    1 90 300
    2 150 600
    ;
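Each of the three data steps above overwrites data set a, so the model has to be rerun after each one. As a convenience, the cases can also be stacked and run once with BY processing. This is only a sketch, not the program used in the talk: the names scenario, nloc, a3, pwr3, and power3 are hypothetical, and the final data step applies the same FINV/PROBF computation as the CRD example to the GLMM F test.

```sas
/* Sketch: stack the three cases and analyze them as BY groups */
data a3;
  input scenario trt y n nloc;
  do loc = 1 to nloc;
    output;
  end;
datalines;
1 1  90  300 10
1 2 150  600 10
2 1 180  600 10
2 2 300 1200 10
3 1  90  300 20
3 2 150  600 20
;

/* same model and held variance components as the GLMM example */
proc glimmix data=a3 initglm;
  by scenario;
  class trt loc;
  model y/n = trt / oddsratio;
  random intercept trt / subject=loc;
  random _residual_;
  parms (0.25) (0.125) (1) / hold=1,2,3;
  ods output tests3=pwr3;
run;

/* convert each F value to a noncentrality parameter, then to power */
data power3;
  set pwr3;
  alpha  = 0.05;
  ncparm = numdf*fvalue;
  fcrit  = finv(1 - alpha, numdf, dendf, 0);
  power  = 1 - probf(fcrit, numdf, dendf, ncparm);
run;

proc print data=power3;
run;
```

The data lines are already in scenario order, so no PROC SORT is needed before the BY statement. Printing one row per scenario puts the three power values side by side for comparison.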

GLMM Power: Impact of Sample Size?
• Recall, for 10 locations, N=300/600: CI for OddsRatio was (0.884, 1.871); Power was 0.274
• For 10 locations, N=600/1200: N alone has almost no impact
• For 20 locations, N=300/600: [results from the original slide not reproduced]

Recent developments
• Continue binary example
• Power analysis shows: what do you do?

More Information
• Consider studies directed toward improving success rate similar to that proposed in study
• Lit search yields 95 such studies
• 29 have reported statistically significant gains of p1-p2>0.05 (or, alternatively, significant odds ratios of [(30/70)/(25/75)]=1.29 or greater)
• If this holds, “prior” prob (desired effect size) is approx 29/95, or about 0.3

An Intro Stat Result
real Pr{type I error} is more like 0.23 than 0.10!!!
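The computation behind this slide did not survive in the text. One plausible reconstruction is Bayes' rule, sketched here under stated assumptions: a “prior” probability \pi = 0.3 that the desired effect size is real, a nominal \alpha = 0.10, and a power of 0.80. The power value is an assumption, chosen because it reproduces the 0.23 quoted above.

```latex
\Pr\{H_0 \text{ true} \mid \text{significant}\}
  = \frac{\alpha\,(1 - \pi)}{\alpha\,(1 - \pi) + \text{power} \times \pi}
  = \frac{0.10 \times 0.7}{0.10 \times 0.7 + 0.80 \times 0.3}
  = \frac{0.07}{0.31} \approx 0.23
```

Read this way, the chance that a significant result is a false alarm depends on the prior and on power as well as on \alpha.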

Returning to All Scenarios
• NOTE dramatic impact of alpha-level when “prior” Pr { DES } is relatively low
• POWER role increases as Pr { DES } increases

Closing Comments
• In case it’s not obvious
  • I’m not a fan of “painting by numbers”
• Role of power analysis misunderstood & underappreciated
• MOST of ALL it is an opportunity to explore and rehearse study design & planned analysis
• Engage statistician as a participating member of research team
• Give it the TIME it REQUIRES

Thanks ... for coming
