Learn about power analysis, sample size estimation, ethical considerations, software options, and design decisions for successful grant proposals. Discover the impact of power and precision on research outcomes and policy formulation.
Power (and Precision)
Effective Research Design Planning for Grant Proposals & More
Walt Stroup, Ph.D.
Professor & Chair, Department of Statistics
University of Nebraska, Lincoln
SSP Core Facility

Outline for Talk
• What is “Power Analysis”? Why should I do it?
• Essential Background
• A Word about Software
• Decisions that Affect Power – several examples
• Latest Thinking
• Final Thoughts

Power and Precision Defined
• Precision, a.k.a. “Margin of Error”
  • In most cases, the standard error of the relevant estimate
• Power
  • Prob { reject H0 given H0 false }
  • Prob { research hypothesis statistically significant }
• Power analysis
  • essentially, “If I do the study this way, power = ?”
• Sample size estimation
  • How many observations are required to achieve a given power?

What’s Involved in Power Analysis
• WHAT IT’S NOT:
  • “Painting by numbers...”
• IF IT’S DONE RIGHT
  • Power analysis should be
    • a comprehensive conversation to plan the study
    • a “dress rehearsal” for the statistical analysis once the data are collected

Why do a Power Analysis?
• For NIH Grant Proposal
  • because it’s required
• For many other grant proposals
  • because it gives you a competitive edge
• Other reasons
  • practical: increases chance of success; reduces “we don’t have time to do it right, but lots of time to do it over” syndrome
  • ethical

Ethical???
• Last Ph.D. in U.S. Senate
• Irritant to doctrinaire left and right
• Keynote address to 1997 American Stat. Assoc.
“... we can continue to make policy based on ‘data-free ideology’ or we can inform policy where possible by competent inquiry...”
late U.S. Senator Daniel Patrick Moynihan

Ethical
• Results of your study may affect policy
• Well-conceived research means
  • better information
  • greater chance of sound decisions
• Poorly-conceived research
  • lost opportunity
  • deprives policy-makers of information that might have been useful
  • or worse: bad information misinforms or misleads public

What affects Power & Precision?
• A short statistics lesson
  • What goes into computing test statistics
  • What test statistics are supposed to tell us
  • A bit about the distribution of test statistics
    • Central and non-central t, F, and chi-square (mostly F)

What goes into a test statistic?
Research hypothesis – motivation for study
Assumed not true unless data show compelling evidence otherwise
Research hypothesis: HA; opposite: H0

What goes into a test statistic?
• Visualize using F
  • But same basic principles for t, chi-square, etc.
• F is ratio of variation attributable to factor under study vs. variation attributable to noise
[Diagram labels: N of obs; effect size; variance of noise (i.e., among obs)]

When H0 True – i.e., no trt effect

When H0 False (i.e., research hypothesis HA true)

What affects Power?
• N of obs
• effect size
• variance of noise (i.e., among obs)

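For a concrete view of how these three ingredients combine, consider a balanced one-way layout with t treatments and n observations per treatment. That layout and the symbols below (φ for the noncentrality parameter, ν₁ and ν₂ for the degrees of freedom, μᵢ for treatment means, σ² for the noise variance) are illustrative assumptions, not taken from the slide. A sketch:

```latex
\phi = \frac{n \sum_{i=1}^{t} (\mu_i - \bar{\mu})^2}{\sigma^2},
\qquad \nu_1 = t - 1, \quad \nu_2 = t(n - 1)

P\{ F(\nu_1, \nu_2, 0) > F_{\mathrm{crit}} \} = \alpha,
\qquad
\text{Power} = P\{ F(\nu_1, \nu_2, \phi) > F_{\mathrm{crit}} \}
```

More observations or a larger effect size raise φ, and more noise variance lowers it. Power is the area of the non-central F distribution beyond the critical value, so it rises with φ.
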
What should be in a conversation about Power?
• Effect size: what is the minimum that matters?
• Variance: how much “noise” in the response variable (range? distribution? count? pct?)
• Practical Constraints
• Design: same N can produce varying Power
[Diagram labels: N of obs; effect size; variance of noise (i.e., among obs)]

About Software (part I)
• Canned Software
  • lots of it
  • Xiang and Zhou working on report
  • “painting by numbers”
• Simulation
  • most accurate; not constrained by canned scenarios
  • you can see what will happen if you actually do this...
• “Exemplary data set” + modeling software
  • nearly as accurate as simulation
  • “dress rehearsal” for actual analysis
  • MIXED, GLIMMIX, NLMIXED: if you can model it you can do power analysis

Design Decisions – Some Examples
• Main Idea: For the same amount of effort, or $$$, or # observations, power and precision can be quite different
• Power analysis objective: Work smarter, not harder
• Simple example – design of regression study
  • From STAT 412 exercise

Treatment Design Exercise
• Class was asked to predict Bounce Height of basketball from Drop Height and to see if the relationship changes depending on floor surface
• Decision: What drop heights to use???

Objectives and Operating Definitions
• Recall objective: does the drop height : bounce height relationship change with floor surface?
• operating definition

Consequences of Drop Height Decisions
• Should we use fewer drop heights & more obs per drop height or vice versa?
table from Stat 412 Avery archive

Simulation
• CRD example: 3 treatments, 5 reps / treatment
• Suspected Effect size: 6-10% relative to control, whose mean is known to be ~ 100
• Standard deviation: 10 considered “reasonable”
• Simulate 1000 experiments
• Reject H0: equal trt means 228 times
  • power = 0.228 at alpha=0.05
• Ctl mean ranked correctly 820 times
  • (intermediate mean ranked correctly 589 times)

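The simulation described above could be sketched in SAS roughly as follows. This is not the program behind the slide. The treatment means 100, 94, 90 are an assumption chosen to match a 6-10% effect relative to a control mean of 100, the seed is arbitrary, and the data set names (sim, simtests, reject) are hypothetical.

```sas
/* Sketch only: simulate 1000 CRD experiments, 3 trts x 5 reps, SD = 10. */
/* ASSUMPTION: means 100, 94, 90 (6% and 10% below a control mean of 100). */
data sim;
  call streaminit(20061027);              /* arbitrary seed */
  array m[3] _temporary_ (100 94 90);     /* assumed treatment means */
  do expt = 1 to 1000;
    do trt = 1 to 3;
      do rep = 1 to 5;
        y = rand('normal', m[trt], 10);   /* one simulated observation */
        output;
      end;
    end;
  end;
run;

ods exclude all;                          /* suppress 1000 sets of printed output */
proc glimmix data=sim;
  by expt;
  class trt;
  model y = trt;
  ods output tests3=simtests;             /* one F test of trt per experiment */
run;
ods exclude none;

data reject;
  set simtests;
  reject = (probf < 0.05);                /* 1 if H0 rejected at alpha = 0.05 */
run;

/* The mean of REJECT is the simulated power. */
proc means data=reject mean;
  var reject;
run;
```

With 1000 experiments the estimate carries simulation error, so a run should land in the neighborhood of the 0.228 on the slide rather than reproduce it exactly.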
“Exemplary Data”
• Many software packages for power & sample size
  • e.g., SAS PROC POWER
  • for FIXED effect models only
• “Exemplary Data” more general
  • Especially (but not only) when “Mixed Model Issues”
    • random effects
    • split-plot structure
    • errors potentially correlated: longitudinal or spatial data
    • any other non-standard model structure
• Methods use PROC MIXED or GLIMMIX
  • adapted from Stroup (2002, JABES)
  • Chapter 12, SAS for Mixed Models (Littell, et al, 2006)

“Exemplary Data” - Computing Power using SAS
• create data set like proposed design
• run PROC GLIMMIX (or MIXED) with variance fixed
• noncentrality parameter φ = (F computed by GLIMMIX) × rank(K) [or chi-sq with GLM]
• use GLIMMIX to compute
• critical F (Fcrit) is value s.t. P{F(rank(K), υ, 0) > Fcrit} = α [or chi-square]
• Power = P{F[rank(K), υ, φ] > Fcrit}
• SAS functions can compute Fcrit & Power

Compute Power with GLIMMIX – CRD example

    /* step 1 - create data set with same structure as proposed design
       use MU (expected mean) instead of observed Y_ij values */
    /* this example shows power for 5, 10, and 15 e.u. per trt */
    data crdpwrx1;
      input trt mu;
      do n=5 to 15 by 5;
        do eu=1 to n;
          output;
        end;
      end;
    cards;
    1 100
    2 94
    3 90
    ;

Compute Power with GLIMMIX – CRD example

    /* step 2 - use PROC GLIMMIX to compute non-centrality parameters
       for ANOVA tests & contrasts
       ODS statements output them to new data sets */
    proc sort data=crdpwrx1;
      by n;
    proc glimmix data=crdpwrx1;
      by n;
      class trt;
      model mu=trt;
      parms (100)/hold=1;
      contrast 'et1 v et2' trt 0 1 -1;
      contrast 'c vs et' trt 2 -1 -1;
      ods output tests3=b;
      ods output contrasts=c;
    run;

    /* step 3: combine ANOVA & contrast n-c parameter data sets
       use SAS functions PROBF and FINV to compute power */
    data power;
      set b c;
      alpha=0.05;
      ncparm=numdf*fvalue;
      fcrit=finv(1-alpha,numdf,dendf,0);
      power=1-probf(fcrit,numdf,dendf,ncparm);
    proc print;
    run;

Note close agreement of Simulated Power (0.228) and “exemplary data” power (0.224)

Output for n = 5 per trt:

    Obs  Effect  Label      DF  DenDF  alpha  nc       fcrit    power
    1    trt                2   12     0.05   2.53333  3.88529  0.22361
    2            et1 v et2  1   12     0.05   0.40000  4.74723  0.08980
    3            c vs et    1   12     0.05   2.13333  4.74723  0.26978

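The noncentrality values in the table can be checked by hand. The sketch below assumes the usual balanced one-way formulas, which the slide does not show, with n = 5 per treatment, σ² = 100 (the value held fixed by parms (100)), and the exemplary means 100, 94, 90. The symbol φ is notation introduced here for the noncentrality parameter.

```latex
\bar{\mu} = \frac{100 + 94 + 90}{3} = 94.667

\phi_{\text{trt}}
  = \frac{5\,\left[(100 - 94.667)^2 + (94 - 94.667)^2 + (90 - 94.667)^2\right]}{100}
  = \frac{5 \times 50.667}{100} = 2.5333

\phi_{\text{et1 v et2}}
  = \frac{(94 - 90)^2}{100\,(1^2 + 1^2)/5} = \frac{16}{40} = 0.4000

\phi_{\text{c vs et}}
  = \frac{(2 \cdot 100 - 94 - 90)^2}{100\,(2^2 + 1^2 + 1^2)/5} = \frac{256}{120} = 2.1333
```

These match the nc column, which is what numdf*fvalue recovers from the GLIMMIX output when the exemplary means stand in for data.
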
More Advanced Example
• Plots in 8 x 3 grid
• Main variation along 8 “rows”
• 3 x 2 treatment design
• Alternative designs
  • randomized complete block (4 blocks, size 6)
  • incomplete block (8 blocks, size 3)
  • split plot
• RCBD “easy” but ignores natural variation

Picture the 8 x 3 Grid
[Figure: 8 x 3 grid, labeled “Gradient”]
e.g. 8 schools, gradient is “SES”, 3 classrooms each

SAS Programs to Compare 8 x 3 Design
Split-Plot

    data a;
      input bloc trtmnt @@;
      do s_plot=1 to 3;
        input dose @@;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
    cards;
    1 1 1 2 3
    1 2 1 2 3
    2 1 1 2 3
    2 2 1 2 3
    3 1 1 2 3
    3 2 1 2 3
    4 1 1 2 3
    4 2 1 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=bloc trtmnt|dose;
      random trtmnt/subject=bloc;
      parms (4) (6) / hold=1,2;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

8 x 3 – Incomplete Block

    data a;
      input bloc @@;
      do eu=1 to 3;
        input trtmnt dose @@;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
    cards;
    1 1 1 1 2 1 3
    2 1 1 1 2 2 2
    3 1 1 1 3 2 3
    4 1 1 2 1 2 2
    5 1 2 1 3 2 2
    6 1 2 2 1 2 3
    7 1 3 2 1 2 3
    8 2 1 2 2 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=trtmnt|dose;
      random intercept / subject=bloc;
      parms (4) (6) / hold=1,2;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

8 x 3 Example - RCBD

    data a;
      input trtmnt dose @@;
      do bloc=1 to 4;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
    cards;
    1 1 1 2 1 3 2 1 2 2 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=bloc trtmnt|dose;
      parms (10) / hold=1;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

How did designs compare?
• Suppose main objective is to compare regression over 3 levels of dose: do they differ by treatment? (similar to basketball experiment)
• Operating definition is thus H0: dose regression coefficients equal
• Power for Randomized Block: 0.66
• Power for Incomplete Block: 0.85
• Power for Split-Plot: 0.85
• Same # observations – you can work smarter

But what if I don’t know Trt Effect Size or Variance?
• “How can I do a power analysis? If I knew the effect size and the variance I wouldn’t have to do the study.”
• What trt effect size is NOT: it is NOT the effect size you are going to observe
• It is somewhere between
  • what current knowledge suggests is a reasonable expectation
  • minimum difference that would be considered “important” or “meaningful”

And Variance??
• Know thy relevant background / Do thy homework
• Literature search: what have others working with similar subjects reported as variance?
• Pilot study
• Educated guess
  • range you’d expect for 95% of likely obs? divide it by 4
  • most extreme values you can plausibly imagine? divide range by 6

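The two “educated guess” rules give an approximate standard deviation, which is then squared to get a variance. They rest on the normal-distribution facts that about 95% of observations fall within ±2σ of the mean and nearly all within ±3σ. A sketch, with hypothetical numbers that are not from the talk:

```latex
\hat{\sigma} \approx \frac{\text{range covering about 95\% of likely observations}}{4},
\qquad
\hat{\sigma} \approx \frac{\text{most extreme plausible range}}{6}

\text{Hypothetical example: 95\% of observations expected between 80 and 120}
\;\Rightarrow\;
\hat{\sigma} \approx \frac{120 - 80}{4} = 10,
\quad
\hat{\sigma}^2 \approx 100
```

The resulting variance is the kind of value that would be held fixed in a PARMS statement when running the exemplary-data analysis.
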
Hierarchical Linear Models
• From Bovaird (10-27-2006) seminar
  • 2 treatments
  • 20 classrooms / trt
  • 25 students / classroom
  • 4 years
  • reasonable ideas of classroom(trt), student(classroom*trt), within-student variances as well as effect size
• Implement via exemplary data + GLIMMIX

Categorical Data?
• Example: Binary data
  • “Standard” has success probability of 0.25
  • “New & Improved” hope to increase to 0.30
• Have N subjects at each of L locations
• For sake of argument, suppose we have
  • 900 subjects / location
  • 10 locations

Power for GLMs
• 2 treatments
• P{favorable outcome}
  • for trt 1 p = 0.30; for trt 2 p = 0.25
• power if n1 = 300; n2 = 600

    /* exemplary data */
    data a;
      input trt y n;
      datalines;
    1 90 300
    2 150 600
    ;
    proc glimmix;
      class trt;
      model y/n=trt / chisq;
      ods output tests3=pwr;
    run;
    data power;
      set pwr;
      alpha=0.05;
      ncparm=numdf*chisq;
      crit=cinv(1-alpha,numdf,0);
      power=1-probchi(crit,numdf,ncparm);
    proc print;
    run;

Power for GLMM
• Same trt and sample size per location as before
• 10 locations
• Var(Location)=0.25; Var(Trt*Loc)=0.125
• Variance Components: variation in log(OddsRatio)
• Power?

    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
      datalines;
    1 90 300
    2 150 600
    ;
    proc glimmix data=a initglm;
      class trt loc;
      model y/n = trt / oddsratio;
      random intercept trt / subject=loc;
      random _residual_;
      parms (0.25) (0.125) (1) / hold=1,2,3;
      ods output tests3=pwr;
    run;

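The slide stops at the ODS OUTPUT statement. One way to finish the calculation is to reuse the step 3 logic from the CRD example on the F test saved in PWR. This is a sketch, not the program from the talk; it assumes PWR holds the NumDF, DenDF, and FValue columns that GLIMMIX writes to its Tests3 table.

```sas
/* Sketch: turn the GLMM F test saved in PWR into approximate power. */
/* ASSUMPTION: PWR contains NumDF, DenDF and FValue (Tests3 layout). */
data power;
  set pwr;
  alpha  = 0.05;
  ncparm = numdf*fvalue;                     /* noncentrality parameter   */
  fcrit  = finv(1-alpha, numdf, dendf, 0);   /* critical value under H0   */
  power  = 1 - probf(fcrit, numdf, dendf, ncparm);
run;

proc print data=power;
run;
```

The F-based version is used here because the GLMM test of trt is an F test with denominator degrees of freedom, unlike the chi-square test in the GLM example.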
GLMM Power Analysis Results
• Gives you expected Conf Limits for # Locations & N / Loc contemplated
• Gives you the power of the test of TRT effect on prob(favorable)

GLMM Power: Impact of Sample Size?
• N of subjects per trt per location?
• N of Locations?
• Three cases
  • n=300/600, 10 loc
  • n=600/1200, 10 loc
  • n=300/600, 20 loc

    /* n=300/600, 10 loc */
    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
      datalines;
    1 90 300
    2 150 600
    ;

    /* n=600/1200, 10 loc */
    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
      datalines;
    1 180 600
    2 300 1200
    ;

    /* n=300/600, 20 loc */
    data a;
      input trt y n;
      do loc=1 to 20;
        output;
      end;
      datalines;
    1 90 300
    2 150 600
    ;

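Rather than rerunning the analysis once per data set, the three cases can be stacked and processed with a BY statement. The sketch below is one possible arrangement, not the talk's code; the variable names scenario and nloc and the data set names a3 and pwr3 are hypothetical, and the model and variance components are copied from the GLMM slide.

```sas
/* Sketch: stack the three cases and analyze them in one GLIMMIX run. */
/* scenario, nloc, a3 and pwr3 are hypothetical names.                */
data a3;
  input scenario nloc trt y n;
  do loc = 1 to nloc;          /* replicate each trt row over locations */
    output;
  end;
  datalines;
1 10 1  90  300
1 10 2 150  600
2 10 1 180  600
2 10 2 300 1200
3 20 1  90  300
3 20 2 150  600
;

proc glimmix data=a3 initglm;
  by scenario;                 /* data are already in scenario order */
  class trt loc;
  model y/n = trt / oddsratio;
  random intercept trt / subject=loc;
  random _residual_;
  parms (0.25) (0.125) (1) / hold=1,2,3;
  ods output tests3=pwr3;      /* one trt F test per scenario */
run;
```

PWR3 then carries one row per scenario, so the same FINV/PROBF step used for the CRD example yields three power values side by side.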
GLMM Power: Impact of Sample Size?
Recall, for 10 locations, N=300/600, CI for OddsRatio was (0.884, 1.871); Power was 0.274
For 10 locations, N=600/1200
N alone has almost no impact
For 20 locations, N=300/600

Recent developments
• Continue binary example
• Power analysis shows: what do you do?

More Information
• Consider studies directed toward improving success rate similar to that proposed in study
• Lit search yields 95 such studies
• 29 have reported statistically significant gains of p1-p2>0.05 (or, alternatively, significant odds ratios of [(30/70)/(25/75)]=1.28 or greater)
• If this holds, “prior” prob (desired effect size) is approx 0.3

An Intro Stat Result
real Pr{type I error} is more like 0.23 than 0.10!!!

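One reading of this slide is the Bayes' theorem calculation for the probability that a statistically significant result is a false positive. In the sketch below, π = 0.3 is the “prior” probability of the desired effect size from the literature search and α = 0.10 is the nominal level quoted on the slide. The power of 0.80 is an assumption made here because it reproduces the 0.23; the slide text does not state it.

```latex
P(H_0 \text{ true} \mid \text{reject } H_0)
  = \frac{\alpha\,(1 - \pi)}{\alpha\,(1 - \pi) + \text{power} \cdot \pi}

\frac{0.10 \times 0.70}{0.10 \times 0.70 + 0.80 \times 0.30}
  = \frac{0.07}{0.31} \approx 0.23
```

Under this reading, the chance that a rejection is mistaken depends on the prior and on power as well as on α, which is why it can sit well above the nominal level.
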
Returning to All Scenarios
NOTE dramatic impact of alpha-level when “prior” Pr { DES } is relatively low
POWER role increases as Pr { DES } increases

Closing Comments
• In case it’s not obvious
  • I’m not a fan of “painting by numbers”
• Role of power analysis misunderstood & underappreciated
• MOST of ALL it is an opportunity to explore and rehearse study design & planned analysis
• Engage statistician as a participating member of research team
• Give it the TIME it REQUIRES

Thanks ... for coming