Problems with the Design and Implementation of Randomized Experiments

By Larry V. Hedges, Northwestern University
Presented at the 2009 IES Research Conference

Hard Answers to Easy Questions
By Larry V. Hedges, Northwestern University
Presented at the 2009 IES Research Conference

Easy Question
Isn't it OK if I just match (schools) on some variable before randomizing? (You know, lots of people do it.)
This is a simple question, but answering it requires serious thinking about design and analysis.

What Does this Question Mean?
Generally, adding matching or blocking variables means adding another (blocking) factor to the design.
The exact consequences depend on the design you started with:
• Individually randomized (completely randomized design)
• Cluster randomized (hierarchical design)
• Multicenter or matched (randomized blocks design)

Individually Randomized (Completely Randomized) Design
In this case you are adding a blocking factor (with p blocks) crossed with treatment.
In other words, the design becomes a (generalized) randomized block design.

Individually Randomized (Completely Randomized) Design
How does this impact the analysis?
Think about a balanced design with 2n students per block and p blocks, and the ANOVA partitioning of sums of squares and degrees of freedom.
Original partitioning:
$SS_{Total} = SS_T + SS_{WT}$
$df_{Total} = df_T + df_{WT}$
$2pn - 1 = 1 + (2pn - 2)$
Original test statistic:
$F = SS_T / (SS_{WT}/df_{WT})$

Individually Randomized (Completely Randomized) Design
New partitioning:
$SS_{Total} = SS_T + SS_B + SS_{B \times T} + SS_{WC}$
$df_{Total} = df_T + df_B + df_{B \times T} + df_{WC}$
$2pn - 1 = 1 + (p - 1) + (p - 1) + 2p(n - 1)$
New test statistic?
$F = SS_T / (SS_{WC}/df_{WC})$
or
$F = SS_T / (SS_{B \times T}/df_{B \times T})$
It depends on the inference model.

Inference Models
I will mention two inference models:
• Conditional inference model
• Unconditional inference model
These inference models correspond to the type of inference (generalization) you wish to make.
The inference model chosen has implications for the statistical analysis procedure.
The inference model determines the natural random effects.

Inference Models: Conditional Inference Model
Generalization is to the blocks actually in the experiment (or those just like them).
Blocks in the experiment are the universe (population).
Generalization to other blocks depends on extra-statistical considerations (which blocks are just like them? How do you know?).
Generalization obviously cannot be model free.

Inference Models: Unconditional Inference Model
Generalization is to a universe (of blocks) that includes blocks not in the experiment.
Blocks in the experiment are a sample of blocks in the universe (population).
If blocks in the experiment can be considered a representative sample, inference to the population of blocks is by sampling theory.
If blocks are not a probability sample, generalization gets tricky (what is the universe? How do you know?).

Inference Models
You can think of the inference model as linked to the sampling model for blocks.
If the blocks observed are a (random) sample of blocks, then they are a source of random variation.
If the blocks observed are the entire universe of relevant blocks, then they are not a source of random variation.
The statistical analysis can be chosen independently of the inference model, but if it doesn't include all sources of random variation, inferences will be compromised.

Inference Models and Statistical Analyses: Individually Randomized Design
Blocks are fixed effects under the conditional inference model.
In this case the correct test statistic is
$F_C = SS_T / (SS_{WC}/df_{WC})$
and the F-distribution has 1 and $2p(n - 1)$ df.
Block effects are random under the unconditional inference model.
In this case the correct test statistic is
$F_U = SS_T / (SS_{B \times T}/df_{B \times T})$
and the F-distribution has 1 and $(p - 1)$ df.
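The two test statistics above can be computed side by side from the same data. The following is a minimal sketch, assuming a balanced two-arm design stored as nested lists; the function name `blocked_anova`, the data layout `y[t][b]`, and the toy numbers in `y_demo` are illustrative assumptions, not part of the talk.

```python
from statistics import mean

def blocked_anova(y):
    """Balanced two-arm blocked ANOVA.

    y[t][b] is the list of n outcomes for arm t (0 = control, 1 = treatment)
    in block b. Returns the sums of squares plus both candidate F statistics.
    """
    p, n = len(y[0]), len(y[0][0])
    cells = [(t, b) for t in range(2) for b in range(p)]
    cell = {(t, b): mean(y[t][b]) for t, b in cells}
    trt = [mean(cell[t, b] for b in range(p)) for t in range(2)]
    blk = [mean(cell[t, b] for t in range(2)) for b in range(p)]
    grand = mean(trt)  # balanced design: grand mean is the mean of arm means
    ss = {
        "T": p * n * sum((a - grand) ** 2 for a in trt),
        "B": 2 * n * sum((a - grand) ** 2 for a in blk),
        "BxT": n * sum((cell[t, b] - trt[t] - blk[b] + grand) ** 2
                       for t, b in cells),
        "WC": sum((v - cell[t, b]) ** 2 for t, b in cells for v in y[t][b]),
    }
    ss["Total"] = sum((v - grand) ** 2 for t, b in cells for v in y[t][b])
    f_c = ss["T"] / (ss["WC"] / (2 * p * (n - 1)))  # blocks fixed (conditional)
    f_u = ss["T"] / (ss["BxT"] / (p - 1))           # blocks random (unconditional)
    return ss, f_c, f_u

# Hypothetical toy data: p = 3 blocks, n = 2 students per arm per block.
y_demo = [
    [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]],  # control
    [[3.0, 4.0], [5.0, 5.0], [4.0, 7.0]],  # treatment
]
ss_demo, f_c_demo, f_u_demo = blocked_anova(y_demo)
print(ss_demo, f_c_demo, f_u_demo)
```

The numerator is the same in both statistics; only the error term (and therefore the reference F-distribution) changes with the inference model.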

Inference Models and Statistical Analyses: Individually Randomized Design
You can see that the error term in the test has (a lot) more df under the fixed effects model: $2p(n - 1)$ versus $(p - 1)$.
What you can't see is that (if there is a treatment effect) the average value of the F-statistic is typically also larger under the fixed effects model.
It is bigger by a factor proportional to [expression not preserved in this transcript], where $\omega = \sigma_{B \times T}^2/\sigma_B^2$ is a treatment heterogeneity parameter and $\rho$ is the intraclass correlation.

Possible Statistical Analyses: Individually Randomized Design
Possible statistical analyses:
• Ignore the blocking
• Include blocks as fixed effects
• Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference.

Making Unconditional Inferences: Individually Randomized Design
Possible statistical analyses:
• Ignore the blocking. Bad idea: will inflate significance levels of tests for treatment effects substantially.
• Include blocks as fixed effects. Bad idea: will inflate significance levels of tests for treatment effects substantially.
• Include blocks as random effects. Correct significance levels (but less power than the conditional analysis).
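A small simulation can illustrate why treating blocks as fixed is risky for unconditional inference. The sketch below is an illustration under stated assumptions, not the author's analysis: the average treatment effect is zero, each block has its own effect drawn with standard deviation `sd_interaction`, and residuals are standard normal. All names and parameter values are hypothetical. It uses the fact that, with two arms, the blocks-random statistic equals the squared one-sample t-statistic computed on the block-level mean differences.

```python
import random
from statistics import mean, variance

def simulate_null_f(p=10, n=10, sd_interaction=1.0, reps=400, seed=1):
    """Average F statistics when the mean treatment effect over the universe
    of blocks is zero but effects vary from block to block."""
    rng = random.Random(seed)
    f_fixed, f_random = [], []
    for _ in range(reps):
        diffs, ss_wc = [], 0.0
        for _block in range(p):
            # Block main effects are omitted: they cancel out of the
            # within-block differences and the within-cell residuals.
            tau = rng.gauss(0.0, sd_interaction)  # block-specific effect
            ctrl = [rng.gauss(0.0, 1.0) for _ in range(n)]
            trt = [rng.gauss(tau, 1.0) for _ in range(n)]
            mc, mt = mean(ctrl), mean(trt)
            ss_wc += sum((v - mc) ** 2 for v in ctrl)
            ss_wc += sum((v - mt) ** 2 for v in trt)
            diffs.append(mt - mc)
        ss_t = p * n * mean(diffs) ** 2 / 2
        ms_wc = ss_wc / (2 * p * (n - 1))   # error term, blocks fixed
        ms_bxt = (n / 2) * variance(diffs)  # error term, blocks random
        f_fixed.append(ss_t / ms_wc)
        f_random.append(ss_t / ms_bxt)
    return mean(f_fixed), mean(f_random)

mean_f_fixed, mean_f_random = simulate_null_f()
print(mean_f_fixed, mean_f_random)
```

Under these assumptions the fixed-effects statistic averages well above the value expected from its reference distribution, which is the inflation of significance levels described in the list above; the random-effects statistic stays close to its null expectation.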

Making Conditional Inferences: Individually Randomized Design
Possible statistical analyses:
• Ignore the blocking. Bad idea: may deflate actual significance levels of tests for treatment effects substantially (unless $\rho = 0$).
• Include blocks as fixed effects. Correct significance levels and a more powerful test than for the unconditional analysis.
• Include blocks as random effects. Bad idea: may deflate significance levels and reduce power.

Cluster Randomized (Hierarchical) Design
The issues about blocking in the cluster randomized design are the same as in the individually randomized design.
The inference model will determine the most appropriate statistical analysis.
Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose.
For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments.

Cluster Randomized (Hierarchical) Design
In this case you are adding a blocking factor (with p blocks) crossed with treatment, but clusters are still nested within treatments.
[Here $C_{ij}$ is the jth cluster in the ith block.]
Note that there are m clusters in each treatment per block.

Cluster Randomized (Hierarchical) Design
How does this impact the analysis?
Think about a balanced design with 2mn students per block and p blocks, and the ANOVA partitioning of sums of squares and degrees of freedom.
Original partitioning:
$SS_{Total} = SS_T + SS_C + SS_{WC:T}$
$df_{Total} = df_T + df_C + df_{WC:T}$
$2mn - 1 = 1 + 2(m - 1) + 2m(n - 1)$
Original test statistic:
$F = SS_T / (SS_C/df_C)$

Cluster Randomized (Hierarchical) Design
New partitioning:
$SS_{Total} = SS_T + SS_B + SS_{B \times T} + SS_{C:B \times T} + SS_{WC}$
$df_{Total} = df_T + df_B + df_{B \times T} + df_{C:B \times T} + df_{WC}$
$2mpn - 1 = 1 + (p - 1) + (p - 1) + 2p(m - 1) + 2pm(n - 1)$
New test statistic?
$F = SS_T / (SS_{WC}/df_{WC})$
or
$F = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$

Inference Models and Statistical Analyses: Cluster Randomized Design
Blocks are fixed under the conditional inference model, but clusters are typically random.
In this case the correct test statistic is
$F_C = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$
and the F-distribution has 1 and $2p(m - 1)$ df.
Blocks are random under the unconditional inference model, and clusters are typically random as well.
In this case there is no exact ANOVA test if there are block-by-treatment interactions, but a conservative test uses the test statistic
$F_U = SS_T / (SS_B/df_B)$
and the F-distribution has 1 and $(p - 1)$ df (large sample tests, e.g., based on HLM, are available).
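The degrees-of-freedom bookkeeping for the cluster randomized design is easy to get wrong, so it can help to check that each partition adds up to the total. This is a minimal sketch; the function name `cluster_df` and the example values of m, n, and p are hypothetical.

```python
def cluster_df(m, n, p=None):
    """Degrees-of-freedom partition for a balanced cluster randomized design.

    m: clusters per treatment (per block, if blocked)
    n: students per cluster
    p: number of blocks, or None for the original unblocked design
    """
    if p is None:
        df = {"T": 1, "C": 2 * (m - 1), "WC:T": 2 * m * (n - 1)}
        total = 2 * m * n - 1
    else:
        df = {
            "T": 1,
            "B": p - 1,
            "BxT": p - 1,
            "C:BxT": 2 * p * (m - 1),
            "WC": 2 * p * m * (n - 1),
        }
        total = 2 * m * p * n - 1
    # The components must account for every degree of freedom.
    assert sum(df.values()) == total
    return df

# Hypothetical sizes: 3 clusters per arm per block, 20 students each, 5 blocks.
print(cluster_df(3, 20))
print(cluster_df(3, 20, p=5))
```

With these example sizes the conditional error term (clusters within block-by-treatment cells) has 20 df, while the conservative unconditional test has only p - 1 = 4.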

Inference Models and Statistical Analyses: Cluster Randomized Design
You can see that the error term has more df under the fixed effects model.
If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model.
It is bigger by a factor proportional to [expression not preserved in this transcript], where $\omega_B = \sigma_{B \times T}^2/\sigma_B^2$ is a treatment heterogeneity parameter and $\rho_B$ and $\rho_C$ are the block and cluster level intraclass correlations, respectively.

Possible Statistical Analyses: Cluster Randomized Design
Possible statistical analyses:
• Ignore the blocking
• Include blocks as fixed effects
• Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference.

Making Unconditional Inferences: Cluster Randomized Design
Possible statistical analyses:
• Ignore the blocking. Bad idea: will inflate significance levels of tests for treatment effects substantially.
• Include blocks as fixed effects. Bad idea: will inflate significance levels of tests for treatment effects substantially.
• Include blocks as random effects. Correct significance levels, but less power than the conditional analysis.

Making Conditional Inferences: Cluster Randomized Design
Possible statistical analyses:
• Ignore the blocking. Bad idea: may deflate actual significance levels of tests for treatment effects substantially.
• Include blocks as fixed effects. Correct significance levels and a more powerful test than for the unconditional analysis.
• Include blocks as random effects. Not such a bad idea: significance levels unaffected.

Multi-center (Randomized Blocks) Design
The issues about blocking in the multicenter (randomized blocks) design are the same as in the cluster randomized design.
The inference model will determine the most appropriate statistical analysis.
Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose.
For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments.

Multi-center (Randomized Blocks) Design
In this case you are adding a blocking factor (with p blocks) crossed with treatment; clusters remain crossed with treatment but are nested within blocks.
[Here $C_{ij}$ is the jth cluster in the ith block.]
Note that there are m clusters per block and n individuals in each treatment in each cluster.

Multi-center (Randomized Blocks) Design
How does this impact the analysis?
Think about a balanced design with p blocks, 2mn students per block, and n individuals per cell, and the ANOVA partitioning of sums of squares and degrees of freedom.
Original partitioning:
$SS_{Total} = SS_T + SS_C + SS_{T \times C} + SS_{WC}$
$df_{Total} = df_T + df_C + df_{T \times C} + df_{WC}$
$2pmn - 1 = 1 + (pm - 1) + (pm - 1) + 2pm(n - 1)$
Original test statistic:
$F = SS_T / (SS_{T \times C}/df_{T \times C})$

Multi-center (Randomized Blocks) Design
New partitioning:
$SS_{Total} = SS_T + SS_B + SS_{C:B} + SS_{B \times T} + SS_{C:B \times T} + SS_{WC}$
$df_{Total} = df_T + df_B + df_{C:B} + df_{B \times T} + df_{C:B \times T} + df_{WC}$
$2mpn - 1 = 1 + (p - 1) + p(m - 1) + (p - 1) + p(m - 1) + 2pm(n - 1)$
New test statistic?
$F = SS_T / (SS_{WC}/df_{WC})$
or
$F = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$
or
$F = SS_T / (SS_{B \times T}/df_{B \times T})$

Inference Models and Statistical Analyses: Randomized Blocks Design
Blocks are fixed under the conditional inference model, but clusters are typically random.
In this case the correct test statistic is
$F_C = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$
and the F-distribution has 1 and $p(m - 1)$ df.
Blocks are random under the unconditional inference model, and clusters are typically random as well.
In this case the correct test statistic is
$F_U = SS_T / (SS_{B \times T}/df_{B \times T})$
and the F-distribution has 1 and $(p - 1)$ df.
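To compare what each inference model costs in error degrees of freedom, the denominators stated for the three designs can be collected in one place. This is only a bookkeeping sketch; the function name `error_df`, the design labels, and the example sizes are hypothetical.

```python
def error_df(design, inference, p, m=None, n=None):
    """Denominator df of the treatment test, as stated for each design.

    design: "individual", "cluster", or "multicenter"
    inference: "conditional" (blocks fixed) or "unconditional" (blocks random)
    p: blocks; m: clusters (per arm per block for "cluster", per block for
    "multicenter"); n: individuals per cell.
    """
    if inference == "unconditional":
        return p - 1  # the same for all three designs
    if inference != "conditional":
        raise ValueError("inference must be 'conditional' or 'unconditional'")
    if design == "individual":
        return 2 * p * (n - 1)
    if design == "cluster":
        return 2 * p * (m - 1)
    if design == "multicenter":
        return p * (m - 1)
    raise ValueError("unknown design: " + design)

# Hypothetical sizes: 5 blocks, 3 clusters, 20 individuals per cell.
for design in ("individual", "cluster", "multicenter"):
    print(design,
          error_df(design, "conditional", p=5, m=3, n=20),
          error_df(design, "unconditional", p=5, m=3, n=20))
```

Whatever the design, the unconditional test is limited by the number of blocks, which is why a handful of blocks gives very uncertain inference to a universe of blocks.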

Inference Models and Statistical Analyses: Randomized Blocks Design
You can see that the error term has more df under the fixed effects model.
If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model.
It is bigger by a factor proportional to [expression not preserved in this transcript], where $\omega_B = \sigma_{B \times T}^2/\sigma_B^2$ and $\omega_C = \sigma_{C \times T}^2/\sigma_C^2$ are treatment heterogeneity parameters and $\rho_B$ and $\rho_C$ are the block and cluster level intraclass correlations, respectively.

Possible Statistical Analyses: Randomized Blocks Design
Possible statistical analyses:
• Ignore the blocking
• Include blocks as fixed effects
• Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference.

Making Unconditional Inferences: Randomized Blocks Design
Possible statistical analyses:
• Ignore the blocking. Bad idea: will inflate significance levels of tests for treatment effects substantially.
• Include blocks as fixed effects. Bad idea: will inflate significance levels of tests for treatment effects substantially.
• Include blocks as random effects. Correct significance levels, but less power than the conditional analysis.

Making Conditional Inferences: Randomized Blocks Design
Possible statistical analyses:
• Ignore the blocking. Bad idea: may deflate actual significance levels of tests for treatment effects substantially.
• Include blocks as fixed effects. Correct significance levels and a more powerful test than for the unconditional analysis.
• Include blocks as random effects. Bad idea: may deflate significance levels and reduce power.

Another Easy Question
There was some attrition from my study after assignment. Does that cause a serious problem?
This is another simple question, but the answer is far from simple.
One answer can be framed using concepts of experimental design.

Post Assignment Attrition
A different question has a simple answer: does attrition cause a problem in principle?
The simple answer to that question is YES!
Randomized experiments with attrition no longer give model free, unbiased estimates of the causal effect of treatment.
Whether the bias is serious or not depends on the model that generates the missing data.

Post Assignment Attrition
The design is changed by adding a crossed factor corresponding to missingness [diagram not preserved: treatment (T, C) crossed with missingness (observed, missing)].
Now we can see a problem with estimating the treatment effect from only the observed part of the design: the observed treatment effect is only part of the total treatment effect.

Post Assignment Attrition
Suppose that the means are given by the μ's and the proportions are given by the π's [table not preserved].

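Post Assignment Attrition
Write $\mu_{TO}$, $\mu_{TM}$, $\mu_{CO}$, and $\mu_{CM}$ for the means of observed (O) and missing (M) individuals in the treatment (T) and control (C) groups, and $\pi_T$ and $\pi_C$ for the proportions of dropouts.
The treatment effect on all individuals randomized is
$\delta = [(1 - \pi_T)\mu_{TO} + \pi_T\mu_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C\mu_{CM}]$
When the proportion of dropouts is equal in T and C, so that $\pi_T = \pi_C = \pi$, the treatment effect on all individuals randomized is
$\delta = (1 - \pi)(\mu_{TO} - \mu_{CO}) + \pi(\mu_{TM} - \mu_{CM})$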

Post Assignment Attrition
Rewriting this, we see that the average treatment effect for all individuals assigned is
$\delta = (1 - \pi)\delta_O + \pi\delta_M$
where $\delta_O$ is the treatment effect among the individuals that are observed, $\delta_M$ is the treatment effect among the individuals that are not observed, $\pi$ is the common proportion of dropouts, and $\delta$ is the treatment effect among all individuals assigned.
Thus bounds on $\delta_M$ imply bounds on $\delta$.

Post Assignment Attrition
No estimate of the treatment effect is possible without an estimate of the treatment effect among the missing individuals.
One possibility is to model (assume) that we know something about the treatment effect in the missing individuals.
We can assume a range of values to get bounds on the possible treatment effect.
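As a sketch of the bounding idea, assume equal dropout proportions π in both groups, so that the overall effect is the mixture δ = (1 - π)δO + πδM of the observed-case effect δO and the missing-case effect δM. The function name `effect_bounds_equal_attrition` and the example numbers are hypothetical, and sampling error is ignored.

```python
def effect_bounds_equal_attrition(delta_obs, pi, delta_miss_low, delta_miss_high):
    """Bounds on the effect for everyone randomized, given an assumed range
    for the effect among dropouts and a common dropout proportion pi."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be a proportion between 0 and 1")
    if delta_miss_low > delta_miss_high:
        raise ValueError("lower bound exceeds upper bound")
    lower = (1 - pi) * delta_obs + pi * delta_miss_low
    upper = (1 - pi) * delta_obs + pi * delta_miss_high
    return lower, upper

# Hypothetical example: observed effect of 10 points, 20% dropout in each arm,
# and an assumed effect among dropouts somewhere between -5 and 10 points.
low, high = effect_bounds_equal_attrition(10.0, 0.2, -5.0, 10.0)
print(low, high)
```

The width of the interval is π times the width of the assumed range for δM, so low attrition or a tight assumption about dropouts both narrow the bounds.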

Post Assignment Attrition
When the attrition rate is not the same in the treatment and control groups ($\pi_T \neq \pi_C$), the analysis is trickier.
One idea is to convince ourselves that the treatment effect for those who drop out is the same as for those who do not.

Post Assignment Attrition
This does not ensure that attrition has not altered the treatment effect.
We have to know both $\mu_{TM}$ and $\mu_{CM}$ to identify the treatment effect; knowing $\delta_M = \mu_{TM} - \mu_{CM}$ is not enough.
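A numeric illustration may make this concrete. The sketch below assumes each group's overall mean is the dropout-weighted mixture of its observed and missing means; the function name `overall_effect` and all the numbers are hypothetical. In both scenarios the effect among dropouts equals the effect among completers (10 points), yet the overall effect differs.

```python
def overall_effect(mu_to, mu_tm, mu_co, mu_cm, pi_t, pi_c):
    """Effect for everyone randomized, mixing observed (O) and missing (M)
    means within treatment (T) and control (C) by the dropout proportions."""
    mean_t = (1 - pi_t) * mu_to + pi_t * mu_tm
    mean_c = (1 - pi_c) * mu_co + pi_c * mu_cm
    return mean_t - mean_c

# Observed means 60 (T) and 50 (C); dropout is 10% in T and 30% in C.
# Scenario A: missing means 40 and 30, so the effect among dropouts is 10.
delta_a = overall_effect(60.0, 40.0, 50.0, 30.0, pi_t=0.1, pi_c=0.3)
# Scenario B: missing means 20 and 10, so the effect among dropouts is still 10.
delta_b = overall_effect(60.0, 20.0, 50.0, 10.0, pi_t=0.1, pi_c=0.3)
print(delta_a, delta_b)
```

Both scenarios have the same effect among dropouts, but the overall effects differ from each other and from the observed effect of 10, because the levels of the missing means are weighted by different dropout proportions in the two groups.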

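Post Assignment Attrition
Suppose that $B^L_{TM}$ and $B^L_{CM}$ are lower bounds on the means for missing individuals in the treatment and control groups, and $B^U_{TM}$ and $B^U_{CM}$ are the corresponding upper bounds.
Then the lower and upper bounds on the treatment effect are
Lower: $[(1 - \pi_T)\mu_{TO} + \pi_T B^L_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C B^U_{CM}]$
Upper: $[(1 - \pi_T)\mu_{TO} + \pi_T B^U_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C B^L_{CM}]$
where $\mu_{TO}$ and $\mu_{CO}$ are the means of the observed individuals and $\pi_T$ and $\pi_C$ are the proportions of dropouts.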
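One way to compute such bounds is sketched below, assuming the bounds come from pairing the lowest plausible missing mean in one group with the highest plausible missing mean in the other. The function name `effect_bounds` and the example values (a 0 to 100 outcome scale) are hypothetical, and sampling error is ignored.

```python
def effect_bounds(mu_to, mu_co, pi_t, pi_c, tm_bounds, cm_bounds):
    """Worst-case bounds on the effect for everyone randomized.

    mu_to, mu_co: observed means in treatment and control
    pi_t, pi_c: dropout proportions in treatment and control
    tm_bounds, cm_bounds: (lower, upper) bounds on the missing-case means
    """
    tm_low, tm_high = tm_bounds
    cm_low, cm_high = cm_bounds
    # Smallest effect: treatment dropouts as low as possible,
    # control dropouts as high as possible.
    lower = ((1 - pi_t) * mu_to + pi_t * tm_low) \
        - ((1 - pi_c) * mu_co + pi_c * cm_high)
    # Largest effect: the reverse pairing.
    upper = ((1 - pi_t) * mu_to + pi_t * tm_high) \
        - ((1 - pi_c) * mu_co + pi_c * cm_low)
    return lower, upper

# Hypothetical example: outcome bounded between 0 and 100, observed means
# 60 (T) and 50 (C), dropout 10% in T and 30% in C.
lo_bound, hi_bound = effect_bounds(60.0, 50.0, 0.1, 0.3,
                                   (0.0, 100.0), (0.0, 100.0))
print(lo_bound, hi_bound)
```

With only the logical limits of the scale as bounds, the interval is wide and here includes zero; substantive assumptions about dropouts are what tighten it.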

Post Assignment Attrition
Note that none of the results on attrition involve sampling or estimation error.
Results get more complex if we take this into account, but the basic ideas are those given here.

Conclusions
Many simple questions arise in connection with field experiments.
The answers to these questions often require thinking through complex aspects of:
• the design
• the inference model
• assumptions about missing data
No correct answers are possible without recognizing these complexities.
