
Assessing Intervention Fidelity in RCTs: Concepts and Methods

Panelists:

David S. Cordray, PhD

Chris Hulleman, PhD

Joy Lesnick, PhD

Vanderbilt University

Presentation for the IES Research Conference

Washington, DC

June 12, 2008



Overview

  • Session planned as an integrated set of presentations

  • We’ll begin with:

    • Definitions and distinctions;

    • Conceptual foundation for assessing fidelity in RCTs, a special case.

  • Two examples of assessing implementation fidelity:

    • Chris Hulleman will illustrate an assessment for an intervention with a single core component

    • Joy Lesnick will illustrate additional considerations when fidelity assessment is applied to intervention models with multiple program components.

  • Issues for the future

  • Questions and discussion



Definitions and Distinctions


Dimensions of Intervention Fidelity

  • Little consensus on what is meant by the term “intervention fidelity”.

  • But Dane & Schneider (1998) identify 5 aspects:

    • Adherence/compliance – program components are delivered/used/received as prescribed;

    • Exposure – amount of program content delivered/received by participants;

    • Quality of the delivery – theory-based ideal in terms of processes and content;

    • Participant responsiveness – engagement of the participants; and

    • Program differentiation – unique features of the intervention are distinguishable from other programs (including the counterfactual)



Distinguishing Implementation Assessment from Implementation Fidelity Assessment

  • Two models of intervention implementation, based on:

    • A purely descriptive model

      • Answering the question “What transpired as the intervention was put in place (implemented)?”

    • An a priori intervention model, with explicit expectations about implementation of core program components.

      • Fidelity is the extent to which the realized intervention (tTx) is “faithful” to the pre-stated intervention model (TTx)

      • Fidelity = TTx – tTx

      • We emphasize this model



What to Measure?

  • Adherence to the intervention model:

    • (1) Essential or core components (activities, processes);

    • (2) Activities, processes, and structures that are necessary, but not unique to the theory/model (supporting the essential components of T); and

    • (3) Ordinary features of the setting (shared with the counterfactual group, C)

  • Essential/core and Necessary components are priority parts of fidelity assessment.


An Example of Core Components: Bransford’s HPL Model of Learning and Instruction

  • John Bransford et al. (1999) postulate that a strong learning environment entails a combination of:

    • Knowledge-centered;

    • Learner-centered;

    • Assessment-centered; and

    • Community-centered components.

  • Alene Harris developed an observation system (the VOS) that registered both novel (the components above) and traditional pedagogy in classes.

  • The next slide focuses on the prevalence of Bransford’s recommended pedagogy.



Challenge-based Instruction in “Treatment” and Control Courses: The VaNTH Observation System (VOS)

Percentage of Course Time Using Challenge-based Instructional Strategies

Adapted from Cox & Cordray, in press



Implications

  • Fidelity can be assessed even when there is no known benchmark (i.e., nothing like the Ten Commandments to serve as TTx)

    • In practice interventions can be a mixture of components with strong, weak or no benchmarks

  • Control conditions can include core intervention components due to:

    • Contamination

    • Business as usual (BAU) contains shared components, at different levels

    • Similar theories, models of action

  • But to index “fidelity”, we need to measure components within the control condition



Linking Intervention Fidelity Assessment to Contemporary Models of Causality

  • Rubin’s Causal Model:

    • True causal effect of X is (YiTx – YiC)

    • RCT methodology is the best approximation to the true effect

  • Fidelity assessment within RCT-based causal analysis entails examining the difference between causal components in the intervention and counterfactual condition.

  • Differencing causal conditions can be characterized as “achieved relative strength” of the contrast.

    • Achieved Relative Strength (ARS) = tTx – tC

    • ARS is a default index of fidelity
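
To make this concrete, here is a minimal sketch (mine, not the presenters'; the data and names are hypothetical) computing ARS as the standardized difference between the measured strength of a core component under treatment (tTx) and control (tC):

```python
import numpy as np

def achieved_relative_strength(t_tx, t_c):
    """ARS = (mean tTx - mean tC) / pooled within-group SD."""
    diff = np.mean(t_tx) - np.mean(t_c)
    n_tx, n_c = len(t_tx), len(t_c)
    s_pooled = np.sqrt(((n_tx - 1) * np.var(t_tx, ddof=1) +
                        (n_c - 1) * np.var(t_c, ddof=1)) / (n_tx + n_c - 2))
    return diff / s_pooled

# Hypothetical component-strength scores (0-100) observed in each condition
t_tx = np.array([70.0, 75.0, 68.0, 72.0])
t_c = np.array([55.0, 60.0, 52.0, 58.0])
print(achieved_relative_strength(t_tx, t_c))
```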


[Figure: Expected vs. achieved relative strength. Parallel vertical scales show treatment strength (50–100) and the corresponding outcome (.00–.45). The intervention benchmark TTx (85) degrades through “infidelity” to the realized tTx (70), a loss of (85) − (70) = 15; the realized control condition tC likewise departs from its benchmark TC. As a result, the expected relative strength of .25 shrinks to an achieved relative strength of .15.]



In Practice….

  • Identify core components in both groups

    • e.g., via a Model of Change

  • Establish benchmarks for TTx and TC;

  • Measure core components to derive tTx and tC

    • e.g., via a “Logic model” based on Model of Change

  • With multiple components and multiple methods of assessment, achieved relative strength needs to be (see the sketch after this list):

    • Standardized, and

    • Combined across:

      • Multiple indicators

      • Multiple components

      • Multiple levels (HLM-wise)

  • We turn to our examples….
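
Before the examples, a minimal sketch of the standardize-and-combine step just described (the component names, scores, and unit-weight choice are hypothetical, not from the presentation):

```python
import numpy as np

def standardized_diff(tx, c):
    """Standardized Tx-C difference for one fidelity indicator."""
    s = np.sqrt((np.var(tx, ddof=1) + np.var(c, ddof=1)) / 2)
    return (np.mean(tx) - np.mean(c)) / s

# Hypothetical fidelity measurements for two core components
components = {
    "core_activity_rating": (np.array([3.1, 2.8, 3.4]), np.array([1.2, 1.5, 1.0])),
    "exposure_hours":       (np.array([40.0, 38.0, 42.0]), np.array([30.0, 29.0, 33.0])),
}

# Standardize each component, then combine with unit weights
ars_by_component = {name: standardized_diff(tx, c)
                    for name, (tx, c) in components.items()}
overall_ars = np.mean(list(ars_by_component.values()))
print(ars_by_component, overall_ars)
```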



Assessing Implementation Fidelity in the Lab and in Classrooms: The Case of a Motivation Intervention

Chris S. Hulleman

Vanderbilt University



The Theory of Change

[Path model: Manipulated Relevance → Perceived Utility Value → Interest and Performance.]

Adapted from: Hulleman (2008); Hulleman, Godes, Hendricks, & Harackiewicz (2008); Hulleman & Harackiewicz (2008); Hulleman, Hendricks, & Harackiewicz (2007); Eccles et al. (1983); Wigfield & Eccles (2002).



Methods



Motivational Outcome

[Chart: effect of the intervention on the motivational outcome in the classroom (field) experiment, g = 0.05 (p = .67).]



Fidelity Measurement and Achieved Relative Strength

  • Simple intervention – one core component

  • Intervention fidelity:

    • Defined as “quality of participant responsiveness”

    • Rated on scale from 0 (none) to 3 (high)

    • 2 independent raters, 88% agreement



Quality of Responsiveness



Indexing Fidelity

Absolute

  • Compare observed fidelity (tTx) to the absolute or maximum level of fidelity (TTx)

Average

  • Mean level of observed fidelity (tTx)

Binary

  • Yes/No treatment receipt based on fidelity scores

  • Requires selection of a cut-off value
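
A small sketch contrasting the three indices on the same ratings (the 0–3 scale mirrors the slides; the cut-off value and data are invented):

```python
import numpy as np

ratings = np.array([3, 2, 0, 3, 1, 2, 3, 0])  # observed fidelity (tTx), 0-3 scale
MAX_FIDELITY = 3   # benchmark (TTx)
CUTOFF = 2         # hypothetical threshold for "treatment received"

absolute_index = np.mean(ratings / MAX_FIDELITY)   # observed relative to maximum
average_index = np.mean(ratings)                   # mean observed level
binary_index = np.mean(ratings >= CUTOFF)          # proportion of "compliers"
print(absolute_index, average_index, binary_index)
```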



Fidelity Indices



Indexing Fidelity as Achieved Relative Strength

Intervention Strength = Treatment – Control

Achieved Relative Strength (ARS) Index

  • Standardized difference in fidelity index across Tx and C

  • Based on Hedges’ g (Hedges, 2007)

  • Corrected for clustering in the classroom (ICCs from .01 to .08)



Average ARS Index

ARS(Average) = [(M_Tx − M_C) / S_T] × [1 − 3/(4N − 9)] × √[1 − 2(n̄ − 1)ρ/(N − 2)]

(group difference × sample-size adjustment × clustering adjustment)

Where,

M_Tx = mean for group 1 (tTx)

M_C = mean for group 2 (tC)

S_T = pooled within-groups standard deviation

n_Tx = treatment sample size

n_C = control sample size

n̄ = average cluster size

ρ = intra-class correlation (ICC)

N = total sample size
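
In Python this might look like the following sketch; the three factors follow my reading of Hedges (2007), and the numbers in the example call are illustrative rather than the study's:

```python
import numpy as np

def ars_average(mean_t_tx, mean_t_c, s_pooled, n_tx, n_c, n_bar, icc):
    """Cluster-adjusted Hedges-type g for the average fidelity index."""
    N = n_tx + n_c
    group_diff = (mean_t_tx - mean_t_c) / s_pooled             # group difference
    small_sample = 1 - 3 / (4 * N - 9)                         # sample-size adjustment
    clustering = np.sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))  # clustering adjustment
    return group_diff * small_sample * clustering

# Illustrative values on the 0-3 fidelity scale
print(ars_average(0.74, 0.04, 0.53, 80, 75, 20, 0.05))
```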



Absolute and Binary ARS Indices

ARS(Absolute/Binary) = [(p_Tx − p_C) / S_p] × [1 − 3/(4N − 9)] × √[1 − 2(n̄ − 1)ρ/(N − 2)]

(group difference × sample-size adjustment × clustering adjustment)

Where,

p_Tx = proportion for the treatment group (tTx)

p_C = proportion for the control group (tC)

S_p = pooled standard deviation of the binary fidelity indicator

n_Tx = treatment sample size

n_C = control sample size

n̄ = average cluster size

ρ = intra-class correlation (ICC)

N = total sample size
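
A companion sketch for the proportion-based indices; standardizing by the pooled SD of the binary indicator is my assumption, since the transcript does not preserve the slide's exact standardizer:

```python
import numpy as np

def ars_binary(p_tx, p_c, n_tx, n_c, n_bar, icc):
    """ARS for a binary (or absolute, proportion-of-benchmark) fidelity index."""
    N = n_tx + n_c
    p_bar = (n_tx * p_tx + n_c * p_c) / N     # pooled proportion
    s = np.sqrt(p_bar * (1 - p_bar))          # assumed standardizer (pooled binary SD)
    group_diff = (p_tx - p_c) / s
    small_sample = 1 - 3 / (4 * N - 9)
    clustering = np.sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))
    return group_diff * small_sample * clustering

print(ars_binary(0.60, 0.10, 80, 75, 20, 0.05))
```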



Average ARS Index

[Figure: Average ARS in the classroom study. Parallel vertical scales show treatment strength (0–100) and fidelity (0–3). Infidelity pulls the realized tTx below the benchmark TTx and the realized tC above TC; the fidelity difference is (0.74) − (0.04) = 0.70, which standardizes to an achieved relative strength of 1.32.]


Achieved Relative Strength Indices



Linking Achieved Relative Strength to Outcomes



Sources of Infidelity in the Classroom

Student behaviors were nested within teacher behaviors

  • Teacher dosage

  • Frequency of responsiveness

    Student and teacher behaviors were used to predict treatment fidelity (i.e., quality of responsiveness).



Sources of Infidelity: Multi-level Analyses

Part I: Baseline Analyses

  • Identified the amount of residual variability in fidelity due to students and teachers.

    • Due to missing data, we estimated a 2-level model (153 students, 6 teachers)

      Student: Yij = b0j + b1j(TREATMENT)ij + rij

      Teacher: b0j = γ00 + u0j

               b1j = γ10 + u1j
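
The presenters presumably fit this in dedicated HLM software; as a hedged analogue, the same baseline two-level model could be fit in Python with statsmodels (the file and column names are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per student; 'teacher' identifies the level-2 cluster
df = pd.read_csv("fidelity_ratings.csv")

baseline = smf.mixedlm("fidelity ~ treatment", df,
                       groups=df["teacher"],     # teachers as level-2 units
                       re_formula="~treatment")  # random intercept and slope
print(baseline.fit().summary())
```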



Sources of Infidelity: Multi-level Analyses

Part II: Explanatory Analyses

  • Predicted residual variability in fidelity (quality of responsiveness) with frequency of responsiveness and teacher dosage

    Student: Yij = b0j + b1j(TREATMENT)ij + b2j(RESPONSE FREQUENCY)ij + rij

    Teacher: b0j = γ00 + u0j

             b1j = γ10 + γ11(TEACHER DOSAGE)j + u1j

             b2j = γ20 + γ21(TEACHER DOSAGE)j + u2j
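
Continuing the same hypothetical setup, the explanatory model adds the student-level response frequency and enters teacher dosage as a cross-level moderator (a treatment-by-dosage interaction):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("fidelity_ratings.csv")  # hypothetical data, as above

explanatory = smf.mixedlm(
    "fidelity ~ treatment * teacher_dosage + response_frequency",
    df,
    groups=df["teacher"],
    re_formula="~treatment")
print(explanatory.fit().summary())
```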



Sources of Infidelity: Multi-level Analyses

[Table of multi-level model estimates not preserved in the transcript. * p < .001.]



Case Summary

  • The motivational intervention was more effective in the lab (g = 0.45) than field (g = 0.05).

  • Using 3 indices of fidelity and, in turn, achieved relative treatment strength, revealed that:

    • Classroom fidelity < Lab fidelity

    • Achieved relative strength was about 1 SD less in the classroom than the laboratory

  • Differences in achieved relative strength corresponded to differences in the motivational outcome, especially in the lab.

  • Sources of infidelity: teacher (not student) factors



Assessing Fidelity of Interventions with Multiple Components: A Case of Assessing Preschool Interventions

Joy Lesnick



What Do We Mean By Multiple Components in Preschool Literacy Programs?

How do you define preschool instruction?

Academic content, materials, student-teacher interactions, student-student interactions, physical development, schedules & routines, assessment, family involvement, etc.

How would you measure implementation?

Preschool Interventions:

Are made up of components (e.g., sets of activities and processes) that can be thought of as constructs;

These constructs vary in meaning, across actors (e.g., developers, implementers, researchers);

They are of varying levels of importance within the intervention; and

These constructs are made up of smaller parts that need to be assessed.

Multiple components make assessing fidelity more challenging




Overview

Four areas of consideration when assessing fidelity of programs with multiple components:

Specifying Multiple Components

Major Variations in Program Components

The ABCs of Item and Scale Construction

Aggregating Indices

One caveat: Very unusual circumstances

Goal of this work:

To build on the extensive evaluation work that had already been completed and use the case study to provide a framework for future efforts to measure fidelity of implementation.




1. Specifying Multiple Components

Our Process

Extensive review of program materials

Potentially hundreds of components

How many indicators do we need to assess fidelity?




1. Specifying Multiple Components

[Diagram: a five-level decomposition of the program (Constructs → Sub-Constructs → Facets → Elements → Indicators, with each indicator rated 1–4). Labels visible in the diagram include constructs such as Literacy, Math, Social & Personal Development, Healthful Living, Scientific Thinking, Social Studies, Creative Arts, Physical Development, Technology, and Writing; component areas such as Content, Instruction, Assessment, Processes, Materials, Physical Environment, Routines and classroom management, Family Involvement, Structured lessons, and Structured units; and literacy facets such as Oral Language (language, comprehension, response to text), Book and print awareness, Phonemic awareness, Letter and word recognition, and Interactions between teacher and child.]



Grain Size is Important

Conceptual differences between programs may occur at micro levels, while empirical differences in program implementation may appear at more macro levels.

Theoretically expected differences vs. empirically observed differences.

Conceptual differences between programs must be identified at the smallest grain size at the outset, although empirical differences may only be detectable at higher, more macro levels once programs are implemented.



2. Major Variations in Program Components

One program often has some combination of these different types of components:

Scripted (highly structured) activities

Unscripted (unstructured) activities

Nesting of activities

Micro-level (discrete) activities

Macro-level (extended) activities

What you’re trying to measure will influence how to measure it, and how often it needs to be measured.




2. Major Variations in Program Components

Abs: “Absolute Fidelity” index – what happened compared to what should have happened; the highest standard.

Avg: Magnitude or exposure level; indicates what happened, but on its own it is not very meaningful – how do we know if a level is good or bad?

Bin: Binary complier – can we set a benchmark to determine whether or not a program component was successfully implemented (>30%, for example)? Is that realistic? Meaningful?

ARS: Difference in magnitude between Tx and C – relative strength; is there enough difference to warrant a treatment effect?



Dots under a microscope – what is it?



Starry Night, Vincent Van Gogh, 1889



We must measure the trees… and also the forest…

Micro-level (discrete) activities

Depending on the condition, daily activities (e.g., whole group time, small group time, center activities) may be scripted or unscripted and take place within the larger structure of the theme under study.

Macro-level (extended) activities

A month-long thematic unit (structured in the treatment condition, unstructured in the control) is the underlying extended structure within which scripted or unscripted micro-level activities take place.

In multi-component programs, many activities are nested within larger activity structures. This nesting has implications for fidelity analysis – what to measure and how to measure it.




3. The ABCs of Item and Scale Construction

Aim for one-to-one correspondence of indicators to component of interest

Balance items across components

Coverage and quality are more important than the quantity of items




Aim for one-to-one correspondence

Example of more than one component being assessed in one item:

[Does the teacher] Talk with children throughout the day, modeling correct grammar, teaching new vocabulary, and asking questions to encourage children to express their ideas in words? (Yes/No)

Example of one component being measured in each item:

Teacher provides an environment wherein students can talk about what they are doing.

Teacher listens attentively to students’ discussions and responses.

Teacher models and/or encourages students to ask questions during class discussions.

Difference between T and C (Oral Language)*:

T: 1.80 (0.32) vs. C: 1.36 (0.32); ARS ES = 1.38

T: 3.45 (0.87) vs. C: 2.26 (0.57); ARS ES = 1.62

*Data for the case study come from an evaluation conducted by Dale Farran, Mark Lipsey, Carol Bilbrey, et al.




Balance items across components

How many items are needed for each scale?

Oral Language is over-represented.

Scales with α < 0.80 are not considered reliable.




Coverage and quality more important than quantity

Two scales each have 2 items, but very different levels of reliability

How many items are needed for each scale?

Oral Language: 20 items. We randomly selected item subsets and recalculated alpha:

10 items: α = 0.92

8 items: α = 0.90

6 items: α = 0.88

5 items: α = 0.82

4 items: α = 0.73
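
A sketch of that item-trimming exercise on simulated data (the alphas above come from the actual evaluation; this only illustrates the mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)

def cronbach_alpha(items):
    """items: (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Simulated 20-item scale driven by one common factor
latent = rng.normal(size=(200, 1))
items = latent + rng.normal(scale=1.0, size=(200, 20))

for k in (10, 8, 6, 5, 4):
    subset = rng.choice(20, size=k, replace=False)
    print(k, round(cronbach_alpha(items[:, subset]), 2))
```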




Aggregating Indices

To weight or not to weight? How do we decide?

Possibilities:

Theory

Consensus

$$ spent

Time spent

Case study example – 2 levels of aggregation within and between:

Unit-weight within facet: “Instruction – Content – Literacy”

Hypothetical weight across sub-construct: “Instruction – Content”
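
A minimal sketch of those two aggregation steps, with invented ARS values and weights:

```python
import numpy as np

# Step 1: unit weights within a facet (Instruction - Content - Literacy)
literacy_item_ars = np.array([1.38, 1.62, 0.95])   # hypothetical item-level ARS
literacy_facet_ars = literacy_item_ars.mean()      # unit-weighted average

# Step 2: theory-based weights across the sub-construct (hypothetical weights)
facet_ars = {"literacy": literacy_facet_ars, "math": 0.80, "oral_language": 1.10}
theory_weights = {"literacy": 0.5, "math": 0.3, "oral_language": 0.2}
content_ars = sum(theory_weights[f] * facet_ars[f] for f in facet_ars)
print(literacy_facet_ars, content_ars)
```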




YOU ARE HERE….

[The component hierarchy diagram from “1. Specifying Multiple Components” repeats here, annotated with the weighting decision at each level: UNIT WEIGHT within facets, THEORY WEIGHT across sub-constructs, and open “HOW WEIGHT?” questions at the higher levels.]



Aggregating Indices

  • Unit-weight within facet: Instruction – Content – Literacy

** Clustering is ignored.



Aggregating Indices

  • Theory-weight across sub-construct (hypothetical)



YOU ARE HERE …

[The annotated hierarchy diagram repeats once more, locating the discussion after the unit-weighting and theory-weighting examples; “HOW WEIGHT?” remains the open question at the higher levels.]



Key Points and Future Issues

  • At a minimum, fidelity assessment should identify and measure model-based core and necessary components;

  • Collaboration among researchers, developers, and implementers is essential for specifying:

    • Intervention models;

    • Core and essential components;

    • Benchmarks for TTx (e.g., an educationally meaningful dose; what level of X is needed to instigate change); and

    • Tolerable adaptation



Points and Issues

  • Fidelity assessment serves two roles:

    • Indexing the average causal difference between conditions; and

    • Using fidelity measures to assess the effects of variation in implementation on outcomes.

  • Should minimize “infidelity” and weak ARS:

    • Pre-experimental assessment of TTx components in the counterfactual condition… Is TTx > TC?

    • Build operational models with positive implementation drivers

  • Post-experimental (re)specification of the intervention: For example:

    • MAP ARS = .3 (planned professional development) + .6 (planned use of data for differentiated instruction)



Points and Issues

  • What does an ARS of 1.20 mean?

  • We need experience and a normative framework:

    • Cohen defined a small effect on outcomes as 0.20, a medium effect as 0.50, and a large effect as 0.80

    • Over time, a similar framework may emerge for ARS

