Experimental, Quasi-experimental, and Single Subject Research

774/801, Sept 1, 2004. John Hattie & Tony Hunt

It is simple: there is no perfect experiment in education. There is nearly always a trade-off between
the Power to Generalise (PG)
- from sample to population
- from items to the behaviour domain
- from conditions in the study to all intended conditions
and the Power to Convince (PC)
- there are many audiences

Power to Generalise (PG) • How confident can we be in generalising from the study to all “similar” situations? • Is the design replicable/reproducible/exchangeable? • Are the evidence and conclusions unique to this study? • Have the generalisations taken into account all possible competing views, i.e. plausible alternative rival explanations (PARE)?

Power to Convince (PC) • Who are we trying to convince? • If it is colleagues, then more situation specificity may be convincing (kids/classrooms/schools like mine) • If it is the educational community, then the specific situation needs to be less critical

Resolution: Linking Power • Experimental design consists of a series of links: • It is as strong as the weakest link • Each link influences the next link • It is desirable for the links to have equal strength • Does each link have explanatory power? • Are the conclusions credible to the intended audience?

History • Campbell and Stanley (1963) • Cook and Campbell (1979) • Shadish, Cook & Campbell (2002) Evidence based: all are based on designing studies that can lead to explanation and claims of causality

Explanation and Cause • Cause and effect must be related (e.g., self-concept & achievement) • There needs to be temporal order (cause before effect) • Need to rule out other explanations/ other Plausible Alternative Rival Explanations (PARE)

Campbell & Stanley (1963) Pretest-Posttest Control Group Design
     Pre   Treatment   Post
R    O     X           O
R    O                 O
Randomisation – aiming for representativeness
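The "R" rows in the design above stand for random allocation to groups. As a minimal sketch only, assuming a simple 50/50 split and a hypothetical helper name `randomly_assign` (not from the slides), random allocation might look like this in Python:

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups.

    Hypothetical helper for illustration: a plain 50/50 split, no blocking.
    """
    rng = random.Random(seed)      # seeded so the allocation can be reproduced
    shuffled = list(participants)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Example: 20 student IDs allocated to the two rows of the design
groups = randomly_assign(range(1, 21), seed=1)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who receives X, pre-existing differences between the groups are expected to balance out on average.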

But can we randomise? • No Child Left Behind • Tennessee Class Size Study

Quasi-experimentation: • When you do not have much control over the allocation of treatment, conditions, or sample • When you have non-equivalent groups
In quasi-experimentation, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect (Shadish, Cook & Campbell, 2002, p. 14).
• Relates to Popper's notion of falsification: what evidence would you accept that you are wrong?

Examples of Quasi-experimental designs
Divorce Laws: Ozdowski, S.A. & Hattie, J.A. (1981). The impact of divorce laws on divorce rate in Australia: A time series analysis. Australian Journal of Social Issues, 16, 3-17.
a. Time Series: O1 O2 O3 O4 O5 X O6 O7 O8 O9

ABA design
A          B          A
O1 O2 O3   X4 X5 X6   O7 O8 O9
Le Fevre et al. (2002). Adequate Decoders

Multilevel Design: Hierarchical Linear Modelling
Students within classes within schools. E.g., Tracking/Streaming
School 1             School 2…
Teacher 1            Teacher 2
Class 1   Class 2    Class 1   Class 2

Structural Equation Modelling

Minimal requirements for Studies • Sampling • Items to behaviour domains • People to all possible people • Conditions to all possible conditions Representative sampling via • Random sampling • Stratified random sampling
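Stratified random sampling, the last item above, can be sketched as follows. This is an illustration under stated assumptions: the helper name `stratified_sample`, the urban/rural strata, and the 10% sampling fraction are all hypothetical and not from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum.

    `stratum_of` maps a unit to its stratum label (hypothetical interface).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))       # sampling without replacement
    return sample

# Made-up population: 80 urban and 20 rural students
students = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1, seed=0)
print(sum(1 for s in sample if s[0] == "urban"),
      sum(1 for s in sample if s[0] == "rural"))
```

Sampling within each stratum keeps the sample's urban/rural mix equal to the population's, which simple random sampling only achieves on average.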

Variables • At the end of your study, can I say “Aha, so that is what you mean, now I am clear”? • Open constructs NOT Definitions • No such thing as immaculate perception • Dependent • Independent - Manipulable - Nonmanipulable

Dependability • How reliable/consistent/replicable are your measures/observations?

Validity = Interpretations Validity - "an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment". Not validity of a test, but validity of interpretations

Validity of your study … • Is related to having ruled out • Plausible Alternative Rival Explanations (PARE) CONTROL CONTROL CONTROL • … some examples

1. PARE: Power • Is your study POWERFUL enough to detect the effect you are investigating Do chickens have lips?
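One hedged way to put a number on "powerful enough" is an approximate power calculation. The sketch below assumes two equal-sized groups, a standardised effect size, and a two-sided test under a normal approximation. The function name `approx_power` is hypothetical, and an exact t-based calculation would give slightly lower values.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power to detect a standardised mean difference.

    Normal approximation for a two-sided comparison of two equal groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # rejection threshold
    shift = effect_size * sqrt(n_per_group / 2)   # expected test statistic
    # Probability the statistic lands beyond either threshold
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# An effect of 0.5 with 10 versus 64 students per group
print(round(approx_power(0.5, 10), 2))
print(round(approx_power(0.5, 64), 2))
```

With only 10 students per group, a true effect of 0.5 would be missed most of the time. Under this approximation, about 64 per group are needed for roughly 80% power.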

2. PARE: Chance • Did the effect/conclusion occur by chance? • E.g., that two means are the same – the hypothesis of no difference • Setting a rejection level, say α = .05

3. PARE: Type II errors • Type I error – rejecting a claim when it is true (α = .05) • Type II error – accepting a claim when it is false (e.g., accepting that chickens do not have lips when in fact they do)
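The Type I error rate can be checked by simulation. This is a rough sketch under assumed conditions: both groups are drawn from the same normal population with known SD of 1, and a z-test is applied. The function name `false_positive_rate` and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(n_per_group=30, alpha=0.05, n_studies=2000, seed=0):
    """Share of simulated null studies that wrongly reject 'no difference'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n_per_group)   # SE of a mean difference when SD = 1 is known
    rejections = 0
    for _ in range(n_studies):
        # Both groups come from the same population, so the claim is true
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (sum(a) / n_per_group - sum(b) / n_per_group) / se
        if abs(z) > z_crit:
            rejections += 1      # a Type I error
    return rejections / n_studies

print(false_positive_rate())
```

The rate should sit close to the chosen α, which is what setting a rejection level of .05 means in practice.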

4. PARE: Reliability of your measures • If the reliability is low, then the scores “wobble” and there is no guarantee you will get the same results using these instruments (tests, observations, interviews, etc.) • Was the treatment “consistent” in the various classes/implementations?

5. PARE: Was the treatment implemented? • Degree of implementation • The Hong Kong Practical Science Study (Cheung, Hattie, & Bucat, 1997)

6. PARE: Maturation • Showing change may not be enough as kids improve anyway (e.g., by maturation) • Method to measure change = Effect-sizes
Effect-size = (Post - Pre) / spread: $ES = \frac{\bar{X}_2 - \bar{X}_1}{sd_{\mathrm{diff}}}$
e.g., Before = 12, After = 15, spread = 6: $ES = \frac{15 - 12}{6} = .5$
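The effect-size arithmetic above is small enough to check in code. A minimal sketch, with the function name `effect_size` chosen here for illustration:

```python
def effect_size(pre_mean, post_mean, spread):
    """Effect size as (post - pre) / spread, following the slide's formula."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return (post_mean - pre_mean) / spread

# The slide's example: Before = 12, After = 15, spread = 6
print(effect_size(12, 15, 6))  # 0.5
```

Expressing change in spread units lets gains from different tests and studies be compared on one scale, including against the gain expected from maturation alone.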

Distribution of effects Average effect Zero achievement

Distribution of effects Maturation

The disasters …

The also rans …

Almost there …

In the middle …

Worth having …

The MAJOR Influences …

Identifying what matters

7. PARE: Testing • People become test wise and/or may respond differently when under test conditions • White space and testing in asTTle • Testwiseness

Test of Objective Evidence
Each of the questions in the following set has a logical or "best" answer from its corresponding multiple choice answer set. Please record your eight answers.
1. The purpose of the cluss in furmpaling is to remove
   A cluss-prags
   B tremails
   C cloughs
   D pluomots
2. Trassig is true when
   A clump trasses the von
   B the viskal flans, if the viskal is donwil or zortil
   C the belgo fruls
   D dissels lisk easily
3. The sigia frequently overfesks the trelsum because
   A all sigias are mellious
   B all sigias are always votial
   C the trelsum is usually tarious
   D no trelsa are feskable
4. The fribbled breg will minter best with an
   A derst
   B morst
   C sortar
   D ignu

Test of Objective Evidence, Part II
5. The reasons for tristal doss are
   A the sabs foped and the doths tinzed
   B the dredges roted with the crets
   C few rakobs were accepted in sluth
   D most of the polats were thonced
6. Which of the following is/are always present when trossels are being gruven?
   A rint and vost
   B vost
   C shum and vost
   D vost and plone
7. The mintering function of the ignu is most effectively carried out in connection with
   A a razma toi
   B the groshing stantol
   C the fribbled breg
   D a frailly sush
8.
   A
   B
   C
   D

8. PARE: Statistical Regression • When taking extreme groups, the means tend to move to the middle. • Why do the tallest fathers have shorter sons, and the shortest fathers have taller sons?

… Regression to the Mean • Special Education (e.g., Sesame Street) • Effective schools • Gifted education
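Regression to the mean can be seen in a small simulation. The sketch below is illustrative only: it assumes each observed score is a stable "true" score plus independent noise, selects the top 10% on a first test, and retests them with no treatment at all. The function name and all numbers are hypothetical.

```python
import random

def regression_to_mean_demo(n=5000, top_fraction=0.1, seed=0):
    """Select the top scorers on test 1 and return their test 1 and test 2 means."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    # Two noisy measurements of the same unchanged true scores
    test1 = [t + rng.gauss(0, 10) for t in true_scores]
    test2 = [t + rng.gauss(0, 10) for t in true_scores]
    cutoff = sorted(test1, reverse=True)[int(n * top_fraction) - 1]
    selected = [i for i in range(n) if test1[i] >= cutoff]
    mean1 = sum(test1[i] for i in selected) / len(selected)
    mean2 = sum(test2[i] for i in selected) / len(selected)
    return mean1, mean2

first, second = regression_to_mean_demo()
print(round(first, 1), round(second, 1))
```

The selected group scores lower on the retest even though nothing happened to them. The same artefact can make a programme for an extreme group look as if it had an effect when it did nothing.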

9. PARE: Response rates • The returns of questionnaires/tests/interviews should be high • What is typical?

Meta-analyses of Response Rates • Typical return is 50% Three major factors: • Salience (77% vs 42%) • Number of follow ups (halve each time) • Lack of clutter/ orderliness Not length (ave 7 pages, 72 questions), colour,

10. PARE: Change scores • The difference between post and pre scores Problems: • Unreliable • Are you measuring the same thing both times? • Regression to the mean
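The "unreliable" problem can be made concrete with the classical test theory formula for the reliability of a difference score. The formula is brought in here for illustration and assumes equal pre and post variances: $r_{DD} = \frac{(r_{xx} + r_{yy})/2 - r_{xy}}{1 - r_{xy}}$. The function name and the example values below are hypothetical.

```python
def difference_score_reliability(rel_pre, rel_post, r_pre_post):
    """Reliability of (post - pre), assuming equal pre and post variances.

    rel_pre, rel_post: reliabilities of the two measures
    r_pre_post: correlation between pre and post scores
    """
    if r_pre_post >= 1:
        raise ValueError("pre-post correlation must be below 1")
    return ((rel_pre + rel_post) / 2 - r_pre_post) / (1 - r_pre_post)

# Two fairly reliable tests (.80 each) that correlate .70 with each other
print(round(difference_score_reliability(0.8, 0.8, 0.7), 2))
```

Even with reliable pre and post measures, the change score becomes much less reliable as the pre-post correlation rises.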

11. PARE: Experimenter effects • Hawthorne effect: because we know we are in an experiment, this alters our responses • Hans the Horse • Pygmalion in the classroom • Christine Rubie’s thesis • Stanley Milgram’s experiment

12. PARE: Restriction of range • When you choose/focus on a narrow range of abilities (etc.) this can be misleading • Picture …
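A short simulation shows how restriction of range can mislead. This is a sketch with made-up numbers: two variables are generated with a population correlation of about .6, and the correlation is then recomputed using only the top 20% on the first variable. The helper names are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def restriction_demo(n=5000, rho=0.6, keep_top=0.2, seed=0):
    """Return (full-range r, r within the top `keep_top` share on x)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # y shares a correlation of about rho with x
    ys = [rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    cutoff = sorted(xs)[int((1 - keep_top) * n)]
    top = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    restricted = pearson_r([x for x, _ in top], [y for _, y in top])
    return pearson_r(xs, ys), restricted

full_r, restricted_r = restriction_demo()
print(round(full_r, 2), round(restricted_r, 2))
```

Within the narrow high-ability band the relationship looks much weaker than it is across the full range, so a study of a selected group can understate a real association.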

13. PARE: Specification of target and accessible sample/population • Most experiments are highly local but have general aspirations • Often, there are two groups you are generalising to: e.g., all secondary students in NZ (the target population), and all secondary students you have access to, from which you sample (the accessible population)
