1. Critiquing Research in Educational Technology Thomas C. Hammond
TLT 470
Summer, 2008
Session 6
2. Housekeeping
3. The Big Question (impact of tech on learning)
Ed tech research (good, bad, usefully bad)
Research methods, paradigms
Conceptual work for tonight
Last session focused on an overview of research methods, using two studies. Tonight we focus on critiquing the field of ed tech, again referring to those two studies, plus a new reading for today (Boster, 2006) and other material (my dissertation research).
5. Constructs, variables, research questions
utos, UTOS, *UTOS
Lab vs. field vs. manipulated field
Experiment vs. non-experiment vs. quasi-experiment
Varying treatment over groups and time
Interpretation: Internal and external validity
“Medical model” of research vs. patterns in ed research
Snapshot review: Boster, 2006
Let’s quickly review this using the Boster, 2006 piece.
6. What we missed: Measures / Observations
Classic errors / critiques
“State of the field” in ed tech: Crisis? Today
Again, I’m not putting myself above any of this – I make the exact same mistakes, and this stuff isn’t easy. I’m just hoping to build up (1) your ability to read research critically (through being aware of its limitations) and (2) your level of insight when talking about or using tech for instruction.
7. Observations = Qualitative?
Interview
Document analysis
Measures = Quantitative?
Test
Survey
Observations / Measures
These are actually equivalent, but I’ll spread them out to make a point about methods.
8. Why are these so critical?
How do we know if they’re any good?
What effects are they able to discern?
Observations / Measures
9. Pre vs. post
Tests of statistical significance
Measures of effect size
Suggestion of practical significance
Measuring effects
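To make the pre/post pattern above concrete, here is a minimal Python sketch using entirely hypothetical scores: a paired t-test checks whether the pre-to-post gain is statistically significant, and Cohen's d gives the effect size that speaks to practical significance.

```python
# Minimal sketch of the pre/post pattern above; all scores are hypothetical.
import numpy as np
from scipy import stats

pre = np.array([62, 70, 55, 68, 74, 60, 66, 71])   # hypothetical pre-test scores
post = np.array([68, 75, 59, 70, 80, 66, 70, 78])  # hypothetical post-test scores

# Paired t-test: is the pre-to-post gain statistically significant?
t_stat, p_value = stats.ttest_rel(post, pre)

# Cohen's d for paired data: mean gain divided by the SD of the gains.
gains = post - pre
d = gains.mean() / gains.std(ddof=1)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {d:.2f}")
# A small p-value says the gain is unlikely to be chance alone;
# d indicates whether the gain is large enough to matter in practice.
```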
10. Curriculum-focused tests vulnerable to
Ceiling effects
Low coefficients of reliability
Diffusion of construct validity?
Researcher-designed tests vulnerable to
Lack of alignment to curriculum
… = teachers unhappy!
Demoralization / motivation challenges from students
Measures of student learning as a “special problem”
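Two of the vulnerabilities above can be checked directly from item-level data. Below is a minimal sketch with hypothetical 0/1 item scores: the share of students at the maximum score (a ceiling check) and Cronbach's alpha, a common coefficient of reliability.

```python
# Minimal sketch: ceiling check and Cronbach's alpha on hypothetical item data.
import numpy as np

# Rows = students, columns = test items scored 0/1 (all hypothetical).
items = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
])

totals = items.sum(axis=1)
ceiling = np.mean(totals == items.shape[1])
print(f"Students at the maximum score: {ceiling:.0%}")  # high values flag a ceiling

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / totals.var(ddof=1))
print(f"Cronbach's alpha: {alpha:.2f}")  # values below ~.7 are usually suspect
```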
11. Finding the right mix of quant and qual
Quant observes an effect
Measure students and teachers, run descriptive stats
Run inferential stats: Which differences are significant?
Qual lets you discuss why it took place
Context of curriculum & instruction, teacher behaviors
Context of student work
Measures of student learning as a “special problem”
12. Mis-use of statistical tests
Critiques / classic errors: Quant
Analysis of 142 original articles appearing in AHA’s Circulation during 1975 (excluding certain types of studies)
Chart from Glantz, S.A. (1980). Biostatistics: how to detect, correct and prevent errors in the medical literature. Circulation, 61, 1-7. (p. 2)
Sampled all articles appearing in 1975 issues of the journal; excluded non-original pieces and some specialties (radiology, clinicopathologic correlations, and case reports)
Sample error: Using the t-test to compare more than two groups
The point: the t-test is popular, but it was misused more often than not, at least in this sample. However, the author offers lots of other studies that support the same basic idea: in medical research, stats are misapplied as often as not.
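A simulation makes Glantz's sample error tangible. The sketch below (simulated data, no real study) draws three groups from the same distribution, so any "significant" pairwise t-test is a false positive; repeating the comparisons many times shows the familywise error rate climbing well above the nominal .05. The single appropriate test for more than two groups is a one-way ANOVA.

```python
# Minimal sketch of the classic error: pairwise t-tests across 3+ groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, false_positives = 2000, 0

for _ in range(trials):
    # Three groups drawn from the SAME distribution: no true difference exists.
    g1, g2, g3 = (rng.normal(50, 10, 20) for _ in range(3))
    # Three pairwise t-tests, each at alpha = .05...
    ps = [stats.ttest_ind(a, b).pvalue for a, b in [(g1, g2), (g1, g3), (g2, g3)]]
    if min(ps) < 0.05:
        false_positives += 1

print(f"Familywise error rate: {false_positives / trials:.2f}")  # well above .05

# The appropriate single test for more than two groups:
f_stat, p = stats.f_oneway(g1, g2, g3)
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.3f}")
```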
13. Mis-use of statistical tests
Failure to observe nested effects
Critiques / classic errors: Quant
This is a key aspect of doing large-scale education studies: any student effect is nested in a classroom / school / district / state. You can’t just aggregate all the data.
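As an illustration, here is a minimal sketch of the fix, on hypothetical data and using the statsmodels mixed-model API: students sit inside classrooms, so the model carries a classroom random intercept instead of pooling every student as if independent.

```python
# Minimal sketch of a multilevel model for students nested in classrooms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for classroom in range(10):
    room_effect = rng.normal(0, 5)   # classrooms genuinely differ from one another
    treated = classroom % 2          # treatment is assigned at the classroom level
    for _ in range(25):              # 25 students per classroom
        score = 70 + 3 * treated + room_effect + rng.normal(0, 8)
        rows.append({"score": score, "treated": treated, "classroom": classroom})
df = pd.DataFrame(rows)

# Random intercept for classroom: student errors within a room are allowed
# to be correlated, unlike a plain regression over the pooled sample.
result = smf.mixedlm("score ~ treated", df, groups=df["classroom"]).fit()
print(result.summary())
```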
14. Mis-use of statistical tests
Failure to observe nested effects
Inattention to effect size / practical significance
Critiques / classic errors: Quant
Turning to Kingsley, 2005 – or, rather, the write-up provided by the vendor.
15. Failure to examine sub-groups
Prior knowledge
Tracking
LEP
SES / at-risk / under-served students
Critiques / classic errors: Design
16. Failure to examine sub-groups
Effects over long-term vs. short-term: Lee & Molebash, 2004
G1 = Google
G2 = Archive
G3 = selected documents and scaffold
Critiques / classic errors: Design
17. Failure to examine sub-groups
Effects over long-term vs. short-term
Gaps in interpretability due to mix of quant and qual
Boster, 2006
Dynarski et al., 2007
Contrast: Brush & Saye – 1999, 2001, 2002, 2004, 2005, 2006…
Critiques / classic errors: Design
I guess we also have a one-and-done-ness that is a further limitation. Brush & Saye obviously are a counter-example, and when the second Dynarski et al. comes out, it will add longitudinal observations…
Note: A good mix of quant and qual, in my opinion, makes a study always at least usefully bad. Pure quant or pure qual…eh. Usually not good—but can be great, certainly.
18. Example of tobacco company-funded research
Accelerated Reader (Oppenheimer, 2003)
Ignite!Learning studies
Boster, 2006?
Critiques / classic errors: Vendor-funded studies?
I guess we also have a one-and-done-ness that is a further limitation. Brush & Saye obviously are a counter-example, and when the second Dynarski et al. comes out, it will add longitudinal observations…
19. Carbon monoxide studies
Purely attitudinal outcomes
Survey research
Easy-to-do?
“State of the field”
There’s some merit here, but the point is that too much carbon monoxide is harmful. Trace amounts exist naturally, and it is produced for industrial purposes…but too much of it and you asphyxiate.
20. Carbon monoxide studies
Boutique studies
Early Logo studies
Idiosyncratic OLEs
PBL strategies
…trade off of exploring bleeding edge vs. ecological validity (practicality)
“State of the field”
Lots of these, and I don’t want to name names. However, I find a lot of merit to these – someone has to play with the bleeding edge to find out what works.
21. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
“State of the field”
…again, this is why I like TPaCK – it forces one to color outside the lines, cross the streams, what have you.
22. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
Scientifically-Based Research (NCLB)
“Persuasive research that empirically examines important questions using appropriate methods that ensure reproducible and applicable findings” (Beghetto, 2003)
“State of the field”
23. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
Scientifically-Based Research (NCLB)
Persuasive
Empirical
Important questions (Does it work? What was it?)
Appropriate methods (Privileging RCTs / medical model?)
Replicable and applicable findings (Limitation of qual studies)
“State of the field” From the article:
Persuasive. This attribute refers to research that is moving from "tentative knowledge claims generated at local research sites to become stabilized and transformed into widely accepted facts" (Smith and others 2002). Appropriate research design, methods, and techniques; logic and reasoning; and replicable results can all help to establish persuasiveness.
A critical element in persuasiveness is the peer-review process, in which researchers who have been trained in research methodology review and critique each other's work to help ensure that the methods used match the research questions and conclusions. Research findings published in a peer-reviewed journal can be assumed to have undergone careful scrutiny, been considered in light of alternative explanations, and deemed sufficiently "persuasive" by a panel of individuals with expertise in research methods.
Empirical. Research that is empirical is based on measurement or observation, that is, experienced "through the senses" (NRC 2002). For example, research that measures or observes the impact of school vouchers on student achievement would be considered empirical. However, there are certain questions that cannot be addressed by empirical investigations (NRC), such as "Should school voucher programs be enacted in my state?" Questions involving "should" are typically addressed through means other than observation and measurement.
Important Questions. This refers to questions addressed by research that build upon, add to, fill a void in, or otherwise clarify what is known and practiced. The NRC explains that the importance of a question is often determined by its relationship to prior research, theory, and relevance to policy and practice.
Appropriate Methods. This refers to the use of designs, methods, and techniques that fit the nature of the question the study is attempting to answer. However no research design, method, or analytic technique on its own makes a study or program of research scientific (NRC). If the question pertains to "Does it work?," then randomized experiments or quasi-experiments are most appropriate (Raudenbush 2002, Coalition for Evidence-Based Policy).
Simply stated, randomized experiments involve randomly assigning individuals, schools, or districts to a group that receives a particular intervention (such as class-size reduction) and to a group that does not. In contrast, if the question pertains to "What was the 'it'?," then qualitative methods (such as the case study) are most appropriate (Erickson and Gutierrez 2002). Among other things, qualitative methods provide "up-close descriptions" of what is, or is not, working; how interventions are working; and what might be facilitating or impeding the effectiveness of a particular intervention (Raudenbush).
Replicable and Applicable Findings. In general, this attribute refers to consistent, meaningful findings. The research presents sufficient detail to allow for "replication or, at a minimum, ... the opportunity to build systematically on their findings" (NCLB 2002).
Such findings are understandable, accessible, and applicable to a wide audience (Comprehensive School Reform Program Office). For example, a program of research should be designed and conducted to ensure that school leaders across the nation have a solid sense of whether they can expect to see similar results from implementing a school-reform program that has demonstrated increased student learning in another state.
24. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
Scientifically-Based Research (NCLB)
Are we in the wrong paradigm? (Reeves, 1993; Shaver, 2000)
Scientific question: Why does this happen?
Engineering question: Is it useful?
“State of the field”
25. We see your gold standard and raise you a platinum standard (Schrum et al., 2005)
Here are specific kinds of studies we need to do (Roblyer, 2006)
Establish relative advantage
Improve implementation strategies
Monitor impact on important societal goals
Monitor and report on common uses and shape desired directions
Reading for next week
Note that Roblyer laid out the idea in 2005 and is writing a series of “research highlights.” I’m asking you to read her example of a type 2 study.
26. Can you describe its design? Methods?
How does this stack up?
Quant fumbles
Design fumbles
More carbon monoxide?
A boutique study?
Talking only to IT people?
SBR-ready?
Hopelessly positivist?
Time permitting: Tearing into my work
28. Treatment: Instruction, plus end-of-unit project; different topics per project
29. Quantitative data
All are teacher-derived tests. Tests were derived from the CG, with items aimed to mirror end-of-year SOLs.
Unfortunately, pre- and post-test items are NOT included on the unit tests! Same content, but different questions. On the positive side: this really reduces the threat of carry-over. On the negative side: assuming construct validity from pre/post to unit tests requires a leap of faith.
Variable coefficients of reliability: semester pre-post tests range from .24 to .58; end-of-unit tests range from .4 to .8.
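Those low reliabilities matter because, under classical test theory, measurement error attenuates observed effects: roughly, d_observed ≈ d_true × √reliability. A minimal sketch follows; the effect size is hypothetical, while the reliabilities are the ones reported above.

```python
# Minimal sketch of attenuation: low reliability shrinks observed effects.
import math

d_true = 0.50  # a hypothetical "true" medium-sized effect
for reliability in (0.24, 0.58, 0.80):
    d_obs = d_true * math.sqrt(reliability)
    print(f"reliability {reliability:.2f} -> observed d ~ {d_obs:.2f}")
# At reliability .24, a true d of .50 surfaces as roughly .24:
# a real effect can easily look null on an unreliable test.
```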
30. Qualitative data
Classroom observations = 80 class periods
Student responses = approx. 1,000 total
Student projects = approx. 80 total
34. Link to 271
35. Point of including this slide: As we can see on the left, lots of off-task activity with backgrounds. As we can see from the featured slide: a content-inappropriate image. (The clip art suggests a military circa the Crimean War, or pre-Civil War; the content = the Spanish-American War, or shortly before World War I.)
36. Point of this slide: info copied from Wikipedia.
37. The especially interesting bit to me is the “info addressed on the semester exam” count.
38. What’s due Wednesday?
How is it going to get done?
Where is it to be posted?
What level of assistance / oversight will the instructor provide?
Closure