1. Critiquing Research in Educational Technology Thomas C. Hammond
TLT 470
Summer, 2008
Session 6
2. Housekeeping
3. The Big Question (impact of tech on learning)
Ed tech research (good, bad, usefully bad)
Research methods, paradigms
Conceptual work for tonight
Last session focused on an overview of research methods, using two studies. Tonight we focus on critiquing the field of ed tech, again referring to those two studies, plus a new reading for today (Boster, 2006) and other material (my dissertation research).
5. Constructs, variables, research questions
utos, UTOS, *UTOS
Lab vs. field vs. manipulated field
Experiment vs. non-experiment vs. quasi-experiment
Varying treatment over groups and time
Interpretation: Internal and external validity
“Medical model” of research vs. patterns in ed research
Snapshot review: Boster, 2006
Let’s quickly review this using the Boster, 2006 piece.
6. What we missed: Measures / Observations
Classic errors / critiques
“State of the field” in ed tech: Crisis? Today
Again, I’m not putting myself above any of this – I make the exact same mistakes, and this stuff isn’t easy. I’m just hoping to build up (1) your ability to read research critically (through being aware of its limitations) and (2) your level of insight when talking about or using tech for instruction.
7. Observations = Qualitative?
Interview
Document analysis
Measures = Quantitative?
Test
Survey
Observations / Measures
These are actually equivalent, but I’ll spread them out to make a point about methods.
8. Why are these so critical?
How do we know if they’re any good?
What effects are they able to discern?
Observations / Measures
9. Pre vs. post
Tests of statistical significance
Measures of effect size
Suggestion of practical significance
Measuring effects
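To make the pre/post pattern above concrete, here is a minimal Python sketch using entirely hypothetical scores: a paired t-test checks whether the pre-to-post gain is statistically significant, and Cohen's d gives the effect size that speaks to practical significance.

```python
# Minimal sketch of the pre/post pattern above; all scores are hypothetical.
import numpy as np
from scipy import stats

pre = np.array([62, 70, 55, 68, 74, 60, 66, 71])   # hypothetical pre-test scores
post = np.array([68, 75, 59, 70, 80, 66, 70, 78])  # hypothetical post-test scores

# Paired t-test: is the pre-to-post gain statistically significant?
t_stat, p_value = stats.ttest_rel(post, pre)

# Cohen's d for paired data: mean gain divided by the SD of the gains.
gains = post - pre
d = gains.mean() / gains.std(ddof=1)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {d:.2f}")
# A small p-value says the gain is unlikely to be chance alone;
# d indicates whether the gain is large enough to matter in practice.
```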
10. Curriculum-focused tests vulnerable to
Ceiling effects
Low coefficients of reliability
Diffusion of construct validity?
Researcher-designed tests vulnerable to
Lack of alignment to curriculum
… = teachers unhappy!
Demoralization / motivation challenges from students
Measures of student learning as a “special problem”
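Two of the vulnerabilities above can be checked directly from item-level data. Below is a minimal sketch with hypothetical 0/1 item scores: the share of students at the maximum score (a ceiling check) and Cronbach's alpha, a common coefficient of reliability.

```python
# Minimal sketch: ceiling check and Cronbach's alpha on hypothetical item data.
import numpy as np

# Rows = students, columns = test items scored 0/1 (all hypothetical).
items = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
])

totals = items.sum(axis=1)
ceiling = np.mean(totals == items.shape[1])
print(f"Students at the maximum score: {ceiling:.0%}")  # high values flag a ceiling

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / totals.var(ddof=1))
print(f"Cronbach's alpha: {alpha:.2f}")  # values below ~.7 are usually suspect
```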
11. Finding the right mix of quant and qual
Quant observes an effect
Measure students and teachers, run descriptive stats
Run inferential stats: Which differences are significant?
Qual lets you discuss why it took place
Context of curriculum & instruction, teacher behaviors
Context of student work
Measures of student learning as a “special problem”
12. Mis-use of statistical tests
Critiques / classic errors: Quant
Analysis of 142 original articles appearing in AHA’s Circulation during 1975 (excluding certain types of studies)
Chart from Glantz, S.A. (1980). Biostatistics: how to detect, correct and prevent errors in the medical literature. Circulation, 61, 1-7. (p. 2)
Sampled all articles appearing in 1975 issues of the journal; excluded non-original pieces and some specialties (radiology, clinicopathologic correlations, and case reports)
Sample error: Using the t-test to compare more than two groups
The point: the t-test is popular, but it was misused more often than not, at least in this sample. However, the author offers lots of other studies that support the same basic idea: in medical research, stats are misapplied as often as not.
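A simulation makes Glantz's sample error tangible. The sketch below (simulated data, no real study) draws three groups from the same distribution, so any "significant" pairwise t-test is a false positive; repeating the comparisons many times shows the familywise error rate climbing well above the nominal .05. The single appropriate test for more than two groups is a one-way ANOVA.

```python
# Minimal sketch of the classic error: pairwise t-tests across 3+ groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, false_positives = 2000, 0

for _ in range(trials):
    # Three groups drawn from the SAME distribution: no true difference exists.
    g1, g2, g3 = (rng.normal(50, 10, 20) for _ in range(3))
    # Three pairwise t-tests, each at alpha = .05...
    ps = [stats.ttest_ind(a, b).pvalue for a, b in [(g1, g2), (g1, g3), (g2, g3)]]
    if min(ps) < 0.05:
        false_positives += 1

print(f"Familywise error rate: {false_positives / trials:.2f}")  # well above .05

# The appropriate single test for more than two groups:
f_stat, p = stats.f_oneway(g1, g2, g3)
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.3f}")
```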
13. Mis-use of statistical tests
Failure to observe nested effects
Critiques / classic errors: Quant
This is a key aspect of doing large-scale education studies: any student effect is nested in a classroom / school / district / state. You can’t just aggregate all the data.
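As an illustration, here is a minimal sketch of the fix, on hypothetical data and using the statsmodels mixed-model API: students sit inside classrooms, so the model carries a classroom random intercept instead of pooling every student as if independent.

```python
# Minimal sketch of a multilevel model for students nested in classrooms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for classroom in range(10):
    room_effect = rng.normal(0, 5)   # classrooms genuinely differ from one another
    treated = classroom % 2          # treatment is assigned at the classroom level
    for _ in range(25):              # 25 students per classroom
        score = 70 + 3 * treated + room_effect + rng.normal(0, 8)
        rows.append({"score": score, "treated": treated, "classroom": classroom})
df = pd.DataFrame(rows)

# Random intercept for classroom: student errors within a room are allowed
# to be correlated, unlike a plain regression over the pooled sample.
result = smf.mixedlm("score ~ treated", df, groups=df["classroom"]).fit()
print(result.summary())
```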
14. Mis-use of statistical tests
Failure to observe nested effects
Inattention to effect size / practical significance
Critiques / classic errors: Quant
Turning to Kingsley, 2005 – or, rather, the write-up provided by the vendor.
15. Failure to examine sub-groups
Prior knowledge
Tracking
LEP
SES / at-risk / under-served students
Critiques / classic errors: Design
16. Failure to examine sub-groups
Effects over long-term vs. short-term: Lee & Molebash, 2004
G1 = Google
G2 = Archive
G3 = selected documents and scaffold
Critiques / classic errors: Design
17. Failure to examine sub-groups
Effects over long-term vs. short-term
Gaps in interpretability due to mix of quant and qual
Boster, 2006
Dynarski et al., 2007
Contrast: Brush & Saye – 1999, 2001, 2002, 2004, 2005, 2006…
Critiques / classic errors: Design
I guess we also have a one-and-done-ness that is a further limitation. Brush & Saye obviously are a counter-example, and when the second Dynarski et al. comes out, it will add longitudinal observations…
Note: A good mix of quant and qual, in my opinion, makes a study always at least usefully bad. Pure quant or pure qual…eh. Usually not good—but can be great, certainly.
18. Example of tobacco company-funded research
Accelerated Reader (Oppenheimer, 2003)
Ignite!Learning studies
Boster, 2006?
Critiques / classic errors: Vendor-funded studies?
I guess we also have a one-and-done-ness that is a further limitation. Brush & Saye obviously are a counter-example, and when the second Dynarski et al. comes out, it will add longitudinal observations…
19. Carbon monoxide studies
Purely attitudinal outcomes
Survey research
Easy-to-do?
“State of the field”
There’s some merit here, but the point is that too much carbon monoxide is harmful. Trace amounts exist naturally, and it is produced for industrial purposes…but too much of it and you asphyxiate.
20. Carbon monoxide studies
Boutique studies
Early Logo studies
Idiosyncratic OLEs
PBL strategies
…trade off of exploring bleeding edge vs. ecological validity (practicality)
“State of the field”
Lots of these, and I don’t want to name names. However, I find a lot of merit to these – someone has to play with the bleeding edge to find out what works.
21. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
“State of the field”
…again, this is why I like TPaCK – it forces one to color outside the lines, cross the streams, what have you.
22. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
Scientifically-Based Research (NCLB)
“Persuasive research that empirically examines important questions using appropriate methods that ensure reproducible and applicable findings” (Beghetto, 2003)
“State of the field”
23. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
Scientifically-Based Research (NCLB)
Persuasive
Empirical
Important questions (Does it work? What was it?)
Appropriate methods (Privileging RCTs / medical model?)
Replicable and applicable findings (Limitation of qual studies)
“State of the field” From the article:
Persuasive. This attribute refers to research that is moving from "tentative knowledge claims generated at local research sites to become stabilized and transformed into widely accepted facts" (Smith and others 2002). Appropriate research design, methods, and techniques; logic and reasoning; and replicable results can all help to establish persuasiveness.
A critical element in persuasiveness is the peer-review process, in which researchers who have been trained in research methodology review and critique each other's work to help ensure that the methods used match the research questions and conclusions. Research findings published in a peer-reviewed journal can be assumed to have undergone careful scrutiny, been considered in light of alternative explanations, and deemed sufficiently "persuasive" by a panel of individuals with expertise in research methods.
Empirical. Research that is empirical is based on measurement or observation, that is, experienced "through the senses" (NRC 2002). For example, research that measures or observes the impact of school vouchers on student achievement would be considered empirical. However, there are certain questions that cannot be addressed by empirical investigations (NRC), such as "Should school voucher programs be enacted in my state?" Questions involving "should" are typically addressed through means other than observation and measurement.
Important Questions. This refers to questions addressed by research that build upon, add to, fill a void in, or otherwise clarify what is known and practiced. The NRC explains that the importance of a question is often determined by its relationship to prior research, theory, and relevance to policy and practice.
Appropriate Methods. This refers to the use of designs, methods, and techniques that fit the nature of the question the study is attempting to answer. However no research design, method, or analytic technique on its own makes a study or program of research scientific (NRC). If the question pertains to "Does it work?," then randomized experiments or quasi-experiments are most appropriate (Raudenbush 2002, Coalition for Evidence-Based Policy).
Simply stated, randomized experiments involve randomly assigning individuals, schools, or districts to a group that receives a particular intervention (such as class-size reduction) and to a group that does not. In contrast, if the question pertains to "What was the 'it'?," then qualitative methods (such as the case study) are most appropriate (Erickson and Gutierrez 2002). Among other things, qualitative methods provide "up-close descriptions" of what is, or is not, working; how interventions are working; and what might be facilitating or impeding the effectiveness of a particular intervention (Raudenbush).
Replicable and Applicable Findings. In general, this attribute refers to consistent, meaningful findings. The research presents sufficient detail to allow for "replication or, at a minimum, ... the opportunity to build systematically on their findings" (NCLB 2002).
Such findings are understandable, accessible, and applicable to a wide audience (Comprehensive School Reform Program Office). For example, a program of research should be designed and conducted to ensure that school leaders across the nation have a solid sense of whether they can expect to see similar results from implementing a school-reform program that has demonstrated increased student learning in another state.
24. Carbon monoxide studies
Boutique studies
Sprague, 2005: “Are we talking to ourselves?”
Scientifically-Based Research (NCLB)
Are we in the wrong paradigm? (Reeves, 1993; Shaver, 2000)
Scientific question: Why does this happen?
Engineering question: Is it useful?
“State of the field”
25. We see your gold standard and raise you a platinum standard (Schrum et al., 2005)
Here are specific kinds of studies we need to do (Roblyer, 2006)
Establish relative advantage
Improve implementation strategies
Monitor impact on important societal goals
Monitor and report on common uses and shape desired directions
Reading for next week
Note that Roblyer laid out the idea in 2005 and is writing a series of “research highlights.” I’m asking you to read her example of a type 2 study.
26. Can you describe its design? Methods?
How does this stack up?
Quant fumbles
Design fumbles
More carbon monoxide?
A boutique study?
Talking only to IT people?
SBR-ready?
Hopelessly positivist?
Time permitting: Tearing into my work
28. Treatment: Instruction, plus end-of-unit project; different topics per project
29. Quantitative data
All are teacher-derived tests. Tests were derived from the CG, with items aimed to mirror end-of-year SOLs.
Unfortunately, pre- and post-test items are NOT included on the unit tests! Same content, but different questions. On the positive side: this really reduces the threat of carry-over. On the negative side: assuming construct validity from pre/post to unit tests requires a leap of faith.
Variable coefficients of reliability: semester pre-post tests range from .24 to .58; end-of-unit tests range from .4 to .8.
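Those low reliabilities matter because, under classical test theory, measurement error attenuates observed effects: roughly, d_observed ≈ d_true × √reliability. A minimal sketch follows; the effect size is hypothetical, while the reliabilities are the ones reported above.

```python
# Minimal sketch of attenuation: low reliability shrinks observed effects.
import math

d_true = 0.50  # a hypothetical "true" medium-sized effect
for reliability in (0.24, 0.58, 0.80):
    d_obs = d_true * math.sqrt(reliability)
    print(f"reliability {reliability:.2f} -> observed d ~ {d_obs:.2f}")
# At reliability .24, a true d of .50 surfaces as roughly .24:
# a real effect can easily look null on an unreliable test.
```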
30. Qualitative data
Classroom observations = 80 class periods
Student responses = approx. 1,000 total
Student projects = approx. 80 total
34. Link to 271
35. Point of including this slide: As we can see on the left, lots of off-task activity with backgrounds. As we can see from the featured slide: a content-inappropriate image. (The clip art suggests a military circa the Crimean War, or pre-Civil War; the content = the Spanish-American War, or shortly before World War I.)
36. Point of this slide: info copied from Wikipedia.
37. The especially interesting bit to me is the “info addressed on the semester exam” count.
38. What’s due Wednesday?
How is it going to get done?
Where is it to be posted?
What level of assistance / oversight will the instructor provide?
Closure