
How is Testing Supposed to Improve Schooling?


Presentation Transcript


  1. How is Testing Supposed to Improve Schooling? Edward Haertel, April 15, 2012, NCME Career Award Address, Vancouver, British Columbia

  2. How Many Purposes… ?

  3. Purposes for Educational Testing

  4. Measuring versus Influencing • Measuring • Relies directly on informational content of specific test scores • Influencing • Effects intended to flow from testing per se, independent of specific test results • Deliberate efforts to raise test scores • Changing perceptions or ideas

  5. Example: Weekly Spelling Test • Measuring • Note words often missed (guides reteaching) • Assign grades • Guide students’ review following testing • Influencing • Motivate studying • Convey importance of spelling proficiency

  6. Leap from measuring to influencing: “Arguments … claim … program will lead to improvements in school effectiveness and student achievement by focusing … attention … on demanding content. Yet, the validity arguments … attend only to the descriptive part of the interpretive argument …. The validity evidence … tends to focus on scoring and generalization to the content domain for the test. The claim that the imposition of the accountability requirements will improve the overall performance of schools and students is taken for granted.” Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 17–64).

  7. Interpretive Argument • Scoring • Alignment, DIF, scaling, norming, equating, … • Generalization • Score precision, reliability, generalizability, … • Extrapolation • Score as reflection of intended construct • Decision or Implication • Use in guiding action or informing description
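
To ground the “Generalization” bullet, the precision question is often summarized with a generalizability coefficient: the share of observed-score variance attributable to universe-score (person) variance. The formula below is a standard illustration added here for reference, not part of the original slide:

```latex
% Generalizability coefficient (relative decisions):
% \sigma^2_p      = universe-score (person) variance
% \sigma^2_\delta = relative error variance
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}
```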

  8. “Appropriate test use and sound interpretation of test scores are likely to remain primarily the responsibility of the test user.” Standards for Educational and Psychological Testing, p. 111 Not our concern?

  9. Process too linear? • Curriculum Framework • Test Specification • Item Writing • Forms Assembly • Tryout and revision • Administration • Scaling

  10. Today’s Focus • Achievement tests taken by students • Some attention to aptitude tests as well • Exclude tests taken by teachers • Include uses of student test scores to evaluate teachers • Exclude testing for individual diagnosis of special needs

  11. Testing and Prior Instruction • Curriculum-Dependent Test Question • May assume prior knowledge and skills • May probe reasoning with what is already known • May “drill deeper,” testing application of concepts • Curriculum-Neutral Test Question • Must include requisite information with item • Must set up context in order to probe reasoning • Often limited to testing knowledge of concept definitions

  12.–21. Seven Broad Purposes of Testing / Purposes for Educational Testing (graphic build slides; only the titles are recoverable from the transcript)

  22. Instructional Guidance • Formative Assessment (informal) • Scoring • Sound items adequately sampling domain? • Generalization • Test scores with adequate precision? • Extrapolation • Mastery extends beyond test per se? • Decision or Implication • Used to adapt teaching work to meet learning needs?
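
The “adequate precision” question on this slide can be made concrete with the classical standard error of measurement; the worked values below are hypothetical:

```latex
% Standard error of measurement from score SD and test reliability
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}},
\qquad \sigma_X = 15,\ \rho_{XX'} = .91 \;\Rightarrow\; \mathrm{SEM} = 15\sqrt{.09} = 4.5
```

At that precision, an observed score of 100 carries an approximate 95% band of 100 ± 1.96 × 4.5, roughly 91 to 109.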

  23. Instructional Guidance • Formative Assessment (highly structured) • Winnetka Plan • Programmed Instruction approaches • Benjamin Bloom’s Mastery Learning • Pittsburgh LRDC’s IPI Math Curriculum • Criterion-Referenced Testing movement

  24. Instructional Guidance • Formative Assessment (highly structured) • Scoring • Questions mapped well to behavioral objectives • Generalization • Multiple items highly redundant • Extrapolation • ??? Assume decomposability, decontextualization • Decision or Implication • Relied on cut scores, simple rules; insufficient attention to actual effects
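
The “cut scores, simple rules” noted under Decision or Implication reduce to a fixed proportion-correct threshold. A minimal sketch, assuming a hypothetical 80% cut; the function name and values are illustrative, not drawn from any of the programs listed above:

```python
# Minimal sketch of a mastery-learning decision rule: a fixed
# proportion-correct cut score routes each student.
# The 80% cut and the function name are hypothetical.
def mastery_decision(correct, total, cut=0.80):
    """Return a routing decision from a simple cut-score rule."""
    if correct / total >= cut:
        return "advance to next unit"
    return "reteach and retest"

print(mastery_decision(9, 10))  # advance to next unit
print(mastery_decision(6, 10))  # reteach and retest
```

The slide’s point is that rules of exactly this simplicity were applied with little study of their actual effects.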

  25. Student Placement and Selection • IQ-based tracking • GATE programs • English Learner status (Entry / Exit) • MCTs / HSEEs (minimum competency tests / high school exit exams) • Advanced Placement / International Baccalaureate • SAT / ACT • …

  26. IQ-Based Tracking • Rationale • Teachers deliver uniform instruction to all students in a classroom • Students learn at different rates • Or, have different “capacities” • Grouping students by ability will improve efficiency because all will receive content at a rate appropriate to their ability • This will reduce wasted effort and frustration

  27. IQ-Based Tracking • Context • Increasing immigration (since late 19th century) • Perceived success of Army Alpha • Scientific School Management movement • Prevailing hereditarian views

  28. IQ-Based Tracking • Scoring • Scores free from bias and distortion? • Generalization • High correlations across forms and occasions • Extrapolation • Assumed based on strong theory, some criterion-related validity evidence • Decision or Implication • Largely unexamined

  29. Student Placement and Selection • IQ-based tracking • GATE programs • English Learner status (Entry / Exit) • MCTs / HSEEs • Advanced Placement (AP) / International Baccalaureate (IB) • SAT / ACT • …

  30. Comparing Educational Approaches • ESEA-mandated Project Head Start evaluations • Evaluations of NSF-sponsored science curricula • National Diffusion Network • What Works Clearinghouse • Both RCTs and Quasi-experimental research

  31. Educational Management • Measuring Schools • NCLB • Adequate Yearly Progress (AYP) determinations • Intervention for schools “in need of improvement” • Measuring Teachers • “Value-Added” Models. The “measuring” purpose (Educational Management) is only part of the story; “influencing” interacts with “measuring.”
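
The AYP determination mentioned here was, at bottom, a conjunctive decision rule: every reported subgroup had to reach the annual proficiency target. The sketch below is a simplification with hypothetical data; the actual NCLB rules also involved participation rates, minimum subgroup sizes, and safe-harbor provisions:

```python
# Simplified AYP-style check: a school "makes AYP" only if every
# reported subgroup meets the annual percent-proficient target.
# Subgroup data and the 45% target are hypothetical.
def makes_ayp(percent_proficient_by_group, annual_target):
    return all(p >= annual_target for p in percent_proficient_by_group.values())

school = {"all_students": 62.0, "econ_disadvantaged": 48.0, "english_learners": 41.0}
print(makes_ayp(school, annual_target=45.0))  # False: one subgroup is below 45%
```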

  32. “Value-Added” Models for Teacher Evaluation • Scoring • May require vertical scaling • Bias due to violations of model assumptions • Generalization • Extra error due to student sampling and sorting • Extrapolation • Score gains as proxy for teacher effectiveness / teaching quality broadly defined • Decision or Implication • Largely unexamined
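
The Extrapolation step here (score gains as a proxy for teaching quality) rests on a statistical model. The sketch below illustrates only the core idea, with synthetic data and invented names: predict current scores from prior scores, then average each teacher’s residuals. Operational value-added models use richer mixed models with covariates and shrinkage:

```python
# Deliberately simplified "value-added" illustration with synthetic data:
# regress current scores on prior scores, then average each teacher's
# residuals. Not any operational model; for intuition only.
import numpy as np

rng = np.random.default_rng(0)
n = 300
teacher = rng.integers(0, 10, size=n)        # 10 hypothetical teachers
prior = rng.normal(500, 50, size=n)          # prior-year scale scores
true_effect = rng.normal(0, 5, size=10)      # unknown "true" teacher effects
current = 0.8 * prior + 120 + true_effect[teacher] + rng.normal(0, 20, size=n)

# Step 1: ordinary least squares of current score on prior score.
X = np.column_stack([np.ones(n), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
residual = current - X @ beta

# Step 2: a teacher's estimated "value added" is the mean residual
# of that teacher's students.
for t in range(10):
    print(f"teacher {t}: estimated effect = {residual[teacher == t].mean():+.2f}")
```

Even in this toy version, the slide’s cautions are visible: the estimate inherits bias from any model misspecification and extra noise from which students each teacher happens to receive.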

  33. Influencing • Purposes of directing effort, focusing the system, and shaping perceptions rarely stand alone • Direct use of test scores for measuring is always included • Influencing purposes may nonetheless be more significant

  34. Shaping Public Perceptions "Test results can be reported to the press. … Based on past experience, policymakers can reasonably expect increases in scores in the first few years of a program … with or without real improvement in the broader achievement constructs that tests … are intended to measure." R. L. Linn (2000, p. 4)

  35. Attending to Influencing Purposes in Test Validation • Importance • Influence as ultimate rationale for testing • Place in the interpretive argument where unintended consequences arise • Challenge • Purposes not clearly articulated • Required data not available for years • Required research methods unfamiliar • Disincentives to look closely • Expensive, may not matter

  36. Clarity of Purpose SBAC and PARCC Consortia must have: “A theory of action that describes in detail the causal relationships between specific actions or strategies … and … desired outcomes …, including improvement in student achievement and college- and career-readiness.”

  37. Availability of Data • Familiar problem in literature on program evaluation • Plan ahead • Attend to implementation cycle • Do not ask for results too soon • Plan for “audit” tests? • Phased implementation?

  38. Expanded Methods and Theories • Can we view testing phenomena through other disciplinary lenses? • Validation requires both empirical evidence and theoretical rationales • Common sense gets us part way there • Where does theory for “Influencing” purposes come from? • What research methods can we borrow?

  39. Costs and Incentives • Need increased investment in comprehensive validation • Need help from agents, agencies beyond test makers, test administrators • Need more explicit press for comprehensive validation in RFPs, public discourse

  40. Thank you
