1 / 27

Designing Large-Scale Assessment to Support Learning

Explore the purposes, structure, and format of large-scale assessments to support learning. Understand the role of teachers and the impact of contextual factors on assessment. Examine the Lake Woebegon effect and the challenges of reconciling different pressures in assessment.

lcroft
Download Presentation

Designing Large-Scale Assessment to Support Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Designing Large-Scale Assessment to Support Learning Dylan Wiliam Institute of Education, University of London

  2. Overview • The purposes of assessment • The structure of the assessment system • The locus of assessment • The extensiveness of the assessment • Assessment format • Scoring models • Quality issues • The role of teachers • Contextual issues

  3. Functions of assessment • For evaluating institutions • For describing individuals • For supporting learning • Monitoring learning • Whether learning is taking place • Diagnosing (informing) learning • What is not being learnt • Forming learning • What to do about it

  4. The Lake Woebegon effect All the women are strong, all the men are good-looking, and all the children are above average. Scores • Time

  5. Goodhart’s law • All performance indicators lose their usefulness when used as objects of policy • Privatization of British Rail • Targets in the Health Service • “Bubble” students in high-stakes settings

  6. Reconciling different pressures • The “high-stakes” genie is out of the bottle, and we cannot put it back • The only thing left to us is to try to develop “tests worth teaching to”

  7. Sensitivity to instruction 1 year Distribution of attainment on an item highly sensitive to instruction

  8. Sensitivity to instruction (2) 1 year Distribution of attainment on an item moderately sensitive to instruction

  9. Sensitivity to instruction (3) 1 year Distribution of attainment on an item relatively insensitive to instruction

  10. Sensitivity to instruction (4)

  11. Consequences (1)

  12. Consequences (2)

  13. Consequences (3)

  14. Insensitivity to instruction • A key, and under-investigated, idea • Strongly affected by the normal mechanisms of test development • Has profound consequences for the performance of cohorts over time • Social promotion vs. grade retention

  15. Curriculum design • Curricula must be designed vertically, not horizontally • Key idea: when someone gets better, what is it that gets better? • Learning progressions

  16. The locus of assessment • Authority • Who creates the assessment? • Resources • What resources does the student have? • Interactivity • Negotiation of meaning of response • Scoring • Externalization of standards increases accuracy

  17. The extensiveness of assessment • Are all students assessed on the same basis? • Notions of “fairness” • Adaptations and adjustments for some • But not all • Treating different people the same is no fairer that treating similar people differently

  18. Assessment formats • Multiple-choice • Larger (but unsystematic) coverage • Low cost • Negative backwash effects • Constructed response • Large item-student effects • High cost • Positive backwash effects

  19. Item format and sex-differences Boys do better Girls do better

  20. Scoring models • High cut-scores • Positive experiences for students • Strong warrants for particular skills • Low reliability • Moderate cut scores • Negative experiences for some students • Weak warrants for particular skills • High reliability

  21. Quality issues • Threats to validity • Construct under-representation • Construct-irrelevant variance • Unreliability • Dynamic trade-offs between these

  22. The role of teachers • Using teacher assessment in certification is attractive: • Increases reliability (increased test time) • Increases validity (addresses construct under-representation) • But problematic • Problems of bias (construct-irrelevant variance) • Lack of trust (“Fox guarding the hen house”)

  23. A possible model • All students are assessed at test time • Different students in the same class are assigned different tasks • The performance of the class defines an “envelope” of scores, e.g. • Advanced: 5 students • Proficient: 8 students • Basic: 10 students • Below basic: 2 students • Teacher allocates levels on the basis of whole-year performance

  24. Benefits and problems • Benefits • The only way to teach to the test is to improve everyone’s performance on everything (which is what we want!) • Validity and reliability are enhanced • Problems • Students’ scores are not “inspectable” • Assumes student motivation

  25. The effects of context • beliefs about what constitutes learning; • beliefs in the reliability and validity of the results of various tools; • a preference for and trust in numerical data, with bias towards a single number; • trust in the judgments and integrity of the teaching profession; • belief in the value of competition between students; • belief in the value of competition between schools; • belief that test results measure school effectiveness; • fear of national economic decline and education’s role in this; • belief that the key to schools’ effectiveness is strong top-down management;

  26. Conclusion • There is no “perfect” assessment system anywhere. • Each nation’s assessment system is exquisitely tuned to local constraints and affordances • Assessment practices have impacts on teaching and learning which may be strongly amplified or attenuated by the national context. • The overall impact of particular assessment practices and initiatives is determined at least as much by culture and politics as it is by educational evidence and values.

  27. Conclusion (2) • It is probably idle to draw up maps for the ideal assessment policy for a country, even although the principles and the evidence to support such an ideal might be clearly agreed within the ‘expert’ community. • Instead, focus on those arguments and initiatives which are least offensive to existing assumptions and beliefs, and which will nevertheless serve to catalyze a shift in them while at the same time improving some aspects of present practice.

More Related