
Improving assessment: the key to education reform Daisy Christodoulou

Daisy Christodoulou, Director of Education, No More Marking. Research Ed, Saturday September 9th 2017.


Presentation Transcript


  1. Improving assessment: the key to education reform. Daisy Christodoulou, Director of Education, No More Marking. Research Ed, Saturday September 9th 2017

  2. Improving assessment: the key to education reform National exams and schools' internal assessment systems have a big impact on what gets taught in the classroom, and often lead to unintended and damaging consequences. How can we change assessment so that it helps to improve education rather than distorting it?

  3. Why do we need to improve assessment? Better measurement leads to improvements and innovations. Bad measurement leads to distortion and unintended consequences.

  4. Do we even need assessment?

  5. Four assessment practices that are distorting • Using prose descriptors to grade work and give pupils feedback • Marking essays using absolute judgement • Viewing grades as discrete categories • Thinking that test scores matter!

  6. Prose descriptors, mark schemes, rubrics

  7. Prose descriptors aren’t accurate… ‘Can compare two fractions to identify which is larger’ • Which is bigger: 3/7 or 5/7? 90% get this right • Which is bigger: 3/4 or 4/5? 75% get this right • Which is bigger: 5/7 or 5/9? 15% get this right Qtd in Wiliam, Principled Assessment Design, SSAT 2014

  8. …and they aren’t helpful! ‘I remember talking to a middle school student who was looking at the feedback his teacher had given him on a science assignment. The teacher had written, “You need to be more systematic in planning your scientific inquiries.” I asked the student what that meant to him, and he said, “I don’t know. If I knew how to be more systematic, I would have been more systematic the first time.” This kind of feedback is accurate – it is describing what needs to happen – but it is not helpful because the learner does not know how to use the feedback to improve. It is rather like telling an unsuccessful comedian to be funnier – accurate, but not particularly helpful, advice.’ Dylan Wiliam, Embedded Formative Assessment

  9. Good and bad multiple-choice questions… What is the capital of Moldova? • Baku • Tbilisi • Chisinau • Minsk • Yerevan (Distractors are unambiguously wrong... but still plausible!) What is the capital of Moldova? • Paris • London • Chisinau • New York • Mexico City (Distractors are unambiguously wrong... but not plausible!) Create unambiguously wrong but plausible distractors!

  10. What is 20% of 300? • 60 (correct answer) • 20 • 15 (common pupil misconception: if you work out 10% by dividing by 10, then you must work out 20% by dividing by 20) • 30
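The slide's point about diagnostic distractors can be illustrated with a minimal sketch. The function names below are hypothetical, not from the talk; the sketch only shows how the "divide by 20" misconception produces the distractor 15, while the correct method produces 60.

```python
def percent_of(percent, amount):
    """Correct method: take percent/100 of the amount."""
    return amount * percent / 100


def divide_by_percent(percent, amount):
    """Misconception: 'to find N%, divide by N'.

    This happens to work for 10% (dividing by 10), which is
    why pupils over-generalise it to other percentages.
    """
    return amount / percent


correct_answer = percent_of(20, 300)
misconception_distractor = divide_by_percent(20, 300)

print("Correct answer:", correct_answer)
print("Distractor from the misconception:", misconception_distractor)
```

A pupil who picks 15 has told the teacher which mistake they are making, which a prose descriptor alone would not reveal.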

  11. Prose descriptors & written comments Bad practice • Using prose descriptors to judge work and give feedback Good practice • Define descriptors as questions and use those instead What’s the research? • Wolf, Alison. "Portfolio assessment as national policy: the National Council for Vocational Qualifications and its quest for a pedagogical revolution." Assessment in Education: principles, policy & practice 5.3 (1998): 413-445, p.442. • Polanyi, Michael. Personal knowledge. Routledge, 2012 • Sadler, D.R. 1987. ‘Specifying and promulgating achievement standards.’ Oxford Review of Education, 13: 191–209.

  12. Absolute judgement (1) Stealing a towel from a hotel (2) Keeping a dime you find on the ground (3) Poisoning a barking dog (1*) Testifying falsely for pay (2*) Using guns on striking workers (3*) Poisoning a barking dog. Mozer, Michael C., et al. "Decontaminating human judgments by removing sequential dependencies." Advances in Neural Information Processing Systems 23 (2010).

  13. Comparative Judgement Normally, we ask: does this essay meet the criteria?

  14. Instead, we should ask, is this essay better than this essay?

  15. Absolute judgement Bad practice • Trying to mark essays absolutely Good practice • Use comparative judgement instead What’s the research? • Thurstone, Louis L. ‘A law of comparative judgment.’ Psychological review 34.4 (1927) • Laming, Donald. Human judgment: the eye of the beholder. Cengage Learning EMEA, 2003.
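As a rough illustration of how many "is this essay better than that one?" decisions can be turned into a single scale, here is a minimal sketch. It assumes a Bradley-Terry model, a close relative of the Thurstone model cited above, fitted with a simple iterative update. It is not the method used by any particular comparative judgement tool, and the essay labels and judgements are invented for the example.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iterations=200):
    """Estimate a relative quality score per essay.

    comparisons: list of (winner, loser) pairs from judges.
    Returns a dict mapping essay -> score (higher means better),
    normalised so the scores average 1.
    """
    essays = sorted({essay for pair in comparisons for essay in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {essay: 1.0 for essay in essays}
    for _ in range(iterations):
        updated = {}
        for essay in essays:
            denom = 0.0
            for other in essays:
                n = pair_counts.get(frozenset((essay, other)), 0)
                # Skip self-pairs and pairs that were never compared.
                if other != essay and n > 0:
                    denom += n / (strength[essay] + strength[other])
            updated[essay] = wins[essay] / denom if denom else strength[essay]
        total = sum(updated.values())
        strength = {e: s * len(essays) / total for e, s in updated.items()}
    return strength


# Hypothetical judgements: each tuple is (preferred essay, other essay).
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
scores = fit_bradley_terry(judgements)
for essay in sorted(scores, key=scores.get, reverse=True):
    print(essay, round(scores[essay], 2))
```

The judges never assign a mark. The scale comes entirely from the pattern of relative decisions, which is the contrast with absolute judgement drawn in the slides above.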

  16. Viewing grades as discrete categories

  17. Reporting. Pupils: Paul, Ringo, John, George. Grade bands: Working towards (WTS), Expected Standard (EXS), Greater Depth (GDS)

  18. Viewing grades as discrete categories • Grades ‘simply get layered on top of the scale’. • ‘The labels chosen for performance standards [such as working towards, expected standard] have their own meanings independent of their use with the standards, and these clearly influence how people interpret the results they are given.’ Koretz, Daniel M. Measuring up. Harvard University Press, 2008

  19. Viewing grades as discrete categories Bad practice • Viewing grades as discrete categories Good practice • Recognise that grades are lines on a continuum What’s the research? • Koretz, Daniel. Measuring up. Harvard University Press, 2008

  20. Thinking that test scores matter!

  21. Thinking that test scores matter! ‘The really important idea here is that we are hardly ever interested in how well a student did on a particular assessment. What we are interested in is what we can say, from that evidence, about what the student can do in other situations, at other times, in other contexts. Some conclusions are warranted on the basis of the results of the assessment, and others are not. The process of establishing which kinds of conclusions are warranted and which are not is called validation.’ Wiliam, Dylan, Principled Assessment Design, SSAT: London, 2014. ‘Test scores reflect a small sample of behaviour and are valuable only insofar as they support conclusions about the larger domains of interest. This is perhaps the most fundamental principle of achievement testing.’ Koretz, Daniel M. Measuring up. Harvard University Press, 2008.

  22. The sample and the domain The domain The sample

  23. In political polling… The domain – the entire electorate – 40m people The sample = 1,000 voters who are representative of the larger domain
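To make the sample-to-domain inference concrete, the following sketch computes the textbook 95% margin of error for a poll of 1,000 voters. It assumes a simple random sample and the worst-case proportion of 0.5. These are illustrative assumptions, not figures from the talk, and real polls use more elaborate weighting.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a sampled proportion.

    Assumes a simple random sample drawn from a much larger domain,
    so the size of the domain itself (e.g. 40m voters) barely matters.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


moe = margin_of_error(1000)
print(f"Sample of 1,000: about ±{moe * 100:.1f} percentage points")
```

A small but representative sample supports a fairly precise claim about the whole domain. Under these assumptions, quadrupling the sample only halves the margin of error.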

  24. In TV advertising… The domain – number of people watching particular TV channels The sample – people watching at three specified weeks of the year

  25. In the postal service… The domain – delivery times to all addresses in US – c. 300 m people The sample – delivery to 1,000 addresses

  26. In a vocabulary test… The domain – all the words you know – approx 20,000 The sample – a 40 word vocab test

  27. In any exam… The domain – all your skills and knowledge The sample – what can be assessed in 2-3 hours

  28. Goodhart’s Law / Campbell’s Law • "When a measure becomes a target, it ceases to be a good measure." • "The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

  29. Thinking that test scores matter! Bad practice • Thinking that the test score by itself matters! Good practice • Recognise that what matters are the inferences we can make from the test score. What’s the research? • Wiliam, Dylan, Principled Assessment Design, SSAT: London, 2014 • Koretz, Daniel, Measuring up. Harvard University Press, 2008

  30. Four practical assessment errors • Using prose descriptors to grade work and give pupils feedback • Marking essays using absolute judgement • Viewing grades as discrete categories • Thinking that test scores matter!
