
Presentation Transcript


  1. 30 Years of Evidence on the Comparability of Exam Standards: Myths, Fiascos and Unrealistic Expectations. Paul E. Newton, Centre for Evaluation & Monitoring, University of Durham. 30th Anniversary Conference: 30 Years of Evidence in Education, 23 September 2014, London.

  2. Statistics vs. Judgement: What Does 30 Years of Research Tell Us About the Best and Worst Way to Maintain Exam Standards?

  3. What does it mean to ‘maintain’ an exam standard? • Grade Awarding: the process of identifying which marks on this year’s exam correspond to levels of attainment (i.e. levels of knowledge, skill and understanding) that were associated with grade boundary marks on last year’s exam.

  4. Why do exam boards need to move grade boundaries? • Because even exams that are designed to measure exactly the same kind of attainment, in exactly the same way, may end up being slightly different in terms of the overall difficulty of their questions.
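
The logic of slide 4 can be made concrete with a minimal sketch. Everything below is fabricated for illustration (an invented cohort and an invented 2-mark difficulty shift; nothing here comes from the presentation): if the same attainment earns fewer marks on a slightly harder paper, the boundary must move down by the same amount to hold the standard fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort: the same 1,000 candidates sit two versions of an
# exam measuring the same attainment, but version B's questions turn out
# about 2 marks harder overall. (All numbers fabricated for illustration.)
attainment = rng.normal(50, 10, size=1_000)
marks_a = attainment        # version A: marks track attainment directly
marks_b = attainment - 2    # version B: same attainment earns ~2 fewer marks

boundary_a = 60             # grade boundary fixed on version A

# To award the grade to candidates at the same level of attainment,
# version B's boundary must move down by the same 2 marks.
boundary_b = boundary_a - 2

print((marks_a >= boundary_a).mean())   # proportion achieving the grade on A
print((marks_b >= boundary_b).mean())   # identical proportion on B
```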

  5. Have we always maintained exam standards like this? • 30 years ago – in 1984? • 60 years ago – in 1954?

  6. Have we always maintained exam standards like this? • 30 years ago – in 1984? • 60 years ago – in 1954? • … yes, pretty much!

  7. Attainment-referencing • From one examination to the next, corresponding grade boundaries should be located at marks associated with equivalent levels of attainment.

  8. The myth

  9. The myth… debunked

  10. How do you operationalise attainment-referencing?

  11. Scrutiny of scripts (undertaken by examiners) • Comparing levels of attainment ‘directly’ by inspecting performances in examination scripts • a.k.a. ‘Judgement’

  12. Scrutiny of data (undertaken by the Board) • Comparing levels of attainment indirectly by ‘modelling’ the causal determinants of attainment • a.k.a. ‘Statistics’
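
As a hedged illustration of what this ‘modelling’ might look like in its simplest form, here is a generic, comparable-outcomes-style calculation with fabricated data (this is not any exam board’s actual procedure): if the statistical evidence suggests this year’s cohort is similar to last year’s, the boundary can be placed at the mark that reproduces last year’s outcome distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simplest 'statistics' route, sketched with fabricated data: if the
# evidence on the cohort says this year's candidates match last year's,
# place the boundary at the mark that reproduces last year's pass rate.
last_year_pass_rate = 0.70                        # grade C or above
this_year_marks = rng.normal(48, 12, size=5_000)  # fabricated mark distribution

# The mark at or above which 70% of candidates fall is the 30th percentile.
boundary = np.percentile(this_year_marks, (1 - last_year_pass_rate) * 100)
print(round(float(boundary), 1))
```

In practice the cohort comparison itself is modelled (from prior attainment and other determinants) rather than assumed, which is exactly where the risks discussed later in the deck arise.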

  13. Which is better – statistics or judgement?

  14. Which is better – statistics or judgement?

  15. The battle of grade awarding • Examiners • We are just so impressed by the quality of performances that we see in our French exams. • The Board • But do you really have enough evidence to justify raising the pass-rate yet again? • After all: • pass-rates haven’t been rising in German or Spanish • the French cohort is expanding massively

  16. What Does 30 Years of Research Tell Us About the Best and Worst Way to Maintain Exam Standards?

  17. Evidence from Exam Boards

  18. Evidence from Academia

  19. Evidence from Regulators

  20. What have we learned since 1984?

  21. We shouldn’t put too much confidence in statistics

  22. Four NEAB maths A levels • P&A, P&M, P&S, SMP • multilevel modelling (MLM) to control for prior achievement, gender, etc. • even after these controls, SMP still appeared too lenient • however, the SMP syllabus was more motivating, had excellent support materials, and was more time-consuming
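
The kind of analysis this slide summarises can be sketched roughly as follows. This is a hedged reconstruction with entirely fabricated data and invented variable names: a multilevel (mixed-effects) model of exam grade on prior achievement and gender, with a random intercept per school and a fixed effect for taking the SMP syllabus.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Fabricated data: 2,000 candidates across 50 schools.
n = 2_000
df = pd.DataFrame({
    "school": rng.integers(0, 50, n),
    "prior": rng.normal(6.0, 1.0, n),     # e.g. mean GCSE score
    "female": rng.integers(0, 2, n),
    "smp": rng.integers(0, 2, n),         # 1 = took the SMP syllabus
})
school_effect = rng.normal(0.0, 0.3, 50)  # school-level variation
df["grade"] = (2.0 * df["prior"]
               + 0.1 * df["female"]
               + 0.5 * df["smp"]          # built-in 'leniency' signal
               + school_effect[df["school"].to_numpy()]
               + rng.normal(0.0, 1.0, n))

# Mixed-effects (multilevel) model: grade ~ prior + female + smp,
# with a random intercept for school. A positive 'smp' coefficient
# after these controls is the 'still appeared too lenient' finding.
result = smf.mixedlm("grade ~ prior + female + smp",
                     df, groups=df["school"]).fit()
print(result.summary())
```

The slide’s caution still stands: a model like this can only adjust for what it measures, so unmeasured differences such as motivation, support materials and time spent on the course can masquerade as leniency.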

  23. We shouldn’t put too much confidence in judgement

  24. Grade boundaries set by examiner judgement alone • for two exam papers • in the same subject • at different tiers • sat by the same candidates • Many more students ended up with higher grades on the lower tier exam than on the higher tier.

  25. Judgemental innovations • We have learned how to harness examiner judgement more effectively

  26. Statistical innovations • We have learned how to compute statistical analyses more effectively

  27. It is extremely hard to predict and control comparability threats.

  28. The ‘fiascos’ • Summer 2002 • Curriculum 2000 anomaly • Summer 2012 • GCSE English anomaly

  29. January awarding, 2012 • Clear tendency to ensure students were marked ‘comfortably’ above historical boundaries

  30. June awarding, 2012 • Same tendency, but many students no longer ‘comfortably’ above the raised boundaries

  31. So, which is better – statistics or judgement?

  32. Unrealistic expectations • Three ‘stages’ in understanding comparability • statistical auditing • problems are routine • solutions require ‘back of the envelope’ sums • scientific research • problems are difficult • solutions require rigorous and objective investigations • art criticism • problems are perhaps insurmountable • solutions require value judgements • (Bardell, Forrest and Shoesmith, 1978)

  33. Realistic expectations + Persuasive justifications • Four ‘stages’ in understanding comparability • statistical auditing • scientific research • art criticism • engineering pragmatism • many comparability problems are technically insurmountable… but some are less insurmountable than others and should be prioritised • all comparability solutions are inevitably imperfect… but some are less imperfect than others and should be prioritised • technically insurmountable problems and inevitably imperfect solutions highlight the fundamental importance of strong arguments in defence of policy and practice
