
Marks for identifying uncertainty: Stimulation of learning through Certainty-Based Marking



  1. Cambridge Assessment, June 2009. Marks for identifying uncertainty: Stimulation of learning through Certainty-Based Marking. Tony Gardner-Medwin, Physiology, University College London. www.ucl.ac.uk/LAPT

  2. The take home message: We should reward the acknowledgment of uncertainty
  Starting points (you may agree or disagree!)
  • The nature of assessment affects how students learn & think
  • Objective tests/exercises can stimulate learning & understanding
  • Formative assessment is more important than summative
  • Different Q types suit different situations, e.g. T/F, SBA, free text
  • Scaling to "% above chance" (%Knowledge) should be universal
  • Negative marking can be either really constructive or really awful
  • Students & kids can enjoy assessment if it is stimulating, fair, varied, challenging, immediately rewarding, not humiliating -- like a game.

  3. How Certainty-Based Marking works
  • How it relates to probability & knowledge
  • How students react & use it
  • CBM as summative assessment
  • Why isn't it used more?

  4. Which Certainty Level is Best?
  [Chart: mark expected on average (scale -6 to 3) plotted against "How likely is your answer to be correct?" (0% to 100%) for the options C=3 (High), C=2 (Mid), C=1 (Low) and No Reply (0). The guessing range sits around 50%; C=1 pays best below about 67%, C=2 between 67% and 80%, and C=3 above 80%.]
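To make the crossover points concrete, here is a minimal sketch (not from the presentation; it assumes the standard LAPT scheme quoted later in the slides, 1/2/3 marks if right and 0/-2/-6 if wrong) that computes the expected mark at each certainty level and picks the level that maximises it.

```python
# Expected mark under the standard LAPT CBM scheme (1,2,3 if correct; 0,-2,-6 if wrong).
CBM_SCHEME = {1: (1, 0), 2: (2, -2), 3: (3, -6)}  # C level -> (mark if right, mark if wrong)

def expected_mark(p_correct: float, level: int) -> float:
    """Average mark at a given certainty level when the answer is right with probability p_correct."""
    right, wrong = CBM_SCHEME[level]
    return p_correct * right + (1 - p_correct) * wrong

def best_level(p_correct: float) -> int:
    """Certainty level that maximises the expected mark (the no-reply option, scoring 0, is ignored)."""
    return max(CBM_SCHEME, key=lambda c: expected_mark(p_correct, c))

if __name__ == "__main__":
    for p in (0.5, 0.67, 0.8, 0.95):
        marks = {c: round(expected_mark(p, c), 2) for c in CBM_SCHEME}
        print(f"p={p:.2f}  expected marks={marks}  best C={best_level(p)}")
```

Running this shows C=1 best below about 67%, C=2 between 67% and 80%, and C=3 above 80%, matching the crossover points on the chart.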

  5. How well do students discriminate reliability?

  6. Ordinary words we use to describe knowledge: knowledge • uncertainty • don't know • misconception • delusion
  - a scale running from decreasing certainty about what is true to increasing certainty about something false (increasing "ignorance")
  • Knowledge is a function of certainty (confidence, degree of belief)
  • There are states a lot worse than acknowledged ignorance
  "It's not ignorance does so much damage - it's knowin' so derned much that ain't so." (attrib. J. Billings)
  "I was gratified to be able to answer promptly, and I did! - I said I didn't know." (Mark Twain)

  7. Student Learning: principles they readily understand
  • You need to know the reliability of your knowledge in order to use it
  • Confident errors are serious, requiring attention to explanations
  • Expressing uncertainty when you are uncertain is a good thing
  • Confidence is about understanding why things cannot be otherwise, not about personality
  • If over- or under-confident, you must calibrate through practice
  • Reflection and justification are essential study habits
  In evaluation surveys, a majority of students have always said they like CBM, finding it useful and fair. They asked for it to be included in exams, and after 5 years of exam use at UCL they voted 52% : 30% to retain it (in 2005/6), though this was rejected by the conservative medical establishment.

  8. Why test knowledge? Google makes it so easy to find!
  Cheap information (& increased teamwork) require:
  1) Identifying things you will get wrong and won't Google - "unknown unknowns" rather than "don't knows"
  2) Judging reliability and uncertainty correctly
  .... setting a threshold for seeking help
  .... evaluating conflicting and corroborating information
  In olden times, you had to rely on your own stored information: you would make a best choice and "go for it". School leavers have more sparse (though broader) stored info, but still have a "go for it" culture - to a scary extent! - responding with an immediate idea and not thinking much. These lessons are core things that CBM teaches.

  9. Nuggets of knowledge vs. a network of understanding
  [Diagram: isolated "nuggets" of knowledge (choices plus certainty, i.e. degree of belief) linked by evidence and inference into a network of understanding.]
  • CBM places greater demands on justification & stimulates connections
  • Thinking about uncertainty / justification develops understanding of relationships
  • To understand = to link correctly the facts that bear on an issue.

  10. Using CBM
  1. With UCL LAPT software, online or from CD
  2. With Moodle - work in progress
  3. With commercial software - some progress, more needed!
  4. Secure exams, with OMR cards [Speedwell]

  11. CBM quite closely follows the ideal ignorance measure: the student loses about 3 marks per 'bit' of ignorance, up to a maximum of 3 bits.
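The slide's graph is not reproduced here, but the stated rule can be sketched directly. The sketch below is an illustration, not the author's code: it assumes the "ideal ignorance measure" is the logarithmic (information) score, -log2 of the probability a student assigns to the correct answer, converts it to marks at 3 marks per bit with a 3-bit cap, and can be compared by eye with the fixed CBM marks (1/2/3 if right, 0/-2/-6 if wrong at C=1/2/3).

```python
import math

def ignorance_bits(p_true: float) -> float:
    """Bits of ignorance: -log2 of the probability assigned to the correct answer."""
    return -math.log2(p_true)

def ideal_mark(p_true: float, marks_per_bit: float = 3.0, cap_bits: float = 3.0) -> float:
    """Mark under the log measure: full marks (3) minus 3 per bit, with a 3-bit floor (-6)."""
    return 3.0 - marks_per_bit * min(ignorance_bits(p_true), cap_bits)

if __name__ == "__main__":
    # Probability placed on the true answer vs. the log-based mark.
    # Actual CBM marks for comparison: right at C=1/2/3 -> 1/2/3; wrong at C=1/2/3 -> 0/-2/-6.
    for p_true in (1.0, 0.8, 0.67, 0.5, 0.33, 0.2, 0.125, 0.05):
        print(f"p(true)={p_true:5.3f}  bits={ignorance_bits(p_true):4.2f}  ideal mark={ideal_mark(p_true):5.2f}")
```

For example, a confident (C=3) wrong answer implies at most 20% was placed on the truth, i.e. at least 2.3 bits of ignorance, for which the log measure gives between about -4 and the -6 floor; CBM awards -6.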

  12. Hassmen & Hunt ‘94 max 100 No negative marking Fixed negative marking: +/-1 high 80 mid 1 1 60 reply reply low 40 0.8 20 min 0.5 0.6 0 no reply Mark expected on average -20 0.4 0 Mark expected on average Mark expected on average -40 no reply 0.2 -60 no reply -0.5 -80 0 -100 35% 55% 67% 85% -0.2 -1 -120 0% 50% 100% 0% 50% 100% 0% 50% 100% Confidence (est'd prob'y correct) Confidence (est'd prob'y correct) Confidence (est'd prob'y correct) Davies 2002 Hevner 1932 high 3 Gardner-Medwin’06 high mid 2 3 mid 1 low 2 low 0 1 Mark expected on average Mark expected on average no reply -1 0 no reply -2 -1 50% -3 -2 0% 50% 100% 0% 50% 100% Confidence (est'd prob'y correct) Confidence (est'd prob'y correct) What’s a good mark scheme? The standard LAPT (1,2,3 / 0,-2,-6) scheme seems better than any of these.

  13. CBM increases the reliability of exam data
  'Reliability' indicates to what extent a score measures something about the student's ability, as opposed to 'luck' or chance.
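As a concrete illustration (the presentation does not say which reliability coefficient was used), here is a short sketch computing Cronbach's alpha, one standard internal-consistency estimate, from a students-by-questions matrix of item marks; the data in the example is made up.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x questions) matrix of item marks."""
    k = item_scores.shape[1]                          # number of questions
    item_vars = item_scores.var(axis=0, ddof=1)       # variance of each question's marks
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ability = rng.normal(size=200)                                              # made-up student abilities
    marks = (ability[:, None] + rng.normal(size=(200, 50)) > 0).astype(float)   # 50 simulated T/F items
    print(f"alpha = {cronbach_alpha(marks):.2f}")
```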

  14. CBM increases the effective test length
  With increased 'Reliability' you don't need so many exam questions to get data of equal quality.

  15. CBM increases the reliability of exam data with True/False questions
  'Reliability' indicates to what extent a score measures something about the student's ability, as opposed to 'luck' or chance.
  To achieve these increases using only % correct would have required on average 58% more questions.
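The presentation doesn't show the calculation behind the "more questions" figure, but the usual way to turn a reliability gain into an equivalent test length is the Spearman-Brown prophecy formula. The sketch below uses illustrative reliability values, not the figures from the UCL exams.

```python
def length_factor(r_conventional: float, r_cbm: float) -> float:
    """Spearman-Brown: factor by which a number-correct test must be lengthened
    to reach the reliability achieved by CBM scoring."""
    return (r_cbm * (1 - r_conventional)) / (r_conventional * (1 - r_cbm))

if __name__ == "__main__":
    # Illustrative values only.
    r_conv, r_cbm = 0.85, 0.90
    factor = length_factor(r_conv, r_cbm)
    print(f"Need {factor:.2f}x the questions, i.e. {100 * (factor - 1):.0f}% more.")
```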

  16. Reliability and efficiency of exams (quality of data / number of questions) are enhanced with CBM
  Data from 6 medical student exams (250-300 T/F Qs each, >300 students).

  17. Certainty-based scores predict the conventional score on different Qs better than conventional scores do.

  18. How should one handle students with poor calibration?
  Significantly overconfident in exam: 2 students (1%), e.g. 50% correct @C=1, 59% @C=2, 73% @C=3
  Significantly underconfident in exam: 41 students (14%), e.g. 83% correct @C=1, 89% @C=2, 99% @C=3
  Maybe one shouldn't penalise such students.
  Adjusted confidence-based score: mark the set of answers at each C level as if they were entered at the C level that gives the highest score** - mean benefit = 1.5% ± 2.1% (median 0.6%)
  ** (first combining sets if % correct is not in ascending order)
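A rough sketch of that adjustment as I read it (illustrative, not the author's implementation): group a student's answers by the certainty level they chose, then re-mark each group at whichever level maximises its total under the standard 1,2,3 / 0,-2,-6 marks. The preliminary "combining sets" step for non-monotonic accuracy is omitted for brevity.

```python
from typing import Iterable, Tuple

# Standard LAPT marks: C level -> (mark if right, mark if wrong)
CBM_SCHEME = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def adjusted_cbm_score(answers: Iterable[Tuple[int, bool]]) -> float:
    """answers: (chosen C level, correct?) pairs for one student.
    Each chosen-level group is re-marked at whichever level gives it the highest total."""
    groups = {c: [] for c in CBM_SCHEME}
    for level, correct in answers:
        groups[level].append(correct)
    total = 0.0
    for results in groups.values():
        n_right = sum(results)
        n_wrong = len(results) - n_right
        total += max(n_right * r + n_wrong * w for r, w in CBM_SCHEME.values())
    return total

if __name__ == "__main__":
    # A diffident (under-confident) student, with made-up answers.
    answers = [(1, True)] * 18 + [(1, False)] * 2 + [(2, True)] * 9 + [(2, False)] + [(3, True)] * 10
    print("adjusted score:", adjusted_cbm_score(answers))
```

In this made-up case the diffident student's C=1 and C=2 answers get re-marked at a higher level, so the adjusted score exceeds the raw CBM score, which is the intended effect.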

  19. Scaling CBM scores to be directly comparable with conventional scores
  NCOR is based on number correct, scaled so guesses (50% probability correct) give on average 0% ("% Knowledge").

  20. Equivalence of scaled CBM scores** and conventional scores for standard setting.
  True/False ♦ and SBA □ (5-option) components of a formative test for 345 students were ranked by conventional scores. Then, for each decile, mean CBS scores are plotted against % correct above chance ("% knowledge"). Gardner-Medwin & Curtin 2007, REAP conference; data from Imperial College.
  ** CBS = ((Total - Chance)/(Max - Chance))^p × 100%, where p = 0.6 for T/F and 0.48 for SBA (5-option)
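A small sketch of both scalings as I read these two slides (variable names and the example numbers are mine): "% knowledge" rescales number-correct so that chance-level guessing averages 0%, and the scaled CBM score (CBS) applies the power-law mapping quoted above so it can be read on the same scale. What counts as the chance-level CBM total is not specified on the slide; 0 is assumed below purely for illustration.

```python
def percent_knowledge(n_correct: int, n_questions: int, n_options: int = 2) -> float:
    """Number-correct rescaled so chance-level guessing averages 0% (NCOR / '% knowledge')."""
    chance = n_questions / n_options
    return 100.0 * (n_correct - chance) / (n_questions - chance)

def scaled_cbs(total_cbm: float, max_cbm: float, chance_cbm: float, p: float) -> float:
    """Scaled CBM score: ((Total - Chance)/(Max - Chance))^p x 100%.
    The slide quotes p = 0.6 for T/F and 0.48 for 5-option SBA."""
    fraction = (total_cbm - chance_cbm) / (max_cbm - chance_cbm)
    return 100.0 * fraction ** p

if __name__ == "__main__":
    # Made-up example: 100 T/F questions, 80 correct; CBM total 130 out of a maximum 300,
    # with an assumed chance-level CBM total of 0 (an assumption, not from the slide).
    print(f"% knowledge = {percent_knowledge(80, 100):.1f}%")
    print(f"scaled CBS  = {scaled_cbs(130, 300, 0, p=0.6):.1f}%")
```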

  21. Why doesn't everybody already use CBM? - a puzzle
  • Enthusiasm was exhausted before the age of 'online'
  • Some CBM methods were complex, opaque or non-motivating
  • Reluctance to treat certainty as integral to knowledge
  • Mistaken worries about 'personality bias'
  • Under-rating of self-assessment & practice as learning tools
  • Worry that CBM would need new questions
  • Worry that CBM would upset standard-setting
  • Inertia and vested interests

  22. A few of the names associated with confidence testing in education:
  Andrew Ahlgren • Jim Bruno • Confucius • Robert Ebel • Jack Good • Kate Hevner • Darwin Hunt • Dieudonné Leclercq • Emir Shuford
  "When you know a thing, to hold that you know it. And when you do not know a thing, to allow that you do not know it. This is knowledge." "Learning without thought is a waste of time." - Confucius
  London colleagues: Mike Gahan • David Bender • Nancy Curtin

  23. We fail if we mark a lucky guess as if it were knowledge. We fail if we mark misconceptions as no worse than ignorance. www.ucl.ac.uk/lapt

  24. Lessons from experience with CBM
  • Practice is needed before use in exams
  • Exams should re-use questions from an open database only very sparingly
  • Over-confidence and diffidence are both unhealthy traits that can be moderated by practice to achieve good calibration
  • With multi-option questions, students tend (at least initially) to over-estimate reliability
  • Standard setting: it is easy (but important!) to scale CBM marks to match familiar scales based on number correct.

  25. Some questions about CBM!
  • Are there problems using it?
  • Why doesn't my VLE support CBM?
  • Do students need practice?
  • Isn't computer-marked assessment just factual?
  • Does CBM increase retention?
  • Do I need new questions?
  • What are the best Q types?
  • What about school education?
  • Is it relevant to my subject, where opinions differ?
  • Isn't it bad to encourage guessing?
  • What if my only assessments are exams?
  • How do I convince an exam board?
  • Isn't it right/wrong that really matters?

  26. Response to LAPT numeracy exercises in medical 1st year

  27. "I think about confidence assessment 50% 40% 30% 20% 10% 0% Every Time Most of the Rarely Never No reply time % "I sometimes change my answer while thinking about 30 confidence assessment" 25 20 15 10 5 0 Disagree 1 2 3 4 Agree 5

  28. Students really take to confidence-based marking
  Principles that students seem readily to understand:
  • Both under- and over-confidence are impediments to learning
  • Confident errors are far worse than acknowledged ignorance and are a wake-up call (-6!) to pay attention to explanations
  • Expressing uncertainty when you are uncertain is a good thing
  • Thinking about the basis and reliability of answers can help tie bits of knowledge together (to form "understanding")
  • Checking an answer and rereading the question are worthwhile
  • Sound confidence judgement is a valued intellectual skill in every context, and one they can improve
  • Immediate feedback while still thinking about the basis of your answer is a hugely valuable study aid

  29. But perhaps they are just measuring ability to handle confidence?
  No. Confidence scores are better than simple scores at predicting even the conventional scores on a different set of questions. This can only be because they are a statistically more efficient measure of knowledge. The correlation, across students, between scores on one set of questions and another is higher for confidence than for simple scores.
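A sketch of the kind of comparison being described (simulated data, not the UCL results; the scoring assumes the standard 1,2,3 / 0,-2,-6 scheme): split the questions into two sets and compare how well each scoring method on set A correlates, across students, with the conventional score on set B. With real exam data one would replace simulate() with the recorded responses; the simulation is only there to make the snippet runnable.

```python
import numpy as np

def simulate(n_students=300, n_questions=100, seed=0):
    """Made-up T/F exam: each student's per-question probability of being right depends on ability."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=(n_students, 1))
    p = 1 / (1 + np.exp(-(ability + rng.normal(size=(n_students, n_questions)))))
    p = np.clip(p, 0.5, 1.0)                                    # students answer the option they favour
    correct = rng.random((n_students, n_questions)) < p
    level = np.where(p >= 0.8, 3, np.where(p >= 2 / 3, 2, 1))   # honest certainty choice
    return correct, level

def scores(correct, level):
    """Simple score = number correct; CBM score under the 1,2,3 / 0,-2,-6 scheme."""
    simple = correct.sum(axis=1)
    wrong_marks = np.select([level == 1, level == 2, level == 3], [0, -2, -6])
    cbm = np.where(correct, level, wrong_marks).sum(axis=1)
    return simple, cbm

if __name__ == "__main__":
    correct, level = simulate()
    a, b = slice(0, 50), slice(50, 100)                         # split the questions into two halves
    simple_a, cbm_a = scores(correct[:, a], level[:, a])
    simple_b, _ = scores(correct[:, b], level[:, b])
    print("simple A vs simple B:", round(float(np.corrcoef(simple_a, simple_b)[0, 1]), 3))
    print("CBM A    vs simple B:", round(float(np.corrcoef(cbm_a, simple_b)[0, 1]), 3))
```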

  30. Known Knowns ... things we know that we know. Known Unknowns ... things that we know that we don't know. Unknown Unknowns ... things we do not know we don't know. (Donald Rumsfeld)
  "When you know a thing, to hold that you know it. And when you do not know a thing, to allow that you do not know it. This is knowledge." (Confucius)

  31. Will it snow next weekend? Does a (good) weather forecaster have knowledge? Obviously yes, but expressed through a probability.
  Does insulin raise blood glucose levels? Similar, even though the Q is not about a probability: the probability is your certainty that your answer is right.
  How can you measure and reward this knowledge? That was the origin of CBM, >100 years ago. The key is to have a "proper" or "motivating" reward scheme, which ensures that the person does best by expressing their true level of uncertainty.

  32. CBM data is a more valid measure of ability
  'Validity' means it measures what you want, rather than just something easily measured.

  33. SUMMARY: Why CBM?
  • Get students to think more carefully
  • Reward recognition of uncertainty, either personal or in a group
  • Highlight misconceptions
  • Engage students more - the game element of CBM
  • Encourage criticism of Qs (intolerance of ambiguity or looseness)
  • In general: enhance self-assessment as a learning experience
  NB: All of the above arise with little or no practice with CBM. The following do require practice:
  • More searching diagnostic data
  • More valid and reliable assessment data
  (But NB with CBM you have conventional assessment data too.)
