Presentation Transcript


  1. Institute of Psychological Sciences FACULTY OF MEDICINE AND HEALTH 4th Biennial Psychology Learning & Teaching Conference Assessors’ thinking: what goes through their minds when they mark undergraduate psychology essays? Dr. Siobhan Hugh-Jones Dr. Mitch Waterman Inga Wilson A UoL TQEF/Raising Professional Standards project

  2. Introduction: Assessment in HE • Marking criteria, marking scales and assessment grids are common features of assessment practice in HE. • Validity, reliability, usability and meaningfulness are often disappointing for staff and students1. • There may be different use of / interpretation of criteria2, or incompleteness of criteria, as assessors often mark for features not specified in criteria3. • Many argue that poor validity / reliability of marking criteria stems from the influence of tacit knowledge on assessors4.

  3. Tacit knowledge • Recent conceptualisations 5 • Tacit knowledge is to some extent accessible, but “difficult to formalise, not readily available to consciousness, and influences behaviour in ways that are not mediated by explicit knowledge” (Elander, 2004, p.117). • Personal to assessors and not in marking criteria6. • Develops through experiential learning and routinisation of professional activities into mental models, beliefs and perspectives deeply rooted in action7. • Crucial to align criteria (implicit and explicit), grades and feedback for meaningful assessment8.

  4. Previous Work • With few exceptions9, most work in this area uses retrospective reports or the study of static materials10. • But we still do not understand the actual process of marking11 – what goes on in the minds of assessors when they mark? • Exploring a methodology: verbal (think aloud) protocol12 • Is it possible to externalise one’s tacit knowledge (‘How am I doing this?’) without altering it? • Verbalisations are disconnected and incomplete.

  5. Our Aims • Phase 1: explore the nature of the marking process and use this to inform recruitment for Phase 2. • Phase 2: • pilot a verbal protocol approach to assess the conscious accessibility of influences on assessment • explore the nature and role of these influences (i.e. explicit and tacit knowledge) • examine the alignment between feedback, the mark awarded, and marking criteria.

  6. Method: Phase 1 • Web-based questionnaire to staff in the Institute of Psychological Sciences exploring: • reported confidence and competence in different forms of assessment • usability of marking criteria • ways of marking different assessments • drivers in providing feedback • confidence in mapping of marks and feedback to marking criteria. • 20 staff completed questionnaire (7 males, 13 females): 6 novice (<2 yrs), 4 competent (2-5 yrs), 10 experienced (5 yrs+).

  7. Results: Phase 1 • No differences between experienced and novice markers in the reported usability of marking criteria. • Praise in feedback was reported to be more important by experienced markers, but they were less concerned with referring to criteria AND with expanding knowledge, whereas novices were more neutral about these. • Experienced markers annotate, award a mark and then provide final feedback. Novices deliver all feedback THEN award the mark.

  8. Method: Phase 2 • 8 academic staff • 2 novice females, 2 novice males (1 subject expert) • 2 experienced females (both subject experts), 2 experienced males. • Completed a verbal protocol whilst marking a Level 1 UG essay: ‘What is meant by personality disorder and do psychologists have a good understanding of it?’ • Essay in original format, blind to name & original mark awarded; staff would typically have marked this essay as part of Level 1. • Instructed to articulate every thought occurring to them during the process, and to refer to feedback given. • Reported that the task represented their usual marking style ‘quite well’.

  9. Exercise – verbal protocol • Pair up – allocate as A and B. • An extract will be presented which derives from towards the end of the essay that we used in the study. • A - please read the following extract. As you read it, try to verbalise to your neighbour every thought that comes to mind. • B - please then try to code the nature of A’s comments (e.g. are they about content, or presentation, etc.).

  10. Exercise – verbal protocol • Although there is little empirical research for treating personality disorders, psychologists are increasing their understanding of personality disorder, and developing new techniques for treating such disorders, for example, Linehan’s dialectical behaviour therapy, which is used to treat borderline personality disorders. This therapy combines cognitive-behavioural effects with interpersonal psychodynamic techniques (Nolan, 2004). It has been proved to increase functioning and has reduced suicidal behaviour in all borderline patients.

  11. Results: Phase 2 • Length of recordings varied from 3 min. to 33 min. • Digital recordings transcribed verbatim (2 had inaudible sections) • Analysis involved line-by-line coding by three researchers through which meaning units were identified: • Meaning unit = section of text that was deemed to have a discrete, discernible meaning or a discrete referent • e.g. “There doesn’t seem to be very much actual analysis” / “It also sort of lacks flow between the different sections” • A coding framework was developed which captured the referents or processes in each meaning unit. Final coding framework was re-applied to all 8 transcripts.

  12. Coding Framework It emerged that the data could be coded as: • Content (+ve / -ve) • Presentation (+ve / -ve) • Inferring something about the student (e.g. thought, reading) (+ve / -ve) • The assessors’ approach / intention in marking (e.g. to be fair/nice) • Reference to personal models (i.e. implication that assessor has an ideal answer in mind, or ideal way to produce essays) • Reflective processes (e.g. managing attention, suspending judgement) • Feedback triggers (e.g. for specific or general feedback) • Model building for the award of a mark (what increases / decreases it)
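To make the coding step concrete, the sketch below shows one way coded meaning units could be tallied for a single transcript once the final framework has been applied. It is an illustrative assumption only: the code labels and example units are hypothetical and are not taken from the study data.

```python
# Illustrative sketch (not the study's actual analysis pipeline): tallying coded
# meaning units for one transcript. Code labels and example data are hypothetical.
from collections import Counter

# One transcript reduced to (code, valence) pairs by the coders
transcript_codes = [
    ("content", "-ve"), ("content", "+ve"), ("presentation", "-ve"),
    ("personal_model", None), ("reflective_process", None),
    ("feedback_trigger", None), ("content", "-ve"),
]

code_counts = Counter(code for code, _ in transcript_codes)
valence_counts = Counter(v for _, v in transcript_codes if v is not None)

print(code_counts)     # e.g. Counter({'content': 3, 'presentation': 1, ...})
print(valence_counts)  # e.g. Counter({'-ve': 3, '+ve': 1})
```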

  13. Results: Verbal Protocol • All participants could verbalise their thinking; fewer verbalisations of why particular mark was awarded. • Number of +ve comments ranged from 4 to 20. Number of -ve comments ranged from 9 to 45; novice markers tended to make more -ve comments. • Ratio of content : presentation codes was significantly higher for subject experts (i.e. they made more comments about content) (t(6) = 5.52, p < 0.001). • Expert markers awarded a significantly higher mark (62.5 vs 58.25) than did novices (p = 0.05).
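As a rough illustration of the comparisons reported above, the sketch below runs an independent-samples t-test (df = 4 + 4 − 2 = 6) on a content : presentation ratio and on marks awarded. It assumes Python with SciPy, and every number is an invented placeholder, not the study's data.

```python
# Hedged illustration only: the kind of expert vs non-expert comparison reported
# on this slide. All values are hypothetical placeholders, not the study's data.
from scipy import stats

# Ratio of content-coded to presentation-coded meaning units per marker
expert_ratios     = [3.1, 2.8, 3.5, 2.9]   # 4 subject experts
non_expert_ratios = [1.2, 1.5, 1.1, 1.4]   # 4 non-experts

t, p = stats.ttest_ind(expert_ratios, non_expert_ratios)  # df = 4 + 4 - 2 = 6
print(f"ratio comparison: t(6) = {t:.2f}, p = {p:.3f}")

# Marks awarded, compared the same way
expert_marks = [62, 63, 62, 63]
novice_marks = [58, 59, 58, 58]
t, p = stats.ttest_ind(expert_marks, novice_marks)
print(f"mark comparison: t(6) = {t:.2f}, p = {p:.3f}")
```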

  14. Results: Verbal Protocol • No differences between participants in the number of reflective processes, feedback triggers or references to personal models. • Half the participants (3 experienced, 1 novice) attended to features not in the marking criteria: use of references, quality of introduction and conclusions, and how interesting the work was. • Markers appeared to find marking sections of description difficult, and usually viewed it negatively, despite the essay title demanding such description. They queried the extent to which such description was the student’s own work.

  15. Results: Verbal Protocol • One notable finding was the extent to which markers shifted their impression of the essay’s quality (-ve to +ve) in response to a single positive element, or during final reviewing. • Suggests a fragility in the mental model that markers derive of the essay’s quality, or a differential weighting of criteria. • Alternatively, we have to consider the possibility that the verbal protocol does not capture the true complexity of the experience, or itself intrudes into the normal experience.

  16. Results: Feedback • Developed coding framework for in-text and coversheet feedback. • Codes were: • Directive (Use a subheading here / Cite more references) • Suggestive (Try to use more journal articles) • Negative (This isn’t clear) • Praise (Good point) • Challenging (Is this really the case?) • Explaining (Maybe this is due to the therapist?) • Ambiguous (underlining something) • General Encouragement (Good effort)

  17. Results: Feedback • There was a significant negative correlation between the number of feedback triggers and the mark awarded (r = -0.641, N = 8, p = 0.043). • There was a significant positive correlation between very informative suggestions and the mark awarded (r = 0.832, N = 8, p = 0.01). • There were no statistically significant differences in the number or type of feedback comments given between experienced and novice markers.
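The correlations on this slide are ordinary Pearson coefficients computed over the 8 markers; the sketch below shows how such a figure could be produced. It assumes SciPy, and the trigger counts and marks are hypothetical placeholders rather than the study data.

```python
# Hedged illustration: Pearson correlation between feedback triggers and mark
# awarded across 8 markers. Values are hypothetical placeholders.
from scipy import stats

feedback_triggers = [12, 9, 15, 7, 10, 14, 6, 11]   # triggers counted per marker
marks_awarded     = [58, 62, 55, 65, 60, 56, 64, 59]

r, p = stats.pearsonr(feedback_triggers, marks_awarded)
print(f"r = {r:.3f}, N = {len(marks_awarded)}, p = {p:.3f}")
```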

  18. Results: Feedback • Females gave significantly more challenging comments than males (t(6) = 4.629, p = 0.004). • Females gave more negative comments than did males, and this approached statistical significance (t(6) = 4.42, p = 0.052).

  19. Discussion • Understandably, subject experts made more comments about content. • Most assessors’ comments could be aligned with published criteria; however, more experienced markers seemed to rely less on these, and 4 of the 8 assessors referred to features not in the criteria. • We were surprised not to find more differences between novice and more experienced markers in terms of the elements attended to (some evidence suggests experts attend to fewer cues as they have greater cue sensitivity)13.

  20. Discussion • Despite similarities in what attracted the markers’ attention, marks ranged from 55 to 65, with experienced markers awarding higher marks. So, what happens at the final ‘delayed judgement’ stage to account for this?14 • Novices may find it more tolerable to be seen as a ‘hard marker’. • Novices appeared to use the process of giving feedback to arrive at a mark, but experienced markers seemed less reliant upon this. Suggests a difference in evaluative processes, with experts perhaps dynamically updating their evaluations as they read the essay. • Does this reflect a more developed personal model of content, or structure, or both?

  21. Final Comments • With some modifications, the verbal protocol approach is viable as a means of accessing the thoughts that occur to markers, for many different assessment formats. • There seems to be some alignment between criteria, feedback and noticed features. But these do not seem to align so well with the marks awarded. Is this a calibration problem? • We are continuing this work, in five different Schools in the University of Leeds, and across different forms of assessment in a larger project.

  22. References • 1. O’Donovan et al. (2004); Read et al. (2005); Webster et al. (2000) • 2. Elander & Hardman (2002) • 3. Newstead & Dennis (1994); Norton et al. (1999) • 4. Baird (2000); Elander (2004); Greatorex (2002); Holroyd (2000); Lea and Street (2000) • 5. Molander (1992) • 6. Higgins et al. (2001) • 7. Eraut (2000) • 8. Elander (2004); O’Donovan et al. (2004); Price (2005) • 9. Elander & Hardman (2002); Greatorex (2001) • 10. Ecclestone (2001); Price (2005); Yorke, Bridges and Woolf (2000); Webster et al. (2000) • 11. Yorke et al. (2000) • 12. Ericsson and Simon (1980, 1998) • 13. Lefevre & Lories (2004) • 14. Lefevre & Lories (2004)

  23. Reference List • Baird, J. (2000). Are examination standards all in the head? Experiments with examiners’ judgements of standards in A level examinations. Research in Education, 64(2), 91-100. • Ecclestone, K. (2001). ‘I know a 2:1 when I see it’: understanding criteria for degree classifications in franchised university programmes. Journal of Further and Higher Education, 25(3), 301-313. • Elander, J. (2004). Student assessment from a psychological perspective. Psychology Learning and Teaching, 3(2), 114-121. • Elander, J. and Hardman, D. (2002). An application of judgement analysis to examination marking in psychology. British Journal of Psychology, 93, 303-328. • Eraut, M. (2000). Non-formal learning and tacit knowledge in professional work. British Journal of Educational Psychology, 70, 113-136. • Ericsson, K.A. and Simon, H. (1998). How to study thinking in everyday life. Mind, Culture and Activity, 5(3), 178-186. • Ericsson, K.A. and Simon, H. (1980). Verbal reports as data. Psychological Review, 87, 215-251.

  24. Reference List • Greatorex, J. (2002). Making accounting examiners’ tacit knowledge more explicit: developing grade descriptors for an Accounting A-level. Research Papers in Education, 17(2), 211-226. • Greatorex, J. (2001). Making the grade - How question choice and type affect the development of grade descriptors. Educational Studies, 27(4), 451-464. • Higgins, R., Hartley, P. and Skelton, A. (2001). Getting the message across: the problem of communicating assessment feedback. Teaching in Higher Education, 6(2), 269-274. • Holroyd, C. (2000). Are assessors professional? Active Learning in Higher Education, 1(1), 28-44. • Lea, M.R. and Street, B.V. (2000). Student writing and staff feedback in higher education: An academic literacies approach. In M. Lea and B. Stierer (Eds.), Student Writing in Higher Education (pp. 32-46). Buckingham, England: Open University Press. • Lefevre, N. and Lories, G. (2004). Text cohesion and metacomprehension: immediate and delayed judgements. Memory and Cognition, 32(8), 1238-1254. • Molander, B. (1992). Tacit knowledge and silenced knowledge: fundamental problems and controversies. In B. Goranzon and M. Florin (Eds.), Skills and Education (pp. 9-31). NY: Springer-Verlag.

  25. Reference List • Newstead, S. and Dennis, I. (1994). Examiners examined. The Psychologist, 7(5), 216-219. • Norton, L., Brunas-Wagstaff, J. and Lockley, S. (1999). Learning outcomes in the traditional coursework essay: do students and tutors agree? In C. Rust (Ed.), Improving Student Learning: Improving Student Learning Outcomes (pp. 240-248). Oxford, England: The Oxford Centre for Staff and Learning Development. • O’Donovan, B., Price, M. and Rust, C. (2004). Know what I mean? Enhancing student understanding of assessment standards and criteria. Teaching in Higher Education, 9(3), 325-334. • Price, M. (2005). Assessment standards: the role of communities of practice and the scholarship of assessment. Assessment and Evaluation in Higher Education, 30(3), 215-230. • Read, B., Francis, B. and Robson, J. (2005). Gender, ‘bias’, assessment and feedback: analyzing the written assessment of undergraduate history essays. Assessment and Evaluation in Higher Education, 30(3), 241-260. • Webster, F., Pepper, D. and Jenkins, A. (2000). Assessing the undergraduate dissertation. Assessment and Evaluation in Higher Education, 25(1), 71-80. • Yorke, M., Bridges, P. and Woolf, H. (2000). Mark distributions and marking practices in UK higher education. Active Learning in Higher Education, 1(1), 7-27.
