  1. DEAP: Diagrammatic Electronic Assessment Project Kevin Waugh, Neil Smith, Pete Thomas Department of Computing The Open University

  2. Toward the automated assessment of ERDs

  3. The investigators • Diagram Understanding • Neil Smith • Natural Language Processing • Kevin Waugh • Assessment, Teaching and Learning • Pete Thomas

  4. What is a diagram? • A picture isn't

  5. What is a diagram? • Free and structured text aren't. Two excerpts make the point: the shaped "Mouse's Tale" from Alice in Wonderland (structured text laid out as a picture) and a passage of free prose.
     "It _is_ a long tail, certainly," said Alice, looking down with wonder at the Mouse's tail; "but why do you call it sad?" And she kept on puzzling about it while the Mouse was speaking, so that her idea of the tale was something like this: "Fury said to a mouse, That he met in the house, 'Let us both go to law: _I_ will prosecute _you_. Come, I'll take no denial: We must have the trial; For really this morning I've nothing to do.' Said the mouse to the cur, ..."
     Graham Joyce was sitting in one of the sunloungers. He leaned forward and gave Tim a firm handshake. 'Tim, greetings and salutations.' For a man in his eighties he retained a remarkably vigorous air, possessing a gaunt face that genoprotein treatments had never quite managed to soften and a shock of unruly snow-white hair. His voice was like a forceful foghorn.

  6. These are diagrams….

  7. and these are diagrams….

  8. Traditional take on diagrams • Treated as formal "visual" languages • so, they're expected to be parsable • grammatical, correct and complete • But real diagrams aren't formal • they're not always grammatical • they're often incomplete, often incorrect • (we use the term imprecise) • they are not always parsable • (especially when drawn by students!)

  9. Interesting question: What if we treat diagrams in the same way that we treat text?

  10. Text and diagram – a simple correspondence • Characters/punctuation – segments • Words – features • Phrases – "minimal meaningful units" • Sentences – mmu aggregations
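
The correspondence can be made concrete as a small data model. The Python sketch below is purely illustrative: the class names and fields are our assumptions, not the project's code.

    # Illustration of the text/diagram correspondence above: segments group into
    # features (the "words"), features into minimal meaningful units (the
    # "phrases"), and MMUs aggregate into a diagram (the "sentence").
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Segment:
        """Analogue of a character: a primitive graphic element."""
        kind: str                          # e.g. "line", "box-edge", "text"
        points: List[Tuple[float, float]]  # coordinates making up the element

    @dataclass
    class Feature:
        """Analogue of a word: a recognisable shape such as a box, connector or label."""
        kind: str                          # e.g. "entity-box", "connector", "label"
        segments: List[Segment] = field(default_factory=list)

    @dataclass
    class MinimalMeaningfulUnit:
        """Analogue of a phrase: the smallest interpretable grouping, e.g. two
        entity boxes joined by a named connector."""
        features: List[Feature] = field(default_factory=list)

    @dataclass
    class Diagram:
        """Analogue of a sentence: an aggregation of minimal meaningful units."""
        mmus: List[MinimalMeaningfulUnit] = field(default_factory=list)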

  11. Natural language • A grammar is an approximation to actual language use; do we even need a grammar? • Pragmatic rather than correct/complete • Sub-languages • specific grammars for specific domains • stylistic conventions • novels • instruction manuals • interpretation is domain specific • no "universal" solution

  12. Research question: If we attempt to process diagrams in ways comparable to the ways we process formal, natural and sub-language texts (bag-of-words, syntactic, semantic and statistical analysis), can we do useful things with diagrams? Things such as automated assessment?

  13. Automated assessment

  14. Automated assessment • Coursework and Examinations • Self-assessment and revision support • Grade + automated feedback • grading alone is not sufficient • directed, appropriate, focused feedback is a requirement • (multiple choice - not our concern)

  15. Successful automated assessment: • Textual assessment (essays and short texts) • bag-of-words • bag-of-phrases • sequences (ordered-bag-of-words/phrases) • syntactic structure • abstracting and comparison (semantic-syntactic) • semantic analysis • Diagram assessment • restricted choice and "slot filling" • multiple choice • "Free" diagram assessment has not been successfully achieved
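
As a reminder of what the simplest of these text techniques involves, here is a minimal bag-of-words overlap scorer in Python. The tokenisation and scoring rule are simplifying assumptions for illustration, not the algorithm of any particular marking system mentioned above.

    # Illustration only: score a short answer by bag-of-words overlap with a model answer.
    from collections import Counter
    import re

    def bag_of_words(text):
        """Lower-case the text and count its word tokens."""
        return Counter(re.findall(r"[a-z']+", text.lower()))

    def overlap_score(student, model):
        """Fraction of the model answer's word occurrences found in the student answer."""
        s, m = bag_of_words(student), bag_of_words(model)
        matched = sum(min(s[w], m[w]) for w in m)
        return matched / max(1, sum(m.values()))

    print(overlap_score("an entity has attributes and a primary key",
                        "each entity type has attributes, one of them the primary key"))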

  16. What if we assess diagrams the same way that we assess text? • What are the diagram assessment equivalents to • bag-of-words • bag-of-phrases • sequences • abstracting and comparison • syntactic structure • semantic analysis • Can we achieve automated assessment of diagrams comparable to that achieved by a human marker? • Can we provide focused feedback comparable to a human tutor?
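
One plausible diagram equivalent of bag-of-words is a "bag of features": flatten the diagram into comparable tokens (entities, relationship descriptions) and measure overlap with the sample solution. The sketch below illustrates the idea; the feature encoding and the example entity and relationship names are our assumptions, not the project's representation.

    # A possible "bag of features" comparison for ERDs, by analogy with bag-of-words.
    def erd_features(entities, relationships):
        """Flatten an ERD into a set of comparable feature tokens."""
        features = {("entity", e) for e in entities}
        for name, ends, degree, participation in relationships:
            features.add(("relationship", name, ends, degree, participation))
        return features

    def similarity(student, model):
        """Proportion of the model diagram's features present in the student diagram."""
        return len(student & model) / len(model)

    # Hypothetical example: entity and relationship details invented for illustration.
    model_erd = erd_features(
        ["Member", "Book", "Copy"],
        [("Borrows", ("Member", "Copy"), "1:m", ("optional", "mandatory"))])
    student_erd = erd_features(
        ["Member", "Book", "Copy"],
        [("Borrows", ("Member", "Copy"), "m:n", ("optional", "mandatory"))])
    print(similarity(student_erd, model_erd))  # 0.75: entities match, the relationship does not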

  17. Our initial experiment with ERDs

  18. Feasibility experiment: pipelines • Approach: comparable to bag-of-words • Results (13 answers) • Human: Mean 2.78/5 StdDev 1.05 • Tool: Mean 2.73/5 StdDev 1.09 • Pearson correlation coefficient 0.75 (significant at the 0.01 level, two-tailed), N=13

  19. Why entity relationship diagrams? • Scope: right/wrong – interpretable • Range: small – large • Range: simple – complex • Correctness: notation – meaning • Format of question, sample solution, marking guide (and familiarity) • Interesting aggregations – m:n decomposition, relationship signatures, sub-typing ...

  20. The question: Give an E-R diagram that corresponds to the relational model given. [25]

    model BookGroup
    relation Member
        Number: MemberNumbers
        Name: PeopleNames
        Address: Addresses
        IntroducedBy: MemberNumbers
        BorrowedBook: ISBNs
        BorrowedCopy: CopyNumbers
    primary key Number
    alternate key (BorrowedBook, BorrowedCopy)
    allowed null {relationship Introduces}
    foreign key IntroducedBy references Member
    not allowed null {relationship Borrows}
    foreign key (BorrowedBook, BorrowedCopy) references Copy
    … <several relations omitted>

  21. Solution and marking scheme
    Marking scheme:
    • 1 mark for all three entities (zero if more or fewer than three are shown)
    • 6 marks for each relationship (6 × 4 = 24 marks), broken down as:
      • 1 mark for the naming used in the relational model comments
      • 1 mark for the relationship being between the right entity types
      • 2 marks for the degree (1:1 or 1:m as per the sample solution figure; zero marks if incorrect)
      • 1 mark for each participation condition correctly shown
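
The scheme is mechanical enough to sketch in code. The fragment below is a minimal illustration, assuming a hypothetical dictionary representation of an extracted diagram (entities, plus relationships recording their entity ends, degree and participation conditions); it is not the project's marking tool.

    # Sketch only: apply the 1 + 4 x 6 = 25 mark scheme above to an extracted student ERD.
    def mark_erd(student, solution):
        """Return a mark out of 25 for a student ERD against the sample solution."""
        mark = 0
        # 1 mark for exactly the three expected entities, otherwise zero.
        if set(student["entities"]) == set(solution["entities"]):
            mark += 1
        # 6 marks per relationship in the sample solution (4 relationships = 24 marks).
        for name, sol in solution["relationships"].items():
            ans = student["relationships"].get(name)
            if ans is None:
                continue
            mark += 1  # 1 mark: named as in the relational model comments
            if set(ans["entities"]) == set(sol["entities"]):
                mark += 1  # 1 mark: between the right entity types
            if ans["degree"] == sol["degree"]:
                mark += 2  # 2 marks: degree (1:1 or 1:m), zero if incorrect
            # 1 mark for each participation condition correctly shown
            mark += sum(a == s for a, s in zip(ans["participation"], sol["participation"]))
        return mark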

  22. On the risks of using a drawing tool: • Slot filling? • Prompting? • No segmentation or feature extraction? • Drawing "correct" diagrams because tool enforces correctness?

  23. First results • 21 human marked answers (max. mark 25) • Human: Mean 21.29 StdDev 3.757 • Tool: Mean 22.24 StdDev 2.508 • Spearman rho correlation coefficient: 0.95 (significant at the 0.01 level, two-tailed), N=21 • Pearson correlation coefficient: 0.92 (significant at the 0.01 level, two-tailed), N=21
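
For reference, the agreement statistics quoted above are standard correlation coefficients and can be reproduced from paired human/tool marks, for example with scipy; the mark lists below are placeholders, not the study's data.

    # Compute Pearson and Spearman correlations between human and tool marks.
    from scipy.stats import pearsonr, spearmanr

    human_marks = [23, 19, 25, 21, 17, 24, 22]   # hypothetical human marks out of 25
    tool_marks  = [24, 20, 25, 22, 18, 23, 22]   # hypothetical tool marks for the same answers

    r, p_r = pearsonr(human_marks, tool_marks)
    rho, p_rho = spearmanr(human_marks, tool_marks)
    print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Spearman rho = {rho:.2f} (p = {p_rho:.3f})")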

  24. Simplistic? Yes, but ... • First step in our assessment of diagrams as text • comparable to bag-of-phrases processing • the pipeline experiment was bag-of-words • Essentially uses the same algorithm as the marking of short-answer texts • Gives us a baseline when investigating the addition of aggregation etc. • We are also aware of ... • the need to investigate how to express complex marking schemes (if we need them) • the above assessment is not dependent on aggregation or interpretation

  25. Where next • Take what we have, add feedback and we have a revision support tool • More complex marking schemes, including alternative solutions • Include aggregation and abstraction • ERD questions with scope for interpretation – scenario-based rather than translation-based

  26. DEAP: Diagrammatic Electronic Assessment Project • Thank you
