Dr. Mark Gierl, Professor and Canada Research Chair Centre for Research in Applied Measurement and Evaluation Universit

“How You Can Learn To Love Large-Scale Assessment: Let Me Count the Ways” An Outline For Our Future At The University of Alberta Dr. Mark Gierl, Professor and Canada Research Chair Centre for Research in Applied Measurement and Evaluation University of Alberta Presentation at the Centre for Teaching and Learning (CTL) “Teaching Big” Symposium University of Alberta—August, 2012

TO BEGIN… • Educational measurement is a discipline and a profession focused on the use of methodologies for assigning test scores to examinees, typically on a numeric scale, so we can make inferences about their knowledge, skills, and competencies • Once a static and largely quantitatively-driven field, educational measurement is being profoundly changed by recent developments in the learning sciences, mathematical statistics, computer technology, educational psychology, and computing science; as a result, our contemporary assessments barely resemble their predecessors of a decade ago

OVERVIEW • BACKGROUND • Measurement, Evaluation, and Cognition (MEC) Program in the Department of Educational Psychology • Centre for Research in Applied Measurement and Evaluation (CRAME) • PRESENTATION • Four principles of testing in large classrooms • Two applications for putting principles into practice • Plea for our collective future • My presentation today will have four key messages

OVERVIEW • Measurement, Evaluation, and Cognition (MEC) is 1 of 8 areas in the Department of Educational Psychology • Graduate students (16 currently) who receive an MEd or PhD in MEC specialize in educational measurement, statistics, research methods, cognition applied to assessment, and/or program evaluation • Our graduates work in the private sector at testing companies like the Educational Testing Service (ETS) or in the public sector for different agencies (e.g., Alberta Education; Medical Council of Canada) • MEC has five faculty members: Drs. Mark Gierl, Jacqueline Leighton, Ying Cui, Cheryl Poth, and Sharla King • The Centre for Research in Applied Measurement and Evaluation (CRAME) is a centre within MEC focused on conducting research in the areas of educational measurement, cognitive psychology, and statistics with the goal of making assessment an integral part of learning and instruction

OVERVIEW MESSAGE #1: Educational measurement is a specialized discipline where you can earn a graduate degree at both the MEd and PhD levels—this indicates that testing is embedded in a discipline that requires rigorous and comprehensive training MESSAGE #2: You have colleagues at the University of Alberta who actually love to talk about tests and who train graduate students who also like and excel in our discipline [resources exist on campus]

“TESTING TIPS BY MARK”

OUR FOUR PRINCIPLES PRINCIPLE #1: We will shift from infrequent summative assessments (e.g., 2 midterms + final) to more frequent formative assessment (e.g., 8-10 exams or more per term) PRINCIPLE #2: Testing on-demand is required where students can write exams at any time and at any location PRINCIPLE #3: Assessments will be scored immediately and students will receive both instant and detailed feedback on their overall performance as well as their problem-solving strengths and weaknesses PRINCIPLE #4: You will spend less time and less effort implementing these principles in your large classes compared to the amount of time you currently spend on assessment-related activities—in fact, much less

COMPUTER-BASED TESTING APPLICATION #1: COMPUTER-BASED TESTING

PAPER-BASED TESTING • Test Development • Test Administration • Test Reporting

COMPUTER-BASED TESTING

COMPUTER-BASED TESTING AUTOMATED

COMPUTER-BASED TESTING

COMPUTER-BASED TESTING • In short, computer-based testing is a very good thing and it is here to stay: it either eliminates or automates 2/3 of the testing activities that you currently do manually • Admittedly, we are focusing on examples that use objectively-scored assessment items, but examples can also be cited for automated essay scoring of student-produced assessment tasks • The architecture for a computer-based testing system is feasible [PAPER-BASED TESTING IS DEAD] MESSAGE #3: The University of Alberta needs a computer-based testing system because YOU need this system for all of your classes, big and small

COMPUTER-BASED TESTING • Test Development • Test Administration *ELIMINATED* • Test Reporting *AUTOMATED*

AUTOMATIC ITEM GENERATION APPLICATION #2: AUTOMATIC ITEM GENERATION

ONE WAY TO CREATE TEST ITEMS… Professor writing test items the day before the midterm exam…

AUTOMATIC ITEM GENERATION • Another way to address this item development challenge is with automatic item generation (AIG) • Automatic item generation is the process of using item models to generate test items with the aid of computer technology; with this approach, hundreds or even thousands of items can be generated with a single item model • While the idea of automatic item generation may be viewed as a “dream come true,” I am here to tell you that the dream is well within our reach because of developments in modern educational measurement theory

A 54-year-old woman has a laparoscopic cholecystectomy. On post-operative day 3 she has a temperature of 38.5 °C. Physical examination reveals a red and tender wound and calf tenderness. Which one of the following is the best next step?
• a. Mobilize
• b. Antibiotics
• c. Anticoagulation
• d. Reopen the wound

AUTOMATIC ITEM GENERATION

AUTOMATIC ITEM GENERATION • That ugly diagram is a cognitive model highlighting the knowledge, skills, and content required to make a medical diagnosis • The model includes three key outcomes: 1. Identify the problem (i.e., post-operative fever); 2. Specify the sources of information required to diagnose the problem (e.g., type of surgery); and 3. Describe the key features within each information source (e.g., guarding and rebound) needed to create different instances of the problem

AUTOMATIC ITEM GENERATION

AUTOMATIC ITEM GENERATION • Next, an item model is created, where an item model is like a template or a mould of the assessment task (i.e., it’s a target where we want to place the content in the test item)

A 54-year-old woman has a <TYPE OF SURGERY>. On post-operative day <TIMING OF FEVER> the patient has a temperature of 38.5 °C. Physical examination reveals <PHYSICAL EXAMINATION>. Which one of the following is the best next step?

AUTOMATIC ITEM GENERATION • Finally, we combine this information systematically to produce new items • To accomplish this complex combinatorial task, we created software for item generation called IGOR (Item GeneratOR) • IGOR was programmed in Java
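
The combination step can be illustrated with a minimal sketch. This is not IGOR (which was written in Java); it is a hypothetical Python illustration that assumes each placeholder in the item model has a short list of allowed values and that every combination is valid. The placeholder names and most of the element values are invented for illustration, and a real item model would also need constraints to rule out clinically implausible combinations.

```python
from itertools import product

# Item model (stem template) adapted from the slide above.
# The placeholder names {surgery}, {day}, {exam} are hypothetical.
STEM = (
    "A 54-year-old woman has a {surgery}. On post-operative day {day} "
    "the patient has a temperature of 38.5 °C. Physical examination "
    "reveals {exam}. Which one of the following is the best next step?"
)

# Illustrative values only; in practice these would come from the
# cognitive model's information sources and key features.
ELEMENTS = {
    "surgery": ["laparoscopic cholecystectomy", "right hemicolectomy"],
    "day": ["3", "6"],
    "exam": ["a red and tender wound", "calf tenderness"],
}


def generate_items(stem, elements):
    """Return one stem per combination of element values (Cartesian product)."""
    names = list(elements)
    items = []
    for values in product(*(elements[name] for name in names)):
        items.append(stem.format(**dict(zip(names, values))))
    return items


items = generate_items(STEM, ELEMENTS)
print(len(items))  # 2 * 2 * 2 = 8 generated stems
print(items[0])
```

The number of generated items is the product of the number of values per placeholder, which is why a single item model with a handful of elements can yield hundreds or thousands of items.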

AUTOMATIC ITEM GENERATION • When we used our method with 5 different item models developed for the MCC QE Part I in surgery, more than 20,000 items were generated:
• Item Model 1 (Gallstones): 288
• Item Model 2 (Hernias): 256
• Item Model 3 (Aneurysm): 5,184
• Item Model 4 (Post Operation Management): 7,488
• Item Model 5 (Post Operation Fever): 7,680
• We have also developed item models at the K-12 level in Language Arts, Social Studies, Science, and Math, as well as in AP Biology and Architecture, in addition to 10 different content areas in Medicine, producing millions of test items
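
As a quick arithmetic check on the "more than 20,000" figure, the five per-model counts can be summed. The counts are those listed above; the dictionary name is an assumption made for illustration.

```python
# Items generated per surgery item model, as reported on the slide.
generated = {
    "Gallstones": 288,
    "Hernias": 256,
    "Aneurysm": 5184,
    "Post Operation Management": 7488,
    "Post Operation Fever": 7680,
}

total = sum(generated.values())
print(total)  # 20896
```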

AUTOMATIC ITEM GENERATION

16. A 60-year-old woman has been booked for a laparoscopic cholecystectomy for symptomatic gallstones. Prior to her surgery, she presents to the Emergency Department with a history of feeling faint and unwell. She has had rigors. On physical examination, her temperature is 40 °C. Her white blood count is 22 × 10⁹/L; aspartate aminotransferase 63 U/L; alanine aminotransferase 78 U/L; alkaline phosphatase 450 U/L; amylase level 200 U/L; and bilirubin 50 µmol/L. Which one of the following is the most likely diagnosis?
(a) Cholecystitis.
(b) Cholangitis.
(c) Pancreatitis.
(d) Hepatic abscess.
(e) Duodenal ulcer.

39. An obese 61-year-old male collapsed with sudden pain at a shopping center and is brought to hospital by ambulance. He is diaphoretic. His pulse is 96/minute; blood pressure 100/70 mm Hg; he complains of severe pain in his abdomen and left flank. Which one of the following is the most likely diagnosis?
(a) Acute hemorrhagic pancreatitis.
(b) Ruptured aortic aneurysm.
(c) Mesenteric vascular occlusion.
(d) Acute diverticulitis.
(e) Volvulus of sigmoid colon.

CONCLUSION • Educational measurement is a specialized discipline requiring advanced graduate training; this implies that assessment contains many complex and thorny issues, but please remember that you have colleagues on campus who can help you deal with these issues • Our discipline is undergoing profound changes that will yield much better methods for evaluating students while at the same time requiring less time and effort from the examiner, because much of the unpleasant work is being automated; computer-based testing and automatic item generation are but two examples from a list of many MESSAGE #4: There is no going back to the “good old days”…therefore, we must work together to structure our future at the University of Alberta by building and implementing these new assessment systems…but also recognize that this work is just getting started

THANK YOU Dr. Mark J. Gierl (mark.gierl@ualberta.ca) 6-110 Education Centre North
