
Introduction to Test Development

Introduction to Test Development. Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD. Educational Research Methods Program. Learning Objectives: Understand the pros and cons of various question types for written examinations. Learn how to determine item difficulty and item discrimination.

Presentation Transcript


  1. Introduction to Test Development Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD Educational Research Methods Program

  2. Learning Objectives • Understand the pros and cons of various question types for written examinations • Learn how to determine • Item difficulty • Item discrimination • Understand the psychometrics of a high-stakes test • Validity • Reliability • Standard setting

  3. Come to our Workshop! • Work in small groups to… • Review problematic multiple choice items • Establish validity and reliability for a test • Participate in standard setting exercise

  4. Question Types – Pros and Cons • Essay Items • Short Answer and Completion Items • Matching Items • True-False and Multiple-Choice Tests • Interviews • Portfolios ….all can be scored and can be subject to test development

  5. Multiple-Choice Items • Parts of an item: stem, lead-in, responses (the correct response plus distractors) • An 85-year-old woman has difficulty raising her arms above her head and combing her hair. She has morning aches in her shoulders and neck. Her reflexes are symmetrical and normal. There is no muscle tenderness or joint swelling. Which one of the following laboratory tests should be obtained to confirm the most likely diagnosis? • A. Anti-nuclear antibody. • B. Erythrocyte sedimentation rate. • C. Serum concentration of creatine kinase. • D. Serum concentration of angiotensin-converting enzyme. • E. Urine microscopy.

  6. Tips for writing discriminating MCQs • Be sure that each item reflects a clearly defined learning outcome • Stem • The stem of the item should be self-contained and written in clear and precise language. • Avoid ‘trigger’ words (e.g. pin-rolling tremor) • Avoid negatives, "except" phrasing, absolutes and qualifiers in question stems. • Responses • All answers should be plausible and homogeneous • Items need to be independent of one another • Answer choices should be similar in length and grammatical form • List answer choices in alphabetical or numerical order • Avoid ‘all of the above’ as a response • Avoid technical flaws (tense or plurality for example)

  7. Pros and Cons of MCQs • Pros • Useful for measuring learning outcomes at almost any level • Easy to understand • Easy to score • Easily analyzed for effectiveness • Allow broad coverage efficiently • Cons • Good questions take a long time to write and are difficult to write • Constrain creative responses from learners • May have more than one correct answer

  8. Item Analysis • Qualitative: looks at whether the content matches the information, attitude, characteristic or behavior being assessed • Quantitative: • Item difficulty • Item discrimination

  9. Determining item difficulty • The percentage of participants who get that item correct • Item difficulty scores can range from 0 to 100% • Low value = high difficulty • High value = low difficulty [Figure: difficulty scale from 0 to 100%]
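
The definition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the function name `item_difficulty` and the sample scores are assumptions made for the example.

```python
def item_difficulty(responses):
    """Return the percentage of participants who answered one item correctly.

    `responses` holds one score per participant for a single item:
    1 for a correct answer, 0 for an incorrect one.
    """
    if not responses:
        raise ValueError("need at least one response")
    return 100.0 * sum(responses) / len(responses)


# Hypothetical data: 10 participants, 7 of whom answered correctly.
scores = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
print(item_difficulty(scores))  # 70.0 (a fairly easy item)
```

A value near 100 means almost everyone answered correctly (low difficulty); a value near 0 means almost no one did (high difficulty).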

  10. Discrimination Index • The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly. • Index of discrimination: • The proportion of the top-scoring group who answered the item correctly minus the proportion of the bottom-scoring group who answered it correctly • Item discrimination scores can range from -1.00 to +1.00 • Example • 100 test takers: 20 of the top 25 students answered correctly, but only 5 of the lowest 25 did. • DI = (20-5)/25 = 0.6
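
The calculation can be sketched as follows. The function name `discrimination_index` is an assumption for illustration, and the sketch assumes the upper and lower groups are the same size, as in the slide's example. With the slide's counts (20 and 5 correct answers in groups of 25) the formula gives 15/25 = 0.6.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Difference between the proportions of the upper and lower groups
    that answered an item correctly.

    Assumes both extreme groups contain `group_size` test takers.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


# Counts from the slide: 20 of the top 25 correct, 5 of the lowest 25 correct.
print(discrimination_index(20, 5, 25))  # 0.6
```

A negative result means the lower group outperformed the upper group on that item, which usually signals a flawed or miskeyed question.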

  11. Item Analysis Report • The left half shows percentages, the right half counts. • The correct option is indicated in parentheses. • Point Biserial is similar to the discrimination index, but is not based on fixed upper and lower groups. For each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer. [Figure: sample item analysis report showing order, ID and group number, percentages, and counts]
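
One common way to compute a point-biserial value is sketched below. The slide does not say which formula the reporting software uses, so this is an assumption: the difference between the two group means is divided by the population standard deviation of all total scores and scaled by the square root of p(1 - p), where p is the proportion answering correctly. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between one item (1/0) and total test score."""
    if len(item_correct) != len(total_scores):
        raise ValueError("both lists need one entry per student")
    right = [t for c, t in zip(item_correct, total_scores) if c]
    wrong = [t for c, t in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect answer")
    sd = pstdev(total_scores)  # population SD of all total scores
    if sd == 0:
        raise ValueError("total scores have no variance")
    p = len(right) / len(total_scores)  # proportion answering correctly
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))


# Hypothetical data: students who got the item right also scored higher overall.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 4, 3, 2]
print(round(point_biserial(item, totals), 2))
```

Like the discrimination index, the result lies between -1 and +1, and values close to zero or below suggest the item is not separating stronger from weaker students.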

  12. Test Validity • Validity: • The extent to which inferences made from a test are appropriate, meaningful, or useful. • Does my test measure what it is intended to measure? • Content validity • Expert review • Criterion validity – Predictive/Concurrent • Scores can be related to another known metric • Construct validity • Successfully differentiates between levels of learners

  13. Kissing Cousins • A test cannot be valid until it is reliable.

  14. Test Reliability • Reliability: the test measures the underlying construct consistently = trustworthiness/stability • Test-retest reliability • Alternate forms reliability • Internal consistency reliability (Cronbach’s alpha) • Inter-rater reliability
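
Of the reliability measures listed, Cronbach's alpha is the easiest to compute directly from item scores. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the score matrix are hypothetical, and population variances are used throughout as an assumption.

```python
from statistics import pvariance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a matrix with one row per test taker
    and one column per item."""
    k = len(score_matrix[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    item_columns = list(zip(*score_matrix))
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in score_matrix])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: four test takers, three items scored 1/0.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))
```

Higher values indicate that the items tend to rank test takers in the same order, which is what internal consistency means.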

  15. How do I set a passing grade? • Standard Setting • Norm referenced: Z-scores • Number of standard deviations below the mean • Criterion referenced: Angoff Method • A panel of experts is asked to evaluate each item and estimate the fraction of minimally competent students who would answer it correctly • Ratings are averaged across the experts for each item, discussed, and then summed to get the panel's raw cut score
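
Both approaches reduce to short calculations, sketched below. The function names, the ratings, and the scores are all hypothetical, the discussion step of the Angoff method is not modeled, and the use of the population standard deviation for the norm-referenced cut is an assumption.

```python
from statistics import mean, pstdev


def angoff_cut_score(ratings):
    """Raw cut score from Angoff ratings.

    `ratings[e][i]` is expert e's estimate (0.0 to 1.0) of the fraction of
    minimally competent students who would answer item i correctly.
    """
    per_item = [mean(item) for item in zip(*ratings)]  # average across experts
    return sum(per_item)  # sum across items


def norm_referenced_cut_score(scores, sd_below_mean):
    """Cut score set a chosen number of standard deviations below the mean."""
    return mean(scores) - sd_below_mean * pstdev(scores)


# Hypothetical panel: two experts rating a three-item test.
ratings = [
    [0.5, 0.75, 1.0],
    [0.5, 0.25, 1.0],
]
print(angoff_cut_score(ratings))  # 2.0 out of 3 items

# Hypothetical class scores, with the pass mark 1 SD below the mean.
print(norm_referenced_cut_score([2, 4, 4, 4, 5, 5, 7, 9], 1))  # 3.0
```

The Angoff result depends only on the items and the experts' judgments, while the norm-referenced cut moves with each cohort's performance.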

  16. Thank you!

  17. Welcome to Our Workshop on Test Development! Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD Educational Research Methods The Academy at Harvard Medical School

  18. Outline • Learning Objectives • Creating MCQ Items • Item Template • Item Flaws • Tips for Success • Establishing Validity and Reliability for a Test • Mock Standard Setting

  19. Item Creation Learning Activities Objectives Evaluation • Consider beginning with the end in mind • What is it that you think the medical student should demonstrate that he/she knows or knows how to do? • This should be an objective from your lesson plan.

  20. Item Stems: Clinical Vignettes • Things to consider: • Patient description (46-year-old female) • Functional disability (difficulty rising from a seated position, but has no difficulty flexing her legs) • The question based on this item template: • A 46-year-old female has difficulty rising from a seated position, but has no difficulty flexing her legs. Which of the following muscles has been injured? [Objective: Identify and explain the function of the muscles in the…. ]

  21. Item Creation • Lead-in: The most likely diagnosis is • Options: disorders, diseases • Objective: Describe the signs and symptoms of X. Compare and contrast the signs and symptoms of X, Y and Z. • Lead-in: Which of the following additional symptoms would you expect to be present? • Options: symptoms • Objective: same as above • Lead-in: The most likely cause is • Options: bacteria, toxins, medications, metabolic defects • Objective: List and explain the causes of X. • Lead-in: The most likely mechanism is • Options: disease mechanisms, pharmacologic mechanisms • Objective: Diagram and explain the mechanism of drug X.

  22. Item Templates • Other considerations: • Age, gender, race, ethnicity • Site of care (ER, office visit) • Presenting complaint • presents for a routine physical exam • presents with a headache • Duration • Patient history, family history • There is no history of… • He has a history of… • Physical findings • Lab values, imaging studies, pathology reports • Treatment, subsequent findings

  23. Item Creation • Add the lead-in (question) and the options • Which of the following pulmonary variables is most likely to be lower than normal in this patient? A. Alveolar-arterial PO2 difference B. Compliance of the lung C. Oncotic pressure of the alveolar fluid D. Work of breathing E. Residual volume

  24. Item Creation: Taking Recall up to Another Level Recall question: What area is supplied with blood by the posterior inferior cerebellar artery? [Objective: Identify the areas of the brain supplied by the major cerebral arteries.]

  25. Item Creation: Taking Recall up to Another Level Application question: A 62-year-old man develops left-sided limb ataxia, Horner’s syndrome, nystagmus and loss of facial pain and temperature. Which artery is most likely to be occluded? [Objective: Differentiate the signs and symptoms that would occur upon occlusion of each of the major cerebral arteries.]

  26. Your Turn! Review the distributed questions and identify strengths and weaknesses in each.

  27. Question • Acute intermittent porphyria is the result of a defect in the biosynthetic pathway for • A. collagen • B. corticosteroid • C. fatty acid • D. glucose • E. heme

  28. Rewritten…. • An otherwise healthy 33-year-old male has mild weakness and occasional episodes of steady, severe abdominal pain with some cramping but no diarrhea. One aunt and a cousin have had similar episodes. During an episode, his abdomen is distended, and bowel sounds are decreased. Neurological examination shows mild weakness in the upper arms. These findings suggest a defect in the biosynthetic pathway for: • A. collagen • B. corticosteroid • C. fatty acid • D. glucose • E. heme

  29. Question A 52-year-old male presents to the office with a one-week history of flank pain and hematuria. Past medical history is unremarkable. Physical examination reveals a left-sided abdominal mass. The greatest risk factor for renal cell carcinoma is A. diabetes B. female gender C. hyperlipidemia D. low body mass index E. smoking

  30. Question Which of the following is a correct statement about cystic fibrosis (CF)? A. The incidence of CF is 1:2000. B. Children with CF usually die in their teens. C. Males with CF are sterile. D. CF is an autosomal recessive disease. E. Symptoms of CF only appear in infancy. What other flaws can you detect in this question?

  31. Item Flaws: Unfocused items Which of the following is correct regarding [topic]? There is not enough information in the stem to answer the question without looking at the options. The responses are disparate. The distractors have to be 100% false. Thus, the question basically becomes a true/false question. Avoid these!

  32. A 45-year-old man comes to the physician because of a 6-week history of a non-productive cough. An X-ray film of the chest shows a 0.8 cm well-circumscribed peripheral nodule in the right lung. Biopsy shows a necrotizing granuloma. Which of the following is the most likely diagnosis? • Pulmonary embolus • Small cell carcinoma • Pseudomonas aeruginosa infection • Histoplasma capsulatum • Herpes pneumonitis • Metastatic renal cell carcinoma

  33. A healthy 57-year-old woman comes to the physician because of a 2 cm mass in her right breast. Biopsy reveals an invasive ductal carcinoma. Which of the following is the most important prognostic factor? • High grade tumor cytology • Infiltrative nature of tumor into benign breast • Numerous mitotic figures • Amount of tumor fibrosis • Presence of lymph node metastasis • Number of plasma cells in tumor

  34. A 63-year-old man comes to the physician because of a 6-week history of progressive dyspnea on exertion, orthopnea, and ankle edema. He has received multiagent chemotherapy for Waldenström’s macroglobulinemia for the past year. Urinalysis shows proteinuria. A bone marrow biopsy shows a partial response to therapy with ongoing marrow involvement still identified. Which of the following is the most likely diagnosis? • Cardiac amyloidosis • Viral myocarditis • Cardiac sarcoidosis • Myocardial infarct • Hypertrophic cardiomyopathy

  35. A question submitted In aortic stenosis what other abnormal heart sounds might accompany the resulting murmur? • Physiological splitting of S2 • An accentuated  S2 • Paradoxical splitting of S2 • A muffled S2

  36. Revised question A 60-year-old patient with an active lifestyle is found to have a systolic murmur on a routine physical exam. He currently has no symptoms. If this were aortic stenosis, what other abnormal heart sounds might accompany the systolic murmur? • A. Physiological splitting of S2 • B. An accentuated S2 • C. Paradoxical splitting of S2 • D. A muffled S2

  37. Determining item difficulty • The percentage of participants who get that item correct • Item difficulty scores can range from 0 to 100% • Low value = high difficulty • High value = low difficulty [Figure: difficulty scale from 0 to 100%]

  38. Discrimination Index • The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly. • Index of discrimination: • The proportion of the top-scoring group who answered the item correctly minus the proportion of the bottom-scoring group who answered it correctly • Item discrimination scores can range from -1.00 to +1.00 • Example • 100 test takers: 20 of the top 25 students answered correctly, but only 5 of the lowest 25 did. • DI = (20-5)/25 = 0.6

  39. Item Analysis Report • The left half shows percentages, the right half counts. • The correct option is indicated in parentheses. • Point Biserial is similar to the discrimination index, but is not based on fixed upper and lower groups. For each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer. [Figure: sample item analysis report showing order, ID and group number, percentages, and counts]

  40. Summary • Utilize action verbs to write objectives • Write your exam items based on the objectives • Tie the clinical vignette to the lead-in • Choose appropriate options with one best answer • Avoid technical flaws • Utilize an item checklist to ensure that you have done all you can to write the best items possible. • Pretest your items

  41. Establishing Validity and Reliability (Groups)

  42. Standard Setting (Groups)

  43. Graham McMahon gmcmahon@partners.org

  44. Item Discrimination: Examples 0.7 0.1 1 0 0 -0.4 Number of students per group = 100

  45. Distracter Analysis: Examples (*) marks the correct answer.
