
The Science and Art of Exam Development


Presentation Transcript


  1. The Science and Art of Exam Development. Paul E. Jones, PhD, Thomson Prometric

  2. What is validity and how do I know if my test has it?

  3. Validity “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing and evaluating tests.” (APA Standards, 1999, p. 9)

  4. A test may yield valid judgments about people…
  • If it measures the domain it was defined to measure.
  • If the test items have good measurement properties.
  • If the test scores and the pass/fail decisions are reliable.
  • If alternate forms of the test are on the same scale.
  • If you apply defensible judgment criteria.
  • If you allow enough time for competent (but not necessarily speedy) candidates to take the test.
  • If it is presented to the candidate in a standardized fashion, without environmental distractions.
  • If the test taker is not cheating and the test has not deteriorated.

  5. Is this a Valid Test?
  1. 4 - 3 = _____    6. 3 - 2 = _____
  2. 9 - 2 = _____    7. 8 - 7 = _____
  3. 4 - 4 = _____    8. 9 - 5 = _____
  4. 7 - 6 = _____    9. 6 - 2 = _____
  5. 5 - 1 = _____   10. 8 - 3 = _____

  6. Validity = Technical Quality of the Testing System [diagram: Design and Item Bank]

  7. The Validity Argument Is Part of the Testing System [diagram: supporting documents surrounding the Design and Item Bank]

  8. How should I start a new testing initiative?

  9. A Testing System Begins with Design [diagram: Design and Item Bank]

  10. Test Design Begins with Test Definition
  • Test Title
  • Credential Name
  • Test Purpose (“This test will certify that the successful candidate has important knowledge and skills necessary to…”)
  • Intended Audience
  • Candidate Preparation
  • High-Level Knowledge and Skills Covered
  • Products or Technologies Addressed
  • Knowledge and Skills Assumed but Not Tested
  • Knowledge and Skills Related to the Test but Not Tested
  • Borderline Candidate Description
  • Testing Methods
  • Test Organization
  • Test Stakeholders
  • Other Information
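Purely as an illustration (not part of the presentation), the definition fields above can be captured in a structured record so they travel with the program as it evolves; every field name and value below is invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TestDefinition:
    """Illustrative container for the test-definition fields listed above."""
    test_title: str
    credential_name: str
    test_purpose: str
    intended_audience: str
    candidate_preparation: str
    knowledge_and_skills_covered: List[str] = field(default_factory=list)
    knowledge_and_skills_assumed_not_tested: List[str] = field(default_factory=list)
    borderline_candidate_description: str = ""
    testing_methods: List[str] = field(default_factory=list)
    stakeholders: List[str] = field(default_factory=list)

# Invented example values, for illustration only.
definition = TestDefinition(
    test_title="Widget Administration Fundamentals",
    credential_name="Certified Widget Administrator",
    test_purpose="Certify that the successful candidate can administer widgets in production.",
    intended_audience="Administrators with about one year of hands-on experience",
    candidate_preparation="Instructor-led course or equivalent on-the-job experience",
)
print(definition.test_title)
```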

  11. Test Definition Begins with Program Design

  12. Test Definition Leads to Practice Analysis [diagram: Test Definition → Practice Analysis → Test Objectives]

  13. Practice Analysis Leads to Test Objectives

  14. Test Objectives are Embedded in a Blueprint
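A blueprint can be represented as little more than a mapping from objectives to the number (or weight) of scored items, which later drives form assembly. The sketch below is purely illustrative; the objective names and counts are invented.

```python
# Illustrative blueprint: objective -> number of scored items per form.
blueprint = {
    "1.1 Configure user accounts": 6,
    "1.2 Manage permissions": 5,
    "2.1 Troubleshoot connectivity": 8,
    "2.2 Monitor performance": 6,
}

total_items = sum(blueprint.values())
for objective, n_items in blueprint.items():
    print(f"{objective}: {n_items} items ({n_items / total_items:.0%} of the form)")
```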

  15. Once I have a blueprint, how do I develop appropriate exam items?

  16. The Testing System [diagram: Design and Item Bank]

  17. Creating Items
  • Content characteristics: text, graphics, audio, video, simulations, applications
  • Response modes: single M/C (choose one), multiple M/C (choose many), single P&C, multiple P&C, drag & drop, brief FR, essay FR, simulation/application scoring
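For illustration only, an item-bank record might pair one content characteristic with one response mode, along with the blueprint objective the item is linked to; all names and values below are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """Illustrative item-bank record: content characteristic plus response mode."""
    item_id: str
    objective: str                 # blueprint objective the item is linked to
    content: str                   # e.g. "text", "graphic", "audio", "simulation"
    response_mode: str             # e.g. "single M/C", "multiple M/C", "drag & drop"
    stem: str
    options: List[str] = field(default_factory=list)
    key: List[int] = field(default_factory=list)   # index/indices of correct option(s)

item = Item(
    item_id="SUBTR-0001",
    objective="1.1 Subtract single-digit numbers",
    content="text",
    response_mode="single M/C",
    stem="4 - 3 = ?",
    options=["0", "1", "2", "7"],
    key=[1],
)
print(item.response_mode, item.options[item.key[0]])
```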

  18. Desirable Measurement Properties of Items • Item-objective linkage • Appropriate difficulty • Discrimination • Interpretability

  19. Item-Objective Linkage

  20. Good Item Development Practices • SME writers in a social environment • Industry-accepted item writing principles • Item banking tool • Mentoring • Rapid editing • Group technical reviews

  21. How can I gather and use data to develop an item bank?

  22. The Testing System [diagram: Design and Item Bank]

  23. Classical Item Analysis: Difficulty and Discrimination
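As a concrete illustration of the classical statistics named above, here is a minimal sketch (not from the presentation; the response data are invented): difficulty computed as the proportion answering correctly, and discrimination as the corrected item-total (point-biserial) correlation.

```python
import numpy as np

def classical_item_stats(responses: np.ndarray):
    """Classical difficulty (proportion correct) and discrimination
    (corrected item-total point-biserial) for a 0/1 scored response
    matrix of shape (candidates, items)."""
    n_candidates, n_items = responses.shape
    total = responses.sum(axis=1)
    stats = []
    for i in range(n_items):
        item = responses[:, i]
        p = item.mean()                # difficulty: proportion correct
        rest = total - item            # total score excluding this item
        if item.std() == 0 or rest.std() == 0:
            r_pb = float("nan")        # undefined if there is no variance
        else:
            r_pb = np.corrcoef(item, rest)[0, 1]
        stats.append({"item": i + 1,
                      "difficulty": round(float(p), 3),
                      "discrimination": round(float(r_pb), 3)})
    return stats

# Toy data: 6 candidates x 4 items, invented for illustration.
rng = np.random.default_rng(0)
demo = (rng.random((6, 4)) > 0.4).astype(int)
for row in classical_item_stats(demo):
    print(row)
```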

  24. Classical Option Analysis: Good Item [table of option statistics: n, proportion, discrimination, and selection rates by score quintile Q1-Q5]

  25. Classical Option Analysis: Problem Item [table: same layout as the previous slide]
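The option-analysis tables summarized on the two slides above report, for each response option, how often it is chosen overall and within total-score quintiles. The sketch below is not from the presentation (choices, key, and scores are invented); for a healthy item the keyed option is chosen increasingly often from Q1 to Q5, while distractors attract mainly the lower quintiles.

```python
import numpy as np

def option_analysis(choices, correct_key, total_scores, n_groups=5):
    """For one item: proportion of candidates selecting each option, overall
    and within total-score quintiles (Q1 = lowest fifth, Q5 = highest)."""
    choices = np.asarray(choices)
    total_scores = np.asarray(total_scores)
    order = np.argsort(total_scores)
    groups = np.array_split(order, n_groups)       # Q1..Q5 by total score
    report = {}
    for opt in sorted(set(choices)):
        overall = float(np.mean(choices == opt))
        by_group = [round(float(np.mean(choices[g] == opt)), 2) for g in groups]
        report[opt] = {"key": bool(opt == correct_key),
                       "proportion": round(overall, 2),
                       "Q1..Q5": by_group}
    return report

# Invented data: option chosen by 20 candidates and their total scores.
choices = list("ABACDABAAACABDAAACAB")
scores = [12, 30, 18, 25, 8, 22, 27, 15, 33, 20, 29, 10, 24, 7, 31, 26, 19, 9, 35, 14]
for opt, row in option_analysis(choices, "A", scores).items():
    print(opt, row)
```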

  26. IRT Item Analysis: Difficulty and Discrimination [figure: three item characteristic curves with 3PL parameters a=0.6, b=-1.5, c=0.4; a=1.2, b=-0.5, c=0.1; a=1.0, b=1.0, c=0.25]
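The curves on this slide can be reproduced from the standard three-parameter logistic (3PL) item response function. The sketch below uses the parameter values listed on the slide and assumes the common D = 1.7 scaling constant.

```python
import numpy as np

def p_correct_3pl(theta, a, b, c, D=1.7):
    """3PL item response function: probability that a candidate of ability
    theta answers correctly. a = discrimination, b = difficulty,
    c = pseudo-guessing, D = scaling constant."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

# The three parameter sets shown on the slide.
items = [(0.6, -1.5, 0.40), (1.2, -0.5, 0.10), (1.0, 1.0, 0.25)]
for theta in (-2.0, 0.0, 2.0):
    probs = [round(float(p_correct_3pl(theta, a, b, c)), 2) for a, b, c in items]
    print(f"theta={theta:+.1f}  P(correct) per item: {probs}")
```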

  27. Good IRT Model Fit

  28. How can I assemble test forms from my item bank?

  29. The Testing System [diagram: Design and Item Bank]

  30. Reliability “Reliability refers to the degree to which test scores are free from errors of measurement.” (APA Standards, 1985, p. 19)
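One common way to estimate score reliability for a dichotomously scored form is coefficient alpha (equivalent to KR-20 for 0/1 items), and the standard error of measurement then expresses the quoted "errors of measurement" in raw-score units. A minimal sketch with invented response data (not from the presentation):

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha (KR-20 for 0/1 items) for a (candidates x items) matrix."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Invented 0/1 responses: 200 candidates x 20 items driven by a latent ability.
rng = np.random.default_rng(1)
ability = rng.normal(size=(200, 1))
difficulty = rng.normal(size=(1, 20))
scores = (ability - difficulty + rng.normal(size=(200, 20)) > 0).astype(int)

alpha = cronbach_alpha(scores)
sem = scores.sum(axis=1).std(ddof=1) * np.sqrt(1 - alpha)   # standard error of measurement
print(f"alpha = {alpha:.3f}, SEM = {sem:.2f} raw-score points")
```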

  31. More Reliable Test

  32. Less Reliable Test

  33. How to Enhance Reliability When Assembling Test Forms
  • Score reliability/generalizability:
    • Select items with good measurement properties.
    • Present enough items.
    • Target items at candidate ability level.
    • Sample items consistently from across the content domain (use a clearly defined test blueprint).
  • Score dependability:
    • Same as above.
    • Minimize differences in test difficulty.
  • Pass/fail consistency:
    • Select enough items.
    • Target items at the cut score.
    • Maintain the same score distribution shape between forms.

  34. Building Simultaneous Parallel Forms Using Classical Theory

  35. Building Simultaneous Parallel Forms Using IRT
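Under IRT, candidate parallel forms are often compared through their test information functions: forms built to the same blueprint should provide similar information across the ability range of interest. A small sketch with invented 3PL parameters (not from the presentation):

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def test_information(theta, items, D=1.7):
    """Sum of 3PL item information functions over a form (items = [(a, b, c), ...])."""
    total = 0.0
    for a, b, c in items:
        p = p_3pl(theta, a, b, c, D)
        total += (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2
    return total

# Invented parameters for two draft forms; parallel forms should track closely.
form_a = [(1.0, -0.8, 0.20), (1.2, 0.0, 0.15), (0.9, 0.7, 0.25), (1.1, 1.2, 0.20)]
form_b = [(0.9, -0.9, 0.20), (1.3, 0.1, 0.15), (1.0, 0.6, 0.20), (1.0, 1.1, 0.25)]
for theta in (-1.0, 0.0, 1.0):
    print(f"theta={theta:+.1f}  info A={test_information(theta, form_a):.2f}"
          f"  info B={test_information(theta, form_b):.2f}")
```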

  36. Setting Cut Scores Why not just set the cut score at 75% correct?

  37. Setting Cut Scores Why not just set the cut score so that 80% of the candidates pass?

  38. The logic of criterion-based cut score setting • Certain knowledge and skills are necessary for practice. • The test measures an important subset of these knowledge and skills, and thus readiness for practice. • The passing [cut] score is such that those who pass have a high enough level of mastery of the KSJs to be ready for practice [at the level defined in the test definition], while those who fail do not. (Kane, Crooks, and Cohen, 1997)

  39. The Main Goal in Setting Cut Scores Meeting the “Goldilocks Criteria” “We want the passing score to be neither too high nor too low, but at least approximately, just right.” Kane, Crooks, and Cohen, 1997, p. 8

  40. Two General Approaches to Setting Cut Scores
  • Test-centered approaches: Modified Angoff, Bookmark
  • Examinee-centered approaches: Borderline, Contrasting Groups
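For illustration, a modified Angoff computation works roughly as follows: each judge estimates the probability that a borderline candidate answers each item correctly, and the recommended raw cut score is approximately the sum of the item means across judges. The ratings below are invented.

```python
import numpy as np

# Invented ratings: rows are judges, columns are items; each value is the
# judged probability that a borderline candidate answers the item correctly.
ratings = np.array([
    [0.70, 0.55, 0.80, 0.60, 0.45],   # judge 1
    [0.65, 0.60, 0.75, 0.55, 0.50],   # judge 2
    [0.75, 0.50, 0.85, 0.65, 0.40],   # judge 3
])

item_means = ratings.mean(axis=0)     # expected borderline performance per item
cut_score = item_means.sum()          # sum across items = recommended raw cut
print("expected borderline score per item:", np.round(item_means, 2))
print(f"recommended raw cut score: {cut_score:.1f} of {ratings.shape[1]} items")
```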

  41. The Testing System [diagram: Design and Item Bank]

  42. What should I consider as I manage my testing system?

  43. Security of a Testing System [diagram: Design and Item Bank]
  • Write more items!!!
  • Create authentic items.
  • Use isomorphs.
  • Use Automated Item Generation.
  • Use secure banking software and connectivity.
  • Use in-person development.

  44. Security of a Testing System [diagram: Design and Item Bank]
  • Establish prerequisite qualifications.
  • Use narrow testing windows.
  • Establish test/retest restrictions.
  • Use identity verification and biometrics.
  • Require test takers to sign NDAs.
  • Monitor test takers on site.
  • Intervene if cheating is detected.
  • Monitor individual test center performance.
  • Track suspicious test takers over time.

  45. Security of a Testing System [diagram: Design and Item Bank]
  • Perform frequent, detailed psychometric review.
  • Restrict the use of items and test forms.
  • Analyze response times.
  • Perform drift analyses.
  • Calibrate items efficiently.

  46. Item Parameter Drift
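A very rough way to screen for drift is to compare item statistics across administration windows and flag items whose difficulty has shifted by more than chance; operational drift analyses usually compare re-estimated IRT parameters instead. A sketch with invented proportions:

```python
import numpy as np

def flag_drift(p_old, p_new, n_old, n_new, threshold=2.58):
    """Rough drift screen: flag items whose classical difficulty changed by
    more than `threshold` standard errors between two administration windows."""
    p_old, p_new = np.asarray(p_old, float), np.asarray(p_new, float)
    se = np.sqrt(p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new)
    z = (p_new - p_old) / se
    return [(i + 1, round(float(z[i]), 2)) for i in range(len(z))
            if abs(z[i]) > threshold]

# Invented difficulties for 6 items in two windows of 400 candidates each.
old_p = [0.62, 0.75, 0.48, 0.81, 0.55, 0.70]
new_p = [0.60, 0.74, 0.47, 0.80, 0.71, 0.69]   # item 5 got noticeably easier
print("flagged items (item, z):", flag_drift(old_p, new_p, 400, 400))
```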

  47. Security of a Testing System [diagram: Design and Item Bank]
  • Many unique fixed forms
  • Linear on-the-fly testing (LOFT)
  • Computerized adaptive testing (CAT)
  • Computerized mastery testing (CMT)
  • Multistage testing (MST)
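As one illustration of how these delivery models differ, computerized adaptive testing typically selects the next item to maximize information at the candidate's current ability estimate. A bare-bones sketch (parameters invented; real CAT engines add exposure control and content balancing on top of this):

```python
import numpy as np

def item_information(theta, a, b, c, D=1.7):
    """3PL item information at ability theta."""
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def pick_next_item(theta_hat, pool, administered):
    """Maximum-information selection: choose the unused pool item that is
    most informative at the current ability estimate."""
    best, best_info = None, -1.0
    for idx, (a, b, c) in enumerate(pool):
        if idx in administered:
            continue
        info = item_information(theta_hat, a, b, c)
        if info > best_info:
            best, best_info = idx, info
    return best

# Invented 3PL parameters for a tiny pool; item 0 has already been given.
pool = [(0.8, -1.2, 0.20), (1.1, -0.3, 0.20), (1.3, 0.4, 0.15), (0.9, 1.5, 0.25)]
print("next item for theta=0.2:", pick_next_item(0.2, pool, administered={0}))
```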

  48. Item Analysis Activity
