Fundamental Testing Assumptions Revisited: Examination Length and Number of Options. Karine Georges & Kelly Piasentin, Assessment Strategies Inc.

## Fundamental Testing Assumptions Revisited: Examination Length and Number of Options

Karine Georges & Kelly Piasentin, Assessment Strategies Inc.

**Overview**
• Credentialing organizations seek to balance factors such as program validity and credibility against more tangible concerns such as cost and ease of development. Two such aspects are investigated:
  • A method to reduce the total number of test questions while retaining validity and reliability.
  • The effects of reducing the typical number of options from four (4) to three (3).

**Part I**
Examination Length: A Case Study
Karine Georges, MSc

**Case Study: Certification Program**
• Tasked in 2007 to determine whether 180-item, 4-hour examinations could be shortened in light of a potential move to computer-based testing (CBT).

**Validity and Examination Length**
• Content validity: The number of items on an examination must be sufficient to ensure adequately representative coverage.
• Face validity: If the examination is shortened, stakeholders' perceptions need to be considered vis-à-vis comparable professions.

**Examination Length and Reliability**
• What is an acceptable reliability index for credentialing?
  • “A reliability correlation coefficient should fall in the high .80s or above for longer examinations (e.g., 150 or more items)” (NOCA, 2004).
• What is the range of reliability indices for the current 180-item certification examinations?
  • Average: .84
  • Min: .78
  • Max: .92

**Examination Length and Practical Considerations**
If reliability is related to examination length, why shorten the examination?
Costs and efficiency:
• Each item costs between $300 and $1,000 to develop (Vale, 2006).
• Additional items are needed for safeguard purposes, or for ancillary materials such as prep guides or readiness tests.
• The client's intention to move to CBT makes shorter examinations an advantage: seat time can be reduced and more candidates accommodated within the testing period.

**Research Approaches**
• Two approaches:
  • Classical Test Theory (CTT) approach: examining the reliability coefficient using the Spearman-Brown formula.
  • Item Response Theory (IRT) approach: examining the item information function using empirical data.

**CTT Results for the Two Certification Programs**
Spearman-Brown formula:

$$\rho^{*}_{xx} = \frac{N\,\rho_{xx}}{1 + (N - 1)\,\rho_{xx}}$$

where $\rho_{xx}$ is the reliability of the current examination, $N$ is the ratio of the new examination length to the current length, and $\rho^{*}_{xx}$ is the predicted reliability of the resized examination.
• Results show that the examinations can be shortened by 20-30 questions (or about 10%) and still remain above .80.

**Limitations of CTT Results**
• General limitations of the Spearman-Brown formula:
  • Assumes that the examinations are exactly parallel
  • Yields only one value for the whole range of abilities
  • Heavily influenced by the cohort

**IRT Approach: Item Information Curve**
• Research has shown that higher-stakes examinations with pass/fail decisions, such as certification examinations, can be shortened without impacting their ability to classify candidates (Schulz & Wang, 2001).
• What would be the impact if the certification examinations had 10% fewer items?
• How about 25% or 50%?

**IRT - Item Information Curve**
• IRT models specify the probability of a discrete outcome, such as a correct response to an item, in terms of person and item parameters.
• Person parameter: ability of a candidate (theta)
• Item parameters:
  • a: Discrimination (slope)
  • b: Difficulty (location)
  • c: Guessing

**IRT - Test Information Curve**
• All item information curves add up to a test information curve.
• The amount of information differs with the length of the examination and the quality of the items.
• The pass/fail decision must be made where error is minimal (ideally where the passmark is located) and where levels of ability can be clearly differentiated.

**IRT - Results and Implications**
• The examinations can be reduced by at least 10% without significantly impacting the pass/fail decision.
• Other factors to take into consideration:
  • Number of candidates
  • Robustness of item bank

**Other Considerations**
• What about face validity?
• How would an examination with 90 items be viewed by other professionals compared with a comparable examination of 180 items?

**Other Certification Programs**
• Review of over 75 certification programs within the same profession.
• Average number of items: 164, or between 150 and 175 items (including experimental items)
• Minimum: 100
• Maximum: 250

**Summary**
• Data suggest that the number of items can be reduced by 10% with minimal impact on validity and reliability.

**Part II**
How Many Options Are Optimal in Multiple Choice Testing?
Kelly Piasentin, PhD

**Multiple Choice Testing**
• Most common format used in licensure and certification examinations
• Consists of a stem (i.e., the question being asked) and a series of options to choose from (usually 4)
Example:
• Stem: In which state is the 2008 CLEAR conference being held?
• Options: Arkansas, Alaska, Arizona, Alabama

**Advantages of Multiple Choice**
• Versatility
• Efficiency
• Scoring accuracy and economy
• Reliability
• Diagnosis
• Control of difficulty
• Amenable to item analysis

**Disadvantages of Multiple Choice**
• Time-consuming to write
• Difficult to create effective distracters (i.e., options that are plausible but incorrect)

**Time Spent Writing MCQs**
• Sample of 75 item writers for 3 different licensing/certification examinations
• Average time spent writing an MCQ: 52 minutes
• Percentage of time spent writing: (figure not reproduced)

**Effort Spent Writing Distracters**
Of the 75 item writers:
• 25% reported that it was difficult to write the 1st distracter
• 40% reported that it was difficult to write the 2nd distracter
• 75% reported that it was difficult to write the 3rd distracter

**How many options should an MCQ have?**
• 4-option MCQs are widely used in standardized testing
• But are 4 options ideal?
• Some item-writing guidelines say, “develop as many options as feasible” (Haladyna & Downing, 1989)
• More recently, “develop as many functional distractors as are feasible” (Haladyna, Downing, & Rodriguez, 2002)
• Increasing emphasis on the quality of distracters as opposed to the quantity

**Definition of a Functional Distracter**
“A functional distracter is one that has (a) a significant negative point-biserial correlation with the total test score, (b) a negative sloping item characteristic curve, and (c) a frequency of response greater than 5% for the total group.” (Haladyna & Downing, 1988)

**How does the number of options impact guessing?**
• With 4 options, candidates have a 25% chance of getting any one question correct simply by guessing
• The probability falls to 20% if there are 5 options
• The probability rises to 33% if there are 3 options
• BUT… if a typical examination has 25 items, each with 3 options, the chance of getting at least 70% on the examination by pure blind guessing is 1 in 25,000
• So, do you get more bang for your buck by having more options?

**Are 4-option MCQs optimal?**
Factors to consider:
• Time and cost it takes to develop distracters
• Time it takes for candidates to complete the examination
• Psychometric properties of the examination
  • Item difficulty
  • Item discrimination
  • Test reliability (coefficient alpha)

**Arguments in favour of 3 options**
• Less time is needed to develop two plausible distracters
• More 3-option items can be administered without increasing testing time
• Inclusion of additional high-quality items per unit of time should improve test score reliability
• Having fewer options decreases the likelihood of exposing additional aspects of the domain to candidates (e.g., context clues to other questions)

**Data from a Licensing/Certification Examination**
• Number of MCQs: 235
• Number of candidates: 5,393
• Mean item difficulty: .721
• Mean discrimination index: .166
• Test reliability: .88
• Most chosen distracter: .167
• 2nd most chosen distracter: .077
• Least chosen distracter: .035

**Reducing Examination Items to 3 Options**
What would be the effect on item difficulty, discrimination, and reliability of reducing the items on the examination to 3 options if the responses to the least chosen distracter were:
• Attributed to the correct answer?
• Attributed to the 2nd least chosen distracter?
• Randomly distributed among the other 3 choices?

If the least chosen distracter is attributed to the correct answer:
• Item difficulty: .752
• Mean discrimination index: .136
• Coefficient alpha: .834

If the least chosen distracter is attributed to the 2nd least chosen distracter:
• Item difficulty: .720
• Mean discrimination index: .168
• Reliability: .881

If the least chosen distracter is distributed randomly among the other 3 choices:
• Item difficulty: .731
• Mean discrimination index: .158
• Reliability: .868

**4 Options vs. 3 Options**
• Moving from 4 options to 3 options did not have a significant impact on average item difficulty, discrimination, or test reliability.

**Summary**
• Two primary benefits of using 3 options (as opposed to 4 options):
  • Faster item writing
    • Better quality items
    • Cost savings
  • Better testing
    • Shorter test time
    • More questions in the same amount of time (potential for increased reliability)

**Conclusion**
• These two presentations demonstrate that some efficiencies can be gained from reducing test length and the number of response options without compromising test validity.
• Further research is needed to confirm the findings.

**Contact Information**
Assessment Strategies
1400 Blair Place, Suite 210
Ottawa, ON K1J 9B8, Canada
Telephone: 613-237-0241
Website: www.asinc.ca
• Karine Georges, MSc: kgeorges@asinc.ca
• Kelly Piasentin, PhD: kpiasentin@asinc.ca
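
As a worked illustration of the Spearman-Brown projection used in the CTT analysis of Part I, the sketch below predicts reliability when a 180-item examination with the reported average reliability of .84 is shortened. The function names and the candidate lengths are illustrative assumptions, not the authors' actual analysis, and the projection inherits the limitation noted in the slides that the shortened form is treated as exactly parallel to the original.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability when test length is multiplied by length_ratio (N)."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


def projected_reliability(reliability: float, current_items: int, new_items: int) -> float:
    """Apply Spearman-Brown with N = new_items / current_items."""
    return spearman_brown(reliability, new_items / current_items)


if __name__ == "__main__":
    current_items = 180          # examination length from the case study
    average_reliability = 0.84   # average reliability reported in the slides
    # Candidate lengths are hypothetical, chosen only for illustration.
    for new_items in (180, 160, 150, 135, 90):
        r = projected_reliability(average_reliability, current_items, new_items)
        print(f"{new_items:3d} items -> projected reliability {r:.3f}")
```

Under these assumptions, removing 20 to 30 items keeps the projected value above .80, while halving the examination does not.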


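The guessing argument in Part II can be checked with a binomial tail calculation. The sketch below assumes every item is answered by pure blind guessing, independently, with probability 1/k of success for k options, and reads "at least 70%" of 25 items as 18 or more correct. Under these assumptions the exact result for 3 options is roughly 1 in 11,000, the same order of magnitude as the figure quoted on the slide, which does not state how it was derived.

```python
from fractions import Fraction
from math import comb


def min_correct_for(percent: int, n_items: int) -> int:
    """Smallest number of correct answers that reaches the given percentage."""
    return (percent * n_items + 99) // 100  # integer ceiling of percent * n / 100


def p_at_least(n_items: int, n_options: int, min_correct: int) -> Fraction:
    """Exact probability of at least min_correct successes by blind guessing."""
    p = Fraction(1, n_options)
    return sum(
        comb(n_items, k) * p ** k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


if __name__ == "__main__":
    n_items = 25
    needed = min_correct_for(70, n_items)
    for n_options in (3, 4, 5):
        prob = p_at_least(n_items, n_options, needed)
        print(f"{n_options} options: P(at least {needed}/{n_items}) = {float(prob):.3e} "
              f"(about 1 in {round(1 / prob):,})")
```

Using `Fraction` keeps the tail probability exact, which avoids rounding questions when the values are this small.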