A Taxonomy of Adaptive Testing


Presentation Transcript


  1. A Taxonomy of Adaptive Testing. Robert J. Mislevy, Measurement, Statistics & Evaluation, University of Maryland, in collaboration with. Presented at the Fifth Annual Technology for Second Language Learning Conference, September 21-22, 2007, Iowa State University, Ames, Iowa, USA

  2. Terminology & Concepts for Adaptive Testing • Adaptive testing • Most familiar as item response theory-based computer-adaptive testing (IRT-CAT) • Can take a broader perspective of evidentiary reasoning • We will look at the interplay between inference and data gathering • A taxonomy of configurations • IRT-CAT plus many others

  3. Taxonomy based on three dimensions … • Claim status • Observation status • Locus of control

  4. Background for the dimensions • Glenn Shafer’s “frame of discernment” • Evidence-centered assessment design

  5. “Frame of discernment” • From Shafer’s (1976) A mathematical theory of evidence. • It’s all the possible combinations of values of the variables you are working with. • “Frame” emphasizes how it effectively circumscribes a universe in which inference will take place • “Discern” = “detect, recognize, distinguish” • A property of you as much as a property of the world • Depends on what you know and what your purpose is

  6. “Frame of discernment” Frames of discernment can evolve over time, • as beliefs, knowledge, and aims unfold over time. • E.g., dip for the party? Medical diagnosis. You move from one frame of discernment to another by • ascertaining the values of some variables and dropping others, • adding new variables or refining current ones, • constructing a different frame when observations cause rethinking of assumptions or goals. (A small sketch of an evolving frame follows.)
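
To make the idea concrete, here is a minimal Python sketch of a frame of discernment as the cross product of the value sets of the variables currently in play, and of how the frame evolves when a variable is ascertained, refined, or added. The variable names and categories are hypothetical illustrations, not taken from the presentation.

```python
# A frame of discernment as the cross product of the value sets of the
# variables currently in play. Variable names and categories are hypothetical.
from itertools import product

def frame(variables):
    """Enumerate every combination of values of the variables we can discern."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]

# Initial frame: two coarse proficiency variables.
variables = {"reading": ["low", "high"], "listening": ["low", "high"]}
print(len(frame(variables)))  # 4 discernible states of the world

# The frame evolves: ascertain one variable, refine another, add a third.
variables["reading"] = ["high"]                     # value now ascertained
variables["listening"] = ["low", "medium", "high"]  # categories refined
variables["vocabulary"] = ["low", "high"]           # new variable added
print(len(frame(variables)))  # 6 states in the new frame
```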

  7. Evidence-Centered Design • Mislevy, Steinberg, & Almond (2003) “On the structure of educational assessments.” • Educational assessment as evidentiary argument: We reason from the things students say, do, or make in a handful of particular settings, to what they know, can do in various situations, or have accomplished, as more broadly construed. • All elements of an assessment, from analysis of domain, through design, to operation, are based on building then embodying such an argument in operational procedures.

  8. Toulmin’s Argument Structure Data support a Claim since a Warrant, resting on Backing, licenses the inference (“so”), unless an Alternative explanation accounts for the data.

  9. An Assessment Design Argument • Claim about the student, in some frame of discernment; information pertinent to addressing the claims is accumulated in terms of student-model variables (SMVs). Formative assessments often have highly specific claims; summative assessments tend to have broader claims. • Data concerning the performance: aspects of performance that bear on the claims, captured in terms of observable variables (OVs) -- what we actually see/hear the student say, do, or make. • Data concerning the situation: what aspects of the situation are important for the possibility of inference about the examinee? • Both arise from the student acting in the assessment situation. • A warrant, supported by backing, licenses the step from data to claim.

  10. Adaptive Testing The same argument, traversed as a cycle: 1. Somebody selects a situation for getting information; 2. the examinee acts; 3. the performance is evaluated in light of the currently targeted claim; 4. belief about the claim is updated; 5. somebody has a choice about whether to refocus the claim. (The structure is as before: claim about the student in some frame of discernment, warrant and backing, data concerning performance and situation, the student acting in the assessment situation.) A skeleton of this cycle is sketched below.
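
As a reading aid, here is an abstract skeleton of that five-step cycle. Every function and object name is a hypothetical placeholder for whatever a particular assessment system supplies; only the control flow is the point, not any actual system's API.

```python
# An abstract skeleton of the five-step adaptive cycle. All names are
# hypothetical placeholders for components a real assessment system supplies.
def adaptive_assessment_cycle(claim, frame, select_situation, administer,
                              evaluate, update_belief, refocus_claim, done):
    while not done(claim, frame):
        situation = select_situation(claim, frame)        # 1. select a situation for getting information
        performance = administer(situation)               # 2. examinee acts
        observables = evaluate(performance, claim)        # 3. evaluate performance against the targeted claim
        frame = update_belief(frame, observables)         # 4. update belief about the claim
        claim, frame = refocus_claim(claim, frame)        # 5. somebody may refocus the claim
    return claim, frame
```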

  11. What is an adaptive test? • At a given time in an assessment, the set of student-model variables and observable variables constitutes a frame of discernment. • An adaptive test is one in which the frame of discernment changes over time as a function of the values of observations. • The ways it might change are the basis of the taxonomy.

  12. Claim Status Is the claim part of the frame of discernment, i.e., the SMVs, fixed or evolving? • That is, do the SMVs at issue stay the same or change (as opposed to our knowledge about the SMVs)?

  13. Observation Status Is the data part of the frame of discernment, i.e., the OVs, fixed or evolving? • That is, does the set of OVs from which choices can be made stay the same or change as more information is obtained?

  14. Locus of Control If the claim part of the frame is changing as the test proceeds, who decides how it should change: the examiner or the examinee? If the data part of the frame is changing as the test proceeds, who decides how it should change: the examiner or the examinee? (Crossing these possibilities gives the nine cells enumerated below.)
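
The cell structure in the rest of the deck can be read off mechanically: each part of the frame (claims/SMVs and observations/OVs) is either fixed by the examiner a priori or adaptive under examiner or examinee control. A small Python sketch that reproduces the Cell 1-9 ordering used below:

```python
# Crossing the status of the claims part with the status of the observations
# part reproduces the ordering of Cells 1-9 in the slides that follow.
statuses = ["fixed (examiner)", "adaptive (examiner)", "adaptive (examinee)"]

cells = [(claim, obs) for claim in statuses for obs in statuses]
for number, (claim, obs) in enumerate(cells, start=1):
    print(f"Cell {number}: claim {claim}; observation {obs}")
```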

  15. “User friendly” testing

  16. Guided / diagnostic

  17. Self-guided / diagnostic

  18. Cell 1: Fixed, examiner-controlled claim; Fixed, examiner-controlled observation Traditional assessments in which … • the same kind of claim(s) / inferences / SMVs hold for everyone, • they were decided on by the examiner a priori, • the tasks presented are determined by the examiner a priori, • the examiner determines the sequence of tasks a priori. Neither the frame of discernment nor the gathering of evidence varies in response to the values of observable variables or their impact on beliefs about SMVs.

  19. Cell 2: Fixed, examiner-controlled claim; Adaptive, examiner-controlled observation • Same claims space (SMVs) for everyone • the claims (SMVs) were decided on by the examiner, • the pool of tasks is determined by the examiner a priori. But in light of the unfolding pattern of responses, the examiner selects items to maximize accuracy. • IRT-CAT (can be multivariate; Segall, 1996) • Binet’s original individually administered intelligence test • Lord’s flexilevel scheme (A sketch of IRT-CAT item selection follows.)
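
For concreteness, here is a minimal sketch of the familiar IRT-CAT selection rule under a two-parameter logistic (2PL) model: administer the unused item with maximum Fisher information at the current ability estimate, then re-estimate ability from the responses so far. The item parameters, pool size, and simulated responses are hypothetical, and the crude grid-based maximum-likelihood update stands in for a production estimator.

```python
# Examiner-controlled adaptive item selection under a 2PL IRT model:
# pick the most informative unused item at the current theta estimate,
# then update theta from the responses so far. Item parameters are made up.
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items, used):
    """Select the unadministered item most informative at theta."""
    candidates = [i for i in range(len(items)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

def update_theta(items_taken, responses, grid=None):
    """Crude maximum-likelihood update over a coarse grid of theta values."""
    grid = grid or [g / 10.0 for g in range(-40, 41)]
    def loglik(theta):
        return sum(math.log(p_correct(theta, a, b) if x else 1 - p_correct(theta, a, b))
                   for (a, b), x in zip(items_taken, responses))
    return max(grid, key=loglik)

# Hypothetical item pool: (discrimination a, difficulty b) pairs.
pool = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5), (1.3, -0.5)]
theta, used, taken, resp = 0.0, set(), [], []
for _ in range(3):
    i = next_item(theta, pool, used)
    used.add(i)
    taken.append(pool[i])
    resp.append(True)  # pretend the examinee answers correctly
    theta = update_theta(taken, resp)
    print(f"administered item {i}, theta estimate now {theta:.1f}")
```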

  20. Cell 3: Fixed, examiner-controlled claim; Adaptive, examinee-controlled observation • Same claims space (SMVs) for everyone • the claims (SMVs) were decided on by the examiner. But the examinee has some control over which tasks are presented -- “user friendly” testing. • Pole-vaulting competition • Self-adaptive SAT (Wise et al., 1992): the student chooses items by page or bin, grouped by difficulty; IRT scoring takes difficulty into account (also see Wright, 1977). • Guard against nonignorable missingness (free throws). (A small scoring illustration follows.)
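
A small sketch of why examinee-chosen difficulty can still be scored fairly: under a Rasch-type IRT model, the likelihood conditions on the difficulties of the items actually taken, so the same raw score on a harder bin yields a higher ability estimate. The difficulties and response pattern below are hypothetical.

```python
# Same raw score, different difficulty bins: IRT scoring conditions on item
# difficulty, so the choice of bin is reflected in the ability estimate.
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(difficulties, responses):
    grid = [g / 10.0 for g in range(-40, 41)]
    def loglik(theta):
        return sum(math.log(rasch_p(theta, b) if x else 1 - rasch_p(theta, b))
                   for b, x in zip(difficulties, responses))
    return max(grid, key=loglik)

easy_bin = [-1.0, -1.0, -1.0, -1.0]    # examinee A picks the easy page
hard_bin = [1.0, 1.0, 1.0, 1.0]        # examinee B picks the hard page
responses = [True, True, True, False]  # same raw score of 3/4 for both
print(mle_theta(easy_bin, responses))  # lower theta estimate
print(mle_theta(hard_bin, responses))  # higher theta estimate
```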

  21. Cell 4: Adaptive, examiner-controlled claim; Fixed, examiner-controlled observation • Same tasks (OVs) for everyone • Same presentation of tasks, determined a priori by the examiner. But the examiner determines the claims (SMVs) for each examinee in light of the responses. E.g., • MMPI -- the same hundreds of items for everyone, but the examiner may compute different scales for different patients. • Diagnostic “reading record” test in language testing. Note: Cells 4-9 need a multidimensional claim space.

  22. Cell 5: Adaptive, examiner-controlled claim; Adaptive, examiner-controlled observation • Claims may diverge for different examinees in light of data • Different tasks for different examinees, chosen to be optimal in light of the claims the examiner wants to make about them as individuals. E.g., • Triage in medicine, followed by different diagnostics • Adaptive MMPI -- different items for different patients, adaptively selected for the scales of interest for each patient • Differential strategies in math (Tatsuoka) • Adaptive diagnosis in language testing

  23. Cell 6: Adaptive, examiner-controlled claim; Adaptive, examinee-controlled observation • The examiner can home in on different claims for different examinees in light of data, but • examinees have at least some control over task selection. E.g., • Self-adaptive tests, but along dimensions controlled by the examiner: a multivariate SA-SAT, with the examiner’s inferences • Diagnostic / placement tests, homing in on different remedial needs of students, but allowing lower-stress choices of groups/pages of tasks as in Cell 3. Thus the examiner tailors the claims part of the frame of discernment, and the examinee tailors the observations part, given the claims.

  24. Cell 7: Adaptive, examinee-controlled claims; Fixed, examiner-controlled observations • Examinees all take the same examiner-determined items in an examiner-determined way, but … • examinees can home in on different claims of their choosing in light of the data. E.g., • MMPI, but the examinee determines which scales to compute and analyze • Oral reading of a fixed sample with automated parsing -- the student determines what to work on next (perhaps with an Ordinate-like setup?)

  25. Cell 8: Adaptive, examinee-controlled claims; Adaptive, examiner-controlled observations • The examinee chooses the claim, at the beginning or adaptively; • the examiner controls task presentation for optimal precision. E.g., structured self-diagnosis: • MMPI, where the examinee determines which scales to focus on and is presented items adaptively for those scales • Oral readings with automated parsing -- the student determines what to work on next, then examiner-selected samples focus on what the examinee wants to follow up on • SIGI: sequential exploration of career interests -- the examinee chooses categories and the system asks adaptive questions.

  26. Cell 9: Adaptive, examinee-controlled claims; Adaptive, examinee-controlled observations • Examinees control both the claims and the tasks that yield observations for those claims. • The examinee selects the claims to focus on and then has input into what data will be observed. • Feedback from the system helps the examinee figure out what they want to know, then offers choices about directions to go to refine the information they receive (continued)

  27. Cell 9, continued: Adaptive, examinee-controlled claims; Adaptive, examinee-controlled observations E.g., guided self-diagnosis: • The central challenge in library retrieval systems -- organize materials and search terms to help patrons find the information they might want • Amazon: “Customers who looked at these books you selected also looked at…” • Multivariate SA-SAT practice exploration space • Language testing self-diagnosis: start with a common passage or a list of areas, do diagnostics, and use the results to refine testing in the areas you are interested in.

  28. Conclusion • Assessments involving adaptive claims have yet to achieve the prominence of adaptive-observation assessments. • Reasons include history, up-front work, and a focus on solving known “centralized” problems. • User-controlled assessment is often not seen as assessment. • The user-modeling literature will be important. • Cells 8 & 9 are well suited to self-directed learning in a supported environment -- like user-modeling strategies for buying cars, choosing movies, or finding information in library systems.
