
Technical Considerations in Alignment for Computerized Adaptive Testing


  1. Technical Considerations in Alignment for Computerized Adaptive Testing Liru Zhang, Delaware DOE Shudong Wang, NWEA 2014 CCSSO NCSA New Orleans, LA June 25-27, 2014

  2. Background of Alignment • Alignment is an important attribute in standards-based educational reform. Peer Review requires states to submit evidence demonstrating alignment between the assessment and its content standards (Standards and Assessments Peer Review Guidance, 2009) in addition to validity evidence. • For the Next Generation Assessment, alignment is defined as the degree to which expectations specified in the Common Core State Standards and the assessment are in agreement and serve in conjunction with one another to guide the system toward students learning what they are supposed to know and be able to do. • With new and innovative technology and the great potential of online tests, computerized adaptive testing (CAT) has been increasingly implemented in K-12 assessment systems.

  3. Alignment for Linear Tests Different approaches have been employed in the past decade for evaluating the alignment of state assessment programs. Their processes are similar in four ways: (1) use approved content standards and implemented tests; (2) review item-by-item from a developed linear test form; (3) use professional judgments for alignment; and (4) evaluate the degree to which each item matches the claim, standard, objective and/or topic, and performance expectations (e.g., cognitive complexity and depth of knowledge) in the standards.

  4. Linear Test vs. CAT Linear tests: • Test form(s) are assembled prior to operation based on the test specifications. • The fixed form(s) are used for all students. • Linear tests normally target a medium difficulty level. • The degree of alignment is determined based on the content of the test form(s) compared with the predetermined criteria. Adaptive tests: • Many unique test forms are assembled during testing for individual students. • Test difficulty for unique test forms varies greatly to match estimated student ability levels. • How to determine the degree of alignment for adaptive testing?

  5. Alignment for CAT In adaptive testing, the item pool is the source for assembling test forms for individual students; therefore, the alignment should be considered as the relationship between the item pool and the standards it is intended to measure. Given the adaptive nature of the test, technical issues must be considered in the design, process, and evaluation of alignment for CAT. • What kinds of technical issues should be considered? • Should the alignment be based on the item pool or on a sample of items? • How should a representative sample of items be selected so that it supports a fair inference about the item pool? • Is it appropriate to use the same procedure and criteria for linear tests to evaluate the alignment of an adaptive test?

  6. Technical Considerations To demonstrate the possible impact of factors such as the characteristics of the item pool, student proficiency levels, and item exposure rates on the alignment results, samples of items and individual test forms were selected from the grade 3 mathematics assessment. Step One: Seven item samples (50 each) were selected from the pool: • Two random samples • A weighted sample based on the four content strands and their relative proportions in the pool • Two samples selected based on the frequency distributions of person- and item-parameters as typical items • Two samples based on item exposure rate, excluding never-used items
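The first two sampling strategies above (simple random and strand-weighted) can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the strand names, pool sizes, and item records are hypothetical, and the weighted sampler simply allocates slots in proportion to each strand's share of the pool.

```python
import random
from collections import Counter

# Hypothetical item pool: each record carries a content strand and a Rasch
# difficulty (b); all names and counts here are illustrative, not the
# actual grade 3 mathematics pool.
pool = (
    [{"id": f"NUM{i}", "strand": "Numeric Reasoning", "b": random.gauss(0, 1)} for i in range(120)]
    + [{"id": f"ALG{i}", "strand": "Algebraic Reasoning", "b": random.gauss(0, 1)} for i in range(90)]
    + [{"id": f"GEO{i}", "strand": "Geometric Reasoning", "b": random.gauss(0, 1)} for i in range(60)]
    + [{"id": f"QNT{i}", "strand": "Quantitative Reasoning", "b": random.gauss(0, 1)} for i in range(30)]
)

def random_sample(items, n=50):
    """Simple random sample of n items from the pool."""
    return random.sample(items, n)

def weighted_sample(items, n=50):
    """Sample n items so strand proportions mirror those of the pool."""
    by_strand = {}
    for item in items:
        by_strand.setdefault(item["strand"], []).append(item)
    total = len(items)
    sample = []
    for strand_items in by_strand.values():
        k = round(n * len(strand_items) / total)  # proportional allocation
        sample.extend(random.sample(strand_items, k))
    return sample

sample = weighted_sample(pool, 50)
print(Counter(i["strand"] for i in sample))
```

With the illustrative pool above (120/90/60/30 items across four strands), a weighted sample of 50 allocates 20/15/10/5 items, preserving the 40/30/20/10 percent strand balance; with pool sizes that do not divide evenly, the rounding step would need a remainder-distribution rule.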

  7. Sample Selection

  8. Test Specifications and Constraints

  9. Comparison 1: Content Balance

  10. Comparison 2: Balance of Item Type

  11. Comparison 3: Item Difficulty Distribution

  12. Comparison 3: Item-Parameter Distribution

  13. Comparison 4: Item Exposure Rate

  14. Technical Considerations 2 Step Two: Fourteen individual test forms were selected based on the frequency distributions of estimated student ability (theta) from the grade 3 mathematics assessment.
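One simple way to select forms that span the ability distribution, sketched below, is to sort administered forms by their final theta estimate and pick the forms sitting at evenly spaced quantile positions. This is an assumed procedure for illustration; the study's actual selection rule is not specified in the slides, and the form records here are hypothetical.

```python
def select_forms_by_theta(forms, n_forms=14):
    """forms: list of (form_id, theta) pairs for administered test forms.
    Returns n_forms whose theta estimates span the observed distribution."""
    ordered = sorted(forms, key=lambda f: f[1])
    # Evenly spaced index positions across the sorted thetas (quantiles).
    positions = [round(i * (len(ordered) - 1) / (n_forms - 1)) for i in range(n_forms)]
    # Deduplicate while preserving order, in case positions collide
    # for small numbers of forms.
    seen, selected = set(), []
    for p in positions:
        if p not in seen:
            seen.add(p)
            selected.append(ordered[p])
    return selected

# Illustrative: 200 forms with theta estimates spread over [-3, 3].
forms = [(f"F{i}", -3 + 6 * i / 199) for i in range(200)]
chosen = select_forms_by_theta(forms, 14)
```

Selecting by quantile position rather than by raw theta value keeps the selection anchored to the frequency distribution, so regions where many students fall contribute forms at the same rate as the tails.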

  15. Comparisons 1: Content and Item Types

  16. Comparison 2: Item Exposure Rate

  17. Comparison 3: Use of Off-Grade Items

  18. Comparison 4: Content Balance of Item Pool

  19. Comparison 5: Item-Parameter in Item Pool

  20. Mapping of Person- and Item-Parameters

  21. Technical Issues in Alignment for CAT • In adaptive testing, each successive item is chosen by a set of constraints to maximize information at the estimated ability or to minimize the deviation of the information from a target value. • Among the many particular requirements for CAT, a sizeable and well-balanced item pool with regard to content and psychometric characteristics is a fundamental condition for success. • To realize the many advantages of CAT, the item pool must contain high-quality items that match the criteria of the item selection algorithm at many different levels of proficiency to provide adequate information. • In addition to content constraints, a constraint on item exposure rate is essential for item utility and test security.
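The maximum-information selection rule with an exposure cap, described above, can be sketched under the Rasch model, where an item's Fisher information at ability theta is P(1 - P) with P the probability of a correct response. This is a minimal sketch only: a real CAT engine would add content-balancing constraints, and the function and parameter names here are assumptions, not an actual implementation.

```python
import math

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta:
    I(theta) = P(1 - P), where P = 1 / (1 + exp(-(theta - b)))."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_item(theta, pool, administered, exposure, max_rate, n_examinees):
    """Pick the most informative unadministered item whose observed exposure
    rate stays below max_rate; content constraints are omitted here."""
    best, best_info = None, -1.0
    for item_id, b in pool.items():
        if item_id in administered:
            continue  # never repeat an item within a test
        if exposure.get(item_id, 0) / n_examinees >= max_rate:
            continue  # exposure control: skip over-used items
        info = rasch_info(theta, b)
        if info > best_info:
            best, best_info = item_id, info
    return best

# Illustrative pool of item difficulties (hypothetical values).
pool = {"A": -1.0, "B": 0.0, "C": 1.0}
print(next_item(0.1, pool, set(), {}, 0.25, 100))  # item nearest theta wins
```

Under the Rasch model, information peaks where item difficulty equals the ability estimate, so without the exposure cap the rule would repeatedly pick the same best-targeted items; the cap forces the algorithm down to less informative items, which is exactly the tension between measurement efficiency and exposure control noted above.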

  22. Technical Issues in Alignment for CAT • A deficit or excess of items for certain assessed content standard(s) creates the issue of under- and over-exposure of test items in the pool. • Under-exposed items are often overrepresented in the pool or lack the desirable characteristics needed to meet the constraints for item selection on the test. The presence of unused items in the pool is an unfortunate waste of resources. Over-exposed items not only jeopardize test security but may also result in positive bias in ability estimation, which seriously threatens the validity of high-stakes assessments. • In K-12 assessments, the large populations, the wide range of proficiency levels among students, the broader content coverage, and the high-stakes nature of the tests introduce tremendous technical challenges in the development and implementation of CAT.

  23. Criteria Commonly Used in Alignment

  24. A Brief Summary 1. Technical issues must be taken into account in the design, the process for item review, and the evaluation of alignment for computerized adaptive testing, such as test specifications, the algorithm and constraints for item selection, and item exposure rates. 2. Whether the alignment is based on the entire item pool or on a sample of items, appropriate inferences about the alignment of an adaptive test must be supported with evidence from both content/curriculum and technical perspectives. 3. The criteria that are commonly used in evaluating the alignment of linear tests may not be suitable for evaluating the alignment of computerized adaptive tests.
