
Technical Considerations in Alignment for Computerized Adaptive Testing


  1. Technical Considerations in Alignment for Computerized Adaptive Testing Liru Zhang, Delaware DOE Shudong Wang, NWEA 2014 CCSSO NCSA New Orleans, LA June 25-27, 2014

  2. Background of Alignment • Alignment is an important attribute in standards-based educational reform. Peer Review requires states to submit evidence demonstrating alignment between the assessment and its content standards (Standards and Assessments Peer Review Guidance, 2009) in addition to validity evidence. • For the Next Generation Assessment, alignment is defined as the degree to which expectations specified in the Common Core State Standards and the assessment are in agreement and serve in conjunction with one another to guide the system toward students learning what they are supposed to know and be able to do. • With new and innovative technology and the great potential of online tests, computerized adaptive testing (CAT) has been increasingly implemented in K-12 assessment systems.

  3. Alignment for Linear Tests Different approaches have been employed in the past decade for evaluating the alignment of state assessment programs. Their processes are similar in four ways: (1) use approved content standards and implemented tests; (2) review item-by-item from a developed linear test form; (3) use professional judgments for alignment; and (4) evaluate the degree to which each item matches the claim, standard, objective and/or topic, and performance expectations (e.g., cognitive complexity and depth of knowledge) in the standards.

  4. Linear Test vs. CAT Linear tests: • Test form(s) are assembled prior to operation based on the test specifications. • The fixed form(s) are used for all students. • Linear tests normally target a medium difficulty level. • The degree of alignment is determined based on the content of the test form(s) compared with the predetermined criteria. Adaptive tests: • Many unique test forms are assembled during testing for individual students. • Test difficulty for unique test forms varies greatly to match estimated student ability levels. • How to determine the degree of alignment for adaptive testing?

  5. Alignment for CAT In adaptive testing, the item pool is the source for assembling test forms for individual students; therefore, the alignment should be considered as the relationship between the item pool and the standards it is intended to measure. Given the adaptive nature of the test, technical issues must be considered in the design, process, and evaluation of alignment for CAT. • What kinds of technical issues should be considered? • Should the alignment be based on the item pool or on a sample of items? • How should a representative sample of items be selected so that it supports a fair inference about the item pool? • Is it appropriate to use the same procedure and criteria for linear tests to evaluate the alignment of an adaptive test?

  6. Technical Considerations To demonstrate the possible impact of factors such as the characteristics of the item pool, student proficiency levels, and item exposure rates on the alignment results, samples of items and individual test forms were selected from the grade 3 mathematics assessment. Step One: Seven item samples (50 each) were selected from the pool: • Two random samples • A weighted sample based on the four content strands and their relative proportions in the pool • Two samples selected based on the frequency distributions of person- and item-parameters as typical items • Two samples based on item exposure rate, excluding never-used items
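The first two sampling strategies above (simple random and strand-weighted) can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the strand names, pool sizes, and item records are hypothetical, and the weighted sampler simply allocates slots in proportion to each strand's share of the pool.

```python
import random
from collections import Counter

# Hypothetical item pool: each record carries a content strand and a Rasch
# difficulty (b); all names and counts here are illustrative, not the
# actual grade 3 mathematics pool.
pool = (
    [{"id": f"NUM{i}", "strand": "Numeric Reasoning", "b": random.gauss(0, 1)} for i in range(120)]
    + [{"id": f"ALG{i}", "strand": "Algebraic Reasoning", "b": random.gauss(0, 1)} for i in range(90)]
    + [{"id": f"GEO{i}", "strand": "Geometric Reasoning", "b": random.gauss(0, 1)} for i in range(60)]
    + [{"id": f"QNT{i}", "strand": "Quantitative Reasoning", "b": random.gauss(0, 1)} for i in range(30)]
)

def random_sample(items, n=50):
    """Simple random sample of n items from the pool."""
    return random.sample(items, n)

def weighted_sample(items, n=50):
    """Sample n items so strand proportions mirror those of the pool."""
    by_strand = {}
    for item in items:
        by_strand.setdefault(item["strand"], []).append(item)
    total = len(items)
    sample = []
    for strand_items in by_strand.values():
        k = round(n * len(strand_items) / total)  # proportional allocation
        sample.extend(random.sample(strand_items, k))
    return sample

sample = weighted_sample(pool, 50)
print(Counter(i["strand"] for i in sample))
```

With the illustrative pool above (120/90/60/30 items across four strands), a weighted sample of 50 allocates 20/15/10/5 items, preserving the 40/30/20/10 percent strand balance; with pool sizes that do not divide evenly, the rounding step would need a remainder-distribution rule.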

  7. Sample Selection

  8. Test Specifications and Constraints

  9. Comparison 1: Content Balance

  10. Comparison 2: Balance of Item Type

  11. Comparison 3: Item Difficulty Distribution

  12. Comparison 3: Item-Parameter Distribution

  13. Comparison 4: Item Exposure Rate

  14. Technical Considerations 2 Step Two: Fourteen individual test forms were selected based on the frequency distributions of estimated student ability (theta) from the grade 3 mathematics assessment.
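One simple way to select forms that span the ability distribution, sketched below, is to sort administered forms by their final theta estimate and pick the forms sitting at evenly spaced quantile positions. This is an assumed procedure for illustration; the study's actual selection rule is not specified in the slides, and the form records here are hypothetical.

```python
def select_forms_by_theta(forms, n_forms=14):
    """forms: list of (form_id, theta) pairs for administered test forms.
    Returns n_forms whose theta estimates span the observed distribution."""
    ordered = sorted(forms, key=lambda f: f[1])
    # Evenly spaced index positions across the sorted thetas (quantiles).
    positions = [round(i * (len(ordered) - 1) / (n_forms - 1)) for i in range(n_forms)]
    # Deduplicate while preserving order, in case positions collide
    # for small numbers of forms.
    seen, selected = set(), []
    for p in positions:
        if p not in seen:
            seen.add(p)
            selected.append(ordered[p])
    return selected

# Illustrative: 200 forms with theta estimates spread over [-3, 3].
forms = [(f"F{i}", -3 + 6 * i / 199) for i in range(200)]
chosen = select_forms_by_theta(forms, 14)
```

Selecting by quantile position rather than by raw theta value keeps the selection anchored to the frequency distribution, so regions where many students fall contribute forms at the same rate as the tails.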

  15. Comparisons 1: Content and Item Types

  16. Comparison 2: Item Exposure Rate

  17. Comparison 3: Use of Off-Grade Items

  18. Comparison 4: Content Balance of Item Pool

  19. Comparison 5: Item-Parameter in Item Pool

  20. Mapping of Person- and Item-Parameters

  21. Technical Issues in Alignment for CAT • In adaptive testing, each successive item is chosen by a set of constraints to maximize information at the estimated ability or to minimize the deviation of the information from a target value. • Among the many particular requirements for CAT, a sizeable and well-balanced item pool with regard to content and psychometric characteristics is a fundamental condition for success. • To realize the many advantages of CAT, the item pool must contain high-quality items that match the criteria of the item selection algorithm at many different levels of proficiency to provide adequate information. • In addition to content constraints, a constraint on item exposure rate is essential for item utility and test security.
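The maximum-information selection rule with an exposure cap, described above, can be sketched under the Rasch model, where an item's Fisher information at ability theta is P(1 - P) with P the probability of a correct response. This is a minimal sketch only: a real CAT engine would add content-balancing constraints, and the function and parameter names here are assumptions, not an actual implementation.

```python
import math

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta:
    I(theta) = P(1 - P), where P = 1 / (1 + exp(-(theta - b)))."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_item(theta, pool, administered, exposure, max_rate, n_examinees):
    """Pick the most informative unadministered item whose observed exposure
    rate stays below max_rate; content constraints are omitted here."""
    best, best_info = None, -1.0
    for item_id, b in pool.items():
        if item_id in administered:
            continue  # never repeat an item within a test
        if exposure.get(item_id, 0) / n_examinees >= max_rate:
            continue  # exposure control: skip over-used items
        info = rasch_info(theta, b)
        if info > best_info:
            best, best_info = item_id, info
    return best

# Illustrative pool of item difficulties (hypothetical values).
pool = {"A": -1.0, "B": 0.0, "C": 1.0}
print(next_item(0.1, pool, set(), {}, 0.25, 100))  # item nearest theta wins
```

Under the Rasch model, information peaks where item difficulty equals the ability estimate, so without the exposure cap the rule would repeatedly pick the same best-targeted items; the cap forces the algorithm down to less informative items, which is exactly the tension between measurement efficiency and exposure control noted above.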

  22. Technical Issues in Alignment for CAT • A deficit or excess of items for certain assessed content standard(s) creates the issue of under- and over-exposure of test items in the pool. • Under-exposed items are often overrepresented in the pool or lack the desirable characteristics needed to meet the constraints for item selection on the test. The presence of unused items in the pool is an unfortunate waste of resources. Over-exposed items not only jeopardize test security but may also result in positive bias in ability estimation, which seriously threatens the validity of high-stakes assessments. • In K-12 assessments, the large populations, the wide range of proficiency levels among students, the broader content coverage, and the high-stakes nature of the tests introduce tremendous technical challenges in the development and implementation of CAT.

  23. Criteria Commonly Used in Alignment

  24. A Brief Summary 1. Technical issues must be taken into account in the design, the process for item review, and the evaluation of alignment for computerized adaptive testing, such as test specifications, the algorithm and constraints for item selection, and item exposure rates. 2. Whether the alignment is based on the entire item pool or on a sample of items, appropriate inferences about the alignment of an adaptive test must be supported with evidence from both content/curriculum and technical perspectives. 3. The criteria that are commonly used in evaluating the alignment of linear tests may not be suitable for evaluating the alignment of computerized adaptive tests.
