Developing the Tests for NCLB: No Item Left Behind

Steve Dunbar, Iowa Testing Programs, University of Iowa

Test Development: A Technical Concern • Procedures are well-established – it’s sort of a ‘rocket-art’ • Aspects of ‘quality’ that seem distinct to an observer are inseparable to a developer • Quality control requires resources – talent, time, and money – to do well • Test development (TD) is the grunt work of assessment

Best Practice in Test Development • Interpret content standards; translate into test specifications • Search for stimulus material; draft items • Do the 3Rs: REVIEW-REVISE-REPLACE • Prepare material for field testing • Oops – we forgot about finding the kids to participate in field testing, and we need many comparable samples of them

More Best Practice in TD • Administer, retrieve, and score tryout materials; get item analysis results to TDers • Do the 3Rs: REVIEW-REVISE-REPLACE • Prepare more material for field testing • Oops – more kids for field testing, more comparable samples
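The item analysis results that go back to test developers usually include, for each tryout item, a difficulty index and a discrimination index. The sketch below is illustrative only and is not taken from the slides: it assumes items scored 0/1, uses a hypothetical function name `item_analysis` and a hypothetical data layout (one list of item scores per examinee), and uses the corrected point-biserial (item versus total on the remaining items) as the discrimination index.

```python
from statistics import mean, pstdev

def item_analysis(responses):
    """Classical item statistics for 0/1-scored tryout items.

    responses: list of per-examinee lists of item scores (hypothetical layout).
    Returns one dict per item with proportion correct (p) and the corrected
    point-biserial correlation (r_pb), or None when r_pb is undefined.
    """
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    results = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        p = mean(item)  # difficulty: proportion of examinees answering correctly
        # Rest score: total with the item itself removed, so the item
        # is not correlated with its own contribution.
        rest = [t - x for t, x in zip(totals, item)]
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            r_pb = None  # no variance: everyone right, everyone wrong, or flat rest score
        else:
            cov = mean([x * y for x, y in zip(item, rest)]) - p * mean(rest)
            r_pb = cov / (sd_item * sd_rest)
        results.append({"item": j, "p": p, "r_pb": r_pb})
    return results
```

Items with very high or very low p, or with low or negative r_pb, are the natural candidates for the REVISE and REPLACE steps. Any flagging thresholds would be a program decision and are not specified here.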

What do we get from Best Practice? • Something elusive (important content, interesting materials, good questions, cognitive complexity, comparability) • Something intangible (fairness, alignment with standards, intended consequences) • Something concrete (coverage, rater reliability, a validity or generalizability coefficient, acceptable cost)

Some TD Half Truths • Multiple-choice items: development is hard; scoring is easy (and public); quality control is built in to the TD process • Open-ended items: development is easy; scoring is hard (and private); quality control is elusive due to scoring

Comparability in Test Materials • Test form as the unit for judging comparability • Easy to achieve with many items on the test and many potential throwaways in the pool • Experienced test development staff • Good field testing and scoring needed

Group Differences and Fairness • TD seeks a balance • Tension is that balance requires questions, lots of them • Instructional influences are confounded with group effects • Differential item functioning (DIF) analysis requires good matching questions
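To illustrate why DIF depends on good matching questions, here is a minimal sketch of the Mantel-Haenszel procedure, one common DIF statistic. The slides do not say which DIF method is used, so this is an assumption. Examinees are stratified by a matching score built from the other items, and the odds of a correct answer are compared between a reference and a focal group within each stratum. The function name, the record layout, and the group labels `"ref"` and `"focal"` are hypothetical.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(records):
    """Mantel-Haenszel common odds ratio for one studied item.

    records: iterable of (group, matching_score, correct) tuples, where
    group is "ref" or "focal" (hypothetical labels) and correct is 0/1.
    Returns (alpha_mh, delta_mh), or None if the ratio is undefined.
    """
    # One 2x2 table per matching-score stratum:
    # A = ref correct, B = ref incorrect, C = focal correct, D = focal incorrect
    tables = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, score, correct in records:
        if group == "ref":
            key = "A" if correct else "B"
        else:
            key = "C" if correct else "D"
        tables[score][key] += 1

    num = den = 0.0
    for t in tables.values():
        n = t["A"] + t["B"] + t["C"] + t["D"]
        num += t["A"] * t["D"] / n
        den += t["B"] * t["C"] / n
    if num == 0 or den == 0:
        return None  # too little data in the strata to estimate the ratio

    alpha_mh = num / den                  # 1.0 means no DIF after matching
    delta_mh = -2.35 * math.log(alpha_mh) # ETS delta scale; negative disfavors the focal group
    return alpha_mh, delta_mh
```

With few matching items, the strata are coarse and sparsely filled, so the estimate becomes unstable or undefined. That is the practical sense in which the matching questions have to be good, and plentiful.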

Cost Factors in Large-Scale Testing • Development costs: recur with each test form; are fixed by instrument design • Scoring costs: recur with each test administration; may change because of ‘unexpected’ circumstances

Validity in Test Development • Best practice ensures content quality, balance, and alignment with standards – critical aspects of validity & reliability • TD is predicated on anticipated use • Other aspects of validity & reliability aren’t understood until it’s too late, i.e. when the test is operational

Validity & Capacity in NCLB • NCLB is census testing • Census testing places heavy demands on TD and other aspects of an accountability system • Limit on capacity in TD means: only 1R or 2Rs; fewer rounds of field testing; dwindling pools of test materials • No item left behind
