
Comments & Suggestions for the RTTT Assessment Competition

Presentation Transcript


  1. Comments & Suggestions for the RTTT Assessment Competition Scott Marion National Center for the Improvement of Educational Assessment Race to the Top Assessment Public and Expert Input Meeting Washington, DC January 20, 2010

  2. Introductory Comments • Ends, not means • Tough choices • Theory of action • Structure response as an RFP • Curriculum and instruction

  3. Ends and Means • Questions and other USED documents either imply or propose to require a specific way of doing things • Unless USED is absolutely sure that this is the only, or even the best, way (for all contexts) of accomplishing the goals, then: • Be exceptionally clear about the goals of the proposed system(s) • As we discussed last fall, to the extent possible, clarify the purposes and uses of the assessment system, AND • Allow the smart proposal writers to be creative (innovative) about the means • If you are vague, the writers will be even more vague, and you lower the chances of getting what you want

  4. Tough choices • Again, trying to read into the questions and other documents about the forthcoming NIA, it appears that USED will be asking consortia for… • Innovative assessment practices • Broad implementation • Fast timelines • (otherwise known as “all of the above”) • Something will give! • Follow Lorrie Shepard’s guidance from December 2009 • Allow consortia to propose to do one (relatively) small thing well, e.g., create an innovative assessment system for grades 4-8 mathematics in only a few states

  5. Tough choices & ends/means • Again, trying to read into the questions for today, several are asking for things where both tough choices and ends/means come together… • Question 2 asks about increasing the rigor and quality of high school assessments • Question 3 asks about moving to computer-based testing • Requiring #2 might hinder #3, and vice versa

  6. A Theory of Action • Before finalizing the RFP, USED should articulate a clear and explicit theory of action • All respondents MUST articulate an explicit theory of action evident in their proposal • Describes how the particular CLEAR goals will be achieved as a result of the particular assessment system(s) • Specific mechanisms—how do USED, states, and consortia expect we will get from A to B? What is the evidence to support this expectation? • Explicitly describes prioritized design choices, e.g.: • Influencing and shaping teaching and learning, OR • Measuring existing knowledge, OR • Making cross-state comparisons • The theory of action is a check on the logic of the underlying assumptions

  7. Response as an RFP • Operational requirements of any multi-state consortium are overwhelming • No state or set of states has the capacity to design and implement a multi-state assessment system • Consortium grantees will have to issue RFPs to support the design, development, and piloting of the many assessments • Therefore, I suggest requiring that potential consortia structure their response as an RFP • A well-written RFP will make the goals, rationale, and design clear to potential bidders • It will reveal to USED reviewers the extent to which the proposers have thought through the many aspects of the proposed assessment system • An alternative would be to provide enough clarity and lead time (I doubt you can) so that consortia can actually include contractors in their proposals • Clarity on cost and evaluation • What can RTTT dollars be spent on, and what’s off limits?

  8. Curriculum and Instruction? • USED has managed to avoid any mention of curriculum and instruction • While that might be a political necessity, it doesn’t make any practical or conceptual sense if the goal is to really move our educational system forward • I recommend requiring all proposals to at least address how their assessment model accounts for considerable differences in curriculum and instruction across districts and states • How do the proposers think these differences will affect the validity of their assessment results? • How do they propose to deal with these differences, if at all? • How do they think their assessment model will further meaningful goals if they do not deal with curriculum and instruction?

  9. Question 1 (through-course) • While I find some aspects of this approach appealing, I would not require this specific approach in the response • What are we (USED, potential consortia) trying to accomplish with the through-course approach? • All proposals (whether using through-course or not) should be required to submit evidence/rationale in all six categories outlined in the question • The “through-course” system carries with it some unique considerations/sources of evidence compared to a more traditional summative assessment • Inter-rater reliability is NOT one of these additional concerns

  10. Consortia proposing a through-course approach should have to explain/provide additional evidence for: • Construct validity: How would this approach enhance the validity of the score interpretations? • Aggregation: We know very little—other than trying to maximize reliability—about how best to aggregate the scores from the multiple events for both students and schools • OTL: How will the states/consortia deal with potentially increased effects due to differences in OTL? • Security: Assuming the through-course components are used for accountability • Consequences: How will the consortium deal with the potential (likely) negative effects when educators are restricted from using the full potential of the through-course components for instructional improvement? • Equating: How will the consortium overcome the significant challenges to valid equating of scores from year to year? (a minimal linking sketch follows this slide)
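
A note on the equating bullet above: "equating from year to year" in practice means estimating linking constants that place a new form's scale onto the old one. The sketch below is purely illustrative, assuming a common-anchor design, an IRT calibration, and made-up anchor difficulties; it shows mean-sigma linking, one textbook method, not a recommendation for any consortium.

```python
# Illustrative sketch only: mean-sigma linking of anchor-item difficulties,
# one textbook way to place a new form's IRT scale onto last year's scale.
# The anchor-item values below are made up for illustration.
from statistics import mean, stdev

# Difficulty estimates (b-parameters) for the same anchor items,
# calibrated separately on last year's and this year's forms.
b_old = [-1.20, -0.45, 0.10, 0.65, 1.30]   # hypothetical values
b_new = [-1.05, -0.30, 0.25, 0.80, 1.45]   # hypothetical values

# Mean-sigma linking constants: a new-scale theta maps to the old scale via A*theta + B.
A = stdev(b_old) / stdev(b_new)
B = mean(b_old) - A * mean(b_new)

def to_old_scale(theta_new: float) -> float:
    """Place an ability estimate from this year's calibration onto last year's scale."""
    return A * theta_new + B

print(f"A = {A:.3f}, B = {B:.3f}, theta 0.50 -> {to_old_scale(0.50):.3f}")
```

The hard part for a through-course design is that the "forms" differ in timing and content across years, so even this simple machinery rests on assumptions the consortium would have to defend.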

  11. Question 2 (HS EOC rigor) • Pleased that USED is thinking about quality and rigor in high school • Why “common end-of-course summative exams”? • What is the unit for “common”? The school, district, state, or consortium? • The through-course approach from Q1 can potentially enhance rigor and validity. Why limit it to EOC? • More important to focus on rigor and less on “consistent” or “common” • We have written about the tradeoffs between standardization and flexibility in other contexts (Gong & Marion, 2006), and many of these considerations can be applied here • Consider Amy Gutmann’s conception of “threshold”

  12. Requiring evidentiary support for “rigor” • Provide evidence that students’ performance meets a meaningful threshold • What is the system of review for rigor and technical quality—how does the consortium propose to make this work within and across states? • Has the consortium addressed the balance between flexibility and standardization and offered a convincing case for where they stand? • It would help if USED would clearly signal what they think is the right balance between standardization and flexibility • How will the state/consortia ensure that students have a fair opportunity to meet rigorous thresholds? Validity is threatened if OTL is not provided

  13. Question 3 (CBT/CAT) • Again, don’t require! What are we trying to accomplish with CBT/CAT? • If USED focuses too much on comparability (e.g., CBT/PBT), they will stifle innovation • We can’t even ensure computer-to-computer comparability within a single state! • Infrastructure issues are daunting • CBT offers considerable potential for enhancing access for SWD, but it could also increase construct-irrelevant variance • Nimble Tools and others have demonstrated the potential of doing it well

  14. Evidence to support CBT • Are the items designed (or at least working toward a design) to take advantage of technological capability, or are they simply saving paper? • How will the consortium states move to full implementation of CBT so they can begin using innovative item types? • What types of designed-in (not add-on) approaches is the consortium proposing for increasing access for SWD and ELL beyond what is available with paper? • How will the consortium states avoid the negative consequences of the loss of computers and computer time for instructional purposes?

  15. Additional evidence for CAT • How will the consortium determine the optimal size of the potential item bank? How is this concern influenced when designing for multiple states instead of a single state? • How will the consortium monitor potential parameter and scale drift over time? • How will the CAT be designed to provide instructionally useful information (beyond a scale score)? • How will the technical aspects of the item selection algorithms be monitored across multiple states? (a minimal selection sketch follows this slide) • Will “out-of-grade” level items be allowed? • If not, the potential of CAT is limited considerably, at least for one purpose • Of course, this must be balanced with social justice concerns
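
To make "item selection algorithm" concrete for readers outside psychometrics, here is a minimal, purely illustrative sketch of the simplest textbook rule a CAT might use: pick the not-yet-administered item with maximum Fisher information under a 2PL model. The item bank and parameters are invented for illustration; operational CATs layer on exposure control, content balancing, and out-of-grade constraints like those raised above.

```python
# Illustrative sketch only: maximum-information item selection, the simplest
# form of the "item selection algorithm" a CAT would need to document.
# Item parameters are hypothetical 2PL values (a = discrimination, b = difficulty).
import math

bank = {
    "item01": (0.9, -1.5), "item02": (1.2, -0.5),
    "item03": (1.6,  0.0), "item04": (1.1,  0.8),
    "item05": (0.8,  1.6),
}

def prob_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta: float, administered: set[str]) -> str:
    """Pick the unadministered item that is most informative at the current estimate."""
    candidates = (i for i in bank if i not in administered)
    return max(candidates, key=lambda i: information(theta, *bank[i]))

print(next_item(theta=0.2, administered={"item02"}))  # -> "item03" for these values
```

Even this bare-bones rule makes the monitoring question above concrete: every consortium state would be running the same selection logic against a shared bank, so drift in item parameters directly changes which items students see.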

  16. Question 4 (innovation & timeline) • Don’t encourage (or fund) grants that do not move down a path toward innovation; therefore, require… • States/consortia to clearly articulate a vision of what they hope to accomplish with their educational system in 10 years and provide evidence/justification for how the proposed assessment system will support this vision • A “map” and a “route” • A theory of action that describes how the states/consortium will be able to stay on the route • Evidence that states have taken steps to avoid “painting themselves into a corner” • Another reason for funding multiple consortia: tackling manageable programs!

  17. Question 5 (research priorities) • The statistical machinery of VAM has been well-studied and does not need special funding (more research won’t correct for non-random assignment!) • However, related areas do need funding… • The design and validity of learning progressions to support both formative assessment and measuring growth with summative assessments • Assessment designs that allow for meaningful depictions of student progress (particularly related to such learning progressions) • How to improve the quality and usefulness of VAM and growth results • For instructional improvement and accountability • How to integrate VAM results with observational evidence to make valid judgments about educator quality • How to better deal with “attribution” challenges, particularly in secondary school

  18. Question 5 (research priorities #2) • We learned a lot about the generalizability of performance assessments during the 1990s • We could certainly stand to learn a lot more about: • Integrating performance assessment scores (especially if given at a different time of year) with scores from a range of summative assessment types • Equating designs with performance or mixed item types • If we can learn to do equating well, we can then more readily include performance assessments as part of growth measures • Design specifications/requirements for rich and engaging performance tasks

  19. Question 5 (additional research priorities) • External accountability systems: Do they work to achieve policy priorities? Do other forms of school reform work better? • Equating test scores when so much is changing • Validating “college ready” measures—how do we know when we’ve reached “good enough”? • Learning progressions—will require a massive development and validation program

  20. For more information • Formal comments will be submitted by January 29, 2010, and will be available on request: smarion@nciea.org • www.nciea.org
