Joseph A. Martineau, Ph.D. Vincent J. Dean, Ph.D. Michigan Department of Education

A State Perspective on Enhancing Assessment & Accountability Systems through Systematic Integration of Computer Technology Joseph A. Martineau, Ph.D. Vincent J. Dean, Ph.D. Michigan Department of Education Presentation at the tenth annual Maryland Assessment Conference October 2010

The Michigan Stage • Michigan offers an interesting perspective • Pilot in 2006 • Pilot in 2011 (English Language Proficiency) • Pilot in 2012 (Alternate Assessments) • Pilots leading up to operational adoption of SMARTER/Balanced Assessment Consortium products in 2014/15 • Constitutional amendment barring unfunded mandates

The National Stage • Survey of state testing directors (+D.C.) • 50 responses + one investigation via state department of education website • 7 of 51 states have no CBT initiatives • 44 of 51 states have current CBT initiatives, including: • Operational online assessment • Pilot online assessment • Plans for moving online

The National Stage, continued… • Survey of state testing directors (+D.C.) • CBT initiatives include • Teacher entry of student responses online • Student entry of responses online • P&P replication • CAT • AI scoring • MC via internet, CR via paper and pencil • General populations (grade level and end of course) • Special populations (eases infrastructure concerns) • Modified • Alternate • English language proficiency • Online repository and scoring of portfolio materials • Item banks for flexible unit-specific interim assessment • Initiatives are all over the board, piecemeal for the most part

The National Stage, continued… • Survey of state testing directors (+D.C.) • Of 44 states with some initiative • 26 states currently administer large-scale general populations assessments online • 15 states have plans to begin (or expand) online administration of large-scale general populations assessments • 12 states currently administer special populations assessments online • 3 states have plans to begin (or expand) online administration of special populations assessments

The National Stage, continued… • Survey of state testing directors (+D.C.) • Of 44 states with some initiative • 7 states currently use Artificial Intelligence (AI) scoring of constructed response items • 4 states currently use Computer Adaptive Testing (CAT) technology for general populations assessment, with one more moving in that direction soon • 0 states currently use CAT technology for special populations assessment • 10 states offer online interim/benchmark assessments • 10 states offer online item banks accessible to teachers for creating “formative”/interim/benchmark assessments tailored to unique curricular units

The National Stage, continued… • Survey of state testing directors (+D.C.) • Of 44 states with some initiative • 6 states offer computer based testing (CBT) options on general populations assessment as an accommodation for special populations • 4 states report piloting and administration of innovative item types (e.g. flash-based modules providing mathematical tools such as protractors, rulers, compasses) • 16 states offer End of Course (EOC) tests online, or are implementing online EOC in the near future • 6 states report substantial failure of large-scale online testing, resulting in cessation of computer based testing • Some have recovered and are moving back online • Others have no plans to return to online testing

The National Stage, continued… • Development of the Common Core State Standards (CCSS) • Content standards (not a test) • English Language Arts (K-12) • Mathematics (K-12) • Developed with backing from 48 states • Adoption tally • Adopted in full by 39 states • Adoption declined in 5 states • Adoption expected by remaining 6 states by end of 2011

The National Stage, continued… • Assessment Consortia • Race to the Top Assessment Competition • Development of an infrastructure and content for a common assessment in measuring CCSS in English Language Arts and Mathematics • Two consortia • SMARTER/Balanced Assessment Consortium (SBAC) • Partnership for the Assessment of Readiness for College and Career (PARCC)

The National Stage, continued… • The consortia: • SMARTER/Balanced • 31 states • 17 governing states • CAT beginning in 2014-2015 • PARCC • 26 states • 11 governing states • CBT beginning in 2014-2015

Consortia Membership

The National Stage, Summary • State efforts have been, with few exceptions, piecemeal by… • Program • Content area • Grade level • Type of assessment (summative, interim, formative) • Population (general, modified, alternate) • Most states are… • Involved in some kind of pilot or operational use • Intending to be operational on a large scale by 2014-2015 • Experiencing budget crises… • That make transitions difficult • That make efficiencies of technology integration critical • A strong need to take a systems look at how to integrate computer technology into assessment and accountability systems • Technology integration is a significant opportunity to provide a platform that connects all initiatives

The Organizing Framework for this Paper • From… • Martineau, J. A., & Dean, V. J. (in press). Making Assessment Relevant to Students, Teachers, and Schools. In V. Shute & B. J. Becker (Eds.), Innovative Assessment for the 21st Century: Supporting Educational Needs. New York, NY: Springer-Verlag. • Figure 1

Entry Points

Outcomes

The Organizing Framework for this Paper, continued… • With a comprehensive system in place, it is possible to identify comprehensively where integration of technology will enable and enhance the system • Components identified with bold outlines on the next slide

Starting from the Bottom Up • Professional Development • Current lack of pre-service and in-service balanced assessment training • Need for rapid scale up to millions of educators on a small budget

Technology Integration into Pre- and In-Service Professional Development • Scaling up is only feasible with integral use of technological tools • High-quality online courses • Social networking among educators • Live tele-coaching • Electronic (graphic, audio, video) capture for distance streaming of materials, plans, and instructional practice vignettes over high speed networks • To facilitate discussion regarding instructional practice between • Candidates and instructor/coach • Candidates and mentor • Mentors and instructor/coach • For example, repurposing Idaho’s special portfolio submission system for educator training

Moving to Content & Process Standards • Start with a limited set of high school exit standards based on college and career readiness • From that, develop K-12 content/process standards in a logical progression to college and career readiness • Based on the learning progressions and K-12 content/process standards, develop model instructional materials

Model Instructional Materials Clearinghouse • Develop online clearinghouse of materials for model curriculum and instructional units • Lesson plans • Lesson materials • Video vignettes of high quality instructional practices based on those units • Flexible platform to accept user submission in a variety of formats • User moderated ratings of submission quality

Moving to Assessment Practices • Before actually moving into assessment practices, it is important to classify content standards in three ways: • Timing • On-demand, time limited • On-demand, not time limited • Feedback-looped • Task type • Selected response • Short constructed response • Extended constructed response • Performance events • Setting • Classroom only • Classroom and secure • Based on these classifications, several types of assessment take place
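The three-way classification above can be captured as a small data structure. The following is a minimal sketch, not part of the original presentation: the class names, the `standard_id` field, and the routing function are hypothetical, and the routing rule is an assumption drawn from later slides (feedback-looped standards go to classroom-based portfolios, other standards that may be measured in a secure setting go to secure online testing).

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    ON_DEMAND_TIMED = "on-demand, time limited"
    ON_DEMAND_UNTIMED = "on-demand, not time limited"
    FEEDBACK_LOOPED = "feedback-looped"


class TaskType(Enum):
    SELECTED_RESPONSE = "selected response"
    SHORT_CONSTRUCTED = "short constructed response"
    EXTENDED_CONSTRUCTED = "extended constructed response"
    PERFORMANCE_EVENT = "performance event"


class Setting(Enum):
    CLASSROOM_ONLY = "classroom only"
    CLASSROOM_AND_SECURE = "classroom and secure"


@dataclass(frozen=True)
class StandardClassification:
    standard_id: str  # hypothetical identifier, not from the slides
    timing: Timing
    task_type: TaskType
    setting: Setting


def large_scale_route(c: StandardClassification) -> str:
    """Assumed routing rule, inferred from the later slides."""
    if c.setting is Setting.CLASSROOM_ONLY:
        # Never leaves the classroom, so no large-scale component.
        return "classroom assessment"
    if c.timing is Timing.FEEDBACK_LOOPED:
        # Needs a teacher-student feedback cycle.
        return "classroom-based portfolio"
    return "secure online test"
```

Classifying every standard once, up front, lets each later component (item bank, portfolio repository, secure test) filter on these three fields rather than re-deciding per assessment.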

Assessment Practices, continued… • Start with model classroom materials and tools • Initial development of model materials, vignettes, strategies, and tools sets the stage for…

Educator submissions to • Populate online clearinghouse of materials for model classroom assessment practice units • Summative assessment materials • Formative assessment vignettes, strategies, and tools • Flexible platform to accept user submission in a variety of formats • User moderated ratings of submission quality • Non-secure item bank generated by educators • Platform supports various item types • User moderated ratings of submission quality • Large enough that security is not a concern • Empirically designed MC items • Fully customizable

Which in Turn Leads to… • Implementation of formative assessment practices enhanced by technological aids, such as • Response devices (e.g., clickers, tablet computers, phones) • Rapid response to teacher queries over online systems • Remote response to formative queries (e.g. rural areas and cyberschools)

Which in Turn Leads to… • Selection or development of summative classroom assessments • On-demand micro-benchmark (small unit) assessments • From non-secure item bank generated by educators • Customizable to fit specific lesson plans/curricular documents • Instant reporting for diagnostic/instructional intervention purposes • Inform targeted professional development in real time • RESULTS NOT used for large-scale accountability purposes (belongs to the schools and teachers)

With High-Quality Classroom Assessment Practices in Place • Large-scale assessment now makes sense, with three types of large-scale assessment

Large-Scale Assessment, continued… • Start with classroom-based • For content standards best measured using “feedback-looped” tasks • Meaning content standards (likely higher order) that are best accomplished with a feedback cycle between teacher and student

Portfolio Development & Submission, continued… • Creation of portfolio includes scannable materials, electronic documents, and/or audio/video of student performance • Submitted via a secure online portfolio repository (e.g., Idaho’s alternate assessment portfolio submission site) • Unlikely to be scorable using AI, therefore, scored on a distributed online scoring system that prevents teachers from scoring their own students’ portfolios (e.g., Idaho’s alternate assessment portfolio scoring site) • Can be scored both for final product and development over time
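The constraint that teachers never score their own students' portfolios can be illustrated with a short assignment routine. This is a hypothetical sketch only: it does not describe how Idaho's scoring site works, and the function name, the input shapes, and the least-loaded balancing rule are all assumptions made for illustration.

```python
from collections import Counter


def assign_scorers(portfolios, scorers):
    """Assign each portfolio to a scorer other than the student's teacher.

    portfolios: dict mapping portfolio id -> id of the student's own teacher
    scorers:    list of educator ids certified to score (hypothetical input)
    Returns a dict mapping portfolio id -> assigned scorer id.
    """
    load = Counter({s: 0 for s in scorers})
    assignment = {}
    for pid in sorted(portfolios):
        # Exclude the student's own teacher from the eligible pool.
        eligible = [s for s in scorers if s != portfolios[pid]]
        if not eligible:
            raise ValueError(f"no eligible scorer for portfolio {pid}")
        # Assumed balancing rule: least-loaded scorer, ties broken by id.
        chosen = min(eligible, key=lambda s: (load[s], s))
        assignment[pid] = chosen
        load[chosen] += 1
    return assignment
```

A production system would also need second reads and conflict rules beyond the teacher-of-record check (for example, same school or district), which this sketch leaves out.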

Moving to Secure Online Testing • For content standards that do not require “feedback-looped” tasks • Dynamic online CAT assessments • Based on dynamically selected clusters of content standards covered in instructional units • Scaled to the same scale as the end-of-year assessment, with cut scores for mastery/proficiency • Can move students on to higher grade level content once mastery/proficiency of all grade level content is demonstrated through unit assessments • What Race to the Top Assessment Competition calls “Through-Course Assessment”
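To make the CAT idea concrete, the sketch below shows one common textbook approach to adaptive item selection: under a Rasch (one-parameter logistic) model, administer the unused item with maximum information at the current ability estimate. The presentation does not specify a measurement model or selection rule, so the model, the function names, the item bank layout, and the grid-search estimator are all assumptions.

```python
import math


def p_correct(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta.

    bank: dict mapping item id -> difficulty (hypothetical layout)
    """
    candidates = {i: b for i, b in bank.items() if i not in administered}
    return max(candidates, key=lambda i: item_information(theta, candidates[i]))


def estimate_theta(responses, bank):
    """Grid-search maximum likelihood ability estimate over [-4, 4].

    responses: dict mapping item id -> True (correct) or False (incorrect).
    With all-correct or all-incorrect responses the estimate sits at the
    grid boundary, so operational CATs typically use a different estimator
    early in the test.
    """
    grid = [x / 100.0 for x in range(-400, 401)]

    def loglik(theta):
        total = 0.0
        for item, correct in responses.items():
            p = p_correct(theta, bank[item])
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=loglik)
```

Because the unit assessments are scaled to the end-of-year scale, an ability estimate of this kind could be compared directly against the mastery/proficiency cut score; the comparison itself is omitted here since the slides give no cut values.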

Moving to Secure Online Testing • What Race to the Top Assessment Competition calls “Through-Course Assessment” • Provides advance look at trajectory toward proficiency • Provides multiple opportunities to demonstrate proficiency • More equitable for high-stakes accountability purposes • Useful for mid-year correction in instructional practice (e.g. Response to Intervention) • Useful for placement purposes of newly arrived students • Useful for differentiated instruction • Anticipated increase in educator motivation (because of timely information)

Moving to Secure Online Testing • Beyond traditional CAT/CBT • AI Scoring of constructed response items • Technology enhanced items • Performance tasks/events (through simulations) • Gaming type items

Moving to Secure Online Testing • For three groups of students… • Initial scaling and calibration group • Ongoing randomly selected validation groups (to validate that students proficient on all required unit tests retain proficiency at the end of the year) • Students who do not achieve proficiency on all required unit tests • Final opportunity to demonstrate overall proficiency if proficiency was in question on any single unit assessment • Allows for the elimination of a single end-of-year test for most students
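The three groups above amount to a routing rule for who sits the end-of-year test. The following is a minimal sketch under stated assumptions: the function names, the sampling fraction, the fixed seed, and the order in which the three conditions are checked are all hypothetical choices, not details from the presentation.

```python
import random


def draw_validation_sample(proficient_ids, fraction, seed=0):
    """Randomly select a share of already-proficient students for validation.

    fraction and seed are hypothetical parameters; the slides give no rate.
    """
    rng = random.Random(seed)
    ids = sorted(proficient_ids)
    k = round(len(ids) * fraction)
    return set(rng.sample(ids, k))


def end_of_year_reason(student_id, unit_results, calibration_ids, validation_ids):
    """Return why a student takes the end-of-year test, or None if exempt.

    unit_results: dict mapping unit name -> True if proficient on that unit.
    """
    if student_id in calibration_ids:
        return "calibration"
    # A student with no unit results is treated as not yet proficient.
    if not unit_results or not all(unit_results.values()):
        return "final opportunity"
    if student_id in validation_ids:
        return "validation"
    return None
```

Every student for whom the function returns `None` has demonstrated proficiency on all required unit tests and is not sampled, which is the population for whom the single end-of-year test is eliminated.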

Scoring • Maximize objective scoring by • Automated scoring of objective items • AI scoring of extended written response items, technology enhanced items, and performance tasks wherever possible • Distributed hand-scoring of tasks not scorable using AI

Distributed Scoring as Professional Development • Human scorers taken from ranks of educators • Online training on hand-scoring • Online certification as a hand-scorer • Online monitoring of rater performance • Validation hand-scoring of samples of AI-scored tasks • Our experience with teacher-led scoring and range-finding indicates that it is some of the best professional development that we provide to educators
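Online monitoring of rater performance is often summarized with exact and adjacent agreement rates against reference scores (for example, validation papers with known scores). The sketch below illustrates that calculation; the function names and the retraining thresholds are hypothetical and are not taken from the presentation.

```python
def agreement_rates(rater_scores, reference_scores):
    """Return (exact, adjacent) agreement rates between two score lists.

    Exact: scores are identical. Adjacent: scores differ by at most one point.
    """
    if not rater_scores or len(rater_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater_scores)
    pairs = list(zip(rater_scores, reference_scores))
    exact = sum(r == ref for r, ref in pairs)
    adjacent = sum(abs(r - ref) <= 1 for r, ref in pairs)
    return exact / n, adjacent / n


def flag_for_retraining(rater_scores, reference_scores,
                        min_exact=0.6, min_adjacent=0.9):
    """Flag a rater whose agreement falls below illustrative thresholds."""
    exact, adjacent = agreement_rates(rater_scores, reference_scores)
    return exact < min_exact or adjacent < min_adjacent
```

The same comparison could be applied to the validation hand-scoring of AI-scored tasks by passing the AI scores as one list and the human scores as the other.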

Reporting • For the most part, reports are difficult to read and poorly used • Need online reporting of all scores for all stakeholders, including: • Policymakers (aggregate) • Administrators (aggregate and individual) • Teachers (aggregate and individual) • Parents (aggregate and individual) • Students (individual)
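The stakeholder list above maps each audience to the level of detail it should see, which is essentially an access table. A minimal sketch follows, assuming hypothetical role names and a simple lookup; a real reporting portal would also need to scope "individual" access to the right students (a parent's own child, a teacher's own class), which is not modeled here.

```python
# Report levels each stakeholder group may view, as listed on the slide.
REPORT_ACCESS = {
    "policymaker": {"aggregate"},
    "administrator": {"aggregate", "individual"},
    "teacher": {"aggregate", "individual"},
    "parent": {"aggregate", "individual"},
    "student": {"individual"},
}


def can_view(role, report_level):
    """Return True if the role may view the given report level.

    Unknown roles get no access by default (an assumed design choice).
    """
    return report_level in REPORT_ACCESS.get(role, set())
```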

Reporting Portal • Reporting portal needs to be able to integrate reports from classroom metrics all the way to large-scale secure assessment metrics

Reporting Portal • Reporting cycles depend on the item types and application of AI scoring. • Immediate where possible • Expedited hand-scoring (shifting funding focus from printing, shipping, and scanning to on-demand hand-scoring)

Where the Rubber Hits the Road • This is a nice system design (if we do say so ourselves), but what are the impediments to implementation? • Infrastructure • LEA hardware and bandwidth capacity • Assessment vendor capacity • Moving from piecemeal components to an integrated, coherent system • Development of educator-moderated clearinghouses • Development of educator-moderated item bank

Where the Rubber Hits the Road • Security • The more high-stakes the system, the more likely security breaches become • Critical need for training on user roles • Critical need for training on data use, since data will become much more readily available across the board • Security controls versus open-source and maximal access

Where the Rubber Hits the Road • Funding • Very high initial startup investment • Dual systems during development and initial implementation • Ramping up LEA technology systems to be capable of working within the system

Where the Rubber Hits the Road • Sustainability • Requires perpetual investment in administration • Development is only the start (e.g. sustainability concerns regarding RTTT-funded assessment consortia) • Requires early success and public understanding of the benefits of the system weighed against ongoing costs • Recurring hardware/software technology upgrade costs for LEAs • Recurring hardware/software technology maintenance costs for central IT systems

Where the Rubber Hits the Road • Local Control • This kind of system is only possible to create with significant funding and local buy-in • No single state (let alone district) could afford the cost of development and implementation • Consortia are imperative to creating such a system • Consortia can tend toward self-perpetuation rather than serving their members • Consortia cannot ignore local nuances • Consortia cannot ignore reasonable needs for flexibility • Consortia must monitor and maximize member investment

Where the Rubber Hits the Road • Building an appetite for online systems • Implementation may occur piecemeal, but should be undertaken within a framework for a coherent and complete system • Each piece when implemented needs to be implemented in such a way that local educators and policymakers see a positive impact on the educational system, e.g., • Immediate turnaround of results • Connection between family and school • Improved instructional practice • Facilitation of differentiated instruction

Recommendations for Future Directions • System has the potential to make us data-rich and analysis-poor • Build local (SEA and LEA) capacity for appropriate analysis (possibly through re-defining positions that might be eliminated through consortia services) • New practices (e.g. through-course, innovative item types, AI scoring) will require a significant research and validation agenda, including • Equating • Comparability • Standard setting

Recommendations for Future Directions • System has the potential to make educators and students data rich • Portfolios of assessment results and products as evidence of students’ college and career readiness • Portfolios of assessment results and products as evidence of teacher classroom practices and effectiveness

Recommendations for Future Directions • Financial incentives from ARRA/RTTT have provided the impetus for some of these initiatives to get started • Sustainability needs to be a focus both within and across states • To maximize cross-state focus, we recommend continued significant funding of initiatives through ESEA reauthorization, Enhanced Assessment Grants, and other competitive/formula funding opportunities
