
Designing and Implementing School Level Assessments with District Input



    1. Designing and Implementing School Level Assessments with District Input. John Sabatini, Kelly Bruce, and Srinivasa (Pavan) Pillarisetti, Educational Testing Service. This research is funded in part by the Institute of Education Sciences (R305G04065). Any opinions expressed in this article are those of the authors and not necessarily those of Educational Testing Service. Email: jsabatini@ets.org. Notes: Introduce and explain partnerships with CFL, Lindy Boggs Alliance, other literacy providers.

    2. I would like to acknowledge: the Strategic Educational Research Partnership (SERP) Institute; the BPS Design Team (esp. David Francis); Boston University (Gloria Waters and David Caplan); Harvard University (Catherine Snow, Sky Marietta, Claire White, Joshua Lawrence); Boston Public Schools; and Brockton Public Schools.

    3. Design Team. Design Team Assessment Subgroup: Drs. David Francis, University of Houston; Gloria Waters, Boston University; and John Sabatini, ETS. Charge: advise SERP and BPS on ways in which the assessment of reading could be made more efficient and productive in the district. Notes: To get at fluency in students, our idea is to see what works best for whom.

    4. Needs Assessment. In initial Design Team meetings, we learned that district leaders had made significant investments in reading intervention products and in teacher professional development to support literacy. The district also had state test results, and students took lots of tests (mostly mandated).

    5. Problem Definition. However, there were no consistent reading/literacy instruments for determining the nature or severity of reading problems, identifying the prevalence and profiles of struggling readers, or receiving timely results. Hence: inefficient placement of students into intervention programs, and weak or insensitive measures of the effectiveness of interventions that target subskills.

    6. Aims. Short-term: build a battery of screening/diagnostic assessments for school-wide use; estimate the prevalence and nature of student reading difficulties. Long-term: replace other redundant assessments; triage students for specialized testing, thus reducing the total time spent on assessment; use the instruments to evaluate intervention programs. Notes: For testing, explain that we meet for about 2 1/2 to 3 hours, so it can be exhausting, and give them $25 for their time for the pre-testing and then $25 for the post-testing (a shorter, 45-minute battery).
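The triage aim can be made concrete with a small screening rule. The Python sketch below is only an illustration under stated assumptions: the subtest names, the cut scores, and the rule "refer a student when any component falls below its cut" are all hypothetical and are not taken from the battery described here.

```python
# Minimal sketch: flag students for follow-up (specialized) testing when any
# component subtest score falls below a cut score. The subtest names and cut
# values are HYPOTHETICAL placeholders, not values from the actual battery.
HYPOTHETICAL_CUTS: dict[str, float] = {
    "decoding": 15.0,
    "word_recognition": 18.0,
    "vocabulary": 20.0,
}


def weak_components(scores: dict[str, float], cuts: dict[str, float]) -> list[str]:
    """Return the subtests (sorted by name) on which a student scored below the cut."""
    return sorted(name for name, cut in cuts.items() if scores[name] < cut)


def triage(
    students: dict[str, dict[str, float]],
    cuts: dict[str, float] = HYPOTHETICAL_CUTS,
) -> dict[str, list[str]]:
    """Map each student who needs follow-up testing to their weak components.

    Students at or above every cut are left out of the result.
    """
    referrals: dict[str, list[str]] = {}
    for student_id, scores in students.items():
        weak = weak_components(scores, cuts)
        if weak:
            referrals[student_id] = weak
    return referrals


if __name__ == "__main__":
    # Made-up scores for two hypothetical students.
    demo = {
        "s1": {"decoding": 10, "word_recognition": 20, "vocabulary": 25},
        "s2": {"decoding": 16, "word_recognition": 19, "vocabulary": 21},
    }
    print(triage(demo))  # prints {'s1': ['decoding']}
```

Because students who clear every cut are omitted, only a subset would be referred on for specialized testing, which is the mechanism by which screening could reduce total assessment time.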

    7. The challenge was to build a battery that: screened for reading difficulties across a wide range of skills, from decoding through vocabulary; had acceptable psychometric properties; was compact (i.e., could be administered in about one 40-50 minute session); could feasibly be implemented school-wide; and provided rapid turnaround of score reports (i.e., within 2-3 weeks) that were useful at multiple stakeholder levels (e.g., teacher, school, district). Computerized delivery and scoring could make it feasible to meet almost all of the above design constraints, and was viewed as desirable by BPS.

    8. Rationale & Background Literature. The theoretical perspective is grounded in componential approaches to reading assessment (Cain, Bryant, & Oakhill, 2004; Oakhill, Cain, & Bryant, 2003; Perfetti, Landi, & Oakhill, 2005). Although skilled, proficient readers are characterized by the integrative, interactive nature of their processing during any reading task, there is nonetheless evidence for subcomponent skills. Component reading measures can be used as indicators of the skill profiles of struggling readers, adding value over and above the types of off-the-shelf comprehension tests the district was using (Sabatini, 2009).

    9. Background Literature & Rationale. As a general principle, the test was designed to align with empirical research on struggling readers' difficulties and effective instructional programs (e.g., NRC, 1998; NICHD, 2000), as well as with cognitive and linguistic theories of the skills underlying reading development and difficulty (e.g., Kintsch, 2000; Perfetti, Landi, & Oakhill, 2005; Perfetti, Van Dyke, & Hart, 2001; Rayner et al., 2007; Vellutino, Tunmer, Jaccard, & Chen, 2007).

    10. Method/Approach: Typical Test Design Steps. Step 1: Construct definition. Step 2: Design specifications/test blueprint. Step 3: Test construction. Step 4: Conduct pilot. Step 5: Conduct field trial. Step 6: Go operational.

    11. Method/Approach: ‘Use-inspired’ approach. Step 1: Define an assessment problem and the information need/criteria for success. Step 2: Get research assessment team(s) to ‘volunteer’ to commit time and resources to the problem (in return for data). Step 3: Cobble together funding to accomplish the initial aims (e.g., SERP foundation support; researcher grants). Step 4: Get district approval; find some schools willing (and able) to work with you on pilot implementation. Step 5: Design/adapt items; conduct pilot/field studies; analyze data; report back to the district and SERP. Step 6: Rinse and repeat as necessary.

    12. Method/Approach: ‘Use-inspired’ process. In sum, the process is variable and complex. It involves multiple, iterative pilots, each designed to investigate different research and practical questions, ultimately moving the team towards an assessment solution that met the needs of both district and research stakeholders.

    13. Pilot 1, June 2006 – September 2006. Participants: two middle schools in summer 2006, with follow-up in September 2006. Objectives/Questions: How prevalent were basic reading skill difficulties (basic decoding, word recognition, and reading fluency)? Can we implement this without schools and districts mutinying? Results: (1) Yes; at least in these schools, significant numbers of students had such difficulties. (2) Yes. [We have a great team.] Conclusion: So far so good; let’s try again.

    14. Pilot 2, June 2007. Participants: three middle and three high schools. CORE battery; each random half took either the ETS or the BU subtests. Objectives/Questions: Confirm the basic reading difficulties finding with externally valid tests; begin exploring the relationship of the subtests to external test criteria. Results: Substantial numbers of students had word reading difficulties on the TOWRE (Torgesen, Wagner, & Rashotte, 1999) and on both the BU and ETS tests; moderate to strong correlations with MCAS and other external tests. Conclusion: The evidence supported the directions chosen by the intervention design teams to develop vocabulary and basic skills programs; but how could the battery be reduced?
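Relating a subtest to an external criterion such as MCAS typically starts with a Pearson correlation between the two score lists. A minimal Python sketch follows; the demo scores are made up, and nothing here reproduces the pilot's actual analysis.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a score list has zero variance")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up scores for five hypothetical students: a subtest raw score and
    # a state test scaled score (NOT real MCAS data).
    subtest = [12, 15, 9, 20, 17]
    state_test = [230, 242, 224, 258, 248]
    print(f"r = {pearson_r(subtest, state_test):.2f}")
```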

    15. Pilot 3, September 2007 - December 2007. Participants: two middle and two high schools from the previous pilot; two new middle schools. CORE battery; each random half took either the ETS or the BU subtests. Objectives/Questions: Feasible scoring: test multiple-choice vs. oral response measures. How best to combine measures into a feasible, parsimonious mixture that spanned the range of reading skills? Results: Multiple choice can work. Indeterminate: the total battery was too long, but there was no clear path for simplifying it. Conclusion: Rinse and repeat.

    16. Pilot 4, Fall and Spring 2008. Participants: two middle schools, with a follow-up in one school in the spring. Six-subtest battery. Objectives/Questions: Improve the psychometric and scale qualities of the subtests; gather evidence of added value in subtests over total scores. Results: (1) Reliability and other test properties showed improvement; cross-grade performance levels were in predicted ranges; the sentence and comprehension tests need improvement. (2) There was evidence that subscores were contributing added value over and above total scores (Sabatini, Bruce, & Sinharay, 2009). Conclusion: Given the success of the battery so far, it seems appropriate to implement a larger-scale trial. [Repeat]
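One widely used way to check whether a subscore adds value over the total score is Haberman's proportional reduction in mean squared error (PRMSE) criterion: a subscore is worth reporting if it predicts the true subscore better (higher PRMSE) than the total score does. The Python sketch below assumes classical test theory with uncorrelated item errors and uses coefficient alpha as the subscore reliability. It illustrates that technique only; it is not necessarily the procedure used in Sabatini, Bruce, and Sinharay (2009), and the toy data are invented.

```python
from statistics import fmean, pvariance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Coefficient alpha; rows are students, columns are item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def _pcov(x: list[float], y: list[float]) -> float:
    """Population covariance (divides by n, matching pvariance)."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def subscore_added_value(
    rows: list[list[float]], sub_items: list[int]
) -> tuple[float, float, bool]:
    """Return (prmse_sub, prmse_total, has_added_value).

    prmse_sub is the subscore reliability (alpha); prmse_total is the squared
    correlation between the total score and the true subscore. Assumes the
    subscore's error is uncorrelated with the items outside the subscore.
    """
    sub = [sum(row[j] for j in sub_items) for row in rows]
    total = [sum(row) for row in rows]
    rel_s = cronbach_alpha([[row[j] for j in sub_items] for row in rows])
    if rel_s <= 0:
        raise ValueError("subscore reliability must be positive")
    var_s, var_x = pvariance(sub), pvariance(total)
    true_var_s = rel_s * var_s          # variance of the true subscore
    error_var_s = (1 - rel_s) * var_s   # subscore error variance, also inside the total
    cov_true_s_x = _pcov(sub, total) - error_var_s
    prmse_total = cov_true_s_x ** 2 / (true_var_s * var_x)
    return rel_s, prmse_total, rel_s > prmse_total


if __name__ == "__main__":
    # Toy data: items 0-1 and items 2-3 measure two unrelated skills.
    toy = [[0, 0, 0, 0], [0, 0, 2, 2], [2, 2, 0, 0], [2, 2, 2, 2]]
    print(subscore_added_value(toy, [0, 1]))
```

When the two skills are unrelated, as in the toy data, the total score is a poor proxy for the subscore, so the subscore shows added value.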

    17. Pilot 5, Fall 2009. Participants: field test with over 4,000 6th-8th graders (Form 6) and 500 4th-5th graders (Form 4, which was new). The forms shared 50% of their content. Objectives/Questions: Improve item and form psychometrics; build scales linked to the previous year's MCAS scores and refine the score reporting; pilot versions designed for grades 4 and 5. Results: Reliability and other test properties showed improvement. We created a scale for each subtest, aligned with the MCAS Warning, Needs Improvement, and Proficient levels, and presented the results with SERP at a district meeting so that individual schools could use the data to plan for future literacy needs. Initial data are promising, but need further work. Conclusion: We now have a scaled test that is functional for operational needs in the 6-8 range.
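Reporting subtest results against performance levels amounts to mapping each scale score into a band defined by cut scores; school-level summaries are then proportions per band. In the Python sketch below the three level labels follow the slide, but the scale and the cut scores are hypothetical placeholders.

```python
from bisect import bisect_right
from collections import Counter

# Level labels follow the slide; the cut scores are HYPOTHETICAL placeholders
# on an arbitrary scale, not the operational values.
LEVELS = ("Warning", "Needs Improvement", "Proficient")
HYPOTHETICAL_CUTS = (220.0, 240.0)  # lowest score in "Needs Improvement" and in "Proficient"


def performance_level(score, cuts=HYPOTHETICAL_CUTS, levels=LEVELS):
    """Map a scale score to a level; a score equal to a cut goes to the higher level."""
    if len(levels) != len(cuts) + 1:
        raise ValueError("need exactly one more level than cut scores")
    return levels[bisect_right(cuts, score)]


def level_prevalence(scores, cuts=HYPOTHETICAL_CUTS, levels=LEVELS):
    """Proportion of students at each level, e.g. for a school-level report."""
    if not scores:
        raise ValueError("no scores")
    counts = Counter(performance_level(s, cuts, levels) for s in scores)
    # Counter returns 0 for levels nobody reached.
    return {level: counts[level] / len(scores) for level in levels}


if __name__ == "__main__":
    # Made-up scale scores for four hypothetical students.
    print(level_prevalence([210, 225, 230, 250]))
```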

    18. Summary: Pilot Site Information

    19. Summary: Reliability Estimates

    20. Challenges and Lessons Learned: designing for multiple purposes and stakeholders; adapting to the fits and starts of district- and school-level decision-making; sharing actionable results with stakeholders; the technological infrastructure of schools and districts; collaborating with other research groups.

    21. Contact Information: John Sabatini, jsabatini@ets.org

    22. Method/Approach: Typical Test Design Steps. Step 1: Construct definition: define the target population, the content and constructs to be measured, and the inferences or claims that test scores are intended to support. Step 2: Design specifications: a specification process that includes defining a test blueprint, test administration and scoring logistics, and constraints. Step 3: Test construction: generate and review items, develop test forms, and draft administration and scoring guidelines.

    23. Method/Approach: Typical Test Design Steps. Step 4: Conduct pilots: assess basic administration and scoring assumptions and revise accordingly; identify poorly performing items, which are then revised or replaced. Step 5: Conduct a field trial: sample the target population; statistical analysis/psychometrics; scales, norming, and equating (as needed); validity studies. Step 6: Go operational: that is, the tests are administered (or sold) for use under test conditions such that score reports are used to inform educational decisions.
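Identifying poorly performing items in Step 4 is often done with classical item statistics: the proportion correct (difficulty) and the correlation of each item with the rest of the test (discrimination). The Python sketch below assumes items scored 0/1; the flagging thresholds (rest correlation below 0.20, proportion correct outside 0.10 to 0.95) are illustrative rules of thumb chosen for the example, not values from the source.

```python
import math


def _pearson(x, y):
    """Pearson correlation; returns NaN when either list has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_analysis(responses, min_r=0.20, p_bounds=(0.10, 0.95)):
    """Classical item statistics for 0/1-scored items (rows = students).

    p      : proportion correct (item difficulty)
    r_rest : correlation of the item with the rest score (total minus this item)
    flag   : True if p is outside p_bounds, or r_rest is undefined or below min_r
    """
    totals = [sum(row) for row in responses]
    low, high = p_bounds
    report = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - i for t, i in zip(totals, item)]  # exclude the item itself
        p = sum(item) / len(item)
        r = _pearson(item, rest)
        flag = not (low <= p <= high) or math.isnan(r) or r < min_r
        report.append({"item": j, "p": p, "r_rest": r, "flag": flag})
    return report


if __name__ == "__main__":
    # Toy responses: item 3 runs against the rest of the test.
    toy = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
    for row in item_analysis(toy):
        print(row)
```

Using the rest score rather than the full total keeps an item from correlating with itself, which would otherwise inflate its apparent discrimination.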

    24. Challenges and Lessons Learned: designing for multiple purposes and stakeholders; adapting to the fits and starts of district- and school-level decision-making; sharing actionable results with stakeholders; the technological infrastructure of schools and districts; collaborating with other research groups.
Notes (Multiple purposes): The tradition of assessments from which the RISE is derived stems from clinical, psychological measures that are typically administered one-to-one by trained specialists. It has been a long-term aim of the ETS team’s research agenda to design assessments that bridge these labor-intensive clinical instruments and the general off-the-shelf standardized tests used in most group testing settings. However, different stakeholders use data and student scores to make different types of educational decisions and bring different levels of expertise to the interpretive process. This is an ongoing challenge with respect to preparing training, professional development, score reports, and interpretive guides for teachers, specialists, school administrators, and district professionals, but it is perhaps one of the most rewarding aspects of the SERP model: one is forced to confront and adapt to the varied ‘use cases’ of assessment information, as well as to varying conceptions of what kinds of information test scores can provide.

    25. Background Literature & Rationale. Assessment of component skills is useful in screening struggling readers who may have failed to acquire efficient fundamental skills in the elementary school years. Measures of fluency and word reading efficiency are common in research and classrooms across grade levels (e.g., Deno & Marston, 2006; Wayman et al., 2007). Reading component proficiency, typically characterized by increasingly automatic and efficient processing, is important in the middle grades and beyond for handling the increasing quantity and complexity of texts (ACT Inc., 2009; Adlof, Catts, & Little, 2006; Jenkins et al., 2003; Kuhn et al., 2010; Rayner et al., 2003; Torgesen, Wagner, & Rashotte, 1999).
