Classroom Assessments in Large Scale Assessment Programs


Presentation Transcript


1. Classroom Assessments in Large Scale Assessment Programs Catherine Taylor University of Washington/OSPI Lesley Klenk OSPI

2. History of Criterion-Referenced Assessment Models “Measurement-driven instruction” (e.g., Popham, 1987) emerged during the 1980s: a process wherein tests are used as the driver for instructional change. “If we value something, we must assess it.” The minimum-competency movement of the 1980s sought to “drive” instructional practices toward the teaching of basic skills. The movement was successful: teachers did teach to the tests. Unfortunately, teachers taught too closely to the tests (Smith, 1991; Haladyna, Nolen, & Haas, 1991). The tests were typically multiple-choice tests of discrete skills, so instruction narrowed to the content that was tested, in the same form in which it was tested.

3. History of Criterion-Referenced Assessment Models Large-scale achievement tests came under widespread criticism: negative impacts on the classroom (Darling-Hammond & Wise, 1985; Madaus, West, Harmon, Lomax, & Viator, 1992; Shepard & Dougherty, 1991) and lack of fidelity to valued performances.

4. History of Criterion-Referenced Assessment Models Studies compared indirect and direct measures of: writing (Stiggins, 1982), mathematical problem-solving (Baxter, Shavelson, Herman, Brown, & Valadez, 1993), and science inquiry (Shavelson, Baxter, & Gao, 1993). These studies demonstrated that some of the knowledge and skills measured in each assessment format overlap, but moderate to low correlations between different assessment modes raised questions about the validity of multiple-choice test scores. Other studies (Haladyna, Nolen, & Haas, 1991; Shepard & Dougherty, 1991; Smith, 1991) showed: pressure to raise scores on large-scale tests; narrowing of the curriculum to the specific content tested; and substantial classroom time spent teaching to the test and item formats.

5. History of Criterion-Referenced Assessment Models In response to criticisms of multiple-choice tests assessment reformers (e.g., Shepard, 1989; Wiggins, 1989) pressed for: Different types of assessment Assessments that measure students' achievement of new curriculum standards Assessment formats that more closely match the ways knowledge, concepts and skills are used in the world beyond tests Encourage teachers to teach higher order thinking, problem-solving, and reasoning skills rather than rote skills and knowledge.

6. History of Criterion-Referenced Assessment Models In response to these pressures to improve tests, LEAs, testing companies, and projects (e.g., New Standards Project) incorporated “performance assessments” into testing programs. “Performance assessments” included: Short-answer items similar to multiple-choice items; Carefully scaffolded, multi-step tasks with several short-answer items (e.g., Yen, 1993); Open-ended performance tasks (California, 1990; OSPI, 1997).

7. History of Criterion-Referenced Assessment Models Still, writers criticized these efforts: Tasks are contrived and artificial (see, for example, Wiggins, 1992). Teachers complain that standardized tests don’t assess what is taught in the classroom. Shepard (2000) indicated that the promises of high-quality performance-based assessments have not been realized. Authentic tasks are costly to implement, time-consuming, and difficult to evaluate; less expensive performance assessment options are less authentic.

8. Impact of National Curriculum Standards Knowledge is expanding rapidly Education must shift away from knowledge dissemination Students must learn how to: Gather information Comprehend, analyze, interpret information Evaluate the credibility of information Synthesize information from different sources Develop new knowledge

9. Early Attempts to Use Portfolios for State Assessment Three states attempted to use collections of classroom work for state assessment: California (Kirst & Mazzeo, 1996; Palmquist, 1994) Kentucky (Kentucky State Department of Education, 1996) Vermont (Fontana, 1995; Forseth, 1992; Hewitt, 1993; Vermont State Department of Education, 1993, 1994a, 1994b).

10. Early Attempts to Use Portfolios for State Assessment Initial efforts were fraught with problems: Inconsistency of raters when applying scoring criteria (Koretz, Stecher, & Deibert, 1992b; Koretz, Stecher, Klein, & McCaffrey, 1994a); Lack of teacher preparation in high-quality assessment development (Gearhart & Wolf, 1996); Inconsistencies in the focus, number, and types of evidence included in portfolios (Gearhart & Wolf, 1996; Koretz et al., 1992b); and Costs and logistics associated with processing portfolios (Kirst & Mazzeo, 1996).

11. Research on Large Scale Portfolio Assessment Research on the impact of portfolios showed mixed results: Teachers and administrators have generally positive attitudes about the use of portfolios (Klein, Stecher, & Koretz, 1995; Koretz et al., 1992a; Koretz et al., 1994a); Positive effects on instruction (Stecher & Hamilton, 1994); Teachers develop a better understanding of mathematical problem-solving (Stecher & Mitchell, 1995); Too much time spent on the assessment process (Stecher & Hamilton, 1994; Koretz et al., 1994a); Teachers work too hard to ensure that portfolios “look good” (Callahan, 1997).

12. Advantages to using classroom evidence in large-scale assessment program Evidence that teachers are preparing students to meet curriculum and performance standards (opportunity to learn), Broader evidence about student achievement Opportunity to assess knowledge and skills difficult to assess via standardized tests (e.g., speaking and presenting, report writing, scientific inquiry processes) Opportunity to include work that more closely represents the real contexts in which knowledge and skill are applied

13. Opportunity to Learn Little evidence is available about whether teachers are actually teaching to curriculum standards. Claims about positive impacts of new assessments on instructional practices are largely anecdotal or based on teacher self-report. Legal challenges to tests for graduation, placement, and promotion demand evidence that students have had the opportunity to learn the tested curriculum (Debra P. v. Turlington, 1979). There is no efficient method to assess students’ opportunity to learn the valued concepts and skills. Collections of classroom work provide a window into the educational experiences of students and into the educational practices of teachers. Collections of classroom work could help administrators evaluate the effectiveness of in-service teacher development programs. Classroom assessments could be used in court cases to provide evidence of individual students’ opportunity to learn.

14. Broader Evidence of Student Learning Some students function well in the classroom but do not perform well on tests. “Stereotype threat” research shows that fear of a negative stereotype can lead minority students and girls to perform less well than they should on standardized tests (Aronson, Lustina, Good, Keough, Steele, & Brown, 1999; Steele, 1999; Steele & Aronson, 2000). Students may have cultural values or language development issues that inhibit performance on timed, standardized tests. These factors threaten the validity of large-scale test scores. Classroom work can be more sensitive to students’ cultural and linguistic backgrounds, and collections of classroom work can be more reliable than standardized test scores.

15. Including Standards that are Difficult to Measure on Tests Some desirable curriculum standards are too unwieldy to measure on large-scale tests (e.g., scientific inquiry, research reports, oral presentations). Historically, standardized tests measure complex work by testing knowledge of how to conduct the work. Examples: Knowing where to locate sources for reports; Knowing how to use tables of contents, bibliographies, card catalogues, and indexes; Identifying control or experimental variables in a science experiment; Knowing appropriate strategies for oral presentation; Knowing appropriate ways to use visual aids. Critics often note that knowing what to do doesn’t necessarily mean one is able to do it.

16. Authenticity Frederickson (1984) raised the question of authenticity in assessment due to the misrepresentation of domains by standardized tests. Wiggins (1989) claimed that in every discipline there are tasks that are authentic to the given discipline. Frederickson (1998) stated that authentic achievement is “significant intellectual accomplishment” that results in the “construction of knowledge through disciplined inquiry to produce discourse, products, or performances that have meaning or value beyond success in school” (p. 19, italics added). Examples of performances: Policy analysis; Historical narrative and evaluation of historical artifacts; Geographic analysis of human movement; Political debate; Story and poetry writing; Literary analysis/critique; Mathematical modeling; Investment or business analyses; Geometric design and animation; Written report of a scientific investigation; Evaluation of the health of an ecosystem.

17. Authenticity Some measurement specialists question the use of the terms “authentic” and “direct” measurement: all assessments are indirect measures from which we make inferences about other, related performances (Terwilliger, 1997). However: Validity is related to the degree of inference necessary from scores on a standardized test to valued work, and authentic classroom work requires less inference than multiple-choice test scores.

18. Challenges with Inclusion of Classroom Work in Large Scale Programs Limited teacher preparation in classroom-based assessment (which can limit the quality of classroom-based evidence), Selections of evidence (which can limit comparisons across students), Reliability of raters (which can limit the believability of scores given to student work) Construct irrelevant variance (which can limit the validity of scores)

19. Solving Teacher Preparation Issues Teachers must be taught how to: Select, modify, and develop assessments Score (evaluate) student work Write scoring (marking) rules for assessments that align to standards Significant, ongoing professional development in assessment is essential. Teachers need to re-examine: Important knowledge and skills within each discipline How to teach so that students are more independent learners

20. Selection of Evidence “For which knowledge, concepts, and skills do we need classroom-based evidence?” Koretz et al. (1992b) claimed that, when teachers are free to select evidence, there is too much diversity in tasks. This diversity may cause low inter-judge agreement among raters of the portfolios. Koretz and his colleagues recommended placing some restrictions on the types of tasks considered acceptable for portfolios. Teachers need guidance in terms of what constitutes appropriate types of evidence.

21. Improving Selections of Evidence Provide guidelines for what constitutes an effective collection of evidence. Provide models for the types of assignments (performances) that will demonstrate the standards. Provide blueprints for tests that can assess the EALRs assessed by the WASL. Provide guides for writing test questions and scoring rubrics. Provide guides for writing directions and scoring rubrics for assignments (performances).

22. Guidelines for Collections Include Lists of important work samples to collect (e.g., research reports, mathematics problems) Number and types of evidence for each category Outline of steps in performances and work samples Tools for assessment of students’ performances and work samples

23. Example Lists of Number and Types of Work Samples to Collect Writing Performances At least 2 different writing purposes At least 3 different audiences Some examples from courses other than English Science Investigations: At least 3 investigations (physical, earth/space, life) Observational assessments of hands-on work Lab books Summary research reports

24. Develop “Benchmark” Performance Assessments Benchmark performances are performances that: Have value in their own right; Are complex and interdisciplinary; Students are expected to do by the end of some defined period of time (e.g., the end of middle school). Performance may require: Application of knowledge, concepts, and skills across subject disciplines (e.g., survey research); Authentic work within one subject discipline (e.g., scientific investigations, expository writing).

25. Example Description of a Benchmark Performance in Reading By the end of middle school, students will select one important character from a novel, short story, or play and write a multi-paragraph essay describing the character, how the character’s personality, actions, choices, and relationships influence the outcome of the story, and how the character is affected by the events in the story. Each paragraph will have a central thought, unified into a greater whole and supported by factual material (direct quotations and examples from the text) as well as commentary explaining the relationship between the factual material and the student’s ideas.

26. Example Description of a Benchmark Performance in Mathematics By the end of high school, students will investigate and report on a topic of personal interest by collecting data for a research question. Students will construct a questionnaire and obtain a sample from a relevant population. In the report, students will report the results in a variety of appropriate forms (including pictographs, circle graphs, bar graphs, histograms, line graphs, and/or stem-and-leaf plots, incorporating the use of technology), analyze and interpret the data using statistical measures (central tendency, variability, and range) as appropriate, describe the results, make predictions, and discuss the limitations of their data collection methods. Graphics will be clearly labeled (including the name of the data, units of measurement, and an appropriate scale) and informatively titled. References to data in reports will include units of measurement. Sources will be documented.
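The statistical measures named in this benchmark (central tendency, variability, and range) are all simple to compute; a minimal sketch in Python using the standard library, with entirely hypothetical survey data:

```python
import statistics

# Hypothetical survey responses: hours of screen time per day reported by
# a sample of 11 students (illustrative data, not from any real survey).
hours = [1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.0, 3.5, 4.0, 5.0, 6.5]

mean = statistics.mean(hours)         # central tendency
median = statistics.median(hours)     # central tendency, robust to outliers
mode = statistics.mode(hours)         # most frequent response
stdev = statistics.stdev(hours)       # variability (sample standard deviation)
data_range = max(hours) - min(hours)  # range

print(f"mean={mean:.2f} median={median} mode={mode} "
      f"stdev={stdev:.2f} range={data_range}")
```

In a student report of the kind the benchmark describes, each of these values would be labeled with its units of measurement (here, hours per day).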

27. Example of the Process of Developing Benchmark Performances Select work that would be familiar or meaningful: Purchasing decision Describe the performance in some detail: A person plans to buy a ___ on credit. The person figures out how much s/he can spend (down-payment and monthly payments), does research on the different types of ___, reads consumer reports or product reviews, compares costs and qualities, and makes a final selection. The person then locates the chosen product and purchases it or finances the purchase.

28. Example of the Process (continued) Define the steps adults take to complete the performance: A person plans to buy a ___ on credit for ____ purpose. The person figures out how much s/he can spend: Determines money available for down-payment Compares income and monthly expenses to determine cash available for monthly payment Does research on the different types of ___ including costs and finance options. Reads consumer reports or product reviews Compares costs, qualities, and finance options Makes a final selection. Locates the chosen product and finances the purchase.

29. Example of the Process (continued) Create grade-level-appropriate steps: The student plans to buy a ___ on credit for _____ purpose. The student: Figures out how much s/he can spend: Determines money available for down-payment; Compares income and monthly expenses to determine cash available for monthly payment. Does research on at least 3 types of ______: Determines costs and finance options; Reads consumer reports or product reviews; Compares costs, qualities, and finance options. Makes a final selection that is optimal for cost, quality, and finance options within budget.

30. Example of the Process (continued) Identify the EALRs demonstrated at each step: The student plans to buy a ___ on credit for _____ purpose. The student: Figures out how much s/he can spend (EALR 4.1): Determines money available for down-payment (EALR 4.1); Compares income and monthly expenses to determine cash available for monthly payment (EALR 3.1). Does research on at least 3 types of ______ (EALR 4.1): Determines costs and finance options (EALR 1.5.4); Reads consumer reports or product reviews (EALR 4.1); Compares costs, qualities, and finance options (EALR 3.1). Makes a final selection that is optimal for cost, quality, and finance options within budget (EALR 2.1-2.3).

31. Example of the Process (continued) Modify the steps as needed to ensure demonstration of the EALRs: The student plans to buy a ___ on credit for _____ purpose. The student: Figures out how much s/he can spend (EALR 4.1): Determines money available for down-payment (EALR 4.1); Compares income and monthly expenses to determine cash available for monthly payment (EALR 3.1). Does research on at least 3 types of ____ (EALR 4.1): Determines costs and finance options (EALR 1.5.4); Reads consumer reports or product reviews (EALR 4.1); Creates a table to show the comparison of costs, qualities, and finance options (EALR 3.1). Makes a final selection and explains how it is optimal for cost, quality, and finance options within budget (EALR 2.1-2.3).

32. Possible Authentic Performances in Mathematics Survey Research: Community issue School issue Return on investment (costs and sales) Purchasing decisions Graphic designs Animation Social science analyses Sources of GDP Major categories of federal budget Casualties during war

33. Possible Authentic Performances in Reading Literary analyses: Comparisons across different works by the same author Comparisons across works by different authors on same theme Analysis of theme, character, plot development Reading journals Research reports: Summary of information on a topic from multiple sources Investigation of a social or natural science research question using multiple sources Position paper based on information from multiple sources

34. Providing an example blueprint for tests that can assess the standards

35. Example blueprint for tests that can assess standards
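The blueprint table itself does not survive in this transcript, but the idea can be sketched: a test blueprint maps each standard to the number and format of items that will assess it. The sketch below is illustrative only; the EALR strand labels and item counts are hypothetical, not taken from an actual WASL blueprint.

```python
# Hypothetical test blueprint: each standard (EALR strand) is mapped to the
# number of items of each format that would assess it. All names and counts
# here are illustrative assumptions.
blueprint = {
    "EALR 1: concepts and procedures": {"multiple_choice": 10, "short_answer": 4, "extended_response": 1},
    "EALR 2: problem solving":         {"multiple_choice": 4,  "short_answer": 3, "extended_response": 2},
    "EALR 3: reasoning":               {"multiple_choice": 4,  "short_answer": 2, "extended_response": 1},
    "EALR 4: communication":           {"multiple_choice": 2,  "short_answer": 2, "extended_response": 1},
}

# A blueprint lets developers check coverage before any items are written:
total_items = sum(sum(formats.values()) for formats in blueprint.values())
items_per_strand = {strand: sum(f.values()) for strand, f in blueprint.items()}
print(total_items, items_per_strand)
```

Teachers writing classroom tests against the same standards could use the same structure to verify that every strand is represented in some format.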

36. Solving Score Reliability Issues Train expert teachers to evaluate diverse collections of evidence Expert teachers evaluate the collection of work to determine whether it meets standards
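The rater consistency discussed above is conventionally quantified with percent agreement and a chance-corrected index such as Cohen's kappa; these statistics are standard in the measurement literature, though the source slides do not name them. A minimal sketch with hypothetical rubric scores:

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of collections on which two raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(a)
    po = percent_agreement(a, b)                 # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if raters scored independently at their base rates.
    pe = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (po - pe) / (1 - pe)

# Hypothetical 1-4 rubric scores for ten collections from two trained raters.
rater1 = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
rater2 = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(percent_agreement(rater1, rater2))  # 0.8
print(round(cohens_kappa(rater1, rater2), 2))  # 0.72
```

Monitoring such indices during scoring sessions is one way to verify that the training of expert teachers is actually producing consistent judgments.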

37. Construct Irrelevant Variance Factors unrelated to the targeted knowledge and skills that affect the validity of performance scores: Teachers provide too much “help”; Teachers provide differential types of help; Students get help from parents; Directions for assignments are not clear; Students are taught the content but not how to do the type of performance.

38. Solving Construct Irrelevant Variance Problems Provide guidelines for what constitutes valid evidence Provide model performance assessments or benchmark performance descriptions Provide professional development on appropriate levels of help Provide professional development on the EALRs and GLEs Provide professional development on how to teach to authentic work

39. Conclusion Collections of evidence CAN be used to measure valued knowledge and skills Collection of Evidence (COE) guidelines for Washington State: Incorporate many of the characteristics that will ensure more valid student scores Will continue to improve as more examples are provided Scoring of collections: Will involve use of the same rigor in scoring as on WASL items Will provide reliable student level scores
