
The Oregon DATA Project: Strand 3 Using Data to Improve Learning in the Classroom

Presentation Transcript


    1. The Oregon DATA Project: Strand 3 Using Data to Improve Learning in the Classroom January 28, 2010 Lane ESD Marianne Oakes Analicia Santos

    2. “As to methods, there may be a million and then some, but principles are few. The man who grasps principles can successfully select his own methods. The man who tries methods, ignoring principles, is sure to have trouble.” Ralph Waldo Emerson. Assessment is an entire field of study and practice. As educators you may not have time to learn the entire field in depth, but this quote from Emerson captures what we’re about for the next two days. We’re going to discuss the important principles of assessment within the context of classroom instruction.

    3. Professional Learning Norms: Engagement, Respect, Choice and Responsibility

    4. Objectives of Strand 3 Participants will: Review and evaluate the characteristics and appropriate uses of various types of assessments; Identify the uses of the OAKS assessment; Evaluate the quality and efficiency of current school and classroom assessment practices; Consider a collaborative process for using assessment results to target instruction. These are the objectives that mirror the questions. Given what we’ve planned, we’re going to do some housekeeping and then move into our first activity for the day.

    5. Goal: Improved Teaching and Student Learning The key to any cyclical process is closing the circle or cycle. Without that complete cycle, you often have short-circuited efforts. We spend a lot of time in the first three blocks, analyzing student performance and determining focus areas or goals. We may spend some time reviewing research, but not as much as we’d like because we’ve run out of energy from the data analysis part of the cycle. Where we often fall short is in our planning to implement, planning to evaluate, and then actually doing so!

    6. Data teams aren’t a one-time event. They involve a cyclical process with a series of feedback loops. Think about your thermostat. When you set it at a particular temperature, it turns on to heat up or cool off the house. When the room reaches the temperature you set, the system shuts off. When the room temperature falls below what you set, the system kicks back on. It is a feedback loop. What would happen if the information about the temperature changes didn’t make it back to the thermostat? What consequences would result? DDDM represents a series of feedback loops within the overall cycle. It is a flexible process that is not intended to be mechanically implemented. We begin with global questions about performance. These questions lead us to look for patterns in performance. Which students are performing poorly? In what areas? What common characteristics do these students have (starting points, grade level, mobility, discipline, attendance, etc.)? We are often looking at groups of kids to reveal patterns. When we bring in the idea of looking globally for patterns, it allows you to identify issues that are related to what you can control, the cause data, rather than the demographics of students, which you can’t change. The answers from each set of questions circle you back to the data for more answers, and in some cases, to different data to see if other information shows similar patterns. Along this path, you’ll need some tools to help you organize and interpret your data.

    7. Activity: Living Likert Scale Find the place along the continuum that most closely represents your level of agreement with the statement: Strongly Agree, Agree, Neutral/Not Sure, Disagree, Strongly Disagree. The Living Likert is an activity we use to jump-start conversation around our existing conceptions and experience with a particular topic. It is like an anticipatory set that we’ll use to informally assess where we are starting from for the concepts and objectives of this event. Stress why we are doing this: to pre-assess and get a sense of participants’ perceptions and misconceptions with regard to the topic. How does this apply in your practice? Have poster board or chart paper with Strongly Agree at one end of the room and Strongly Disagree at the other. Make sure there is room for the group to line up on the continuum. This is a process tool that you can use as a leader or even as a classroom teacher to get staff or students to think about what they know and believe about a concept, an opinion, etc. In this case, we’re going to start with a big concept around the use of assessments. Talk to each other as you arrange yourselves on the continuum. Be prepared to share out why you are standing where you are relative to the two extremes.

    8. Tests designed for accountability are adequate for diagnosing students’ learning and instructional needs. Living Likert Scale: Strongly Agree to Strongly Disagree. Have participants read the statement and line up along the Likert scale according to their level of agreement. Take a few moments to have participants at different points on the scale talk about why they selected that location. In the given example, participants usually bring out the salient points regarding this statement. This leads into the next part of the session on quality assessment characteristics and the concept of purpose-dependent assessment. HINT: tell participants they can select a location based on conditions and they will get a chance to explain the conditions for their placement.

    9. Accountability Exams Designed and valid for the stated purpose—accountability. Monitored for reliability and classification accuracy. Scores used for accountability at the aggregate level—school, district, state. Classification of student performance based on established levels that represent some consensus of expected student performance at each grade level. Accountability exams are designed first and foremost to determine the extent to which students are achieving an established set of standards and skills. The validity and reliability of the exams for this purpose are well-established. When these results are also used at the instructional level to plan and guide instruction, the use of the results must be guided by knowledge of the test’s purpose as well as the test’s limitations!

    10. Use and Consequences OAKS—used to measure the extent to which students are learning the state’s curriculum as described in the Oregon Academic Content Standards. Also used to measure Essential Skills of the Oregon Diploma Project for reading, mathematics and part of writing. Serves the dual purpose of meeting federal and state accountability requirements. OAKS’s stated purpose is definitive and is provided by ODE in numerous locations in print and in web-based materials.

    11. Accountability assessments can be limited in diagnosing students’ instructional needs. Accountability exams are usually used summatively—to sum up progress in achieving established learning targets. Summative measures are designed for evaluative or accountability purposes. They are not intended for diagnostic purposes. Intended to identify broad strengths and weaknesses. Informational slide. Additional background: accountability exams are designed to be used summatively. They are limited because in development these exams usually have to balance time for administration (limited) with the purpose of getting a valid and reliable score for accountability or evaluation. Most state CRTs were developed due to a call from educators for accountability exams that are built to their state standards (as opposed to the perception that NRTs are not). A CRT by definition is designed to be a deep test of content-specific learning objectives. In theory, a CRT is designed to provide information at a deep level to determine students’ level of mastery on specific content, skills and subskills. In reality, due to time and money constraints, state CRTs generally don’t have enough items on any single skill or learning objective to provide the in-depth information intended by a CRT. Therefore, the score information at a subscore level is not deep and is not intended to diagnose student learning needs. Rather, state CRT scores can be used to classify students and give broad strengths and weaknesses that can be explored through further assessment activities.

    12. When reviewing state assessment data, we need to keep in mind the questions we want to answer. This feedback chain has four levels of questions that start globally, looking first at point in time performance. It is helpful to have a graphic organizer to summarize point in time performance. It is also helpful to think of student achievement measures along a continuum of performance in context.

    13. Putting Student Achievement in Context Measures of Point in Time Performance, Measures of Improvement, Measures of Growth. Turn and talk at your table: Compare and contrast the three measures. When reviewing measures of student achievement it is helpful to think about three different ways of using the information: measures of point in time performance, measures of improvement and measures of growth. Let’s start with some global assumptions about large-scale or state-level assessments. Process out comparisons and contrasts; this often brings up a lack of clear distinction between improvement and growth. Explain that there isn’t a fine line dividing them. It has to do with the way you use the information and the policy decisions that you apply to the measures. This is a helpful set-up for the information that follows. Test results to review for assessing student performance in context can be broadly categorized as point in time, improvement and growth. As you move from point in time to growth you get more information, with more confidence in it for decisions. Before we consider what is meaningful we need to put measures of performance into context.

    14. Making Point in Time Performance Meaningful Describing point in time performance: Percent of students in performance levels; Average or mean scores; What others are you using? Challenge with point in time measures: Determining meaningfulness of point in time performance; How much confidence can you place in the score value? How much confidence can you place in the comparison of these scores to other scores?
    Measures of point in time performance reflect the performance of a student, a group of students, a grade level of students, etc. at a particular point in time. This information is collected and reported through state assessments, district assessments and classroom assessments. The current focus is on point in time performance relative to a set of standards or criteria (hence criterion-referenced test or CRT) as opposed to performance relative to others or a norm (norm-referenced test or NRT). What makes point in time performance meaningful? Putting the performance in context. The context comes from comparing your school’s or district’s performance with the expectations on a CRT and with others’ performance or your own previous performance (we’ll talk about this more in improvement). Think about the ruler that is used at the exit of a bank. The ruler provides a standard to compare the height of individuals to, so that employees can estimate height more accurately in the event they have to describe a “suspect”. For example, how do you know if your school’s performance is acceptable? Exceeding expectations? Too low? You know this by looking at other schools’ performance and looking at state performance. Meaning comes when there is a basis for comparison. If your school has 85% meeting or exceeding standards, that sounds like outstanding performance, at least if the exam results are from a criterion-referenced instrument. However, if all schools are at 85% meeting or exceeding standards, then this accomplishment sounds more like achieving the average. Performance becomes meaningful when it is interpreted in context. Michael Phelps’ gold medals and world records represent a phenomenal accomplishment because what he did has never been done before. In a comparative context, he has achieved more than others before him or his contemporaries. He broke a record held by a single man for over 30 years. Also, think about the bank. At the exit doors to the bank a visual ruler is provided. This is for tellers and other witnesses to gauge the height of a criminal as they make their getaway. Without that scale, the description of the criminal’s height is subjective and perhaps not informative enough to allow for later identification. Likewise, district or school performance has meaning when it is interpreted in the context of what is expected and what is being achieved by others. Annual performance results can be described and reported. Meaning is attributed when you compare that performance to the standard and to others. There are also some cautions that must be taken in doing this. When interpreting the results of point in time performance you have to be careful not to draw inferences or conclusions that the data do not support. Reviewing state performance on average and then reviewing the distribution of performance of districts and schools provides a global picture of overall performance and the placement of the district within that global performance. Point in time performance is helpful, especially when reviewed in the context of statewide trends in performance.
    Another caution: Point in time performance for a student is much more volatile and subject to measurement error than aggregations of point in time performance. School/grade level or district/state grade level summaries are less volatile and subject to a standard error that is much smaller than the measurement error attributed to individual student scores. This is all about the n size, or number of students in the calculations. The more students in the calculation, the lower the error. Standard error is a way of talking about the confidence you can have in a particular value falling within a particular range or “confidence band”.
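The n-size point above can be made concrete with a short sketch. The scores below are invented, and the calculation is just the textbook standard error of a sample mean (not anything taken from the OAKS technical manual); it simply shows why a grade-level or district summary carries a narrower confidence band than a single classroom.

```python
# A minimal sketch: the standard error of a group's mean score shrinks as the
# number of students grows, so aggregate results are more stable than the
# scores of any small group or individual.
import math
import statistics


def mean_with_confidence_band(scores, z=1.96):
    """Return the mean score and an approximate 95% confidence band."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean = sample standard deviation / sqrt(n).
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, (mean - z * se, mean + z * se)


# Hypothetical RIT-like scores for a small class and a whole grade level.
classroom = [204, 198, 211, 207, 201, 195, 209, 203, 206, 200]
grade_level = classroom * 8  # same scores repeated: 80 students instead of 10

for label, group in [("classroom (n=10)", classroom), ("grade (n=80)", grade_level)]:
    mean, band = mean_with_confidence_band(group)
    print(f"{label}: mean {mean:.1f}, ~95% band {band[0]:.1f} to {band[1]:.1f}")
```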

    15. DIBELS Distribution Report How many total students were tested? How many students are in each classroom? For each measure, how many and what percent scored in each status category (LR/Est, SR/Em, AR/Def)? How many and what percent of students require benchmark, strategic and intensive instructional support?
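For readers who want to see the counting behind those questions, here is a rough sketch. The student records, classroom names and category labels are illustrative, not the actual DIBELS report layout.

```python
# Count students and compute percents per status category and support level.
from collections import Counter

# Hypothetical student records: (classroom, status category, support level).
students = [
    ("Room 1", "Established", "Benchmark"),
    ("Room 1", "Emerging", "Strategic"),
    ("Room 1", "Deficit", "Intensive"),
    ("Room 2", "Established", "Benchmark"),
    ("Room 2", "Emerging", "Strategic"),
    ("Room 2", "Established", "Benchmark"),
]

total = len(students)
print(f"Total students tested: {total}")

by_room = Counter(room for room, _, _ in students)
print("Students per classroom:", dict(by_room))

for field, idx in [("status", 1), ("support", 2)]:
    counts = Counter(rec[idx] for rec in students)
    for category, count in counts.items():
        print(f"{field} {category}: {count} ({100 * count / total:.0f}%)")
```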

    16. Point in Time Status To understand this figure, imagine an elementary school with students in grades three through five tested annually. The rows represent test scores from successive years and the columns represent scores from students at successive grades. The ovals show that, for the Point in Time Status design, data are considered for just one year at a time. Average scores (e.g., reading scores) for each grade (second grade, third grade, etc.), or percents of students meeting some standard at each grade (e.g., percents with reading test scores at or above the proficient level), are accumulated over grade levels to arrive at a single number describing the entire school. The same calculations might or might not also be carried out separately for demographic subgroups within the school. There could be many variations. The defining feature of Point in Time Status designs is that test data for just one year at a time are considered. Such designs are also referred to as cross-sectional models.
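A minimal sketch of that cross-sectional roll-up follows. The scores and the per-grade cut scores are hypothetical; the point is only that a single year of data is summarized per grade and then combined into one school-level figure.

```python
# Point in Time Status: one year of data, percent meeting per grade, then a
# single school-level summary across grades.
def percent_meeting(scores, cut):
    return 100 * sum(score >= cut for score in scores) / len(scores)


one_year = {  # grade -> this year's scale scores for that grade
    3: [190, 206, 214, 201, 222],
    4: [208, 215, 199, 226, 211],
    5: [220, 205, 231, 217, 209],
}
cuts = {3: 204, 4: 211, 5: 218}  # hypothetical "meets" cut scores per grade

for grade, scores in one_year.items():
    print(f"Grade {grade}: {percent_meeting(scores, cuts[grade]):.0f}% meeting")

# School-level point-in-time summary: all tested students, one year only.
all_pairs = [(s, cuts[g]) for g, scores in one_year.items() for s in scores]
school = 100 * sum(s >= c for s, c in all_pairs) / len(all_pairs)
print(f"School (all grades, one year): {school:.0f}% meeting")
```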

    17. Another way of measuring performance is to think in terms of improvement: change in performance between two or more points in time. Improvement implies a positive change, but looking at change over time must include +, - and no change to describe it. Did our school improve student achievement compared to prior years? Were more students meeting or exceeding standards? Did our average scores increase over time? Does performance on standards vary over time? Are there gaps?
    Improvement can be described cross-sectionally and longitudinally. Cross-sectional data provide a snapshot or description of students’ performance at a particular point in time for a specific unit of analysis such as a classroom, grade, school or district. Typically, a cross-sectional review of data includes comparing groups over time that do not consist of the same students. For example, if you are looking at grade 2 reading for the past 3 years, you are looking at 3 different groups of grade 2 students’ performance. The consistent unit is the grade level. Cross-sectional data from multiple years can be analyzed for patterns or trends over time. Generally, improvement refers to increases or decreases in student performance. This can be for an individual, a subgroup, a grade level or any other higher aggregation. Improvement can range from something as simple as comparing the difference in percentages of students meeting or exceeding proficiency status between two or more years, or comparing changes in mean exam scores from year to year, to more complex gain matrices that incorporate various achievement components. In the Oregon Report Card, improvement is referred to as the change in two-year moving averages for math and reading. The changes in attendance and dropout rate are also included under improvement for the report card. In Oregon’s AYP model, improvement falls under Safe Harbor. Safe Harbor is referred to as “Academic Growth” in Table 6 of the AYP Policy and Technical Manual. Most states referred to Safe Harbor as “growth” in their accountability plans and associated materials. This has led to a lot of confusion as states move toward implementing true growth models for achievement. Safe Harbor “growth” is simply looking at the increase in the number of students meeting proficiency standards from the prior to the current year. In my opinion, this is better described as improvement and not growth. Improvement questions you are trying to answer include: Did our district improve student achievement compared to last year (or some other prior point)? How did our schools change performance as compared to the district or state? It is also important to recognize that improvement at the school level can be defined in many different ways that use cross-sectional or longitudinal data. Similarly, improvement at the student level can be defined in many different ways. How student improvement is measured and aggregated becomes a policy question that should be informed by statistical modeling to align the model with the intentions of the policy. It is important to note that there is no line separating improvement and growth as two black-and-white, separate concepts. You will hear the two terms used interchangeably in some contexts. That is why it is important to establish a common definition that will be used by ODE and others in the state to describe measures of student performance over time.
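Here is a hedged sketch of two of the improvement measures named above: the year-over-year change in percent meeting standard, and the change in two-year moving averages (the Oregon Report Card approach). The percentages are made up for illustration.

```python
# Simple improvement measures on cross-sectional results.
pct_meeting = {2006: 62.0, 2007: 66.5, 2008: 65.0, 2009: 71.0}  # hypothetical

years = sorted(pct_meeting)
for prev, curr in zip(years, years[1:]):
    change = pct_meeting[curr] - pct_meeting[prev]
    print(f"{prev}->{curr}: change of {change:+.1f} percentage points")

# Two-year moving averages smooth out single-year volatility before comparing.
moving = {
    curr: (pct_meeting[prev] + pct_meeting[curr]) / 2
    for prev, curr in zip(years, years[1:])
}
ma_years = sorted(moving)
for prev, curr in zip(ma_years, ma_years[1:]):
    print(f"Moving average {prev}->{curr}: {moving[curr] - moving[prev]:+.1f}")
```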

    18. Cross Section Across Years If you don’t have a data management tool that automatically provides multiple years or multiple grades, then this is one suggested way to record and graphically represent the information for ease of interpretation. This is something you can consider suggesting to the data project in terms of what is being built for you technically.

    19. Measuring Improvement (cont.) Different methods exist for measuring improvement. Test dependent—some exams do not support certain measures of improvement (e.g., non-linear scales, vertical scales). Challenge: How much improvement or lack of improvement is meaningful? Still relative—improvement compared to what? Measuring improvement can be as simple as looking at the difference in performance between two measurement opportunities. How do you use improvement data? What do you use to measure improvement? How do you determine if it is meaningful? The extent to which you can measure improvement, and the manner in which you measure it, depends on the properties of the test. At the broadest level, when dealing with criterion-referenced tests, you can compare changes in performance levels over time. It is useful to look at three or more years to be able to determine if a trend exists. Two years do not make a trend. Oregon reports test results in RIT scores. These are scale scores designed to provide improvement information; specifically, these scores can be used to measure student growth over time in each academic area. These scores can also be averaged, and the changes in average performance over time can be viewed. As Oregon settles on a growth model, there will be more opportunity to discuss the difference between measures of improvement and measures of growth.

    20. Improvement Figure 2 illustrates the Grade Level Status Growth design. Unlike the Point in Time Status design, this design is a growth model, with a focus on year-to-year change in school performance. As in Figure 1, the rows and columns represent test scores from children tested annually in each of grades two through five. In Figure 2, however, vertical arrows have been added. These represent comparisons of scores earned by successive cohorts of second graders, third graders, etc. Thus, the scores of second graders in 2006 are compared to the scores of second graders in 2005, and similarly for other grades. The average score in 2005 might be subtracted from the average score in 2006 to determine how much higher (or lower) students at that grade level were scoring in 2006. Or, the percent proficient in 2005 might be subtracted from the percent proficient in 2006 to determine whether a higher proportion of students were meeting the standard in 2006. Note that the ovals representing this accumulation of scores across grade levels each cover a span of two years. The same calculations might or might not also be carried out separately for demographic subgroups within the school. There could be many variations. The defining features of this design are, first, that test data for at least two years at a time are considered, and second, that direct comparisons are made between the scores of different groups of students. The children who are second graders in 2006 are a different group from those who were second graders in 2005. Such designs are longitudinal at the school level, in that each school is observed for two years in a row. However, there is no linkage of individual students’ scores over time.
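The comparison described above (this year's grade-level average minus last year's, for different groups of students) is simple arithmetic; the sketch below shows it with invented scores for two grades across two years.

```python
# Grade Level Status Growth: compare successive cohorts at the same grade.
import statistics

scores = {  # (year, grade) -> scale scores for students tested that year
    (2005, 2): [178, 185, 190, 181],
    (2006, 2): [182, 188, 193, 186],
    (2005, 3): [196, 204, 199, 207],
    (2006, 3): [198, 201, 205, 210],
}

for grade in (2, 3):
    prior = statistics.mean(scores[(2005, grade)])
    current = statistics.mean(scores[(2006, grade)])
    print(f"Grade {grade}: 2005 avg {prior:.1f}, 2006 avg {current:.1f}, "
          f"difference {current - prior:+.1f}")
```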

    21. A new source of information: growth measures Another way to look at student performance---Did your students grow academically during the year? RIT scores are scaled to allow for student growth to be measured over time RIT scores are comparable within the same content area and grade across administrations Policy questions include What constitutes expected annual growth? What constitutes value added?

    22. Measuring growth Test dependent—test properties limit or enable growth measures. Growth is measured in different ways: Change in performance class; Change in scale score compared to expectation or to a predicted value; Equi-percentile—change in student position relative to starting position; Value added—change over and above what is expected. Change in performance class can be used to develop a measure of gains. Some states use a gain index to rate school or district change in performance. This is loosely described as growth because it takes into account changes in the number of students moving between performance levels from one year to the next. In this model schools are given credit for incremental change in student performance within and between performance classes. The disadvantage is that the student-level information that forms the basis for this model is simply the student’s scale score and performance level. A “gross level” model. Changes in scale score can provide information about growth if the test has been designed with vertical scales. In other words, the scale scores for each grade level vertically articulate a continuum of increasing content knowledge, skill and cognitive demand that is appropriate to the grade level. This is done through careful test design and standard setting. These scores provide a great opportunity to measure student growth over time. The policy issues that must be addressed with measuring growth involve setting the expectation for annual change. How much growth is expected? Compared to what? A predicted value? A proficiency score? A student’s position relative to his/her starting position? Value added is another term that is thrown around in terms of growth. Value added is a form of a growth model that looks at how much change occurs beyond what is expected. This change beyond expectation is the value added.
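An illustrative sketch of two of those growth measures follows: change in scale score against an expected annual gain, and "value added" as the portion of growth beyond that expectation. The expected-gain figure and the student scores are assumptions for the example, not Oregon's adopted model.

```python
# Growth vs. expectation and a simple "value added" interpretation.
EXPECTED_ANNUAL_GAIN = 6  # hypothetical policy expectation, in scale-score points

students = {  # student -> (last year's score, this year's score)
    "Student A": (196, 205),
    "Student B": (210, 213),
    "Student C": (188, 194),
}

for name, (prior, current) in students.items():
    growth = current - prior
    value_added = growth - EXPECTED_ANNUAL_GAIN  # change beyond expectation
    verdict = "met expectation" if growth >= EXPECTED_ANNUAL_GAIN else "below expectation"
    print(f"{name}: growth {growth:+d}, value added {value_added:+d} ({verdict})")
```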

    23. How different are the data from last year or the last benchmark in the categories of Benchmark, Strategic and Intensive? How did students perform on each of the measures compared to last year at this time, or from the beginning of the year?

    25. Growth: Cohort Growth Figure 3 illustrates the Cohort Growth design. Like the Grade Level Status Growth design, the CG design is a growth model. Whereas the Grade Level Status Growth design focused on change at the school level, the CG design focuses on change at the cohort level. It should be noted that individual growth is a model that is done at the Data Team level. As before, rows and columns represent scores from an elementary school, organized by grade and year. In Figure 3, however, diagonal lines have been added. These represent the tracking of cohorts from one grade to the next. For example, most students in second grade in 2005 would be in third grade in 2006, fourth grade in 2007, and fifth grade in 2008. With CG, actual scores would typically be compared, rather than proficient/not proficient classifications. Different CG designs might track students for just one year (previous year to current year) or might reach back further in time to incorporate scores from two, three, or even more years earlier. The same calculations might or might not also be carried out separately for demographic subgroups within the school. There could be many variations. The defining features of all CG designs are, first, that test data for two or more years at a time are considered, and second, that growth is measured for individual students. Such designs are longitudinal at the student level.
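A minimal sketch of that longitudinal linkage: the same students are followed from one grade to the next and their actual score changes are averaged. Student IDs and scores are invented.

```python
# Cohort Growth: link individual students' scores across years/grades.
import statistics

cohort = {  # student id -> {grade: scale score}
    "s01": {2: 181, 3: 194, 4: 203},
    "s02": {2: 176, 3: 188, 4: 201},
    "s03": {2: 185, 3: 196, 4: 205},
}

for lower, upper in [(2, 3), (3, 4)]:
    # Only students with scores in both grades can be linked longitudinally.
    gains = [rec[upper] - rec[lower] for rec in cohort.values()
             if lower in rec and upper in rec]
    print(f"Grade {lower} -> {upper}: mean gain {statistics.mean(gains):.1f} "
          f"across {len(gains)} linked students")
```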

    26. Individual Student Growth The most detailed view of growth occurs at the student level.

    Grade | Student Score | Growth | Cut Score | "Gap"
      3   |      190      |   -    |    204    |  14
      4   |      206      |   16   |    211    |   5
      5   |      211      |    5   |    218    |   7
      6   |      219      |    8   |    222    |   3

    Which years are a “success” for the student? Given a student starting with a 190 at grade 3, the gap for the student is 14 (the difference between the cut score and the actual score). If the student scores a 206 the next year, the student has grown in achievement of the content-level standards. However, has the student grown enough? The student now has a gap of 5 points with the grade-level cut score. The student experienced the most growth between grades 3 and 4 and grades 5 and 6, with the least growth from 4 to 5. Notice how the gap increased from grades 4 to 5.
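The arithmetic in that table is easy to reproduce; the short sketch below does exactly that, so you can apply the same calculation to your own students' scores and cut scores.

```python
# Growth = change in the student's score from the prior grade;
# "gap" = distance between the grade-level cut score and the student's score.
records = [  # (grade, student score, cut score), from the example above
    (3, 190, 204),
    (4, 206, 211),
    (5, 211, 218),
    (6, 219, 222),
]

prior_score = None
for grade, score, cut in records:
    growth = "-" if prior_score is None else f"{score - prior_score:+d}"
    gap = cut - score
    print(f"Grade {grade}: score {score}, growth {growth}, gap {gap}")
    prior_score = score
```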

    27. Sample Reading Growth Targets Growth targets will require the “gap” between a student’s score and benchmark to decrease over time, generally by about 40% each year. See slide 49 for the formula. This is how the formula plays out for students with different starting points.
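The exact formula lives on the slide referenced above; as a hedged illustration of the stated idea (close roughly 40% of the remaining gap each year), here is one way it could play out for a student with an assumed benchmark of 221 and a made-up starting score.

```python
# Illustrative growth targets that close ~40% of the remaining gap each year.
GAP_CLOSURE_RATE = 0.40  # stated in the slide: the gap shrinks about 40% per year


def growth_targets(current_score, benchmark, years=3):
    """Yield (year, target score) pairs that close 40% of the remaining gap."""
    score = current_score
    for year in range(1, years + 1):
        gap = benchmark - score
        score = round(score + GAP_CLOSURE_RATE * gap)  # target for the next year
        yield year, score


for year, target in growth_targets(current_score=196, benchmark=221):
    print(f"Year {year}: target score {target}")
```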

    28. Oregon OAKS Risk The rules of the game: Tables can volunteer to answer before they see the question (with risk comes reward!). Members at the table may discuss their answer before presenting to the floor. A correct answer from a volunteer table nets 5 points for the table. An incorrect answer opens the question to the floor. A correct answer from the floor results in 2 points for the table answering the question. Ask: who has had experience accessing the data available publicly on the ODE website? Who has accessed the OAKS online private site? Make sure each table has someone who’s been in the system. We’re going to play Risk. Have participants put their table number on their green table tent. Number them off. Explain the rules as given on the slide. The reward is situational. If you are doing this in a training day, then this event usually falls before lunch and you can dismiss winning groups to lunch first. Ask for question 1 volunteers.

    29. Question 1 The latest publicly reported test results can be found under the following menu topic on the ODE website: Jobs; Publications; Reports; Teaching and Learning; Two of the above; Three of the above. Two of the above: give extra points to tables for identifying the two—Reports and Teaching and Learning.

    30. Question 2 The latest publicly reported test results are available for all the years listed below except: 2006 - 2007; 2008 - 2009; 2003 - 2004; 2007 - 2009; 2009 - 2010. 2008-09. Talk about why—the difference between the public reporting mechanism and OAKS, a live dynamic system with constantly changing inputs in OAKS online. Current year data aren’t finalized. The public system has been through a process to vet final results prior to publication. Once published, it is set and usually not changed without tremendous effort or big errors.

    31. Question 3 Subgroup results for percent and number of students in each performance category are available for all except the following: Migrant; CLRAS; Less than full academic year; Race/ethnicity; Gender; Economically disadvantaged. Less than full academic year. This is an interesting variable to request since you don’t have it already. CLRAS is no longer given, but it is available for prior years; therefore it is in the filter on the public side of the site.

    32. Question 4 Which of the following characteristics of point in time performance is important to use when interpreting scores? The number of students; The color of the rows; The performance categories; One of the above; Two of the above; All of the above. Two of the above, the number of students and the performance categories—give extra points for naming the two correct answers.

    33. Question 5 Scale scores for the OAKS exam are called: MAX scores; Mean scores; RIT scores; Rater scores. Explain where RIT comes from in terms of the acronym: Rasch unit, based on the Rasch scale created using a Rasch model.

    34. Question 6 RIT scores can be used to do all of the following except: Determine if a student meets standards; Determine if a student has grown in Oregon content and skills; Determine a student’s strengths and weaknesses in a content area; Determine a student’s intervention needs. Determine a student’s intervention needs. OAKS is a broad pointer stick. It is designed for accountability. It can give you a direction to further explore to uncover deeper learning and intervention needs.

    35. Scale Scores: RIT Scores Oregon reports test scores in a scale score, or RIT score, based on the pattern of questions answered correctly and incorrectly compared to the total number and difficulty of the questions on the test. Scale ranges from 150 – 300 Scaled to allow for student growth to be measured over time RIT scores are comparable within the same content area and grade across administrations From the technical manual: The RIT scale ranges from 150 – 300 and is similar in design to the scale used by the Scholastic Assessment Test (SAT) and American College Testing (ACT) college entrance exams. Since Oregon's tests are vertically scaled, RIT scores, unlike raw scores, allow student growth to be measured over time. Rasch IRT calibration provides standardization of the item difficulties and a bias correction (Wright & Stone, 1979), while linking new items to the same scale as previously administered items. The RIT scale has a mean of 200 and a standard deviation of 10, and these RIT scores are comparable within the same content area and grade across administrations. A RIT score of 250 from one administration indicates the same level of examinee ability as a score of 250 from another administration. What does it mean to have scores that are vertically scaled? Think about your curriculum standards first. They are developed to represent a vertical continuum of increasing knowledge and skill complexity (cognitive domain) as students move from K-12. Now think about what we said was the purpose of the OAKS—to measure the extent to which students are learning the curriculum established in the Oregon Academic Content Standards. If we are accountable for students' learning the content standards, and those standards represent a vertical continuum of increasing knowledge and skill complexity, then ideally we want a scale that reflects a measure of progress along that continuum. That is what a vertically scaled exam is designed to do: measure progress along that continuum and communicate that progress in a scaled score so you can understand the magnitude of the change. RIT scores are vertically scaled. When a student's scaled score increases, you can say that growth has occurred. However, what you can't say is in which particular set of skills or subskills the student has grown. The score reporting categories give you a general pointer, but you will need more information to identify specific areas of growth.
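To make the arithmetic concrete, here is a minimal sketch (in Python) of how a vertically scaled RIT score supports a growth statement across administrations. The scale mean of 200 and standard deviation of 10 come from the technical manual excerpt above; the student scores are hypothetical, and the sketch deliberately stops short of claiming anything about which strands grew.

```python
# Minimal sketch: growth on a vertically scaled score (hypothetical values).
# The RIT scale described above has a mean of 200 and a standard deviation of 10.

RIT_MEAN, RIT_SD = 200, 10

def describe_growth(prior_rit: float, current_rit: float) -> str:
    """Report the change in scaled score between two administrations.

    Because the scale is vertical, the difference is a meaningful growth
    measure -- but it says nothing about WHICH skills or strands grew.
    """
    change = current_rit - prior_rit
    direction = "grew" if change > 0 else "declined" if change < 0 else "stayed flat"
    return (f"Scaled score {direction} by {abs(change)} RIT points "
            f"({prior_rit} -> {current_rit}); current score is "
            f"{(current_rit - RIT_MEAN) / RIT_SD:+.1f} SD from the scale mean.")

print(describe_growth(205, 212))  # hypothetical third- to fourth-grade scores
```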

    36. S-3: Cut scores are available in Volume 3 of the technical report on the Oregon assessment system. Here is the cut score for Meets Standards by subject and grade. See the supplemental materials, too.

    37. Comparing performance at a global level using public results available at ODE Accountability/Reporting http://www.ode.state.or.us/search/results/?id=172 When you view these two districts' grade 3 and 4 results, how would you describe the performance? What information is provided here for point-in-time performance? Compared to each other? Compared to the state? THE POINT IS TO STICK TO THE FACTS! Use objective terms to describe performance. Steer participants away from judgment words like "better" or "more effective"; these are inferences that can't be supported by the descriptive data provided. Take time to get a few people to respond. Watch for descriptions that infer beyond what the results support. Misconceptions when reviewing performance: "Both districts are better than the state in performance." "District A has a higher percentage of students meeting or exceeding performance expectations than other districts in the region; therefore, District A is more effective than other districts in the region." Better is a judgment word that reveals an inference. District A and District B have a higher percent of students meeting or exceeding performance standards for grade 3. For grade 4, District A has a higher percent of students meeting or exceeding compared to District B and compared to the state. District B is performing similar to the state. What questions would you ask relative to these results so far? What more information would you want to help you understand their performance? Look at the pattern across all grades. Also make sure you check group sizes. From the technical manual: "The smaller the group size, the larger the measurement error (standard error) associated with the results and the more caution required with interpretation." We'll talk more about error later.
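One way to see why group size matters when comparing "percent meeting or exceeding" figures is the textbook standard error of a proportion. The sketch below uses hypothetical counts and the standard formula; ODE's published margins may be computed differently, so treat it as an illustration of the size effect rather than a reproduction of the official calculation.

```python
# Minimal sketch of the group-size effect on a "percent meeting/exceeding"
# figure, using the standard error of a proportion: SE = sqrt(p * (1 - p) / n).
# All counts are hypothetical.
from math import sqrt

def percent_meeting_with_error(n_meeting: int, n_tested: int) -> str:
    p = n_meeting / n_tested
    se = sqrt(p * (1 - p) / n_tested)
    margin = 1.96 * se  # rough 95% band
    return f"{p:.0%} meeting/exceeding, +/- {margin:.0%} (n = {n_tested})"

print(percent_meeting_with_error(18, 25))     # small district: wide band
print(percent_meeting_with_error(720, 1000))  # large district: narrow band
```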

    38. OAKS Performance Reports Transition back to OAKS. OAKS Online provides information that gives point in time performance, measures of improvement and measures of growth. This system provides some different views, as well as additional detail not available on the public reporting system.

    39. Online Score Report Table This initial report gives state and school- or district-level summary results for each of the achievement levels, the current year's scale score, and the total count of students, as well as the prior year's total count and scale score at the same point in the year. An overview of the number of tests students took and total participation is also provided. The scale score is the RIT score. This report can be drilled down to the student level by clicking on the hyperlinked district, institution, and personnel links.

    40. General Overview of Performance by Grade and Subject Start with overall performance and work your way down the funnel to the areas of focus. At the global level you have the average highest scale score and the margin of error associated with it. The graphic also indicates the performance level that the score falls in. You can drill down to a school and teacher/personnel, or select a report to view a more detailed report on performance levels.

    41. A visual representation of performance by achievement levels by grade and subject is given by graph (performance by level of aggregation) This chart provides a breakdown of the percentage of students at each performance level. Again, it is more global information regarding overall performance. The advantage of this view is the quick visual interpretation that is possible. Notice that the desirable performance levels are above the line and the undesirable levels are below the line. You can quickly assess where the bulk of your students lie in terms of meeting/exceeding or not. You also get a visual comparison to the state context. To dig deeper into the data, you can look at the performance breakdown by score reporting categories. Before we do that, we need to address a common misconception about subcategory/strand/subtest scores.

    42. Table (Online Score Report) Available under the Score Reports tab. Drill down from the district to the school, grade, and teacher and you will get this report. This report provides students' total scores as well as their strand scores. Achievement level, RIT score, and standard error of measurement are provided to aid in interpretation. The question marks indicate that a student's score and its associated standard error of measurement prevent an achievement level from being assigned for the strand: the score, plus or minus the standard error of measurement, crosses the cut point between two performance levels. This is helpful information. It lets you know when a student is not definitively in a particular category for a strand.

    43. How would you respond to the following statement? Living Likert

    44. Subtest or strand scores are as reliable as the complete test score. Living Likert Scale: Strongly Agree to Strongly Disagree Find your place on the Likert scale based on your level of agreement. The intent of this statement is to reveal participants' perceptions of subtest score use and reliability. Many different perceptions are usually revealed. Some participants confuse reliability (consistent scores over time) with validity (measuring what you intend to measure). The point of bringing up this discussion is to set the stage for margin-of-error discussions, and to help participants understand why it is important to be cautious with subscores.

    45. OAKS Online Longitudinal Reports Longitudinal reports are available under this tab.

    46. Note this report provides the average scaled score for the grade level in the selected subject for the school selected. The average scale score for all grade 3 students in 2007/08 is compared to the average highest score for all grade 3 students this year. The black line indicates the RIT cut score for Meets Standards; it gives you the lower grade-level proficiency cut point. In this case, the cut point is 205 for math at grade 3. In addition to the average, you are given a margin of error. This can be very helpful in determining whether the performance from one year to the next is meaningfully different. In 2007/08, you can be confident that the group average value fell within a range of 208.734 to 211.266. In 2008/09, the range of scores that fall within the margin of error is 212.725 to 215.275. For this school, given the margin of error, you can be confident both the 2007/08 and 2008/09 average values are above the Meets Standards cut score. Margin of error is an important concept to include in any interpretation because it helps to determine meaningful differences. If you add and subtract the margin of error to the school and state average scale score, you will get the range of values that represents the group's performance. Also notice the difference in the size of the margin of error for the school versus that of the state. Remember, the margin of error is influenced by the size of the group, among other things, so the state margin of error is much smaller than the school's.
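Here is a minimal sketch of the comparison just described, in Python. The averages (210.0 and 214.0) and margins (1.266 and 1.275) are back-calculated from the ranges quoted on the slide; ODE produces the margin of error itself, so the sketch only shows how to turn an average and its margin into a band and check it against the cut score.

```python
# Minimal sketch: does the average +/- margin-of-error band clear the cut score?
# Values are back-calculated from the slide's quoted ranges; ODE computes the
# margin itself, so its construction is not reproduced here.
MEETS_CUT = 205  # grade 3 math Meets Standards cut score (from the slide)

def band(average: float, margin: float):
    return (average - margin, average + margin)

for year, avg, margin in [("2007/08", 210.0, 1.266), ("2008/09", 214.0, 1.275)]:
    low, high = band(avg, margin)
    verdict = "entirely above" if low > MEETS_CUT else "crosses"
    print(f"{year}: {low:.3f} to {high:.3f} -> {verdict} the cut of {MEETS_CUT}")
```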

    47. Cohort Over Time This report represents a group of students that have been selected for a roster (by teacher or some other reason for grouping the students together over time). This group's performance is provided in terms of average scale score, margin of error, and the percentage of these students proficient in 2007/08 and then in 2008/09. This is the same group of students except for those who must be dropped from the roster; students are dropped for various reasons (ask Tony about this to be sure). What would be an advantage of using a cohort report? What would be a disadvantage?

    48. Read the notes in the fine print! Teachers should not base decisions and evaluations solely on data displayed in this online score reporting system. Comparative data reports are based on the number of students tested at the time and may not be representative of official results. FERPA prohibits the release of any personally identifiable information. The OAKS Online system has several small-print reminders about how to use the information in the site and the responsibility attached to the privilege of access. Also note that there is helpful information in the report interpretation guides provided by the department.

    49. Subtest Scores: Use with Caution! Subtest or category scores are based on fewer items Fewer items = greater measurement error = lower reliability of scores Use with caution Understand this limitation Triangulate—look for convergent evidence to develop interventions or instructional plans for students. By definition, subtest scores are composed of a smaller subset of items than a complete test score. To understand why subtest scores are less reliable, you'll need to learn a few reliability basics. Caution: Individual student item analysis is subject to large measurement error and issues of validity!
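One standard way to quantify "fewer items = lower reliability" is the classical Spearman-Brown prophecy formula. The sketch below is a textbook illustration with hypothetical numbers, not a calculation taken from the OAKS technical manual; it simply shows how predicted reliability drops as a strand uses a smaller fraction of the test's items.

```python
# Minimal sketch of why fewer items means lower reliability, using the
# classical Spearman-Brown prophecy formula. The reliability value and item
# counts are hypothetical; this is not the OAKS technical manual's calculation.
def spearman_brown(full_test_reliability: float, length_fraction: float) -> float:
    """Predicted reliability of a test shortened to `length_fraction` of its length."""
    k = length_fraction
    r = full_test_reliability
    return (k * r) / (1 + (k - 1) * r)

full = 0.90                      # hypothetical reliability of the full test
for items, total in [(40, 40), (10, 40), (5, 40)]:
    print(f"{items:>2} of {total} items -> predicted reliability "
          f"{spearman_brown(full, items / total):.2f}")
```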

    50. Performance Levels for Subtests: Graph (performance by score reporting categories) In this report you get inferred student performance on each of the subcategories for the subject. These are inferred because the achievement levels are set on the overall subject scaled score, not on the categories; therefore, the category breakdowns into achievement levels are inferred. Note that there are students for whom a performance level cannot be inferred. These are indicated by "Not Enough Information." "Not Enough Information" means that a student's score is too close to the cut score to determine whether the score can be statistically classified as "Meeting" or "Not Meeting" state standards. Another factor is the number of items that a student answered within a strand, since a scale score tends to become statistically more precise as more questions are presented to a student. Students whose achievement level can't be inferred are grouped into the Not Enough Information group. This brings us to the concept of the margin of error.
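To make the "Not Enough Information" idea concrete, here is a minimal sketch that checks whether the band formed by a strand score plus or minus its standard error of measurement crosses the cut score. The cut score and SEM values are hypothetical, and ODE's actual classification rules may differ in detail.

```python
# Minimal sketch: a level is inferred only when the score band clears or falls
# below the cut; otherwise the strand is flagged as Not Enough Information.
# Cut score and SEM values are hypothetical.
def inferred_level(strand_score: float, sem: float, cut: float) -> str:
    low, high = strand_score - sem, strand_score + sem
    if low >= cut:
        return "Meets (inferred)"
    if high < cut:
        return "Does Not Meet (inferred)"
    return "Not Enough Information"

CUT = 205
for score, sem in [(214, 4.5), (199, 4.0), (206, 6.0)]:
    print(score, sem, "->", inferred_level(score, sem, CUT))
```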

    51. When the data leave you adrift What do you do when you are not sure how to interpret the data? Can you be confident in your findings? To what degree? Tools for getting your bearings Margin of error at group level (standard error) Margin of error at student level (error of measurement) Bearings story: in the northwest Atlantic, on the Labrador Current, how do you find your position in the black of the night? Use lighthouses, buoys, or other landmarks. Take a compass heading for each, draw the lines on the chart from those headings, and triangulate your position! The more information we have, the more confident we are that we've located our actual position. This is true with assessment information. More items, or more sources of information, increase our confidence in the information.

    52. Why do you care about margin of error? Margin of error for group scores tells you how much confidence you can have that you've pinpointed the group's actual performance (provides a range or band of scores) Influenced by size of group and the size of the range of the scores of students in the group Student level—provides a range of values that represents the range within which a student would likely score again. Influenced by number of items a student answers for a particular score category The more items answered, the smaller the margin of error Margin of error helps you understand the volatility of the single value at a group or student level. Think about how a percentage is impacted by the number in a group: 1 out of 5 students changing to Meets Standards looks like a bigger change than 1 out of 100. The standard error of the mean, or average, is influenced by the size of the group, and it provides a range of values within which the group's actual performance is likely to fall. Standard error of mean and standard error of measurement: we'll look at these in more context when we look at scale score reports with margins of error.
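For the group-level piece, the textbook standard error of the mean (SE = SD / sqrt(n)) illustrates the group-size effect directly. The score spread and group sizes below are hypothetical, and this is the standard formula rather than ODE's exact procedure.

```python
# Minimal sketch of the group-size effect on the standard error of the mean:
# SE = SD / sqrt(n). Spread and group sizes are hypothetical.
from math import sqrt

def standard_error_of_mean(score_sd: float, n_students: int) -> float:
    return score_sd / sqrt(n_students)

SPREAD = 10.0  # hypothetical standard deviation of student scale scores
for n in (25, 100, 2500):  # classroom-, school-, and state-sized groups
    print(f"n = {n:>4}: standard error of the group average ~ "
          f"{standard_error_of_mean(SPREAD, n):.2f} RIT points")
```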

    53. Margin of Error in Context Note the scaled scores for this group fall close to the line that indicates the scaled score equal to meeting standards. If you add and subtract the margin of error to the school and state average scale score, you'll notice that the margin of error crosses the line. In other words, the group's average has a range of values that includes scaled score values above and below the cut line. Also notice the difference in the size of the margin of error for the school versus that of the state. Remember, the margin of error is influenced by the size of the group, among other things.

    54. What about the margin of error for these students? Turn and talk about the implications of margin of error here! How would you use this information in your decision making about individual students?

    55. Additional Reports: Class Roster Report This report allows you to identify strengths and weaknesses at the student level, in overall performance and category performance. Don't forget that there is error at the student level; hence the margins of error are provided. What do you notice about error at the student level? Standard error of measurement: an estimate of how often you can expect test errors of a given size, and another way of expressing reliability. It determines the confidence you can place in the interpretation of individual student results: smaller measurement error means higher reliability and greater confidence in results. It is important to understand the existence of measurement error and the need to verify findings with other evidence of student performance or achievement. You should use classroom or other assessments to verify strengths and weaknesses initially identified in the state assessment. If the student answered more items for a category, you can have more confidence in classifying the student into a particular performance level on that category. We are trying to estimate the student's "true score." The margin of error tells you the range within which it is likely to fall when you take into account the precision of the measurement.
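The classical relationship between reliability and the standard error of measurement, SEM = SD * sqrt(1 - reliability), shows why lower reliability widens the band around a single student's score. The sketch below uses hypothetical numbers and the textbook formula; it is not drawn from the OAKS technical manual.

```python
# Minimal sketch: SEM = SD * sqrt(1 - reliability), and the score band it
# implies for one observed student score. All numbers are hypothetical.
from math import sqrt

def sem(score_sd: float, reliability: float) -> float:
    return score_sd * sqrt(1 - reliability)

def score_band(observed: float, score_sd: float, reliability: float):
    e = sem(score_sd, reliability)
    return observed - e, observed + e

for r in (0.95, 0.80, 0.60):              # full test vs. shorter strands
    low, high = score_band(212, 10.0, r)  # hypothetical observed score of 212
    print(f"reliability {r:.2f}: SEM {sem(10.0, r):.1f}, band {low:.1f} to {high:.1f}")
```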

    56. Combined Student Report This report allows you to see the different subject scores for a student. Is this student strong/weak in all areas, or in only certain areas?

    57. Individual Student Report This report gives you the student's score and margin of error for each score reporting category. Note from the technical manual: "Student performance by strand should be interpreted as a relative indication of individual student strengths and weaknesses." Why do you think this note is provided in the technical manual?

    58. Dig into your OAKS data (if K-2, use other data) Decide on subject area and grade level(s)/classroom level you want to investigate using your data Find and record pertinent information from your data (use S-4 to record your findings, or use another table structure that works for you) Organize your data to make sense of it Summarize your information (use S-3 through S-6 as guide) List your strong and/or weak areas on chart paper or S-5 and S-6 While you work, use your Table tents to alert us if you need clarification or assistance! 30 minutes, have them go through OAKS and if they want they can do the scavenger hunt to familiarize themselves with what is there.

    59. What does global performance look like for your school? At each grade level? Look at multiple years. This is an organizer for participants that need a way to compile their data to get a sense of patterns at an aggregate level. This is not for everyone. Some groups will already have multi-year views of data. This is in the event the team/group is starting from numerous different sources of info.

    60. Review your data and summarize your strengths before we look at deeper levels: S-5 of supplemental materials. We'll do two tables, one for strengths, one for weak areas. The idea here is to look at strengths so you can analyze for replication and to look at weak areas to develop strategies to intervene. Again, these are optional summarizing sheets. The idea is to finish the time period with a general sense of strong and weak areas by grade level or classroom, then to look deeper at the next two charts for student-level patterns.

    61. S-7 of supplemental materials. Review your student data and tentatively group students by areas of strength and weakness. Students may be low, middle, or high in one area and at a different level in another area of the curriculum. Grouping students by shared strengths and weaknesses will allow you to get an overall picture of the differentiated instructional needs of your students. In addition, it may help you to determine what additional information you need from other sources such as classroom or vendor-based formative assessments. Use your class roster reports and ISRs as well as the individual student report over time. Start with your high performers. Are there areas where your high performers were all weak? If so, then this says something about instructional coverage or alignment with the state's interpretation. Then think about students in order of need based on the intensity of weaknesses in the area. This will help you with students near the top of one category and the bottom of another. Once you have students entered into the grid, think about any additional information you may want to bring to bear to make identifying differentiated needs more clear.
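If you keep your roster data electronically, a grouping like the S-7 grid can be built mechanically. The sketch below is only illustrative: the student names, strands, and level labels are hypothetical placeholders for whatever your class roster report actually shows.

```python
# Minimal sketch of grouping students into a strengths/weaknesses grid by
# strand. Names, strands, and level labels are hypothetical.
from collections import defaultdict

students = [
    {"name": "Student A", "Geometry": "low",  "Number Ops": "high"},
    {"name": "Student B", "Geometry": "low",  "Number Ops": "middle"},
    {"name": "Student C", "Geometry": "high", "Number Ops": "high"},
]

grid = defaultdict(list)
for s in students:
    for strand in ("Geometry", "Number Ops"):
        grid[(strand, s[strand])].append(s["name"])

for (strand, level), names in sorted(grid.items()):
    print(f"{strand:10} | {level:6} | {', '.join(names)}")
```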

    62. S-8 of supplemental materials. Review your student data and tentatively group students by areas of strength and weakness, following the same process described for S-7 on the previous slide: group students by shared strengths and weaknesses to see the differentiated instructional needs of your class, start with your high performers, then work through students in order of need, and note any additional information you want to bring to bear before finalizing groups.

    63. The funnel is another way of thinking about successive sources of information on student performance that bring you closer to figuring out how to target instruction and to whom you'll need to provide additional support or enrichment. We're going to start with the global summary information and work our way down to classroom and student level. Along the way, I'd like you to be thinking about the adult behaviors or actions that impact the information we're gathering on student outcomes.

    64. When we bring in the idea of looking globally for patterns, it allows you to identify issues that are related to what you can control, the cause data, rather than the demographics of students, which you can't change. At this point we have information on patterns of performance.

    65. Assessment for the sake of assessment? Or, Assessment with a purpose! Are we assessing for the sake of assessment, or are we assessing purposefully and thoughtfully, in a manner that makes the time invested worth it?

    66. At the heart of assessment: We use tests or assessments to collect overt (visible) evidence to make inferences about the covert (unseen) status of student skills and knowledge. Restated: we use a limited sample of test items so that we can generalize about student performance on a content standard. Items reflect different ways of interpreting and operationalizing content standards. These may require different types of instruction. Read pages 3 and 4 in the book. Stand when you are finished reading. What are we doing at the heart of assessment? For example, the limited number of geometry problems on the test are selected from a larger pool of geometry problems to represent the behaviors or responses expected from a student who meets the geometry content standards at a particular grade level. Because we are using a limited sample of items to infer a student's achievement in a content standard, we can place varying degrees of confidence in predictions based on what we infer from the assessment. How much confidence can we place in our inferences about student performance based on the sample of items we've used? This assumption is also an underpinning for growth models. The characteristics of the growth model will determine the degree of confidence you will place in predictions of expected growth and assessment of student growth.

    67. Assessment of Assessment Think about assessment and accountability cycles in your school. What do you have going on? When? Who does it impact? When does it impact? Segue into thinking about other assessments that schools are using in addition to the OAKS. Think about assessment and accountability cycles in your school. What assessments do you give at your school? In your classroom? When? What impacts your school? When does it impact your school? Segue into thinking about other assessments that schools are using in addition to the OAKS.

    68. Assessment and Accountability Time Lines S-15 of supplemental materials. Briefly note the various assessments and the accountability points in an annual cycle for your school. Include state, district, and local assessments. What is happening, and when is it happening? This goes on the top line. Then on the bottom line you will focus on what you are doing with your assessment results. What do you use? When do you use the results? With your team, create an assessment and accountability time line. Draw a single arrow. Put the assessment cycle above the line. Include assessment for learning and assessment of learning. Include standardized and informal assessment on the timeline. You may want to break the line into quarters or into 4- to 5-week segments. Detail the uses of your assessments on the bottom line. Think of accountability as what is done with the results. What decisions are made? What consequences are associated with those decisions? Who is impacted? To whom are the results communicated? We will revisit these charts throughout the next two days. Process this out: key insights?

    69. Stop and Reflect Revisit Assessment Timeline How would you classify the assessments you are using in your school? Which do you use the most? What purposes do they serve? At what intervals are they given? What do the adults do with the results? With what consequences? What do the students do with the results? With what consequences? Select one of the assessments you placed on your timeline to investigate more deeply. Given what we've just talked about, take a few moments to discuss the following questions while revisiting your time line. What assessments are you using? Would you classify them as placement, diagnostic, evaluative? For what purposes are you actually using the assessments? Are you using a summative tool diagnostically, or vice versa? How often are you using the assessment? What are the adults doing with the results? Are they using them for decisions? Filing them? Using them to talk to students? Are there consequences for adults based on the results? What do the students do with the results? Are the results communicated to students? Do the students engage in reflection regarding their results? Are there consequences? Do students get placed in a course of study, or grouped for instruction? Are there unintended consequences? Supplemental note page _____

    70. Analyzing Your Use of Assessment S – 16 of supplemental materials. Organize your analysis in this chart.

    71. S-17 Just an organizer for you. As we look at the four suggested uses of classroom assessment, think about whether these elements are part of your classroom practice. If you are an administrator, how aware are you of these practices occurring in your classrooms?

    73. What happens when we are not clear on the standard or common curricular goal? Ma and Pa Kettle illustrate! Read pages 6 – 9 in Test Better Teach Better What happens when we're not clear on our standards? Let's look at an example with Ma and Pa Kettle. The next part is optional if the group needs a deeper look or more exposure to this component. Take a look at pages 6-9 of Test Better Teach Better, including the sample items. How do the items help explain or clarify the standard given? Think about test-triggered clarity. Think about teaching to the test.

    74. Today's focus starts with a reminder that we have limited information from state data. You've used point-in-time performance, improvement, and even growth data to determine your students' areas of strength and areas of need. Now it is time to determine if other data support your findings. Given what we've talked about in terms of measurement error and the concept of point in time, it is important to bring multiple sources of information to bear on student-level instructional decisions, particularly where remediation or intervention are concerned, as these efforts usually come at a cost to students' time in general instruction. Let's take a few moments to assess the state of assessment at your school.

    75. Discuss How is Test-triggered Clarity different from "Teaching to the Test"? Have participants discuss this question with their teams. Then bring the discussion to the larger group. Help them to differentiate using test items to get a clearer idea of the content and cognitive demand represented by the standards, versus teaching to the test. An example I use comes from early accountability tests we gave in Arkansas. Fifty percent of the strand score was attributed to an open response item. At grade four, the problem for the numbers and operations strand required students to demonstrate whole number place value by using paper clips to represent the value. After reviewing test results and the released item, the demand for paper clips in Arkansas schools increased dramatically. However, place value with paper clips was missing the point. The next year the problem was place value once again, but this time the problem asked students to represent a value using playing cards. This brings me to the next point. Teachers who taught to the test were out of luck. Teachers who taught place value, providing students with multiple concrete and abstract experiences with it, were on point to help students succeed. A concept to keep in mind is generalizability. This is covered in pages 23-26 of the book.

    76. Advantage of using tests to clarify curricular goals: More accurate task analysis-what are my students expected to know and do? When you begin with the end in mind you have a better chance of getting there! Can identify "enabling subskills" or "enabling knowledge" AKA unwrapping the standards! Clearer instruction and explanations More appropriate practice activities Test triggered clarity leads to these advantages.

    77. What about generalizability? Scenario 1, page 24 Clarify the nature of a curriculum content standard by analyzing the measures used to assess the standard. Look at the various ways it is assessed. Teach toward the skills or knowledge a test represents, not toward the test itself. Extend, apply, etc. for generalizability (for more on this, see pages 23-25). Have participants read the benchmark, testing tactics, and instructional implications on page 24 of the book. See page 23 for the set. Mention this is an oversimplification for illustration only. This is going to lead to the discussion of test-triggered instruction, where teachers get an idea of how the content standard is operationalized through reflecting on the test item and considering the cognitive demand of the task, as well as the skills needed to successfully answer the item. This should lead into a discussion about generalizable skill mastery. Teaching to the test results in a narrow focus on specific tasks presented in the items. However, teaching toward test-represented targets considers several different ways the content standard is tested. The follow-up is to determine the skills, knowledge, subskills, prior knowledge, and cognitive demand of the task and to use this information to build instruction and classroom assessment. When teachers then use diverse methods of assessment of a content standard, the teacher is seeking to get a fix on students' generalizable mastery. The more diverse the assessment techniques, the stronger the inferences you can draw about the cognitive demand your assessments are placing on your students. Once you have an idea about the cognitive demand of the task and your students' readiness, you can use diverse, assessment-grounded instructional methods to build generalizability of the skills and knowledge.

    78. Advantages of assessing prior learning: Economizes instructional planning Many standards, not enough time, teach what is needed, not what is already known by students Gives teacher the lay of the landscape Diversity of learners Diversity of prior knowledge/readiness to learn Provides connections from which to build new knowledge and skills when you include key enabling skills and subskills or bodies of knowledge What do students already know? What needs to be taught? To whom? To what degree? Knowing these basic concepts is one thing, but ask yourself how intentional you are about doing this. Be intentional in your focus on what you do and how you do it! The importance of pretesting or using instructionally diagnostic tests to plan instruction is more pressing now than ever. Most states have many grade-level standards that teachers are responsible for teaching and students are responsible for learning. Pretesting to determine prior knowledge provides critical information to determine what students already know, what they don't already know, to what extent they know it (at what cognitive level compared to the standard?), and the diversity of the instructional landscape relative to the unit in question. By assessing prior learning, you can reveal where students need scaffolding to establish readiness for learning the concepts and skills at the appropriate cognitive level to match the standards. The standards are imposing in depth and breadth. Therefore, you need tools to narrow the instructional focus. Pretesting can assist in this. Additionally, students come to you at different levels of prior knowledge and with different learning styles. Pretesting helps you assess students' readiness, who needs what and to what extent. For learners who've already gotten it, it allows you to meet their needs through enrichment or acceleration. Finally, you build connections to new learning from prior learning experiences to enhance students' ability to embed the new learning. This is particularly important for students from disadvantaged and second language backgrounds.

    79. Assessing prior learning: What pre-assessments or screeners do you use? Brainstorm with your team a quick list of pre-assessment strategies. Discuss and share out. Have someone record a list popcorned out by participants.

    80. Advantage of using tests to determine how long to teach something: Economizes instructional planning Move on when students are ready, not when the unit planner indicates Many standards, not enough time, "steal" back time where possible Time saved in an easily mastered unit can be used for units with unexpected difficulty Who has gotten it? Is it time to move on? Who needs more time or different instruction? Do you have power or priority standards established? How do you determine your core standards? If this work is done alone, you have the problem of differing priorities and interpretations; it should be part of a collaborative effort toward a vertical continuum and congruence across grades.

    81. The Dipstick Assessment: How long do I need to teach this set of skills/concepts? Item-sampling method for quick assessment Different students complete different subsamples of items from your unit test (a couple of items each) Takes less than five minutes to administer to students Gives a quick fix on the status of the entire class; not intended for inferences about individual students So how do you do it? "Flexible, en route test-guided instructional scheduling can allow your students to move on to fascinating application activities or delve more deeply into other content areas." page 12.
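To make the item-sampling idea concrete, here is a minimal sketch assuming a hypothetical 20-item unit test and a class of 30. Each student answers only two randomly sampled items, and the pooled responses give a class-level (not individual) estimate of mastery on each item. The roster and the simulated responses are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical 20-item unit test; each student answers only 2 sampled items.
ITEMS = list(range(1, 21))
students = [f"student_{i}" for i in range(1, 31)]

random.seed(1)
attempts = defaultdict(lambda: [0, 0])  # item -> [correct, attempted]

for student in students:
    sampled = random.sample(ITEMS, 2)        # each student gets a different subsample
    for item in sampled:
        correct = random.random() < 0.7      # stand-in for the student's actual response
        attempts[item][0] += int(correct)
        attempts[item][1] += 1

# Class-level estimate of mastery per item, pooled across students.
for item in sorted(attempts):
    correct, total = attempts[item]
    print(f"item {item:2d}: {correct}/{total} correct")
```

In practice the responses would come from the quick dipstick administration rather than a random draw; the point is pooling small per-student samples into a whole-class picture.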

    82. Simple, but powerful model… Note that the post-test-only model doesn't factor in the students' pre-instructional status. This makes it difficult to determine whether instruction, or other factors, impacted student learning. Pretesting and then comparing post-test results to pretest results allows a teacher to determine his or her instructional impact on student learning. The student becomes his or her own "control" in this model. Where high mobility of students is an issue, you would analyze the cohort of students who were both pre- and post-tested when evaluating your own impact. Evaluate all post-test scores for all students to gauge student mastery of the content, but use the cohort pre-to-post comparison to help you understand your instructional impact. This can also be a time saver because it wraps back to what we said about assessing prior learning. You may already be doing this, but how intentional are you? How are you using your data teams/PLCs to help you do this work?
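A minimal sketch of this pre/post model, with invented scores keyed by student ID: only students who have both a pretest and a posttest (the cohort) are used to estimate instructional impact, while every posttest score still informs judgments about mastery.

```python
# Invented scores for illustration; keys are student IDs.
pretest = {"s1": 42, "s2": 55, "s3": 38, "s4": 61}
posttest = {"s1": 68, "s2": 70, "s3": 52, "s4": 66, "s5": 74}  # s5 arrived mid-unit

# Cohort = students with both scores; each student serves as his or her own control.
cohort = sorted(set(pretest) & set(posttest))
gains = {s: posttest[s] - pretest[s] for s in cohort}

avg_gain = sum(gains.values()) / len(gains)
print("cohort gains:", gains)
print(f"average pre-to-post gain (instructional impact estimate): {avg_gain:.1f}")

# All posttest scores still matter for judging mastery of the content.
avg_post = sum(posttest.values()) / len(posttest)
print(f"average posttest score, all students: {avg_post:.1f}")
```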

    83. Essential assumptions in assessment: Validity — The test measures what you are trying to measure. Reliability — The sample of items or responses is large enough to make inferences regarding students' skill development in a particular area. How consistently are the scores measuring what they are supposed to measure from one time to the next, or between equivalent or alternative forms? These are the underpinnings of validity and reliability. Valid and reliable measurement instruments (tests or assessments) are the most basic requirement for providing evidence to support decisions at the classroom and student level. Validity: does the test measure what you intended to measure? Of the entire "universe" of test items, does the test include items that measure the content, skills, and cognitive demand it is intended to measure? Adequate in amount: we assume we have a large enough sample of items to enable us to make statements regarding a student's overall skill development in that area. Representative in area, i.e., the validity of the test: we also assume the test measures what it claims to measure. In other words, you can only collect a sample of information from each student in order to determine what the student knows and is able to do with that knowledge. This is called behavior sampling. A test is used to elicit responses that will give you an idea of the level of understanding the student has of the content represented in the test. The second assumption: the test collects adequate information to produce reliable results. Is the evidence reliable? There are several concepts in reliability that are important to understand for using assessments or designing your own.
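Where a single test's internal consistency is at issue, one common reliability estimate is Cronbach's alpha. The sketch below computes it from a small invented matrix of 0/1 item scores; it illustrates the idea of consistency in general, not a procedure prescribed by the Primer.

```python
def variance(values):
    # Sample variance (n - 1 denominator).
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(score_matrix):
    # score_matrix: one row per student, one column per item.
    k = len(score_matrix[0])
    item_vars = [variance([row[i] for row in score_matrix]) for i in range(k)]
    total_var = variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented 0/1 item scores for six students on a four-item quiz.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")  # roughly 0.83 for this data
```

Higher values mean the items hang together and the total score is a more dependable sample of the skill; a very low value is a warning that the evidence is too noisy to support confident inferences.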

    84. What constitutes valid scores? Align assessment with: Objectives for instruction content and skills you plan to teach Instruction that preceded assessment content and skills you actually taught Decisions or conclusions resulting from data analysis What constitutes validity in creating an exam? In selecting one? Look for alignment. Curriculum is so large that you can't hit it all, so prioritize. Instruction is targeted at the most important subsets of the curriculum, power standards. The assessment samples the content you taught.

    85. Valid? Reliable? Both? Neither? S-19 Put this in supplemental and refer to this at this time for team exercise. What do validity and reliability look like? Have handout of this for them to work on individually, then they are to discuss with a partner. Then discuss next slide. The targets represent what you are trying to measure, standard, curricular goal, etc. The green lines are the scores on an assessment designed to measure the target. Think of these as four different tests of the same standard. A group of students with the same ability level take each of the four exams. The green dashes represent their scores on each of the four tests. What does each one represent in terms of validity and reliability?

    86. Valid? Reliable? Both? Neither? The validity and reliability of scores or test results determine the extent to which you can make meaningful interpretations or inferences, as well as the degree of confidence you can place in those interpretations. This is particularly important when using scores to predict future performance.

    87. Oregon Primer on Educational Assessment & Guide to Common Assessments Jigsaw Activity This guide will serve as a resource for this section. Divide and assign sections of the primer for each table to use. On chart paper, each table summarizes key points in their section. Directions on how the guide is set up to be used, follow-up to questions, discussion of the reports brought in, and key points on reliability and validity will be elicited using the guide.

    88. Summarize key points from Primer Pages 1 – 4 Finding Assessment Instruments Pages 4 – 8 Review and Selection of Assessments Pages 8 – 13 Use and Interpretation of Assessments Pages 13 – 18 Selected Issues in Analyzing Assessment Data Pages 139 – 147 Test Better, Teach Better: Instructionally Supported Standards-Based Tests Have groups work for 20 minutes to read and summarize on chart paper the major points from their reading. Groups will then select a reporter to share the information, or if everyone needs an up-and-moving activity, set it up as a carousel. If there are more than five groups, have tables pair up to share and then choose reporters. Then have each table discuss and add to their chart the answers to the following questions: "Why is it important to know this? How does it apply to your practice?"

    89. To Sum Up: When selecting and implementing assessments to augment state and classroom formative assessment

    90. Important Questions to Ask What do you want to learn from the assessment? Who will use the assessment information? What steps will be taken as a result of the assessment? What decisions are the results intended to inform? What support structures or professional development are needed to ensure the assessment results are used as intended? How will student learning improve as a result of using the assessment system? Will it improve more than if the assessment system weren't used? These questions sum up the assessment framework process we've been engaged in for the day. They should be part of your considerations in acquiring or developing assessment systems. DIBELS: the idea of form effects arises when scales aren't built for progress monitoring or adjusted for skill development over time, and parallel forms are not available or are insufficient in number to avoid a form effect.

    91. Which DIBELS reports do I need?

    92. Testing/Teaching Connection: How will you alter instruction as a result of what you’ve learned through assessment? To answer this question and to provide a guide for building connected instruction and assessment plans, see the following:

    93. Data Team Process Collect and chart data Analyze strengths and obstacles Establish goals: set, review, revise Select instructional strategies Determine results indicators What impacts the effectiveness of data teams and the use of data to inform instruction? What roadblocks exist to prohibit effectiveness? Quick overview of the steps in the process. How many of you use data teams, PLCs, or some structure to analyze data? As a review, remember the intent of data teams and the expectations of data team meetings. An underlying issue is represented by the question: data teams are effective insofar as the participants are knowledgeable about assessment properties and how to use assessment results. Strands 1 and 2 addressed concepts of data teams to drive the CIP and SIP process. The context for data teams in Strand 3 is the use of classroom and student-level data.

    94. Data Teams from Teacher’s View Examine student data collaboratively Develop instructional strategies and interventions Adjust teaching strategies Monitor results Rephrased for Strand 3 purposes to reflect a focus on data teams for reviewing student work.

    95. Data Teams Rephrased Examine student work collaboratively: Within grade levels For which subjects and standards is performance strong? Weak? What are the instructional implications? Across grade levels (vertical teams) For which subjects and standards is performance strong? Weak? What are the curriculum and instructional implications? Data teams imply working collaboratively to answer important questions. What do you get from looking at these two pictures? The pictures represent the two extremes of collaborative data teams: you can't do it alone, and you can't do it with too many people. In the first picture, the suggestion to collaborate alone is an oxymoron; in the second picture, you have an environment set up for many people to work individually, most likely on email, while taking up mutual space! Within-grade-level teams are usually smaller than across-grade-level teams, but both need to be of a size and combination of staff that represents the pertinent interests while maximizing participation and facilitation toward mutual analysis, goal setting, and action planning. What are the advantages of collaborating? Even the Lone Ranger had Tonto: collaboration is essential to data-driven decisions. Different perspectives, experiences, and expertise provide a collective venue to interpret data, share hunches, and plan actions. What do you have across grades? Vertical grades in your building? Vertical (ladder) teams in your system or district? Create a to-do list that we'll revisit throughout. Important questions become part of data teams' work when you examine student work collaboratively. These questions reflect that important work.

    96. Collaborative teams provide vetting of analysis and ideas, as well as support for answering important questions: Develop interventions Who needs intervention? In which concepts and skills? Who needs enrichment? In which concepts and skills? Adjust teaching strategies What teaching behaviors need to change in core instruction? Intervention? Enrichment? What other adult behaviors need to change in core instruction? Intervention? Enrichment? Collaborative teams work together to manage the task of analyzing data to develop interventions and adjust teaching strategies.

    97. Essential to Data Team Success: Monitor results Which adult indicators will you monitor to assess implementation of changes? How will you collect this information? With what frequency? Within what timeline? Which student indicators will you monitor to assess student outcomes? How will you collect this information? With what frequency? Within what timeline? Monitoring adult and student indicators will ensure that you are able to make inferences that connect student outcomes with the changes in adult practices. If you don't monitor fidelity of implementation of changes in practice, then you can't connect changes in student outcomes to those practices. Implementation fidelity starts with implementation effort! Deliberate planning to monitor implementation is a necessary step in successful implementation.

    98. Data Teams: Mechanical vs. Mastery How would you describe the way your data teams or PLCs are functioning? Are you functioning mechanically, or are you functioning with mastery? Use an example of mastery versus mechanical implementation such as the one below. Adopting science materials aligned with new curriculum efforts was a once-in-a-career opportunity. Our district had been able to revise district curricular goals within the same year that we considered science adoption materials. We selected a careful mix of science kit materials, FOSS and STC (Science and Technology for Children). We provided training on each kit for grade-level teachers and sent them off to do excellent science instruction. We found something very important during the follow-up process. We had teachers who used the instructional materials' teacher scripts as actual scripts. They read the teacher part and, after students responded, moved on to the next teacher part, regardless of what the teacher learned from the student responses. The activities were completed, but without the real spirit of inquiry or the flexibility inherent in inquiry. We call that mechanical implementation. In contrast, we had teachers who used the materials as instructional guides and resources. Students were guided through inquiry experiences, and the teacher facilitated their learning by responding to their questions and experiences with flexibility and fluency. This was mastery implementation. I'd like you to consider your current view of data teams. If you are going to be part of mastery data teams, not just mechanical teams, then what do you need to make it happen?

    99. Data Teams Video View the video and process the look-fors: positive collaboration, one person was the recorder, one was the data person, and discussion was focused on student work. They used data to develop a goal and a plan for instructional changes. They pre- and post-assessed. They adjusted based on progress, and after the post-assessment they created an adjusted plan. Background: look into the 90/90/90 schools study and Chenoweth, "It's Being Done."

    100. Action Research Anchor data team practices in an action research framework to maximize potential for replication! Action research is a process that allows you to respond to patterns in the data with a disciplined inquiry. It is the discipline of the inquiry that will allow you to determine the causes that impact the patterns you observe.

    101. Integrate action research into data teams process Data Teams Process: Examine student work collaboratively: observe, hypothesize, predict Develop interventions: hypothesize, predict Adjust teaching strategies: test hypothesis Monitor results: gather data, explain, observe We've worked through strengths and weaknesses; we've organized and analyzed. Now, to move to replicating desired outcomes and eliminating undesired outcomes, we need to refer back to the action research framework. If this framework is integrated into your data teams process, you'll have a greater connection between intentions and results. The key is to be more intentional in the strategies you employ by using these advanced analysis tools to work with your data.

    102. Two Types of Data Effect Data – what students are producing Student Achievement results Various measures – State, District, School, Grade Level, Classroom Formative and Summative Example: The percentage of students who score Define EFFECT Data. Clear understanding will assist participants as they construct indicators as a part of their plans later in the workshop.

    104. How different are the data from last year or last benchmark in the categories of Benchmark, Strategic and Intensive? How did students perform on each of the measures compared to last year at this time or from the beginning of the year?

    105. Histogram What is the benchmark goal for this measure? What is the expectation for this time of year? What percent and what number of students have Established/Low Risk skills? Emerging/Some Risk skills? Deficit/At Risk skills?
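As a concrete illustration of the tallying behind such a histogram, the sketch below bins invented scores into the three bands; the cut points are placeholders for illustration, not the actual DIBELS benchmark goals.

```python
from collections import Counter

# Invented scores and cut points for illustration only (not actual DIBELS benchmarks).
scores = {"Ava": 58, "Ben": 41, "Cam": 22, "Dee": 63, "Eli": 35, "Fay": 49, "Gus": 17}
CUT_LOW_RISK = 50    # at or above: Established / Low Risk
CUT_SOME_RISK = 30   # at or above (but below CUT_LOW_RISK): Emerging / Some Risk

def band(score):
    if score >= CUT_LOW_RISK:
        return "Established/Low Risk"
    if score >= CUT_SOME_RISK:
        return "Emerging/Some Risk"
    return "Deficit/At Risk"

counts = Counter(band(s) for s in scores.values())
total = len(scores)
for label in ("Established/Low Risk", "Emerging/Some Risk", "Deficit/At Risk"):
    n = counts.get(label, 0)
    print(f"{label}: {n} students ({100 * n / total:.0f}%)")
```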

    107. Of the students who were intensive, strategic, or benchmark at the beginning of the year, how many and what percent met the middle-of-year goal? Middle-of-year goal for NWF = 50 sounds per minute.
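The question on this slide is a simple transition count. Here is a sketch with invented students and scores, assuming the stated middle-of-year NWF goal of 50 sounds per minute.

```python
from collections import defaultdict

MOY_GOAL = 50  # middle-of-year NWF goal: 50 sounds per minute

# Invented data: beginning-of-year instructional group and middle-of-year NWF score.
students = [
    ("Ana", "intensive", 38), ("Bo", "intensive", 52), ("Cy", "strategic", 55),
    ("Dot", "strategic", 47), ("Ed", "benchmark", 61), ("Flo", "benchmark", 58),
]

met = defaultdict(int)
totals = defaultdict(int)
for _, boy_group, moy_score in students:
    totals[boy_group] += 1
    met[boy_group] += int(moy_score >= MOY_GOAL)

for group in ("intensive", "strategic", "benchmark"):
    n, k = totals[group], met[group]
    print(f"{group}: {k} of {n} met the MOY goal ({100 * k / n:.0f}%)")
```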

    109. Two Types of Data Cause Data – what the adults are doing Information based on the actions of adults in the system materials used curriculum chosen frequency of lessons duration of lessons instructional strategies Example: Forty-seven high school math teachers took part in a week-long, hands-on math course emphasizing writing in the math classroom. Define CAUSE data. Clear understanding will assist participants as they construct indicators as a part of their plans later in the workshop.

    110. Cause-Data Helps Us Know What teaching practices work to improve student achievement? What practices do not work as well as anticipated and need to be modified or discarded? How can current teaching practices be augmented to be more effective? Collecting the cause data provides insight and possible explanations for the effect data. Without the cause data, the effect data is not as useful. For strategic teaching and leadership to occur, the generation of cause data to monitor is vital. Cause data are those 'actions of the adults' that influence student achievement. Samples include: The number of minutes of daily writing, reading, math instruction The number of common end-of-course assessments administered The % of teachers who have had specific professional development in the area of Data Teams The % of principals who collect and monitor Data Team information The frequency of Data Team meetings The number of teachers who use the prescribed curriculum on a regular basis The number of teachers who use the district-recommended pacing chart on a regular basis The frequency of teams that examine reading, writing and math data results and establish new differentiated learning/instructional groups for specific concepts and skills The number of effective teaching strategies selected for specific support of key concepts and skills The number of teachers who know and understand the effective teaching techniques The number of meetings held for the purpose of making student achievement decisions where decision makers have relevant data The number of teachers who have developed performance assessments for units of study

    112. An example of cause data Teacher talk time constitutes cause data that can be used to understand how adult actions impact student learning. Teacher talk time is a proxy for student engagement.

    113. To effect change, understand factors that impact: Review the adult indicators that impact student effect data or student outcomes.

    114. Before you can hypothesize and predict, you have to start with your initial observations. At your table, brainstorm antecedents and cause data that you can observe at your school. Brainstorm means get ideas out quickly: no discussion, no judgment. Two to three minutes. Have each table share out one or two examples. Example: a 90-minute literature block. On paper it looks good, but what is supposed to happen in those 90 minutes? How do you know it is happening? Antecedent or cause data are adult actions or behaviors possibly associated with outcomes. One should be for the strengths identified, the other for the weaknesses identified. Give teams 10 minutes to think about and complete these tables. Then have groups share out one cause/effect they hypothesized.

    115. Data tools and when to use them: Basic tools used to gather and categorize information Advanced tools used to provide insights to make difficult decisions Process tools for facilitating decision making You won't be exploring data patterns without some helpful tools. When you are confused about what to use when, there are data tools that are helpful for gathering and organizing information, and then there are ways that we "treat" the data, advanced tools, that allow us to glean further insights. We'll be working with data from assessments, tools or organizers for gathering and categorizing information, and then some advanced tools as well. Underneath all of this is the need for enough understanding of what is meaningful in the data and what may not be, and therefore does not require action, decision, or concern.

    116. Ishikawa Fishbone Analysis: Looking for related causes and antecedents that may relate to a specific effect or outcome S-10 of supplemental materials. At your table, using chart paper, do a fishbone cause-and-effect analysis on one of your areas of strength or weakness, following the guidelines. The idea is to take one of the patterns of student performance you've uncovered from your data work and analyze the possible factors impacting the student outcome. This can be a positive effect that you want to replicate, or a negative effect that you want to change. Stop at number 5. Example: advanced students underperforming over time. Causes on spines: focus on lower-level objectives, lack of deliberate enrichment opportunities, motivation, classroom management, time.

    117. Ishikawa Fishbone: Cause & Effect Diagram Have participants select an effect or student outcome they want to study further. This can be an undesirable outcome that they want to change, or it can be a desirable/positive outcome they'd like to replicate. Other examples of outcomes include a significant increase in scores from one grade to another for a group or cohort, patterns of improvement, patterns of decline, etc. Use the Ishikawa activity to brainstorm possible causes of the outcome. Add details to the spines. This is a 10-12 minute activity; some groups have used as much as 15-20 minutes. Watch for productive activity, not disengaged activity. Work the room to make sure teams are on topic and progressing, not stumped on how to proceed. Answer questions or give help as needed. Get an idea of the outcomes chosen. Close this part of the activity by having teams share the effect/outcome they selected. Remind them that they have the opportunity to talk to other groups about similar effects. Use the gallery walk activity if time is available or if the group really needs to see what others are doing to get a true feel for it.

    118. Moving from analysis to hypothesis Count the number of spines and divide by 2 This will indicate the number of votes each member has in setting priorities for the next step (for example, a fishbone with eight spines gives each member four votes) Vote on priorities (check marks, raised hands, post-it votes, etc.) Focus on these priorities when we move into hypothesizing Some will lead you to look at antecedents, some will lead you to look at instructional variables, and some will lead you to focus on other cause data Nominal voting allows you to narrow the focus of your research efforts. Use this to determine priorities. Then hypothesize based on these priorities.

    119. Integrate action research into data teams process Data Teams Process: Examine student work collaboratively: observe, hypothesize, predict Develop interventions: hypothesize, predict Adjust teaching strategies: test hypothesis Monitor results: gather data, explain, observe We've worked through strengths and weaknesses; we've organized and analyzed. Now, to move to replicating desired outcomes and eliminating undesired outcomes, we need to refer back to the action research framework. If this framework is integrated into your data teams process, you'll have a greater connection between intentions and results. The key is to be more intentional in the strategies you employ by using these advanced analysis tools to work with your data.

    120. Moving from Data to Hypothesis What patterns do the data reveal? What do I see? Why do I think the patterns exist? Why do I think it is happening? What can I do about it? What may change it, or what may lead me to replicate it? Hypothesize: If I change…, or If I do ..., Then I think… will happen…. S-11 of supplemental materials. Use commentary and group responses to the review of data to have groups hypothesize. Insert a few slides that illustrate a pattern that emerged in day 1. Have groups develop hypotheses and share them with a table partner, then share a few examples out with the entire group. Use this as an opportunity to clarify what is meant by a hypothesis, and what makes a good or not-so-good hypothesis in terms of what is testable via action research. Then move to the testing-hypothesis phase.

    121. Plan to Evaluate Your Hypothesis What do you need to do to evaluate your hypothesis? What are the antecedents? Identify the structural variables (grouping, scheduling) What specific instructional strategies will you use to increase student learning (compare/contrast, Cornell notes)? What are the cause indicators that you will monitor and evaluate to determine the impact on student learning, and how will you monitor and evaluate them? (teacher frequency of use of Cornell notes as a strategy, teacher having students apply info from notes) Have groups use the organizer to test one of their hypotheses, answering the questions in the slide. Testing your hypothesis often involves multiple measures. Testing your hypothesis doesn't always add to data collection from students; you may already be giving formative assessments at intervals that inform your hypothesis. The most important part of action research is explicitly spelling out teacher/adult behaviors and indicators, and establishing controls that will allow you to draw stronger conclusions from your efforts.

    122. Organizer for testing hypothesis S-13 of supplemental materials. These are the specific questions to have participants work through for evaluating their hypothesis. Many people will find they have to restructure or rephrase their hypothesis to make it measurable. Facilitate group work to help participants develop testable hypotheses and think through what measures they'll use, particularly measures for the adult indicators.

    123. Revisit Roadblocks to Data Teams—what tools and resources can you take from today? Data Teams Process: Examine student work collaboratively: observe, hypothesize, predict Develop interventions: hypothesize, predict Adjust teaching strategies: test hypothesis Monitor results: gather data, explain, observe Given the work we did at the beginning of the day, what tools and resources can you take from today's activities to help remove some of the roadblocks? What learning has given you a new or different perspective on tools or resources available to you?

    124. Find your Classroom Assessment Analysis S-9 of supplemental materials. This is intended for Day 2, but it may be helpful, if teams are flowing with the data work, for them to continue through completion of this chart based on their tentative groupings of students by areas of strength and weakness (see S-5 through S-8). Students may be low, middle, or high in one area and at a different level in another area of the curriculum. Once you've split students out by strand, step back and look for overall patterns among your groups of students. Grouping students by shared strengths and weaknesses allows you to get an overall picture of the differentiated instructional needs of your students. You do this by noting patterns and outliers from yesterday's work. We went from group data to classroom data yesterday, identifying individual students' strand strengths and weaknesses. Now we're moving from the granular detail to describing patterns of performance.

    125. How does pattern analysis apply to instructional/assessment planning? How do patterns of performance help you with instructional/assessment planning? These generalizations are a rough guide to planning instruction. You can also take this down to a more detailed level.

    126. Connecting Assessment to Instruction Identify a student from each group whose responses or performance are representative of the group's performance Describe each representative performance Describe each student's learning needs based on this detail of performance Determine how you will differentiate instruction Deal with the students whose performance in a category is fuzzy and needs more data to determine Let's take the patterns of performance and get detailed in terms of the instructional and future assessment considerations.

    127. Describe representative performance Handout. And S-21 for completion. Note the level of description for representative performance. This is similar to building a scoring rubric. Identify examples of performance for each score category.

    128. Describe learning needs based on representative performance and curricular learning goals Handout and S-22 for completion. From representative performance, you determine the implied learning needs of each group of students. The more detail, the more precise the instructional planning.

    129. Determine differentiated learning strategies S-23 of supplemental materials. Based on the identified learning needs, think about how you can work with flexible groups of students, other teachers in your data team/grade level, etc. to plan instructional strategies to meet the needs identified in this analysis. You may be able to group students differently for different needs, use some whole class and some small group strategies or centers to deal with the needs.

    130. Planning Instructional Strategies Linking Strategies to Assessment Given what you expect students to know and be able to do, what strategies will you use for instruction? How will you know that both adults and students are implementing strategies? What evidence from teachers (lesson plan and assessment)? What evidence from students (engagement and assessment)? This is where released items can help you target expected actions. At the teaching level, this is the cause and effect data piece. Make sure you have a clear understanding of what the student will do, as well as what you will do as an instructor or facilitator. Identify content and cognitive demand expected in terms of student behaviors Focus on observable and measurable student characteristics or behaviors Simplest model: The student will [student behavior] [specific content]

    131. Identify instructional activities and student engagement strategies S-24. Use this sheet to plan for instruction aligned with the goal you've chosen. Pay attention to the verbs you identified. These should provide you with direction for activities and actions of teacher and student.

    132. S-25 of supplemental materials. You can use Bloom's tables to get verbs that are aligned with the level of cognitive demand you are looking to elicit. Then plan instructional activities that align with the level of demand.

    133. Resources for Research-based Instructional Strategies What Works Clearinghouse http://ies.ed.gov/ncee/wwc http://ies.ed.gov/ncee/wwc/pdf/wwc_practitioners_guide.pdf http://ies.ed.gov/ncee/wwc/reports/topic.aspx?tid=10 http://ies.ed.gov/ncee/wwc/reports/topic.aspx?tid=03 Doing What Works http://dww.ed.gov/index.cfm Center on Instruction http://www.centeroninstruction.org/ http://www.centeroninstruction.org/files/Assessment%20Guide.pdf S-26 of supplemental materials. This slide includes links to sites that provide teacher-friendly proven and promising practices. Review what is available. WWC has numerous topic reports to review. Center on Instruction has resources for math, reading, science, and in particular for ELL and special needs students.

    134. Optional activity to bring it together. Bring components of data teams and action research together to plan for assessing your own instruction.

    135. How do you gauge and compare implementation (what you do) and outcomes (what students do as a result of your actions)? You need an advanced analysis tool for a process to tie it all together! This is the final summary activity to bring together student and adult data to look for relationships.

    136. Triangulation squared! A tool for viewing complex relationships.

    137. S-27 of supplemental materials. Select key variables or factors to plot on the chart. Explain that this can also be done in Excel or Word 2007 using the Radar Chart option.

    138. Wagon Wheel Analysis Tool Select 3-8 variables and assign each to a spoke of the wheel. 1. % Demonstrating mastery of reading to perform a task on classroom assessment 2. % ELA lessons explicitly supporting reading to perform a task 3. % Science/Math/Social Studies instructional activities involving reading to perform a task 4. 100% minus percent of teacher talk time 5. % of strategies for scaffolding students with far to go 6. % of ELA time spent on reading instruction Suggested variables involving assessment include: student outcomes in terms of different performance groups of students on progress monitoring or formative assessment; teacher implementation of formative assessment strategies; fidelity to grouping and differentiating instruction; fidelity to use of results; fidelity to feedback to students and to instructional changes. Look for patterns between the adult indicators and student outcomes. Principals or grade-level teams could do this by teacher or grade level. Other adult or student indicators could be incorporated if hypothesized to be impacting outcomes.

    139. Additional Steps Collect data on each variable Establish a scale for each spoke so that the highest performance value is on the outer rim of the wheel Plot data on spokes using color coding for different entities (classroom, grade, school, etc.) Connect the lines for each entity Identify the variables that show the largest gaps between the entity and the outer rim.
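These steps can be carried out on chart paper, in Excel's radar chart, or in code. The sketch below uses matplotlib (an assumption, not part of the workshop materials) to plot two hypothetical classrooms on five spokes that have already been rescaled to 0-100, with the best performance at the outer rim; all values are invented.

```python
import math
import matplotlib.pyplot as plt

# Spokes: each variable already rescaled 0-100, highest performance at the outer rim.
spokes = ["Mastery on classroom assessment", "ELA lessons supporting task reading",
          "Content-area task-reading activities", "100 minus teacher talk time",
          "Scaffolding strategies in use"]
entities = {  # invented values for two classrooms
    "Classroom A": [72, 80, 65, 70, 60],
    "Classroom B": [45, 55, 40, 50, 35],
}

angles = [2 * math.pi * i / len(spokes) for i in range(len(spokes))]
ax = plt.subplot(polar=True)
for name, values in entities.items():
    # Repeat the first point so each entity's polygon closes.
    ax.plot(angles + angles[:1], values + values[:1], label=name)
ax.set_xticks(angles)
ax.set_xticklabels(spokes, fontsize=7)
ax.set_ylim(0, 100)
ax.legend(loc="lower right")
plt.savefig("wagon_wheel.png")  # spokes with the largest gap from the outer rim flag priorities
```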

    140. S-28 of supplemental materials. Review the example for classroom-level data. Point out how group B's performance is lower, and the adult indicators are also lower.

    141. Data Teams + Action Research = Improvement Data Teams Process: Examine student work collaboratively: observe, hypothesize, predict Develop interventions: hypothesize, predict Adjust teaching strategies: test hypothesis Monitor results: gather data, explain, observe Given the work we've done for two days, what tools and resources can you take from these activities to help remove some of the roadblocks? What learning has given you a new or different perspective on tools or resources available to you? Summarize the two days' learning or ask participants to think about the principles they've learned.
