
Using Summative Data to Monitor Student Performance: Choosing appropriate summative tests.




  1. Using Summative Data to Monitor Student Performance: Choosing appropriate summative tests. Presented by Philip Holmes-Smith School Research Evaluation and Measurement Services

  2. Overview of the session • Diagnostic vs. Summative Testing • Choosing Appropriate Summative Tests • The reliability of summative (standardised) tests. • Choosing appropriate summative tests. • When should you administer summative tests?

  3. 1. Diagnostic vs. Summative Testing

  4. Examples of Diagnostic Testing • Assessment tools such as: • Marie Clay Inventory, • English Online Assessment (Government schools), • Maths Online Assessment (Government schools), • SINE (CEO schools), • On-Demand Linear Tests, • Probe. • Teacher assessments such as: • teacher questioning in class, • teacher observations, • student work (including portfolios).

  5. Diagnostic Testing • Research shows that our most effective teachers (in terms of improving the learning outcomes of students) constantly use feedback (including the use of diagnostic information to inform their teaching). • Hattie (2003, 2009)* shows that using feedback (including using diagnostic information about what each student can and can’t do to inform teaching) has one of the biggest impacts on improving student learning outcomes. * 2003: http://www.acer.edu.au/documents/RC2003_Hattie_TeachersMakeADifference.pdf 2009: Hattie, John. (2009). Visible Learning: A synthesis of over 800 meta-analyses relating to achievement. NY: Routledge.

  6. Examples of Summative (Standardised) Testing • Government sponsored assessment tools such as: • NAPLAN, • English Online Assessment (Government schools), • On-Demand Adaptive Tests. • Other commercial tests such as: • TORCH, • PAT-R, • PAT-Math (together with I Can Do Maths).

  7. Summative (Standardised) Testing • Summative testing is essential to monitor the effectiveness of your teaching. • But, research shows that summative tests do not lead to improved learning outcomes. As the saying goes: “You don’t fatten a pig by weighing it” • So, although it is essential, keep summative testing to a minimum.

  8. 2. Summative Tests

  9. Summative (Standardised) Testing • Summative testing is essential to monitor the effectiveness of our teaching, but: • Is NAPLAN reliable for all students? • Are the other summative tests you administer reliable for all students? • We need to maximise the reliability of the tests we use to monitor the effectiveness of our teaching.

  10. Summative (Standardised) Testing • Summative testing is essential to monitor the effectiveness of our teaching, but: • Do we currently gather enough information to monitor the effectiveness of our teaching of ALL students? e.g. • Year 3 NAPLAN reflects the effectiveness of your Prep-Yr2 teaching, but what about the Prep teaching vs. the Yr1 teaching vs. the Yr2 teaching? • Year 9 NAPLAN reflects the effectiveness of your Yr7-Yr8 teaching, but what about the Yr7 teaching vs. the Yr8 teaching? • We need to choose appropriate summative tests to monitor the effectiveness of our teaching at all year levels from Prep – Yr10!

  11. The Reliability of Summative Tests

  12. Three Questions • Do you believe that your students’ NAPLAN and/or On-Demand results accurately reflect their level of performance?

  13. Three Questions • Do you believe that your students’ NAPLAN and/or On-Demand results accurately reflect their level of performance? • If we acknowledge that the odd student will have a lucky guessing day or a horror day, what about the majority? • Have your weakest students received a low score? • Have your average students received a score at about expected level? • Have your best students received a high score?

  14. Three Questions • Do you believe that your students’ NAPLAN and/or On-Demand results accurately reflect their level of performance? • If we acknowledge that the odd student will have a lucky guessing day or a horror day, what about the majority? • Have your weakest students received a low score? • Have your average students received a score at about expected level? • Have your best students received a high score? • Think about your students who received high and low scores: • Are your low scores too low? • Are your high scores too high?

  15. High highs and Low lows [Two score charts, one very high and one very low, each asking: Is this reading score reliable?]

  16. Item difficulties for a typical test

  17. Summary Statements about Scores • Low scores (i.e. more than 0.5 VELS levels below expected) indicate poor performance, but the actual values should be considered as indicative only (i.e. such scores are associated with high levels of measurement error). • High scores (i.e. more than 0.5 VELS levels above expected) indicate good performance, but the actual values should be considered as indicative only (i.e. such scores are associated with high levels of measurement error). • Average scores indicate roughly expected levels of performance, and the actual values are more reliable (i.e. such scores are associated with lower levels of measurement error).
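The point of this slide, that extreme scores carry more measurement error than average scores, can be illustrated with a small sketch. The `score_band` helper and the standard-error values below are invented for illustration; real standard errors of measurement come from each test's technical manual.

```python
# Toy illustration: extreme scores carry larger measurement error.
# The SEM (standard error of measurement) values are hypothetical.

def score_band(score, sem, z=1.96):
    """Return an approximate 95% confidence band around an observed score."""
    return (round(score - z * sem, 1), round(score + z * sem, 1))

# An average score, measured with a small SEM, gives a narrow band...
print(score_band(50.0, 2.0))   # (46.1, 53.9) -> fairly reliable
# ...while an extreme score, measured with a large SEM, gives a wide band.
print(score_band(85.0, 8.0))   # (69.3, 100.7) -> indicative only
```

The wide band around the extreme score is why the slide says such values should be treated as indicative only.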

  18. Choosing appropriate summative tests

  19. Item Difficulties for Booklet 6 on the PAT-R (Comprehension) scale-score scale [chart; average item difficulty marked]

  20. Converting raw test scores (Booklet 6) to PAT-R (Comprehension) scale scores

  21. Test difficulties of the PAT-R (Comprehension) Tests on the TORCH score scale together with Year Level mean scores

  22. Different norm tables for different tests

  23. Test difficulties of the PAT-Maths Tests on the PATM scale-score scale together with Year Level mean scores. Which is the best test for an average Year 4 student? [Chart showing mean scores for Year 1 to Year 10. Source: ACER, 2006]

  24. Test difficulties of the PAT-Maths Tests on the PATM scale-score scale together with Year Level mean scores. The best test for an average Year 4 student is probably Test 4 or 5. [Chart showing mean scores for Year 1 to Year 10. Source: ACER, 2006]

  25. Things to look for in a summative test • Needs to have a single developmental scale that shows increasing levels of achievement over all the year levels at your school. • Needs to have “norms” or expected levels for each year level (e.g. the national “norm” for Yr 3 students on TORCH is an average of 34.7). • Needs to be able to demonstrate growth from one year to the next (e.g. during Yr 4, the average student grows from a score of 34.7 in Yr 3 to an expected score of 41.4 in Yr 4 – that is 6.7 score points). • As a bonus, the test could also provide diagnostic information.
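The growth arithmetic in the third bullet can be checked directly. This sketch only restates the TORCH norm figures quoted above; the `norms` dictionary is illustrative, not a full norm table.

```python
# Expected annual growth on a single developmental scale, using the
# TORCH norms quoted in the slide (Yr 3 average 34.7, Yr 4 average 41.4).

norms = {"Year 3": 34.7, "Year 4": 41.4}  # national average scale scores

# Growth expected during Year 4 is the difference between the two norms.
expected_growth = round(norms["Year 4"] - norms["Year 3"], 1)
print(expected_growth)  # 6.7 scale-score points
```

This is only possible because both norms sit on the same developmental scale, which is the first requirement in the list above.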

  26. TORCH Norms: Norms for Year 3 to Year 10 on the TORCH scale [chart showing the 10th, 50th and 90th percentile curves]

  27. My Recommended Summative Tests (Pen & Paper) • Reading Comprehension • Progressive Achievement Test - Reading (Comprehension) (PAT-R, 4th Edition) • TORCH and TORCH plus • Mathematics • Progressive Achievement Test - Mathematics (PAT-Maths, 3rd Edition) combined with I Can Do Maths

  28. Selecting the correct PAT-C Test

  29. Selecting the correct TORCH Test

  30. Selecting the correct PAT-Math/ICDM Test

  31. My Recommended Summative Tests (On-Line) • On-Demand - Reading Comprehension • The 30-item “On-Demand” Adaptive Reading test • On-Demand - Spelling • The 30-item “On-Demand” Adaptive Spelling test • On-Demand - Writing Conventions • The 30-item “On-Demand” Adaptive Writing test • On-Demand – General English (Comprehension, Spelling & Writing Conventions) • The 60-item “On-Demand” Adaptive General English test • On-Demand - Mathematics (Number, Measurement, Chance & Data and Space) • The 60-item “On-Demand” Adaptive General Mathematics test • On-Demand - Number • The 30-item “On-Demand” Adaptive Number test • On-Demand – Measurement, Chance & Data • The 30-item “On-Demand” Adaptive Measurement, Chance & Data test • On-Demand - Space • The 30-item “On-Demand” Adaptive Space test

  32. Choosing the right starting point is still important (even for “Adaptive” Tests)

  33. Choosing the right starting point is still important (even for “Adaptive” Tests)

  34. Summative Testing and Triangulation • Even if you give the right test to the right student, sometimes, the test score does not reflect the true ability of the student – every measurement is associated with some error. • To overcome this we should aim to get at least three independent measures – what researchers call TRIANGULATION. • This may include: • Teacher judgment • NAPLAN results • Other pen & paper summative tests (e.g. TORCH, PAT-R, PAT-Maths, I Can Do Maths) • On-line summative tests (e.g. On-Demand ‘Adaptive’ testing, English Online)
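A minimal sketch of the triangulation idea above, assuming the independent measures have already been mapped onto a common scale. The measure names, scores, and the disagreement threshold are all invented for illustration.

```python
# Sketch of triangulation: combine at least three independent measures of
# the same student and flag cases where they disagree badly, since every
# single measurement carries some error. Threshold and data are hypothetical.

def triangulate(measures, max_spread=6.0):
    """Average several scores on a common scale; flag whether they agree."""
    scores = list(measures.values())
    spread = max(scores) - min(scores)       # how far apart the measures sit
    estimate = sum(scores) / len(scores)     # combined estimate of ability
    return round(estimate, 1), spread <= max_spread

measures = {"teacher judgment": 36.0, "NAPLAN": 34.0, "TORCH": 38.0}
estimate, consistent = triangulate(measures)
print(estimate, consistent)  # 36.0 True -> the three measures agree
```

When the measures disagree (the flag comes back false), that is the signal to investigate further rather than trust any single score.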

  35. Summative Testing and Triangulation • BUT remember, more summative testing does not lead to improved learning outcomes so keep the summative testing to a minimum

  36. When should you administer summative tests?

  37. Timing for Summative Testing • Should be done at a time when teachers are trying to triangulate on each student’s level of performance. (i.e. mid-year and end-of-year reporting time.) • Should be done at a time that enables teachers to monitor growth – say, every six months. (i.e. From the beginning of the year to the middle of the year and from the middle of the year to the end of the year.)

  38. Suggested timing • For Year 1 – Year 6 and Year 8 – Year 10 • Early June (for mid-year reporting and six-monthly growth*) • Early November (for end-of-year reporting and six-monthly growth) • For Prep and Year 7 and new students at other levels • Beginning of the year (for base-line data) • Early June (for mid-year reporting and six-monthly growth) • Early November (for end-of-year reporting and six-monthly growth) * November results from the year before form the base-line data for the current year. (i.e. February testing is not required for Year 1 – Year 6 or for Year 8 – Year 10)
