Michigan Assessment Consortium Common Assessment Development Series, Module 14 – Presenting the Results of an Assessment. Developed by Bruce R. Fay, PhD & Ellen Vorenkamp, EdD, Assessment Consultants, Wayne RESA.
Michigan Assessment Consortium Common Assessment Development Series Module 14 – Presenting the Results of an Assessment
Bruce R. Fay, PhD &
Ellen Vorenkamp, EdD
Assessment Consultants, Wayne RESA
The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with …
Developed a test (for use as a ‘common’ assessment)
Pilot / field-tested it (right?)
Looked at the field test results (of course)
Before you present the results of your test, you need to be clear about:
Who the audience is
Why they are seeing this data (What?)
Why they should care about it (So what?)
What you want them to do as a result of seeing it (Now what?)
Many score types that you may have heard of are really only appropriate for Norm-Referenced Tests (NRT), such as percentile rank, stanine, and grade level equivalent.
Your common assessment is a Criterion-Referenced Test (CRT), so let's focus on score types that are appropriate for that.
Number of items correct or
Number of points earned
Q? What’s the difference?
A! None, if each item has the same point value, otherwise…
If each test item has the same “weight”, say 1 point (1 if correct, 0 if wrong) then % correct is:
The simplest scaled score you can create
The same as %points earned
Puts the raw score on a scale of 0 – 100
Can/will be misinterpreted
Can make a 10 point test and a 100 point test appear equally important
Widely held belief that scores in certain ranges (60-70, 70-80, etc.) have some inherent meaning
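As a rough illustration of the points above, the following Python sketch computes % items correct and % points earned for a single student. The function names, the 5-item test, and the item weights are hypothetical, chosen only for this example.

```python
def percent_items_correct(responses):
    """responses: list of 1 (correct) / 0 (wrong), one entry per item."""
    return 100 * sum(responses) / len(responses)


def percent_points_earned(responses, weights):
    """weights: point value of each item, in the same order as responses."""
    earned = sum(r * w for r, w in zip(responses, weights))
    return 100 * earned / sum(weights)


# Hypothetical 5-item test; the student misses only the last item.
responses = [1, 1, 1, 1, 0]

equal_weights = [1, 1, 1, 1, 1]      # every item worth 1 point
unequal_weights = [1, 1, 1, 1, 4]    # the last item is worth 4 points

print(percent_items_correct(responses))                   # 80.0
print(percent_points_earned(responses, equal_weights))    # 80.0
print(percent_points_earned(responses, unequal_weights))  # 50.0
```

With equal weights the two scores agree; with unequal weights they do not. Note also that the percentage hides test length: 8 out of 10 and 80 out of 100 both come out as 80.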
Q? Is 50% correct good or bad?
A! We don’t know yet. We don’t discuss standard-setting until the next module (15).
But most people think it is intuitively obvious that this is a “bad” score.
Yes, but we don’t really need them…
This is our concern in this module
The next module (15) deals with performance standards
A test covering 5 GLCEs with 5 selected-response items per GLCE, with each item worth 1 point (25 points total).
Q? What does a raw score of 20 (a % correct scaled score of 80%) mean?
A! It depends
GLCE 1: 5/5
GLCE 2: 5/5
GLCE 3: 5/5
GLCE 4: 3/5
GLCE 5: 2/5
Same or different?
GLCE 1: 5/5
GLCE 2: 5/5
GLCE 3: 5/5
GLCE 4: 5/5
GLCE 5: 0/5
These 4 examples all have a raw score of 20 (80% correct) but represent 4 different performances by the students.
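The two score breakdowns listed above can be compared with a short Python sketch. The labels "Student A" and "Student B" and the helper name `summarize` are hypothetical; the per-GLCE scores are the ones from the two lists, in order from the first GLCE to the fifth.

```python
POINTS_PER_GLCE = 5  # 5 one-point items per GLCE, as in the example test

# Points earned per GLCE, first through fifth, taken from the two lists above.
student_a = [5, 5, 5, 3, 2]
student_b = [5, 5, 5, 5, 0]


def summarize(scores):
    """Return (raw score, % correct) for one student's per-GLCE scores."""
    total = sum(scores)
    possible = POINTS_PER_GLCE * len(scores)
    return total, 100 * total / possible


for name, scores in [("A", student_a), ("B", student_b)]:
    total, pct = summarize(scores)
    breakdown = ", ".join(f"{s}/{POINTS_PER_GLCE}" for s in scores)
    print(f"Student {name}: raw {total}, {pct:.0f}% correct, by GLCE: {breakdown}")
```

Both students get the same total and the same % correct; only the per-GLCE breakdown shows that the performances differ.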
Remember, we haven’t set performance standards yet, so we really can’t say what these scores mean
Even so, 5 out of 5 may suggest that a student knows the material and 0 out of 5 may suggest that they don’t (depends on item-GLCE match)
However…even though this is a CRT, you can’t make instructional decisions without the context of the overall pattern of scores
There will often be extreme scores (outliers) that are not representative of most of the scores in a set.
Q? What if most of the students scored a 0 or a 1 on GLCE 5 in the example?
A! Maybe a picture would help
Or, I can see clearly now
Title & Subtitles
Data Source and Time Frame
Readability (3-D doesn’t make it better)
Stem & Leaf
Pie Charts (evil)
Note: This will be replaced with a table so it looks better
Here’s how the spreadsheet is set up
We have established that 3 out of 5 on each standard is an acceptable standard of evidence that a student understands the GLCE in question (hey, these were hard items)
Then students who score a 3, 4, or 5 on the cluster of items for a GLCE can be considered “proficient” on that GLCE, while students with a 2, 1, or 0 are not.
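The 3-out-of-5 rule can be sketched in Python as follows. The function names are hypothetical, and the sample scores are just one illustrative student; the cut score of 3 is the one stated above.

```python
CUT_SCORE = 3  # 3 out of 5 is treated as acceptable evidence of understanding


def is_proficient(glce_score, cut_score=CUT_SCORE):
    """True if the score on one GLCE's item cluster meets the cut score."""
    return glce_score >= cut_score


def proficiency_profile(scores):
    """Map each GLCE (numbered from 1) to True/False for one student."""
    return {f"GLCE {i}": is_proficient(s) for i, s in enumerate(scores, start=1)}


# One illustrative student: per-GLCE scores out of 5.
profile = proficiency_profile([5, 5, 5, 3, 2])
for glce, proficient in profile.items():
    print(glce, "proficient" if proficient else "not proficient")
```

Here the student is proficient on the first four GLCEs and not on the fifth, which is more useful for instruction than the single total of 20.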
This is what the previous data looks like in table form.
Would a picture help?
Note: this will be replaced with a table so it looks better
If you test the same content on more than one occasion, you can look at your test results over time.
As an example, let’s look at test results for our class of 25 students on a pre-test, two intermediate tests, and a post-test covering the same five GLCEs. We will look only at GLCE 1, with 5 points possible each time.
This is a somewhat idealized example, so interpret it with caution!
Note: This will be replaced with a table for better viewing
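A minimal sketch of how results over time might be tabulated in Python. The scores below are made up for five hypothetical students (not the class of 25 in the example), and the cut score of 3 out of 5 is the one assumed earlier in this module.

```python
CUT_SCORE = 3  # assumed: 3 out of 5 counts as proficient on a GLCE

# Hypothetical GLCE 1 scores (out of 5) for the same five students on each occasion.
glce1_scores = {
    "Pre-test":  [0, 1, 1, 2, 3],
    "Test 2":    [1, 2, 2, 3, 4],
    "Test 3":    [2, 3, 3, 4, 5],
    "Post-test": [3, 4, 4, 5, 5],
}


def count_proficient(scores, cut_score=CUT_SCORE):
    """Number of students at or above the cut score."""
    return sum(1 for s in scores if s >= cut_score)


def mean_score(scores):
    """Arithmetic mean of the scores."""
    return sum(scores) / len(scores)


for occasion, scores in glce1_scores.items():
    print(f"{occasion}: mean {mean_score(scores):.1f}, "
          f"{count_proficient(scores)} of {len(scores)} proficient")
```

Reporting the count of proficient students alongside the mean keeps the focus on the criterion rather than on the average alone.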