Michigan Assessment Consortium

Michigan Assessment Consortium Common Assessment Development Series, Module 14 – Presenting the Results of an Assessment. Developed by Bruce R. Fay, PhD & Ellen Vorenkamp, EdD, Assessment Consultants, Wayne RESA.

Presentation Transcript

Michigan Assessment Consortium

Common Assessment Development Series

Module 14 – Presenting the Results of an Assessment


Developed by

Bruce R. Fay, PhD &

Ellen Vorenkamp, EdD

Assessment Consultants

Wayne RESA


Support

The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with …


In Module 14 you will learn about

  • Score types…

  • Standards-based reports…

  • Graphical Representations…


So, you’ve…

Developed a test (for use as a ‘common’ assessment)

Pilot / field-tested it (right?)

Looked at the field test results (of course)

Now what?


Presenting Your Results

Before you present the results of your test, you need to be clear about:

Who the audience is

Why they are seeing this data (What?)

Why they should care about it (So what?)

What you want them to do as a result of seeing it (Now what?)



A score by any other name

Many score types that you may have heard of are really only appropriate for Norm-Referenced Tests (NRT), such as percentile rank, stanine, and grade level equivalent.

Your common assessment is a Criterion-Referenced Test (CRT), so let's focus on score types that are appropriate for that.


Raw Scores

Number of items correct or

Number of points earned

Q? What’s the difference?

A! None, if each item has the same point value, otherwise…


Scaled Score (equal weight)

If each test item has the same “weight”, say 1 point (1 if correct, 0 if wrong) then % correct is:

The simplest scaled score you can create

The same as % of points earned

Puts the raw score on a scale of 0 – 100


Scaled Score (unequal weight)

  • If each test item does not have the same number of points (there are weighted and/or partial credit items on the test) then

    • % correct becomes % of total possible points earned

    • You still end up with a 0 – 100 scale
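The two scaled-score slides can be illustrated with a short calculation. This is a minimal sketch, not part of the original presentation: the function name `percent_of_points` and the weighted item values are hypothetical, chosen only to show that equal-weight and unequal-weight tests both land on the same 0 to 100 scale.

```python
def percent_of_points(earned, possible):
    """Return the percent of total possible points earned (0-100 scale).

    earned   -- points the student earned on each item
    possible -- maximum points available on each item
    """
    if len(earned) != len(possible):
        raise ValueError("earned and possible must list the same items")
    total_possible = sum(possible)
    if total_possible <= 0:
        raise ValueError("the test must have at least one possible point")
    return 100.0 * sum(earned) / total_possible


# Equal weight: 25 one-point items, 20 answered correctly.
# Here % correct and % of points earned are the same number.
equal = percent_of_points([1] * 20 + [0] * 5, [1] * 25)

# Unequal weight (hypothetical item values): three 1-point items,
# one 3-point item, and one 4-point partial-credit item.
weighted = percent_of_points([1, 1, 0, 2, 3], [1, 1, 1, 3, 4])

print(equal, weighted)
```

In the weighted case the student earns 7 of 10 possible points, so the scaled score is 70 even though 4 of the 5 items received at least some credit.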


% Correct Features (Issues)

Features

  • A “common” scale, as in “widely used”

  • A “common” scale, as in “the same regardless of raw score points”

  • Intuitively interpretable (maybe)

  • Permits comparisons between different tests

Issues

  • Can/will be misinterpreted

  • Can make a 10 point test and a 100 point test appear equally important

  • Widely held belief that scores in certain ranges (60-70, 70-80, etc.) have some inherent meaning


% Correct Interpretation

Q? Is 50% correct good or bad?

A! We don’t know yet. We don’t discuss standard-setting until the next module (15).

But most people think it is intuitively obvious that this is a “bad” score.


Other ways to scale?

Yes, but we don’t really need them…



Two kinds of “standards”

  • Content Standards

    • The definition of the content to be learned; what students are to know and be able to do

  • Performance Standards

    • The definition of how good is good enough on a test to determine if, or the extent to which, students know and can do


Reporting by Content Standards

This is our concern in this module

The next module (15) deals with performance standards


Let’s consider…

A test covering 5 GLCEs with 5 selected-response items per GLCE, with each item worth 1 point (25 points total).

Q? What does a raw score of 20 (a % correct scaled score of 80%) mean?

A! It depends


Depends on What?

Student A

  • GLCE 1: 5/5

  • GLCE 2: 5/5

  • GLCE 3: 5/5

  • GLCE 4: 3/5

  • GLCE 5: 2/5

Student B

  • GLCE 1: 4/5

  • GLCE 2: 4/5

  • GLCE 3: 4/5

  • GLCE 4: 4/5

  • GLCE 5: 4/5

Same or different?


How about these two?

Student C

  • GLCE 1: 5/5

  • GLCE 2: 5/5

  • GLCE 3: 5/5

  • GLCE 4: 5/5

  • GLCE 5: 0/5

Student D

  • GLCE 1: 5/5

  • GLCE 2: 5/5

  • GLCE 3: 4/5

  • GLCE 4: 3/5

  • GLCE 5: 3/5

These 4 examples all have a raw score of 20 (80% correct) but represent 4 different performances by the students.
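The point can be checked with a few lines of code. This is a minimal sketch, assuming each student has one score for each of GLCEs 1 through 5 and that the scores sum to 20, as the slide states; the dictionary layout and the `summarize` helper are illustrative choices, not something from the presentation.

```python
POINTS_PER_GLCE = 5  # 5 one-point items per GLCE, 25 points total

# Points earned on GLCE 1..5 by the four example students
scores = {
    "A": [5, 5, 5, 3, 2],
    "B": [4, 4, 4, 4, 4],
    "C": [5, 5, 5, 5, 0],
    "D": [5, 5, 4, 3, 3],
}


def summarize(glce_scores):
    """Return (raw score, % correct) for one student's per-GLCE scores."""
    raw = sum(glce_scores)
    pct = 100.0 * raw / (POINTS_PER_GLCE * len(glce_scores))
    return raw, pct


summary = {student: summarize(s) for student, s in scores.items()}

for student, (raw, pct) in summary.items():
    # Same totals on every row; only the per-GLCE profile differs
    print(f"Student {student}: raw={raw}, %correct={pct:.0f}, by GLCE={scores[student]}")
```

Every student comes out at 20 raw points and 80% correct, so the total alone cannot distinguish them; only the per-GLCE breakdown shows, for example, that Student C missed every GLCE 5 item.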



Scores by “Standard”

Remember, we haven’t set performance standards yet, so we really can’t say what these scores mean

Even so, 5 out of 5 may suggest that a student knows the material and 0 out of 5 may suggest that they don’t (depends on item-GLCE match)

However…even though this is a CRT, you can’t make instructional decisions without the context of the overall pattern of scores


Say what?

There will often be extreme scores (outliers) that are not representative of most of the scores in a set.

Q? What if most of the students scored a 0 or a 1 on GLCE 5 in the example?

A! Maybe a picture would help


Graphical representations

Or, I can see clearly now


Guidelines for Good Graphs

Title & Subtitles

Data Source and Time Frame

Axis Labels

Legend

Viewable Colors

Readability (3-D doesn’t make it better)


Appropriate Type

Bar Graphs

Line Graphs

Scatterplots

Stem & Leaf

Pie Charts (evil)


Results for 25 students (# scoring at each score point for each GLCE)


The Data

Note: This will be replaced with a table so it looks better

Here’s how the spreadsheet is set up


Let’s Assume…

We have established that 3 out of 5 on each standard is an acceptable standard of evidence that a student understands the GLCE in question (hey, these were hard items)

Then students who score a 3, 4, or 5 on the cluster of items for a GLCE can be considered “proficient” on that GLCE, while students with a 2, 1, or 0 are not.
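The cut-score rule can be sketched in code. The class data for the 25 students is not included in this transcript, so this example reuses the four students from the earlier slides (with one score per GLCE 1 through 5, each totaling 20); the names `CUT_SCORE`, `is_proficient`, and `count_proficient` are hypothetical helpers, and the 3-out-of-5 cut is the assumption stated on this slide.

```python
CUT_SCORE = 3  # assumed cut: 3 of 5 points on a GLCE counts as proficient

# Points earned on GLCE 1..5 by the four example students
scores = {
    "A": [5, 5, 5, 3, 2],
    "B": [4, 4, 4, 4, 4],
    "C": [5, 5, 5, 5, 0],
    "D": [5, 5, 4, 3, 3],
}


def is_proficient(glce_scores, cut=CUT_SCORE):
    """Return a True/False proficiency flag for each GLCE."""
    return [points >= cut for points in glce_scores]


def count_proficient(all_scores, cut=CUT_SCORE):
    """Return the number of proficient students for each GLCE."""
    per_student = [is_proficient(s, cut) for s in all_scores.values()]
    # zip(*rows) regroups the flags by GLCE instead of by student
    return [sum(flags) for flags in zip(*per_student)]


by_student = {student: is_proficient(s) for student, s in scores.items()}
by_glce = count_proficient(scores)

print(by_student["A"])
print(by_glce)
```

With this cut, all four students are proficient on GLCEs 1 through 4, but only Students B and D are proficient on GLCE 5, which is the kind of pattern a proficiency-by-standard table is meant to expose.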


Proficiency by Standard (for 25 Students)

This is what the previous data looks like in table form.

Would a picture help?


Proficiency by Standard (for 25 Students)


Here’s the data

Note: this will be replaced with a table so it looks better


Repeated Measures

If you test the same content on more than one occasion, you can look at your test results over time.

As an example, let's look at test results for our class of 25 students on a pre-test, two intermediate tests, and a post-test covering the same five GLCEs. We will look only at GLCE 1, with 5 points possible each time.


The Data – Results for 25 students on 4 tests by score point

This is a somewhat idealized example, so interpret it with caution!


And here’s the picture – Results for 25 students on 4 tests by score point


The Excel spreadsheet

Note: This will be replaced with a table for better viewing



