Rating Performance Assessments of Students With and Without Disabilities: A Generalizability Study of Teacher Bias

Jose-Felipe Martinez-Fernandez

Ann M. Mastergeorge

UCLA Graduate School of Education & Information Studies
Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing

American Educational Research Association

New Orleans, April 1-5, 2001

Introduction
  • Performance assessments are an increasingly popular method for evaluating academic performance.
  • A number of studies have shown that well-trained raters can score performance assessments reliably for the general population of students.
  • This study addressed whether trained raters show bias when scoring performance assessments of students with disabilities.
Purpose
  • Compare the sources of score variability for students with and without disabilities in Language Arts and Mathematics performance assessments.
  • Determine whether important differences exist across student groups in terms of variance components and, if so, whether rater (teacher) bias plays a role.
  • Complement the results with raters’ perceptions of bias (their own and others’).
Method
  • The student and rater samples come from a larger district-wide validation study involving thousands of performance assessments.
  • Teachers from each grade and content area were trained as raters.
  • A total of 6 studies (each with different raters and students) were performed for 3rd, 7th, and 9th grade assessments in Language Arts and Mathematics.
Method (continued)
  • For each study, 60 assessments (30 from regular education students and 30 from students who received some kind of accommodation) were rated by 4 raters on two occasions.
  • Raters were aware of each student’s disability status only on the second rating occasion. Bias is defined as systematic differences in scores across occasions.
  • No practice or memory effects were expected.
  • The score scale ranges from 1 to 4.
Method (continued)
  • Two kinds of generalizability designs were used. First, a “nested-within-disability” design with all 60 students [P(D) x R x O].
  • Second, separate fully crossed [P x R x O] designs for each disability group of 30 students (see the score decomposition sketched after this list).
  • The Mathematics assessments consisted of two tasks. Both a random [P x R x O x T] design and a fixed [P x R x O] design averaging over the tasks were used.
  • A survey asked about raters’ perceptions of bias in rating students with disabilities (their own and other raters’).
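For reference, the score model behind the fully crossed [P x R x O] design is sketched below. This is the textbook G-theory formulation, not something shown on the slides:

```latex
% Standard score decomposition for a fully crossed P x R x O design
% (textbook G-theory formulation; a reference sketch, not from the slides)
X_{pro} = \mu + \nu_p + \nu_r + \nu_o + \nu_{pr} + \nu_{po} + \nu_{ro} + \nu_{pro,e}
% With mutually uncorrelated effects, the total score variance decomposes as
\sigma^2(X_{pro}) = \sigma^2_p + \sigma^2_r + \sigma^2_o
                  + \sigma^2_{pr} + \sigma^2_{po} + \sigma^2_{ro} + \sigma^2_{pro,e}
```

Under the bias definition above, rater knowledge of disability status would surface chiefly in the occasion component \(\sigma^2_o\) and in the interactions involving O.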
Generalizability Results
Nested Design: Language Arts [Score = Rater x Occasion x Person (Disability)]
Generalizability Results (continued)
Nested Design: Mathematics [Score = Task x Rater x Occasion x Person (Disability)]
Generalizability Results (continued)
Crossed Design by Disability: Language Arts [Score = Rater x Occasion x Person]
Generalizability Results (continued)
Crossed Design by Disability: Mathematics [Score = Task x Rater x Occasion x Person]

Generalizability Results (continued)
Crossed Design by Disability: Mathematics with Task facet fixed [Score = Person x Rater x Occasion, averaging over the two tasks]

Rater Survey (continued)
Mean Score of Raters on Self and Others Regarding Fairness and Bias in Scoring
Discussion

Variance Components:

  • The person (P) component is always the largest (50% to 70% of the variance across designs). However, a substantial amount of measurement error remains (the triple interaction and ignored facets).
  • Some differences exist between the regular education and disability groups in terms of variance components (a sketch of how such components are estimated follows this list).
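To make the variance-component language concrete, here is a minimal Python sketch of the ANOVA (expected-mean-squares) estimation for a fully crossed P x R x O design, using facet sizes that mirror the study (30 persons per group, 4 raters, 2 occasions). The scores are simulated placeholders, not the study’s data, and the code illustrates the standard procedure rather than the authors’ analysis:

```python
# Minimal sketch: ANOVA variance-component estimation for a fully crossed
# person x rater x occasion (P x R x O) G-study design. Scores are simulated
# placeholders on the study's 1-4 scale, NOT the study's data.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_r, n_o = 30, 4, 2                      # persons, raters, occasions
X = rng.normal(2.5, 0.7, (n_p, n_r, n_o))     # placeholder ratings

grand = X.mean()
m_p = X.mean(axis=(1, 2)); m_r = X.mean(axis=(0, 2)); m_o = X.mean(axis=(0, 1))
m_pr = X.mean(axis=2); m_po = X.mean(axis=1); m_ro = X.mean(axis=0)

# Mean squares from the standard three-way ANOVA identities
MS_p  = n_r * n_o * ((m_p - grand) ** 2).sum() / (n_p - 1)
MS_r  = n_p * n_o * ((m_r - grand) ** 2).sum() / (n_r - 1)
MS_o  = n_p * n_r * ((m_o - grand) ** 2).sum() / (n_o - 1)
MS_pr = n_o * ((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2).sum() \
        / ((n_p - 1) * (n_r - 1))
MS_po = n_r * ((m_po - m_p[:, None] - m_o[None, :] + grand) ** 2).sum() \
        / ((n_p - 1) * (n_o - 1))
MS_ro = n_p * ((m_ro - m_r[:, None] - m_o[None, :] + grand) ** 2).sum() \
        / ((n_r - 1) * (n_o - 1))
resid = (X - m_pr[:, :, None] - m_po[:, None, :] - m_ro[None, :, :]
         + m_p[:, None, None] + m_r[None, :, None] + m_o[None, None, :] - grand)
MS_pro = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1) * (n_o - 1))

# Solve the expected-mean-squares equations (negative estimates set to 0)
v_pro = MS_pro
v_pr = max((MS_pr - MS_pro) / n_o, 0.0)
v_po = max((MS_po - MS_pro) / n_r, 0.0)
v_ro = max((MS_ro - MS_pro) / n_p, 0.0)
v_p  = max((MS_p - MS_pr - MS_po + MS_pro) / (n_r * n_o), 0.0)
v_r  = max((MS_r - MS_pr - MS_ro + MS_pro) / (n_p * n_o), 0.0)
v_o  = max((MS_o - MS_po - MS_ro + MS_pro) / (n_p * n_r), 0.0)

# Dependability (phi) for a D-study with n_r raters and n_o occasions:
# absolute error pools every component except persons
abs_err = (v_r / n_r + v_o / n_o + v_pr / n_r + v_po / n_o
           + (v_ro + v_pro) / (n_r * n_o))
phi = v_p / (v_p + abs_err)
print({k: round(v, 3) for k, v in
       dict(p=v_p, r=v_r, o=v_o, pr=v_pr, po=v_po, ro=v_ro, pro=v_pro,
            phi=phi).items()})
```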
Discussion (continued)

Differences between groups:

  • The total amount of variance is always smaller in the disability groups (whose score distribution is more skewed).
  • Variance due to persons (P), and therefore the dependability coefficients, are lower for the disability group in Language Arts. This also holds in Mathematics with a fixed, averaged task facet, but not with two random tasks (see the coefficient formula after this list).
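The direction of this effect is built into the dependability coefficient: person variance sits in the numerator, and the pooled absolute error joins it in the denominator, so a smaller \(\sigma^2_p\) necessarily pulls \(\Phi\) down. In standard notation (a reference formula, not taken from the slides):

```latex
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\Delta}, \qquad
\sigma^2_\Delta = \frac{\sigma^2_r}{n'_r} + \frac{\sigma^2_o}{n'_o}
                + \frac{\sigma^2_{pr}}{n'_r} + \frac{\sigma^2_{po}}{n'_o}
                + \frac{\sigma^2_{ro} + \sigma^2_{pro,e}}{n'_r \, n'_o}
```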
Discussion (continued)

Rater Bias:

  • No rater (R) main effects, and no leniency differences across raters.
  • No “rating occasion” (O) effect. Overall, no bias is introduced by rater knowledge of disability status.
  • No rater interactions with tasks or occasions.
Discussion (continued)
  • However, there is a non-negligible Person by Rater (PxR) interaction, which is considerably larger for students with disabilities.
    • This does not necessarily constitute bias, but it can still compromise the validity of scores for accommodated students.
    • Are features in papers from students with disabilities differentially salient to different raters?
Discussion (continued)
  • There is a large Person by Task (PxT) interaction in Math, but it is considerably smaller for students with disabilities:
    • Students with disabilities may not be attuned to the different nature of the tasks in the way needed for this otherwise natural interaction (Miller & Linn, 2000, among others) to show.
    • Accommodations may not be having the intended leveling effects.
    • With a random task facet, the lower PxT interaction “increases reliability” for disability students (see the error term sketched after this list).
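The quoted “increase” is mechanical rather than substantive: with tasks treated as random, \(\sigma^2_{pt}\) enters the relative-error term of the generalizability coefficient, so a smaller PxT component shrinks the error without any genuine gain in measurement quality. In standard notation (again a reference formula, not from the slides; higher-order person interactions are elided):

```latex
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}, \qquad
\sigma^2_\delta = \frac{\sigma^2_{pt}}{n'_t} + \frac{\sigma^2_{pr}}{n'_r}
                + \frac{\sigma^2_{po}}{n'_o} + \cdots
```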
Discussion (continued)

From Rater Survey:

  • Teachers believe there is some bias and unfairness among raters when scoring performance assessments from students with disabilities.
  • Raters see themselves as fairer and less biased than the general population of raters.
  • Whether this is due to training or to initially high self-perceptions is not clear; a not-uncommon “I’m fair, but others are less so” kind of effect could be the sole reason.
Future Directions and Questions
  • Are there different patterns for different kinds of disabilities and accommodations?
  • Are accommodations being used appropriately and having the intended effects?
  • Do the patterns hold for raters at local school sites, who in general receive less training?
  • Does rater background influence the size and nature of these effects and interactions?
  • How does the testing occasion facet influence variance components and other interactions?