
Centrality in a Modified-Angoff Standard Setting: PhD Dissertation Proposal

Centrality in a Modified-Angoff Standard Setting. PhD Dissertation Proposal by Michael Scott Sommers (張夏石), Department of Educational Psychology, National Taiwan Normal University. Outline of the Proposal: Statement of the Problem; Purpose & Motivation


Presentation Transcript


  1. Centrality in a Modified-Angoff Standard Setting PhD Dissertation - Proposal by Michael Scott Sommers (張夏石) Department of Educational Psychology, National Taiwan Normal University

  2. Outline of the Proposal

  3. Statement of the Problem • Purpose & Motivation: Introduction to Standard Setting; Angoff Standard Setting Procedure; Problems with the Angoff Procedure; What is Centrality?; Research Questions • Methods & Analysis: Materials & Participants; Analysis • Expected Results

  4. Statement of the Problem

  5. Standard setting is the most widely used method to establish cutscores for high stakes examinations. Despite this, many questions remain about how the procedure works and what exactly the meaning of the cutscore is. This is especially true for the Angoff and related methods of standard setting.

  6. It is widely believed that judges in an Angoff standard setting have problems judging the most difficult and the easiest items, and that this inability creates centrality in the estimates that judges are required to provide.

  7. Aim of the study: to study the impact of questions of different difficulty, and of the different rounds of the standard setting, on panelist centrality in a modified-Angoff procedure.

  8. Question 1: Does Centrality exist in the modified-Angoff standard setting? • Question 2: How does Centrality change across the rounds of the modified-Angoff procedure? • Question 3: Is Centrality explained by differences in panelist ratings between extreme (difficult and easy) items and items of median difficulty?

  9. Purpose & Motivation

  10. Introduction to Standard Setting Standard setting is the most widely used method to establish cutscores for high stakes examinations. Despite this, many questions remain about how the procedure works and the full meaning of the cutscore. This is especially true for the Angoff and related methods of standard setting.

  11. It is widely believed that judges in an Angoff standard setting have problems judging the most difficult and the easiest items, and that this inability creates centrality in the estimates that judges are required to provide.

  12. Standard setting is a procedure used to calculate a cutscore for a test. • A standard is a verbal description of performance. • Cutscores are the scores on a test needed to separate the people taking the test into the different categories of a standard.

  13. Common European Framework of Reference (CEFR)

  14. B1 Can understand the main points of clear standard input on familiar matters. Can produce simple connected text on familiar topics. What is a “familiar matter” or “simple text”?

  15. A standard setting procedure can help us understand these terms, so that they can be used to decide how a test can determine whether this level has been reached.

  16. The Angoff standard setting procedure is one of the most widely used methods in Taiwan and the world to determine the passing score for high stakes tests. • First suggested by William Angoff who attributed the idea to his colleague Ledyard Tucker (Cizek & Bunch, 2007).

  17. There are many different types of standard setting procedures. One recent review (Kaftandjieva, 2010) identified more than 60 different methods for standard setting.

  18. Angoff Standard Setting Procedure: Judges are trained to use descriptions of performance called Performance Level Descriptors (PLDs) and to match test items with these descriptors to create a cutscore that can be used to divide test takers into different levels of performance.

  19. Angoff, Step 1: The BPS. Judges are trained to use the PLDs and imagine a ‘barely proficient student’ (BPS).

  20. Step 2: Item Functioning. Judges examine each test item to assess difficulty.

  21. Step 3: Quantifying Estimates. Judges quantify their expectations of the outcome as probabilities. Example: “I think a BPS has a 68% chance of answering correctly.” [Figure: probability scale from 0 to 1, with .50 and .68 marked]

  22. Calculating the Cutscore. Judge 1 item estimates: Item 1 = 68, Item 2 = 43, Item 3 = 72, Item 4 = 80, Item 5 = 76; Mean = 67.8. Judge means: Judge 1 = 67.8, Judge 2 = 72.2, Judge 3 = 65, Judge 4 = 75. Mean across judges = 70. Final Cutscore = 70.
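The arithmetic on this slide can be reproduced with a short Python sketch. It assumes that the five item estimates shown belong to Judge 1 (their mean matches Judge 1's 67.8) and that the cutscore is the unweighted mean of the judges' means. The variable names are illustrative only and do not come from the presentation.

```python
from statistics import mean

# Item estimates shown on the slide (percent chance that a BPS answers correctly).
# Assumption: these five values are Judge 1's estimates.
judge1_items = [68, 43, 72, 80, 76]
judge1_mean = mean(judge1_items)

# Per-judge means as given on the slide; only Judge 1's item-level values are shown.
judge_means = [judge1_mean, 72.2, 65, 75]

# Assumption: the final cutscore is the unweighted mean across judges.
cutscore = round(mean(judge_means), 1)

print(f"Judge 1 mean: {judge1_mean}")
print(f"Cutscore: {cutscore}")
```

With the slide's numbers this gives 67.8 for Judge 1 and 70 for the cutscore, in agreement with the values reported.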

  23. Angoff, Step 4: Discussion & Feedback • Sharing Estimates / Discussion • Empirical P-values • Conditional ‘P-values’ • % Students who would pass

  24. Typical Angoff Procedure: Round 1 → Discussion & Feedback → Round 2 → Discussion & Feedback → Round 3 → Final Cutscore

  25. Problems with the Angoff Procedure?

  26. It is widely reported in standard setting and testing research that even the most experienced and well-trained judge may have problems estimating the difficulty of items on a test.

  27. Is this true for all items? Are judges completely wrong? Are some items easier than others to estimate accurately?

  28. Very difficult and very easy items, i.e., extreme items, appear to be harder to estimate correctly. Items of moderate difficulty can be judged more accurately. Judges are not using the full range of the scale.

  29. This study is a clarification of the measurement properties associated with this problem.

  30. Do item difficulty and the rounds of the standard setting help us understand the observed centrality of the panelists? What is the effect of item difficulty and the rounds of a standard setting on the observed centrality in an Angoff standard setting?

  31. What is Centrality?

  32. Centrality is a widely accepted concept in the study of rating scales and raters that describes the clustering of rater scores around the center of a rating scale. It has not previously been used to describe the results of an Angoff standard setting, but the similarity between the two situations indicates that it could produce important results.

  33. A wide range of definitions has been suggested. These are reviewed in Saal, Downey, and Lahey (1980). Their review is not very helpful. By current standards, their conclusions about the measurement of Centrality are useless.

  34. Why? They include measures that are clearly measuring different things. They include measures that have been used without discussion of what they’re measuring.

  35. Saal, Downey, and Lahey (1980) is focused on classical measures of Centrality.

  36. Wolfe (2004, pp. 39-40) “centrality...results in a concentration of assigned ratings in the middle of the rating scale…”

  37. A number of related terms

  38. “…restricted range exists when centrality is combined with leniency or harshness. That is, the restriction of range results in a restricted range around a non-central location on the rating scale. The converse of rater centrality occurs when raters tend to overuse the extreme rating scale categories - a rater effect called extremism.”

  39. Other related terms • Central tendency: sometimes used synonymously with Centrality, sometimes used differently • Rater effect • Rater bias

  40. I will only be dealing with the definition given by Wolfe (2004, pp. 39-40): “centrality...results in a concentration of assigned ratings in the middle of the rating scale…”

  41. Item Centrality: ratings from different raters for different items are clustered near the center of a rating scale. • Rater Centrality: ratings for different items from different raters are clustered near the center of a rating scale.

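  42. Ratings matrix of items (rows) by persons (columns):

      Items/persons   p1     p2     p3     ...   pn
      i1              ip11   ip12   ip13   ...   ip1n
      i2              ip21   ip22   ip23   ...   ip2n
      i3              ip31   ip32   ip33   ...   ip3n
      ...
      in              ipn1   ipn2   ipn3   ...   ipnn

      Slide labels: Item Centrality (alongside the rows) and Person Centrality (beneath the columns).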
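To make the items-by-persons layout concrete, here is a minimal Python sketch that summarizes such a matrix by row and by column. The ratings are hypothetical, and the index used (mean absolute distance from an assumed scale midpoint of 0.5) is only one simple way to quantify clustering near the center. The presentation does not state which centrality measure the study will use, so this should not be read as the author's method.

```python
from statistics import mean

MIDPOINT = 0.5  # assumed center of the 0-1 probability scale

# Hypothetical Angoff estimates: rows are items, columns are persons (judges).
ratings = [
    [0.55, 0.50, 0.60],  # item i1
    [0.45, 0.50, 0.40],  # item i2
    [0.90, 0.20, 0.50],  # item i3
]

def distance_from_center(values, midpoint=MIDPOINT):
    """Mean absolute distance from the midpoint (smaller means more central)."""
    return mean(abs(v - midpoint) for v in values)

# Row-wise summary: one value per item, taken across all persons.
by_item = [distance_from_center(row) for row in ratings]

# Column-wise summary: one value per person, taken across all items.
by_person = [distance_from_center(col) for col in zip(*ratings)]

for i, d in enumerate(by_item, start=1):
    print(f"item i{i}: {d:.3f}")
for j, d in enumerate(by_person, start=1):
    print(f"person p{j}: {d:.3f}")
```

In this toy data, items i1 and i2 sit close to the midpoint for every person, while item i3 draws estimates from across the scale, so its row-wise value is larger.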

  43. Research Questions

  44. Aim of the study: to study the impact of questions of different difficulty, and of the different rounds of the standard setting, on panelist Centrality in a modified-Angoff procedure.

  45. Question 1: Does Centrality exist in the modified-Angoff standard setting? • Question 2: How does Centrality change across the rounds of the modified-Angoff procedure? • Question 3: Is Centrality explained by differences in panelist ratings between extreme (difficult and easy) items and items of median difficulty?

  46. Methods and Analysis

  47. Materials & Participants • Analysis

  48. Methods & Participants

  49. Reliability (Cronbach’s Alpha): ~.90 overall (> .80 for listening/reading subtests) • Item Analysis (Rasch Fit, Point Biserials) • Construct Validation: Factor Analysis, PCA • Linked Exam - EPT Spring Midterm - Annual English Proficiency Test (EPT) to assess gains in student proficiency. • Angoff Yes/No: Spring 2009 (97-2 academic year) EPT • Angoff: Spring 2010 (98-2 academic year) EPT • ~3000 Examinees per year level • ~12,000 Examinees
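For readers unfamiliar with the reliability figure quoted above, the following Python sketch computes Cronbach's alpha using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The score matrix is hypothetical and far smaller than the EPT data; the function name `cronbach_alpha` is illustrative and not taken from the presentation.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per examinee and one column per item."""
    k = len(scores[0])  # number of items
    item_variances = [pvariance(col) for col in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical right/wrong (1/0) responses: 5 examinees by 4 items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same alpha, because the correction factor cancels in the ratio.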
