Scaling & Grading in Examinations

Presented by Dr. Manisha Taneja, Lecturer in Education, R.M.S. College of Education, Behrampur, Sec-74, Gurgaon

Adding marks to decide result
• Doctors measure several indicators of health, such as height, weight, body temperature, and blood pressure, but do not add them to determine the overall health of a patient (why?).
• Teachers, however, add the marks obtained by students in different subjects to assess overall school performance (how strange!).
• Doctors know that the measures they obtain assess different traits and are not on the same scale, and hence cannot be added or subtracted.

How can we add or compare?
• Only measurements on the same scale can be added or subtracted. We convert inches, feet, and yards to meters before performing arithmetic operations.
• We cannot even compare quantities of traits unless we convert them to a common scale.
• Converting raw marks to a common scale is therefore necessary.

Standard of an Evaluator
• Marks in different subjects vary in overall level (mean score) and spread (SD).
• The standard of marking of an evaluator is defined in terms of the mean and SD of the raw scores awarded by that evaluator for a given set of answer books.
• When marks are combined, the SDs of the different components play a significant role in the combination.
• The weight of a given component in the combination is proportional to its SD. A component with an SD of 15 will carry three times the weight of one with an SD of 5 points.
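The effect of unequal SDs on a raw total can be seen in a small sketch. The component means of 50 and the two students P and Q are hypothetical; only the SDs of 15 and 5 come from the slide.

```python
# Hypothetical components: both are assumed to have a mean of 50.
# The SDs of 15 and 5 are the ones quoted on the slide.
MEAN_X, SD_X = 50.0, 15.0
MEAN_Y, SD_Y = 50.0, 5.0


def z(raw, mean, sd):
    """Distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd


# Student P is one SD above the mean in X and one SD below in Y.
# Student Q is the mirror image.
students = {
    "P": (MEAN_X + SD_X, MEAN_Y - SD_Y),  # (65, 45)
    "Q": (MEAN_X - SD_X, MEAN_Y + SD_Y),  # (35, 55)
}

# Raw totals: the component with the larger SD dominates.
raw_totals = {name: x + y for name, (x, y) in students.items()}

# Totals of standardized scores: both components count equally.
z_totals = {
    name: z(x, MEAN_X, SD_X) + z(y, MEAN_Y, SD_Y)
    for name, (x, y) in students.items()
}

print(raw_totals)  # P: 110.0, Q: 90.0
print(z_totals)    # P: 0.0, Q: 0.0
```

In relative terms the two students are equally placed, yet the raw totals differ by 20 marks in favor of the student who did well in the high-SD component.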

Weaknesses of Essay Examinations
• Low validity, low scorer-reliability, high subjectivity, low comparability, and unfairness.
• Optional questions, such as any 5 out of 10, reduce comparability.
• Pass/fail cut-scores and the cut-scores for different divisions/categories are arbitrary.
• Combining marks by simple arithmetic addition is highly unscientific.
• Before combining, the marks should be scaled or standardized.

What is Scaling?
• Converting measurements taken on different scales to a common scale is scaling.
• The basic idea behind scaling is that ‘the distance of a scaled score from the scaled-score mean in terms of the scaled-score SD equals the distance of the corresponding raw score from the raw-score mean in terms of the raw-score SD’.
• It is simply a linear transformation, which does not change the shape of the original distribution of scores. Common examples of scaled scores are z-scores, T-scores, and ETS-scores.
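The definition above amounts to a one-line formula, sketched here as a minimal illustration. The function name `scale` and the sample raw score are hypothetical; the target scales assume the conventional definitions of a z-score (mean 0, SD 1), a T-score (mean 50, SD 10), and an ETS-type score (mean 500, SD 100).

```python
def scale(raw, raw_mean, raw_sd, new_mean, new_sd):
    """Linear scaling: preserve the distance from the mean in SD units."""
    z = (raw - raw_mean) / raw_sd
    return new_mean + new_sd * z


# Hypothetical raw score of 65 in a test with mean 50 and SD 10.
raw, raw_mean, raw_sd = 65.0, 50.0, 10.0

z_score = scale(raw, raw_mean, raw_sd, 0.0, 1.0)        # 1.5
t_score = scale(raw, raw_mean, raw_sd, 50.0, 10.0)      # 65.0
ets_score = scale(raw, raw_mean, raw_sd, 500.0, 100.0)  # 650.0

print(z_score, t_score, ets_score)
```

All three results describe the same position, one and a half SDs above the mean, on different scales.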

Comparing performance in two areas

             Hindi   Maths
Mean          50      60
SD            10      12
Student A     60      72
Student B     55      60
Student C     65      65

(How do they compare?)
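One way to answer the question is to convert each mark to a z-score, as in the sketch below. The figures are those in the table; the variable and function names are illustrative only.

```python
# (mean, SD) for each subject, taken from the table.
SUBJECTS = {"Hindi": (50.0, 10.0), "Maths": (60.0, 12.0)}

# Raw marks of the three students, taken from the table.
MARKS = {
    "A": {"Hindi": 60.0, "Maths": 72.0},
    "B": {"Hindi": 55.0, "Maths": 60.0},
    "C": {"Hindi": 65.0, "Maths": 65.0},
}


def z_score(raw, mean, sd):
    """Distance of a raw mark from the subject mean, in SD units."""
    return (raw - mean) / sd


z_table = {
    student: {subj: z_score(mark, *SUBJECTS[subj]) for subj, mark in marks.items()}
    for student, marks in MARKS.items()
}

for student, zs in z_table.items():
    rounded = {subj: round(value, 2) for subj, value in zs.items()}
    print(student, rounded, "sum of z =", round(sum(zs.values()), 2))
```

Student A is exactly one SD above the mean in both subjects, so the 60 in Hindi and the 72 in Maths represent the same relative standing. Student C's identical raw marks of 65 are not equivalent: 65 is 1.5 SDs above the mean in Hindi but only about 0.42 SD above the mean in Maths. Student B's 60 in Maths is exactly average, although it is numerically higher than the 55 in Hindi.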

Methods of Scaling
• There are a few practical methods of scaling. One is the nomogram method and another is the graphical method.
• The methods may be explained by the following example:

          Head       Assistant
Mean      55         63
SD        10         15
Range     26 – 82    30 – 87

• The assistant’s awards are to be scaled to the Head’s awards, which are taken as the standard.

Nomogram method
• Draw two parallel lines of about the same length. Mark the lowest and the highest scores awarded by the Head at the end-points of one line, then divide the line into parts to represent the scores in between.
• Do the same for the assistant’s lowest and highest awards on the other line, but in the reverse direction.
• Join the diagonally opposite end-points by straight lines, which intersect. A line drawn from any unscaled score point through the point of intersection will meet the opposite line at the scaled score point.
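Read geometrically, the construction above matches the end-points of the two ranges and interpolates linearly between them. The sketch below is one interpretation under that assumption, not a procedure given on the slide; the function name `nomogram_scale` is hypothetical, and the ranges 30 – 87 (assistant) and 26 – 82 (Head) are those of the example.

```python
def nomogram_scale(raw, src_low, src_high, dst_low, dst_high):
    """Map a score from the source range to the target range.

    Projecting through the intersection of the two diagonals sends a
    point at a given fraction of the source line to the point at the
    same fraction of the target line.
    """
    fraction = (raw - src_low) / (src_high - src_low)
    return dst_low + fraction * (dst_high - dst_low)


# Assistant's range 30 - 87 mapped onto the Head's range 26 - 82.
for raw in (30, 63, 87):
    print(raw, "->", round(nomogram_scale(raw, 30, 87, 26, 82), 1))
```

Because this reading matches range end-points rather than means and SDs, its results need not agree exactly with a conversion based on means and SDs.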

Graphical Method 1
• Convert the lowest and highest scores awarded by the assistant to z-scores, multiply each by the Head’s SD, and then add the Head’s mean. This gives scaled scores (on the Head’s standard) of 33 and 71.
• Then select the points (30, 33) and (87, 71) on graph paper and draw the line joining them.
• This line may be used to find the scaled score for any unscaled score. The equation of this line may also be used to compute scaled scores.
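The steps above can be sketched numerically. The means, SDs, and end-point scores are those of the Head/Assistant example; the function and variable names are illustrative.

```python
HEAD_MEAN, HEAD_SD = 55.0, 10.0
ASST_MEAN, ASST_SD = 63.0, 15.0


def to_head_scale(raw):
    """Scale an assistant's raw score to the Head's standard."""
    z = (raw - ASST_MEAN) / ASST_SD
    return HEAD_MEAN + HEAD_SD * z


# Scaled values of the assistant's lowest and highest awards.
low_scaled = to_head_scale(30)   # about 33
high_scaled = to_head_scale(87)  # about 71

# Equation of the line through (30, low_scaled) and (87, high_scaled).
slope = (high_scaled - low_scaled) / (87 - 30)
intercept = low_scaled - slope * 30

print(round(low_scaled, 2), round(high_scaled, 2))
print("scaled = %.4f * raw + %.2f" % (slope, intercept))
```

The line works out to roughly scaled = (2/3) × raw + 13, so an assistant's award of 63 (the assistant's mean) maps to 55 (the Head's mean).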

Graphical Method 2
• Alternatively, find the points on both scales one SD below and one SD above the respective means. This gives the pair of points (48, 45) and (78, 65), which may be plotted and joined to obtain the line. These points may also be used to find the equation of the line, which may then be used to calculate scaled scores.
• The scaled scores can then be combined by arithmetic operations such as addition and used for further analysis and reporting of results.
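As a worked check, the equation of the line through these two points can be derived as follows, writing x for the assistant's raw score and y for the scaled score (symbols introduced here for illustration):

```latex
\[
m = \frac{65 - 45}{78 - 48} = \frac{20}{30} = \frac{2}{3},
\qquad
y = 45 + \frac{2}{3}\,(x - 48) = \frac{2}{3}\,x + 13 .
\]
\[
x = 30 \;\Rightarrow\; y = 20 + 13 = 33,
\qquad
x = 87 \;\Rightarrow\; y = 58 + 13 = 71 .
\]
```

The end-point values 33 and 71 are the same as those obtained from the z-score conversion of the assistant's lowest and highest awards, so both graphical constructions describe the same line.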

Grading System
• The traditional method of adding scores and placing students in different performance categories/divisions is arbitrary.
• Standard errors in marking vary with subjects, teachers, and time. The standard error is a measure of chance variation in marking behavior, or subjectivity.
• Chance variation justifies grading.
• A grade is a symbol associated with a score-range indicating more or less the same level of performance, allowing for random marking errors.

Standard Error of marking
• If the script of an examinee is examined by several examiners independently, there will be wide variation in the marks around the hypothetical true score.
• The difference between the true score and the obtained (awarded) score is called the marking error.
• For each examiner there will be a marking error for a given answer book. The marking errors, being random, are likely to be normally distributed.
• The standard deviation of the marking errors may be called the standard error of marking (SEM). Research studies conducted in the 1960s showed that the standard error of marking on essay tests was 5-7%. In a more recent study, it was found to be about 12%.
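A minimal sketch of the definition, using invented figures: the true score of 50 and the five awarded marks are hypothetical and are not taken from any study. In practice the true score is unknown and has to be estimated, for example by the mean of many independent markings.

```python
import statistics

# Hypothetical true score of one answer book (unknown in practice).
TRUE_SCORE = 50.0

# Hypothetical marks awarded to that book by five independent examiners.
AWARDED = [43.0, 57.0, 50.0, 45.0, 55.0]

# Marking error of each examiner: awarded score minus true score.
errors = [mark - TRUE_SCORE for mark in AWARDED]

# Standard error of marking: the standard deviation of the errors.
sem = statistics.pstdev(errors)

print(errors)
print(round(sem, 2))
```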

Interpretation of SEM
• If the obtained score of a person is 50 and a standard error of 7% is accepted, the true score may lie anywhere between 29 and 71, that is, within three standard errors of the obtained score (50 ± 3 × 7). The candidate concerned may therefore fail or may obtain a first division.
• This suggests that performance anywhere in the score-range 29 – 71 is more or less of the same level. This forms the basis for awarding grades rather than marks.
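The range 29 – 71 corresponds to three standard errors on either side of the obtained score. A small sketch of that arithmetic follows; the function name `true_score_band` and the default multiplier of 3 are assumptions inferred from the figures on the slide.

```python
def true_score_band(obtained, sem, k=3):
    """Range within k standard errors of the obtained score."""
    return (obtained - k * sem, obtained + k * sem)


# Obtained score 50 with a standard error of marking of 7.
print(true_score_band(50, 7))   # (29, 71)

# The same score with the larger standard error of about 12.
print(true_score_band(50, 12))  # (14, 86)
```

With a standard error of about 12, the band for the same obtained score widens to 14 – 86.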

Procedure of grading: two approaches
• The grading process depends on factors such as the nature of the subject, the difficulty of the questions, and the quality of the group being evaluated.
• First approach: direct grading, with assignment of weights and computation of the GPA.
• Second approach: grading by converting numerical scores into letter grades or symbols.

Direct grading
• In direct grading, the evaluator assigns, by his or her own judgment, one of several symbols to a given answer to indicate its quality. If there are several answers, their grades are combined and a GPA is reported.
• Sometimes, the number or percentage of students to be placed in each grade is decided in advance.
• To compute the GPA, the grade/symbol assigned to each question/component is given a weight of 5, 4, 3, 2, or 1 for A, B, C, D, or E respectively; the weights are summed and the sum is divided by the number of components.
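The GPA computation described above can be sketched as follows. The weights are those given on the slide; the function name and the sample grades are hypothetical.

```python
# Weights for the five grade symbols, as given on the slide.
GRADE_WEIGHTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def grade_point_average(grades):
    """Sum of the grade weights divided by the number of components."""
    if not grades:
        raise ValueError("at least one grade is required")
    return sum(GRADE_WEIGHTS[g] for g in grades) / len(grades)


# Hypothetical grades awarded to five answers of one examinee.
answers = ["A", "B", "B", "C", "E"]
gpa = grade_point_average(answers)
print(gpa)  # (5 + 4 + 4 + 3 + 1) / 5 = 3.4
```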

Grading via Numerical Scores
• For this purpose two approaches are used: absolute grading and relative grading.
• In absolute grading, an absolute standard or level of performance (in terms of numerical score-ranges) is attached to each grade category. For example:

Grade   Score-range (percent)
A       95 – 100
B       85 – 94
C       75 – 84
D       65 – 74
E       Below 65
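A minimal sketch of absolute grading with the example score-ranges. The function name is hypothetical, and it assumes that a score of exactly 95 counts as an A and that each grade begins at the lower bound shown in the table.

```python
# Lower bound (percent) of each grade, from best to worst.
# Assumption: a score of exactly 95 is graded A.
CUT_SCORES = [("A", 95), ("B", 85), ("C", 75), ("D", 65)]


def absolute_grade(percent):
    """Return the letter grade for a percentage score."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for grade, lower_bound in CUT_SCORES:
        if percent >= lower_bound:
            return grade
    return "E"  # below 65


for score in (97, 88, 75, 66, 40):
    print(score, absolute_grade(score))
```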

Absolute grading
• In this case, the number of persons to be placed in each grade is not specified in advance.
• The number falling in each grade is significantly affected by the difficulty level of the test and the variability of the test scores.

Relative grading
• In relative grading, also known as norm-referenced grading, the relative positions of examinees are considered.
• This method is also known as ‘grading on the curve’ because it assumes a distribution of scores, normal or otherwise.
• It also has two approaches: the pre-decided interval approach and the pre-decided number or percentage approach.

Two approaches
• In the first approach, the entire scale is divided into score intervals (not necessarily all equal), and the number of persons assigned each grade is determined subsequently.
• In the other approach, the number or percentage of persons to be assigned each grade is fixed in advance, and the score-range for each grade is determined subsequently.
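The second (pre-decided percentage) approach can be sketched as below. The quota of 10, 20, 40, 20, and 10 percent for grades A to E, the ten examinees, and their scores are all hypothetical; tied scores are not given any special treatment in this sketch.

```python
def grade_by_percentage(scores, quotas):
    """Assign grades by rank according to pre-decided percentages.

    scores: dict mapping examinee name to score.
    quotas: list of (grade, percent) from best to worst, summing to 100.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0
    for grade, percent in quotas:
        cumulative += percent
        end = round(cumulative * n / 100)
        for name, _ in ranked[start:end]:
            grades[name] = grade
        start = end
    # Anyone left over because of rounding receives the lowest grade.
    for name, _ in ranked[start:]:
        grades[name] = quotas[-1][0]
    return grades


# Hypothetical quota and scores.
QUOTAS = [("A", 10), ("B", 20), ("C", 40), ("D", 20), ("E", 10)]
SCORES = {
    "s1": 91, "s2": 85, "s3": 78, "s4": 74, "s5": 70,
    "s6": 66, "s7": 61, "s8": 55, "s9": 48, "s10": 40,
}

grades = grade_by_percentage(SCORES, QUOTAS)
for name in SCORES:
    print(name, SCORES[name], grades[name])
```

Here the score-range of each grade follows from the data: with these figures grade C happens to cover the scores 61 to 74, and it would cover a different range for another group of examinees.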

Limitations of grading system
• There are chances of misclassification, which increase with variability, particularly in the neighborhood of cut-scores.
• It puts abler students at a disadvantage and weaker ones at an advantage, because students with different scores are lumped together in the same grade.
• It has limited utility in certain crucial decisions, such as selection decisions.

Thank You
