by Dr. Mohammad H. Omar Department of Mathematical Sciences May 16, 2006

Some Statistics for Equating Multiple Forms of a Test, by Dr. Mohammad H. Omar, Department of Mathematical Sciences, May 16, 2006. Presented at Statistic Research (STAR) colloquium, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia.

Equating

Brief overview of Talk
• Test administration using
  • Only one form
  • More than one form
• Test Equity
• Steps to ensuring equity
• Conditions for Equated Scores
• Data Collection Designs
• Equating procedures
• Illustration of the Equipercentile Equating process
• Use of smoothing techniques
• Application of equipercentile equating to data collection design
• Standard errors of equipercentile equating
• Linear equating
• Illustration of the Linear Equating process
• Application of linear equating to data collection design
• Standard errors of linear equating
• Comparison of equating methods

Test Administration using only one Form Advantage Disadvantage If cheating doesn’t occur

Test Administration using more than one form Advantage Disadvantage

Test Equity
Layman’s definition of equity: "It is a matter of indifference which test form a student took."

Steps to ensuring Equity
• Building test forms to the same test content specifications
  • Test forms should be interchangeable. No one form should have different content specifications than the others.
  • Test length should be the same. No one form should be longer than another. Students should not be disadvantaged by taking a longer test form than their peers.

Steps to ensuring Equity (continued)
• Building test forms to the same test parameter specifications
  • Test forms should be equally difficult. Students should not be disadvantaged by taking test forms that are much more difficult than those their peers take in the same administration.
  • Test forms should be equally reliable.

Conditions For Equated Scores
The purpose of equating is to establish, as nearly as possible, an effective equivalence between raw scores on two test forms. Because equating is an empirical procedure, it requires a design for data collection and a rule for transforming scores on one test form to scores on another. Many practitioners would agree with Lord (1980) that scores on test X and test Y are equated if the following four conditions are met:
• Same Ability – the two tests must both be measures of the same characteristic (latent trait, ability or skill).
• Equity – for every group of examinees of identical ability, the conditional frequency distribution of scores on test Y, after transformation, is the same as the conditional frequency distribution of scores on test X.
• Population Invariance – the transformation is the same regardless of the group from which it is derived.
• Symmetry – the transformation is invertible; that is, the mapping of scores from form X to form Y is the inverse of the mapping of scores from form Y to form X.

Conditions For Equated Scores (continued)
• The equity condition is unlikely to be precisely satisfied in practice.
• Although it might be possible to build two forms of a test that measure the same characteristic and are equally reliable overall, it is highly unlikely that one could ever build two forms that are equally reliable at every ability level, let alone two forms that produce the same conditional frequency distributions.

Data Collection Designs

Equating Data Collection Designs
• No statistical procedure can provide completely appropriate adjustments when non-equivalent or naturally occurring groups are used, but
• adjustments based on another test that is as close as possible to the tests to be equated are much more satisfactory than those based on nonparallel tests.

Equating Procedures
• Can regression be used to equate scores?
• No, because the regression $\hat{Y} = a + bX$ does not give the same conversion function as the regression $\hat{X} = c + mY$.
• To ensure equity, the conversion functions need to be the same.
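The asymmetry of regression can be checked numerically. The sketch below uses a small set of hypothetical paired scores (invented for illustration, not data from the talk) and the helper name `ols_line` is an assumption. It fits both regressions and compares the slope of X on Y with the reciprocal of the slope of Y on X, which is what a symmetric conversion would require.

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares regression of y on x."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return intercept, slope

# Hypothetical paired scores: the same examinees took both forms.
x = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0, 25.0, 30.0])
y = np.array([14.0, 13.0, 19.0, 20.0, 26.0, 24.0, 31.0, 33.0])

a, b = ols_line(x, y)   # Y-hat = a + b X
c, m = ols_line(y, x)   # X-hat = c + m Y

# A symmetric conversion would need m == 1 / b. For regression, the product
# of the two slopes is the squared correlation, which is below 1 unless the
# scores are perfectly correlated.
r = np.corrcoef(x, y)[0, 1]
slope_product = b * m
regression_is_symmetric = bool(np.isclose(m, 1.0 / b))
print(slope_product, r ** 2, regression_is_symmetric)
```

Because the two slopes multiply to r squared rather than to 1, converting a score from X to Y and back with regression does not return the original score.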

Equating Procedures
• Pre-Equating
  • Equating done on sections of a test, not the final test booklets
  • Scores are not counted for the student
• Post-Equating
  • Equating done on final test booklets, not sections of a test
• Equipercentile Equating
  • Equates percentiles of two score distributions for two test forms
• Linear Equating
  • Equates means and standard deviations of two score distributions for two test forms

Illustration of the Equipercentile Equating Process
Equipercentile equating can be thought of as a two-stage process (Kolen, 1984). First, the relative cumulative frequency (i.e., percentage of cases below a score interval) distributions are tabulated or plotted for the two forms to be equated. Second, equated scores (i.e., scores with identical relative cumulative frequencies) on the two forms are obtained from these cumulative frequency distributions.

Illustration of the Equipercentile Equating Process (continued)
A graphical method for equipercentile equating is illustrated in Figure 6.4.
First,
• the relative cumulative frequency distributions, each based on 471 examinees, for two forms (designated X and Y) of a 60-item number-right-scored test were plotted.
• The crosses (and stars) represent the relative cumulative frequency (i.e., percent below) at the lower real limit of each integer score interval (e.g., at i - 0.5, for i = 1, 2, ..., n, where n is the number of items).
Next,
• the crosses (stars) were connected with straight line segments.
• Graphs constructed in this manner are referred to as linearly interpolated relative cumulative frequency distributions.
• The segments connecting the crosses (stars) need not be linear.
• Methods of curvilinear interpolation, such as the use of cubic splines, could also be employed.

Illustration of the Equipercentile Equating Process (continued)
• Let the form-X equipercentile equivalent of $y_i$ be denoted $e_X(y_i)$.
• The calculation of the form-X equipercentile equivalent $e_X(18)$ of a number-right score of 18 on form Y is illustrated in Figure 6.4.
• The left-hand vertical arrow indicates that the relative cumulative frequency for a score of 18 on form Y is 50.
• The short horizontal arrow shows the point on the curve for form X with the same relative cumulative frequency (50).
• The right-hand vertical arrow indicates that a score of 30 on form X is associated with this relative cumulative frequency.
• Thus, a score of 30 on form X is considered to be equivalent to a score of 18 on form Y.
• A plot of the score conversion (equivalents) is given in Figure 6.5.
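The two-stage graphical procedure can be sketched in code. The example below is a minimal illustration, assuming number-right scores 0..n on both forms, linear interpolation between real limits, and no empty score intervals. The function names and the 5-item frequency tables are hypothetical and are not the data behind Figure 6.4.

```python
import numpy as np

def percent_below(freq):
    """Percent of examinees below each real limit -0.5, 0.5, ..., n + 0.5."""
    freq = np.asarray(freq, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(freq)))
    return 100.0 * cum / freq.sum()

def equipercentile(score, freq_from, freq_to):
    """Equivalent on the 'to' form of a score on the 'from' form.

    freq_from[i] and freq_to[i] are the numbers of examinees with raw
    score i. Assumes every frequency is positive, so that both ogives
    are strictly increasing and can be inverted by interpolation.
    """
    limits_from = np.arange(len(freq_from) + 1) - 0.5
    limits_to = np.arange(len(freq_to) + 1) - 0.5
    # Stage 1: read the percent below `score` off the 'from' ogive.
    p = np.interp(score, limits_from, percent_below(freq_from))
    # Stage 2: find the 'to' score with the same percent below.
    return np.interp(p, percent_below(freq_to), limits_to)

# Hypothetical frequencies for raw scores 0..5 on a 5-item test.
freq_x = [2, 6, 12, 12, 6, 2]
freq_y = [6, 12, 12, 6, 2, 2]   # form Y is harder: scores pile up lower

e_x_of_2 = float(equipercentile(2.0, freq_y, freq_x))     # Y score -> X scale
back_to_y = float(equipercentile(e_x_of_2, freq_x, freq_y))
print(e_x_of_2, back_to_y)
```

Swapping the roles of the two frequency tables maps the result back to the original score, which reflects the symmetry condition.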

The equipercentile transformation between two forms, X and Y, of a test will usually be curvilinear.
• If form X is more difficult than form Y, the conversion line will tend to be concave downward.
• If the distribution of scores on form X is flatter, more platykurtic, than that on form Y, the conversion will tend to be S-shaped.
• If the shapes of the score distributions on the two forms are the same (i.e., have the same moments except for the first two), the conversion line will be linear.

Use of Smoothing Techniques
• Unsmoothed equipercentile equating uses straight-line (linear) interpolation for the ogives
• Smoothing techniques can be used with curvilinear interpolation, such as cubic splines with different parameters
• Smoothing the ogives is known as the pre-smoothing method
• Smoothing the conversion function is known as the post-smoothing method

Application of Equipercentile Equating to Data Collection Designs
Equipercentile equating can also be carried out for the anchor-test-random-groups design in the following manner:
• Using the data for the group taking tests X and V (the anchor test), for each raw score on test V, determine the score on test X with the same percentile rank.
• Using the data for the group taking tests Y and V, for each raw score on test V, determine the score on test Y with the same percentile rank.

Application of Equipercentile Equating to Data Collection Designs (continued)
• Tabulate pairs of scores on tests X and Y that correspond to the same raw score on test V.
• Using the pairs tabulated in the previous step, for each raw score on test Y, interpolate to determine the equivalent score on test X.
The last procedure uses the data on test V to adjust for differences in ability between the two groups. This procedure really involves two equatings, instead of just one, and therefore doubles the variance of equating error.
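The four steps for the anchor-test design can be sketched as follows. This is an illustrative outline under stated assumptions: linear interpolation between real limits, no empty score intervals, and hypothetical frequency tables for a 5-item form and a 3-item anchor. The function names are invented for this sketch.

```python
import numpy as np

def ogive(freq):
    """Percent below each real limit -0.5, 0.5, ..., n + 0.5."""
    freq = np.asarray(freq, dtype=float)
    return 100.0 * np.concatenate(([0.0], np.cumsum(freq))) / freq.sum()

def equate(score, freq_from, freq_to):
    """Score on the 'to' test with the same percentile rank (all freq > 0)."""
    lim_from = np.arange(len(freq_from) + 1) - 0.5
    lim_to = np.arange(len(freq_to) + 1) - 0.5
    p = np.interp(score, lim_from, ogive(freq_from))
    return np.interp(p, ogive(freq_to), lim_to)

def anchor_equate(y, freq_x1, freq_v1, freq_y2, freq_v2):
    """Form-X equivalent of a form-Y score via anchor test V.

    Group 1 took X and V (freq_x1, freq_v1); group 2 took Y and V.
    """
    v = np.arange(len(freq_v1), dtype=float)   # raw anchor scores 0..n_v
    x_of_v = equate(v, freq_v1, freq_x1)       # step 1: V -> X in group 1
    y_of_v = equate(v, freq_v2, freq_y2)       # step 2: V -> Y in group 2
    # Steps 3 and 4: the pairs (y_of_v, x_of_v) share an anchor score;
    # interpolate among them. np.interp clamps outside the tabulated range.
    return float(np.interp(y, y_of_v, x_of_v))

# Hypothetical frequencies: 5-item forms (scores 0..5), 3-item anchor (0..3).
freq_x1 = [2, 6, 12, 12, 6, 2]
freq_y2 = [6, 12, 12, 6, 2, 2]      # form Y harder than form X
freq_v1 = [5, 10, 15, 10]
freq_v2 = [5, 10, 15, 10]           # groups equal in ability on the anchor

x_equiv = anchor_equate(2.0, freq_x1, freq_v1, freq_y2, freq_v2)
same_form = anchor_equate(2.0, freq_x1, freq_v1, freq_x1, freq_v2)
print(x_equiv, same_form)
```

When the two groups and the two forms have identical distributions, the conversion reduces to the identity, which is a useful sanity check on any implementation.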

Standard Errors of Equipercentile Equating

Standard Errors of Equipercentile Equating (continued)
Another procedure that may be used to estimate the standard error of an equipercentile equating is the bootstrap method (Efron, 1982).
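A bootstrap estimate can be sketched as below. The sketch assumes a random-groups design, so each group's raw scores are resampled independently with replacement. The simulated binomial scores, the number of replications, and the function names are all assumptions made for illustration.

```python
import numpy as np

def equate_from_samples(y, x_scores, y_scores, n_items):
    """Form-X equipercentile equivalent of score y from two raw-score samples."""
    limits = np.arange(n_items + 2) - 0.5      # real limits, used as bin edges

    def ogive(scores):
        freq, _ = np.histogram(scores, bins=limits)
        pct = 100.0 * np.concatenate(([0.0], np.cumsum(freq))) / freq.sum()
        # A tiny increasing offset breaks ties from empty score intervals,
        # keeping the ogive strictly increasing so it can be inverted.
        return pct + 1e-9 * np.arange(pct.size)

    p = np.interp(y, limits, ogive(y_scores))
    return float(np.interp(p, ogive(x_scores), limits))

def bootstrap_se(y, x_scores, y_scores, n_items, n_boot=200, seed=0):
    """Standard deviation of the equated score over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x_scores, size=len(x_scores), replace=True)
        yb = rng.choice(y_scores, size=len(y_scores), replace=True)
        reps[b] = equate_from_samples(y, xb, yb, n_items)
    return float(reps.std(ddof=1))

# Simulated 20-item scores for two random groups (hypothetical data).
rng = np.random.default_rng(1)
x_scores = rng.binomial(20, 0.6, size=300)
y_scores = rng.binomial(20, 0.5, size=300)

point = equate_from_samples(10, x_scores, y_scores, 20)
se = bootstrap_se(10, x_scores, y_scores, 20)
print(point, se)
```

More replications give a more stable estimate of the standard error at the cost of run time; 200 is used here only to keep the example quick.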

Linear Equating
When tests X and Y are not equally reliable, true scores x’ and y’ are used instead of observed scores.
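For reference, linear equating in its standard textbook form sets the standardized deviation scores on the two forms equal. The expression below is a sketch of that usual definition, not necessarily the exact form shown in the talk. Here $m_X, m_Y$ are the form means, $s_X, s_Y$ the form standard deviations, and $l_X(y)$ is a label introduced here for the form-X linear equivalent of a form-Y score $y$.

```latex
\frac{x - m_X}{s_X} = \frac{y - m_Y}{s_Y}
\quad\Longrightarrow\quad
l_X(y) = \frac{s_X}{s_Y}\,(y - m_Y) + m_X
```

Solving the same equality for $y$ gives the conversion in the other direction, so the two conversions are inverses of each other, unlike the two regression lines.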

Illustration of the Linear Equating Process
• Linear equating, like equipercentile equating, can be thought of as a two-stage process.
• First, compute the sample means (m) and standard deviations (s) of scores on the two forms to be equated.
• Second, obtain equated scores on the two forms by substituting these values into the linear equating equation.
• For example, suppose the raw-score means and standard deviations for two forms, X and Y, of a 60-item number-right-scored test administered to a single group of 471 examinees are

Illustration of the Linear Equating Process (continued)
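The two stages can be sketched in a few lines. The means and standard deviations below are hypothetical placeholders, not the values from the talk, and the function name `linear_equate` is an assumption. The conversion used is the standard one that matches standardized deviation scores on the two forms.

```python
def linear_equate(score, m_from, s_from, m_to, s_to):
    """Linear equivalent on the 'to' form of a score on the 'from' form.

    Matches standardized deviation scores:
    (result - m_to) / s_to == (score - m_from) / s_from
    """
    return s_to / s_from * (score - m_from) + m_to

# Stage 1: sample means and standard deviations (hypothetical values).
m_x, s_x = 30.0, 8.0
m_y, s_y = 20.0, 10.0

# Stage 2: substitute into the linear equating equation.
x_equiv = linear_equate(18.0, m_y, s_y, m_x, s_x)      # Y score -> X scale
y_again = linear_equate(x_equiv, m_x, s_x, m_y, s_y)   # and back to Y
print(x_equiv, y_again)
```

The round trip returns the original form-Y score, and the mean of one form always converts to the mean of the other.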

Application of Linear Equating to Data Collection Designs
• Linear equating can be carried out for the anchor-test-random-groups design in the same manner as for the equivalent-groups design, in which case the data on anchor test V are ignored.
• However, even when the groups are chosen at random, it is inevitable that there will be some differences between them, which, if ignored, will lead to bias in the conversion line.
• The data on test V can be used to adjust for differences between groups by means of the maximum-likelihood approach (Lord, 1955a).
• Maximum-likelihood estimates of the population means and standard deviations on forms X and Y are as follows:

Application of Linear Equating to Data Collection Designs (continued)

Standard Errors of Linear Equating

Comparison of equating methods
Equipercentile Equating
• Adjusts for differences in difficulty of test forms
• Can equate up to the fourth moments of the score distribution
• Percent of students below a particular score is equated
Linear Equating
• Adjusts for differences in difficulty of test forms
• Only equates the first two moments of the score distribution
• Percent of students scoring below an equated score is not equated

References
• Kolen and Brennan (1995). Test Equating. Springer-Verlag.
• Kolen, Petersen, & Hoover’s chapter on test equating in Linn (1993), Educational Measurement. ACE-Oryx Publishing.

Thank You
