
Presentation Transcript


  1. CHAPTER 9 Reliability Coefficients for Criterion-Referenced Tests

  2. Reliability Coefficients for Criterion-Referenced Tests Criterion: What we intend to measure (the DV). Norm-Referenced: As in intelligence tests, for example, we compare the examinee's score with their norm (normative IQ) or deviation IQ. Criterion-Referenced: As in achievement tests, we want to know whether the examinee has achieved a particular domain (math, psychology, or a particular behavior).

  3. Reliability Coefficients for Criterion-Referenced Tests • Reliability coefficients for criterion-referenced tests are used for 2 different purposes: • 1. Domain Score Estimation, or • 2. Mastery Allocation

  4. 1. Domain Score Estimation • We use the same type of calculation to determine the reliability coefficient as we did before. The reliability coefficient for domain score estimation of the data in Table 9.1 is computed the same way as for Table 7.1. • Ex. First we run an ANOVA to find the mean squares (MS persons, i.e., MS within, and MS residual), then use Hoyt's method to calculate the reliability coefficient. Next slides

  5. Reliability Coefficients for Criterion-Referenced Tests: MS persons = MS within; MS items = MS between

  6. Hoyt's (1941) Method: MS persons = MS within; MS items = MS between. MS residual has its own calculation; it is not equal to MS total.
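A minimal sketch of Hoyt's ANOVA-based reliability coefficient, assuming a complete persons-by-items score matrix and the standard two-way decomposition, with reliability = (MS persons − MS residual) / MS persons. The function name and the data values are hypothetical illustrations, not part of the original slides.

import numpy as np

def hoyt_reliability(scores):
    """Hoyt's (1941) ANOVA-based reliability for a persons x items matrix.

    Sketch only: assumes a complete matrix (no missing data) with
    persons as rows and items as columns.
    """
    scores = np.asarray(scores, dtype=float)
    n_persons, n_items = scores.shape
    grand_mean = scores.mean()

    # Sums of squares from the two-way ANOVA decomposition (no replication)
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_persons = n_items * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
    ss_items = n_persons * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
    ss_residual = ss_total - ss_persons - ss_items

    # Mean squares: divide each sum of squares by its degrees of freedom
    ms_persons = ss_persons / (n_persons - 1)
    ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))

    # Hoyt's reliability coefficient
    return (ms_persons - ms_residual) / ms_persons

# Hypothetical 0/1 item responses for 5 examinees on 4 items
data = [[1, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0]]
print(round(hoyt_reliability(data), 3))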

  7. 1. Domain Score Estimation • *1. Domain Score Estimation: The domain score for an examinee is the same as the observed score (X) in classical theory. It is the proportion of the items in a specific domain that the examinee can answer correctly. Ex. Your score of 85% on Test Construction corresponds to a domain score (D.S.) of .85.

  8. Reliability Coefficients for Criterion-Referenced Tests • *Decision Consistency: This concerns the extent to which the same decisions are made from different sets of measurements. Consistency of decisions is based on two different forms of a test (parallel forms), or on two administrations of the same test (test-retest). A high reliability coefficient (p) indicates that there is consistency in examinees' scores.
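A minimal sketch of how decision consistency could be summarized, assuming hypothetical percent-correct scores from two parallel forms and a hypothetical cut score; it reports the proportion of identical mastery decisions (p0) and a chance-corrected kappa. None of these numbers come from the slides.

import numpy as np

def decision_consistency(form1, form2, cut_score):
    """Proportion of consistent mastery decisions (p0) and kappa.

    Sketch only: form1 and form2 are percent-correct scores for the same
    examinees on two parallel forms (or two administrations); cut_score
    is a hypothetical mastery cut score.
    """
    m1 = np.asarray(form1) >= cut_score   # mastery decision, form 1
    m2 = np.asarray(form2) >= cut_score   # mastery decision, form 2

    p0 = np.mean(m1 == m2)                # observed agreement

    # Chance agreement: both classified as masters or both as non-masters
    p_chance = m1.mean() * m2.mean() + (1 - m1.mean()) * (1 - m2.mean())
    kappa = (p0 - p_chance) / (1 - p_chance)
    return p0, kappa

# Hypothetical percent-correct scores with a 70% cut score
form_a = [82, 65, 74, 90, 68, 71]
form_b = [78, 61, 69, 88, 72, 75]
print(decision_consistency(form_a, form_b, cut_score=70))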

  9. Reliability Coefficients for Criterion-Referenced Tests *Factors Affecting Decision Consistency • 1. Test length • 2. Location of the cut score in the score distributions • 3. Test score generalizability • 4. Similarity of the score distributions for the two forms

  10. Mastery Allocation • *2. Mastery Allocation: Involves comparing the percent-correct score to an arbitrarily established cut score. If the percent-correct score is equal to or greater than the cut score, the examinee has mastered that domain.

  11. Mastery Allocation • *Mastery Allocation: Mastering a domain is called mastery allocation. Ex. The EPPP exam cut score in Florida is 70%; if you scored 70% or greater on this exam, then you have mastered the psychology domain. You get your psychologist license and you can call yourself a psychologist.

  12. UNIT III VALIDITY CHAP 10: INTRODUCTION TO VALIDITY CHAP 11: STATISTICAL PROCEDURES FOR PREDICTION AND CLASSIFICATION CHAP 12: BIAS IN SELECTION CHAP 13: FACTOR ANALYSIS

  13. CHAPTER 10: INTRODUCTION TO VALIDITY • Validity: Validity refers to the degree to which a test measures what it is intended to measure. It is about the quality (accuracy/trueness) of a test. • *Characteristics of Validity: • 1. Result • 2. Context • 3. Coefficient

  14. *Characteristics of Validity: • 1. Result • Validity refers to the results of a test, not to the test itself. • Ex. If you are taking a statistics test, you want to know that the resulting score is a valid measure of your knowledge of statistics.

  15. INTRODUCTION TO VALIDITY • 2. Context • The validity of the resulting score (statistics) must be interpreted within the context in which the test occurs (statistics).

  16. INTRODUCTION TO VALIDITY • 3. Coefficient • Just like the reliability coefficient, the validity coefficient has degrees of variability from low to high. • p = 0 to 1 • Ex. The validity of last year's Test Construction Exam: p = 0.90

  17. Validity • Validity has been described as 'the agreement between a test score and the quality it is believed to measure' (Kaplan and Saccuzzo, 2001). In other words, it measures the gap between what a test actually measures and what it is intended to measure. Next Slide

  18. Validity • This gap can be caused by two particular circumstances: • (a) the design of the test is insufficient for the intended purpose (ex. using essays for older examinees), and (b) the test is used in a context or fashion which was not intended in the design (ex. changing math questions to multiple choice).

  19. External & Internal Validity • External Validity: External validity addresses the ability to generalize your study to other people and other situations. Ex. Correlational studies, such as the association between stress and depression.

  20. External & Internal Validity • Internal Validity: Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you not only have reliable measures of your independent and dependent variables, but also a strong justification that causally links your independent variables to your dependent variables (Ex. Experimental studies, such as the effect of stress on heart attacks).

  21. *Major Types of Validity: the 3 Cs • Content validity: the items (ex. a teacher's math test). • Criterion-related validity: stats; how well a test estimates/predicts a performance (ex. the teacher's math test and the researcher's test (FCAT), EPPP, GRE). • Construct validity: tests a non-observable construct or trait (ex. your depression test or clinical interview (underlying constructs, i.e., sleeping, eating, hopelessness) & the BDI-2 score).

  22. *Face validity • Face validity means that the test appears to be valid. This is validated using common-sense rules; for example, • a mathematical test should include some numerical elements.

  23. Face validity • 1. 3+5= • 2. 12-10= • 3. 8-5= • 4. 25-16= • 5. 13+3-8= • Multiple Choice; Please select the best answer. • 6. Judy had 10 pennies. She lost 2. How many pennies does she have left? • A. 2 • B. 8 • C. 10 • D. 12

  24. Face validity

  25. Face Validity • A test can appear to be invalid but actually be perfectly valid, for example where correlations between unrelated items and the desired items have been found. • Ex. Successful pilots in WWII were found to very often have had an active childhood interest in flying model planes (the association between flying model planes and successful WWII pilots).

  26. Face Validity • A test that does not have face validity may be rejected by test-takers (if they have that option) and people who are choosing the test to use from amongst a set of options.

  27. Types of Validity • 1. *Content Validity • Measures the knowledge of the content domain that the test was designed to measure. • Ex. If the content domain is statistics, the test should measure statistical knowledge, not English, math, or psychology, etc.

  28. 1. *Content Validity • Instruction: Multiple Choice; Please select the best answer. (structured framework) • 6. Judy had 10 pennies. She lost 2. How many pennies does she have left? • A. 2 • B. 8 • C. 10 • D. 12 • The part in red is called the "Performance Domain" or domain characteristic, which deals with your knowledge of the domain. • The part in yellow is called the "Matching Item."

  29. 1. Content Validity • *Content Validity • A test has content validity if it sufficiently covers the area that it is intended to cover. This is particularly important in ability or attainment/achievement tests that validate skills or knowledge in a particular domain. • *Content Under-Representation occurs when important areas are missed. *Construct-Irrelevant Variation occurs when irrelevant factors contaminate the test.

  30. 1. Content Validity • *Content Validity has 4 Steps • 1. Defining the performance domain of interest • 2. Selecting a panel of qualified experts in the content domain • 3. Providing a structured framework (instructions) for the process of matching the item (question) to the performance domain (answers) • 4. Collecting and summarizing the data from the matching process

  31. 1. Content Validity • *Content Validity has 4 Steps • 1. Defining the performance domain of interest • Ex. Ask yourself what am I trying to measure? Psych, Stats, English??

  32. 1. Content Validity • 2. Selecting a panel of qualified experts in the content domain. • Ex. Select expert statisticians to review your stats questions. Another ex. Qualifying exam questions.

  33. 1. Content Validity • 3. Providing a structured framework (instructions) for the process of matching the item (question) to the performance domain (answers). • Ex. Go back 4 slides and see Question #3

  34. 1. Content Validity • 4. Collecting and summarizing the data from the matching process. Select and collect a sample of these relevant questions (items).

  35. 1. Content Validity • *Practical Considerations in Content Validity • *Content validity requires the following 4 decisions (questions): • 1. Should objectives be weighted to reflect their importance? Ex. Next slide

  36. 1. Content Validity • 2. How should the item-matching task be structured? Ex. Next slide • 3. What aspect of item should be examined? Ex. Next slide • 4. How should results be summarized? • Ex. Next slide

  37. 1. Content Validity • 1. Should objectives be weighted to reflect their importance? In content validity we should rate the importance of objectives. The designer of the test should provide a scale, such as a "rubric", for measuring the objectives in a test. This also helps you to measure the inter-rater reliability of a test more accurately.

  38. 1. Content Validity • 2. How should the item-matching task be structured? Katz (1958) suggested that the expert reviewers should read the item and identify the correct/best response. Hambleton's (1980) idea was that the experts should rate the degree of matching to a specific objective by using a 5-point scale: poor fit 1____2____3____4____5 excellent fit
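A minimal sketch of how Hambleton-style 5-point fit ratings might be aggregated, assuming hypothetical ratings from three expert reviewers on four items; the 3.0 flagging threshold is an arbitrary assumption added for illustration.

import numpy as np

# Hypothetical ratings: each row is one expert reviewer, each column is one
# item, on the 5-point poor-fit (1) to excellent-fit (5) scale
ratings = np.array([
    [5, 4, 2, 5],
    [4, 5, 1, 4],
    [5, 4, 2, 5],
])

mean_fit = ratings.mean(axis=0)          # average fit rating per item
flagged = np.where(mean_fit < 3.0)[0]    # items judged a poor match (assumed cutoff)

print("mean fit per item:", mean_fit)
print("items flagged for review:", flagged)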

  39. 1. Content Validity • 3. What aspect of the item should be examined? • We should have a clear description of the item and the domain in order to consider matching the item(s) to a performance domain or domain characteristics. Ex. Go back to Question #6

  40. 1. Content Validity • 4. How should results be summarized? There are 5 ways (read p. 221): • 1. Percentage of items matched to objectives • 2. Percentage of items matched to objectives with high "importance" ratings • 3. Correlation between the importance weighting of objectives and the number of items measuring those objectives • 4. Index of item-objective congruence • 5. Percentage of objectives not assessed by any of the items on the test
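A minimal sketch of how a few of these summaries (1, 3, and 5 above) might be computed, assuming hypothetical item-to-objective matches and importance weights supplied by an expert panel; the objective names and weights are invented for illustration.

import numpy as np

# Hypothetical matching data: for each item, the objective it was matched to
# by the expert panel (None = not matched to any objective)
item_matches = ["obj1", "obj2", "obj1", None, "obj3", "obj1", "obj2", None]

# Hypothetical importance weights assigned to each objective
importance = {"obj1": 3, "obj2": 2, "obj3": 1, "obj4": 3}

# 1. Percentage of items matched to objectives
matched = [m for m in item_matches if m is not None]
pct_matched = 100 * len(matched) / len(item_matches)

# 3. Correlation between objective importance and number of items per objective
counts = [sum(m == obj for m in item_matches) for obj in importance]
weights = list(importance.values())
r = np.corrcoef(weights, counts)[0, 1]

# 5. Percentage of objectives not assessed by any item
pct_unassessed = 100 * sum(c == 0 for c in counts) / len(importance)

print(f"{pct_matched:.0f}% of items matched, r = {r:.2f}, "
      f"{pct_unassessed:.0f}% of objectives unassessed")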

  41. 2. Criterion-Related Validity • *Criterion-related validity is a measure of the extent to which a test is related to some criterion, or how well a test estimates/predicts a performance. • Ex. The SAT would be a predictor of college performance; the GRE, of graduate performance; the EPPP, of psychologist performance; and the Driver License Test, of basic traffic signs and signals and/or driving performance.

  42. 2. Criterion-Related Validity • Criterion-related validity is concerned with how well a test either estimates current performance (concurrent validity) or predicts future performance (predictive validity). Ex. EPPP Exam

  43. Ex. of Concurrent and Predictive Validity • Researchers want to know if 6th-grade students' math scores are valid. They give students a test designed to measure mathematical aptitude for 6th graders. • They then compare and correlate these scores with the test scores already held by the teachers (midterm scores), computing the correlation r.

  44. Ex. of Concurrent and Predictive Validity • They evaluate the accuracy of their test and decide whether it measures what it is supposed to. The key element is that the two measures were compared at about the same time (concurrent), or only a few days apart.
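A minimal sketch of this concurrent validity check, assuming hypothetical scores on the new aptitude test and the teachers' midterm (the criterion) for the same students; the validity coefficient is simply the Pearson correlation r between the two sets of scores.

import numpy as np

# Hypothetical scores for the same 6th-grade students: the new math aptitude
# test and the teachers' midterm scores (the criterion)
aptitude_test = [72, 85, 90, 60, 78, 95, 68, 83]
midterm       = [70, 88, 85, 58, 80, 97, 65, 79]

# Concurrent validity coefficient: Pearson r between the new test and the
# criterion collected at about the same time
r = np.corrcoef(aptitude_test, midterm)[0, 1]
print(f"concurrent validity coefficient r = {r:.2f}")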
