
Unit 7: Validity and Reliability Slides


    1. Unit 7: Validity and Reliability Slides

    2. Validity: Refers to the appropriateness and meaningfulness of the inferences we make from assessment results.

    3. Categories of Validity: Content-Related - How adequately does the sample of assessment items represent the task domain to be measured? Criterion-Related - How accurately does performance on the assessment predict future performance, or present performance on an alternative measure?

    4. Categories of Validity (continued): Construct-Related - How well can performance on the assessment be explained in terms of a psychological construct? Consequences of Using - How well did use of the assessment serve the intended purpose (e.g., improve performance) and avoid adverse effects (e.g., poor study habits)?

    5. Enhancing Content-Related Validity Methods: 1. Identify the general and specific objectives to be assessed 2. Construct an outline of subject matter 3. Construct a table of specifications to ensure adequate sampling of items 4. Provide clear assessment directions and proper administration & scoring procedures

    6. Establishing Criterion-Related Validity Methods: Predictive Study (e.g., predicting college GPA from ACT scores); Concurrent Study (e.g., correlating ACT scores with scores on an alternative measure, such as the SAT, taken at about the same time)

    7. Correlation (Validity) Coefficient
       1.00 = Perfect positive relationship
       .65 to .99 = Strong positive relationship
       .30 to .64 = Moderate positive relationship
       .01 to .29 = Weak positive relationship
       0.00 = No relationship
       -.01 to -.29 = Weak negative relationship
       -.30 to -.64 = Moderate negative relationship
       -.65 to -.99 = Strong negative relationship
       -1.00 = Perfect negative relationship
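
To make the scale concrete, here is a minimal sketch of computing and labeling a validity coefficient in Python; the ACT scores, GPAs, and the describe helper are hypothetical illustrations, not part of the original slides.

```python
# Hypothetical predictive-validity check: correlate ACT scores with later college GPA.
# All data values are illustrative only.
from statistics import correlation  # requires Python 3.10+

act_scores  = [21, 25, 30, 18, 27, 22, 33, 24, 29, 20]
college_gpa = [2.8, 3.1, 3.6, 2.4, 3.3, 2.9, 3.8, 3.0, 3.4, 2.7]

r = correlation(act_scores, college_gpa)  # Pearson r = the validity coefficient

def describe(r):
    """Map a coefficient onto the verbal labels used on the slide."""
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.65:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    elif size >= 0.01:
        strength = "weak"
    else:
        return "no relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"

print(f"Validity coefficient r = {r:.2f} ({describe(r)})")
```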

    8. Construct-Related Validity What is a construct? A construct is an individual characteristic that we assume exists in order to explain some aspect of behavior. Therefore, construct-related validity refers to the extent to which an instrument measures a construct.

    9. Establishing Construct-Related Validity Convergent Validity - Calculate a correlation coefficient between two instruments that measure the same construct (e.g., the Stanford-Binet and the WAIS); Discriminant Validity - Calculate a correlation coefficient between two instruments that measure opposing constructs (e.g., nAch, need for achievement, and nAff, need for affiliation)
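
A minimal sketch of both checks, assuming hypothetical score pairs for the same examinees; the convergent coefficient should come out high, and the discriminant coefficient low or negative.

```python
from statistics import correlation  # requires Python 3.10+

# Hypothetical scores for the same examinees on each pair of instruments.
stanford_binet = [102, 95, 118, 87, 110, 99, 125, 93]
wais           = [100, 97, 115, 90, 108, 101, 122, 95]  # same construct -> expect a high r

nach = [14, 9, 18, 7, 12, 16, 10, 15]   # need for achievement (nAch)
naff = [8, 15, 6, 17, 11, 7, 14, 9]     # need for affiliation (nAff) -> expect a low or negative r

print("Convergent validity r:  ", round(correlation(stanford_binet, wais), 2))
print("Discriminant validity r:", round(correlation(nach, naff), 2))
```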

    10. Consequences of Using Assessment Results Did use of the assessment: Improve motivation? Improve performance? Improve self-assessment skills? Contribute to transfer of learning? Encourage independent learning? Encourage good study habits? Contribute to a positive attitude?

    11. What is Reliability? Refers to the consistency of assessment results: Would we obtain similar results if we used a different version of a test? Would we obtain similar results if we used the same assessment at a later time? If a performance assessment is rated by different observers, would they rate performance the same?

    12. Definitions Raw Score - the score a student obtains on a test; Systematic Errors - external influences that raise or lower ALL the raw scores of students; Random Errors - internal influences that raise or lower the INDIVIDUAL raw scores of students; Standard Error of Measurement - a measure of the influence of random errors; Reliability Coefficient - a correlation coefficient that indicates the relationship between two sets of measurements obtained from the same procedure

    13. Methods of Calculating Reliability Coefficients: Test-Retest Method; Equivalent Forms Method; Test-Retest with Equivalent Forms Method; Internal Consistency Method

    14. Internal Consistency Method Procedure: 1. Correlate scores on the odd-numbered items of the test with scores on the even-numbered items (the split-half correlation) 2. Apply the Spearman-Brown formula to estimate the reliability coefficient of the full-length test
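
A minimal sketch of this procedure, assuming a small hypothetical matrix of 0/1 item scores; the Spearman-Brown correction used here is r_full = 2 × r_half / (1 + r_half).

```python
from statistics import correlation  # requires Python 3.10+

# Hypothetical item-score matrix: one row per student, one 0/1 entry per item (8 items).
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 1, 0],
]

# Step 1: correlate odd-item totals with even-item totals (the split-half coefficient).
odd_totals  = [sum(row[0::2]) for row in items]   # items 1, 3, 5, 7
even_totals = [sum(row[1::2]) for row in items]   # items 2, 4, 6, 8
r_half = correlation(odd_totals, even_totals)

# Step 2: the Spearman-Brown correction estimates the reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)

print(f"Split-half r = {r_half:.2f}, Spearman-Brown reliability = {r_full:.2f}")
```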

    15. Standard Error of Measurement Purpose: Used to calculate an estimated range (confidence band) of a person’s score if he or she were to take the test again and again.

    16. Calculating Standard Error of Measurement 1. Calculate the standard deviation of a set of raw scores 2. Calculate the split-half correlation coefficient 3. Calculate the Spearman-Brown reliability coefficient 4. Calculate the standard error of measurement (SEM = SD × √(1 − reliability)) 5. Calculate the confidence band of a person’s score if the test were taken again
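
A minimal sketch of steps 1 and 4-5, assuming hypothetical raw scores and a reliability coefficient already obtained from steps 2-3 (e.g., the split-half sketch above); a band of ±1 SEM corresponds roughly to 68% confidence.

```python
import math
from statistics import pstdev

# Hypothetical raw scores (step 1) and an assumed reliability coefficient,
# e.g., the Spearman-Brown value produced by steps 2-3.
raw_scores  = [42, 38, 45, 35, 40, 44, 37, 41, 39, 43]
reliability = 0.90

# Step 1: standard deviation of the raw scores.
sd = pstdev(raw_scores)

# Steps 4-5: standard error of measurement, then a confidence band around one observed score.
sem = sd * math.sqrt(1 - reliability)

observed = 41
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"68% confidence band for a score of {observed}: "
      f"{observed - sem:.1f} to {observed + sem:.1f}")
```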

    17. Reliability of Criterion-Referenced Tests Question: How consistently does the test classify masters and non-masters? In other words, if we gave an equivalent test to the students again, would the same students be identified as having mastered the material?

    18. Calculating Reliability of Criterion-Referenced Tests Steps: 1. Construct a two-by-two table with Form A as the rows and Form B as the columns 2. Place the number of students who mastered both forms in the upper right cell, those who mastered only Form A in the upper left cell, etc. 3. Compute the percentage of consistency
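
A minimal sketch of the consistency computation, using hypothetical mastery decisions for the same students on Forms A and B; the percentage of consistency is the share of students classified the same way (mastered both or mastered neither) on the two forms.

```python
# Hypothetical mastery decisions (True = mastered) for the same students on two test forms.
form_a = [True, True, False, True, False, True, True, False, True, True]
form_b = [True, True, False, False, False, True, True, False, True, True]

# Tally the two-by-two table: rows = Form A decision, columns = Form B decision.
both_master    = sum(a and b for a, b in zip(form_a, form_b))
both_nonmaster = sum(not a and not b for a, b in zip(form_a, form_b))
only_a         = sum(a and not b for a, b in zip(form_a, form_b))
only_b         = sum(b and not a for a, b in zip(form_a, form_b))

# Percentage of consistency = share of students classified the same way on both forms.
consistency = (both_master + both_nonmaster) / len(form_a) * 100
print(f"Mastered both: {both_master}, mastered neither: {both_nonmaster}, "
      f"only Form A: {only_a}, only Form B: {only_b}")
print(f"Percentage of consistency = {consistency:.0f}%")
```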

    19. Calculating the Reliability of Performance Assessments Procedure: 1. Two or more judges rate the performance of students completing a task 2. Construct a table showing the rating-scale points and the number of students receiving each rating from each judge 3. Calculate the percentage of inter-rater agreement
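
A minimal sketch of the inter-rater agreement computation, assuming two hypothetical judges rating each of ten students on a 1-4 rubric; agreement is counted only when both judges assign the identical rating.

```python
from collections import Counter

# Hypothetical ratings (1-4 rubric scale) given by two judges to the same ten students.
judge_1 = [4, 3, 2, 4, 1, 3, 3, 2, 4, 2]
judge_2 = [4, 3, 3, 4, 1, 3, 2, 2, 4, 2]

# Step 2: table of how often each judge used each rating point.
print("Judge 1 rating counts:", dict(sorted(Counter(judge_1).items())))
print("Judge 2 rating counts:", dict(sorted(Counter(judge_2).items())))

# Step 3: percentage of exact inter-rater agreement.
agreements = sum(r1 == r2 for r1, r2 in zip(judge_1, judge_2))
agreement_pct = agreements / len(judge_1) * 100
print(f"Inter-rater agreement = {agreement_pct:.0f}%")
```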
