Establishing the Reliability and Validity of Outcomes Assessment Measures

Anthony R. Napoli, PhD • Lanette A. Raymond, MA • Office of Institutional Research & Assessment, Suffolk County Community College • http://sccaix1.sunysuffolk.edu/Web/Central/IT/InstResearch/

Validity defined • The validity of a measure indicates to what extent items measure some aspect of what they are purported to measure

Types of Validity • Face Validity • Content Validity • Construct Validity • Criterion-Related Validity

Face Validity • It looks like a test of *#%* • Not validity in a technical sense

Content Validity • Incorporates quantitative estimates • Domain Sampling • The simple summing or averaging of dissimilar items is inappropriate

Construct Validity • Indicated by correspondence of scores to other known valid measures of the underlying theoretical trait • Discriminant Validity • Convergent Validity

Criterion-Related Validity • Represents performance in relation to particular tasks of discrete cognitive or behavioral objectives • Predictive Validity • Concurrent Validity

Reliability defined • The reliability of a measure indicates the degree to which an instrument consistently measures a particular skill, knowledge base, or construct • Reliability is a precondition for validity

Types of Reliability • Inter-rater (scorer) reliability • Inter-item reliability • Test-retest reliability • Split-half & alternate forms reliability
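Of the types listed above, split-half reliability is the easiest to show in a few lines. The sketch below is illustrative only: it assumes an odd/even split of the items and the Spearman-Brown correction, neither of which is specified in the slides, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a half-test has no variance")
    return cov / math.sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_matrix):
    """item_matrix: one row per student, one 0/1 (or point) score per item."""
    # Assumed split: items in odd positions vs. items in even positions.
    half_a = [sum(row[0::2]) for row in item_matrix]
    half_b = [sum(row[1::2]) for row in item_matrix]
    return spearman_brown(pearson_r(half_a, half_b))
```

For example, a half-test correlation of .50 steps up to about .67 for the full-length test under this correction.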

Validity & Reliability in Plain English • Assessment results must represent the institution, program, or course • Evaluation of the validity and reliability of the assessment instrument and/or rubric will provide the documentation that it does

Content Validity for Subjective Measures • The learning outcomes represent the program/course (domain sampling) • The instrument addresses the learning outcomes • There is a match between the instrument and the rubric • Rubric scores can be applied to the learning outcomes, and indicate the degree of student achievement within the program/course

Inter-Scorer Reliability • Rubric scores can be obtained and applied to the learning outcomes consistently across scorers, and indicate the degree of student achievement within the program/course

Content Validity for Objective Measures • The learning outcomes represent the program/course • The items on the instrument address specific learning outcomes • Instrument scores can be applied to the learning outcomes, and indicate the degree of student achievement within the program/course
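One way to apply instrument scores to learning outcomes, rather than summing dissimilar items into a single total, is to score each objective from the items mapped to it. The sketch below assumes a hypothetical item-to-objective map (three items per objective on a 12-item test); the actual mappings used for the courses in this presentation are not given in the slides.

```python
def objective_scores(responses, item_map):
    """Return the proportion correct for each objective.

    responses: list of 0/1 item scores for one student.
    item_map: dict of objective label -> list of item indices (0-based).
    """
    return {
        objective: sum(responses[i] for i in items) / len(items)
        for objective, items in item_map.items()
    }

# Hypothetical mapping for illustration only.
ITEM_MAP = {
    "I": [0, 1, 2],
    "II": [3, 4, 5],
    "III": [6, 7, 8],
    "IV": [9, 10, 11],
}

student = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
scores = objective_scores(student, ITEM_MAP)
```

Reporting one proportion per objective keeps items that measure different outcomes apart, which also makes it easy to see whether every outcome is covered by at least one item.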

Inter-Item Reliability • Items that measure the same learning outcomes should consistently exhibit similar scores

Content Validity (CH19) • A 12-item test measured students’ mastery of the objectives • Objective I: Write and decipher chemical nomenclature • Objective II: Solve both quantitative and qualitative problems • Objective III: Balance equations and solve mathematical problems associated with balanced equations • Objective IV: Demonstrate an understanding of intra-molecular forces

Content Validity (SO11) • A 30-item test measured students’ mastery of the objectives • Objective I: Identify the basic methods of data collection • Objective II: Demonstrate an understanding of basic sociological concepts and social processes that shape human behavior • Objective III: Apply sociological theories to current social issues

Inter-Rater Reliability: Fine Arts Portfolio • Criteria: Drawing, Design, Technique, Creativity, Artistic Process, Aesthetic Criteria, Growth, Portfolio Presentation • Scale: 5 = Excellent, 4 = Very Good, 3 = Satisfactory, 2 = Unsatisfactory, 1 = Unacceptable
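The slides do not say which statistic was used to check agreement between portfolio raters. As a minimal sketch, assuming two raters score the same portfolios on the 1 to 5 scale, exact agreement and Cohen's kappa (agreement corrected for chance) could be computed as follows. The ratings shown are made up for illustration.

```python
from collections import Counter

def exact_agreement(rater_a, rater_b):
    """Proportion of portfolios given the identical score by both raters."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = exact_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per score.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use a single score")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of six portfolios on one criterion.
rater_a = [5, 4, 3, 3, 2, 4]
rater_b = [5, 4, 3, 2, 2, 5]
agreement = exact_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
```

Kappa treats the five scale points as unordered categories. For an ordered rubric like this one, a weighted kappa or a correlation between raters may be a better fit.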

Inter-Item Reliability (PC11) • A 20-item test measured students’ mastery of the objectives • Objectives: Demonstrate a satisfactory knowledge of: 1. the history, terminology, methods, & ethics in psychology; 2. concepts associated with the 5 major schools of psychology; 3. the basic aspects of human behavior including learning and memory, personality, physiology, emotion, etc…; 4. an ability to obtain and critically analyze research in the field of modern psychology

Inter-Item Reliability (PC11) • Embedded-questions methodology • Inter-item or internal consistency reliability: KR-20 (rtt) = .71 • Mean score = 12.478 • Std Dev = 3.482 • Std Error = 0.513 • Mean grade = 62.4%
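For readers unfamiliar with KR-20, it is computed from right/wrong item scores as (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores), where k is the number of items, p_i is the proportion answering item i correctly, and q_i = 1 - p_i. The sketch below is a minimal illustration that assumes the population variance of total scores (some texts use the sample variance); the four-student data set is invented, not the PC11 data.

```python
from statistics import pvariance

def kr20(item_matrix):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    item_matrix: one row per student, one 0/1 score per item.
    """
    n_students = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_total = pvariance(totals)  # assumption: population variance
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_students
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Invented data: 4 students, 3 items.
toy_data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_kr20 = kr20(toy_data)
```

In the toy data the item variances sum to 0.625 and the total-score variance is 1.25, so KR-20 = 1.5 * (1 - 0.5) = 0.75.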

Inter-Item Reliability (PC11): Motivational Comparison • 2 Groups: Graded Embedded Questions vs. Non-Graded Form & Motivational Speech • Mundane Realism

Inter-Item Reliability (PC11): Motivational Comparison • Graded condition produces higher scores (t(78) = 5.62, p < .001). • Large effect size (d = 1.27).
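The t statistic and effect size reported above can be related with a short calculation. The sketch below assumes an independent-samples t-test with a pooled standard deviation and Cohen's d defined on that pooled SD. The slides do not give the group sizes or the exact formula used, so the equal split of 40 and 40 (which matches df = 78) is an assumption.

```python
import math
from statistics import mean, variance

def pooled_t_and_d(group_1, group_2):
    """Independent-samples t (pooled variance), Cohen's d, and degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = mean(group_1) - mean(group_2)
    d = diff / pooled_sd
    t = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, d, df

def d_from_t(t, n1, n2):
    """Recover Cohen's d from a pooled-variance t statistic and the group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Assumed group sizes: 40 graded and 40 non-graded students (df = 78).
approx_d = d_from_t(5.62, 40, 40)
```

Under the equal-groups assumption the conversion gives d of about 1.26, close to the reported 1.27; the small gap would be expected if the groups were unequal in size or d was computed from the raw means and standard deviations.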

Inter-Item Reliability (PC11): Motivational Comparison • Minimum competency 70% or better • Graded condition produces greater competency (Z = 5.69, p < .001).
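A comparison of competency rates like the one above is commonly made with a two-proportion z-test. The slides do not state which test produced Z = 5.69 or how many students in each group met the 70% threshold, so the sketch below uses invented counts and should be read as one plausible approach, not the authors' procedure.

```python
import math

def two_proportion_z(pass_1, n_1, pass_2, n_2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = pass_1 / n_1, pass_2 / n_2
    pooled = (pass_1 + pass_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: 30 of 40 graded vs. 10 of 40 non-graded students at 70% or better.
z, p_value = two_proportion_z(30, 40, 10, 40)
```

With these invented counts z is about 4.47 and p is well below .001.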

Inter-Item Reliability (PC11): Motivational Comparison • In the non-graded condition this measure is neither reliable nor valid (KR-20 for the non-graded group = 0.29)

Criterion-Related Concurrent Validity (PC11)

“I am ill at these numbers.” -- Hamlet --

“When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” -- Lord Kelvin --

“There are three kinds of lies: lies, damned lies, and statistics.” -- Benjamin Disraeli --
