Chapter 7 Evaluating What a Test Really Measures


Validity

  • APA – Standards for Educational and Psychological Testing (1985) – recognized three ways of deciding whether a test is sufficiently valid to be useful.



Validity: Does the test measure what it claims to measure? The appropriateness with which inferences can be made on the basis of test results.


Validity

  • There is no single type of validity appropriate for all testing purposes.

  • Validity is not a matter of all or nothing, but a matter of degree.


Types of Validity

  • Content

  • Criterion-Related (concurrent or predictive)

  • Construct

  • Face


Content Validity

  • Whether items (questions) on a test are representative of the domain (material) that should be covered by the test.

  • Most appropriate for tests such as achievement tests (i.e., tests of concrete attributes).


Content Validity

Guiding Questions:

  • Are the test questions appropriate and does the test measure the domain of interest?

  • Does the test contain enough information to cover appropriately what it is supposed to measure?

  • What is the level of mastery at which the content is being assessed?

    ***NOTE – Content validity does not involve statistical analysis.


Obtaining Content Validity

Two ways:

  • Define the testing universe and administer the test.

  • Have experts rate how essential each question is (1 = essential, 2 = useful but not essential, 3 = not necessary). A question is considered valid if more than half of the experts rate it "essential" (see the sketch below).

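A minimal sketch of the expert-rating rule above, using made-up ratings on the 1–3 scale (the questions and ratings are hypothetical, not from the chapter):

```python
# Sketch: flag a question as content-valid when more than half of the
# experts rate it "essential" (rating 1 on the 1-3 scale above).
# The ratings below are invented for illustration.

ratings = {
    "Q1": [1, 1, 2, 1, 1],   # one list of expert ratings per question
    "Q2": [2, 3, 1, 2, 3],
    "Q3": [1, 1, 1, 2, 1],
}

def is_content_valid(expert_ratings):
    """True if more than half of the experts rated the item essential (1)."""
    essential = sum(1 for r in expert_ratings if r == 1)
    return essential > len(expert_ratings) / 2

for question, expert_ratings in ratings.items():
    print(question, "keep" if is_content_valid(expert_ratings) else "review")
```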

Defining the Testing Universe

  • What is the body of knowledge or behaviors that the test represents?

  • What are the intended outcomes (skills, knowledge)?


Developing a Test Plan

Step 1

  • Define the testing universe

    • Locate theoretical or empirical research on the attribute

    • Interview experts

Step 2

  • Develop test specifications

    • Identify content areas (topics to be covered in the test)

    • Identify instructional objectives (what one should be able to do with these topics)

Step 3

  • Establish a test format

Step 4

  • Construct test questions


Attributes

Concrete Attributes

Attributes that can be described in terms of specific behaviors.

e.g., ability to play piano, do math problems

Abstract Attributes

More difficult to describe in terms of behaviors because people might disagree on which behaviors represent them.

e.g., intelligence, creativity, personality



Chapter 8 Using Tests to Make Decisions: Criterion-Related Validity


What is a criterion?

  • This is the standard by which your measure is being judged or evaluated.

  • The measure of performance that is correlated with test scores.

  • An evaluative standard that can be used to measure a person’s performance, attitude, or motivation.


Two Ways to Demonstrate Criterion-Related Validity

  • Predictive Method

  • Concurrent Method


Criterion-Related Validity

  • Predictive validity – correlating test scores with a future measure of the behavior, after examinees have had a chance to exhibit the predicted behavior; e.g., success on the job.



Concurrent validity – correlating test scores with an independent, currently available measure of the same trait that the test is designed to measure.

Or being able to distinguish between groups known to be different; i.e., significantly different mean scores on the test.


Examples of Concurrent Validity

Example 1: Teachers' ratings of reading ability validated by correlating them with reading test scores.

Example 2: Validate an index of self-reported delinquency by comparing responses with official police records for the respondents.



  • In both predictive and concurrent validity, we validate by comparing scores with a criterion (the standard by which your measure is being judged or evaluated).

  • Most appropriate for tests that claim to predict outcomes.

  • Evidence of criterion-related validity depends on empirical or quantitative methods of data analysis.


Example of How to Determine Predictive Validity

  • Give test to applicants for a position.

  • For all those hired, compare their test scores to supervisors’ rating after 6 months on the job.

  • The supervisors’ ratings are the criterion.

  • If employees' test scores correspond closely to supervisors' ratings, then the predictive validity of the test is supported (see the sketch below).

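A hedged sketch of the steps above, using invented test scores and 6-month supervisor ratings; scipy's pearsonr yields the correlation (the validity coefficient) and its significance:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: selection-test scores for the applicants who were hired,
# and their supervisors' ratings after 6 months on the job (the criterion).
test_scores = np.array([52, 61, 70, 45, 66, 58, 73, 49])
supervisor_ratings = np.array([3.1, 3.8, 4.2, 2.9, 4.0, 3.4, 4.5, 3.0])

r, p = pearsonr(test_scores, supervisor_ratings)
print(f"validity coefficient r = {r:.2f}, p = {p:.3f}")
# A positive, statistically significant r supports the test's predictive validity.
```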

Problems with Using Predictive Validity

  • A restricted range of scores on either the predictor or the criterion measure will artificially lower the correlation (see the simulation sketch below).

  • Attrition of criterion scores; i.e., some folks drop out before you can measure them on the criterion measure (e.g., 6 months later).

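A small simulation (numbers invented for illustration) of the restriction-of-range problem: when only the top scorers on the predictor are hired, the correlation computed on that restricted group is smaller than in the full applicant pool:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulate a predictor and a criterion that correlate about .60 in the full group.
predictor = rng.normal(size=n)
criterion = 0.6 * predictor + 0.8 * rng.normal(size=n)
full_r = np.corrcoef(predictor, criterion)[0, 1]

# Restriction of range: only the top half on the predictor were hired,
# so criterion scores exist only for them.
hired = predictor > np.median(predictor)
restricted_r = np.corrcoef(predictor[hired], criterion[hired])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")
# The restricted correlation comes out noticeably lower than the full-range one.
```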

Selecting a Criterion

  • Objective criteria: observable and measurable; e.g., sales figures, number of accidents, etc.

  • Subjective criteria: based on a person’s judgment; e.g., employee job ratings. Example…


CRITERION MEASUREMENTS MUST THEMSELVES BE VALID!

  • Criteria must be representative of the events that they are supposed to measure.

    • i.e., sales ability – not just $ amount, but also # of sales calls made, size of target population, etc.

  • Criterion contamination – when the criterion measures more dimensions than those measured by the test.


BOTH PREDICTOR AND CRITERION MEASURES MUST BE RELIABLE FIRST!

  • E.g., inter-rater reliability obtained by supervisors rating the same employees independently.

  • Reliability estimates of predictors can be obtained by one of the 4 methods covered in Chapter 6.


Calculating & Estimating Validity Coefficients

  • Validity coefficient – predictive and concurrent validity are also represented by correlation coefficients. The validity coefficient represents the amount or strength of criterion-related validity that can be attributed to the test.


Two Methods for Evaluating Validity Coefficients

  • Test of significance: a process of determining the probability that the calculated validity coefficient could have been obtained by chance.

    -Requires taking into account the size of the group (N) from whom the data were obtained.

    -When researchers or test developers report a validity coefficient, they should also report its level of significance.

  • The validity coefficient must be demonstrated to be greater than zero.

  • Typically p < .05; look up the critical value in a table (see the sketch below).

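A sketch of a significance check for a reported validity coefficient when only r and N are known, using the standard t approximation for a Pearson correlation (the r and N values here are illustrative):

```python
from math import sqrt
from scipy.stats import t

def correlation_p_value(r, n):
    """Two-tailed p-value for a Pearson r based on n pairs of scores."""
    df = n - 2
    t_stat = r * sqrt(df / (1 - r ** 2))
    return 2 * t.sf(abs(t_stat), df)

r, n = 0.30, 60                       # illustrative validity coefficient and group size
p = correlation_p_value(r, n)
print(f"r = {r}, N = {n}, p = {p:.3f}")                     # significant if p < .05
print(f"coefficient of determination r^2 = {r ** 2:.2f}")   # shared variance
```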

Two Methods for Evaluating Validity Coefficients (cont'd)

  • Coefficient of determination: The amount of variance shared by two variables being correlated, such as test and criterion, obtained by squaring the validity coefficient.

    r² tells us how much covariation exists between the predictor and the criterion; e.g., if r = .7, then 49% of the variance is common to both.

    e.g., if the correlation (r) is .30, then the coefficient of determination (r²) is .09. (This means that the test and criterion have 9% of their variance in common.)


Using Validity Information To Make Predictions

  • Linear regression: predicting Y from X.

  • Set a “pass” or acceptance score on Y.

  • Determine what minimum X score (the "cutting score") will produce that Y score or better ("success" on the job).

  • Example: see the regression sketch below.

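A minimal regression sketch of the cutting-score idea, with invented test scores (X) and job-performance ratings (Y); numpy's polyfit fits the prediction line, and the minimum X needed to reach a "passing" predicted Y is solved from the fitted equation:

```python
import numpy as np

# Hypothetical validation data: test scores (X) and later job-performance ratings (Y).
x = np.array([40, 45, 50, 55, 60, 65, 70, 75, 80])
y = np.array([2.1, 2.4, 2.8, 3.0, 3.3, 3.6, 3.9, 4.1, 4.4])

slope, intercept = np.polyfit(x, y, 1)             # simple linear regression: Y' = a + bX

pass_score = 3.5                                   # "success" level on the criterion Y
cutting_score = (pass_score - intercept) / slope   # minimum X predicted to reach it

print(f"Y' = {intercept:.2f} + {slope:.3f}X")
print(f"cutting score on the test: {cutting_score:.1f}")
```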

Outcomes of Prediction

Hits: a) True positives - predicted to succeed and did.

b) True negatives - predicted to fail and did.

Misses: a) False positives - predicted to succeed and didn’t.

b) False negatives - predicted to fail but would have succeeded.

WE WANT TO MAXIMIZE HITS AND MINIMIZE MISSES! (See the tally sketch below.)

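A short sketch tallying the four outcomes for a hypothetical applicant pool, assuming a cutting score on the test and a pass mark on the criterion (all numbers invented):

```python
# Hypothetical follow-up data: (test score, criterion score) for each person.
people = [(72, 4.1), (66, 3.2), (58, 3.8), (80, 4.5), (49, 2.6), (61, 3.6)]

cutting_score, pass_mark = 60, 3.5

true_pos  = sum(1 for x, y in people if x >= cutting_score and y >= pass_mark)
true_neg  = sum(1 for x, y in people if x <  cutting_score and y <  pass_mark)
false_pos = sum(1 for x, y in people if x >= cutting_score and y <  pass_mark)
false_neg = sum(1 for x, y in people if x <  cutting_score and y >= pass_mark)

print(f"hits:   {true_pos + true_neg} (true positives={true_pos}, true negatives={true_neg})")
print(f"misses: {false_pos + false_neg} (false positives={false_pos}, false negatives={false_neg})")
```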


Chapter 9 Construct Validity


Construct Validity

  • The extent to which the test measures a theoretical construct.

  • Most appropriate when a test measures an abstract construct (e.g., marital satisfaction).


What is a construct?

  • An attribute that exists in theory, but is not directly observable or measurable. (Remember there are 2 kinds: concrete and abstract.)

  • We can observe & measure the behaviors that show evidence of these constructs.

  • Definitions of constructs can vary from person to person.

    • e.g., self-efficacy

  • Example…




Construct Validity

  • Because a construct cannot be directly observed, you must use indirect measures of the construct, e.g., a scale which references behaviors that we consider evidence of the construct.

  • Evidence of construct validity of a scale may be provided by comparing high vs. low scoring people on behavior implied by the construct, e.g., Do high scorers on the Attitudes Toward Church Going Scale actually attend church more often than low scorers?

  • Or by comparing groups known to differ on the construct; e.g., comparing pro-life members with pro-choice members on an Attitudes Toward Abortion scale (see the sketch below).

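A hedged example of the known-groups comparison, with invented scale scores for two groups expected to differ on the construct; scipy's independent-samples t-test checks whether the group means differ significantly:

```python
from scipy.stats import ttest_ind

# Hypothetical Attitudes Toward Abortion scale scores for two known groups.
pro_choice_scores = [78, 82, 75, 88, 80, 77, 85]
pro_life_scores   = [45, 52, 38, 49, 41, 55, 47]

t_stat, p = ttest_ind(pro_choice_scores, pro_life_scores)
print(f"t = {t_stat:.2f}, p = {p:.4f}")
# A significant difference in the expected direction is evidence of construct validity.
```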

Construct Validity (cont'd)

  • Factor analysis also gives you a look at the unidimensionality of the construct being measured; i.e., homogeneity of items.

  • As does the split-half reliability coefficient.

  • ONLY ONE CONSTRUCT CAN BE MEASURED BY ONE SCALE!


Convergent Validity

  • Evidence that the scores on a test correlate strongly with scores on other tests that measure the same construct.

    • e.g., we would expect two measures of general self-efficacy to yield strong, positive, and statistically significant correlations.


Discriminant Validity

  • Evidence that test scores are not correlated with measures of unrelated constructs.


Multitrait-Multimethod Method

  • Searching for convergence across different measures of the same thing and for divergence between measures of different things (see the sketch below).

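A small illustration of the convergence/divergence logic, using simulated scores for two unrelated traits, each tapped by more than one method; correlations between different measures of the same trait should be high, and correlations between measures of different traits should be near zero (all data simulated):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two distinct underlying traits, unrelated to each other.
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)

# Two methods of measuring trait A, one method for trait B (plus measurement error).
a_self_report = trait_a + 0.4 * rng.normal(size=n)
a_peer_rating = trait_a + 0.4 * rng.normal(size=n)
b_self_report = trait_b + 0.4 * rng.normal(size=n)

convergent   = np.corrcoef(a_self_report, a_peer_rating)[0, 1]   # same trait, different methods
discriminant = np.corrcoef(a_self_report, b_self_report)[0, 1]   # different traits

print(f"convergent (should be high):     r = {convergent:.2f}")
print(f"discriminant (should be near 0): r = {discriminant:.2f}")
```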

Face Validity

  • The items look like they reflect whatever is being measured.

  • The extent to which the test taker perceives that the test measures what it is supposed to measure.

  • The attractiveness and appropriateness of the test as perceived by the test takers.

  • Influences how test takers approach the test.

  • Uses experts to evaluate.


Which type of validity would be most suitable for the following?

a) mathematics test

b) intelligence test

c) vocational interest inventory

d) music aptitude test


Discuss the value of predictive validity to each of the following:

a) personnel manager

b) teacher or principal

c) college admissions officer

d) prison warden

e) psychiatrist

f) guidance counselor

g) veterinary dermatologist

h) professor in medical school