Chapter 7 Evaluating What a Test Really Measures
Validity
  • APA – Standards for Educational and Psychological Testing (1985) – recognized three ways of deciding whether a test is sufficiently valid to be useful.

Validity: Does the test measure what it claims to measure? The appropriateness with which inferences can be made on the basis of test results.

  • There is no single type of validity appropriate for all testing purposes.
  • Validity is not a matter of all or nothing, but a matter of degree.
Types of Validity
  • Content
  • Criterion-Related (concurrent or predictive)
  • Construct
  • Face
Content Validity
  • Whether the items (questions) on a test are representative of the domain (material) that should be covered by the test.
  • Most appropriate for tests such as achievement tests (i.e., tests of concrete attributes).
Content Validity

Guiding Questions:

  • Are the test questions appropriate, and does the test measure the domain of interest?
  • Does the test contain enough items to cover appropriately what it is supposed to measure?
  • What is the level of mastery at which the content is being assessed?

***NOTE – Content validity does not involve statistical analysis.

Obtaining Content Validity

Two ways:

  • Define the testing universe and administer the test.
  • Have experts rate how essential each question is (1 = essential, 2 = useful but not essential, 3 = not necessary). A question is considered valid if more than half of the experts rate it “essential”.
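The expert-rating procedure above can be sketched in a few lines of code. All names and ratings below are hypothetical; the rule implemented is the one just stated: keep a question only if more than half of the panel rates it “essential” (a 1 on the 1–3 scale).

```python
# Tally expert ratings for content validity (hypothetical data).
# A question is kept if more than half the panel rates it "essential".

RATING_ESSENTIAL = 1  # 1 = essential, 2 = useful but not essential, 3 = not necessary

def essential_questions(ratings):
    """ratings: {question_id: [one rating per expert]} -> list of kept questions."""
    keep = []
    for question, votes in ratings.items():
        n_essential = sum(1 for v in votes if v == RATING_ESSENTIAL)
        if n_essential > len(votes) / 2:  # more than half the panel
            keep.append(question)
    return keep

# Five experts rate three questions (illustrative numbers only).
panel = {
    "Q1": [1, 1, 1, 2, 3],   # 3 of 5 essential -> keep
    "Q2": [1, 2, 2, 3, 3],   # 1 of 5 essential -> drop
    "Q3": [1, 1, 1, 1, 2],   # 4 of 5 essential -> keep
}
print(essential_questions(panel))  # -> ['Q1', 'Q3']
```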
Defining the Testing Universe
  • What is the body of knowledge or behaviors that the test represents?
  • What are the intended outcomes (skills, knowledge)?
developing a test plan
Developing A Test Plan

Step 1

  • Define testing universe
    • Locate theoretical or empirical research on the attribute
    • Interview experts

Step 2

  • Develop test specifications
    • Identify content areas (topics to be covered in test)
    • Identify instructional objectives (what one should be able to do with these topics)

Step 3

  • Establish a test format

Step 4

  • Construct test questions
Concrete Attributes

Attributes that can be described in terms of specific behaviors.

e.g., ability to play piano, do math problems

Abstract Attributes

More difficult to describe in terms of behaviors, because people may disagree on which behaviors represent them.

e.g., intelligence, creativity, personality

What is a criterion?
  • This is the standard by which your measure is being judged or evaluated.
  • The measure of performance that is correlated with test scores.
  • An evaluative standard that can be used to measure a person’s performance, attitude, or motivation.
Two Ways to Demonstrate Criterion-Related Validity
  • Predictive Method
  • Concurrent Method
Criterion-Related Validity
  • Predictive validity – correlating test scores with a future measure of the criterion behavior, obtained after examinees have had a chance to exhibit the predicted behavior; e.g., success on the job.
  • Concurrent validity – correlating test scores with an independent, currently available measure of the same trait the test is designed to measure.

Or being able to distinguish between groups known to be different; i.e., significantly different mean scores on the test.

Examples of Concurrent Validity

E.g. 1: teachers’ ratings of reading ability validated by correlating them with reading test scores.

E.g. 2: validating an index of self-reported delinquency by comparing responses with official police records on the respondents.

In both predictive and concurrent validity, we validate by comparing scores with a criterion (the standard by which your measure is being judged or evaluated).
  • Most appropriate for tests that claim to predict outcomes.
  • Evidence of criterion-related validity depends on empirical or quantitative methods of data analysis.
Example of How To Determine Predictive Validity
  • Give the test to applicants for a position.
  • For all those hired, compare their test scores to supervisors’ ratings after 6 months on the job.
  • The supervisors’ ratings are the criterion.
  • If employees’ test scores correlate with the supervisors’ ratings, the predictive validity of the test is supported.
Problems with Using Predictive Validity
  • A restricted range of scores on either the predictor or the criterion measure will produce an artificially low correlation.
  • Attrition of criterion scores; i.e., some folks drop out before you can measure them on the criterion measure (e.g., 6 months later).
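The restriction-of-range problem can be demonstrated numerically. The sketch below uses illustrative data and a pure-Python Pearson correlation: the validity coefficient computed on the full applicant pool is compared with the coefficient after predictor scores are restricted to the top half, as happens when only high scorers are hired and later rated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical predictor (test) and criterion scores with deterministic
# alternating "error" so the relationship is strong but imperfect.
predictor = list(range(20))                         # test scores 0..19
criterion = [x + (3 if i % 2 == 0 else -3)
             for i, x in enumerate(predictor)]

r_full = pearson_r(predictor, criterion)

# Restrict the range: keep only applicants scoring 10 or above.
pairs = [(x, y) for x, y in zip(predictor, criterion) if x >= 10]
r_restricted = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

print(round(r_full, 2), round(r_restricted, 2))
# Same error magnitude, narrower score range -> noticeably lower coefficient.
```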
Selecting a Criterion
  • Objective criteria: observable and measurable; e.g., sales figures, number of accidents, etc.
  • Subjective criteria: based on a person’s judgment; e.g., employee job ratings. Example…
  • Criteria must be representative of the events that they are supposed to measure.
    • i.e., sales ability – not just $ amount, but also # of sales calls made, size of target population, etc.
  • Criterion contamination – occurs when the criterion measures more dimensions than those measured by the test.
  • E.g., inter-rater reliability obtained by supervisors rating the same employees independently.
  • Reliability estimates of predictors can be obtained by one of the 4 methods covered in Chapter 6.
Calculating & Estimating Validity Coefficients
  • Validity coefficient – predictive and concurrent validity are also represented by correlation coefficients. The coefficient represents the strength of the criterion-related validity that can be attributed to the test.
Two Methods for Evaluating Validity Coefficients
  • Test of significance: a process of determining the probability that the study would have yielded the obtained validity coefficient by chance.
    • Requires taking into account the size of the group (N) from whom the data were obtained.
    • When researchers or test developers report a validity coefficient, they should also report its level of significance.
    • The coefficient must be demonstrated to be greater than zero at p < .05 (look up the critical value in a table).
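A common way to carry out such a significance test (an assumption here; the slides only say to consult a table) is the t statistic for a correlation, t = r·√(N−2)/√(1−r²), compared against a tabled critical value:

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical validity coefficient r = .30 from a sample of N = 40.
t = t_for_r(0.30, 40)

# Approximate two-tailed critical value for df = 38 at p < .05
# (taken from a standard t table; verify against your own table).
T_CRIT = 2.024

print(round(t, 2), t > T_CRIT)
```

With this hypothetical sample the statistic falls just short of the critical value, which illustrates why the size of the group (N) must be taken into account when judging a coefficient.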
Two Methods for Evaluating Validity Coefficients (cont’d)
  • Coefficient of determination: the amount of variance shared by the two variables being correlated (test and criterion), obtained by squaring the validity coefficient.

r² tells us how much covariation exists between predictor and criterion; e.g., if r = .70, then 49% of the variance is common to both.

i.e., if the correlation (r) is .30, then the coefficient of determination (r²) is .09. (This means the test and criterion have 9% of their variance in common.)

Using Validity Information To Make Predictions
  • Linear regression: predicting Y from X.
  • Set a “pass” or acceptance score on Y.
  • Determine the minimum X score (the “cutting score”) that will produce that Y score or better (“success” on the job).
  • Examples…
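The steps above can be sketched with illustrative data: fit a least-squares line predicting job performance (Y) from test scores (X), then invert the fitted line to find the cutting score on X that corresponds to the pass score on Y. All numbers are hypothetical.

```python
# Fit Y = b0 + b1*X by least squares, then solve for the minimum test
# score (cutting score) that predicts a passing criterion score.

test_scores = [50, 60, 70, 80, 90]          # X: predictor
performance = [2.0, 2.5, 3.0, 3.5, 4.0]     # Y: supervisor rating

n = len(test_scores)
mean_x = sum(test_scores) / n
mean_y = sum(performance) / n
b1 = (sum((x - mean_x) * (y - mean_y)
          for x, y in zip(test_scores, performance))
      / sum((x - mean_x) ** 2 for x in test_scores))   # slope
b0 = mean_y - b1 * mean_x                              # intercept

pass_y = 3.2                       # required performance rating on Y
cutting_score = (pass_y - b0) / b1 # invert the line: X for that Y

print(b1, b0, cutting_score)       # slope 0.05, intercept -0.5, cut ≈ 74
```

Applicants scoring at or above the cutting score are predicted to reach the pass level on the criterion; the quality of that prediction is only as good as the validity coefficient behind the regression.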
Outcomes of Prediction

Hits: a) True positives - predicted to succeed and did.

b) True negatives - predicted to fail and did.

Misses: a) False positives - predicted to succeed and didn’t.

b) False negatives - predicted to fail and would have succeeded.
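The four outcomes above can be tabulated directly from a cutting score on the predictor and a pass score on the criterion. The thresholds and scores below are illustrative:

```python
# Classify prediction outcomes: "predicted to succeed" means the test
# score meets the cutting score; "succeeded" means the criterion score
# meets the pass score. Data and thresholds are hypothetical.

CUTTING_SCORE = 70    # minimum test score to be predicted a success
PASS_SCORE = 3.0      # minimum criterion score (e.g., supervisor rating)

def outcome(test_score, criterion_score):
    predicted = test_score >= CUTTING_SCORE
    succeeded = criterion_score >= PASS_SCORE
    if predicted and succeeded:
        return "true positive"    # hit: predicted to succeed and did
    if not predicted and not succeeded:
        return "true negative"    # hit: predicted to fail and did
    if predicted and not succeeded:
        return "false positive"   # miss: predicted to succeed, didn't
    return "false negative"       # miss: predicted to fail, would have succeeded

print(outcome(85, 3.5))   # -> true positive
print(outcome(60, 2.0))   # -> true negative
print(outcome(85, 2.0))   # -> false positive
print(outcome(60, 3.5))   # -> false negative
```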


Construct Validity
  • The extent to which the test measures a theoretical construct.
  • Most appropriate when a test measures an abstract construct (e.g., marital satisfaction).
What is a construct?
  • An attribute that exists in theory, but is not directly observable or measurable. (Remember there are 2 kinds: concrete and abstract.)
  • We can observe & measure the behaviors that show evidence of these constructs.
  • Definitions of constructs can vary from person to person.
    • e.g., self-efficacy
  • Example…
When a trait, attribute, or quality is not operationally defined, you must use indirect measures of the construct, e.g., a scale that references behaviors we consider evidence of the construct.
  • But how can we validate that scale?
Construct Validity
  • Evidence of construct validity of a scale may be provided by comparing high vs. low scoring people on behavior implied by the construct, e.g., Do high scorers on the Attitudes Toward Church Going Scale actually attend church more often than low scorers?
  • Or by comparing groups known to differ on the construct; e.g., comparing pro-life members with pro-choice members on Attitudes Toward Abortion scale.
Construct Validity (cont’d)
  • Factor analysis also gives you a look at the unidimensionality of the construct being measured; i.e., homogeneity of items.
  • As does the split-half reliability coefficient.
Convergent Validity
  • Evidence that the scores on a test correlate strongly with scores on other tests that measure the same construct.
    • e.g., we would expect two measures of general self-efficacy to yield a strong, positive, and statistically significant correlation.
Discriminant Validity
  • Evidence that test scores are not correlated with measures of unrelated constructs.
Multitrait-Multimethod Method
  • Searching for convergence across different measures of the same thing and for divergence between measures of different things.
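The multitrait-multimethod logic can be sketched with a toy correlation check (illustrative data, pure-Python Pearson correlation): two different methods of measuring the same trait should correlate strongly (convergence), while measures of different traits should correlate weakly (divergence).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical scores for 6 people: the same trait (anxiety) measured by
# two methods, plus a different trait (vocabulary) for comparison.
anxiety_self_report = [10, 14, 18, 22, 26, 30]
anxiety_observer    = [11, 13, 19, 21, 27, 29]   # same trait, other method
vocabulary_test     = [30, 12, 25, 14, 28, 11]   # different trait

convergent = pearson_r(anxiety_self_report, anxiety_observer)
discriminant = pearson_r(anxiety_self_report, vocabulary_test)

# Convergence: same-trait/different-method correlation should be high;
# divergence: different-trait correlation should be much lower.
print(round(convergent, 2), round(discriminant, 2))
```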
Face Validity
  • The items look like they reflect whatever is being measured.
  • The extent to which the test taker perceives that the test measures what it is supposed to measure.
  • The attractiveness and appropriateness of the test as perceived by the test takers.
  • Influences how test takers approach the test.
  • Uses experts to evaluate.
Which type of validity would be most suitable for the following?

a) mathematics test

b) intelligence test

c) vocational interest inventory

d) music aptitude test

Discuss the value of predictive validity for each of the following:

a) personnel manager

b) teacher or principal

c) college admissions officer

d) prison warden

e) psychiatrist

f) guidance counselor

g) veterinary dermatologist

h) professor in medical school