1. Adult Psychological Assessment Mark J. Sergi, Ph.D.
2. Introductions Name
Program? Year in Program?
Educational & Career Goals
Assessment Experience
3. Volunteer Assessment Participant Friend of a Friend
Mentally Healthy
No Feedback
about 8 hours of testing
4. Who May Conduct Assessments?
Competence
APA’s Guidelines for Test User Qualifications
Training
Knowledge of Psychometrics and Measurement
Practica Experiences
Doctoral Level Psychologists
School Psychologists (Assessments Within Scope Of Training)
OK to give child a WISC-III
Not OK to give mother an MMPI-2
Guild Issue
Assessment is unique to Psychologists
Ethics
APA’s Ethical Principles
“Ethics Made Simple” <In Class Assignment>
5. Brief Review of Psychometric Theories and Constructs
6. Test Construction Approaches to Test Construction
1. Criterion-Keyed Approach
Strictly Empirical Approach
Select Items That Discriminate Between Two Populations Regardless Of Content
Ex. MMPI-2 Basic Scales
2. Analytic Approach To Test Construction
Begin With Items Based On Theory
Administer To Large Sample
Factor Analyze To Determine Relatedness Of Items
3. Rational/Deductive Method Of Test Construction
Combines The Empirical And Analytic Approaches
Most Frequently Used Approach Of Modern Test Developers
7. The Process of Modern Test Construction Specify The Test’s Purpose
Generate Test Items
Based On Theory (e.g., Relational Models Theory) Or An Established Construct (e.g., Major Depression)
Administer Draft Test, Conduct Item Analysis, Revise (Iterative Process)
Evaluate The Test’s Reliability And Validity
8. Item Analysis Item Selection
Item’s Relevance
Item’s Difficulty Level
Item’s Discriminability
Ability To Discriminate Between Persons With Different Levels Of The Characteristic Being Assessed
9. Item Relevance “The Extent To Which The Test Items Contribute To Achieving The Stated Goals Of Testing”
Based On A Qualitative Judgment That Considers:
Content Appropriateness
Does The Item Actually Assess The Content Domain That The Test Is Designed To Evaluate?
Taxonomic Level
Does The Item Reflect The Appropriate Cognitive Level Of The Target Population?
Does The Item Reflect The Level Of Pathology You Are Interested In (Test Of Major Depression Vs Sad Mood)?
Extraneous Abilities
To What Extent Does The Item Require Knowledge, Skills, Or Abilities Outside The Domain Of Interest?
10. Item Difficulty Item Difficulty Index (p)
Ranges From 0 To 1
p = 1.0 -> All Examinees Correctly Answered The Item
p = 0.0 -> None Of The Examinees Correctly Answered The Item
For Many Tests, Items With Moderate Difficulty (p = .5) Are Retained In Order To:
Increase Test Score Variability
Ensure A Normal Distribution Of Scores
Maximize Differentiation Between Examinees
Low p Items May Be Useful In Distinguishing Among High Performers Or The Most Severely Affected
Item Difficulty Is Affected By
Rate Of Correct Guessing
True/False Tests: An Average Item Difficulty Of .75 Is Desired (Midway Between The .50 Guessing Rate And 1.0)
Proportion Of Examinees To Be Selected
If You’re Selecting The Top 10% Of Your Class, Then Your Average p Should Be Set Near .10
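The item difficulty index above can be computed directly from a scored response matrix; a minimal Python sketch (the data are hypothetical, for illustration only):

```python
# Hypothetical scored responses: rows = examinees, columns = items (1 = correct).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

def item_difficulty(responses):
    """Difficulty index p per item: the proportion of examinees answering correctly."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(n_items)]

print(item_difficulty(responses))  # -> [1.0, 0.5, 0.25, 0.5]
```

Item 1, answered correctly by every examinee, gets p = 1.0 and contributes nothing to score variability; item 3 (p = .25) is the hardest.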
11. Item Discrimination “The Extent To Which An Item Differentiates Between Examinees Who Obtain Poor Or High Scores On The Test As A Whole”
Item Discrimination Index (D)
D = U – L
Where U = The Percent Of “High Scorers” (e.g., Upper 50th Percentile) Who Answered The Item Correctly
And L = The Percent Of “Low Scorers” (e.g., Lower 50th Percentile) Who Answered The Item Correctly
D Ranges From –1.0 To +1.0
For Most Tests, D Of .35 Or Higher Is Acceptable
Items With Moderate D (.50) Have The Greatest Potential For Optimal Differentiation Between Examinees
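The D = U – L formula above can be sketched as follows (the median split implements the upper/lower 50th-percentile convention from the slide; the data are hypothetical):

```python
import statistics

def discrimination_index(item_correct, total_scores):
    """Item discrimination index D = U - L.

    U and L are the proportions of high and low scorers (split at the
    median total score) who answered the item correctly."""
    med = statistics.median(total_scores)
    upper = [c for c, t in zip(item_correct, total_scores) if t > med]
    lower = [c for c, t in zip(item_correct, total_scores) if t <= med]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Both high scorers got the item right, both low scorers missed it.
print(discrimination_index([1, 1, 0, 0], [10, 9, 3, 2]))  # -> 1.0
```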
12. Theories of Test Construction Classical Test Theory
Views An Obtained Test Score As Reflecting A Combination Of Truth And Error
Item And Test Parameters Are Sample-Dependent (i.e., Item Difficulty Index and Reliability Coefficient Are Likely To Vary From Sample To Sample)
Item Response Theory (Latent Trait Model)
Parameters Are Sample Invariant (Same From Sample To Sample)
Item Characteristic Curve Is Derived For Each Item By Plotting The Proportion Of Examinees Who Answered The Item Correctly Against Either The Total Test Score, Performance On An External Criterion, Or A Mathematically-derived Estimate Of A Latent Ability Or Trait
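A common parameterization of the item characteristic curve is the two-parameter logistic model; a minimal sketch (the parameter values are illustrative, not from the lecture):

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: probability of a correct response at
    latent ability theta, with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly .5, whatever the discrimination.
print(icc_2pl(0.0, a=1.2, b=0.0))  # -> 0.5
```

Because a and b are properties of the item, not of any particular sample, they are sample-invariant in the sense the slide describes.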
13. Reliability Reliability = Consistency
According To Classical Test Theory, An Examinee’s Obtained Test Score (X) Is Composed Of Two Components, A True Score Component (T) And An Error Component (E)
X = T + E
T = The Examinee’s Status With Regard To The Construct Being Assessed
E = Measurement Error (Random Error Due To Irrelevant Factors)
Reliability Coefficients Range From 0.0 To 1.0
0.0 => All Variability In Obtained Test Scores Is Due To Measurement Error
1.0 => All Variability In Obtained Test Scores Reflects True Score Variability (Differences In The Construct Amongst Examinees)
Reliability Coefficient Is Interpreted Directly As The Proportion Of Variability In Obtained Test Scores That Reflects True Score Variability
e.g., r = .84 Means That 84% Of The Variance In Scores Is Due To True Score Variability And The Remaining 16% Of The Variability In Test Scores Is Due To Measurement Error
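The variance decomposition above can be expressed directly; a minimal sketch (the 84/16 split echoes the slide's example):

```python
def reliability(var_true, var_error):
    """Classical test theory: reliability is the proportion of observed-score
    variance that is true-score variance, var(T) / (var(T) + var(E))."""
    return var_true / (var_true + var_error)

# 84% true-score variance, 16% error variance, as in the r = .84 example.
print(reliability(84, 16))  # -> 0.84
```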
14. Methods For Estimating Reliability Test-Retest Reliability (Consistency Over Time)
The Same Test Is Given To The Same Examinees On Two Occasions
Good For Measures Of Stable Characteristics
Good For Measures Of Characteristics Not Affected By Repeated Measurement
Alternate Forms Reliability
Two Equivalent Forms Of A Test Are Administered To The Same Group Of Examinees And The Two Sets Of Scores Are Correlated
May Also Be Used To Assess Consistency Over Time
Considered By Many Experts To Be The Best Way Of Estimating Reliability
Internal Consistency Reliability (e.g., Split-Half, Coefficient Alpha)
Single Administration To A Group Of Examinees
Coefficient Alpha: Special Formula Looks At All Possible Split Halves
Kuder-Richardson Formula 20: Variation Of Coefficient Alpha Used When Items Are Scored Dichotomously
Inter-Rater Reliability (Consistency Across Scorers)
Projective Tests, Behavior Observation Techniques
Sources Of Error
Lack Of Motivation
Rater Biases
Observer Drift / Consensual Observer Drift
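The coefficient alpha described above can be sketched in a few lines of Python (the score matrix is hypothetical; with dichotomous 0/1 items the result equals KR-20):

```python
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(responses):
    """Coefficient alpha from an examinee-by-item score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(responses[0])
    items = list(zip(*responses))
    totals = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(variance(list(i)) for i in items) / variance(totals))

# Hypothetical 4-examinee, 3-item matrix (illustrative only).
print(cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # approximately 0.75
```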
15. Factors Affecting the Reliability Coefficient Test Length
Increase Test Length, Increase Reliability
Range Of Test Scores
Increase Range Of Test Scores, Increase Reliability
Guessing
Increase Possibility Of Accurate Guessing, Decrease Reliability
16. Interpretation of Reliability Reliability Coefficient
.80 or greater is acceptable for clinical measures
.70 or greater is acceptable for research measures
will vary from sample to sample
Standard Error of Measurement
used to interpret an examinee’s obtained score
confidence interval around examinee’s score
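The standard error of measurement and the confidence interval built from it can be sketched as follows (the SD = 15, r = .91 values are illustrative, chosen to mimic an IQ-type scale):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Approximate 95% confidence interval around an obtained score."""
    margin = z * sem(sd, reliability)
    return (score - margin, score + margin)

print(sem(15, 0.91))                        # approximately 4.5
print(confidence_interval(100, 15, 0.91))   # roughly (91.2, 108.8)
```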
17. Validity Validity = Accuracy
Validity = The Extent To Which A Test Measures What It Intends To Measure
18. Establishing Validity Content Validity
Extent To Which A Test Adequately Samples The Content Domain It Purports To Sample
Reliance On Experts
Tests With Content Validity
Have High Internal Consistency
Correlate Highly With Tests That Purport To Measure The Same Domain
19. Establishing Validity Construct Validity
Does The Test Measure A Distinct And Coherent Construct?
Evidence Of Construct Validity
High Internal Consistency (Items Relate To Total Score)
Test Distinguishes Between Persons Known To Have Different Levels Of The Construct
Test Scores Change After Intervention Designed To Affect The Construct
Evidence Of Convergent Validity (Association With Measures Assessing The Same Construct) And Discriminant Validity (Lack Of Association Between Measures Assessing Different Constructs)
Evaluate With Multitrait-Multimethod Matrix (Campbell & Fiske, 1959)
Monotrait-Monomethod Coef (Reliability Coef)
Monotrait-Heteromethod Coef (If Large, Evidence Of Convergent Validity)
Heterotrait-Monomethod Coef (If Small, Evidence Of Discriminant Validity)
Heterotrait-Heteromethod Coef (If Small, Evidence Of Discriminant Validity)
20. Establishing Validity Criterion-Related Validity
Estimates Or Predicts An Examinee’s Standing Or Performance On An External Criterion
Assessed By Correlating The Scores Of A Sample Of Individuals On The Predictor With Their Scores On The Criterion
Concurrent Vs Predictive Validity
Time Difference
Concurrent Validity Is Being Assessed When The Predictor And Criterion Variables Are Assessed At The Same Time
Predictive Validity Is Being Assessed When The Criterion Is Assessed Some Time After The Predictor Is Assessed
Interpretation Of The Criterion-related Validity Coefficient
Rarely Exceeds .60; .20 To .30 May Be Acceptable
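The criterion-related validity coefficient is simply the Pearson correlation between predictor scores and criterion scores; a minimal sketch (the scores are hypothetical):

```python
def pearson_r(x, y):
    """Pearson correlation between predictor scores and criterion scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical predictor (test) and criterion (outcome) scores.
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
```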
21. Psychological Assessment Terms Standardized Test
A Test Is Standardized When The Testing Parameters (Apparatus, Tester Behavior, Administration Procedures, Scoring Procedures) Are Fixed So That Tests Administered At Different Times Or To Different Persons May Be Compared.
Standardized Procedures Allow One To Establish “Norms”
“Norm-referenced Testing”
Objective Vs Subjective Tests
Objective Tests
Fixed, Well-defined Scoring Procedures
e.g., Multiple Choice Tests, True-False Tests
Subjective Tests
Open-ended Scoring System
e.g., The TAT (With Response Examples)
22. Psychological Assessment Terms Speed vs Power Tests
Speed Tests
All Items Are Within The Ability Of All Intended Subjects
Administered With Strict Time Limit So Differences Between Subjects Reflect Differences In Speed Of Responding
e.g., Single Digit Addition Tests In Elementary School…Speed Reflects Mastery
Power Tests
Generous Time Limit
Items Range In Difficulty Level
Differences In Scores Reflect Differences In Ability Or Knowledge
23. Psychological Assessment Terms Traditional vs Behavioral Assessment
Traditional Assessment
Basic Assumptions
Personality And Behavior Are The Product Of Stable Internal Factors (Intrapsychic Variables Drive Behavior)
Purpose
Describe The Personality Or Its Etiology, To Diagnose, Or Make Predictions
Behavioral Assessment
Basic Assumptions
Situational Or Interactionist
Situational Approach: Behavior Is A Product Of The Environment
Interactionist Approach: Behavior Is A Product Of An Interaction Between The Individual And The Environment
Purpose
Describe The Target Behavior, Its Maintaining Conditions, Select An Appropriate Treatment, Or Evaluate The Effectiveness Of A Treatment
Functional Analysis
Identification Of The Environmental Variables (Antecedents And Consequences) That Control A Behavior (A –B – C)
24. Psychological Assessment Terms Decision-Making: Actuarial vs Clinical Prediction
Actuarial (Statistical) Prediction
Based On Empirically Validated Relationships Between The Test Results And Target Criteria
Use Of Regression Analysis To Predict Target Behavior
Clinical Prediction
Based On The Clinician’s Intuition/Judgment
Meehl (1954): Actuarial Prediction Is Equal To Or Better Than Clinical Prediction
Combining Actuarial And Clinical Prediction May Be The Best Approach
More Valid Information Increases Prediction Accuracy
Multiple Sources Improve Prediction:
Base Rate Behavior
Observation
History
Test Results
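The actuarial (regression-based) approach described above can be sketched with ordinary least squares (the test scores and criterion values are hypothetical):

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept: a minimal actuarial
    prediction rule of the form criterion = slope * predictor + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical predictor scores and criterion values (illustrative only).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # -> 2.0 1.0
```

Once the rule is fit on validation data, the same equation is applied mechanically to every new examinee, which is what makes the prediction actuarial rather than clinical.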