## Test Construction and Measurement

**An Experiment**
• A researcher gave students the Diagnostic Interest Blank (DIB)
  • Hobbies, reading interests, secret hopes and ambitions
• Then gave students typed descriptions of their personalities
• Asked students to rate how well the personality sketch described them

**Sample Personality Description**
• You have a need for other people to like and admire you
• You have a tendency to be critical of yourself
• While you have some personality weaknesses, you are generally able to compensate for them

**Sample Personality Description (continued)**
• Disciplined and self-controlled outside, you tend to be worrisome and insecure inside
• You pride yourself as an independent thinker and do not accept others’ statements without proof
• At times you are extraverted and sociable, while at other times you are introverted and reserved

**Result**
• Almost all students were very impressed with how well the DIB described them
• Rated the DIB as a very accurate personality test

**Problem**
• Every student was given exactly the same personality description

**The Lesson**
• Beware of the Barnum effect
  • Tendency of people to see vague, universal statements as descriptive of themselves

**Major Point**
• Real psychological measurement is a complicated and difficult process

**A Preview**
• Correlation
• Steps in constructing a psychological test
• Reliability and validity
• Factor analysis

**Correlational Research**
• Focuses on relationships among variables
• Changes in one variable are associated with changes in another variable

**Correlation Coefficient**
• Number that expresses the direction and strength of the relationship between 2 variables
• Ranges from -1 to 1
• Index of the degree to which scores on one measure can be used to predict scores on a 2nd measure

**Direction**
• Indicated by + or - sign (slope)
• Positive correlation
  • As one variable goes up, so does the other
• Negative correlation
  • As one variable goes up, the other goes down

**Strength**
• Indicated by absolute value
  • Perfect positive relationship = 1
  • Perfect negative relationship = -1
  • No relationship = 0

**Percent of Variance**
• Percent of variance in “measure A” that can be accounted for by “measure B”
• Square the correlation coefficient and multiply by 100
• A correlation of .50 means we can account for 25% of the variance

**Causality**
• Correlation just tells you that 2 variables are related
• Can’t make causal interpretations

**Fact: Time spent on the internet is positively correlated with depression**
• Possible interpretations
  • Spending lots of time on the internet causes depression
  • Being depressed causes you to spend lots of time on the internet
  • Some third variable, such as living alone, causes both

**Major Point**
• It is difficult, but not impossible, to construct a meaningful psychological test

**Steps in Test Construction**
1. Decide what to measure
• Identify the construct
  • Idea that helps us make sense of the world around us
  • Not directly observable
  • Examples: intelligence, extraversion, racism, pessimism, creativity

**Steps (continued)**
2. Develop a set of items/questions
• Search the literature
• Get experts or lay people to tell us what the construct means to them

**Steps (continued)**
3. Get a sample of people to answer the items
• Drawn from the population you want to use the test for

**Steps (continued)**
4. Evaluate each item
• Correlate each item with the mean of the whole set
• Correlate each item with an item that directly assesses the construct (for a racism scale, an item asking about self-reported racism)
• Drop bad items

**Steps (continued)**
5. Select a set of items for further study
• Want a normal distribution of scores
• Drop items that nearly everyone answers YES or nearly everyone answers NO

**Steps (continued)**
6. Assess reliability of the entire test
• Consistency of measurement
• 3 major types

**Reliability**
1. Inter-rater:
• Extent to which different people scoring the same test get the same result
• Correlate a set of tests scored by one rater with the same set of tests scored by a different rater

**Reliability**
2. Test-retest:
• Extent to which people get the same results if they take the test again
• Subjects take the test twice. Correlate the set of time 1 scores with the time 2 scores

**Reliability**
3. Internal consistency:
• Split-half: correlation between one half of the test and the other half
• Coefficient alpha: average of all possible split-half reliabilities

**How high should reliability correlations be?**
• Expect r = .80 or better

**Factors that influence reliability**
• Clarity of items
• Motivation of test taker
• Number of items

**Steps (continued)**
7. Assess validity of the entire test
• Extent to which the test measures what it is supposed to measure
• Face validity is not sufficient
• Do a series of validity studies

**Ways to Measure Validity**
1. Criterion validity
• Correlation between the test and a concrete, directly observable criterion
• Example: correlate self-report of weight with actual weight on a scale

**Ways to Measure Validity (continued)**
2. Content validity
• Adequate coverage of the target domain
• Example: a test on chapters 1-4 that covers only chapters 2 and 3 lacks content validity

**Ways to Measure Validity (continued)**
3. Convergent validity
• Agreement among alternative measures of the same construct
• Example: correlation between ACT and SAT

4. Discriminant validity
• Lack of correlation between tests that are intended to measure different constructs
• Example: expect a low correlation between the ACT and a test of aggression

**Threats to Validity**
• Response tendency
  • Assigning numbers to items for reasons that have little to do with the construct the item is intended to measure

**Response Tendencies**
• Extremity tendency
  • Use the ends of scales
• Acquiescence tendency
  • Agree with questions
• Social desirability
  • Answer in a way that makes you look good

**Factor Analysis**
• Statistical technique that examines the pattern of correlations among multiple tests or items
• Tests or items that correlate strongly with one another are considered to represent a common, underlying factor

**Interpreting Factor Analysis**
• Each item has a factor loading: the correlation between the item and the factor
• Marker variable
  • Item that has a high factor loading (correlation) with a given factor
  • Closely related to the meaning of the factor
• Blend
  • Item that loads moderately high on more than one factor
  • Not a pure measure of a factor; related to two or more factors
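
**Worked Example: Correlation and Percent of Variance**
The slides on the correlation coefficient and percent of variance can be illustrated with a short Python sketch. It computes a Pearson correlation from scratch, then squares it and multiplies by 100. The function names and the data are hypothetical and invented for illustration; they do not come from the presentation or from any real study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cross / math.sqrt(ss_x * ss_y)


def percent_variance(r):
    """Square the correlation coefficient and multiply by 100."""
    return (r ** 2) * 100


# Hypothetical scores for six people (made up for this example).
hours_online = [1, 2, 3, 4, 5, 6]
depression = [2, 1, 4, 3, 6, 5]

r = pearson_r(hours_online, depression)
print(f"r = {r:.2f}")  # sign gives direction, absolute value gives strength
print(f"percent of variance = {percent_variance(r):.1f}%")
```

A positive `r` here says only that the two made-up variables rise together. As the causality slide notes, it says nothing about which one, if either, causes the other.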
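
**Worked Example: Item Analysis and Internal Consistency**
Steps 4 and 6 (evaluating items and assessing reliability) can be sketched in Python as follows. The data are hypothetical: five respondents answering four items. The coefficient alpha function uses the usual variance-based formula, which is an assumption on our part, since the slides describe alpha only as the average of all possible split-half reliabilities. The split-half function compares odd-numbered with even-numbered items, which is one of many possible splits.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def item_total_correlations(scores):
    """Correlate each item with each respondent's mean over the whole set."""
    person_means = [statistics.mean(row) for row in scores]
    items = list(zip(*scores))  # one tuple of responses per item
    return [pearson_r(item, person_means) for item in items]


def split_half(scores):
    """Correlation between the odd-numbered and even-numbered item halves."""
    first = [sum(row[0::2]) for row in scores]
    second = [sum(row[1::2]) for row in scores]
    return pearson_r(first, second)


def coefficient_alpha(scores):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [statistics.variance(item) for item in zip(*scores)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: rows are respondents, columns are items (1-5 scale).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

print("item-total r:", [round(r, 2) for r in item_total_correlations(scores)])
print("split-half r:", round(split_half(scores), 2))
print("alpha:", round(coefficient_alpha(scores), 2))
```

An item with a low item-total correlation would be a candidate to drop in step 4. The alpha value can be compared with the r = .80 guideline from the reliability slides.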
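
**Worked Example: Marker Variables and Blends**
The distinction between marker variables and blends can be illustrated with a small Python sketch that labels items from a table of factor loadings. The loadings, the item names, and the cutoffs (.60 for "high", .30 for "moderate") are all hypothetical choices made for this example; the slides do not give numeric thresholds, and the sketch does not perform the factor analysis itself.

```python
def classify_items(loadings, high=0.60, moderate=0.30):
    """Label each item 'marker', 'blend', or 'unclear' from its factor loadings.

    loadings maps an item name to a list of loadings, one per factor.
    The cutoffs are illustrative assumptions, not fixed rules.
    """
    labels = {}
    for item, row in loadings.items():
        sizes = [abs(value) for value in row]  # the sign only gives direction
        n_high = sum(size >= high for size in sizes)
        n_salient = sum(size >= moderate for size in sizes)
        if n_high == 1 and n_salient == 1:
            labels[item] = "marker"   # loads highly on exactly one factor
        elif n_salient >= 2:
            labels[item] = "blend"    # loads on more than one factor
        else:
            labels[item] = "unclear"  # no loading large enough to interpret
    return labels


# Hypothetical loadings on two factors (invented for this example).
loadings = {
    "talkative": [0.82, 0.05],
    "outgoing": [0.75, 0.10],
    "anxious": [0.08, 0.79],
    "shy": [-0.45, 0.40],
    "tidy": [0.12, 0.15],
}

for item, label in classify_items(loadings).items():
    print(f"{item}: {label}")
```

In this made-up table, "talkative" and "outgoing" would serve as markers for the first factor and help name it, while "shy" is a blend because it relates to both factors.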