Lecture 4 CONSTRUCT VALIDITY

1. 1 Lecture 4 CONSTRUCT VALIDITY

2. 2 Validity A test is said to be VALID if it measures what it is supposed to measure.

3. 3 Summary … There have been many different interpretations of validity. There are FOUR main approaches: FACE VALIDITY, CONTENT VALIDITY, PREDICTIVE VALIDITY and CONSTRUCT VALIDITY.

4. 4 Tests in action Psychometric tests are now widely used in job selection. There, the emphasis is upon PREDICTIVE validity. I have 100 applications for three places on a course in electronics. Which applicant shall I choose? I know very little about any of the applicants. I have an hour or so to make a decision.

5. 5 A valid test Fortunately, I have a test which enables me to predict success on the course. The test is highly reliable; moreover, there is a large body of data showing that those who do best on the test tend to perform best on the electronics course itself. My test is not only RELIABLE but also VALID.

6. 6 Theory What exactly is the test measuring? Perhaps it doesn't really matter. It is simply an instrument I use to help select the right candidate. There is practical justification for saying, "This test measures whatever ability (or abilities) the course requires"!

7. 7 Practice or theory? The usefulness of a test, that is, its PREDICTIVE VALIDITY, is improved by continuously modifying its items so that it meets STATISTICAL criteria. But the items that perform best may not seem theoretically to be the best measures of what the test was originally supposed to be measuring. Thus there can be a TENSION between considerations of psychometric PERFORMANCE and the building of sound THEORY.

8. 8 History The mental testing movement received an enormous boost from the two world wars. New recruits had to be assigned at short notice to activities they could perform. Not everyone can be a navigator in a bomber crew, for example. In such circumstances, theoretical considerations about what exactly the tests were measuring seemed largely irrelevant, as long as they helped to assign the right person to the right job.

9. 9 Methodology Cognitive psychology makes the greatest use of the EXPERIMENTAL METHOD, because that approach enables the researcher to identify the key variables. Psychometrics is an essentially CORRELATIONAL enterprise. It is very difficult to identify crucial variables from correlational data. It is therefore difficult to map the results of psychometric research on to those of cognitive psychology.

10. 10 4. Construct validity The extent to which a test can be shown to measure a hypothetical construct is known as its CONSTRUCT VALIDITY. Here the emphasis switches from PREDICTION to THEORY. Of the various kinds of validity, construct validity is by far the most difficult to demonstrate.

11. 11 Demonstration of construct validity Your test must CORRELATE substantially with SOME other variables (CONVERGENCE). But your measure must also show DISSOCIATION from other variables (DIVERGENCE). Where expected, your measure should also show AGE DIFFERENTIATION. Cognitive ability, for example, increases with age and any supposed test of cognitive ability should reflect this developmental trend.

12. 12 Field Dependence-Independence Witkin held that people vary on a hypothetical psychological dimension he called FIELD-DEPENDENCE-INDEPENDENCE. The field-independent person is supposed to be able to analyse the total "field" of experience into its component parts and manipulate the parts independently of the overall organisation in order to solve a variety of problems. This analytic capacity is claimed to be wide-ranging and to pervade most aspects of a person's mental life.

13. 13 Witkin's tests I described three of Witkin's tests: The Rod-and-frame Test (RFT); The Embedded Figures Test (EFT); The Body Adjustment Test (BAT).

14. 14 Convergence? The person who can adjust the rod to the true vertical (in the RFT) should be able to see the embedded figure (in the EFT). Such people should also be able to adjust their chairs to the upright position (BAT), despite the tilt of the walls of the artificial "room".

15. 15 Convergence … Since they are supposed to be measuring the same hypothetical construct (field-dependence-independence), Witkin's tests should certainly correlate highly with one another. Since they are cognitive tests, however, they could also be expected to correlate positively with at least SOME of the abilities that are required for performance on an intelligence test.

16. 16 Witkin's evidence Witkin (and many others) have shown that there are indeed substantial positive CORRELATIONS among the EFT, BAT and RFT tests. The person who adjusts the rod to the true vertical can also make the chair upright and quickly spot the embedded figures. The person who cannot spot the embedded figure insists that the rod is vertical when it is actually aligned with the long axis of the frame and claims that a chair is truly upright when it is actually aligned with the tilted room.

17. 17 Convergent validation Each of the three measures correlates significantly and substantially with the other two. The correlations in the table below are typical of those found in many studies by many different teams of researchers. The criterion of CONVERGENCE is met by Witkin's tests.

18. 18 Just intelligence? Witkin's measures correlate positively with the Full Scale WAIS IQ. For example, one study (Witkin, 1965) showed that EFT and WAIS IQ correlated significantly: r(72) = .36; p < .01. Is Witkin's hypothetical construct simply INTELLIGENCE? Is there really a separate dimension of Field-Dependence-Independence? To make his case, Witkin must also show theoretically meaningful DISSOCIATION, or DIVERGENCE, of his measures from other cognitive activities.
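A significance claim such as r(72) = .36; p < .01 can be checked approximately with the Fisher z transformation. The sketch below uses only the Python standard library and rests on assumptions that the slide does not state: that the number in brackets is the degrees of freedom (so N = df + 2), that the test is two-tailed, and that the normal approximation is adequate at this sample size. The function name is illustrative.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r: float, df: int) -> float:
    """Approximate two-tailed p-value for H0: rho = 0 via Fisher's z.

    Assumes df = N - 2, i.e. the bracketed figure is degrees of freedom.
    """
    n = df + 2                              # assumed sample size
    z = math.atanh(r) * math.sqrt(n - 3)    # standardised Fisher z statistic
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p = fisher_z_p_value(0.36, 72)
print(f"approximate two-tailed p = {p:.4f}")
print("significant at .01" if p < 0.01 else "not significant at .01")
```

An exact test would use the t distribution with the stated degrees of freedom; the normal approximation is used here only to keep the example free of third-party libraries.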

19. 19 The WAIS items Information. Picture Completion. Digit Span. Picture Arrangement. Vocabulary. Block Design. Arithmetic. Object Assembly. Comprehension. Digit Symbol. Similarities.

20. 20 The "analytical" subgroup Consider: Block Design; Picture Arrangement; Object Assembly. According to Witkin, these three tests all require the participant to analyse the field into its component parts and reassemble them to solve the problem. This is not true of other subtests, such as Vocabulary, Comprehension or Digit Span. Witkin therefore predicted that the EFT should correlate highly with the tests in the "analytical" subgroup, but not significantly with the other WAIS items.

21. 21 Divergence The EFT does indeed correlate highly with Block Design (the Kohs blocks), from the analytical subgroup. But its correlation with non-analytic items such as Vocabulary is small and non-significant. Witkin has thus demonstrated the DIVERGENCE he needs to establish the CONSTRUCT VALIDITY of his tests as measures of a distinct dimension of cognition.

22. 22 Construct validity of Witkin's tests Witkin has made a cogent case for the construct validity of his tests of field-dependence-independence. There is CONVERGENCE: the tests correlate substantially among themselves; and they also correlate significantly with IQ, as they should do. But there is also DIVERGENCE: the tests correlate strongly with the analytical subgroup of WAIS tests; but they do not correlate significantly with "non-analytic" items such as vocabulary and arithmetic.

23. 23 Nonverbal working memory In the first lecture, I described two measures of non-verbal working memory: The Corsi Blocks Test; The Visual Patterns Test.

24. 24 The Corsi and Visual spans The Corsi Span is the length of the longest sequence of tapped blocks that the participant can correctly reproduce. The Visual Span is the size of the largest pattern that the participant can correctly reproduce.

25. 25 The Visual Patterns Test: Does it have construct validity? It is claimed that the Visual Patterns Test measures visual storage in purer form than the Corsi Blocks Test, which measures visual plus spatial working memory. But could both tests be measuring the same functions?

26. 26 Convergence The Visual Patterns (VP) and Corsi Blocks (CB) tests should correlate positively and significantly. But, since the CB taps more than visual memory, the correlation should be far from perfect. This is, in fact, the case. There is a significant correlation between the VP and CB tests: r(74) = .27; p < .01. This value of r is similar to the correlation between Field-Dependence-Independence and IQ: although significant, it is suitably small. This correlation accounts for less than 10% of the variance (coefficient of determination, CD = r² = .09).
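The proportion of variance that two measures share is the square of their correlation (the coefficient of determination). A minimal sketch of that arithmetic, applied to the two correlations quoted so far; the function name is illustrative only.

```python
def coefficient_of_determination(r: float) -> float:
    """Proportion of variance shared by two variables correlated at r."""
    return r * r


for label, r in [("VP with CB", 0.27), ("EFT with WAIS IQ", 0.36)]:
    shared = coefficient_of_determination(r)
    print(f"{label}: r = {r:.2f}, r^2 = {shared:.3f} ({shared:.1%} of variance)")
```

On this arithmetic, .27 squared is about .07, whereas a value of .09 corresponds to r = .30. Either way, the shared variance is below 10%.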

27. 27 Divergence The claim is that the Corsi and Patterns tests are not measuring the same functions. If we can manipulate a theoretically relevant variable and demonstrate differential effects upon the Corsi and Pattern spans, we shall have produced evidence to confirm this claim.

28. 28 An experiment Della Sala, S., Gray, C., Baddeley, A., Allamano, N., & Wilson, L. (1999). Pattern span: A tool for unwelding visuo-spatial memory. Neuropsychologia, 37, 1189-1199.

29. 29 The experiment First, we obtained the Corsi and Visual Patterns spans. Next, the participants performed an interference task. Finally, the Corsi and Visual Patterns spans were redetermined. As expected, the new spans were shorter, as a result of the interference.

30. 30 Interference tasks There were two kinds of interference tasks: 1. Visual; 2. Spatial. We should find that Visual interference has a greater effect upon the Visual Patterns span; but Spatial interference should have more effect upon the Corsi span.

31. 31 A graph showing the differential effects of interference

32. 32 The dissociation pattern Visual interference has a much greater shortening effect upon the Pattern Span than upon the Corsi Span. Spatial interference has a much greater shortening effect upon the Corsi Span than it does upon the Pattern Span. Such DIVERGENCE supports the claim that the Patterns and Corsi tests measure different kinds of nonverbal working memory.
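The dissociation described above amounts to an interaction between type of interference and type of span. The sketch below shows one way to check for that pattern. The span reductions are invented purely for illustration and are not the values reported by Della Sala et al. (1999); the function and variable names are hypothetical.

```python
# Hypothetical mean span reductions (pre-interference minus post-interference).
# These numbers are made up for illustration; they are NOT the published data.
drop = {
    ("visual", "pattern"): 2.0,
    ("visual", "corsi"): 0.5,
    ("spatial", "pattern"): 0.4,
    ("spatial", "corsi"): 1.8,
}


def shows_double_dissociation(d: dict) -> bool:
    """True if each interference type shortens 'its own' span more than the other."""
    visual_selective = d[("visual", "pattern")] > d[("visual", "corsi")]
    spatial_selective = d[("spatial", "corsi")] > d[("spatial", "pattern")]
    return visual_selective and spatial_selective


def interaction_contrast(d: dict) -> float:
    """Difference of differences; a value of zero would mean no interaction."""
    return (d[("visual", "pattern")] - d[("visual", "corsi")]) - (
        d[("spatial", "pattern")] - d[("spatial", "corsi")]
    )


print(shows_double_dissociation(drop))
print(round(interaction_contrast(drop), 2))
```

This only describes the pattern of means. A real analysis would test the interaction inferentially, for example with a two-way analysis of variance on the participants' span scores.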

33. 33 Age differentiation If a test is supposed to measure a cognitive function, performance on the test should show a typical age trajectory. The Visual Patterns test does indeed show the expected decline from early adulthood: r(345) = -.55; p < .01. The Corsi Blocks test also shows a similarly substantial negative correlation with age.

34. 34 The Colours Test Psychological tests are widely used in industry. The test I am about to describe is used in the oil industry to help to assign an employee to the role in a team for which he or she is best suited. The attributes supposedly measured by the test are letter- and colour-coded, and the management take note of the colour codes when assigning employees to team projects.

35. 35 Four team functions A (RED). Directing and leading. B (YELLOW). Sociability. C (BLUE). Troubleshooting. D (GREEN). Thinking and planning.

36. 36 The Test Instrument The response sheet has 28 boxes to be completed. In each box, circle the response that you are Most like and the response that you are Least like. (Your "instinctive response" is probably the most accurate. First thoughts are best here, so try to answer the questions quickly.)

37. 37

38. 38 Analysis Transfer your "Difference" scores to this sheet. Draw a line through the scores. The highest values on the page are your "Dominant" colours. This person's dominant colours are A and D. This person is a leader and a planner.

39. 39 Interpretation

40. 40 A reliability study I have carried out an informal investigation of the test-retest reliability of the Colours Test. I gave the Colours Test twice to this class, leaving a week between the two sessions. I obtained sixty-one pairs of responses.

41. 41 Preliminary analysis The profiles are based on the four difference scores. Here is the test-retest reliability for each of these four measures.

42. 42 Directing (A; Red) The scatterplot is a narrow ellipse. There should be a very high correlation. Indeed there is: r(61) = .90; p < .01. This level of reliability is very acceptable.

43. 43 Thinking (D; Green) The scatterplot is a narrow ellipse. The correlation should be high. It is high: r(61) = .85; p < .01. This level of reliability is also very acceptable.

44. 44 Relating (C; Blue) The scatterplot is a narrow ellipse. The correlation should be high. It is: r(61) = .83; p < .01. This level of reliability is very acceptable.

45. 45 Sociability (B; Yellow) This time the scatterplot is messier: there are some outliers. We cannot expect the value of r to be so high. Indeed, it is not: r(61) = .76; p < .01. This level of reliability is just acceptable.
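Test-retest reliability of this kind is simply the Pearson correlation between the scores from the two sessions. A minimal sketch with a hand-written Pearson r follows; the six pairs of scores are hypothetical and are not the class data, and the names are illustrative.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical difference scores for one scale at session 1 and session 2.
session_1 = [12.0, 5.0, -3.0, 8.0, 0.0, -6.0]
session_2 = [10.0, 6.0, -2.0, 9.0, -1.0, -5.0]

r = pearson_r(session_1, session_2)
print(f"test-retest r = {r:.2f}")
```

The same function would be applied once per scale (A, B, C and D), giving the four reliability coefficients discussed above.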

46. 46 Appraisal The Colours Test would appear to be reliable, at least when used with Level 2 students at this university. What is needed is another (larger) study with oil workers. THE NORMS FOR A TEST SHOULD ALWAYS BE GATHERED FROM THE POPULATION IN WHICH THE TEST IS TO BE USED.

47. 47 Appraisal … On the basis of the evidence we have, the test appears to be reliable. But is it also VALID? Do the PROFILES match up with the employees' ACTUAL PERFORMANCE in the team roles to which they have been assigned? Managers think they do; but the validity of the Colours Test has yet to be confirmed statistically.

48. 48 Summary A test is VALID if it measures what it is supposed to measure. This simple definition, however, is open to a variety of interpretations. Today, I have considered CONSTRUCT VALIDITY, the kind of validity that is the most problematic of all. To demonstrate the construct validity of a test, the researcher must show, not only that the test correlates with the "right" variables, but also that it dissociates from the "wrong" ones. These two essential properties are known as CONVERGENCE and DIVERGENCE.

49. 49 Summary … Witkin's tests of Field-Dependence-Independence show convergence with other "analytical" cognitive tests and dissociation from "non-analytical" tests. The Visual Patterns and Corsi tests of nonverbal working memory correlate to some extent (convergence), but the Corsi and Pattern spans are affected differentially by visual and spatial interference (divergence).

50. 50 Practice question What is meant by the reliability and the validity of a psychological test? What is the relationship between the two properties? Describe one approach to the determination of validity.
