
Anita L. Stewart, Institute for Health & Aging, University of California, San Francisco

Class 3: Methods of Developing New Measures and How to Select Measures for Your Study. October 8, 2009. Anita L. Stewart, Institute for Health & Aging, University of California, San Francisco. Overview of Class 3: sequence of developing new measures; rationale for multi-item measures

Presentation Transcript


  1. Class 3: Methods of Developing New Measures and How to Select Measures for Your Study. October 8, 2009. Anita L. Stewart, Institute for Health & Aging, University of California, San Francisco

  2. Overview of Class 3 • Overview: sequence of developing new measures • Rationale for multi-item measures • Scale construction methods • Steps in choosing appropriate measures for your study

  3. Typical Sequence of Developing New Self-Report Measures • Develop/define concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures

  4. Sequence: Develop Item Pool • Generate a large set of items that reflect the concept definition • For multidimensional concepts, items for each dimension • Item sources: • Other measures of similar concepts • Qualitative research such as focus groups • Researchers’ ideas about concept

  5. Considerations in Writing Item Pool • Items from various sources will have different formats, response choices, and instructions • Have to determine consistent approach

  6. Reduce Item Pool to Manageable Number • Review items against concept until “best” ones remain for pretesting • Judgment of investigators • Expert panels • Achieve good representation of all dimensions • Have more items than final goal

  7. Revised Interpersonal Processes of Care Concepts and Item Pool • IPC Version I framework in Milbank Quarterly • Draft IPC II conceptual framework • 19 focus groups: African American, Latino, and White adults • Literature review of quality of care in diverse groups

  8. IPC Item Pool • Original IPC • Draft IPC II conceptual framework • 19 focus groups • Literature review • IPC II item pool (1,006 items) • 160 items selected for pretesting

  9. Sample Item These questions are about your experiences talking with your doctors at ___ over the past 12 months 1. How often did doctors use words that were hard to understand? -- never -- rarely -- sometimes -- usually -- always

  10. Sequence: Pretest/Revise • Pretest, pretest, pretest • Numerous methods • For new measures, pretesting essential • Obtain reactions and comments of individuals targeted for study • Results in revisions of items and response choices

  11. Pretest in Target Population • Pretesting essential for measures being applied to any new population group • Especially priority measures (e.g., outcomes) • Pretest is to identify: • problems with procedures • method of administration, respondent burden • problems with questions • Item stems, response choices, and instructions

  12. Problems with Questions or Response Choices • Are all words/phrases understood as intended? • Are questions interpreted similarly by all respondents? • Are some questions not answered? • Are any questions offensive or irrelevant? • Does each closed-ended question have an answer that applies to each respondent? • Are the response choices adequate?

  13. Types of Pretests • General debriefing pretest (N=10) • In-depth cognitive interviewing pretest (N=5-10 each group)

  14. Sequence: Field Survey/Questionnaire • Administer survey to large enough sample to test psychometric characteristics • Two approaches • Preliminary field test (N= about 100) • Administer in main study – conduct psychometric studies on study data • Some items may not be used in final scales

  15. Sequence: Psychometric Analyses • Evaluate items - variability, % missing • Create multi-item scales according to scale construction criteria • Evaluate scale characteristics • Variability, reliability • Validity
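
The item-level checks named on this slide (variability and % missing) can be sketched in a few lines. This is a minimal illustration, not the course's own procedure: the function name `item_summary`, the use of `None` for a missing answer, the 1-5 response range, and the sample responses are all assumptions made for the example.

```python
from statistics import mean, stdev


def item_summary(responses, low=1, high=5):
    """Summarize one item; None marks a missing answer (an assumed convention)."""
    answered = [r for r in responses if r is not None]
    n = len(responses)
    return {
        "pct_missing": 100 * (n - len(answered)) / n,
        "mean": mean(answered),
        "sd": stdev(answered),  # sample standard deviation as a variability check
        # Share of answers piled up at the lowest / highest response choice.
        "pct_floor": 100 * answered.count(low) / len(answered),
        "pct_ceiling": 100 * answered.count(high) / len(answered),
    }


# Hypothetical responses from five people to one 5-point item.
summary = item_summary([1, 2, None, 5, 5])
print(summary["pct_missing"], summary["mean"], summary["pct_ceiling"])
```

Items with a high percentage missing, or with most answers at the floor or ceiling, are candidates for revision or removal before scales are built.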

  16. Sequence: “Final Measures” • When to publish: depends on • authors’ standards • sample size of psychometric analyses • Many measures published with very little iterative work • Single sample testing

  17. Overview of Class 3 • Overview: sequence of developing new measures • Rationale for multi-item measures • Scale construction methods • Steps in choosing appropriate measures for your study

  18. Single-item Measures - Usually Ordinal • Advantages • Response choices interpretable • Disadvantages • Impossible to assess complex concept • Very limited variability, often skewed • Reliability usually low

  19. Multi-Item Measures or Scales Multi-item scales are created by combining two or more items into an overall measure or scale score Sometimes called summated ratings scales

  20. Advantages of Multi-item Measures (Over Single Items) • More scale values (improves score distribution) • Reduces # of scores to measure a concept • Improves reliability (reduces random error) • Reduces % with missing data (can estimate score if items are missing) • More likely to reflect concept (content validity)
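
Two of these advantages can be made concrete with a short sketch. The first function uses the Spearman-Brown formula, a standard classical test theory result that is not named on the slide, to show how reliability rises as parallel items are added. The second estimates a scale score when some items are missing by averaging the answered items; the "at least half answered" rule and both function names are assumptions for illustration only.

```python
def spearman_brown(item_reliability, n_items):
    """Reliability of a scale built from n parallel items (Spearman-Brown)."""
    return n_items * item_reliability / (1 + (n_items - 1) * item_reliability)


def prorated_mean(responses, min_fraction=0.5):
    """Average the answered items; None marks a missing answer.

    Returns None when fewer than `min_fraction` of the items were answered
    (the half rule here is an assumed convention, not from the slides).
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) < min_fraction * len(responses):
        return None
    return sum(answered) / len(answered)


# A single item with reliability .40 versus a 5-item scale of similar items.
print(round(spearman_brown(0.40, 1), 2))  # 0.4
print(round(spearman_brown(0.40, 5), 2))  # 0.77

# One of four items missing: the score is still estimated from the other three.
print(prorated_mean([4, None, 5, 3]))  # 4.0
```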

  21. One Major Exception: Self-rated Health

  22. Review of 27 Studies of Self-rated Health and Mortality • Independently predicted mortality in nearly all studies • Despite controlling for numerous specific health indicators and other predictors of mortality • Idler EL et al. J Health Soc Behav, 1997;38:21-37

  23. Overview of Class 3 • Overview: sequence of developing new measures • Rationale for multi-item measures • Scale construction methods • Steps in choosing appropriate measures for your study

  24. Methods for Creating Multi-item Scales • Two Basic Scale Construction Approaches • Multitrait scaling • Factor analysis • Classical test theory approaches

  25. Example of a 2-item Summated Ratings Scale • How much of the time .... tired? 1 - All of the time; 2 - Most of the time; 3 - Some of the time; 4 - A little of the time; 5 - None of the time • How much of the time …. full of energy? 1 - All of the time; 2 - Most of the time; 3 - Some of the time; 4 - A little of the time; 5 - None of the time

  26. Step 1: Reverse One Item So They Are in the Same Direction • Reverse “energy” item so high score = more energy • How much of the time .... tired? 1 - All of the time; 2 - Most of the time; 3 - Some of the time; 4 - A little of the time; 5 - None of the time • How much of the time …. full of energy? (recoded) 1=5 All of the time; 2=4 Most of the time; 3=3 Some of the time; 4=2 A little of the time; 5=1 None of the time

  27. Step 2: Sum the Items • How much of the time .... tired? 1 - All of the time; 2 - Most of the time; 3 - Some of the time; 4 - A little of the time; 5 - None of the time • How much of the time …. full of energy? 5 - All of the time; 4 - Most of the time; 3 - Some of the time; 2 - A little of the time; 1 - None of the time • Lowest = 2 (tired all of the time, full of energy none of the time) • Highest = 10 (tired none of the time, full of energy all of the time)

  28. Step 2: Can Also Average the Two Items • How much of the time .... tired? 1 - All of the time; 2 - Most of the time; 3 - Some of the time; 4 - A little of the time; 5 - None of the time • How much of the time …. full of energy? 5 - All of the time; 4 - Most of the time; 3 - Some of the time; 2 - A little of the time; 1 - None of the time • Lowest = 1.0 (tired all of the time, full of energy none of the time) • Highest = 5.0 (tired none of the time, full of energy all of the time)

  29. Summed or Averaged: Increases Number of Levels from 5 (per item) to 9
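
The two steps in the tired/energy example (reverse one item, then sum or average) can be sketched as follows. The function names are hypothetical, and the code assumes raw answers are coded 1-5 exactly as printed on the questionnaire.

```python
def reverse_score(value, low=1, high=5):
    """Recode a response so that 1<->5, 2<->4, and 3 stays 3."""
    if not low <= value <= high:
        raise ValueError(f"response {value} is outside {low}-{high}")
    return low + high - value


def energy_fatigue_scores(tired, full_of_energy):
    """Return (sum, average) with a high score meaning more energy.

    `tired` is already scored so that 5 = none of the time;
    `full_of_energy` arrives as printed (1 = all of the time) and is reversed.
    """
    total = tired + reverse_score(full_of_energy)
    return total, total / 2


print(energy_fatigue_scores(1, 5))  # (2, 1.0): tired all, energetic none of the time
print(energy_fatigue_scores(5, 1))  # (10, 5.0): tired none, energetic all of the time

# Enumerate every answer pair to count the distinct summed scores.
levels = sorted({energy_fatigue_scores(t, e)[0]
                 for t in range(1, 6) for e in range(1, 6)})
print(len(levels))  # 9 levels (2 through 10), versus 5 for a single item
```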

  30. Summated Rating Scales: Scaling Analyses • To create a summated rating scale, set of items need to meet several criteria • Need to test whether the items hypothesized to measure a concept can be combined • i.e., that items form a single concept

  31. Five Criteria to Qualify as a Summated Ratings Scale • Item convergence • Item discrimination • No unhypothesized dimensions • Items contribute similar proportion of information to score • Items have equal variances

  32. First Criterion: Item Convergence • Each item correlates substantially with the total score of all items • with the item taken out or “corrected for overlap” • Typical criterion is > .30 • for well-developed scales, often > .40

  33. Example: Analyzing Item Convergence for Adaptive Coping Scale • Item-scale correlations • Adaptive coping (alpha = .70) • 5 Get emotional support from others .49 • 11 See it in a different light .62 • 18 Accept the reality of it .25 • 20 Find comfort in religion .58 • 13 Get comfort from someone .45 • 21 Learn to live with it .21 • 23 Pray or meditate .39 • Moody-Ayers SY et al. J Amer Geriatr Soc, 2005;53:2202-08.

  34. Example: Analyzing Item Convergence for Adaptive Coping Scale • Item-scale correlations • Adaptive coping (alpha = .70) • 5 Get emotional support from others .49 • 11 See it in a different light .62 • 18 Accept the reality of it .25 (<.30) • 20 Find comfort in religion .58 • 13 Get comfort from someone .45 • 21 Learn to live with it .21 (<.30) • 23 Pray or meditate .39

  35. Example: Split Into Two Scales • Item-scale correlations • Adaptive coping (alpha = .76) • 5 Get emotional support from others .45 • 11 See it in a different light .59 • 20 Find comfort in religion .73 • 13 Get comfort from someone .45 • 23 Pray or meditate .51 • Acceptance (alpha = .67) • 21 Learn to live with it .50 • 18 Accept the reality of it .50

  36. Can Examine Item Convergence Using Any Statistical Software • Programs to calculate internal consistency reliability • Provide estimated coefficient alpha • Produce item-scale correlations corrected for overlap
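
For readers without a statistics package at hand, the two outputs this slide mentions (coefficient alpha and item-scale correlations corrected for overlap) can be sketched with the Python standard library. This is an illustrative sketch, not the software used in the course; the function names and the column-per-item data layout are assumptions. The last lines apply the > .30 criterion to the correlations reported on the Adaptive Coping slide.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Coefficient alpha; `items` is a list of columns, one list per item."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def corrected_item_total(items):
    """Correlate each item with the sum of the other items (overlap removed)."""
    rows = list(zip(*items))
    return [pearson(col, [sum(row) - row[i] for row in rows])
            for i, col in enumerate(items)]


def flag_low_convergence(correlations, threshold=0.30):
    """Return the labels whose item-scale correlation is below the threshold."""
    return [label for label, r in correlations.items() if r < threshold]


# Item-scale correlations as reported on the Adaptive Coping slide.
reported = {
    "5 Get emotional support from others": 0.49,
    "11 See it in a different light": 0.62,
    "18 Accept the reality of it": 0.25,
    "20 Find comfort in religion": 0.58,
    "13 Get comfort from someone": 0.45,
    "21 Learn to live with it": 0.21,
    "23 Pray or meditate": 0.39,
}
flagged = flag_low_convergence(reported)
print(flagged)  # ['18 Accept the reality of it', '21 Learn to live with it']
```

The two flagged items are the same ones that the example moves into a separate Acceptance scale.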

  37. Second Criterion: Item Discrimination • Each item correlates significantly higher with the construct it is hypothesized to measure than with other constructs • Item discrimination • Statistical significance is determined by standard error of the correlation • Determined by sample size

  38. Example: Two Subscales Being Developed Using Multitrait Scaling • Depression and Anxiety subscales of MOS Psychological Distress measure

  39. Example of Multitrait Scaling Matrix: Hypothesized Scales
                                ANXIETY   DEPRESSION
      ANXIETY
        Nervous person            .80        .65
        Tense, high strung        .83        .70
        Anxious, worried          .78        .78
        Restless, fidgety         .76        .68
      DEPRESSION
        Low spirits               .75        .89
        Downhearted               .74        .88
        Depressed                 .76        .90
        Moody                     .77        .82

  40. Example of Multitrait Scaling Matrix: Item Convergence
                                ANXIETY   DEPRESSION
      ANXIETY
        Nervous person            .80*       .65
        Tense, high strung        .83*       .70
        Anxious, worried          .78*       .78
        Restless, fidgety         .76*       .68
      DEPRESSION
        Low spirits               .75        .89*
        Downhearted               .74        .88*
        Depressed                 .76        .90*
        Moody                     .77        .82*

  41. Example of Multitrait Scaling Matrix: Item Discrimination
                                ANXIETY   DEPRESSION
      ANXIETY
        Nervous person            .80*       .65
        Tense, high strung        .83*       .70
        Anxious, worried          .78*       .78
        Restless, fidgety         .76*       .68
      DEPRESSION
        Low spirits               .75        .89*
        Downhearted               .74        .88*
        Depressed                 .76        .90*
        Moody                     .77        .82*
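
The item discrimination check on this matrix can be sketched as below. The correlations are the ones shown on the slide. Everything else is an assumption for illustration: the slides do not give the sample size, so `n_respondents=400` is hypothetical, and the code approximates the standard error of a correlation as 1/sqrt(N) and treats a difference of more than two standard errors as "significantly higher".

```python
from math import sqrt

# item: (hypothesized scale, {scale: item-scale correlation}) from the slide.
MATRIX = {
    "Nervous person": ("ANXIETY", {"ANXIETY": 0.80, "DEPRESSION": 0.65}),
    "Tense, high strung": ("ANXIETY", {"ANXIETY": 0.83, "DEPRESSION": 0.70}),
    "Anxious, worried": ("ANXIETY", {"ANXIETY": 0.78, "DEPRESSION": 0.78}),
    "Restless, fidgety": ("ANXIETY", {"ANXIETY": 0.76, "DEPRESSION": 0.68}),
    "Low spirits": ("DEPRESSION", {"ANXIETY": 0.75, "DEPRESSION": 0.89}),
    "Downhearted": ("DEPRESSION", {"ANXIETY": 0.74, "DEPRESSION": 0.88}),
    "Depressed": ("DEPRESSION", {"ANXIETY": 0.76, "DEPRESSION": 0.90}),
    "Moody": ("DEPRESSION", {"ANXIETY": 0.77, "DEPRESSION": 0.82}),
}


def item_discrimination(matrix, n_respondents):
    """Classify each item by how far its own-scale correlation exceeds the rest."""
    two_se = 2 / sqrt(n_respondents)  # assumed rule: SE of r is about 1/sqrt(N)
    results = {}
    for item, (own_scale, correlations) in matrix.items():
        own_r = correlations[own_scale]
        # The smallest margin over any competing scale decides the verdict.
        margin = min(own_r - r for scale, r in correlations.items()
                     if scale != own_scale)
        if margin > two_se:
            results[item] = "significantly higher"
        elif margin > 0:
            results[item] = "higher, not significant"
        else:
            results[item] = "not higher"
    return results


results = item_discrimination(MATRIX, n_respondents=400)  # hypothetical N
for item, verdict in results.items():
    print(f"{item}: {verdict}")
```

Whatever sample size is assumed, "Anxious, worried" cannot pass, because it correlates .78 with both scales.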

  42. Multitrait Scaling to Develop New “Expectations of Aging” Measure • Pretested initial 94-item version (N=58) • Eliminated items with • Missing data • Poor distributions • Low item-scale correlations • Field tested 56-item version (N=588) • Eliminated more items • Low item-scale correlations • Weak item discriminant validity • Field tested again (N=429) • 38 items, final scales Sarkisian CA et al. Gerontologist 2002;42:534-542

  43. Multitrait Scaling - An Approach to Constructing Summated Rating Scales • Confirms whether hypothesized item groupings can be summed into a scale score • Examines extent to which all five criteria are met • Reports characteristics of resulting scales • A confirmatory method • Requires strong conceptual basis for hypothesized scales • Typically used for scales well along in testing

  44. Multitrait Scaling Methods • Used at RAND in all health measurement development (e.g., MOS measures) • Method described in reading #1 for class 3 • Stewart and Ware, 1992, pp 67-80

  45. Multitrait Scaling Analysis Described by Ron Hays (UCLA/RAND) • Hays RD & Wang E. (1992, April). Multitrait Scaling Program: MULTI. Proceedings of the Seventeenth Annual SAS Users Group International Conference, 1151-1156. • Hays RD et al. Behavior Research Methods, Instruments, and Computers, 1990;22:167-175

  46. SAS Macro Available • Ron Hays also makes available a SAS macro for conducting multitrait scaling • You don’t have to purchase software • http://gim.med.ucla.edu/FacultyPages/Hays/util.htm • Go to MULTI • Sample program including macro call: MULTI.sas, and its output: MULTI.out

  47. Using Factor Analysis to Develop Multi-Item Scales • For new measures in early developmental stages • Exploratory factor analysis of items can identify possible dimensions • Useful when starting with item pool with uncertainty about subdimensions

  48. Patient Satisfaction with Pharmacy Services • No measures – started from scratch • Phase 1: pretested 44 items (N=30) • Revised items • Phase 2: field tested 45 items (N=313) • Exploratory factor analysis - 7 factors • Revised items MacKeigan LD et al. Med Care 1989;27:522

  49. Patient Satisfaction with Pharmacy Services • Phase 3: field tested 44 items (N=389) • Exploratory factor analysis - 8 factors (56% of variance) • Items retained with factor loadings >0.40 MacKeigan LD et al. Med Care 1989;27:522
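
The retention rule on this slide (keep items with factor loadings > 0.40) can be sketched as a simple filter over a loading matrix. The loadings below are hypothetical and are not taken from the pharmacy study. Assigning each item to the factor where its absolute loading is largest is also an assumption of this sketch.

```python
def retain_items(loadings, cutoff=0.40):
    """Map each retained item to the factor on which it loads most strongly.

    `loadings` maps an item name to its loadings on factor 1, factor 2, ...
    Items whose strongest absolute loading is not above `cutoff` are dropped.
    """
    retained = {}
    for item, row in loadings.items():
        factor, loading = max(enumerate(row, start=1),
                              key=lambda pair: abs(pair[1]))
        if abs(loading) > cutoff:
            retained[item] = factor
    return retained


# Hypothetical two-factor loading matrix, for illustration only.
hypothetical_loadings = {
    "item_a": [0.72, 0.10],
    "item_b": [0.15, 0.55],
    "item_c": [0.35, 0.30],  # no loading above .40, so it is dropped
}
print(retain_items(hypothetical_loadings))  # {'item_a': 1, 'item_b': 2}
```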

  50. Item Reduction by Analysis
