Evaluating Health-Related Quality of Life Measures. Ron D. Hays, Ph.D. UCLA GIM & HSR. February 10, 2014 (9:00-11:50 am). HPM 214, Los Angeles, CA.
Where are we now in HPM 214? http://hpm214.med.ucla.edu/ • Introduction • Profile Measures • Preference-Based Measures • Designing Measures • Evaluating Measures • Use of Measures in HIV/AIDS • PROMIS/IRT • Course Review (Cognitive interview assignment due) • Final Exam (3/17/14)
Four Levels of Measurement • Nominal (categorical) • Ordinal (rank) • Interval (numerical) • Ratio (numerical)
Ordinal Scale • In general, would you say your health is … • Excellent? • Very good? • Good? • Fair? • Poor?
Ordinal Scale • In general, would you say your health is … • 100 = Excellent? • 075 = Very good? [85] • 050 = Good? [60] • 025 = Fair? • 000 = Poor?
Interval Scales • “Everyday” Temperature Scales • Fahrenheit • Centigrade • 20°C + 20°C = 40°C • 40°C ≠ 2 times as hot as 20°C • By contrast, age is a ratio scale: a 4-year-old is twice as old as a 2-year-old. If you subtract 1 from both ages, 4 becomes 3 and 2 becomes 1. The 4-year-old is still twice as old as the 2-year-old, but the shifted values (3 versus 1) no longer show it, because “0” no longer means zero years.
Ratio Scales • Kelvin Temperature Scale (absolute 0) • Age • Days spent in hospital in last 30 days
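A minimal sketch of the interval-versus-ratio distinction, using the 20°C and 40°C values from the temperature slide. The helper names `c_to_f` and `c_to_k` are illustrative. Differences survive a change of scale, but the "twice as hot" ratio only means something on a scale with a true zero (Kelvin).

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (interval scale, arbitrary zero)."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    """Convert Celsius to Kelvin (ratio scale, absolute zero)."""
    return celsius + 273.15

cold_c, hot_c = 20.0, 40.0

# The same two temperatures give a different "ratio" on each scale.
ratio_c = hot_c / cold_c                    # 2.0
ratio_f = c_to_f(hot_c) / c_to_f(cold_c)    # 104 / 68, about 1.53
ratio_k = c_to_k(hot_c) / c_to_k(cold_c)    # about 1.07

# Differences, by contrast, agree between Celsius and Kelvin.
diff_c = hot_c - cold_c
diff_k = c_to_k(hot_c) - c_to_k(cold_c)

print(f"Celsius ratio:    {ratio_c:.3f}")
print(f"Fahrenheit ratio: {ratio_f:.3f}")
print(f"Kelvin ratio:     {ratio_k:.3f}")
print(f"Difference in C: {diff_c:.2f}, in K: {diff_k:.2f}")
```

Only the Kelvin ratio is physically meaningful: 40°C is about 7% hotter than 20°C, not 100% hotter.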
Measurement Range for HRQOL Measures Nominal Ordinal Interval Ratio
Four Types of Data Collection Errors • Coverage Error: Does each person in the target population have an equal chance of selection? • Sampling Error: Only some members of the target population are sampled. • Nonresponse Error: Do people in the sample who respond differ from those who do not? • Measurement Error: Inaccuracy in answers given to survey questions.
Characteristics of Good Measures • Acceptability • Variability • Reliability • Validity • Interpretability
Indicators of Acceptability • Response rate • Administration time • Missing data (item, scale)
Variability • Responses fall in each response category • Distribution approximates bell-shaped “normal” curve (about 68.3%, 95.4%, and 99.7% of scores within 1, 2, and 3 SDs of the mean)
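As a quick check on the normal-curve percentages, the share of a normal distribution lying within z standard deviations of the mean can be computed from the error function in Python's standard library. This is a sketch, and the function name `share_within` is illustrative.

```python
import math

def share_within(z):
    """Proportion of a normal distribution within z SDs of the mean."""
    return math.erf(z / math.sqrt(2))

# Percentages within 1, 2, and 3 SDs, rounded to one decimal place.
shares = {z: round(100 * share_within(z), 1) for z in (1, 2, 3)}
print(shares)  # {1: 68.3, 2: 95.4, 3: 99.7}
```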
Reliability • Reliability is the degree to which the same score is obtained for the thing being measured (person, plant, or whatever) when that thing hasn’t changed. • Ratio of signal to noise
Flavors of Reliability • Inter-rater (rater) • Need 2 or more raters of the thing being measured • Test-retest (administrations) • Need 2 or more time points • Internal consistency (items) • Need 2 or more items
Reliability Minimum Standards
• 0.70 or above (for group comparisons)
• 0.90 or higher (for individual assessment)
$SEM = SD \times (1 - \text{reliability})^{1/2}$
$95\%\ CI = \text{true score} \pm 1.96 \times SEM$
• If z-score = 0, then CI: −0.62 to +0.62 when reliability = 0.90
• Width of CI is 1.24 z-score units
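The SEM arithmetic on this slide can be reproduced in a few lines. This is a sketch using z-scores (SD = 1); the function names `sem` and `ci95` are illustrative.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def ci95(score, sd, reliability):
    """95% confidence interval around a score, given SD and reliability."""
    half_width = 1.96 * sem(sd, reliability)
    return score - half_width, score + half_width

# z-score metric: mean 0, SD 1, reliability 0.90 (individual-assessment standard)
low, high = ci95(0.0, 1.0, 0.90)
print(f"CI: {low:.2f} to {high:.2f}, width {high - low:.2f}")

# At the group-comparison standard (0.70) the interval is much wider.
low70, high70 = ci95(0.0, 1.0, 0.70)
print(f"CI: {low70:.2f} to {high70:.2f}, width {high70 - low70:.2f}")
```

With reliability 0.90 the interval is −0.62 to +0.62; dropping to 0.70 widens it to roughly ±1.07 z-score units, which is why the higher standard is asked for when scores are used for individuals.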
Hypothetical Ratings of Performance of Six Students in HPM 214 by Two Raters Using Excellent to Poor Scale [1 = Poor; 2 = Fair; 3 = Good; 4 = Very good; 5 = Excellent]
1 = John (Good, Very Good)
2 = Ida (Very Good, Excellent)
3 = Di (Good, Good)
4 = Claire (Fair, Poor)
5 = Adriane (Excellent, Very Good)
6 = Ara (Fair, Fair)
(Target = 6 students; assessed by 2 raters)
Cross-Tab of Ratings (Rater 1 by Rater 2)
Weighted Kappa (Linear and Quadratic)
$w_l = 1 - \frac{i}{k - 1}$
$w_q = 1 - \frac{i^2}{(k - 1)^2}$
• i = number of categories by which the ratings differ • k = number of categories
• Linear weighted kappa = 0.52; quadratic weighted kappa = 0.77
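The two kappa values can be reproduced from the six pairs of ratings given earlier (1 = Poor to 5 = Excellent). The sketch below works with disagreement weights, which are one minus the agreement weights on the slide, and compares observed disagreement with the disagreement expected from the raters' marginal distributions. The function name `weighted_kappa` and its `power` argument are illustrative.

```python
from collections import Counter

def weighted_kappa(r1, r2, k, power=1):
    """Weighted kappa for two raters; power=1 is linear, power=2 quadratic.

    Ratings are integer category codes; k is the number of categories.
    """
    n = len(r1)

    def disagreement(a, b):
        # 1 - w, where w is the agreement weight 1 - (i / (k - 1)) ** power
        return (abs(a - b) / (k - 1)) ** power

    observed = sum(disagreement(a, b) for a, b in zip(r1, r2)) / n
    m1, m2 = Counter(r1), Counter(r2)
    expected = sum(
        m1[a] * m2[b] * disagreement(a, b) for a in m1 for b in m2
    ) / n ** 2
    return 1 - observed / expected

# John, Ida, Di, Claire, Adriane, Ara
rater1 = [3, 4, 3, 2, 5, 2]
rater2 = [4, 5, 3, 1, 4, 2]

linear = weighted_kappa(rater1, rater2, k=5, power=1)
quadratic = weighted_kappa(rater1, rater2, k=5, power=2)
print(f"linear = {linear:.2f}, quadratic = {quadratic:.2f}")
```

Quadratic weights penalize the one-category disagreements seen here more lightly than linear weights do, which is why the quadratic value is higher.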
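Intraclass Correlation and Reliability
Model: Reliability; Intraclass Correlation
• One-way: $\frac{BMS - WMS}{BMS}$; $\frac{BMS - WMS}{BMS + (k - 1)\,WMS}$
• Two-way mixed: $\frac{BMS - EMS}{BMS}$; $\frac{BMS - EMS}{BMS + (k - 1)\,EMS}$
• Two-way random: $\frac{N\,(BMS - EMS)}{N \cdot BMS + JMS - EMS}$; $\frac{BMS - EMS}{BMS + (k - 1)\,EMS + k\,(JMS - EMS)/N}$
BMS = Between Ratee Mean Square; WMS = Within Mean Square; JMS = Item or Rater Mean Square; EMS = Ratee x Item (Rater) Mean Square; N = n of ratees; k = n of items or raters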
Two-Way Random Effects (Reliability of Performance Ratings)
Data (student, rater, rating): 01 1 3; 01 2 4; 02 1 4; 02 2 5; 03 1 3; 03 2 3; 04 1 2; 04 2 1; 05 1 5; 05 2 4; 06 1 2; 06 2 2

Source                  df     SS     MS
Students (BMS)           5  15.67   3.13
Raters (JMS)             1   0.00   0.00
Stud. x Raters (EMS)     5   2.00   0.40
Total                   11  17.67

$\text{2-way } R = \frac{6\,(3.13 - 0.40)}{6\,(3.13) + 0.00 - 0.40} = 0.89$
ICC = 0.80
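The mean squares and the two-way random effects reliability can be recomputed from the twelve ratings. The sketch below does the ANOVA decomposition by hand. The single-rating ICC line uses the standard Shrout and Fleiss two-way random form, which is an assumption here because the slide reports only the value (0.80). The function name `two_way_mean_squares` is illustrative.

```python
def two_way_mean_squares(scores):
    """Return (BMS, JMS, EMS) for a ratee-by-rater table with one score per cell."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between ratees
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # ratee x rater

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return bms, jms, ems

# Rows: John, Ida, Di, Claire, Adriane, Ara; columns: rater 1, rater 2
ratings = [[3, 4], [4, 5], [3, 3], [2, 1], [5, 4], [2, 2]]
n, k = len(ratings), len(ratings[0])

bms, jms, ems = two_way_mean_squares(ratings)
reliability = n * (bms - ems) / (n * bms + jms - ems)
# Assumed single-rating ICC (Shrout-Fleiss two-way random form)
icc = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

print(f"BMS={bms:.2f} JMS={jms:.2f} EMS={ems:.2f}")
print(f"reliability={reliability:.2f} ICC={icc:.2f}")
```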
Responses of Students to Two Questions about Their Health
1 = John (Good, Very Good)
2 = Ida (Very Good, Excellent)
3 = Di (Good, Good)
4 = Claire (Fair, Poor)
5 = Adriane (Excellent, Very Good)
6 = Ara (Fair, Fair)
(Target = 6 students; assessed by 2 items)
Two-Way Mixed Effects (Cronbach’s Alpha)
Data (respondent, item 1, item 2): 01 3 4; 02 4 5; 03 3 3; 04 2 1; 05 5 4; 06 2 2

Source                  df     SS     MS
Respondents (BMS)        5  15.67   3.13
Items (JMS)              1   0.00   0.00
Resp. x Items (EMS)      5   2.00   0.40
Total                   11  17.67

$\alpha = \frac{3.13 - 0.40}{3.13} = \frac{2.73}{3.13} = 0.87$
ICC = 0.77
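Cronbach's alpha is more often computed from item and total-score variances than from an ANOVA table. The sketch below applies that variance form to the same two items and gives the same 0.87, illustrating that the two routes agree. The function name `cronbach_alpha` is illustrative.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score columns (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's total
    sum_item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# John, Ida, Di, Claire, Adriane, Ara
item1 = [3, 4, 3, 2, 5, 2]
item2 = [4, 5, 3, 1, 4, 2]

alpha = cronbach_alpha([item1, item2])
print(f"alpha = {alpha:.2f}")
```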
Rating of 6 Students’ Health by 12 Family Members (2 per student) 1. John (fam1: Good, fam2: Very Good) 2. Ida (fam3: Very Good, fam4: Excellent) 3. Di (fam5: Good, fam6: Good) 4. Claire (fam7: Fair, fam8: Poor) 5. Adriane (fam9: Excellent, fam10: Very Good) 6. Ara (fam11: Fair, fam12: Fair) (Target = 6 students; assessed by 2 family members each)
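One-Way ANOVA (Reliability of Ratings of Students)
Data (student, family member, rating; family member coded by its last digit, so 0, 1, and 2 in the last rows stand for members 10, 11, and 12): 01 1 3; 01 2 4; 02 3 4; 02 4 5; 03 5 3; 03 6 3; 04 7 2; 04 8 1; 05 9 5; 05 0 4; 06 1 2; 06 2 2

Source                  df     SS     MS
Respondents (BMS)        5  15.67   3.13
Within (WMS)             6   2.00   0.33
Total                   11  17.67

$\text{1-way } R = \frac{3.13 - 0.33}{3.13} = \frac{2.80}{3.13} = 0.89$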
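In the one-way design each student is rated by a different pair of family members, so rater effects cannot be separated from error and everything that is not between-student variation lands in the within mean square. A minimal sketch of that decomposition, with the illustrative function name `one_way_mean_squares`:

```python
def one_way_mean_squares(groups):
    """Return (BMS, WMS) for a list of per-ratee rating lists."""
    n = len(groups)
    total_obs = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / total_obs
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    bms = ss_between / (n - 1)
    wms = ss_within / (total_obs - n)
    return bms, wms

# Two family-member ratings per student: John, Ida, Di, Claire, Adriane, Ara
family_ratings = [[3, 4], [4, 5], [3, 3], [2, 1], [5, 4], [2, 2]]

bms, wms = one_way_mean_squares(family_ratings)
reliability = (bms - wms) / bms
print(f"BMS={bms:.2f} WMS={wms:.2f} reliability={reliability:.2f}")
```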
Standardized Alpha for Different Numbers of Items and Average Inter-item Correlation

                         Average Inter-item Correlation (r)
Number of Items (k)     .0     .2     .4     .6     .8    1.0
2                     .000   .333   .571   .750   .889  1.000
4                     .000   .500   .727   .857   .941  1.000
6                     .000   .600   .800   .900   .960  1.000
8                     .000   .667   .842   .923   .970  1.000

$\alpha_{st} = \frac{k \cdot r}{1 + (k - 1) \cdot r}$
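The table follows directly from the standardized alpha formula, so it can be regenerated for any number of items or average correlation. A short sketch, with the illustrative function name `standardized_alpha`:

```python
def standardized_alpha(k, r):
    """Standardized alpha for k items with average inter-item correlation r."""
    return k * r / (1 + (k - 1) * r)

correlations = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)

print("k   " + "  ".join(f"r={r:.1f}" for r in correlations))
for k in (2, 4, 6, 8):
    row = "  ".join(f"{standardized_alpha(k, r):.3f}" for r in correlations)
    print(f"{k:<3} {row}")
```

Reading across a row shows how strongly alpha depends on the average correlation; reading down a column shows the gain from adding items, which flattens as the scale gets longer.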
Spearman-Brown Prophecy Formula
$\alpha_y = \frac{N \cdot \alpha_x}{1 + (N - 1) \cdot \alpha_x}$
N = how much longer scale y is than scale x
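A small sketch of the prophecy formula in use. The starting reliability of 0.70 is a made-up example value, and the function name `spearman_brown` is illustrative. N above 1 lengthens the scale; N below 1 shortens it.

```python
def spearman_brown(alpha_x, n):
    """Predicted reliability of a scale n times as long as one with alpha_x."""
    return n * alpha_x / (1 + (n - 1) * alpha_x)

alpha_x = 0.70  # hypothetical reliability of the current scale

doubled = spearman_brown(alpha_x, 2)    # twice as many items
halved = spearman_brown(alpha_x, 0.5)   # half as many items
print(f"doubled: {doubled:.2f}, halved: {halved:.2f}")
```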
Number of Items and Reliability: Three Versions of the Mental Health Inventory (MHI)
Multitrait Scaling Analysis • Internal consistency reliability • Item convergence • Item discrimination
Validity • Does instrument measure what it is supposed to measure? • A “validated” instrument is a holy grail
Threats to Validity • Acquiescent Response Set • Socially Desirable Response Set
Listed below are a few statements about your relationships with others. How much is each statement TRUE or FALSE for you?
1. I am always courteous even to people who are disagreeable.
2. There have been occasions when I took advantage of someone.
3. I sometimes try to get even rather than forgive and forget.
4. I sometimes feel resentful when I don’t get my way.
5. No matter who I’m talking to, I’m always a good listener.
Response options: Definitely true; Mostly true; Don’t know; Mostly false; Definitely false
Two Types of Validity • Content Validity • Includes face validity • Construct Validity • Many Synonyms
Content Validity • Does the measure adequately represent the domain? • Do items operationalize the concept? • Do items cover all aspects of the concept? • Does the scale name represent item content? • Face validity is the extent to which a measure “appears” to reflect what it is intended to measure • E.g., as judged by experts or by patient focus groups
Construct Validity • Do scores on a measure relate to other variables in ways consistent with hypotheses?
Evaluating Construct Validity
Cohen effect size rules of thumb (d = 0.2, 0.5, and 0.8):
• Small correlation = 0.100
• Medium correlation = 0.243
• Large correlation = 0.371
$r = \frac{d}{\sqrt{d^2 + 4}} = \frac{0.8}{\sqrt{0.8^2 + 4}} = \frac{0.8}{\sqrt{0.64 + 4}} = \frac{0.8}{\sqrt{4.64}} = \frac{0.8}{2.154} = 0.371$
(Beware: r’s of 0.10, 0.30, and 0.50 are often cited as small, medium, and large.)
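The conversion from Cohen's d to a correlation is a one-liner, sketched below for the three rule-of-thumb effect sizes. The function name `d_to_r` is illustrative. This form of the conversion assumes two groups of equal size.

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a correlation, assuming two equal-sized groups."""
    return d / math.sqrt(d ** 2 + 4)

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:<6} d = {d:.1f} -> r = {d_to_r(d):.3f}")
```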
Average HRQOL Scores for Comparison Groups and Deviation Scores for Patients With Chronic Conditions. From Stewart AL, Greenfield S, Hays RD, et al. Functional status and well-being of patients with chronic conditions. JAMA 1989;262:907-913.
Relative Validity Analyses • Form of "known groups" validity • Relative sensitivity of measure to important clinical difference • One-way between group ANOVA
Responsiveness to Change • HRQOL measures should be responsive to interventions that change HRQOL • Need external indicators of change (anchors)
Self-Report Indicator of Change • Overall, has there been any change in your asthma since the beginning of the study? • Much improved; Moderately improved; Minimally improved • No change • Minimally worse; Moderately worse; Much worse