Generic measures: limitations of use within specific settings?

Generic measures: limitations of use within specific settings? J Freeman, Institute of Health Studies, Plymouth University

Properties • Clinical • feasibility • Psychometric • reliability • validity • appropriateness • responsiveness

Validity • Does it measure what it says it measures? • Content validity • Criterion validity • Construct validity (convergent and discriminant) (Bowling 1997)

Construct validity • The extent to which empirical data support hypotheses concerning the attributes being measured • detective work • jigsaw puzzle

Appropriateness • Is the range of the construct measured within the sample similar to the range covered by the instrument? (Van der Putten et al 1999)

The 36-item Short Form Health Survey (SF-36): • Gold-standard generic self-report measure of health status • Adopted & disseminated world-wide • Standardised UK and US versions

SF-36 dimensions

The SF-36 • Relatively few studies have evaluated its use as an outcome measure for clinical practice or clinical trials in MS

Aim of study • To explore the reliability, validity, clinical appropriateness, and responsiveness of the SF-36 in MS patients within a health care setting

Methods • 150 patients with clinically definite MS • Broad spectrum of disease severity • Assessments completed once in 106 patients and twice in 44 rehabilitation inpatients

Assessments • Disease severity: EDSS • Health status: SF-36 • Disability: FIM • Handicap: LHS • Emotional well-being: GHQ

Assessment of construct validity... • Convergent validity: correlations between SF-36 dimensions & instruments measuring similar & different constructs • Group differences validity: ANOVA to differentiate between different groups
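The correlation step above can be sketched in a few lines. This is a minimal illustration, assuming Pearson's r is the coefficient used (the slides do not say which one); the scores and variable names are hypothetical and are not study data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for six patients (illustrative only).
sf36_physical = [10, 25, 40, 55, 70, 90]
fim_score = [30, 42, 50, 61, 75, 88]   # instrument measuring a related construct
ghq_score = [4, 9, 2, 7, 3, 6]         # instrument measuring an unrelated construct

r_related = pearson_r(sf36_physical, fim_score)
r_unrelated = pearson_r(sf36_physical, ghq_score)
print(f"related scales:   r = {r_related:.2f}")
print(f"unrelated scales: r = {r_unrelated:.2f}")
```

Convergent validity is supported when the first coefficient is substantial, discriminant validity when the second is weak.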

...Assessment of construct validity • Hypothesis testing: t-tests to investigate whether results are in line with theoretical expectations
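A hypothesis test of this kind might look like the sketch below. It assumes an unequal-variance (Welch) two-sample t-test, which the slides do not specify, and uses hypothetical scores for two patient groups; the p-value lookup is left out to keep the example dependency-free.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical SF-36 physical scores (illustrative only, not study data).
needs_carer = [10, 15, 20, 25, 30]
independent = [50, 60, 65, 70, 80]

t_stat, dof = welch_t(needs_carer, independent)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large negative t here would be in line with the expectation that patients needing carer assistance report lower physical scores.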

Assessment of appropriateness • Examination of the scale score distributions of the 8 dimensions and the 2 summary components of the SF-36, and of all other measures: range, mean, SD, floor and ceiling effects
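Floor and ceiling effects are simply the percentage of respondents at the lowest and highest possible score. A minimal sketch, assuming a dimension scored 0 to 100 as SF-36 dimensions are; the scores below are hypothetical.

```python
from statistics import mean, stdev

def floor_ceiling(scores, scale_min=0, scale_max=100):
    """Return (floor %, ceiling %): share of scores at the scale extremes."""
    n = len(scores)
    floor_pct = 100 * sum(s == scale_min for s in scores) / n
    ceiling_pct = 100 * sum(s == scale_max for s in scores) / n
    return floor_pct, ceiling_pct

# Hypothetical scores on one dimension (illustrative only, not study data).
scores = [0, 0, 0, 25, 50, 50, 75, 100, 100, 100]

floor_pct, ceiling_pct = floor_ceiling(scores)
print(f"range {min(scores)}-{max(scores)}, "
      f"mean {mean(scores):.1f}, SD {stdev(scores):.1f}")
print(f"floor {floor_pct:.0f}%, ceiling {ceiling_pct:.0f}%")
```

The same function can be run on a subgroup (for example, only the severe patients) to see how restricting the sample changes the floor and ceiling percentages.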

Sample characteristics • Mean age 45 yrs (24-78) • Female 68% • Disease pattern: SP 50%, RR 33%, PP 11%, benign 6% • Mean years since diagnosis 11 (0.1-38) • Mean EDSS 5.7 (1-9)

Results: convergent & discriminant validity • Convergent & discriminant validity supported • Substantial correlations with related scales, e.g. FIM with SF-36 physical function (r = 0.68), EDSS (r = 0.82) • Weak correlations with unrelated scales, e.g. GHQ with SF-36 physical function (r = 0.26)

Results: group differences validity • Group differences validity supported • Significant differences demonstrated in health status at different levels of disease severity (p < 0.05)

Results: hypothesis testing • As hypothesised: • Patients requiring carer assistance reported lower physical scores (p < 0.0001) • Patients scoring > 5 GHQ points reported lower SF-36 emotional scores (p < 0.0001)

Results: appropriateness • Scores span the entire available range • Significant floor and ceiling effects (>20%) in: physical function, physical role limitations, emotional role limitations, bodily pain

Results: appropriateness • Floor & ceiling effects particularly marked when patient selection is restricted to a narrow range: physical dimensions 52% floor in severe group; physical role limitations 84% floor in severe group; role limitations 45% ceiling in mild group

Implications • [Figure: score range, showing floor and ceiling]

Implications • Spectrum of the SF-36 scale is too limited to detect changes which may occur in people with MS (pwMS) • Likely to limit its potential responsiveness • Limited usefulness within specific MS populations/settings

Recommendations • Generic measures should be tested for specific populations and for specific purposes • When evaluating health status in MS the SF-36 should be supplemented with other relevant & validated measures to ensure comprehensive & valid measurement

Recommendations • Clinicians & researchers should understand the properties of an outcome measure when choosing an instrument and interpreting the information it generates • ...the measure you choose is key in determining effectiveness

Properties of Outcome Measures • Clinical • feasibility • Psychometric • reliability • validity • appropriateness • responsiveness

Reliability of gait measurements using the CODA mpx30 motion analysis system. Veronica Maynard, Institute of Health Studies, University of Plymouth

Reliability • Reliability refers to the consistency or repeatability of a measurement taken under the same conditions

Factors affecting reliability • instrumental reliability - reliability of measurement device • rater reliability - reliability of rater administering measurement device • response reliability - reliability/stability of variable being measured

Sources of error • Measurement error • difference between a measurement & its true value • Systematic error • bias resulting from one or more processes • Random error

Reliability • 3 broad categories of reliability: • equivalence (reproducibility) • stability (repeatability) • internal consistency (homogeneity)

Types of reliability & how they are determined • Equivalence or reproducibility: inter-rater reliability • Stability or consistency: intra-rater or test-retest reliability • Internal consistency: split-half reliability & item analysis (Adapted from: Sim & Wright 2000, p.132)
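Of the three types above, split-half reliability is the easiest to show in code. The sketch below assumes an odd/even item split and the Spearman-Brown correction, neither of which is specified on the slide; the item responses are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half(responses):
    """responses: one row per respondent, one column per item.
    Returns (half-test correlation, Spearman-Brown corrected reliability)."""
    # Sum alternate items to form two half-scales per respondent.
    half_a = [sum(row[0::2]) for row in responses]
    half_b = [sum(row[1::2]) for row in responses]
    r_half = pearson_r(half_a, half_b)
    # Spearman-Brown steps the half-length correlation up to full length.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical responses: 5 respondents x 4 items (illustrative only).
responses = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]

r_half, r_full = split_half(responses)
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```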

Aim of study • To determine intra-rater and inter-rater reliability of gait measurements using the CODA mpx30 motion analysis system

Reliability studies (I) • Intra-rater reliability study: • 10 healthy subjects • mean age 39.2 (29-52) yrs • 3 recordings • single trained observer

Reliability studies (II) • Inter-rater reliability study: • 19 healthy subjects • mean age 34.4 (20-49) yrs • 3 trained observers

Procedure • Self-selected speed • Investigators blind • Points for analysis: • i) initial contact (IC) • ii) mid-stance (MSt) and • iii) mid-swing (MSw)

Stick figure illustrations of the position of the right leg (red) at 1) IC, 2) MSt and 3) MSw. Joint angles, moments and powers were determined at these points in the gait cycle.

Procedure (cont) • Spatiotemporal parameters: • walking velocity • duration of stance • duration of swing • Kinematic variables: • hip, knee & ankle angles at IC, MSt & MSw • Kinetic variables: • moments & power at hip, knee, ankle at IC and MSt

Analysis • Sagittal plane data • Bland & Altman methods • Intraclass correlation coefficient (ICC) to determine consistency and agreement among ratings
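The slide does not state which form of the ICC was used. As a sketch under that caveat, the function below computes two common single-rating forms from a two-way ANOVA: an absolute-agreement coefficient, often written ICC(2,1), and a consistency coefficient, often written ICC(3,1). The joint angles are hypothetical.

```python
def icc_two_way(ratings):
    """ratings: one row per subject, one column per rater (or session).
    Returns (agreement ICC, consistency ICC) for single ratings."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between subjects
    ms_cols = ss_cols / (k - 1)                  # between raters
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual

    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return agreement, consistency

# Hypothetical knee angles (degrees) at initial contact:
# 5 subjects x 3 observers (illustrative only, not study data).
knee_ic = [
    [4.0, 5.5, 3.5],
    [8.0, 9.0, 7.0],
    [2.5, 4.0, 2.0],
    [6.0, 6.5, 5.0],
    [9.5, 11.0, 9.0],
]

icc_agreement, icc_consistency = icc_two_way(knee_ic)
print(f"agreement ICC = {icc_agreement:.2f}, "
      f"consistency ICC = {icc_consistency:.2f}")
```

The agreement form penalises a systematic offset between raters, while the consistency form does not, so the two can differ markedly when one observer reads consistently higher.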

Graphical illustration of sagittal plane joint movement of the ankle during a single gait cycle (dorsiflexion positive, plantarflexion negative). IC = Initial contact; MSt = Mid-stance; TO = Toe off; MSw = Mid-swing

Results (I) • Intra-rater study: • Good agreement for spatio-temporal parameters • Generally low ICC values (ICC < 0.75) for all parameters • Bland & Altman plots showed reasonable agreement for kinematic data at ankle and knee

Summary of key findings (II) • Inter-rater study: • Generally good agreement for spatio-temporal parameters (ICC > 0.70) • Lower ICC values & wide limits of agreement for kinematic data (especially hip)

Examples of Bland & Altman plots for am-pm repeatability, showing the mean of the measurements against the difference between measurements for ankle range of motion (degrees) at 1) initial contact and 2) mid-stance.
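The quantities plotted in a Bland & Altman analysis can be computed as follows. This is a minimal sketch using the conventional 95% limits of agreement (mean difference plus or minus 1.96 SD of the differences); the am and pm ankle angles are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return per-pair means and differences plus bias and 95% limits."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)        # systematic difference between sessions
    sd = stdev(diffs)         # spread of the differences
    return {
        "means": means,       # x-axis of the plot
        "diffs": diffs,       # y-axis of the plot
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }

# Hypothetical ankle angles in degrees (illustrative only, not study data).
ankle_am = [10, 12, 8, 11, 9]
ankle_pm = [11, 11, 9, 13, 8]

result = bland_altman(ankle_am, ankle_pm)
print(f"bias = {result['bias']:.2f} degrees")
print(f"limits of agreement = {result['lower']:.2f} to {result['upper']:.2f}")
```

Whether limits of this width are acceptable is a clinical judgement, not a statistical one.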

Factors affecting reliability • Errors associated with marker placement • Soft tissue motion • Natural variation in individual gait cycle • Sampling rate

Recommendations • Standard protocol for marker placement • Training of observers • Averaging of a minimum of 3 gait cycles (Winter 1984) • Interpret data from a single cycle with caution

General Recommendations • Standard protocol • Training • Averaging may be required • Determine level of error • Assess reliability before use in research/clinically • Assess reliability in population under study

Responsiveness S.K. Spooner PhD BSc SRCh Scheme Co-ordinator Podiatry

Properties • Clinical • feasibility • Psychometric • reliability • validity • appropriateness • responsiveness

Responsiveness to Change • HRQOL measures should be responsive to interventions that change HRQOL • Evaluating responsiveness requires assessing HRQOL relative to an external indicator of change

Testing for Responsiveness • Measurement tools should be tested on patients receiving treatment of known efficacy • Capable of detecting treatment effects?
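One way to quantify whether a measure detects a treatment effect is the standardised response mean (SRM): mean change divided by the SD of the change scores. The slides do not name a responsiveness index, so the SRM here is an assumption chosen for illustration, and the admission and discharge scores are hypothetical.

```python
from statistics import mean, stdev

def standardised_response_mean(before, after):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [b - a for a, b in zip(before, after)]
    return mean(changes) / stdev(changes)

# Hypothetical scores before and after a treatment of known efficacy
# (illustrative only, not study data).
admission = [40, 35, 50, 45, 30]
discharge = [50, 40, 65, 50, 45]

srm = standardised_response_mean(admission, discharge)
print(f"SRM = {srm:.2f}")
```

A measure with marked floor or ceiling effects will tend to give a small SRM even when patients genuinely change, because scores at the extremes cannot move further.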
