
THE FOLLOWING LECTURE HAS BEEN APPROVED FOR ALL STUDENTS BY BIRMINGHAM CITY UNIVERSITY

This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought provoking and challenging.


Any issues raised in the lecture may require the viewer to engage in further thought, insight, reflection or critical evaluation

Validity & Variability

of Back Pain Assessments

Dr. Craig Jackson

Senior Lecturer in Psychology

School of Psychology

Faculty of Education, Law & Social Science


Who is Observing What?

The validity of any observation depends upon who is observing whom

Heisenberg’s uncertainty principle



Assessment Criteria



Low Back Pain Assessments

Appropriateness & Feasibility

Between-Observer Variability and Consistency

The Future: Mathematical Models?

Validity without Psychology?


Specificity of Defined Field + Repeatable Measurement = Valid Measures

S + R = V


Problem of between-observer variation remains

GP eliciting signs in respiratory disease

Neurologist evaluating diagnosis of multiple sclerosis

Geriatrician assessing stroke rehab.

Anaesthetist determining fitness for operation

1. Judgements might be made differently by other observers

2. Judgements might be made differently by the same observer on repeated occasions

Between-Observer Variability

Variation between observers

Can seriously compromise research / clinical findings

Worst example:

Patients with condition A - all examined by Dr X

Patients with condition B - all examined by Dr Y

Could one observer examine all patients?

Not possible / practical

Examples of Between-Observer Variability

Diagnostic classification for multiple sclerosis for 149 patients

By two clinicians (observers)

[Table: diagnostic class assigned by Neurologist A against diagnostic class assigned by Neurologist B]
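Agreement between two observers, as in the two-neurologist example, is often summarised with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The lecture does not name kappa, so the sketch below is an illustration only; the class codes (1, 2, 3) and the ten ratings are hypothetical and are not the data for the 149 patients.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers classifying the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of patients given the same class by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each observer's own class frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical class throughout
    return (observed - expected) / (1 - expected)


# Hypothetical diagnostic classes for 10 patients (not the lecture's data).
neurologist_a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
neurologist_b = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
print(round(cohen_kappa(neurologist_a, neurologist_b), 3))
```

In this made-up example the observers agree on 8 of 10 patients, but part of that agreement would be expected by chance, so kappa is lower than the raw 0.8.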

Examples of Between-Observer Variability

Circum-corneal hyperaemia (scored 0,1,2,3,4) by four ophthalmologists








  • Systematic error by observer C - consistently higher

  • Observer B sticks to mid-ranges

  • No patient on whom there is total agreement

Examples of Between-Observer Variability

Iris hyperaemia (scored 0,1,2,3,4) by four ophthalmologists








  • Observer C uses only extremes of scale

  • Observer C introduces spurious code

  • Observer D avoids extreme codes

  • Only 2 cases with difference of 1 between highest and lowest scores

Reducing Between-Observer Variability

Use expert panel / reference library – they evaluate all procedures

Compare rival observation methods in small pilot studies

Suspect observer at all times - how may s/he be biased?

Train observers / assessors

Standardised techniques & judgement criteria

Consider severity of disagreements

Randomise patients out to multiple observers / multiple observations

Appoint external assessor

Assessment Criteria

With any assessment – observation, questionnaire or equipment – we ask:

Utility: Is it useful?

Reliability: Is it dependable?

Validity: Does it do what it is supposed to?

Sensitivity: Can it identify patients with a condition?

Specificity: Can it identify those that do not have the condition?

Responsiveness: Can it measure differences over time?
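Sensitivity and specificity can be made concrete with counts. A minimal sketch, assuming each patient has a known true status and a yes/no assessment result; the function name and the ten-patient data are hypothetical.

```python
def sensitivity_specificity(has_condition, test_positive):
    """Return (sensitivity, specificity) from paired true/false lists."""
    tp = fn = tn = fp = 0
    for truth, result in zip(has_condition, test_positive):
        if truth and result:
            tp += 1   # condition present, assessment positive
        elif truth:
            fn += 1   # condition present, assessment negative
        elif result:
            fp += 1   # condition absent, assessment positive
        else:
            tn += 1   # condition absent, assessment negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without the condition")
    sensitivity = tp / (tp + fn)  # share of true cases identified
    specificity = tn / (tn + fp)  # share of non-cases correctly cleared
    return sensitivity, specificity


# Hypothetical data: 4 patients with the condition, 6 without.
truth = [True, True, True, True, False, False, False, False, False, False]
result = [True, True, True, False, False, False, False, False, True, False]
print(sensitivity_specificity(truth, result))
```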

Purpose of Assessment Instruments

Is the purpose of the instrument clearly stated?

Is it discriminative?

Is it evaluative?

Is it prognostic?

Which population is it appropriate for?

Clinical

Research / Epidemiological

Reliability & Validity


Validity

The degree to which an instrument measures what it is intended to measure

Reliability is a necessary but insufficient condition for validity

The approximate truth about inferences regarding causal relationships

Reliability

A degree of consistency of a measure

The degree to which a test is free of random error

A measure that produces consistent results is said to have high reliability

Validity in Research


Poor repeatability of examination implies a poor validity

How repeatable are results:

On two (or more) occasions by same observer?

(Temporal Stability)

Or repeated occasions by different observers?

Applies equally to Clinical Practice and Research

A clinical sign carries no information if it is assessed differently when re-examined

  • Measures for Clinical Use

  • Questionnaires

    • General health status

    • Pain

    • Functional status

    • Patient satisfaction

  • Physiological outcomes

  • Utilization measures

  • Cost measures

  • Mathematical Modelling

Face Validity

Are items measured in a sensible way ?

How specific are the questions ?

Do questions have a specific time frame / frame of reference ?

Are questions performance related ? (do you do it?)

Are questions capacity related ? (can you do it?)

How is the index scored ?

Weighting of items ?

Content Validity

Content validity is concerned with “representativeness”

Are all relevant dimensions of functionality included ?


Was method for choosing items appropriate ?

Draw an inference from test scores to a larger domain of functionality

e.g. the abilities covered by the test items should be representative of the larger domain of abilities and function

Construct Validity

What is the bigger concept that the assessment is trying to measure?

“Theoretical Construct”

Does assessment perform satisfactorily when compared with other measures?

Is that concept a real one?

e.g. does specific local pain prevent general functioning?

Measured by correlation between the intended independent variable (back health) and a proxy independent variable (specific test performance) that is actually used

Construct Validity

For example:

Company physician wants to study the relationship between general back health and job performance

However, the physician may not be able to administer a comprehensive back health test to every worker

In this case, s/he can use a proxy variable such as “performance on a specific functional test” as an indirect indicator of back health

Administer the proxy test AND comprehensive back test to a portion of the workers

If a strong correlation is found between general back health and the specific test, the proxy test can be used with the larger group because its construct validity is established
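The check described above, correlating the proxy test with the comprehensive test in a subsample, can be sketched with a Pearson correlation. The scores below are hypothetical, and the lecture gives no cut-off for what counts as a "strong" correlation, so none is applied here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical subsample of 5 workers who took both tests.
comprehensive_back_test = [10, 20, 30, 40, 50]
proxy_functional_test = [12, 18, 33, 39, 52]
print(round(pearson_r(comprehensive_back_test, proxy_functional_test), 3))
```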

Criterion Validity

Drawing an inference from specific test scores to general performance

Criterion validity is about prediction rather than explanation.

Prediction is concerned with non-causal dependence

Explanation pertains to causal or logical dependence

E.g. one can predict the weather based on the height of mercury inside a barometer. However, one cannot use the behaviour of mercury height to explain why the weather changes.

Responsive to Change

Is measure sensitive enough to detect clinically relevant change?

Essential for evaluative measurements

  • Examples: Pain Perception

  • Visual Analogue Scales: Reliable and Valid (Jensen & Karoly 1993)

  • Advantages over other pain assessment methods (Scott & Huskisson 1976, Price et al. 1994)

  • Quadruple Visual Analogue Scales – 4 specific factors – Von Korff et al. 1992

    • CURRENT Pain Level

    • AVERAGE or TYPICAL Pain Level

    • Pain level at its BEST

    • Pain level at its WORST

  • Ratings are averaged, x 10 = TOTAL SCORE (Range 0 – 100)
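The Quadruple Visual Analogue Scale scoring on this slide (average the ratings, multiply by 10, range 0 to 100) can be written out as a short sketch. It assumes each of the four ratings is on a 0 to 10 scale, which is implied by the stated range but not given explicitly; the function name is hypothetical.

```python
def quadruple_vas_score(current, average, best, worst):
    """Total score: mean of the four pain ratings multiplied by 10 (0-100)."""
    ratings = (current, average, best, worst)
    for rating in ratings:
        # Assumption: every rating is on a 0-10 scale.
        if not 0 <= rating <= 10:
            raise ValueError("each rating must be between 0 and 10")
    return sum(ratings) / len(ratings) * 10


# Hypothetical patient: current 4, typical 5, best 2, worst 9.
print(quadruple_vas_score(current=4, average=5, best=2, worst=9))  # 50.0
```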

  • Condition-Specific Assessment – Low Back Pain

  • 40+ low back functional questionnaires exist

  • 5 identified as “gold standard” (Kopec & Esdaile, 1995)

    • 1. Sickness Impact Profile (Bergner et al. 1981)

    • 2. Roland-Morris Disability Questionnaire (Roland and Morris, 1983)

    • 3. Oswestry Low Back Pain Disability Questionnaire (Fairbank et al. 1980)

    • 4. Million Visual Analogue Scale (Million et al. 1982)

    • 5. Waddell Disability Index (Waddell, 1984)

  • Condition-Specific Assessment – Low Back Pain

  • 2 of the “gold standards” (Kopec & Esdaile, 1995)

    • 1. Roland-Morris Disability Questionnaire (Roland and Morris, 1983)

    • 2. Oswestry Low Back Pain Disability Questionnaire (Fairbank et al. 1980)

    • + Quebec Back Pain Disability Scale (Kopec et al. 1995)

Roland Morris Disability Questionnaire (RMQ)

Purpose: Acute and Chronic population of low back pain sufferers

An evaluative measure in clinical trials

Face Validity: + 24 Yes / No questions

+ Moderate specificity

+ Today is the frame of reference

+ Performance related

+ Double negatives

+ “Yes response” scores – score out of 24

Content Validity: Mobility; Dressing / grooming




  • Roland Morris Disability Questionnaire (RMQ)

  • “The best single study of assessing short-term outcomes of primary care patients with low back pain” (Von Korff & Saunders, 1996)

  • Scores greater than 13 = Significant disability associated with an unfavorable outcome

    • (Von Korff & Saunders, 1996)

  • Any change of less than 4 points is both too small to matter and too small to be reliable (Stratford et al. 1996)
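The RMQ rules quoted on these slides (24 yes/no items, one point per "yes", scores above 13 flagged, changes under 4 points disregarded) can be sketched as follows. The function names and the example answers are hypothetical; the thresholds are those cited above.

```python
RMQ_ITEMS = 24               # number of yes/no questions
DISABILITY_THRESHOLD = 13    # scores greater than 13 = significant disability
MINIMUM_CHANGE = 4           # changes of less than 4 points are disregarded


def rmq_score(answers):
    """Count the 'yes' responses (True) across the 24 items."""
    if len(answers) != RMQ_ITEMS:
        raise ValueError("the RMQ has 24 yes/no items")
    return sum(1 for answer in answers if answer)


def significant_disability(score):
    """True when the score is greater than 13."""
    return score > DISABILITY_THRESHOLD


def change_reaches_threshold(score_before, score_after):
    """True when the change between two scores is at least 4 points."""
    return abs(score_after - score_before) >= MINIMUM_CHANGE


# Hypothetical patient answering 'yes' to 15 of the 24 items.
baseline = rmq_score([True] * 15 + [False] * 9)
print(baseline, significant_disability(baseline))
print(change_reaches_threshold(baseline, 12))
```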

Oswestry Disability Questionnaire (revised)

Purpose: Acute and Chronic population of low back pain sufferers

Discriminate between chronic and acute low back pain

An evaluative measure in clinical trials

Used to predict different rates of improvement

Face Validity: + Measured 0 – 5 by degree of difficulty

+ Very specific questions

+ No specific frame of reference

+ Capacity related questions

+ Score by summing all items = percentage score

Content Validity: Pain intensity; Personal care; Lifting; Walking; Sitting; Standing; Sleeping; Sex / social life; Travelling

Oswestry Disability Questionnaire (revised)

Content Validity: Omits: bending, kneeling, emotional state, sudden movement

“Sex life” reduced response rates

(Hudson-Cook et al. 1989)

Scoring issues: 11% is a cut off score (Erhard et al. 1994)

0 - 20% Minimal Disability

20 - 40% Moderate Disability

40 - 60% Severe Disability

60 - 80% Crippled

80 - 100% Bed Bound or Exaggerating

Stratford et al. 1988
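The Oswestry scoring described above (items rated 0 to 5, summed and expressed as a percentage, then read against the bands) can be sketched as below. The bands on the slide share their boundary values (20%, 40%, ...), so this sketch assumes a boundary score falls in the lower band; that choice, like the function names, is an assumption and not part of the lecture.

```python
# Upper limit of each band, with the label from the slide.
BANDS = [
    (20, "Minimal Disability"),
    (40, "Moderate Disability"),
    (60, "Severe Disability"),
    (80, "Crippled"),
    (100, "Bed Bound or Exaggerating"),
]


def oswestry_percentage(item_scores):
    """Sum the 0-5 item scores and express them as a percentage of the maximum."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not 0 <= score <= 5:
            raise ValueError("each item is scored 0 to 5")
    return 100 * sum(item_scores) / (5 * len(item_scores))


def oswestry_band(percentage):
    """Map a percentage score to its band (boundary values go to the lower band)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    for upper_limit, label in BANDS:
        if percentage <= upper_limit:
            return label


# Hypothetical answers to 10 items.
answers = [1, 2, 3, 2, 1, 0, 2, 3, 1, 2]
percent = oswestry_percentage(answers)
print(percent, oswestry_band(percent))  # 34.0 Moderate Disability
```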

Quebec Back Pain Disability Scale

Purpose: Acute and Chronic population of low back pain sufferers

Assess level of functional disability

Designed as discriminative, evaluative and predictive

Face Validity: + Response on rating scale 0 - 5

+ Very specific questions

+ “Today” as frame of reference

+ Performance related questions

+ Score by summing all items = percentage score

Content Validity: Mobility; Travelling; Sleeping; Sitting; Standing; Running; Lifting; Bending

Quebec Back Pain Disability Scale

Content Validity: Omits: twisting, turning, emotional state, sudden movement, sex life


Has test-retest reliability been established ?

Measure reproducible on repeated use on stable patient ?

Internal consistency ?

Do items correlate with others ?

Alpha (reliability score)


Roland-Morris Disability Questionnaire 0.89 - 0.93

Oswestry Disability Questionnaire 0.77 - 0.93

Waddell Disability Index 0.76

Quebec Back Pain Disability Scale 0.95
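The alpha values listed above are internal-consistency coefficients (Cronbach's alpha, named on a later slide). A minimal sketch of the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), is given below. The yes/no responses are hypothetical and far smaller than any real validation sample.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same 2+ items")
    # Variance of each item across respondents (sample variance).
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores must vary between respondents")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical yes(1)/no(0) answers: 5 respondents, 4 items.
responses = [
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
print(round(cronbach_alpha(responses), 2))
```

Alpha rises towards 1 as the items vary together, which is what "Do items correlate with others?" is asking.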

Back Performance Scale (BPS)

5 Tests of sagittal-plane mobility

A) Sock test  B) Pick up test  C) Fingertip-to-floor test

D) Roll-up test  E) Lift test

Sum scores to obtain performance measure of mobility-related activities

Develop a sum scale

Discriminative ability

Sensitivity to change

Strand et al. 2002

Back Performance Scale (BPS) – Evaluation of . . .

Correlations among 5 tests of sagittal-plane mobility:

Ranged from: 0.27 – 0.50

Correlations among 5 tests and BPS total:

Ranged from: 0.63 – 0.73

Cronbach Alpha (reliability):


Sum Scores Discrimination:

Higher scores in patients not returning to work

Higher scores in patients with back pain rather than MSD


Effect size high (1.33) for patients who returned to work

Effect size low (0.31) for patients who had not returned to work

Back Performance Scale (BPS) – Evaluation of . . .

1. BPS sum more responsive than separate tests

2. Measures aspects of performance of clinical importance to back pain

3. Quick, simple and cheap to administer

4. No costly equipment

Future research:

Could tests with lateral bending and twisting be added?

Could twisting / lateral tests replace any of the sagittal bending tests?

  • Yellow Flags of Low Back Pain

  • Indicative of long term chronicity and disability

  • Negative attitude – back pain is harmful and disabling

  • Fear avoidance

  • Reduced activity

  • Expects passive treatment to be better than active treatment

  • Tendency to low morale, depression and social withdrawal

  • Social / Financial problems

  • Should these psychosocial aspects be included in assessment scale?

  • What validity does any scale have when omitting these constructs?

Appropriateness / Feasibility

Is administration format suitable ?

Time taken to complete questionnaire appropriate ?

Questions easy to understand ?

Questions acceptable to patient ?

Clinical relevance ?

Mathematical Models

Leg length differences and MSDs

Two measurement methods

1) Direct measurement / observation: MRI, Ultrasonics

2) Regression equations

Early stages

Not cost-effective

Complementary at present

Physiologically valid

Requires physiological uniformity

Not valid with clinical populations

Ashford & Marlbrook, 2003

  • Summary of Reliability & Validity

  • There can be validity without reliability

  • Reliability is an aspect of construct validity - as assessment becomes less standardized, distinctions between reliability and validity blur

  • In many situations assessors are not trained to agree on a common set of criteria and standards

  • Inconsistency in performance across tasks does not invalidate the assessment

  • Rather it becomes an empirical puzzle to be solved by searching for a more comprehensive interpretation

  • Initial disagreement does not invalidate any assessment - provides impetus for dialog

  • Moss, 1994

  • Implications for Back Pain Assessments

  • Development of a single valid universal test may be pointless

  • No Grand Unifying Theory of measurements

  • If something were easy to measure validly, it would have been done by now

  • Functional assessments seem alive and well (for now)

  • Functional assessments must develop and include psychosocial aspects
