Critical Appraisal

Dr. Chris Hall – Facilitator

Dr. Dave Dyck R3

March 20/2003

- Review study design and the advantages/disadvantages of each
- Review key concepts in hypothesis, measurement, and analysis
- Article appraisal
- Treatment articles
- Diagnosis articles
- Harm articles
- Overviews/meta-analysis

- Survive the next hour and still be able to smile

- Ecological studies
- Case Reports
- Case Series
- Cross-Sectional Studies
- Case Control and Retrospective Cohort Studies
- Prospective Cohort Studies
- Randomized Controlled Trials

- Studies of a group rather than individual subjects
- Supply data on exposure and disease as summary measures for the total population as an aggregate, eg. incidence studies
- Ecological fallacy: the correlation between the variables is not necessarily the same at the individual level as it is for the group. Therefore you cannot link exposure to disease on an individual basis
- Confounding variables are also difficult to account for

- Submission of individual cases with rare or interesting findings
- Highly subject to bias (selection/submission and publication)
- Should not infer causality or suggest practice change

- A group of “consecutive cases” with unifying features
- Selection bias = what constitutes a case? Is the series truly consecutive? Is there response bias?
- Publication bias
- Measurement bias (presence of ‘disease’ or exposure may be variable)

- Ie. Prevalence study
- Presence or absence of a specific disease compared with one or several variables within a defined population at a specific point in time

- Subject to selection bias (see HO)
- Cause and effect cannot be determined (see HO) (ie. it is not known whether the exposure preceded the outcome or the outcome preceded the exposure)
- Temporal trends may be missed (seasonal variations)
- Previous deaths, drop-outs, and migration are not counted, and short-lived, transient outcomes are underrepresented. Thus, cross-sectional studies (CSS) are best suited to study chronic, non-fatal conditions.

- Can do quickly
- May provide enough of an association between an exposure/outcome to generate a hypothesis which can be studied by another method.
- Useful for descriptive/analytical studies

- Starts now and goes back in time
- Start with the outcome and ask or find out about prior exposure
- Specific hypothesis usually tested
- Select all cases of a specific disease during a certain time, select a number of controls who represent the general population, then determine exposure to the factor in each group and calculate the odds ratio
- May match controls to patients (but can never be sure of similar baseline states)

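- Odds ratio (OR) provides an estimate of the relative risk (esp. when the disease is rare)
- Thus, the OR from a case control study (CCS) is a good approximation of the relative risk only when the disease is rare (< 10% of the population)
- OR > 1 = exposure associated with greater risk
- OR < 1 = exposure associated with reduced risk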
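
The rare-disease point can be checked numerically. The sketch below is only an illustration: the function names and the counts are hypothetical, and the 2x2 tables describe whole populations so that both measures can be computed side by side (a real case control study yields only the odds ratio).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, chosen only for illustration.
rare = (10, 990, 5, 995)       # disease in 1% of exposed vs 0.5% of unexposed
common = (400, 600, 200, 800)  # disease in 40% of exposed vs 20% of unexposed

for label, table in (("rare", rare), ("common", common)):
    print(label, "OR =", round(odds_ratio(*table), 2),
          "RR =", round(relative_risk(*table), 2))
```

With these made-up counts the relative risk is 2 in both tables, but the odds ratio stays close to 2 only in the rare-disease table and overstates the risk when the disease is common.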

- Small # needed (good for rare diseases or when outcomes are rare or delayed)
- Quick
- Inexpensive
- Can study many factors

- Problems selecting/matching controls
- Only an estimate of relative risk
- No incidence rates
- Biases (eg. unequal ascertainment of exposure between cases and controls)
- Recall bias = cases are more likely to remember exposure than controls
- Selection bias = to limit it, cases and controls should be selected according to predetermined, strict, objective criteria

- Start with 2 groups free of disease and follow forward for a period of time
- 1 group has the factor (eg. Smoking) the other group does not
- Define 1 or more outcomes (eg. Lung CA)
- Tabulate the # of persons who develop the outcome
- Provides estimates of incidence, relative risk, and attributable risk

- Relative risk = incidence in the exposed divided by incidence in the unexposed; measures the strength of association between exposure and disease
- Attributable risk = incidence in the exposed minus incidence in the unexposed; measures the excess cases of disease that can be attributed to exposure
- Given a constant relative risk, attributable risk rises with the incidence of the disease in members of the population who are not exposed
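
A minimal sketch of the last point, assuming relative risk is taken as the ratio of the two incidences and attributable risk as their difference. The function name and the incidence figures are hypothetical.

```python
def cohort_measures(incidence_exposed, incidence_unexposed):
    """Return (relative risk, attributable risk) from two incidence figures."""
    relative_risk = incidence_exposed / incidence_unexposed
    attributable_risk = incidence_exposed - incidence_unexposed
    return relative_risk, attributable_risk


# Hypothetical incidences: relative risk held at 3 while baseline incidence rises.
for baseline in (0.001, 0.01, 0.10):
    rr, ar = cohort_measures(3 * baseline, baseline)
    print(f"baseline {baseline:.3f}: RR = {rr:.1f}, AR = {ar:.3f}")
```

The relative risk is the same on every line, while the attributable risk grows in step with the baseline incidence.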

- Cannot by itself establish causation, but can show an association between a factor and an outcome
- Generally provides stronger evidence for causation than case control studies

- Less bias in measuring the factor (exposure is recorded before the outcome is known)
- Uncovers natural history
- Can study many diseases
- Yields incidence rates, relative risk, and attributable risk
- Allows for more control of confounding variables

- Possible bias in ascertainment of disease.
- Need large numbers and long follow-up
- Easy to lose patients in follow-up (attrition of subjects). This may introduce bias if lost subjects are different from those who continue to be followed
- Hard to maintain comparable follow-up for all levels of exposure

- Expensive
- Locked into the factor(s) measured
- Measurement bias (eg. Unblinded physician who looks harder for + outcomes in the exposed pt)
- Confounding variables still present

- To test the hypothesis that an intervention (treatment or manipulation) makes a difference.
- An experimental group is manipulated while a control group receives a placebo or standard procedure
- All other conditions are kept the same between the groups

- Goals=
- Prevention (to decrease risk of disease or death)
- Therapeutic (decrease symptoms, prevent recurrences, decrease mortality)
- Diagnostic (evaluate new diagnostic procedures)

- Ethical issues
- Difficult to test an intervention that is already widely used
- Randomization
- Blinding techniques (may be difficult due to common side effects of drugs)
- Control group (placebo, conventional tx, specific tx)
- Subject selection and issues of generalizability
- Are refusers different in some way?

- Sensitivity = proportion of those with the disease who have a positive test
- Specificity = proportion of those without the disease who have a negative test

Sensitivity = a/(a+c)

Specificity = d/(b+d)

(a = true positives, b = false positives, c = false negatives, d = true negatives)

- Positive Predictive Value = the probability of having the disease given a positive test, a/(a+b)
- Negative Predictive Value = the probability of not having the disease given a negative test, d/(c+d)
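
The four formulas can be sketched as one small function. The cell labels are inferred from the formulas themselves (a = true positives, b = false positives, c = false negatives, d = true negatives); the function name and the counts are hypothetical.

```python
def diagnostic_measures(a, b, c, d):
    """Test characteristics from a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical counts: 100 patients with the disease, 200 without.
measures = diagnostic_measures(a=90, b=30, c=10, d=170)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are calculated within the diseased and non-diseased columns, so they do not change with prevalence. The predictive values are calculated across the rows, so they do.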

- Null Hypothesis
- Hypothesis of no difference between a test group and a control group (ie. There is no association between the disease and the risk factor in the population)

- Alternative Hypothesis
- Hypothesis that there is some difference between a test group and control group

- Sampling bias = selecting a sample that does not truly represent the population
- Sample size = contributes to the credibility of “positive” studies and the power of “negative” studies. For a fixed alpha, increasing the sample size decreases the probability of making a type II error (ie. increases power).

- Type I Error (alpha error) = considering a null hypothesis false when it is actually true (ie. declaring an effect to be present when it is not)
The probability of this error is alpha, the significance level chosen in advance. The p value is the probability of seeing a difference at least as large as the one observed if chance alone were operating; a result is called significant when p is less than alpha.

- Type II Error (beta error) = accepting a null hypothesis as true when it is actually false (ie. declaring a difference/effect to be absent when it is present)
- The probability of this error is beta
- Power (1-beta) = the probability of detecting a difference that truly exists

- Statistical Significance: determination by a statistical test that there is evidence against the null hypothesis.
- The level of significance depends on the value chosen for alpha
- Usually alpha = .05 and beta = .20 (ie. power of 80%; studies rarely aim for power >80%)
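
To make the link between sample size, alpha, and power concrete, here is a rough sketch using a normal approximation for a two-sided comparison of two proportions. The function name is hypothetical, the event rates (20% vs 15%) are purely illustrative, and the far tail of the test is ignored, so treat the results as approximate.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided z-test on two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_effect = abs(p1 - p2) / se                   # true difference in SE units
    return NormalDist().cdf(z_effect - z_alpha)


# Illustrative event rates of 20% vs 15% at two sample sizes.
for n in (100, 1000):
    print(n, "per group: power is about", round(power_two_proportions(0.20, 0.15, n), 2))
```

Under these assumptions, 100 patients per group gives a power well below 80%, while 1000 per group exceeds it. Alpha stays at .05 in both cases because it is set by the investigator, not by the sample size.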

- Clinical Significance: statistical significance is necessary but not sufficient for clinical significance, which reflects the meaningfulness of the difference (eg. a statistically significant 1 mm Hg BP reduction is not clinically significant)
- Also includes such factors as cost and side effects.

- Accuracy= how closely a measurement approaches the true value
- Reliability= how consistent or reproducible a measurement is when performed by different observers under the same conditions or the same observer under different conditions
- Validity= describes the accuracy and reliability of a test (ie. The extent to which a measurement approaches what it is designed to measure)

- 3 basic stages
- 1) the validity – are the conclusions justified?
- 2) the message – what are the results?
- 3) the utility – can I generalise the findings to my patients?

- Primary guides
- Was the assignment of patients to treatment randomized?
- Were all patients who entered the trial properly accounted for and attributed at its conclusion?
- Was follow-up complete?
- Were patients analyzed in the groups to which they were randomized? Ie. Intention to treat analysis

- Secondary guides:
- Were patients, their clinicians, and study personnel “blind” to treatment? (avoids bias)
- Were the groups similar at the start of the trial? (randomization not always effective if sample size small)
- Aside from the experimental intervention, were the groups treated equally? (ie. Cointerventions)

- How large was the treatment effect?
- Relative risk reduction vs absolute risk reduction

- Baseline risk of death without therapy=20/100 = .20 = 20% (X = .20)
- Risk with therapy reduced to 15/100 = .15 = 15% (Y = .15)
- Absolute Risk Reduction = (X-Y) = .20-.15 = .05 (5%)
- Relative Risk = (Y/X) = .15/.20 = .75
- Relative Risk Reduction = [1-(Y/X)] x 100% = [1-(.75)] x 100% = 25%

- To calculate the number needed to treat (NNT), simply take the inverse of the absolute risk reduction
- In the last example, NNT = 1/.05 = 20
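
The arithmetic of the worked example (20/100 deaths without therapy, 15/100 with therapy) can be sketched as follows. The function name is hypothetical.

```python
def treatment_effect(control_risk, treated_risk):
    """Return ARR, RR, RRR and NNT for a control risk X and a treated risk Y."""
    arr = control_risk - treated_risk  # absolute risk reduction, X - Y
    rr = treated_risk / control_risk   # relative risk, Y / X
    rrr = 1 - rr                       # relative risk reduction, 1 - Y/X
    nnt = 1 / arr                      # number needed to treat, 1 / ARR
    return {"ARR": arr, "RR": rr, "RRR": rrr, "NNT": nnt}


effect = treatment_effect(control_risk=20 / 100, treated_risk=15 / 100)
for name, value in effect.items():
    print(name, round(value, 2))
```

Rounded to two decimals, this reproduces the figures in the example: ARR .05, RR .75, RRR .25, and NNT 20.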

- How precise was the estimate of treatment effect?
- Use confidence intervals (CI) = a range of values reflecting the statistical precision of an estimate (eg. with a 95% CI, 95% of intervals built this way from repeated studies would include the true value)
- CIs narrow as sample size increases, eg. in the last example, with 100 patients per group and 20 pts dying in the control group and 15 in the tx group, the 95% CI for the RRR was -38% to 59%. If 1000 patients were enrolled in each group, with 200 dying in the controls and 150 in the tx group, the 95% CI for the RRR is 9% to 41%.
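
One common way to get such an interval is to build a 95% CI for the relative risk on the log scale and convert it to the RRR. The slides do not say which method produced the quoted intervals, so this is an assumed method with a hypothetical function name. Its results are close to the quoted figures but need not match them exactly.

```python
from math import exp, log, sqrt


def rrr_confidence_interval(events_tx, n_tx, events_ctrl, n_ctrl, z=1.96):
    """Return (lower, point estimate, upper) for the relative risk reduction.

    Uses the usual large-sample standard error of log(relative risk).
    """
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    se_log_rr = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctrl - 1 / n_ctrl)
    rr_low = exp(log(rr) - z * se_log_rr)
    rr_high = exp(log(rr) + z * se_log_rr)
    # RRR = 1 - RR, so the bounds swap places.
    return 1 - rr_high, 1 - rr, 1 - rr_low


for n, ctrl_deaths, tx_deaths in ((100, 20, 15), (1000, 200, 150)):
    low, point, high = rrr_confidence_interval(tx_deaths, n, ctrl_deaths, n)
    print(f"n = {n}: RRR {point:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

The point estimate is 25% in both cases. Only the width of the interval changes with the sample size.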

- If the CI for the RRR crosses 0, it is generally unhelpful in making conclusions
- When is the sample size big enough?
- If the lower boundary of the CI is still clinically significant to you (in + studies)
- (or if the upper CI boundary is not clinically significant in negative studies)

- 1) Use the p value = as the p value decreases below .05, the lower bound of the 95% confidence limit for the RRR rises above 0
- 2) If the standard error (SE) of the RRR is presented, it is easy to calculate the 95% CI as point estimate (RRR) +/- 2 x SE
- 3) Calculate CI yourself or with a statistician

- Can the results be applied to my patient population?
- Were all clinically important outcomes considered? ie. mortality, morbidity, quality of life endpoints
- Are the likely treatment benefits worth the potential harm and costs? ie. what is the patient’s baseline risk if left untreated? (NNT is helpful here)

- Primary guides:
- Was there an independent, blind comparison with a reference standard? (ie. Gold standard)
- Did the patient sample include an appropriate spectrum of patients to whom the diagnostic test will be applied in clinical practice?

- Secondary guides
- Did the results of the test being evaluated influence the decision to perform the reference standard? (ie. verification bias) eg. in PIOPED, only 69% of patients with normal, near-normal, or low-probability V/Q scans went on to pulmonary angiogram, whereas 92% of those with more positive V/Q scans did
- Were the methods for performing the test described in sufficient detail to permit replication?

- Are likelihood ratios for the test results presented or data necessary for their calculation included?
- Likelihood ratio = the likelihood of a given test result in patients with the disease divided by the likelihood of the same result in patients without the disease (eg. for a + test, LR = sensitivity/(1-specificity))

- LR >10 or <.1 generate large and often conclusive changes from pretest to posttest probability
- LR of 5-10 and .1-.2 generate moderate shifts from pretest to posttest probability
- LR of 2-5 and .2-.5 generate small (but sometimes important) changes in probability
- LR of 1-2 and .5-1 generate changes that are generally insignificant

- Use the LR to convert pretest probabilities to posttest probabilities (can use Fagan’s nomogram)
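
The conversion a nomogram performs can also be done by hand: turn the pretest probability into odds, multiply by the LR, and turn the result back into a probability. The sketch below assumes the LRs are derived from sensitivity and specificity. The function names and all numbers are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive test, LR for a negative test)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical test (sensitivity 90%, specificity 85%) and pretest probability 30%.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.85)
print("after a + test:", round(posttest_probability(0.30, lr_pos), 2))
print("after a - test:", round(posttest_probability(0.30, lr_neg), 2))
```

With these made-up figures the positive LR is about 6 and the negative LR about .12. A positive test moves a 30% pretest probability to roughly 72%, and a negative test moves it to roughly 5%.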

- Will the reproducibility of the test result and its interpretation be satisfactory in my setting?
- Are the results applicable to my patient?
- Will the results change my management?
- Will patients be better off as a result of the test?

- 1st – what is the study design (RCT, cohort, case control, case series, etc)
- Most important is that there is an appropriate control population

- Were the exposures and outcomes measured in the same way in the groups being compared? (minimize recall/interviewer bias)
- Was follow-up sufficiently long and complete?
- Is the temporal relationship correct?
- Is there a dose response gradient?

- How strong is the association between exposure and outcome? ie. relative risk (RR >1 = increase in risk associated with exposure; RR <1 = decrease in risk associated with exposure)
- How precise is the estimate of risk? Ie. CI

- Are the results applicable to my practice?
- What is the magnitude of the risk?
- Should I attempt to stop the exposure?

- Did the overview address a focussed clinical question?
- Were the criteria used to select articles for inclusion appropriate? (these should be stated in the paper)
- Is it unlikely that important, relevant studies were missed? (a thorough search limits publication bias, ie. the higher likelihood for studies with positive results to be published)
- Was the validity of the included studies appraised? (peer review does not guarantee the validity of published research)

- Were assessments of studies reproducible? (better if there are more reviewers who are deciding which articles to include)
- Were the results similar from study to study? (can use “tests of homogeneity” statistical analysis)

- What are the overall results of the overview? (are studies weighted according to their size?)
- There should be a summary measure which clearly conveys the practical importance of the result – eg. RRR, LR, NNT etc.
- How precise were the results? CI still very helpful
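
One common way to weight studies and test homogeneity is fixed-effect inverse-variance pooling with Cochran's Q. The slides do not name a method, so this choice is an assumption, and the function name and the three study results are hypothetical.

```python
from math import exp, log, sqrt


def pool_fixed_effect(log_rrs, standard_errors):
    """Inverse-variance pooled log relative risk, its SE, and Cochran's Q."""
    weights = [1 / se ** 2 for se in standard_errors]  # precise studies weigh more
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    return pooled, pooled_se, q


# Hypothetical studies: relative risks with the standard errors of their logs.
relative_risks = [0.75, 0.80, 0.70]
standard_errors = [0.31, 0.10, 0.20]

pooled, pooled_se, q = pool_fixed_effect([log(rr) for rr in relative_risks], standard_errors)
low, high = exp(pooled - 1.96 * pooled_se), exp(pooled + 1.96 * pooled_se)
print(f"pooled RR {exp(pooled):.2f}, 95% CI {low:.2f} to {high:.2f}")
print(f"Cochran's Q = {q:.2f} on {len(relative_risks) - 1} degrees of freedom")
```

Q is compared against a chi-square distribution with (number of studies - 1) degrees of freedom. A large Q suggests the results differ from study to study by more than chance, in which case a single pooled estimate should be read with caution.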

- Can the results be applied to my patient care? (subgroup analysis should be critiqued closely)
- Were all clinically important outcomes considered? (a clinical decision will require considering all outcomes, both good and bad)
- Are the benefits worth the harms and costs?