Reproducibility of the QUOROM Checklist: Are Meta-Analyses in Good Hands?
Introdução à Medicina II, Class 17, 27 May 2008
Introduction - What is a Meta-Analysis?
A meta-analysis is a review in which bias has been reduced by the systematic identification, appraisal, synthesis and statistical aggregation of all relevant studies on a specific topic, according to a predetermined and explicit method.
[Diagram: Overview, Systematic Review, Meta-Analysis]
In 1987
In 1987, a survey showed that only 24 of 86 English-language meta-analyses reported on all six areas considered important in a meta-analysis:
• Study Design;
• Control of Bias;
• Combinability;
• Statistical Analysis;
• Sensitivity Analysis;
• Application of Results.
In 1992
In 1992 this survey was updated with 78 meta-analyses, and the researchers noted that methodology had clearly improved since their first survey. However, improvements were still needed in:
• Searches of the literature;
• Quality evaluations of trials;
• Synthesis of the results.
In 1999
So, in 1999, a group of researchers created the Quality of Reporting of Meta-Analyses (QUOROM) Statement to improve and standardise reporting. The QUOROM Statement, which includes a checklist and a trial flow diagram, describes the preferred way to present the different sections of a report of a meta-analysis. It is organized into 21 headings and subheadings.
QUOROM
The number of published meta-analyses has clearly increased over time. According to one study, after the QUOROM Statement the estimated mean quality score of the reports increased from 2.8 (95% CI: 2.3-3.2) to 3.7 (95% CI: 3.3-4.1), representing an estimated improvement of 0.96 (95% CI: 0.4-1.6; p = 0.0018, two-sided t-test). However, the QUOROM group itself acknowledges that the checklist requires continued research in order to improve the quality of meta-analyses.
Reproducibility
But what is reproducibility? Why is it so important?
Reproducibility is one of the main principles of the scientific method. It refers to the ability of a test or experiment to be accurately reproduced by someone else working independently.
Reproducibility
The lack of reproducibility can have major consequences:
• A failure of reproducibility will most likely lead to heterogeneous results;
• At a clinical level, if a diagnostic test is not reproducible, there is a risk of a patient being wrongly diagnosed;
• Non-reproducible items in a checklist can reduce its credibility and, consequently, that of the meta-analyses that used it as a model.
Aims
The question we want to answer is whether the QUOROM Checklist is a reproducible method for evaluating meta-analyses.
Primary Aim: Evaluate the degree of reproducibility of the QUOROM Checklist.
Aims
Secondary Aims:
• Specify which points of the QUOROM Checklist are less reproducible;
• Verify whether reproducibility differs between the evaluation of meta-analyses from low Impact Factor journals and from high Impact Factor journals.
Methods - Selection of Studies
Our target population was meta-analyses. We needed a sample of considerable size, so we decided to select a total of 52. Our inclusion criteria were:
• The article was published in a journal in a medical subject area;
• The article was published in a journal with impact factor ≤ 2 or ≥ 8;
• The article reports a meta-analysis;
• The article was published in the last three years (2005-2008);
• The full text was accessible online.
Methods - Selection of Studies
First, we selected 40 journals using a stratified sampling method. From the journals in ISI Web of Knowledge that fit our criteria, we selected:
• Low IF journals, 0 < IF ≤ 2 (1234 journals): 20 journals;
• High IF journals, IF ≥ 8 (82 journals): 20 journals.
IF: Impact Factor
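The stratified draw of journals could be sketched as follows. This is only an illustration under assumptions: the function name `sample_journals`, the seed and the synthetic journal list are hypothetical, and the slides do not say which tool was actually used for the random selection.

```python
import random

def sample_journals(journals, n_per_stratum=20, seed=2008):
    """Draw journals at random from each impact-factor stratum.

    `journals` is a list of (name, impact_factor) pairs. The strata follow
    the thresholds on the slide: 0 < IF <= 2 (low) and IF >= 8 (high).
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    low = [j for j in journals if 0 < j[1] <= 2]
    high = [j for j in journals if j[1] >= 8]
    return {
        "low": rng.sample(low, min(n_per_stratum, len(low))),
        "high": rng.sample(high, min(n_per_stratum, len(high))),
    }

# Synthetic data with the stratum sizes reported on the slide (1234 and 82),
# plus a few mid-range journals that belong to neither stratum.
journals = (
    [(f"low-{i}", 1.0) for i in range(1234)]
    + [(f"high-{i}", 9.0) for i in range(82)]
    + [(f"mid-{i}", 5.0) for i in range(10)]
)
picked = sample_journals(journals)
```

Sampling the same number of journals from each stratum keeps the two groups comparable even though the low IF stratum is far larger.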
Methods - Selection of Studies
After this, we proceeded to the selection of the meta-analyses, using a multi-stage sampling method. All of the journals' articles that met the inclusion criteria described previously were extracted from each stratum:
• Low IF journals: 48 meta-analyses;
• High IF journals: 219 meta-analyses.
We repeated the whole journal selection process until we had 26 meta-analyses in each pool:
• Pool n.1: 26 low IF meta-analyses;
• Pool n.2: 26 high IF meta-analyses.
Methods - Selection of Studies
The Impact Factor of the journal from which each meta-analysis came, the name of the journal, the authors and the year of publication were recorded in a database, which was kept concealed until the evaluation with the checklist was concluded. It was used only at the end, to find out whether reproducibility and Impact Factor were related.
Methods - Selection of Studies
Finally, we mixed all the articles into a single pool, concealing the stratum from which each one came:
• Pool n.1 (26 low IF meta-analyses) + Pool n.2 (26 high IF meta-analyses) → Pool n.3 (52 meta-analyses).
Methods - Study Procedures
Before the analysis we established some rules that helped us interpret each item of the checklist:
• If a certain item was present in the meta-analysis, but not in the place the checklist specifies, we would not consider the item present;
• When an item had more than one point, we would only consider it present if the meta-analysis addressed more than half of the points;
Methods - Study Procedures
• For item (e), we would give more weight to the point that ensures the methods can be replicated;
• For item (o), the meta-analysis had to include a diagram describing the trial flow for the item to be considered present.
Methods - Study Procedures
The evaluation consisted of assigning a number to each item:
• 1 to items that were covered in the meta-analysis;
• 0 to items that were not.
These data were entered into SPSS.
• 1st Evaluation: 4 articles per student;
• The articles were mixed again;
• 2nd Evaluation: 4 articles per student.
No student could analyse the same article twice, so each student analysed 8 different articles.
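One way to produce such an allocation is sketched below. It is a hypothetical reconstruction, not the procedure the class actually followed: the figure of 13 students is an assumption (it matches 52 articles at 4 per student and the number of names that appear to be listed at the end of the deck), and the reshuffle-until-valid strategy is just one simple option.

```python
import random

def assign_rounds(n_articles=52, n_students=13, per_student=4, seed=17):
    """Deal articles to students in two rounds with no repeats per student.

    Returns two lists of sets; element i holds the article ids given to
    student i in the first and second evaluation respectively.
    """
    if n_articles != n_students * per_student:
        raise ValueError("articles must divide evenly among students")
    rng = random.Random(seed)
    articles = list(range(n_articles))

    def deal():
        shuffled = articles[:]
        rng.shuffle(shuffled)
        return [
            set(shuffled[i * per_student:(i + 1) * per_student])
            for i in range(n_students)
        ]

    first = deal()
    # Reshuffle until no student receives an article they already rated.
    for _ in range(100_000):
        second = deal()
        if all(not (a & b) for a, b in zip(first, second)):
            return first, second
    raise RuntimeError("no valid second round found")

first, second = assign_rounds()
```

With this layout every article is rated exactly twice, by two different students, which is what the pairwise agreement analysis requires.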
Methods - Study Procedures
Thus, our study can be classified as an observational, cross-sectional study, whose methods are characteristic of a survey and whose purpose is to study reproducibility.
Methods - Variables Description
Our variables are:
• The current Impact Factor of the journals from which we randomly selected the articles;
• The year of publication of the articles;
• The Impact Factor, in the year of publication, of the journals from which we randomly selected the articles;
• The classification of each item of the checklist: we considered thirty-six categorical variables, each of which can take one of two numerical codes, 1 or 0. These were the expected outcomes of our research.
Methods - Variables Description
From the classification of the items we derived other variables:
• Summation of the present items by observer 1;
• Summation of the present items by observer 2;
• Average of the two summations;
• Difference between the summations;
• Absolute value of the difference between the summations;
• Number of agreements between the two observers by article.
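The derived variables listed above can be computed per article as in this sketch. The function and key names are illustrative, and the 18-item vectors are invented data, not ratings from the study.

```python
def derived_variables(obs1, obs2):
    """Per-article summary of two observers' 0/1 item ratings."""
    if len(obs1) != len(obs2):
        raise ValueError("both observers must rate the same items")
    s1, s2 = sum(obs1), sum(obs2)
    diff = s1 - s2
    return {
        "sum_obs1": s1,
        "sum_obs2": s2,
        "mean_sum": (s1 + s2) / 2,
        "diff": diff,
        "abs_diff": abs(diff),
        # Items on which both observers gave the same code (1-1 or 0-0).
        "agreements": sum(a == b for a, b in zip(obs1, obs2)),
    }

# Invented example with 18 items: the sums are equal, yet the observers
# disagree on four items.
obs1 = [1, 1, 0, 0] + [1] * 14
obs2 = [0, 0, 1, 1] + [1] * 14
summary = derived_variables(obs1, obs2)
```

The example shows why the number of agreements is recorded alongside the difference: two observers can reach the same total while disagreeing on individual items.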
Methods - Statistical Analysis
Global Reproducibility
• The summations of the two observers were compared using the ICC (Intraclass Correlation Coefficient) method.
• Then we represented the 95% Limits of Agreement of the “difference between the summations” in a scatterplot:
• For that, we had to check with a histogram that this variable followed a normal distribution and, if so, calculate its mean and standard deviation.
• We also compared the two variables “Absolute value of difference between the summations” and “Number of agreements between the two observers by article” in a scatterplot.
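A minimal sketch of the two global measures follows. Two things are assumptions rather than facts from the slides: the limits of agreement are taken as mean ± 1.96 SD of the differences, and the ICC is computed in its one-way random-effects form, since the deck does not say which ICC model was selected in SPSS. The scores are invented.

```python
from statistics import mean, stdev

def limits_of_agreement(differences, z=1.96):
    """Limits of agreement as mean +/- z * SD of the paired differences."""
    m = mean(differences)
    s = stdev(differences)  # sample standard deviation
    return m - z * s, m + z * s

def icc_oneway(pairs):
    """One-way random-effects ICC for two ratings per article.

    `pairs` is a list of (score_obs1, score_obs2) tuples.
    """
    n, k = len(pairs), 2
    grand = mean(x for pair in pairs for x in pair)
    row_means = [mean(pair) for pair in pairs]
    ms_between = k * sum((rm - grand) ** 2 for rm in row_means) / (n - 1)
    ms_within = sum(
        (x - rm) ** 2 for pair, rm in zip(pairs, row_means) for x in pair
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented summations for five articles (observer 1, observer 2).
scores = [(14, 15), (10, 12), (17, 16), (12, 12), (15, 18)]
diffs = [a - b for a, b in scores]
lower, upper = limits_of_agreement(diffs)
icc = icc_oneway(scores)
```

The ICC rises when articles differ widely from one another, whereas the limits of agreement depend only on the paired differences, so the two measures can tell different stories.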
Methods - Statistical Analysis
Agreement in each Item of the Checklist (reproducibility of each item)
We made eighteen crosstabs to calculate:
• The proportion of agreement and its 95% confidence interval*;
• The positive proportion of agreement;
• The negative proportion of agreement;
• The kappa coefficient.
* We used the normal distribution, except for items whose confidence limit exceeded one, for which we used the binomial distribution.
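The per-item statistics can be illustrated from a single 2×2 crosstab, as sketched here. The formulas are the standard ones for proportion of agreement, positive and negative agreement and Cohen's kappa; whether SPSS was configured identically is not stated in the slides, and the cell counts are invented. The sketch uses only the normal-approximation interval and does not implement the binomial fallback mentioned in the footnote.

```python
from math import sqrt

def agreement_stats(a, b, c, d, z=1.96):
    """Agreement measures for one checklist item.

    a: both observers coded 1      b: observer 1 coded 1, observer 2 coded 0
    c: observer 1 coded 0, observer 2 coded 1      d: both coded 0
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    po = (a + d) / n  # observed proportion of agreement
    half_width = z * sqrt(po * (1 - po) / n)  # normal approximation
    pos_den = 2 * a + b + c
    neg_den = 2 * d + b + c
    # Agreement expected by chance from the marginal totals.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "pa": po,
        "pa_ci": (po - half_width, po + half_width),
        "positive_pa": 2 * a / pos_den if pos_den else None,
        "negative_pa": 2 * d / neg_den if neg_den else None,
        "kappa": (po - pe) / (1 - pe) if pe < 1 else None,
    }

# Invented counts for 52 articles.
stats = agreement_stats(a=20, b=5, c=3, d=24)
```

Note that when the observers never both code 0 (d = 0) but disagree at least once, the negative proportion of agreement is zero regardless of how high the overall agreement is.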
Methods - Statistical Analysis
Relation between IF and Reproducibility
• For this analysis we did not use the current impact factor, but the one in the year of publication of each article*.
• We made two scatterplots to see whether there was a correlation between:
• The “difference between the summations” and the impact factor;
• The “number of agreements between the two observers by article” and the impact factor.
* As the ISI Web of Knowledge database had not yet been updated with the 2007 impact factors, we used the 2006 impact factor for articles published in 2007.
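The correlation check can be sketched as below, assuming the coefficient used was Pearson's r (the slides report r and p but do not name the coefficient). The sketch computes r only; the p-value would need a t distribution and is left out. The data are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

# Invented impact factors and numbers of agreements for six articles.
impact_factors = [0.8, 1.5, 1.9, 8.4, 12.0, 25.1]
agreements = [14, 16, 13, 15, 14, 15]
r = pearson_r(impact_factors, agreements)
```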
Results
Global analysis of the QUOROM checklist
• We analysed 52 meta-analyses, whose scores had a mean of 13.97 and a standard deviation of 2.95.
• ICC = 0.729; 95% CI = [0.571; 0.835].
• The ICC method revealed that 72.9% of the total variance is explained by the variance between the articles.
Results
• Limits of agreement (L.A.): [-4.934; 4.434];
• 95% of the cases were within this interval.
[Histogram: differences between the summations]
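The reported limits can be used to back out the mean and standard deviation of the differences. The sketch assumes the limits were computed as mean ± 1.96 SD; the slides do not state the multiplier, so the recovered values are approximate.

```python
def mean_sd_from_limits(lower, upper, z=1.96):
    """Recover mean and SD of the differences from limits of agreement.

    Assumes lower = mean - z * sd and upper = mean + z * sd.
    """
    mean_diff = (lower + upper) / 2
    sd_diff = (upper - lower) / (2 * z)
    return mean_diff, sd_diff

# Limits of agreement reported on the slide.
mean_diff, sd_diff = mean_sd_from_limits(-4.934, 4.434)
```

Under that assumption the mean difference is about -0.25 and the standard deviation about 2.39 checklist points.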
Results
[Scatterplot: comparison between the number of agreements and the absolute value of the difference]
Results
Analysis of each item of the QUOROM checklist
• Although items (h) and (r) have a high proportion of agreement (PA), their negative proportion of agreement is zero, because the two observers never agreed in the negative (observer 1 considered these items present in all articles). Because one of the variables was constant, kappa was not applicable.
• Kappa and the proportion of agreement vary in approximately the same way. However, some items show a considerable disparity, such as item (p).
• The positive proportion of agreement is higher than the negative, which means that the observers agreed more often in the positive than in the negative.
• The item with the highest proportion of agreement was (q). It was the only item on which the observers always agreed with each other (100% PA).
• The item with the lowest proportion of agreement was (k). It also has the lowest kappa, i.e. only 5% of the agreement is not due to chance.
Results
Correlation between impact factor and reproducibility
• No correlation was found between either of the two reproducibility variables and Impact Factor: in neither scatterplot did the points show a preferred orientation.
• r = 0.108; p = 0.448
• r = -0.002; p = 0.986
Discussion
Global analysis of the QUOROM checklist
• The ICC we obtained can be seen as good, but it has to be interpreted carefully: it could be inflated by the considerable variance (heterogeneity) of our results.
• The limits of agreement are considerably wide, which leads us to conclude that the global reproducibility of the QUOROM checklist is weak.
• We also note that the mean of the “difference between the summations” is lower than zero.
Discussion
Difference = Sum of 1st Evaluation - Sum of 2nd Evaluation
• This means that there was a systematic error during the study: in general, the summation of the 2nd evaluation was higher than that of the 1st evaluation, which explains the negative mean.
• This error may be related to the fact that, during the second analysis of the articles, the evaluators applied the checklist with greater confidence, ease and skill. They could therefore find some items in the meta-analyses that were not found in the first observation.
Discussion
• In the scatterplot we can see that some points lie below the line, which means that, although their “diff” values are low, they do not reflect true item-by-item agreement.
• In other words, a pair of observers whose summations had the same value did not necessarily agree on the same items of the checklist.
• So the limits of agreement could be even wider.
Discussion
Analysis of each item of the QUOROM checklist
Item (q)
• Quantitative data synthesis in the Results section;
• We had expected this to be one of the least reproducible items of the list because it includes many sub-items;
• Highest P.A.;
• Objective and explicit item; easy to identify.
Discussion
Item (a) - Title
• Almost total agreement;
• A simple item, easy to understand.
Items (h) and (r)
• Introduction and Discussion, respectively;
• Almost total P.A.;
• Essential in articles, so it is easy to agree about their presence.
Discussion
Item (e)
• Review Methods in the Abstract section;
• Low P.A.;
• Many sub-items;
• Includes the sub-item “quantitative data synthesis in sufficient detail to permit replication”.
Discussion
Item (m)
• Study characteristics in the Methods section;
• Low P.A.;
• Not as clear as desirable: “participants’ characteristics”;
• Many sub-items: “how clinical heterogeneity was assessed”.
We also think that the observers may have been confused by the existence of two items with the same name, Study characteristics: one in the Methods section and another in the Results section.
Discussion
Item (k)
• Validity assessment in the Methods section;
• Lowest P.A.;
• Not an explicit item;
• The value of kappa is so low that the classification seems to have been made by chance.
The positive P.A. was always higher than the negative. This tells us that we were more certain when we said yes than when we said no. The existence of many sub-items led to doubts when classifying items for which only some of the sub-items were present.
Discussion
Correlation between impact factor and reproducibility
• Contrary to what we expected, there was no correlation between Impact Factor and reproducibility. We thought that the analyses of articles from high Impact Factor journals would show more concordance between our two reviewers because, in our view, those articles had undergone a stricter review. Thus, they would probably satisfy more items of the QUOROM Checklist.
• However, this was not confirmed.
Conclusion
• The QUOROM checklist is reasonably reproducible.
• However, some items should be re-evaluated, and we propose a change in order to achieve a better degree of reproducibility.
• No correlation was found between reproducibility and Impact Factor.
Class 17
Ana Elisabete Costa, Ana Rita Miranda, Beatriz Carvalho, Isabel Bravo, João Moura, Mariana Pereira, Miguel Teles, Pedro Marcos, Sara Costa, Sara Leite, Sílvia Paredes, Tatiana Gomes, Valter Moreira
Professor Cristina Santos