
SPINE Volume 34, Number 18, pp 1929-1941

©2009, Lippincott Williams & Wilkins

2009 Updated Method Guidelines for Systematic Reviews in the Cochrane Back Review Group
Andrea D. Furlan, MD, PhD,* Victoria Pennick, RN, MHSc,*
Claire Bombardier, MD, FRCP,* and Maurits van Tulder, PhD,
from the Editorial Board of the Cochrane Back Review Group

Study Design. Method guidelines for systematic reviews of trials of treatments for neck and back pain.
Objective. To help review authors design, conduct and
report systematic reviews of trials in this field.
Summary of Background Data. In 1997, the Cochrane
Back Review Group published Method Guidelines for Systematic Reviews, which was updated in 2003. Since then,
new methodologic evidence has emerged and standards
have changed. Coupled with the upcoming revisions to
the software and methods required by The Cochrane Collaboration, it was clear that revisions were needed to the
existing guidelines.
Methods. The Cochrane Back Review Group editorial
and advisory boards met in June 2006 to review the relevant new methodologic evidence and determine how it
should be incorporated. Based on the discussion, the
guidelines were revised and circulated for comment. As
sections of the new Cochrane Handbook for Systematic
Reviews of Interventions were made available, the guidelines were checked for consistency. A working draft was
made available to review authors in The Cochrane Library
2008, issue 3.
Results. The final recommendations are divided into 7
categories: objectives, literature search, inclusion criteria,
risk of bias assessment, data extraction, data analysis,
and updating your review. Each recommendation is classified into minimum criteria (mandatory) and further
guidance (optional). Instead of recommending Levels of Evidence, this update adopts the GRADE approach to determine the overall quality of the evidence for important patient-centered outcomes across studies and includes a new section on updating reviews.
Conclusion. Citations of previous versions of the method guidelines in published scientific articles (1997: 254 citations; 2003: 209 citations, searched February 10, 2009) suggest that others may find these guidelines useful to plan, conduct, or evaluate systematic reviews in the field of spinal disorders.
Key words: systematic reviews, meta-analysis, Cochrane Collaboration, method guidelines, back pain, neck pain. Spine 2009;34:1929-1941

From the *Institute for Work and Health, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada; Toronto Rehabilitation Institute, Toronto, Ontario, Canada; and VU University, Amsterdam, the Netherlands.
The manuscript submitted does not contain information about medical device(s)/drug(s).
No funds were received in support of this work. No benefits in any form have been or will be received from a commercial party related directly or indirectly to the subject of this manuscript.
Supported by operational funds from The Institute for Work & Health, Canadian Institutes of Health Research (CIHR), Canadian Agency for Drugs and Technologies in Health to Cochrane Back Review Group.
These guidelines expand on the methodology outlined in: Bombardier C, van Tulder MW, Pennick V, Bronfort G, Corbin T, Deyo RA, de Bie R, Furlan AD, Guillemin F, Malmivaara A, Peul W, Schoene M, Shekelle PG, Tomlinson G. Cochrane Back Group. About The Cochrane Collaboration (Cochrane Review Groups (CRGs)) 2008, Issue 3. Art. No.: BACK. Copyright Cochrane Collaboration, reproduced with permission.
The following are the editorial board members of the Cochrane Back Review Group: Co-editors: Claire Bombardier and Maurits van Tulder; Managing editor: Victoria Pennick; Editors: Gert Bronfort, Rob de Bie, Terry Corbin, Rick Deyo, Andrea Furlan, Francis Guillemin, Antti Malmivaara, Wilco Peul, Mark Schoene, Paul Shekelle, George Tomlinson.
Address correspondence and reprint requests to Andrea D. Furlan, Institute for Work & Health, 481 University Av, Suite 800, Toronto, Ontario, Canada; E-mail: afurlan@iwh.on.ca

The current interest in evidence-based health care has led to an extensive increase in the publication of systematic
reviews. In 1999, the QUOROM statement was developed to improve the standards for the report of systematic reviews.1 Several leading medical journals (e.g.,
BMJ, JAMA, Lancet) have adopted the QUOROM recommendations for the reporting of abstract, introduction, methods, results, and discussion sections of systematic reviews. However, it has been shown that many
reviews in the field of back and neck pain are of low
methodologic quality and that their reports often lack
essential components.2-4
In 1997, the Cochrane Back Review Group (CBRG)
Editorial Board published method guidelines for systematic reviews in the field of spinal disorders.5 These guidelines were updated in 2003 (reference 6) and addressed the main
steps in conducting a systematic review: literature search,
inclusion criteria, methodologic quality, data extraction,
and data analysis. The purpose of the method guidelines
was to offer guidance to researchers preparing, conducting, or reporting a systematic review and to readers evaluating these reviews. The guidelines were operationalized specifically for the field of back and neck pain. They
included certain minimum criteria for which either empirical evidence existed that confirmed they were associated with bias in systematic reviews, or there was consensus among the CBRG Editorial Board that they were
likely to be associated with bias. Further guidance was
presented to enhance the quality of systematic reviews.
The CBRG was established in 1998. Forty-six systematic reviews and 8 protocols for reviews of various treatments for spinal disorders are published in The Cochrane Library 2008, issue 4. Many of these reviews are
copublished in Spine (more information available at:
www.cochrane.iwh.on.ca). Because new evidence on review methodology has emerged since 2003, new guidance was introduced in the February 2008 version of the Cochrane Handbook for Systematic Reviews of Interventions,7 and the CBRG has acquired more experience
in preparing, conducting, and updating systematic Cochrane reviews, the Editorial Board felt it was time to
update the 2003 method guidelines.
It should be emphasized that these guidelines are not a
gold standard but merely an indication of the current
state-of-the-art of review methods. The method guidelines build on the information provided in The Cochrane
Handbook for Systematic Reviews of Interventions7
available at: http://www.cochrane.org/resources/
handbook/index.htm (accessed September 17, 2008),
rather than replace it. They are useful to plan, conduct,
or evaluate systematic reviews in the field of back and
neck pain within and outside the framework of the
CBRG. The usefulness of the 1997 and 2003 method
guidelines is reflected in the number of citations in published scientific articles: 254 and 209 citations, respectively (ISI Web of Science cited reference search, February 10, 2009). Please note that since the Cochrane
Handbook for Systematic Reviews of Interventions is
updated on a regular basis, readers are advised to consult
the most current version before starting or updating their
reviews.
Materials and Methods
In June 2006, the editorial and advisory boards of the CBRG
met in Amsterdam at the VIII International Forum for Primary
Care Research on Low-Back Pain to discuss the update. They
recognized that some challenging topics in the 2003 method
guidelines needed revision (e.g., levels of evidence, clinical relevance of the results, and recommendations for updates).
After the meeting, a draft of the revised method guidelines was
circulated among the editors. Each editor was given a chance to
comment on additions, deletions, or other changes that were
made since the last update. Of the 13 editors, 8 participated in this
process. Feedback was incorporated into a second draft of the
guidelines and circulated among all CBRG editors and advisory
board members for comments. The second draft was presented
and discussed at the IX International Forum for Primary Care
Research on Low-Back Pain in Palma de Mallorca, Spain, in October 2007. A working draft was made available to review authors in The Cochrane Library 2008, issue 3. Publication of the
final version was delayed to be able to incorporate the new Cochrane Handbook for Systematic Reviews of Interventions,
which covered the new GRADE approach and the new Review
Manager 5 and GRADEprofiler software.

Method Guidelines
Review Objective. Reviews with the Cochrane Back Review
Group start with a clinically relevant question that is clearly
defined in the objectives. The objectives should outline the intervention and participants. The Editorial Board recommends
that reviews focus specifically on (sub)acute or chronic back or
neck pain. It is also recommended that reviews focus separately
on nonspecific back or neck pain, sciatica or radicular symptoms, or specific causes (e.g., spinal stenosis, scoliosis). In addition, review authors should outline the comparisons that will
be evaluated in the review (Figure 1).

Literature Search
Minimum Criteria. One of the main principles underpinning a
systematic review is to include all available evidence. Therefore, once the research question has been defined, the literature
search is the next, very important step in conducting a systematic review. The starting point for the literature search is to
decide which articles should be retrieved, ensuring that as many
relevant trials as possible are identified. The search strategy
should relate directly to the research question(s) of the review
at issue and should be based on the inclusion criteria with
respect to study design, participants, interventions, and outcomes (see Inclusion Criteria section). Searching only MEDLINE
is clearly insufficient since it has been shown that in general,
approximately only half of the available RCTs will be identified
if MEDLINE is the only database searched.8 It has been suggested that at least MEDLINE and EMBASE must be used to
ensure a comprehensive literature search, because overlap between these databases is small.9-11 Especially in the field of low
back pain, EMBASE has been shown to retrieve more clinical
trials than MEDLINE.12
Therefore, we recommend the following as a minimum
search strategy:
1. A computer-aided search of the MEDLINE and EMBASE databases since their inception for new reviews
and since the date of the previous search for updates of
reviews.7,8 The highly sensitive search strategies for retrieval of reports of controlled trials should be run in conjunction with a specific search for spinal disorders and the
intervention at issue (Appendix 1, Supplemental Digital
Content 1, available at: http://links.lww.com/BRS/A373
and Appendix 2, Supplemental Digital Content 2, available
at: http://links.lww.com/BRS/A374). It has been demonstrated that simple search strategies (i.e., strategies with a
few terms) are not adequate for systematic reviews.13
2. A search of the Cochrane Central Register of Controlled
Trials (CENTRAL) that is included in the most recent
issue of The Cochrane Library.
3. A search of the CBRG Trials Register by contacting the
editorial base of the Cochrane Back Review Group.
4. Screening references listed in relevant systematic reviews
and identified RCTs.
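The structure of item 1, a trial-design filter run in conjunction with terms for the spinal disorder and the intervention, can be sketched as three OR-blocks joined with AND. This is only an illustration of the Boolean structure: the function name and the short term lists are hypothetical placeholders, not the strategies in Appendix 1 or 2, and a list this short would fall under the "simple search strategies" that the text describes as inadequate.

```python
def combine_search_blocks(trial_filter, condition_terms, intervention_terms):
    """Join three lists of search terms as (OR-block) AND (OR-block) AND (OR-block)."""
    blocks = (trial_filter, condition_terms, intervention_terms)
    if any(len(block) == 0 for block in blocks):
        raise ValueError("each block needs at least one search term")
    # Terms inside a block are alternatives (OR); the blocks must all match (AND).
    return " AND ".join("(" + " OR ".join(block) + ")" for block in blocks)


# Hypothetical, heavily abbreviated term lists used only to show the structure.
trial_filter = ["randomized controlled trial", "controlled clinical trial", "randomly"]
condition_terms = ["back pain", "neck pain", "sciatica"]
intervention_terms = ["traction"]

query = combine_search_blocks(trial_filter, condition_terms, intervention_terms)
print(query)
```

In practice each block would hold the full set of database-specific terms, and the exact syntax depends on the search interface being used.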
The search strategy should not be limited by language.
Unless they have easy access to a health sciences librarian who
is experienced in searching electronic databases, we suggest that
review authors contact the CBRG (Cochrane@iwh.on.ca) for assistance in developing and conducting the literature search. We
recommend that 2 review authors independently apply the inclusion criteria to select the potentially relevant trials from the titles,
abstracts, and keywords of the references retrieved by the literature search. Articles selected in this first round, articles for which
disagreement exists, and articles for which title, abstract, and keywords provide insufficient information for a decision should be
obtained so that the final decision about whether they meet the
inclusion criteria is based on the full paper. A consensus method
should be used to select the potentially relevant trials at both
steps. If disagreements persist, a third review author should be
consulted.
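The first screening round described above can be sketched as a small decision rule. The three decision labels and the function name are assumptions made for illustration; the rule itself follows the text: the full paper is obtained unless both review authors independently judge the reference as clearly not meeting the inclusion criteria.

```python
VALID_DECISIONS = {"include", "exclude", "unclear"}


def needs_full_text(decision_a, decision_b):
    """Return True if the full paper should be obtained after title/abstract screening.

    Each argument is one review author's independent decision:
    "include", "exclude", or "unclear" (insufficient information).
    """
    for decision in (decision_a, decision_b):
        if decision not in VALID_DECISIONS:
            raise ValueError(f"unknown screening decision: {decision!r}")
    # Only a reference that both authors exclude is dropped at this stage.
    # Agreement to include, any disagreement, and any "unclear" all lead to full text.
    return not (decision_a == "exclude" and decision_b == "exclude")


print(needs_full_text("include", "exclude"))
print(needs_full_text("exclude", "exclude"))
```

The final inclusion decision on the full paper is still made by consensus, with a third review author consulted if disagreement persists.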
Reviews should be submitted within a year of the latest search
date. Because some reviews can take longer than a year to complete, the CBRG recommends that the authors update the search when the review is submitted to determine if important trials have
been published since the last search. The authors may contact the
CBRG Trials Search Coordinator for assistance. The review authors can decide if it is feasible to include newly identified trials in
the current review, or a future update.
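A check on the age of the search at submission can be sketched as follows. Treating "within a year" as 365 days is an assumption made here for illustration, as are the function name and the example dates.

```python
from datetime import date


def search_is_current(last_search, submission, max_days=365):
    """Return True if the latest search is no older than max_days at submission.

    max_days=365 is an assumed reading of "within a year of the latest search date".
    """
    if submission < last_search:
        raise ValueError("submission date precedes the last search date")
    return (submission - last_search).days <= max_days


# Hypothetical dates: the second search would need to be updated before submission.
print(search_is_current(date(2008, 1, 15), date(2008, 12, 1)))
print(search_is_current(date(2007, 6, 1), date(2008, 12, 1)))
```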
If one of the review authors is a (co-)author of one of the
potentially relevant trials, this person should not be involved in
any decisions about inclusion of the trial at issue.
Further Guidance. Depending on the intervention at issue, and
if available, specific databases should be searched, for example:
• Mantis (Manual Alternative and Natural Therapy Index System) for chiropractic interventions (http://www.chiroaccess.com/MANTISDatabaseOverview.html)
• Complementary and Alternative Medicine Specialist Library (from the National Library of Health, UK) for complementary medicine interventions (http://www.library.nhs.uk/cam/)
• PsycINFO for psychological interventions (http://www.apa.org/psycinfo/)
• PEDro (Physiotherapy Evidence Database) for physiotherapy interventions (http://www.pedro.fhs.usyd.edu.au/)
• Cumulative Index of Nursing and Allied Health (CINAHL) for allied health interventions (http://www.ebscohost.com/cinahl/)
• Index to Chiropractic Literature (http://www.chiroindex.org/)

Other search strategies are recommended, but are not essential, such as:
• Identification of ongoing trials. The CBRG Trials Search Coordinator (TSC) will identify ongoing trials that are registered on the WHO International Clinical Trials Registry Platform (http://www.who.int/ictrp/en/); these should be included in the reference section of the review (ongoing studies) in addition to those identified by the review authors through their own contacts.
• Personal communication with content experts in the field and with authors of identified RCTs.14 It is up to the discretion of the review authors to identify who the experts are on a specific topic and to describe the process and results of the contact in the review.
• Citation tracking of the identified RCTs.15 The value of using citation tracking has not yet been established, but it may be especially useful to identify additional studies of topics that are poorly indexed in MEDLINE and EMBASE.
• The Editorial Board recommends using the search strategy suggested by Golder et al16 to find reports of adverse events of their interventions, and the search strategy suggested by Furlan et al, if review authors plan to include observational studies.17 Contact a health sciences librarian or the CBRG Trials Search Coordinator for help in developing these search strategies.

Inclusion Criteria
Minimum Criteria
Study Design. RCTs with clearly reported and appropriate
randomization should be included. If the article only reports
that the trial is a randomized trial or that the participants were
randomly allocated to the intervention groups without a clear
description of the method of randomization, the authors
should be contacted for further information. Examples of appropriate randomization techniques are: computer-generated random sequence, sequentially-ordered vials, telephone call to
a central office, and preordered list of treatment assignments.
Participants. Participants of trials that will be included in
the systematic review should be defined explicitly in terms of
age, gender, type, duration, localization and severity of symptoms, setting, and recruitment procedure. It is particularly important to report if participants with acute (less than 6 weeks),
subacute (6 to 12 weeks), or chronic (12 weeks or more) back or
neck pain are included. It is also important to report if the participants have nonspecific back or neck pain or radiating symptoms.
If there is a reason to collapse the duration of symptoms, the
categories should be (sub)acute (less than 12 weeks) and chronic
(longer than 12 weeks). In most reviews, it is preferable to include
a homogeneous population. However, in some reviews it might be
appropriate to include mixed populations, and the reasons for this
should be given. Mixed populations refer to a population that has
a combination of acute, subacute, or chronic pain, or a population
with both back and neck pain.
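The duration categories above can be sketched as a small classifier. The text assigns 12 weeks to both the upper end of subacute and the start of chronic; the sketch assumes that exactly 12 weeks counts as chronic. The function name and the collapse option are illustrative.

```python
def classify_duration(weeks, collapse=False):
    """Classify symptom duration in weeks as acute, subacute, or chronic.

    Assumption: exactly 12 weeks is treated as chronic ("12 weeks or more").
    With collapse=True, acute and subacute are merged into "(sub)acute".
    """
    if weeks < 0:
        raise ValueError("duration cannot be negative")
    if weeks >= 12:
        return "chronic"
    if collapse:
        return "(sub)acute"
    return "acute" if weeks < 6 else "subacute"


for weeks in (3, 6, 12):
    print(weeks, classify_duration(weeks))
print(8, classify_duration(8, collapse=True))
```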
Interventions. It is recommended that a definition and potential mechanism of action related to prevention or treatment
of back or neck pain of the intervention under study is included
and referenced in the review. The type, intensity, dosage, frequency and duration of both the index and comparison interventions to be included in the review should be explicitly described. If appropriate for the intervention, the skills, training
and experience of the provider should also be included.
Comparisons should include a clear contrast for the index
intervention, so that the independent effects of the intervention
can be assessed. For example, a comparison of traction plus
exercises versus the same exercises alone is a relevant comparison in a review on traction, while a comparison of traction
plus exercises versus spinal manipulation is not a relevant comparison in the same review.
Outcomes. The outcome measures and instruments that will
be included in the review should be explicitly described. Important patient-centered outcomes, as outlined in Deyo et al,18 include symptoms (e.g., pain), overall improvement or satisfaction with treatment, back-specific functional status (e.g., measured with the Roland-Morris Questionnaire, Oswestry Disability Index), well-being (e.g., quality of life measured with the SF-36, SF-12, EuroQol), and disability (e.g., ability to perform activities of daily living, return-to-work status, work absenteeism). Adverse events (intended and nonintended) should always be included in a systematic review of back or neck pain if
they are reported in the original trials. If explicit adverse events
are to be investigated, observational studies reporting on these
adverse events should also be included. Depending on the intervention, specific outcomes may be relevant, for example:
depression for a review of antidepressants, knowledge gain for
a review of patient education, or radiologic outcomes for a
review of surgical intervention.
The timing of measuring outcomes should be explicitly described. Outcomes should be separately reported for at least short-term (closest to 4 weeks) and long-term (closest to 1 year) follow-up.
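Choosing which reported follow-up to extract for each time window can be sketched as picking the measurement closest to the target. Expressing 1 year as 52 weeks, the function name, and the example follow-up times are assumptions; the sketch also does not say how to break a tie between two equally close follow-ups, which review authors would need to decide in advance.

```python
def closest_follow_up(follow_ups_weeks, target_weeks):
    """Return the reported follow-up time (in weeks) closest to the target."""
    if not follow_ups_weeks:
        raise ValueError("the trial reports no follow-up times")
    return min(follow_ups_weeks, key=lambda weeks: abs(weeks - target_weeks))


# Hypothetical trial reporting outcomes at 3, 12, 26, and 48 weeks.
follow_ups = [3, 12, 26, 48]
short_term = closest_follow_up(follow_ups, 4)    # target: 4 weeks
long_term = closest_follow_up(follow_ups, 52)    # target: 1 year, taken as 52 weeks
print(short_term, long_term)
```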
Language. The empirical evidence on excluding trials published in languages other than English is conflicting.19-24 The
Editorial Board recommends including studies published in
languages other than English, for example, by finding native
speakers and meeting with them to assess the risk of bias and extract the data together. However, we acknowledge that it
may not always be feasible and may depend on the time and
resources available. Potential articles retrieved in languages
outside the linguistic skills of the review team (or their local
sources) should be brought to the attention of the CBRG editorial staff, who will try to find translators. If trials published in
other languages are excluded from the review, these trials
should be listed in the section on excluded trials. We strongly
recommend having an international group of (co-) authors
with different language skills involved in a systematic review to
enable the inclusion of trials in languages other than English.
This is particularly recommended for topics where there are
likely to be a significant number of non-English language publications (e.g., the Asian literature on acupuncture), in which
case, we suggest including review authors with the relevant
language skills.
Further Guidance
Study Design. Authors wishing to include studies besides
RCTs (with appropriate and clearly reported randomization
methods) should outline the reasons for this. Examples of other
study designs that are acceptable are:
• RCTs that do not clearly report the method of randomization.
• Quasi-RCTs. Quasi-RCTs may be included if there are fewer than 5 RCTs. Quasi-RCTs are controlled clinical trials that use methods of allocation that are not random and, therefore, may be open to bias. Examples of quasi-randomization techniques are: alternation, birth date, social insurance/security number, date on which participants are invited to participate in the study, and hospital registration number.
Studies without a control group and publications that are
only expert opinion should not be included (Table 1).25
Outcomes. Outcomes of physical examination (range of
motion, spinal flexibility, degrees of straight leg raising, or
muscle strength), care-provider-centered outcomes (e.g., outcome assessor's global improvement), and other outcomes
(medication use, healthcare utilization) may be included as secondary outcomes where appropriate, depending on the aim of
the intervention at issue.

Assessing Risk of Bias


Minimum Criteria. Risk of bias in the studies should be independently assessed by at least 2 review authors. Currently,
there is empirical evidence that inadequate concealment of
treatment allocation,26,27 inadequate double-blinding (of participants and outcome assessors), and a high drop-out rate, or
differences in number or reasons for dropping out between
groups, are associated with bias.26-31 This evidence is collected
in fields other than back and neck pain.
We recommend assessing the risk of bias in the studies by
using the criteria presented in Table 2 and the instructions
presented in Table 3. These instructions are adapted from van
Tulder,6 Boutron et al (CLEAR NPT),32 and the Cochrane
Handbook of Reviews of Interventions.7 Of these criteria, 11
have already been used in 26 (65%) and 10 have been used in 7
(18%) systematic reviews within the CBRG (The Cochrane
Library 2008, issue 4). These criteria are also considered important by others who study nonpharmaceutical interventions.32,33 Internal validity criteria refer to characteristics of
the study that might be related to selection bias (criteria 1, 2, 9), performance bias (criteria 3, 4, 10, 11), attrition bias (criteria 6, 7), and detection bias (criteria 5, 12). Each criterion should be
scored as "yes," "unclear," or "no," where "yes" indicates the criterion
has been met and therefore suggests a low risk of bias. The
Cochrane Handbook for Systematic Reviews of Interventions7
recommends that review authors assess at least 5 issues associated with risk of bias: sequence generation, allocation concealment, blinding of participants, personnel and outcome assessors, incomplete outcome data, selective outcome reporting,
and other potential threats to validity not already identified.
The criteria recommended by the CBRG are aligned with the
new Handbook, except for selective reporting of outcomes. We suggest adding this item as the 12th internal validity criterion.
We recommend that the studies are rated as having a low
risk of bias when at least 6 of the 12 CBRG criteria have been
met and the study has no serious flaws (e.g., 80% drop-out rate in
1 group). Studies with serious flaws, or those in which fewer than
6 of the criteria are met should be rated as having a high risk of
bias. There is empirical evidence from a methodologic study conducted with data from the CBRG that a compliance threshold of
less than 50% of the criteria is associated with bias.34
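The rating rule above can be sketched as follows. The dictionary layout (criterion number mapped to a score) and the function name are assumptions; whether a study has a serious flaw remains a judgment by the review authors and is passed in as a flag rather than computed.

```python
def rate_risk_of_bias(scores, serious_flaw=False):
    """Rate a study as "low" or "high" risk of bias from the 12 CBRG criteria.

    scores maps criterion number (1-12) to "yes", "no", or "unclear".
    serious_flaw is the review authors' judgment (e.g., 80% drop-out in 1 group).
    """
    if set(scores) != set(range(1, 13)):
        raise ValueError("scores must cover criteria 1 to 12")
    if any(score not in {"yes", "no", "unclear"} for score in scores.values()):
        raise ValueError("each score must be 'yes', 'no', or 'unclear'")
    criteria_met = sum(1 for score in scores.values() if score == "yes")
    # Low risk requires at least 6 of 12 criteria met and no serious flaw.
    if criteria_met >= 6 and not serious_flaw:
        return "low"
    return "high"


# Hypothetical study: criteria 1-6 met, 7-9 unclear, 10-12 not met.
example = {n: "yes" for n in range(1, 7)}
example.update({n: "unclear" for n in range(7, 10)})
example.update({n: "no" for n in range(10, 13)})
print(rate_risk_of_bias(example))
print(rate_risk_of_bias(example, serious_flaw=True))
```

Only "yes" counts toward the threshold in this sketch, so a criterion scored "unclear" is treated the same as one that was not met.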
The results of the assessment, including the rationale for the
decision, should be presented in the "Risk of Bias" table, which
is included with the "Characteristics of included studies" table
in Review Manager 5. If one of the review authors is an author
or coauthor of one of the included trials, this person should not
be involved in any decisions regarding the risk of bias assessment of the trial at issue.
Further Guidance. Some empirical evidence suggests that
blinded risk of bias assessment, that is, removing the names of
authors, institution, and journals from the articles when assessing the risk of bias, resulted in more consistent and higher
rating of bias than open assessment.35 However, 2 other studies did not find an association between blinded assessment of
studies and bias.33,36 It is difficult to achieve true blinding,
because experts are usually involved in the risk of bias assessment of the studies. Therefore, the CBRG leaves it to the discretion of the review authors to decide whether or not to perform a blinded risk of bias assessment. Because assessment by
content experts may be biased by prior opinions, it may be
desirable to have both a clinical content expert and a nonexpert
(but with a methodologic background) assess the risk of bias in
the studies. In systematic reviews where there is likely to be a
conflict of interest (e.g., chiropractors or manual therapists reviewing spinal manipulation, or physiotherapists reviewing exercise therapy), it may be desirable to also mask the studies for
results and conclusions, or to include someone who has no
potential conflict of interest in the risk of bias assessment.
We recommend that review authors pilot-test the risk of
bias assessment on some similar articles (regarding another
intervention or disorder) that will not be included in the review.
It is important for review authors to agree on a common interpretation of the items and their operationalization.
We recommend using a consensus method to discuss and
solve disagreements between the review authors. If disagreement persists, another independent person should be consulted
who is an expert in review methodology. The initial interobserver reliability (e.g., Kappa) of the risk of bias assessment
should be evaluated and reported.
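One common way to express the initial interobserver reliability is Cohen's kappa for 2 raters, computed on the scores given before the consensus discussion. The sketch below is a plain implementation under that assumption; the function name and the example ratings are illustrative, and other agreement statistics may be preferred in a given review.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items with nominal categories."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: proportion of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is complete.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical initial scores from two review authors on one criterion in four trials.
rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2))  # observed 0.75, chance 0.50, kappa 0.5
```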
A study in the field of rheumatology showed that some trials that inadequately reported the method of randomization and allocation concealment had actually performed them adequately.37 Therefore, if the article does not contain information on (one or more of) the internal validity criteria, the authors may be contacted for additional information. If the authors cannot be contacted or if the information is no longer available, the criteria should be scored as "unclear," with an explanation.
Different risks of bias may explain the variation in the results of the studies included in a systematic review and can result in over- or underestimation of the effectiveness of the intervention at issue. However, there are no strict guidelines for the use of risk of bias assessment in systematic reviews. In general, we recommend choosing one of the options listed below and clearly describing the logic behind the choice.38,39
First, the risk of bias can be used as an additional inclusion criterion for studies in the review, based either on 1 or more domains (e.g., only include adequately randomized RCTs or double-blinded RCTs) or on the number of criteria met (e.g., only include studies that adequately fulfill 6 of the 12 validity criteria and have no serious flaws). Second, a stratified analysis can be performed in which the results are separately presented for different strata of studies (e.g., studies that meet specific criteria, or studies with a low or high risk of bias). Third, a sensitivity analysis can be performed to determine whether the overall results are the same when studies with different definitions of low or high risk of bias are analyzed. Fourth, weights can be applied in the analysis to studies according to the risk of bias, so that studies with a lower risk of bias have more impact on the overall results. Obviously, choosing weights involves additional arbitrary decisions. Fifth, a cumulative meta-analysis can be performed by examining the impact on the overall results as studies with increasing risk of bias are included one at a time. And last, a meta-regression can be performed to explore the relation between criteria met and the magnitude of effect across outcomes and studies. The first 4 options are also available when statistical pooling is not feasible; the last 2 apply specifically to statistical pooling.
The Editorial Board refers the reader to Chapter 8 in the Cochrane Handbook for Systematic Reviews of Interventions7 for further details on assessing risk of bias.

Table 1. Taxonomy of Study Design of Studies Assessing the Effects of Health-Care Interventions

Experimental studies with control group (clinical trials or trials): The investigator has control over the decision concerning the allocation of participants to different intervention groups.
  Randomized controlled trial (RCT)
    (A) Reported method of randomization and the method is adequate (see text for examples).
    (B) Did not report methods of randomization. Only the phrase "randomized study," "random allocation," or other similar expression is reported.
  Controlled clinical trial (CCT)
    (A) Reported the method of allocation and this method is inadequate (see text for examples). Synonym: quasi-randomized controlled trial (q-RCT).
    (B) Did not report the method of allocation and there is no phrase or expression indicating that the allocation was randomized.

Observational studies with control group: The investigator's intention is to observe and not to interfere with routine care.
  Cohort study
    Synonyms: Longitudinal study (emphasizing that people are followed over time); Prospective study (implying the forward direction of the research question); Incidence study (calling attention to the basic measure of new disease events over time).
    The cases are selected based on exposure to the interventions. It involves measuring the occurrence of disease within 1 or more groups of individuals who are followed or traced over a period of time. A true cohort study is an inception cohort study, to differentiate from survival cohorts.
    Cohort studies can be prospective or retrospective with regard to the data collection: prospective means that the study is planned before any data are collected; retrospective means that when the study is planned, all (or part of) the data are already collected.
  Survival cohort study
    Synonym: available patients cohort.
    People are included in a study because they both have a disease and are currently available (perhaps they are being seen in a specialized clinic). Survival cohorts are misleading if they are presented as true cohorts. In a survival cohort, people are assembled at various times in the course of their disease, rather than at the beginning as in a true cohort study. Their clinical course is then described by going back in time and seeing how they have fared up to the present.
  Case control study
    The cases are selected based on the outcomes. A research design in which all group selection, pretest data, and posttest data are collected after completion of the treatment. The evaluator is thus not involved in the selection or placement of individuals into comparison or control groups. All evaluation decisions are made retrospectively. Individuals are matched on variables thought to be critical in determining the outcome; therefore, the groups are equivalent except for the interventions.
  Cross-sectional study
    All of the information refers to the same point in time. There is no follow-up. They are usually conducted by collecting data from administrative databases (census, hospital discharges, and workers' compensation databases).

Uncontrolled studies (without a separate control group): can be experimental or observational in nature.
  Case series
    The participants are described as a group.
    (A) Case study: a single group is studied only once, subsequent to some agent or treatment presumed to cause change.
    (B) Before and after: a single group is studied before and after some agent or treatment presumed to cause change.
  Case reports
    (A) Case reports: the participants are described individually.
    (B) N-of-1 randomized trial: the patient undergoes pairs of treatment periods randomized so that 1 period involves the use of experimental treatment and the other involves the use of an alternate or placebo therapy.

Reprinted with permission from J Clin Epidemiol.25 Copyright 2008, Elsevier.

Table 2. Sources of Risk of Bias

A. 1. Was the method of randomization adequate? Yes/No/Unsure
B. 2. Was the treatment allocation concealed? Yes/No/Unsure
C. Was knowledge of the allocated interventions adequately prevented during the study?
   3. Was the patient blinded to the intervention? Yes/No/Unsure
   4. Was the care provider blinded to the intervention? Yes/No/Unsure
   5. Was the outcome assessor blinded to the intervention? Yes/No/Unsure
D. Were incomplete outcome data adequately addressed?
   6. Was the drop-out rate described and acceptable? Yes/No/Unsure
   7. Were all randomized participants analysed in the group to which they were allocated? Yes/No/Unsure
E. 8. Are reports of the study free of suggestion of selective outcome reporting? Yes/No/Unsure
F. Other sources of potential bias:
   9. Were the groups similar at baseline regarding the most important prognostic indicators? Yes/No/Unsure
   10. Were co-interventions avoided or similar? Yes/No/Unsure
   11. Was the compliance acceptable in all groups? Yes/No/Unsure
   12. Was the timing of the outcome assessment similar in all groups? Yes/No/Unsure

Data Extraction

Minimum Criteria. At least 2 review authors should independently extract the data. Data describing study characteristics
that include characteristics of participants, interventions, comparisons, outcomes, analysis, results, and study sponsorship
should be extracted and presented in a table (see inclusion
criteria for full details). Cointerventions and other confounders
should be described in as much detail as possible to enable
accurate comparison.
If one of the review authors is an author or coauthor of one
of the included trials, this person should not be involved in any
decisions regarding the data extraction of the trial at issue.

Further Guidance
The CBRG recommends that authors use a standardized form
for data extraction that will facilitate the comparison process.
It is advisable to pilot test the data extraction form to minimize
misinterpretations or later disagreements. If there are disagreements, consensus should be achieved by discussion among the
review authors. If disagreements persist, an independent person
should be consulted. If the article does not contain sufficient
information, the authors may be contacted.
Data extraction forms will vary across different systematic
reviews, but there will also be similarities among the forms
needed for reviews on back and neck pain. Because designing a
data extraction form is time-consuming, and given the important function of data extraction forms, it may be helpful to
learn from the experiences of others. Examples of data
extraction forms used in other reviews can be obtained from
the CBRG website: www.cochrane.iwh.on.ca.

Data Analysis
Minimum Criteria. Regardless of whether the authors use a
quantitative analysis (meta-analysis) or not, the results from
studies should only be combined when they are judged to be
sufficiently clinically similar to yield meaningful results. This
means review authors should avoid combining studies that are
clinically heterogeneous for populations, interventions, comparisons, or outcomes. A meta-analysis should be conducted
whenever trials measuring a specific outcome at similar follow-up (short-term and/or long-term) report sufficient data to
do so. When a meta-analysis is performed with only a subset of
trials, review authors should assess whether the results of the
studies not reported quantitatively are consistent with the
meta-analysis. The analysis should include an explicit description of the comparisons (Figure 1).
Short-term follow-up refers to outcomes that are measured
closest to 4 weeks after randomization; it could be as short as 7
days in a trial of analgesics and as long as 12 weeks in a trial of
exercise therapy. Intermediate follow-up refers to measures taken
closest to 6 months. Long-term follow-up refers to measures taken
closest to 1 year. Long-term surgical outcomes should be measured at 5 years. Unless otherwise stated, outcomes are assumed to
be measured after the treatment is completed.
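
The follow-up definitions above can be sketched as a small selection rule. This is only an illustration, assuming that 6 months is taken as 26 weeks and 1 year as 52 weeks; the function name and the example trial are hypothetical and not part of the guidelines.

```python
# Target time points in weeks for each follow-up category.
# Assumption: 6 months = 26 weeks and 1 year = 52 weeks.
TARGET_WEEKS = {"short-term": 4, "intermediate": 26, "long-term": 52}


def pick_follow_up(measured_weeks, category):
    """Return the measurement time (in weeks) closest to the category target."""
    if not measured_weeks:
        raise ValueError("the trial reports no measurement times")
    target = TARGET_WEEKS[category]
    return min(measured_weeks, key=lambda week: abs(week - target))


# Hypothetical trial with outcomes measured at 1, 6, 12, 24 and 52 weeks.
times = [1, 6, 12, 24, 52]
selected = {category: pick_follow_up(times, category) for category in TARGET_WEEKS}
print(selected)
```

A trial with few measurement points may yield the same time point for more than one category; review authors would then need to decide which category that measurement best represents.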
The Editorial Board refers the reader to Chapter 9 of the
Cochrane Handbook for Systematic Reviews of Interventions7
for further guidance on data analysis.
Table 3. Criteria for a Judgment of Yes for the Sources of Risk of Bias

1. A random (unpredictable) assignment sequence. Examples of adequate methods are coin toss (for studies with 2 groups), rolling a die (for studies with 2 or more groups), drawing of balls of different colors, drawing of ballots with the study group labels from a dark bag, computer-generated random sequence, pre-ordered sealed envelopes, sequentially-ordered vials, telephone call to a central office, and pre-ordered list of treatment assignments. Examples of inadequate methods are: alternation, birth date, social insurance/security number, date on which they are invited to participate in the study, and hospital registration number.
2. Assignment generated by an independent person not responsible for determining the eligibility of the patients. This person has no information about the persons included in the trial and has no influence on the assignment sequence or on the decision about eligibility of the patient.
3. This item should be scored yes if the index and control groups are indistinguishable for the patients or if the success of blinding was tested among the patients and it was successful.
4. This item should be scored yes if the index and control groups are indistinguishable for the care providers or if the success of blinding was tested among the care providers and it was successful.
5. Adequacy of blinding should be assessed for the primary outcomes. This item should be scored yes if the success of blinding was tested among the outcome assessors and it was successful or:
   - for patient-reported outcomes in which the patient is the outcome assessor (e.g., pain, disability): the blinding procedure is adequate for outcome assessors if participant blinding is scored yes
   - for outcome criteria assessed during scheduled visit and that supposes a contact between participants and outcome assessors (e.g., clinical examination): the blinding procedure is adequate if patients are blinded, and the treatment or adverse effects of the treatment cannot be noticed during clinical examination
   - for outcome criteria that do not suppose a contact with participants (e.g., radiography, magnetic resonance imaging): the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed when assessing the main outcome
   - for outcome criteria that are clinical or therapeutic events that will be determined by the interaction between patients and care providers (e.g., co-interventions, hospitalization length, treatment failure), in which the care provider is the outcome assessor: the blinding procedure is adequate for outcome assessors if item 4 (caregivers) is scored yes
   - for outcome criteria that are assessed from data of the medical forms: the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed on the extracted data
6. The number of participants who were included in the study but did not complete the observation period or were not included in the analysis must be described and reasons given. If the percentage of withdrawals and drop-outs does not exceed 20% for short-term follow-up and 30% for long-term follow-up and does not lead to substantial bias, a yes is scored. (N.B. these percentages are arbitrary, not supported by literature).
7. All randomized patients are reported/analyzed in the group they were allocated to by randomization for the most important moments of effect measurement (minus missing values) irrespective of non-compliance and co-interventions.
8. In order to receive a yes, the review author determines if all the results from all pre-specified outcomes have been adequately reported in the published report of the trial. This information is either obtained by comparing the protocol and the report, or in the absence of the protocol, assessing that the published report includes enough information to make this judgment.
9. In order to receive a yes, groups have to be similar at baseline regarding demographic factors, duration and severity of complaints, percentage of patients with neurological symptoms, and value of main outcome measure(s).
10. This item should be scored yes if there were no co-interventions or they were similar between the index and control groups.
11. The reviewer determines if the compliance with the interventions is acceptable, based on the reported intensity, duration, number and frequency of sessions for both the index intervention and control intervention(s). For example, physiotherapy treatment is usually administered over several sessions; therefore it is necessary to assess how many sessions each patient attended. For single-session interventions (e.g., surgery), this item is irrelevant.
12. Timing of outcome assessment should be identical for all intervention groups and for all important outcome assessments.

The primary analysis of the review should be based only on the
results from RCTs (Table 1). If review authors include designs
other than RCTs, the data should be analyzed separately and
contrasted with the results from the primary analysis.
If one of the review authors is an author or coauthor of one
of the included trials, this person should not be involved in any
data analysis that includes the trial at issue.

Further Guidance
Quantitative Analysis. If it is clinically relevant and statistically justified to combine the results, statistical pooling should
be performed that provides an overall estimate of effect, with a
95% confidence interval for each outcome.40,41 The Editorial
Board recommends contacting a statistician before performing
a quantitative analysis. A meta-analysis should start by examining potential publication and other biases with a funnel plot
to explore asymmetry among trial results.42 If asymmetry is
present, potential reasons should be explored. However, funnel
plots may be misleading and should be interpreted cautiously.43
Formal statistical tests also exist, but there is no consensus
regarding the strengths and weaknesses of these tests.44–46
For the meta-analysis of dichotomous outcomes, the relative
risk, risk difference, or odds ratio can be used to summarize the
effect. Empirical evidence from 125 meta-analyses showed that
summary odds ratios and risk differences usually lead to similar
conclusions about treatment effect, but that risk differences are
substantially more heterogeneous.47 For continuous outcomes,
mean differences from each trial can be combined. If the continuous outcomes are not directly combinable (that is, if different
instruments are used for the same outcome measurements),
standardized mean differences (effect sizes) might be used.40,41 For
time-to-event data (e.g., return-to-work), survival analysis is the
most appropriate statistic to use.48 If necessary, the authors of the
original studies may be contacted to provide relevant information.
If data are not presented in a way that can be easily included in a
meta-analysis, review authors should try to calculate effect sizes.
For example, for trials that report a mean outcome but no standard deviation, one could estimate the standard deviation by taking the mean standard deviation weighted by the relevant treatment group's sample size across all other trials that reported
standard deviations for the same outcome.
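
The imputation described in the last example can be sketched as follows. This is a minimal illustration of a sample-size-weighted mean of reported standard deviations; the function name and the numbers are hypothetical, and a statistician should confirm that this approach suits the data at hand.

```python
def impute_sd(reported):
    """Sample-size-weighted mean of the standard deviations reported elsewhere.

    `reported` is a list of (sd, n) pairs, one per treatment group in the
    other trials that reported a standard deviation for the same outcome.
    """
    total_n = sum(n for _, n in reported)
    if total_n == 0:
        raise ValueError("no reported standard deviations to borrow from")
    return sum(sd * n for sd, n in reported) / total_n


# Hypothetical groups: (standard deviation, sample size).
reported = [(20.0, 50), (25.0, 30), (18.0, 20)]
print(round(impute_sd(reported), 2))
```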
There are 2 statistical models for combining data in a meta-analysis: the fixed-effect model and the random-effects model.40
Although there are arguments favoring each model, in general,
the clinical heterogeneity of the back and neck pain literature
suggests that the assumptions underlying the random-effects
model are better suited to statistical combination of trials in
this field. However, the random-effects model does not account
for the heterogeneity, does not explain it, and does not take it
away. Careful analysis of heterogeneity, that is, of study characteristics that might explain differences among the results, is
always important.49 The characteristics of participants, types
of interventions, and the exact outcome values should be
clearly articulated for each group of study results that are combined. Sensitivity analyses should be performed to examine the
impact of variation in risk of bias or individual validity criteria
(see the Assessing Risk of Bias section).

Population 1: Acute low-back pain with neurological symptoms.
    Comparison 1.1: traction vs. placebo/sham/no treatment
        Outcome 1.1.1: pain intensity
            Follow-up: short-term
                       intermediate-term
                       long-term
        Outcome 1.1.2: functional status
            Follow-up: short-term
                       intermediate-term
                       long-term
        Outcome 1.1.3 ..
    Comparison 1.2: traction vs. exercise therapy
        Outcome 1.2.1: pain intensity
            Follow-up: short-term
                       intermediate-term
                       long-term
        Outcome 1.2.2: functional status
            Follow-up: short-term
                       intermediate-term
                       long-term
        Outcome 1.2.3 ..
Population 2: Acute low back pain without neurological symptoms.
    Comparison 2.1: traction vs. placebo/sham/no treatment
        Outcome 2.1.1: pain intensity
            Follow up: .
Population 3: Chronic low back pain with neurological symptoms.

Figure 1. Example of an analysis for a systematic review on traction for low-back pain.

Sometimes it may be difficult for review authors to decide
whether it is clinically relevant to combine the results from a
group of studies in a meta-analysis (for example, studies of
participants with different types of treatments, different comparison groups, or different clinical characteristics). There are
no simple answers here, and review authors must be explicit
about their decisions so that others may judge for themselves
whether their choices were clinically sensible.
A related but separate issue concerns statistical homogeneity. A test for the statistical homogeneity of studies may be
performed to evaluate whether the differences among the results of the studies are greater than those that would be found
by chance alone. However, the test is not very powerful, and
failure to reject the hypothesis of homogeneity is not proof that
the studies are homogeneous. If the hypothesis of homogeneity
is rejected, or if the review team decides, on clinical grounds,
that the studies are too heterogeneous to support statistical
combinations, then the potential sources of heterogeneity
should be examined, because the observed differences might be
caused by factors other than chance, such as different risks of
bias, characteristics of participants, interventions, control
groups, or outcomes. If the heterogeneity can be explained,
review authors should present the results of each relevant subgroup separately. Subgroup analyses should be kept to a minimum
and should be defined a priori, because subgroup analyses can be informative but also misleading.50
Readers are referred to Chapters 9 and 10 in the Cochrane
Handbook for Systematic Reviews of Interventions7 for more
details on data analysis.
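
To make the fixed-effect model, the random-effects model, and the homogeneity test more concrete, the sketch below pools hypothetical trial estimates with inverse-variance weights. It assumes the DerSimonian-Laird estimator for the between-trial variance and Cochran's Q with the I-squared statistic for heterogeneity; these are common choices made for this illustration, not methods prescribed by the guidelines, and all names and numbers are hypothetical. Review Manager or a statistician should be used for a real analysis.

```python
import math


def pool(effects, variances):
    """Inverse-variance pooling: fixed-effect and DerSimonian-Laird random-effects."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect estimate: weighted mean of the trial effects.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and I-squared describe statistical heterogeneity.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-trial variance (tau squared).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate: weights include the between-trial variance.
    weights_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(weights_re)
    random_effect = sum(w * y for w, y in zip(weights_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)
    ci95 = (random_effect - 1.96 * se, random_effect + 1.96 * se)

    return {"fixed": fixed, "random": random_effect, "ci95": ci95,
            "q": q, "df": df, "i2": i2, "tau2": tau2}


# Hypothetical standardized mean differences and their variances from 3 trials.
result = pool([-0.3, -0.6, -0.1], [0.04, 0.09, 0.02])
print(result)
```

When the estimated between-trial variance is zero, the two models give the same pooled estimate; as it grows, the random-effects weights become more equal across trials and the confidence interval widens.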

Grading the Quality of Evidence and Strength of Recommendations
The Cochrane Handbook for Systematic Reviews of Interventions (see Chapter 12)7 and the CBRG Editorial Board recommend that review authors go beyond the reporting of the results
of quantitative analyses and rate the quality of the evidence for
each important patient-centered outcome. To help readers use
this new approach, the CBRG has adapted the GRADE approach for back and neck pain reviews. The quality of the
evidence on a specific outcome is based on 5 domains: limitations of the study design; inconsistency, indirectness (inability
to generalize), and imprecision (insufficient or imprecise data)
of results; and publication bias across all studies that measure that
particular outcome.51 (Appendix 3, Supplemental Digital Content 3, two examples extracted from the Cochrane reviews of
Rehabilitation after lumbar disc surgery52 and Massage for
low back pain,53 available at: http://links.lww.com/BRS/A375).
The most important step is to choose which outcomes are
relevant for inclusion in the GRADE Evidence Profile. This is
based on the choice of primary outcome measures, selected a
priori in the protocol stage (see section inclusion criteria: outcome measures). For each outcome, all applicable RCTs (i.e.,
those that measured the outcome) are noted in the first column,
regardless of whether they have sufficient data to be combined
in a meta-analysis. Only RCTs included in the primary analysis
of the review should be included in the GRADE Evidence Profile (see section inclusion criteria: study design).



Limitations of the studies refer to the results of the risk of
bias assessment of the studies identified in column 1, using the
12 criteria recommended above. For example, studies may be
classified as having a high risk of bias (fewer than six criteria met, a fatal flaw that puts
the validity in question, or both) or a low risk of bias (six or more criteria
met, with no fatal flaws). Flaws or unmet criteria
should be explained in a footnote of the GRADE Evidence
Profile and Summary of Findings table.
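
The high/low classification described in this paragraph can be sketched as a simple rule. This is an illustration only: the function name, the dictionary layout, and the way a fatal flaw is flagged are assumptions, and judging whether a flaw is fatal remains a decision for the review authors.

```python
def risk_of_bias(judgments, fatal_flaw=False):
    """Classify a trial as 'low' or 'high' risk of bias.

    `judgments` maps each of the 12 criteria (1 to 12) to "yes", "no" or
    "unsure". A trial is 'low' risk when six or more criteria are met and
    the review authors have not identified a fatal flaw.
    """
    if set(judgments) != set(range(1, 13)):
        raise ValueError("expected one judgment for each of the 12 criteria")
    met = sum(1 for answer in judgments.values() if answer == "yes")
    return "low" if met >= 6 and not fatal_flaw else "high"


# Hypothetical trial that meets criteria 1 to 7 and is unclear on the rest.
trial = {item: ("yes" if item <= 7 else "unsure") for item in range(1, 13)}
print(risk_of_bias(trial))
print(risk_of_bias(trial, fatal_flaw=True))
```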
Inconsistency refers to the lack of similarity of estimates
of treatment effects for the outcome across studies. Study results are considered consistent when direction, effect size, and
statistical significance are sufficiently similar to lead to the same
conclusions. Consistency in direction is defined as 75% or more
of the studies showing either a benefit or no benefit. In the case of
a benefit, consistency in effect size is defined as 75% or more of the
studies showing a clinically important or unimportant effect (see
section on clinical relevance). Consistency in statistical significance is defined by the Chi squared test for heterogeneity.
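
The 75% rule for consistency in direction can be illustrated with a short sketch. The function name and the input format (one true/false value per study, true meaning the study showed a benefit) are assumptions made for this example.

```python
def consistent_in_direction(shows_benefit):
    """Return True when 75% or more of the studies point the same way.

    `shows_benefit` holds one boolean per study: True if the study showed
    a benefit, False if it showed no benefit.
    """
    n = len(shows_benefit)
    if n == 0:
        raise ValueError("no studies to assess")
    with_benefit = sum(1 for flag in shows_benefit if flag)
    largest_share = max(with_benefit, n - with_benefit) / n
    return largest_share >= 0.75


# Hypothetical outcome measured in 4 trials, 3 of which showed a benefit.
print(consistent_in_direction([True, True, True, False]))
```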
Indirectness (lack of ability to generalize) refers to the extent
to which the people, interventions and outcomes in the trials are
not comparable to those defined in the inclusion criteria of the
review. If the authors decide that there is uncertainty about generalizability of the results, the reason should be given in a footnote.
Authors may suggest that their results are more applicable to a
specific population, (e.g., the effects of using insoles for young,
male army recruits rather than a general working population)54 or
that the results are based on an indirect comparison (e.g., there is
strong evidence that discectomy is more effective than chemonucleolysis and that chemonucleolysis is more effective than placebo: ergo, discectomy is more effective than placebo).55
Imprecision refers to the number of participants and events
and the width of the confidence interval for each outcome, especially when the confidence interval is sufficiently wide so that the
estimate could either support or refute the effectiveness of the
index intervention. The CBRG Editorial Group further recommends that data be considered imprecise when only 1 study reports an outcome, regardless of the sample size or the confidence interval, and
when fewer than 75% of the studies present data that can be
included in a meta-analysis. A footnote should explain the exact
reason why data were judged to be sparse or imprecise.
Publication bias refers to the probability of selective publication of trials and outcomes. This bias might be considered if
full results for planned outcomes identified in a protocol or the
trial report are not provided in the results section. If the review
authors decide there is publication bias, they should support
their decision in a footnote.

The overall quality of the evidence for each outcome is
the result of the combination of the assessments in all domains.
The GRADE Working Group recommends 4 levels of evidence:
High quality evidence: at least 75% of the RCTs with no
limitations of study design have consistent findings, direct
and precise data and no known or suspected publication
biases.
Moderate quality evidence: 1 of the domains is not met.
Low quality evidence: 2 of the domains are not met.
Very low quality evidence: 3 of the domains are not met.
The CBRG recommends adding another level:
No evidence: no RCTs were identified that addressed this
outcome.
GRADEprofiler software is available to develop the GRADE
Evidence Profiles by importing data from Review Manager 5.
See the Cochrane Handbook for Systematic Reviews of Interventions,7 chapter 12 for more details on grading the evidence.
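
The mapping from unmet domains to a quality level can be sketched as below. The text defines the levels for 0 to 3 unmet domains; treating 4 or 5 unmet domains as very low quality is an assumption of this sketch, as are the function name and the domain labels. GRADEprofiler remains the tool for producing the actual Evidence Profile.

```python
DOMAINS = ("limitations", "inconsistency", "indirectness",
           "imprecision", "publication bias")


def grade_quality(n_rcts, unmet_domains):
    """Return the quality level for one outcome.

    `n_rcts` is the number of RCTs that measured the outcome and
    `unmet_domains` lists the domains judged not to be met.
    """
    if n_rcts == 0:
        return "no evidence"
    unknown = set(unmet_domains) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    count = len(set(unmet_domains))
    levels = ["high", "moderate", "low"]
    # Assumption: 3 or more unmet domains are all rated very low.
    return levels[count] if count < 3 else "very low"


# Hypothetical outcome with inconsistent and imprecise results from 5 RCTs.
print(grade_quality(5, ["inconsistency", "imprecision"]))
```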

Clinical Relevance
Further Guidance. The CBRG recommends including an assessment of clinical relevance of study results in systematic
reviews. The conclusions about the effectiveness of the intervention should contain all the important information needed to
enable users to make a decision about the applicability of the
results to their population. The clinical relevance of the studies
should be independently assessed by at least 2 review authors.
In the 2003 Updated Method Guidelines, the Editorial
Board recommended 5 questions to assess the clinical relevance
of each included study.56,57 In 2006, Malmivaara et al, in consultation with the Editorial Board, reviewed the set of 5 questions and articulated the details in the evaluation of applicability and clinical relevance of results of RCTs. The final
consensus consisted of 40 items. For the most part, these items
are characteristics of the population, interventions, comparisons, analysis, and results that review authors are advised to
extract from the studies. These details should be used to answer
the 5 questions (Table 4). For more details and examples on
how to assess each item, review authors are encouraged to read
the original study by Malmivaara et al.58 There is ongoing
research examining how to determine important clinical differences in pain reduction and functional improvement. At
present, there is consensus regarding minimal clinically important changes for pain and function in back pain.59 Authors are
advised to consult the literature, which also includes key references on neck pain,59–64 and to include both statistical and clinical importance in their discussion (Table 4).59–64

Table 4. Questions to Determine if Results Are Clinically Relevant

Based on the data provided, can you determine if the results will be clinically relevant?
1. Are the patients described in detail so that you can decide whether they are comparable to those that you see in your practice? Yes/No/Unsure
2. Are the interventions and treatment settings described well enough so that you can provide the same for your patients? Yes/No/Unsure
3. Were all clinically relevant outcomes measured and reported? Yes/No/Unsure
4. Is the size of the effect clinically important?* Yes/No/Unsure
5. Are the likely treatment benefits worth the potential harms? Yes/No/Unsure

*For low-back pain, consider 30% on VAS/NRS for pain as clinically significant,59,62 and 2 to 3 points (or 8 to 12%) on the Roland-Morris Disability Questionnaire for function.59,60
*For neck pain, consider 3.5 to 5 U on the 50-U Neck Pain Disability Index or 7 to 10% change63,64 for function and 2.5 on a 10-U NRS (25% change) for pain.63
*For effect size, most authors use Cohen's 3 levels.61
Small: WMD less than 10% of the scale (e.g., 10 mm on a 100 mm VAS); SMD or d scores <0.5; relative risk <1.25 or >0.8 (depending on whether it reports risk of benefit or risk of harm).
Medium: WMD 10 to 20% of the scale; SMD or d scores from 0.5 to 0.8; relative risk from 1.25 to 2.0, or from 0.5 to 0.8.
Large: WMD >20% of the scale; SMD or d scores >0.8; relative risk >2.0 or <0.5.
VAS indicates Visual Analog Scale; NRS, Numerical Rating Scale; SMD, standardized mean difference; WMD, weighted mean difference.

No significant difference between index and comparison group(s)
    Quantitative analysis:
        There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
        that there is no statistically significant difference in (short-term/long-term) follow-up
        for (outcome Z) (RR 1.1, 95% CI 0.8 to 1.4), between individuals with
        (acute/subacute/chronic) (back/neck) pain (with/without) neurological symptoms who
        received (index) and those who received (comparison).
    Qualitative analysis:
        There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
        that there is no significant difference in (short-term/long-term) follow-up for (outcome
        Z), between individuals with (acute/subacute/chronic) (back/neck) pain (with/without)
        neurological symptoms who received (index) and those who received (comparison).
Index is more/less effective than comparison group(s)
    Quantitative analysis:
        There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
        that (index intervention) is (more/less) effective than (comparison intervention) for
        individuals with (acute/subacute/chronic) (back/neck) pain (with/without) neurologic
        symptoms for (outcome A) at (short-term/long-term) follow-up with RR 4.0 (95% CI
        3.0 to 5.0) and (outcome B) at (short-term/long-term) follow-up with RR 4.0 (95% CI
        3.0 to 5.0).
    Qualitative analysis:
        There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
        that (index intervention) is (more/less) effective than (comparison intervention) for
        individuals with (acute/subacute/chronic) (back/neck) pain (with/without) neurologic
        symptoms for (outcome A, B and C) in the (short-term/long-term).
Contradictory findings across trials
    Qualitative analysis:
        There is conflicting evidence from (X) trials (no. of people) about whether (index
        intervention) is more/less effective than (comparison intervention) for individuals with
        (acute/subacute/chronic) (back/neck) pain (with/without) neurological symptoms for
        (outcome A, B and C) in the (short-term/long-term).
No evidence
    There were no RCTs identified that examined the effects of (index intervention) for
    individuals with (acute/subacute/chronic) (back/neck) pain (with/without) neurological
    symptoms.
* The intervention for the comparison group should be explicitly described: placebo, no
treatment, waiting list controls, or treatment B (where treatment B is specifically
named).

Figure 2. Recommendation for authors' conclusions in systematic reviews.

The answers to these questions should be used to inform the
discussion of the final results and conclusions; for example, in
the discussion section, clinical relevance could be included as
follows: There was high quality evidence from 10 RCTs (2000
participants) that intervention A is more effective than no treatment for reducing pain in the long-term for individuals with
chronic low back pain. However, since none of the trials described the program in detail, it is difficult to determine how to
provide this treatment to your patients and which types of
exercise healthcare providers should provide to patients (this
example is not based on real data).
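
When judging whether the size of an effect is clinically important, Cohen's levels for standardized mean differences can be applied mechanically, as in the sketch below. It assumes the conventional reading of the thresholds (small below 0.5, medium from 0.5 to 0.8, large above 0.8) and uses the absolute value so that the direction of the scale does not matter; the function name is hypothetical, and the label does not replace a judgment about minimal clinically important change.

```python
def cohen_level(smd):
    """Label a standardized mean difference as small, medium or large.

    Assumed thresholds: small below 0.5, medium from 0.5 to 0.8 inclusive,
    large above 0.8, applied to the absolute value of the SMD.
    """
    size = abs(smd)
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


# Hypothetical pooled estimates for three outcomes.
for estimate in (-0.62, 0.40, 0.95):
    print(estimate, cohen_level(estimate))
```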

Conclusion
Minimum Criteria
Results should be listed in the same order as the comparisons and outcomes were set out in the protocol. To improve
consistency, the text should contain the following items
(Figure 2): quality of evidence, the number of trials (number
of participants), results of quantitative analysis (effect size
plus confidence interval), results of qualitative analysis (direction of the effect [more/less effective, no difference]), the
intervention, the type of participants, the comparison treatment (specifically stated), the outcome measured, and the
timing (short-term or long-term) of the outcome measure.
Example 1: There is high quality evidence from seven
trials (1268 people) that behavioral treatment is more effective than no treatment for individuals with chronic back
pain without neurologic symptoms for short-term pain relief (SMD: 0.62, 95% CI: 0.25 to 0.98) and short-term
behavioral outcomes (SMD: 0.40, 95% CI: 0.10 to 0.70;
data pooled from only 5 trials).
Example 2: There is moderate quality evidence (4 trials; 354 people) that there is no statistically significant
difference in short-term pain relief between individuals
with chronic back pain with or without neurologic
symptoms who received acupuncture and those who received placebo or sham acupuncture.
Further Guidance
The Cochrane Handbook for Systematic Reviews of Interventions,7 chapter 11, recommends that reviews include a
Summary of Findings table, which provides key information on the quality of evidence, the magnitude of effect of
the interventions examined, and the sum of available data
on the main outcomes. The information is imported from
the GRADEprofiler software and other data included in the
review. Main outcomes should be determined a priori, in
the protocol. Because the information is still new at the time of
writing, review authors are directed to the Handbook for
more detailed information. As they are developed, we will add examples from the neck and back pain field to the Cochrane
Back Review Group website (www.cochrane.iwh.on.ca).
Updating
Minimum Criteria. One goal of Cochrane is to present the
best current evidence on the effects of healthcare interventions. This is accomplished by updating published
reviews as new evidence becomes available. The CBRG
Trial Search coordinator updates the literature searches
at least every 2 years and more frequently if important
new evidence is published and notifies the lead author of
the results. If the lead author is unable to complete the
update, for whatever reason, the Editorial Board reserves
the right to assume responsibility for the review. This
may include finding a new lead author or a totally new
review team.
The results of the updated literature search determine
the amount of work involved in updating the review. This
may range from the editorial office staff updating the literature search date and notifying the author that their review
was updated, in the event no new studies are identified, to
rewriting most of the review. Depending on when the original review was published, expectations of The Cochrane
Collaboration in general, and the CBRG in particular, may
have changed (e.g., based on the general direction of The
Cochrane Collaboration, this update of the method guidelines recommends using a GRADE approach rather than
Levels of Evidence for the final summary of results). Authors should explore the CBRG (www.cochrane.iwh.on.ca)
and Cochrane Collaboration (www.cochrane.org)
websites and contact the Managing Editor of the
CBRG for current information on updating your review.
Before starting an update of his or her review, the lead
review author should consider the following issues:
Is the current review team still willing and able to
update the review?
Are the inclusion criteria for studies, search strategies, risk of bias assessment criteria, analyses and
summary methods still appropriate?

Existing reviews may have included a combination of
(sub)acute or chronic back or neck pain. The Editorial
Board recommends that updates of reviews focus specifically on (sub)acute or chronic back or neck pain. It is
also recommended that reviews focus separately on nonspecific back or neck pain, sciatica or radicular symptoms, or specific causes (e.g., spinal stenosis, scoliosis).
This means that some reviews will need to be split into
two or more reviews with a smaller scope. This should be
discussed with the Managing Editor of the CBRG.
Discussion
The Editorial Board believes that systematic reviews represent one of the key advances in medical science in the past
15 years and offer a real opportunity for change in medical
practice worldwide. Obviously, one of the major challenges
for the future is to increase implementation of the results of
systematic reviews. Some initiatives have been developed
that try to make systematic reviews more easily available
for clinicians in daily practice. Recently published European and North American clinical guidelines on the management of low back pain have used the evidence from
systematic reviews as the basis for their recommendations.65–69 The BMJ Publishing Group publishes Clinical
Evidence, which is a summary of the current state of
knowledge based on Cochrane and other systematic reviews on the prevention and treatment of a wide range of
clinical conditions (www.clinicalevidence.com). NHS Clinical Knowledge Summaries from the UK are reliable
sources of evidence-based information (based in part on
Cochrane reviews) and practical know-how about the
common conditions managed in primary care (http://cks.library.nhs.uk/home;
accessed September 19, 2008).
The number of evidence-based products being developed to
inform clinical decisions that use systematic reviews as the
basis for the evidence is rapidly increasing. Since behavioral
change is multifaceted, whether these and other implementation efforts indeed result in a change in clinicians' behavior and in improved patient outcomes remains unclear.
Systematic reviews must be conducted as carefully as the
trials they report. To achieve full impact, systematic reviews
must meet high methodologic standards. The objective of
these method guidelines is to help review authors to design,
conduct, and report reviews of trials in the field of back and
neck pain systematically and explicitly. These guidelines are
not intended to set a gold standard or to discourage people
from doing a systematic review. On the contrary, we encourage people to undertake a systematic review in collaboration with others. The Cochrane Collaboration has just
released a new version of the Cochrane Handbook for Systematic Reviews of Interventions (February 2008) and
Review Manager 5 (March 2008), the software used to
produce Cochrane reviews. The CBRG will post back and
neck-related examples on our website. Therefore, for more
guidance on systematic reviews of back and neck pain, we
refer readers to the Cochrane Handbook for Systematic
Reviews of Interventions (http://www.cochrane.org/resources/handbook/index.htm),
the Review Manager website (http://www.cc-ims.net/RevMan), the
GRADEprofiler website (http://www.cc-ims.net/gradepro),
or the CBRG website (www.cochrane.iwh.on.ca). Address:
Cochrane Back Review Group, Institute for Work &
Health, Toronto, Ontario, Canada, M5G 2E9. Telephone:
(416) 927-2027, fax: (416) 927-4167.
Key Points
Many reviews of therapeutic interventions for
spinal disorders have been published. It is important that these reviews use adequate systematic
methods to minimize bias.
Previous method guidelines for systematic reviews in the field of spinal disorders were updated.
These method guidelines include recommendations that are mandatory (minimum criteria) and optional (further guidance) for review authors conducting reviews within the Cochrane Back Review Group.
The Cochrane Back Review Group now recommends using the GRADE approach to determine
the overall quality of the evidence for important
patient-centered outcomes across studies.
The method guidelines include a new section on
updating reviews.
Others may find these guidelines useful to plan,
conduct, or evaluate systematic reviews in the field
of spinal disorders.

Supplemental digital content is available for this article.
Direct URL citations appear in the printed text, and links to
the digital files are provided in the HTML text of this article
on the journal's Web site (www.spinejournal.com).

References
1. Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of
meta-analyses of randomised controlled trials: the QUOROM statement.
Quality of Reporting of Meta-analyses. Lancet 1999;354:1896–900.
2. Assendelft WJ, Koes BW, Knipschild PG, et al. The relationship between
methodological quality and conclusions in reviews of spinal manipulation.
JAMA 1995;274:1942–8.
3. Furlan AD, Clarke J, Esmail R, et al. A critical review of reviews on the
treatment of chronic low back pain. Spine 2001;26:E155–62.
4. Hoving JL, Gross AR, Gasner D, et al. A critical appraisal of review articles
on the effectiveness of conservative treatment for neck pain. Spine 2001;26:
196–205.
5. van Tulder MW, Assendelft WJ, Koes BW, et al. Method guidelines for
systematic reviews in the Cochrane collaboration back review group for
spinal disorders. Spine 1997;22:2323–30.
6. van Tulder M, Furlan A, Bombardier C, et al. Updated method guidelines for
systematic reviews in the Cochrane collaboration back review group. Spine
2003;28:1290–9.
7. Higgins J, Green S, eds. Cochrane Handbook for Systematic Reviews of
Interventions Version 5.0.0 [updated February 2008]. The Cochrane Collaboration; 2008.
8. Glanville JM, Lefebvre C, Miles JN, et al. How to identify randomized
controlled trials in MEDLINE: ten years on. J Med Libr Assoc 2006;94:
130–6.
9. Minozzi S, Pistotti V, Forni M. Searching for rehabilitation articles on MEDLINE and EMBASE: an example with cross-over design. Arch Phys Med
Rehabil 2000;81:720–2.
10. Sampson M, Barrowman NJ, Moher D, et al. Should meta-analysts search
Embase in addition to Medline? J Clin Epidemiol 2003;56:943–55.
11. Woods D, Trewheellar K. Medline and Embase complement each other in
literature searches. BMJ 1998;316:1166.
12. Suarez-Almazor ME, Belseck E, Homik J, et al. Identifying clinical trials in
the medical literature with electronic databases: MEDLINE alone is not
enough. Control Clin Trials 2000;21:476–87.
13. Day D, Furlan A, Irvin E, et al. Simplified search strategies were effective in
identifying clinical trials of pharmaceuticals and physical modalities. J Clin
Epidemiol 2005;58:874–81.
14. Avenell A, Handoll HH, Grant AM. Lessons for search strategies from a
systematic review, in The Cochrane Library, of nutritional supplementation
trials in patients after hip fracture. Am J Clin Nutr 2001;73:505–10.
15. Bakkalbasi N, Bauer K, Glover J, et al. Three options for citation tracking:
Google Scholar, Scopus and Web of Science. Biomed Digit Libr 2006;3:7.
16. Golder S, McIntosh HM, Duffy S, et al. Developing efficient search strategies
to identify reports of adverse effects in MEDLINE and EMBASE. Health Info
Libr J 2006;23:3–12.
17. Furlan AD, Irvin E, Bombardier C. Limited search strategies were effective in
finding relevant nonrandomized studies. J Clin Epidemiol 2006;59:1303–11.
18. Deyo RA, Battie M, Beurskens AJ, et al. Outcome measures for low back
pain research: a proposal for standardized use. Spine 1998;23:2003–13.
19. Egger M, Zellweger-Zahner T, Schneider M, et al. Language bias in randomised controlled trials published in English and German. Lancet 1997;350:
326–9.
20. Egger M, Ebrahim S, Smith GD. Where now for meta-analysis? Int J Epidemiol 2002;31:1–5.
21. Gregoire G, Derderian F, Le Lorier J. Selecting the language of the publications included in a meta-analysis: is there a Tower of Babel bias? J Clin
Epidemiol 1995;48:159–63.
22. Juni P, Holenstein F, Sterne J, et al. Direction and impact of language bias in
meta-analyses of controlled trials: empirical study. Int J Epidemiol 2002;31:
115–23.
23. Moher D, Pham B, Lawson ML, et al. The inclusion of reports of randomised
trials published in languages other than English in systematic reviews. Health
Technol Assess 2003;7:1–90.
24. Pham B, Klassen TP, Lawson ML, et al. Language of publication restrictions
in systematic reviews gave different results depending on whether the intervention was conventional or complementary. J Clin Epidemiol 2005;58:
769–76.
25. Furlan AD, Tomlinson G, Jadad AA, et al. Methodological quality and
homogeneity influenced agreement between randomized trials and nonrandomized studies of the same intervention for back pain. J Clin Epidemiol
2008;61:209–31.
26. Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in
comparisons of therapy. I: Medical. Stat Med 1989;8:441–54.
27. Kunz R, Oxman AD. The unpredictability paradox: review of empirical
comparisons of randomised and non-randomised clinical trials. BMJ 1998;
317:1185–90.
28. Chalmers TC, Celano P, Sacks HS, et al. Bias in treatment assignment in
controlled clinical trials. N Engl J Med 1983;309:1358–61.
29. Miller JN, Colditz GA, Mosteller F. How study design affects outcomes in
comparisons of therapy. II: Surgical. Stat Med 1989;8:455–66.
30. Schulz KF, Chalmers I, Hayes RJ, et al. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects
in controlled trials. JAMA 1995;273:408–12.
31. Tierney JF, Stewart LA. Investigating patient exclusion bias in meta-analysis.
Int J Epidemiol 2005;34:79–87.
32. Boutron I, Moher D, Tugwell P, et al. A checklist to evaluate a report of a
nonpharmacological trial (CLEAR NPT) was developed using consensus.
J Clin Epidemiol 2005;58:1233–40.
33. Verhagen AP, de Vet HC, de Bie RA, et al. Balneotherapy and quality assessment: interobserver reliability of the Maastricht criteria list and the need for
blinded quality assessment. J Clin Epidemiol 1998;51:335–41.
34. van Tulder MW, Suttorp M, Morton S, et al. Empirical evidence of an
association between internal validity and effect size in randomized controlled
trials of low back pain. Spine 2009;34:1685–92.
35. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of
randomized clinical trials: is blinding necessary? Control Clin Trials 1996;
17:1–12.
36. Berlin JA. Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997;
350:185–6.
37. Hill CL, LaValley MP, Felson DT. Discrepancy between published report
and actual conduct of randomized clinical trials. J Clin Epidemiol 2002;55:
783–6.
38. Detsky AS, Naylor CD, O'Rourke K, et al. Incorporating variations in the
quality of individual randomized trials into meta-analysis. J Clin Epidemiol
1992;45:255–65.
39. Verhagen AP, de Vet HC, de Bie RA, et al. The art of quality assessment of
RCTs included in systematic reviews. J Clin Epidemiol 2001;54:651–4.
40. Normand SL. Meta-analysis: formulating, evaluating, combining, and reporting. Stat Med 1999;18:321–59.
41. Whitehead A, Whitehead J. A general parametric approach to the meta-analysis of randomized clinical trials. Stat Med 1991;10:1665–77.
42. Sterne JA, Egger M, Smith GD. Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ
2001;323:101–5.
43. Tang JL, Liu JL. Misleading funnel plot for detection of bias in meta-analysis. J Clin Epidemiol 2000;53:477–84.
44. Begg CB, Mazumdar M. Operating characteristics of a rank correlation test
for publication bias. Biometrics 1994;50:1088–101.
45. Egger M, Davey SG, Schneider M, et al. Bias in meta-analysis detected by a
simple, graphical test. BMJ 1997;315:629–34.
46. Sterne JA, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol 2001;54:1046–55.
47. Engels EA, Schmid CH, Terrin N, et al. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Stat Med
2000;19:1707–28.
48. Williamson PR, Smith CT, Hutton JL, et al. Aggregate data meta-analysis
with time-to-event outcomes. Stat Med 2002;21:3337–51.
49. Poole C, Greenland S. Random-effects meta-analyses are not always conservative. Am J Epidemiol 1999;150:469–75.
50. Hahn S, Williamson PR, Hutton JL, et al. Assessing the potential for bias in
meta-analysis due to selective reporting of subgroup analyses within studies.
Stat Med 2000;19:3325–36.
51. Atkins D, Best D, Briss PA, et al; GRADE Working Group. Grading quality
of evidence and strength of recommendations. BMJ 2004;328:1490.
52. Ostelo RW, Costa LO, Maher CG, et al. Rehabilitation after lumbar disc
surgery. Cochrane Database Syst Rev. 2008:CD003007.
53. Furlan AD, Imamura M, Dryden T, et al. Massage for low-back pain. Cochrane Database Syst Rev. 2008:CD001929.
54. Sahar T, Cohen M, Neeman V, et al. Insoles for prevention and treatment of
back pain. Cochrane Database Syst Rev. 2007:CD005275.
55. Gibson JNA, Waddell G. Surgical interventions for lumbar disc prolapse.
Cochrane Database Syst Rev. 2007:CD001350.
56. Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II.
How to use an article about therapy or prevention. B. What were the results
and will they help me in caring for my patients? Evidence-Based Medicine
Working Group. JAMA 1994;271:59–63.
57. Shekelle PG, Andersson G, Bombardier C, et al. A brief introduction to the
critical reading of the clinical literature. Spine 1994;19:2028S–31S.
58. Malmivaara A, Koes BW, Bouter LM, et al. Applicability and clinical relevance of results in randomized controlled trials: the Cochrane review on
exercise therapy for low back pain as an example. Spine 2006;31:1405–9.
59. Ostelo RW, Deyo RA, Stratford P, et al. Interpreting change scores for pain
and functional status in low back pain: towards international consensus
regarding minimal important change. Spine 2008;33:90–4.
60. Bombardier C, Hayden J, Beaton DE. Minimal clinically important difference. Low back pain: outcome measures. J Rheumatol 2001;28:431–8.
61. Cohen J. Statistical Power analysis for the Behavioral Sciences. 1st ed. New
York, San Francisco, London: Academic Press; 1988:1–474.
62. Farrar JT, Young JP Jr, LaMoreaux L, et al. Clinical importance of changes
in chronic pain intensity measured on an 11-point numerical pain rating
scale. Pain 2001;94:149–58.
63. Pool JJ, Ostelo RW, Hoving JL, et al. Minimal clinically important change of
the Neck Disability Index and the Numerical Rating Scale for patients with
neck pain. Spine 2007;32:3047–51.
64. Stratford PW, Riddle DL, Binkley JM, et al. Using the Neck Disability Index
to make decisions concerning individual patients. Physiother Can 1999;
Spring:107–19.
65. Airaksinen O, Brox JI, Cedraschi C, et al. Chapter 4. European guidelines for
the management of chronic nonspecific low back pain. Eur Spine J 2006;
15(suppl 2):S192–300.
66. van Tulder M, Becker A, Bekkering T, et al. Chapter 3. European guidelines
for the management of acute nonspecific low back pain in primary care.
Eur Spine J 2006;15(suppl 2):S169–91.
67. Burton AK, Balague F, Cardon G, et al. Chapter 2. European guidelines for
prevention in low back pain: November 2004. Eur Spine J 2006;15(suppl
2):S136–68.
68. Chou R, Huffman LH. Nonpharmacologic therapies for acute and chronic
low back pain: a review of the evidence for an American Pain Society/
American College of Physicians clinical practice guideline. Ann Intern Med
2007;147:492–504.
69. Chou R, Huffman LH. Medications for acute and chronic low back pain: a
review of the evidence for an American Pain Society/American College of
Physicians clinical practice guideline. Ann Intern Med 2007;147:505–14.
