2018 ESMO Handbook of Interpreting Oncological Study Publications

ESMO HANDBOOK OF
INTERPRETING ONCOLOGICAL
STUDY PUBLICATIONS
Edited by
Mike Clarke
Northern Ireland Clinical Trials Unit and Northern Ireland
Methodology Hub, Queen’s University Belfast, Belfast, UK
Veronika Ballová
Kantonsspital Baden, Baden, Switzerland
Henk van Halteren
Admiraal de Ruijter Hospital, Goes, Netherlands
ESMO Press
First published in 2018 by ESMO Press.
© 2018 European Society for Medical Oncology
All rights reserved. No part of this book may be reprinted, reproduced, transmitted, or utilised
in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission of the publisher or in accordance with the provisions of the
Copyright, Designs, and Patents Act 1988 or under the terms of any license permitting limited
copying issued by the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA
01923, USA (www.copyright.com/ or telephone 978-750-8400). Product or corporate names
may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
This book contains information obtained from authentic and highly regarded sources. Reprinted
material is quoted with permission and sources are indicated. A wide variety of references are
listed. Reasonable efforts have been made to publish reliable data and information, but the authors
and publisher cannot assume responsibility for the validity of all materials or for the consequence
of their use.
Although every effort has been made to ensure that drug doses, treatments, and other information
are presented accurately in this publication, the ultimate responsibility rests with the prescribing
physician. Neither the publisher nor the authors can be held responsible for errors or for any
consequences arising from the use of information contained herein. For detailed prescribing
information on the use of any product or procedure discussed herein, please consult the
prescribing information or instructional material issued by the manufacturer.
A CIP record for this book is available from the British Library.
ISBN: 978-88-941795-5-2
For orders, corporate sales, foreign rights, and reprint permissions, please contact:
ESMO Head Office
Guidelines and Publishing Department
Via Ginevra 4
6900 Lugano
Switzerland
Tel: +41 (0)91 973 19 00
www.esmo.org
Email: publishing@esmo.org
Printed through s | s | media limited, Rickmansworth, Hertfordshire, UK

Contents
Editors ix
Contributors xi
Reviewers xiii
Abbreviations xiv
Acknowledgements xvii
Introduction xviii
1 Risk Factors 1
Why Should Oncologists Worry About Risk Factors? 1
Measurement of Risk 1
Causation 2
Conclusions 10
Further Reading 10
References 10
2 Screening 13
Introduction 13
The Benefit of Screening 13
The Harms of Screening: Overdiagnosis 19
Cost-effectiveness of Breast Cancer Screening Programmes 22
Conclusions 24
Further Reading 24
References 25
3 Prognosis 29
Introduction 29
Factors Influencing Cancer Survival 29
Improvement Over Time 32
Prognosis and Survival 32
Cancer Prognosis in Trials Versus Observational Studies 36
Types of Bias 37
Conclusions 40
References 40
4 Cancer Registries 43
Introduction 43
Notification and Completeness of Cancer Registries 43
Minimal Data Set 45
Supplementary Items 47
Coding Rules 47
Follow-up 49
Epidemiological Studies with Cancer Registry Data 50
Quality of Care Studies with Cancer Registry Data 51
Conclusions 52
References 53
5 Drug Development (Including Phase I Trials) 55
Introduction 55
Strategies in Drug Development 56
Target Discovery Precedes Drug Discovery 57
Small Molecule Drug Discovery: Synthesis and Optimisation 58
Selection of a Drug: Preclinical Assays 60
Development of Anticancer Biologics 64
Towards Phase I Clinical Trials 65
Phase I Studies 66
Conclusions 68
Further Reading 69
References 69
6 Randomised Trials 71
Introduction 71
Using Systematic Reviews When Designing a Randomised Trial 72
Formulating the Question for a Randomised Trial 72
Eligibility Criteria 73
Outcome Selection 75
Sample Size 76
Randomising Patients 77
Blinding or Masking 81
Statistical Analysis 82
Reporting 82
Conclusions 83
Further Reading 84
References 84
7 Choice of Outcomes (Including Core Outcome Sets
and Surrogate Outcomes) 87
Outcomes (or Endpoints) 87
Primary and Secondary Endpoints 88
Core Outcome Sets 88
Categories of Outcomes 89
Surrogate Endpoints 91
Endpoint Definition 92
Most Common Individual Outcomes Used in
Oncology Clinical Trials 95
Conclusions 98
Further Reading 99
References 100
8 Statistical Issues (Including Subgroups,
Time-To-Event Analyses, Multiplicity) 104
Introduction 104
Are the Data Adequately Described? 105
Which Quantities Have Been Estimated? 109
Which Statistical Tests Have Been Performed? 110
How is the Type-I Error Controlled? 111
Is the Statistical Power Adequate? 112
Subgroup Analyses 115
Conclusions 117
Further Reading 117
References 118
9 Systematic Reviews: A Key to Support Evidence-Informed
Decision Making 119
Introduction 119
When is a Systematic Review Needed? 119
Formulating the Question 122
Defining Eligibility Criteria 122
Search Strategy 125
Study Selection 127
Assessing the Quality of the Studies and the Body of Evidence 127
Data Extraction from Studies 128
Synthesis 129
Conclusions 132
Further Reading and Resources 132
References 133
10 Clinical Research in Rare Cancers 135
Introduction 135
Challenges and Limitations in Clinical Research in Rare Cancers 136
Future Directions for Clinical Research in Rare Cancers 137
Conclusions 143
Further Reading 144
References 144
11 How to Become a Researcher 146
Introduction 146
Case Study 1: Marco 146
Case Study 2: Florence 151
Conclusions 153
Further Reading 154
Glossary 155
Index 171
Editors
Professor Mike Clarke
Professor/Director, MRC Methodology Hub,
Queen’s University Belfast, Belfast, Northern Ireland
Professor Mike Clarke has 30 years’ experience of rigorous evaluations
in health and social care, including numerous prospective studies and
systematic reviews. In a career that started at the Clinical Trial Service Unit in
Oxford, he was Director of the UK Cochrane Centre from 2002 to 2011,
before his current post as Director of the Northern Ireland Methodology Hub
in Queen’s University Belfast. He is also Director of the Northern Ireland
Clinical Trials Unit and Co-ordinating Editor of the Cochrane Methodology
Review Group. He is a founder of Evidence Aid, improving access to
evidence for disasters and other humanitarian emergencies.
Mike has been actively involved in dozens of randomised trials, including
several with more than 1000 participants. He has provided detailed advice
for hundreds of other trials, used the individual participant data (IPD) from
more than a thousand in meta-analyses, and assessed reports for tens of
thousands more as part of initiatives to improve access to research.
Mike’s work on systematic reviews includes the Early Breast Cancer
Trialists’ Collaborative Group IPD reviews of randomised trials in breast cancer.
These have provided definitive evidence on the effects of treatments since
the 1980s, influencing the care of many millions of women worldwide.
Mike teaches widely about research, and established and continues to teach
the randomised trials and systematic reviews modules on the University of
Oxford’s MSc in Evidence Based Health Care.
Dr Veronika Ballová
Senior Medical Oncologist
Kantonsspital Baden, Baden, Switzerland
Dr Veronika Ballová is a senior medical oncologist at the Onkologie
Kantonsspital in Baden, Switzerland. She graduated in medicine from
the Comenius University of Bratislava, Slovakia, in 1992, and completed
her specialist training in clinical oncology in 2001 at the National Cancer
Institute in Bratislava. In 2003 she also completed an ESMO fellowship at
the University Hospital in Cologne, Germany. Since then her career has
been mainly focused on haematological malignancies.
Veronika Ballová is an author of several papers published in
peer-reviewed international journals and has been an invited speaker at several
national meetings. She has also collaborated on international
publications (books), as an author and editor.
Dr Henk van Halteren
Consultant in Medical Oncology
Admiraal de Ruijter Hospital, Goes, Netherlands
Dr Henk van Halteren is Consultant in Medical Oncology at the Admiraal
de Ruijter Hospital in the Southwest of the Netherlands. He graduated
in Internal Medicine in 2000 (University Medical Center St Radboud,
Nijmegen, the Netherlands) and completed his specialist training in
medical oncology in 2001. His thesis, finalised in 2004, dealt with different
therapeutic aspects of colorectal cancer and cancer cachexia.
Until 2012, Henk van Halteren worked as a Consultant in Medical
Oncology in the Gelderse Vallei hospital (Ede, Netherlands). During that period
he participated with Wageningen University (division of Human
Nutrition) in research on the impact of obesity on cancer behaviour. Moreover,
he participated in pioneering research on chemotherapy-related bowel
toxicity and changes in microbiota.
Contributors
Bellei M. Dipartimento di Medicina Diagnostica, Clinica e di Sanità
Pubblica, Università di Modena e Reggio Emilia, Modena, Italy
Brouwers M.C. Department of Oncology and Department of Health
Research Methods, Evidence and Impact, McMaster University,
Hamilton; Escarpment Cancer Research Institute, McMaster University
and Hamilton Health Sciences, Hamilton, Canada
Clarke M. Northern Ireland Clinical Trials Unit and Northern Ireland
Methodology Hub, Queen’s University Belfast, Belfast, UK
Comber H. National Cancer Registry, Cork, Ireland
Constantinidou A. Medical School University of Cyprus and BoC
Oncology Centre, Nicosia, Cyprus
D’Incalci M. Department of Oncology, IRCCS-Istituto di Ricerche
Farmacologiche Mario Negri, Milan, Italy
de Koning H.J. Department of Public Health, Erasmus MC, University
Medical Center Rotterdam, Netherlands
Desar I.M.E. Department of Medical Oncology, Radboud University
Medical Centre, Nijmegen, Netherlands
Florez I.D. Health Research Methodology Program, McMaster
University, Hamilton, Canada; Department of Pediatrics, Universidad de
Antioquia, Medellin, Colombia
Fotia V. Department of Oncology, IRCCS-Istituto di Ricerche
Farmacologiche Mario Negri, Milan, Italy
Fuso Nerini I. Department of Oncology, IRCCS-Istituto di Ricerche
Farmacologiche Mario Negri, Milan, Italy
Guida A. Dipartimento di Oncologia ed Ematologia, Azienda
Ospedaliero-universitaria Policlinico di Modena, Modena, Italy
Hoster E. Department of Internal Medicine III, University Hospital
Munich; IBE - Institute for Medical Information Processing, Biometry
and Epidemiology, Ludwig-Maximilians-University Munich, Munich,
Germany
Lemmens V.E.P.P. Department of Research, Netherlands Comprehensive
Cancer Organisation (IKNL), Utrecht, Netherlands
Levine O. Department of Oncology, McMaster University, Hamilton,
Canada
Ocana A. Department of Medical Oncology and Translational Research
Unit, Albacete University Hospital, Albacete, Spain
Sankatsing V.D.V. Department of Public Health, Erasmus MC,
University Medical Center Rotterdam, Netherlands
Tannock I.F. Division of Medical Oncology & Hematology, Princess
Margaret Cancer Centre, Department of Medicine, University of Toronto,
Toronto, Canada
Templeton A.J. Department of Medical Oncology, St. Claraspital Basel
and Faculty of Medicine, University of Basel, Basel, Switzerland
van der Graaf W.T.A. Department of Medical Oncology, Radboud
University Medical Centre, Nijmegen, Netherlands; The Institute of
Cancer Research & The Royal Marsden Hospital, Sutton, UK
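van Erning F.N. Department of Research, Netherlands Comprehensive
Cancer Organisation (IKNL), Utrecht, Netherlands
Visser O. Department of Registration, Netherlands Comprehensive
Cancer Organisation (IKNL), Utrecht, Netherlands
Vissers P.A.J. Department of Research, Netherlands Comprehensive
Cancer Organisation (IKNL), Utrecht, Netherlands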
Reviewers
We would like to thank Dr Emiliano Calvo and all the authors for their
time spent reviewing the chapters.
Abbreviations
AACR American Association for Cancer Research
ADME Absorption, distribution, metabolism and excretion
AR Adaptive randomisation
ASCO American Society of Clinical Oncology
AUC Area under the curve
BOR Best overall response
CENTRAL Cochrane Central Register of Controlled Trials
CI Confidence interval
CLL Chronic lymphoid leukaemia
Cmax Maximum serum concentration
CMF Cyclophosphamide, methotrexate and 5-fluorouracil regimen
COMET Core Outcome Measures in Effectiveness Trials
COS Core outcome set
CR Complete response
CR30 Complete response at 30 months
CRC Colorectal cancer
CT Computed tomography
DCIS Ductal carcinoma in situ
DCO Death certificate only
DFS Disease-free survival
DoR Duration of response
DSS Disease-specific survival
ECCO European Cancer Organisation
ECOG Eastern Cooperative Oncology Group
EFS Event-free survival
EFS24 Event-free survival at 24 months
EMA European Medicines Agency
ENCR European Network of Cancer Registries
EORTC European Organisation for Research and Treatment of Cancer
ESMO European Society for Medical Oncology
EU European Union
FDA Food and Drug Administration
FDG-PET Fludeoxyglucose–positron emission tomography
FLASH Follicular Lymphoma Analysis of Surrogacy Hypothesis
GCP Good Clinical Practice
G-CSF Granulocyte-colony stimulating factor
GEMMs Genetically engineered mouse models
GI Gastrointestinal
GLP Good Laboratory Practice
GMP Good Manufacturing Practice
GRADE Grading of Recommendations, Assessment, Development
and Evaluation
HBV Hepatitis B virus
HIPEC Hyperthermic intraperitoneal chemotherapy
HR Hazard ratio
HR-QoL Health-related quality of life
HRT Hormone replacement therapy
HTS High throughput screening
IACR International Association of Cancer Registries
IARC International Agency for Research on Cancer
ICD-O International Classification of Diseases for Oncology
LMS Leiomyosarcoma
MAMS Multi-arm, multistage
MISCAN MIcrosimulation SCreening ANalysis
mRCC Metastatic renal cell carcinoma
MRD Minimal residual disease
MTD Maximum tolerated dose
NCI National Cancer Institute
NNS Number needed to screen
NSCLC Non-small cell lung cancer
OBD Optimum biological dose
OR Odds ratio
ORR Objective/overall response rate
OS Overall survival
PET Positron emission tomography
PFS Progression-free survival
PICO Population, Interventions or exposures, Comparators and
Outcomes
PORT Postoperative radiotherapy
PPR Prior pelvic irradiation
PR Partial response
PROM Patient-Reported Outcome Measures
PULA Previously Untreated, Locally Advanced Task Force
PVNS Pigmented villonodular synovitis
RCT Randomised controlled trial
RoB Risk of Bias
RR Relative risk, Risk ratio
SD Standard deviation
SLL Small lymphocytic lymphoma
STROBE Strengthening the Reporting of Observational Studies in
Epidemiology
STS Soft tissue sarcoma
T1/2 Half-life
TNM Tumour, Node, Metastasis classification
TTF Time to treatment failure
TTP Time to progression
UICC Union for International Cancer Control
US United States
VEGF(R) Vascular endothelial growth factor (receptor)
WHO World Health Organization
Acknowledgements
This book is the result of the effort, work and experience of many
people. We would like to thank the ESMO Publishing Working Group,
in particular Prof Michele Ghielmini and Dr Raffaele Califano for
supporting the realisation of this book.
We would also like to thank Nicki Peters, Claire Bramley and
Matthew Wallace of ESMO for their support, assistance and patience
with the preparation of this publication.
Above all, we would like to thank all the authors and reviewers who
enabled us to make this book a reality.
Henk van Halteren
Veronika Ballová
Mike Clarke
Introduction
What you need to know...
a methodologist’s insight
Clinicians making decisions about the care of cancer patients need
many types of information for meaningful discussion with patients
and their families. Some of this information comes from training and
experience, some from knowledge of the patient’s preferences and
values, some from tests and investigations and some from research.
We have prepared this handbook to help with the last of these, with an
emphasis on quantitative research. From studies that investigate the
causes of cancer, to its prognosis and treatment, each chapter focuses
on a specific type of study, its design and its value, giving examples
to show how the studies are done and how their findings are reported.
The handbook begins with the starting point for cancer: risk factors.
A cancer clinician may be less concerned about risk factors than
practitioners and policy makers working in public health and the
prevention of cancer. However, it is important that they understand
studies investigating risk factors if they are to discuss the possibilities
of the patient developing a second cancer and to provide advice to the
patient, family members and others who wish to reduce their cancer
risk. This issue also allows us to introduce the reader to the classic
designs of epidemiology: cohort and case-control studies.
Next, we tackle the research on identifying effective ways to screen
for cancer. Early diagnosis of tumours and, therefore, the opportunity
to start treatment early, can lead to improved survival and may avert
cancer deaths. But screening can also have adverse effects, cause
harm and use up resources. We discuss how the relevant research is
done to enable the reader to engage in decision making about the
implementation or modification of population-level screening.
The chapter on cancer prognosis focuses on the research that can help
to predict a patient’s life expectancy after being diagnosed with cancer.
It provides a guide to the interpretation of the findings for various
methods of assessing prognosis and possible biases. It leads onto a
chapter which shows how reports from population-based registries
might be used when making decisions about cancer.
The next chapters bring the reader into the realm of cancer treatment.
We begin with the types of research used for drug development, to
identify drugs that might go on to be tested in late-stage clinical trials.
Randomised trials are the fundamental study design for comparing the
effects of a new treatment with those of current practice, or for comparing
multiple treatments simultaneously. However, to provide information on
both the beneficial and harmful effects in ways that will help clinicians
and patients make well-informed decisions, researchers need to pay
particular attention to the choice of outcomes for their clinical trials.
Having measured these outcomes, researchers need to use appropriate
methods to analyse them. This handbook includes a guide to some basic
concepts around commonly-used statistical methods, knowledge of
which is essential when interpreting the reports of cancer studies.
Finally, when trying to cope with the vast amount of research into the
effects of treatments for common cancers, clinicians are likely to need
to rely heavily on systematic reviews. These reviews help avoid the
biases that might come from focusing on the findings of a single study
and maximise the power of existing research by providing a summary
of what might be a very large body of research. On the other hand, for
some cancers, studies are likely to be small and few. The challenges of
clinical research in rare cancers can mean that treatment is commonly
based on insufficient evidence. Therefore, the penultimate chapter
discusses the interpretation of current research and describes novel
approaches for trials that are most appropriate for rare cancers.
We conclude with a discussion of pathways that clinicians might follow
if they would like to become researchers themselves and become
providers of the high-quality research evidence that is needed to
understand cancer better and to treat the disease more effectively.
The handbook can be read from beginning to end, or a clinician
might choose to read specific individual chapters when considering a
particular issue. We hope that all readers will find that the handbook
helps them use research as part of the evidence base for discussions,
decisions and choices about the care of patients with cancer.
Mike Clarke, DPhil
...and why you need to understand it.
the oncologists’ perspective
In the past decade the scientific world of oncology has changed
considerably. The number of new interventions evaluated has greatly
increased and information on study results is disseminated via an
expanding number of peer-reviewed journals and congress reports.
International and national guidelines committees strive to keep
guidelines up to date, but, due to the dynamics of the process, there is a
lack of time to safeguard guideline prerequisites, such as education and
guideline adherence. Practically, it is becoming increasingly difficult to
achieve consensus on the benefit of all new oncological interventions.
The definition of clinical benefit is also changing. In the past,
improvement in overall survival (ideally accompanied by a perceived
improvement in quality of life) was regarded as clinical benefit. But due
to the increase in the number of sequential treatments, endpoints such
as progression-free survival are now frequently regarded as surrogate
markers of clinical benefit. Another worrying issue is the threat of
publication bias: good news sells better. A manuscript reporting positive
results has a higher chance of being published, and of being given a
prominent platform at a scientific congress. Furthermore, 79% of trials
registered in the European Clinical Trials Database (EudraCT)* are
commercially funded by an industry which sells products.
Due to these phenomena, each practising oncologist should refrain
from passively absorbing the flood of new scientific information. This
handbook enables all of us to read between the lines of a scientific
publication and to better estimate the true benefit of a new oncological
intervention. In this way we can better find the balance between
treatment benefit and treatment hazard for our patients and keep cancer
care affordable.
We are indebted to Professor Mike Clarke, a great and
enthusiastic teacher, who was willing to coordinate the scientific
content of this handbook.
Please enjoy reading and learning.
Henk van Halteren, MD
Veronika Ballová, MD
* EudraCT Public Web Report for December 2017. European Medicines Agency, 2018.
Available from: https://eudract.ema.europa.eu/statistics.html (13 February 2018, date last accessed).
Glossary
Throughout this handbook you will find several terms highlighted in
blue, which are related to clinical trials. If you are not familiar with
the terminology and would like a brief explanation, please refer to the
glossary section starting on page 155.
1 Risk Factors
H. Comber
National Cancer Registry, Cork, Ireland
In epidemiology, a risk factor, or exposure, is an event, condition
or characteristic which modifies the risk of an event or outcome. The
relationship between exposure and outcome is the effect of the exposure.
Why Should Oncologists Worry About Risk Factors?
When a patient has been diagnosed with cancer, the risk factors that caused
it might not be of great importance to the oncologist who is treating her.
However, it is still important to know about the types of study that
investigate risk factors, not least because improved survival and life expectancy
of cancer patients have led to an increase in the risk of second cancers
(Oeffinger et al, 2013), partly due to treatment effects (Kamran et al, 2016;
Morton et al, 2014) and partly due to the risk factors that were
responsible for the first cancer (Berrington de Gonzalez et al, 2011). Addressing
behavioural risk factors may reduce subsequent risk for the patient (Khuri
et al, 2001) and family members may also seek information on reducing
their cancer risk (Bottorff et al, 2015; Howell et al, 2013; Radecki
Breitkopf et al, 2014). Furthermore, all physicians have a responsibility to give
advice that might prevent ill health, and to be aware of the strengths and
limitations of the evidence supporting this advice.
Measurement of Risk
Risk is defined as the number of events divided by the number of people
at risk. When measured over a specified period of time, it is described
as the incidence rate. Differences in risk due to an exposure may be
expressed as a ratio or a difference.
Risk: Number of events/number of people at risk
Risk ratio: Risk of exposed/risk of unexposed
■ measures the strength of the effect
■ is independent of the population risk
Risk difference: Risk of exposed − risk of unexposed
■ describes the number of additional cases due to the exposure
Excess or attributable risk (Parkin, 2011; Whiteman et al, 2015)
■ is the difference in the risk of a condition between an exposed population and an unexposed population
Risk ratio and risk difference
In a study of hormone replacement therapy (HRT) (Jones et al, 2016),
500 out of 20 114 non-users and 52 out of 1612 users of combined
HRT developed breast cancer (Table 1). The risk to users was 3.2%
and to non-users 2.5%, giving a risk ratio of 1.30 (i.e. the risk to users
was 30% greater). The difference in risk was 0.74%, equivalent to 12
(1612 × 0.74%) additional cases of cancer in the 1612 users.
Table 1 Relative Risk of Postmenopausal Breast Cancer, by Type of HRT Preparation
From Jones ME, Schoemaker MJ, Wright L, et al. Menopausal hormone therapy and breast cancer: what is
the true size of the increased risk? Br J Cancer 2016; 115:607-615.

                               All women   Cases   Risk (%)
Non-users                         20 114     500   500/20 114 = 2.5%
Oestrogen/progestogen HRT           1612      52   52/1612 = 3.2%
Risk ratio (3.2%/2.5%)                             1.30
Risk difference (3.2%–2.5%)                        0.74%
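The arithmetic behind Table 1 can be checked with a few lines of Python. This is only a minimal sketch: the counts are those reported in Table 1, while the function and variable names are illustrative choices and not part of the original study.

```python
def risk(cases: int, people: int) -> float:
    """Risk = number of events / number of people at risk."""
    return cases / people

# Counts taken from Table 1 (Jones et al, 2016).
NON_USERS, NON_USER_CASES = 20_114, 500
USERS, USER_CASES = 1_612, 52

risk_unexposed = risk(NON_USER_CASES, NON_USERS)  # non-users of HRT
risk_exposed = risk(USER_CASES, USERS)            # users of combined HRT

risk_ratio = risk_exposed / risk_unexposed        # strength of the effect
risk_difference = risk_exposed - risk_unexposed   # excess (attributable) risk
additional_cases = USERS * risk_difference        # excess cases among the users

print(f"Risk, non-users:  {risk_unexposed:.1%}")
print(f"Risk, users:      {risk_exposed:.1%}")
print(f"Risk ratio:       {risk_ratio:.2f}")
print(f"Risk difference:  {risk_difference:.2%}")
print(f"Additional cases: {additional_cases:.0f}")
```

Note that the risk ratio of 1.30 is obtained from the unrounded risks; dividing the rounded percentages (3.2%/2.5%) would give 1.28.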
Causation
Risk factor epidemiology tries to separate the effects of the exposure
being investigated from all other exposures. This is important because
cancer may develop following a series of different exposures over a long
period, so the identification of all possible exposures is challenging.
Establishing Causation
Study conditions in epidemiology are difficult to control, so a single
study is rarely definitive, and establishing causation depends on
accumulated evidence. Interpretation of this evidence may be controversial.
Mobile phones and brain cancer
The INTERPHONE (INTERPHONE Study Group, 2010) and other
large studies (Benson et al, 2013) have produced strong evidence that
there is no association between mobile phone use and brain cancer,
but controversy continues concerning a range of methodological
issues (Lagorio and Röösli, 2014; Morgan et al, 2015).
The epidemiologist Bradford Hill (Hill, 1965) proposed certain aspects
of a study which suggest causation (Table 2).
Table 2 Bradford Hill’s Criteria for Causation
• Strength: An exposure which increases the risk of the outcome by 5% is less convincing than one which doubles it
• Consistency: Has the association been repeatedly observed in different places, circumstances and times?
• Specificity: Is the association limited to particular sites and types of disease?
• Temporality: Does the exposure precede the outcome?
• Biological gradient: Does the association show a dose–response curve?
• Plausibility: Is the causation biologically plausible?
• Coherence: This is related to plausibility – does the effect cohere with the generally known facts of the natural history and biology of the disease?
• Experiment: If some preventive action is taken, does it in fact prevent the outcome?
• Analogy: Has a similar exposure been shown to be associated with a similar outcome?
Study Design
Cancer risk factors are often suggested by observing variation in cancer
incidence or mortality between populations differentiated by geography,
time, occupation or other characteristics. Hypotheses developed from
these observations are tested in analytical studies. These are typically
cohort or case-control studies, but sometimes a randomised trial (see
Chapter 6) might be used.
Types of Epidemiological Study
Table 3 Advantages and Disadvantages of Different Study Types
Cohort study
Advantages:
• Clear sequence of events
• Risk can be measured
• Low risk of selection bias
Disadvantages:
• Large numbers of participants needed with long follow-up period, so expensive and often slow
• New exposures difficult to add
• Loss to follow-up
• Change in exposure status during study
• Risk of confounding
Randomised trial
Advantages:
• Clear sequence of events
• Risk can be measured
• Low risk of bias or confounding
Disadvantages:
• Large numbers of participants needed with long follow-up period, so expensive and often slow
• New exposures difficult to add
• Loss to follow-up
• Change in exposure status during study
• Ethical issues
Case-control study
Advantages:
• Relatively small number of participants needed
• Disease objectively confirmed
• No follow-up period needed; no drop-outs
Disadvantages:
• Risk cannot be calculated
• Prone to selection bias, recall bias and confounding
• Limit to exposures studied
• Difficult to acquire biological samples
Cohort studies
A cohort is a group of people followed over a period, some of whom will
have the exposure of interest and some of whom will have the outcome
of interest. Participants are assessed for many exposures in addition to
that under investigation and often have biological samples taken. For
rare exposures, it is necessary to find cohorts with a high prevalence
of exposure, such as occupational groups (Kachuri et al, 2016), while
general population cohorts are used for more common exposures
(Riboli, 2001). A randomised trial can be thought of as a type of cohort
study where the exposure is randomly assigned by the researcher. Field
trials are the custom in cancer epidemiology, where participants in the
community are randomised, either individually or by group (e.g. by area
of residence or clinic attended).
The Gambia Hepatitis Intervention Study (The Gambia Hepatitis Study
Group, 1987)
The Gambia Hepatitis Intervention Study is a large-scale study of
the prevention of liver cancer by hepatitis B (HBV) vaccination of
young infants. The latest estimates (Viviani et al, 2008) indicate that
the number of cases needed to detect a significant difference between
vaccinated and unvaccinated groups will be reached when subjects are
around 30 years old, between 2017 and 2020.
Case-control studies
Case-control studies begin with identified cases of cancer whose
exposures are compared to those of a group of people without cancer
(controls). Both groups are drawn from the same source population. The
source population may be patients attending a hospital or clinic, the
population of a region or other defined population. The control group
is chosen at random from this source population. Sometimes, cases and
controls are drawn from an existing cohort. This is a nested
case-control study, which provides better quality information on exposures.
Sources of Error in Risk Factor Studies
The errors which occur in studies of causation are of two kinds:
systematic and random.
■ Systematic error is unaffected by study size
■ Random error decreases with increasing study size
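The contrast between the two kinds of error can be sketched numerically. The example below is an illustration under stated assumptions rather than a method taken from this chapter: it uses the standard binomial formula for the standard error of an estimated risk as a measure of random error, and the true risk and the size of the systematic error are hypothetical values.

```python
import math

def standard_error_of_risk(risk: float, n: int) -> float:
    """Binomial standard error of a risk estimated from n people."""
    return math.sqrt(risk * (1 - risk) / n)

TRUE_RISK = 0.025         # hypothetical true risk (2.5%)
SYSTEMATIC_ERROR = 0.005  # hypothetical fixed bias, e.g. from how cases are ascertained

for n in (1_000, 10_000, 100_000):
    random_error = standard_error_of_risk(TRUE_RISK, n)
    # The random component shrinks as n grows; the systematic component does not.
    print(f"n = {n:>7}: random error (SE) = {random_error:.4f}, "
          f"systematic error = {SYSTEMATIC_ERROR:.4f}")
```

In this sketch a hundredfold increase in study size reduces the random error tenfold, while the assumed bias is the same at every study size.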
Systematic error
Systematic errors are divided into bias and confounding.
■ Bias can be considered as an error in the conduct of a study (selection
bias, measurement bias)
■ Confounding is an error in study design or interpretation of study results
Bias
Selection bias. Selection bias occurs when the exposed and unexposed
populations differ in ways (other than the exposure) which affect the
outcome. Selection bias can give rise to the ‘healthy worker’ effect, where
the effect of an occupational exposure is countered by the overall better
health of those in active work (Zielinski et al, 2009). Selection bias may
also occur if participants volunteer for the study for reasons related to the
exposure, e.g. interest in a healthy lifestyle.
Bias is difficult to avoid in the selection of the controls for case-control
studies. They may be chosen from patients with non-cancer conditions
attending the same hospital or from people living in the same area or
attending the same family doctor, and so may have risk factors in
common with cases.
Measurement bias. Exposure measurement: Bias in recall of self-
reported exposures is common in case-control studies. Bias may be
differential between cases and controls, as patients with cancer are more
likely to recall a specific exposure, or it may be non-differential, due to
under-reporting of factors such as alcohol and tobacco intake. Differ-
ential bias may lead to over- or under-estimation of the effect, but non-
differential bias will always lead to under-estimation. Where possible,
self-reported exposures should be independently validated.
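The attenuation caused by non-differential misclassification can be illustrated with a short calculation. The sketch below is not taken from any study: the counts, the sensitivity and specificity of reporting, and the helper names are all hypothetical. It applies the same reporting error to cases and controls and compares the observed odds ratio with the true one.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts after imperfect exposure reporting.

    sensitivity: probability that a truly exposed person reports exposure
    specificity: probability that a truly unexposed person reports no exposure
    """
    reported_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    reported_unexposed = (exposed + unexposed) - reported_exposed
    return reported_exposed, reported_unexposed


# Hypothetical true exposure distribution (exposed, unexposed); not from any study
true_cases = (400, 600)
true_controls = (200, 800)

true_or = odds_ratio(*true_cases, *true_controls)

# Identical error rates in cases and controls: non-differential misclassification
obs_cases = misclassify(*true_cases, sensitivity=0.7, specificity=0.9)
obs_controls = misclassify(*true_controls, sensitivity=0.7, specificity=0.9)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

With these assumed values the observed odds ratio (about 1.8) lies between 1 and the true value (about 2.7), which is the under-estimation described above for a two-level exposure.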
Outcome measurement: Bias in outcome measurement is uncommon
in cancer epidemiology, although cancer diagnoses may be missed in
cohorts for which the follow-up is inefficient. Overdiagnosis, or earlier
diagnosis, may occur in cohorts where the exposed participants are more
intensively monitored.
Confounding
Confounding is a common source of error in interpretation. A confounder
is a factor, other than the exposure of interest, which affects the outcome
and is correlated with the exposure. For instance, heavy drinkers tend to
smoke, which means that high alcohol consumption is associated with,
but does not cause, lung cancer. Smoking is therefore a confounder of
the relationship between alcohol and lung cancer. Confounding occurs
frequently in cancer studies, due to the large number of potential carci-
nogenic exposures. While bias can be minimised by adherence to good
study design and practice, minimising confounding requires a thorough
knowledge, measurement and analysis of potential exposures and is usu-
ally part of study analysis as well as design.
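The alcohol and smoking example can be made concrete with a stratified analysis. The following is a minimal sketch using an invented cohort in which heavy drinking has no effect on lung cancer within either smoking stratum; the counts and function names are hypothetical. The Mantel-Haenszel risk ratio is used here as one common way of adjusting for a stratification variable; it is not necessarily the method used in any study cited in this chapter.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Simple (crude) risk ratio."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio pooled over strata.

    Each stratum is (cases_exposed, n_exposed, cases_unexposed, n_unexposed).
    """
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


# Hypothetical cohort: exposure = heavy drinking, outcome = lung cancer
smokers = (80, 800, 20, 200)      # 10% risk whether or not they drink heavily
non_smokers = (2, 200, 8, 800)    # 1% risk whether or not they drink heavily
strata = [smokers, non_smokers]

# Crude analysis ignores smoking and pools the two strata
crude_rr = risk_ratio(
    sum(s[0] for s in strata), sum(s[1] for s in strata),
    sum(s[2] for s in strata), sum(s[3] for s in strata),
)
adjusted_rr = mantel_haenszel_rr(strata)

print(f"Crude risk ratio:    {crude_rr:.2f}")
print(f"Adjusted risk ratio: {adjusted_rr:.2f}")
```

In this invented cohort the crude risk ratio is close to 3 only because most heavy drinkers smoke; once smoking is taken into account the risk ratio is 1.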
Random Error
The relation between exposure and outcome is unpredictable at the indi-
vidual level, and measures of effect in individuals will be randomly dis-
tributed around some best estimate (e.g. an average). The usual meas-
ure for showing the scatter around the estimate is the 95% confidence
interval. There are various interpretations of this interval, but in practice
it is used to test if the data are consistent with some hypothesis (see
also Chapter 8). Random error reduces with study size but can also be
reduced by study design and conduct and by having a homogeneous
study population.
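As an illustration of how random error falls with study size, the sketch below computes the width of an approximate 95% confidence interval for a risk at three sample sizes. The observed risk of 10% and the sample sizes are hypothetical, and the simple normal (Wald) approximation is an assumption made for brevity.

```python
import math


def wald_ci_width(risk, n, z=1.96):
    """Width of the approximate 95% confidence interval for a proportion."""
    standard_error = math.sqrt(risk * (1 - risk) / n)
    return 2 * z * standard_error


observed_risk = 0.10  # hypothetical observed risk
widths = {n: wald_ci_width(observed_risk, n) for n in (100, 400, 1600)}

for n, width in widths.items():
    print(f"n = {n:>4}: 95% CI width = {width:.3f}")
```

Under this approximation the interval halves in width each time the study size is quadrupled, because the standard error is proportional to one over the square root of the number of participants.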
Statistical Testing
Statistical testing determines how consistent the measured effect is
with a hypothesised effect (see Chapter 8). The hypothesis is usually
that there is no effect, or that there is no difference between two effects
(null hypothesis). Conventionally, if the 95% confidence interval of the
measured effect does not include the value expected under the null hypothesis
(e.g. a ratio of 1), it is considered that there is a real effect. Confidence intervals are more
informative than probabilities (p-values) which give little information
about the underlying data.
Risk ratios and odds ratios are conventionally presented as unadjusted
and adjusted. The unadjusted ratio is the simple risk ratio or odds ratio
(risk exposed/risk unexposed). On the other hand, an adjusted ratio
arises from statistical models which allow for the effects of other vari-
ables and confounders (e.g. age, sex, smoking, body mass index) which
may affect the risk. Table 4 shows an example of unadjusted and adjusted
ratios and their confidence intervals.
Table 4 Unadjusted and Adjusted Odds Ratios and 95% Confidence Intervals for
Colorectal Cancer Risk Associated With Duration of Observed Insulin Exposure
From Yang YX, Hennessy S, Lewis JD. Insulin therapy and colorectal cancer risk among type 2 diabetes
mellitus patients. Gastroenterology 2004; 127:1044-1050. Copyright © 2004. Reprinted with permission
from the American Gastroenterological Association.

                                Cases        Controls      Unadjusted odds ratio        Adjusted odds ratio
                                                           (95% confidence interval)    (95% confidence interval)*
No insulin therapy (reference)  107 (83.6)   1084 (87.5)   1.0                          1.0
≥5 years of insulin use         4 (3.1)      15 (1.2)      2.8 (0.9–8.5)                4.7 (1.3–16.7)

*Adjusted for sex and 7 other variables.
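To show where an unadjusted odds ratio comes from, the sketch below computes a crude odds ratio and an approximate 95% confidence interval from the case and control counts in Table 4. The log-based (Woolf) interval and the function name are assumptions made for illustration; the published analysis may have used a different method, so the result is expected to be close to, but not necessarily identical with, the tabulated 2.8 (0.9–8.5).

```python
import math


def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval."""
    estimate = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    # Standard error of the log odds ratio
    se_log = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                       + 1 / unexposed_cases + 1 / unexposed_controls)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Counts from Table 4: >=5 years of insulin use versus no insulin therapy
crude_or, lower, upper = odds_ratio_ci(
    exposed_cases=4, exposed_controls=15,
    unexposed_cases=107, unexposed_controls=1084,
)

print(f"Crude odds ratio: {crude_or:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

The crude calculation gives roughly 2.7 (0.9 to 8.3). As in the table, the interval includes 1, so the unadjusted estimate on its own is compatible with no effect.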
Interpretation
How important is the effect? Two factors determine the clinical impor-
tance of an effect:
■ The size of the effect
■ The frequency of occurrence of the exposure
Large effects, even with wide confidence intervals, should not be ignored
if they fulfil criteria of plausibility. Small, statistically significant effects
are common in large studies, but may be artefactual. However, small
effects with high exposure prevalence may have public health importance.
Where the background risk is low, risk difference is more informative
than risk ratio, because the risk ratio may exaggerate the importance of
an effect. The STROBE (Strengthening the Reporting of Observational
Studies in Epidemiology) initiative has produced a detailed guide on the
reporting and interpretation of observational studies (Vandenbroucke et
al, 2007), which describes how these studies should be reported.
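The contrast between risk ratio and risk difference can be seen in a small worked example. The two scenarios below are hypothetical and chosen only so that the risk ratio is the same in both; the variable names are illustrative.

```python
# Hypothetical risks in exposed and unexposed people (not from any study)
scenarios = {
    "rare cancer": {"exposed": 2 / 100_000, "unexposed": 1 / 100_000},
    "common cancer": {"exposed": 0.20, "unexposed": 0.10},
}

results = {}
for name, risks in scenarios.items():
    ratio = risks["exposed"] / risks["unexposed"]
    # Extra cases per 100 000 exposed people
    difference_per_100k = (risks["exposed"] - risks["unexposed"]) * 100_000
    results[name] = (ratio, difference_per_100k)
    print(f"{name}: risk ratio = {ratio:.1f}, "
          f"risk difference = {difference_per_100k:.0f} per 100 000")
```

Both scenarios have a risk ratio of 2, but the exposure accounts for one extra case per 100 000 people in the first and 10 000 extra cases per 100 000 in the second.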
Representativeness
Studies of cancer risk factors are investigations of aetiology, which are
presumed to have a biological basis. Although there may be differences
in susceptibility between populations, the effects of risk factors are usu-
ally similar in all populations. Good study design is therefore more
important (Doll et al, 2004) than the issue of whether the participants are
representative of the wider population.
Publication Bias
Many initial studies of risk are small and poorly designed. If they test
a novel hypothesis, they are less likely to be published if they fail to
support this hypothesis. If published, they are likely to be followed by
larger studies, which are more likely to be published. Small negative
studies of risk tend to be under-reported, leading to bias in reviews and
meta-analysis. Figure 1(a) shows the forest plot of a meta-analysis (see
Chapter 9) of the risk of prostate cancer in first-degree relatives of pros-
tate cancer patients (Bruner et al, 2003). Figure 1(b) shows a funnel plot
of the same data. The vertical dashed line indicates the weighted aver-
age, around which individual studies should be symmetrically grouped.
The smaller studies (at the bottom) are skewed to the right, suggesting
that smaller negative studies were less likely to be published, causing
publication bias.
[Figure 1. (a) Forest plot: prostate cancer in first-degree relatives, brothers (black) or fathers (grey); vertical axis: study reference; horizontal axis: relative risk. (b) Funnel plot; vertical axis: size of study; horizontal axis: log relative risk.]
Figure 1 (a) Relative risks of prostate cancer in men with a history of prostate cancer
in a first-degree relative. (b) Funnel plot for first-degree relatives. The circles represent
the estimates of the log relative risk for each study and the horizontal lines are 95%
confidence intervals.
From Bruner DW, Moore D, Parlanti A, et al. Relative risk of prostate cancer for men with affected relatives:
systematic review and meta-analysis. Int J Cancer 2003; 107:797-803. By permission of John Wiley and Sons.
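The asymmetry seen in a funnel plot can be reproduced with a small simulation. This is a sketch under stated assumptions, not a re-analysis of the data in Figure 1: the true effect, the standard errors, the number of studies and the rule that small studies are published only when they show a statistically significant increase in risk are all hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the sketch is reproducible

TRUE_LOG_RR = math.log(1.5)  # hypothetical true effect


def simulate_published(se, n_studies, require_significance):
    """Return the log relative risks of the simulated studies that are 'published'."""
    published = []
    for _ in range(n_studies):
        estimate = random.gauss(TRUE_LOG_RR, se)
        significant = estimate / se > 1.96
        if significant or not require_significance:
            published.append(estimate)
    return published


# Large studies are always published; small ones only if 'positive'
large = simulate_published(se=0.1, n_studies=500, require_significance=False)
small = simulate_published(se=0.5, n_studies=500, require_significance=True)

print(f"True log relative risk:          {TRUE_LOG_RR:.2f}")
print(f"Mean of large studies:           {statistics.mean(large):.2f}")
print(f"Mean of published small studies: {statistics.mean(small):.2f} "
      f"({len(small)} of 500 published)")
```

Because only the small studies with large estimates survive the publication filter, their average lies well to the right of the large studies, which is the pattern a skewed funnel plot is meant to reveal.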
Conclusions
While the European Code Against Cancer (International Agency for
Research on Cancer, 2017) has only 12 proven recommendations for
action to reduce risk, a PubMed search for ‘cancer prevention/risk fac-
tors’ yields over 130 000 citations. This prompts the question: how, and
why, should a busy clinician deal with all this evidence? It is tempting to
wait for consensus to be summarised in systematic reviews and meta-
analyses (see Chapter 8). However, these vary in quality, may not be up
to date and should not be regarded as a substitute for critical reading
of key reference papers. Guidelines and checklists help in making an
assessment of the evidence, but it is also important to assess the practical
importance of the findings. Many ‘positive’ reports turn out to have little
practical impact in the real world. It is the responsibility of all cancer cli-
nicians to give cancer prevention advice, but to be aware of the strengths
and limitations of the evidence.
Declaration of Interest:
Dr Comber has reported no conflict of interest.
Further Reading
Coggon D, Rose G, Barker DJP. Chapter 1: What is epidemiology? Epidemiology
for the uninitiated, 4th edition. http://www.bmj.com/about-bmj/resources-
readers/publications/epidemiology-uninitiated/ (23 January 2018, date last
accessed)
Dos Santos Silva I. Cancer epidemiology: principles and methods.
https://www.iarc.fr/en/publications/pdfs-online/epi/cancerepi/CancerEpi.pdf
(23 January 2018, date last accessed).
Rothman K. Epidemiology: An Introduction, 2nd edition. London: Oxford Uni-
versity Press, 2012.
Vandenbroucke JP, von Elm E, Altman DG, et al; STROBE Initiative. Strength-
ening the Reporting of Observational Studies in Epidemiology (STROBE):
explanation and elaboration. PLoS Med 2007; 4:e297.
References
Benson VS, Pirie K, Schüz J, et al. Mobile phone use and risk of brain neoplasms
and other cancers: prospective study. Int J Epidemiol 2013; 42:792–802.
Berrington de Gonzalez A, Curtis RE, Kry SF, et al. Proportion of second can-
cers attributable to radiotherapy treatment in adults: a cohort study in the US
SEER cancer registries. Lancet Oncol 2011; 12:353–360.
Bottorff JL, Robinson CA, Sarbit G, et al. A motivational, gender-sensitive
smoking cessation resource for family members of patients with lung cancer.
Oncol Nurs Forum 2015; 42:363–370.
Bruner DW, Moore D, Parlanti A, et al. Relative risk of prostate cancer for men
with affected relatives: systematic review and meta-analysis. Int J Cancer
2003; 107:797–803.
Doll R, Peto R, Boreham J, Sutherland I. Mortality in relation to smoking: 50
years’ observations on male British doctors. BMJ 2004; 328:1519.
Hardell L, Carlberg M, Söderqvist F, Mild KH. Case-control study of the asso-
ciation between malignant brain tumours diagnosed between 2007 and 2009
and mobile and cordless phone use. Int J Oncol 2013; 43:1833–1845.
Hill AB. The environment and disease: association or causation? Proc R Soc
Med 1965; 58:295–300.
Howell LA, Brockman TA, Sinicrope PS, et al. Receptivity and preferences in
cancer risk reduction lifestyle programs: a survey of colorectal cancer family
members. J Behav Health 2013; 2:279–290.
International Agency for Research on Cancer, European Commission. European
Code Against Cancer. Available from: http://cancer-code-europe.iarc.fr
(24 January 2018, date last accessed).
INTERPHONE Study Group. Brain tumour risk in relation to mobile telephone
use: results of the INTERPHONE international case-control study. Int J Epi-
demiol 2010; 39:675–694.
Jones ME, Schoemaker MJ, Wright L, et al. Menopausal hormone therapy and
breast cancer: what is the true size of the increased risk? Br J Cancer 2016;
115:607–615.
Kachuri L, Villeneuve PJ, Parent MÉ, et al, Canadian Cancer Registries Epide-
miology Research Group. Workplace exposure to diesel and gasoline engine
exhausts and the risk of colorectal cancer in Canadian men. Environ Health
2016; 15:4.
Kamran SC, Berrington de Gonzalez A, Ng A, et al. Therapeutic radiation and
the potential risk of second malignancies. Cancer 2016; 122:1809–1821.
Khuri FR, Kim ES, Lee JJ, et al. The impact of smoking status, disease stage,
and index tumor site on second primary tumor incidence and tumor recur-
rence in the head and neck retinoid chemoprevention trial. Cancer Epidemiol
Biomarkers Prev 2001; 10:823–829.
Lagorio S, Röösli M. Mobile phone use and risk of intracranial tumors: a consist-
ency analysis. Bioelectromagnetics 2014; 35:79–90.
Morgan LL, Miller AB, Sasco A, Davis DL. Mobile phone radiation causes
brain tumors and should be classified as a probable human carcinogen (2A)
(Review). Int J Oncol 2015; 46:1865–1871.
Morton LM, Onel K, Curtis RE, et al. The rising incidence of second cancers:
patterns of occurrence and identification of risk factors for children and
adults. Am Soc Clin Oncol Educ Book 2014; e57–e67.
Oeffinger KC, Baxi SS, Novetsky Friedman D, Moskowitz CS. Solid tumor sec-
ond primary neoplasms: who is at risk, what can we do? Semin Oncol 2013;
40:676–689.
Parkin DM. 1. The fraction of cancer attributable to lifestyle and environmental
factors in the UK in 2010. Br J Cancer 2011; 105 Suppl 2:S2–S5.
Radecki Breitkopf C, Asiedu GB, Egginton J, et al. An investigation of the colo-
rectal cancer experience and receptivity to family-based cancer prevention
programs. Support Care Cancer 2014; 22:2517–2525.
Riboli E. The European Prospective Investigation into Cancer and Nutrition
(EPIC): plans and progress. J Nutr 2001; 131:170S–175S.
The Gambia Hepatitis Study Group. The Gambia Hepatitis Intervention Study.
Cancer Res 1987; 47:5782–5787.
Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of
Observational Studies in Epidemiology (STROBE): explanation and elabo-
ration. PLOS Med 2007; 4:e297.
Viviani S, Carrieri P, Bah E, et al. 20 years into the Gambia Hepatitis Interven-
tion Study: assessment of initial hypotheses and prospects for evaluation of
protective effectiveness against liver cancer. Cancer Epidemiol Biomarkers
Prev 2008; 17:3216–3223.
Whiteman DC, Webb PM, Green AC, et al. Cancers in Australia in 2010 attrib-
utable to modifiable factors: summary and conclusions. Aust N Z J Public
Health 2015; 39:477–484.
Yang YX, Hennessy S, Lewis JD. Insulin therapy and colorectal cancer risk among
type 2 diabetes mellitus patients. Gastroenterology 2004; 127:1044–1050.
Zielinski JM, Garner MJ, Band PR, et al. Health outcomes of low-dose ionizing
radiation exposure among medical workers: a cohort study of the Canadian
national dose registry of radiation workers. Int J Occup Med Environ Health
2009; 22:149–156.
2 Screening
V.D.V. Sankatsing
H.J. de Koning
Department of Public Health, Erasmus MC,
University Medical Center Rotterdam, Netherlands
Introduction
Screening for cancer in an asymptomatic population can lead to early diag-
nosis of tumours and, therefore, to earlier treatment of cancer. Early detec-
tion and treatment can result in improved survival and may avert cancer
deaths, but screening can also have adverse effects and cause harm.
Therefore, it is important that decision makers have access to reliable
research on the effects of screening. Whether screening prolongs life is
best assessed through randomised trials (see Chapter 6) that use mortality
from the specific cancer as the endpoint. Ethical or time- and cost-related
issues may, however, make such trials unfeasible. Furthermore, the benefit
of screening as shown in a controlled study setting may differ from the
effect of population-based screening. Therefore, observational studies,
although more prone to biases than randomised trials, can help assess the
effects of population-based screening programmes.
This chapter discusses the assessment of population-level cancer screen-
ing by randomised trials and observational studies, and the influence of
potential biases on estimates of the effects. Furthermore, this chapter
elaborates on the cost-effectiveness of organised screening programmes.
The Benefit of Screening

Participants in a typical randomised trial are randomly allocated to either
the intervention or the control group. Randomised trials are thereby
designed to avoid confounding at baseline, due to both observable char-
acteristics and unknown factors, creating comparable intervention and
control groups. In a randomised trial of breast cancer screening (mam-
mography), the estimate for the mortality reduction due to screening in the
intervention group is based both on women who are actually screened and
on women (allocated to the intervention group) who decline the invitation
to screening. This is the intention to treat or intention to screen principle.
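The intention to screen principle can be sketched with invented numbers. In the example below the trial size, attendance and numbers of deaths are hypothetical; the point is only that the intention to screen estimate counts all women allocated to the intervention group, whether or not they attended.

```python
def risk(deaths, n):
    """Cumulative risk of death from the cancer of interest."""
    return deaths / n


# Hypothetical trial arms (not from any of the cited trials)
control = {"n": 50_000, "deaths": 250}
attenders = {"n": 35_000, "deaths": 126}       # invited and screened
non_attenders = {"n": 15_000, "deaths": 90}    # invited but declined

invited_n = attenders["n"] + non_attenders["n"]
invited_deaths = attenders["deaths"] + non_attenders["deaths"]
control_risk = risk(control["deaths"], control["n"])

# Intention to screen: everyone allocated to the intervention group
itt_rr = risk(invited_deaths, invited_n) / control_risk

# Comparing attenders only with controls breaks the randomisation
attenders_only_rr = risk(attenders["deaths"], attenders["n"]) / control_risk

print(f"Intention to screen RR: {itt_rr:.2f}")
print(f"Attenders-only RR:      {attenders_only_rr:.2f}")
```

The attenders-only comparison looks more favourable, but attenders are self-selected, so it no longer benefits from the comparability created by randomisation.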
Ten randomised trials of mammography screening were conducted in the
1970s and 1980s (Alexander et al, 1999; Miller et al, 1992a; Miller et al,
1992b; Nyström et al, 2002; Shapiro et al, 1966). A meta-analysis of these
trials showed a combined relative risk (RR) of 0.81 (95% confidence
interval [CI]: 0.74–0.87) (Gøtzsche and Jørgensen, 2013).
Estimates from the randomised trials reflect breast cancer mortality reduc-
tion as a result of screening in a controlled study setting rather than in a
population-based screening setting. In addition, the randomised trials of
mammography screening were conducted more than 20 years ago. Ques-
tions about the relevance of the trials to current screening practice have
been raised. Currently, all countries in the European Union have some
form of breast and cervical cancer screening for the population at aver-
age risk. Implementation of population-based colorectal cancer screening
has also started in many of these countries. The effect of current cancer
screening programmes can be estimated by observational studies, includ-
ing incidence-based cohort mortality studies, case-control studies and
trend studies. Using the evidence from observational studies, the Interna-
tional Agency for Research on Cancer recently estimated the reduction in
breast cancer mortality as a result of mammography screening to be 40%
in women aged 50 to 69 years who attended screening (Lauby-Secretan et
al, 2015). The reduction in breast cancer mortality was 23% for women
in the same age range who were invited to screening. These estimates
were based on incidence-based cohort mortality studies that had largely
accounted for lead-time bias and geographical or temporal differences
between screened and unscreened groups. Informative case-control studies
of the effect of invitation to screening and a small number of informative
ecological studies largely support these estimates and it has been stated
that the observational research evidence for the benefit of mammography
screening is also sufficient for women aged 70 to 74 years (Lauby-Secre-
tan et al, 2015).
In the randomised trials, and many observational studies, the benefit of
mammography screening is expressed as a percentage reduction in breast
cancer mortality, which is a relative measure. However, similar relative
reductions in different age groups may correspond to different absolute
numbers of breast cancer deaths prevented, because the incidence of
breast cancer increases with age. More recently, therefore, the screening
benefit has also been expressed in absolute terms, as the number needed to
screen (NNS) to prevent one breast cancer death or to gain one life-year.
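A short calculation shows how the same relative reduction translates into different numbers needed to screen. The baseline risks and the 20% relative reduction below are hypothetical values chosen for illustration, and the function name is not taken from any cited study.

```python
def number_needed_to_screen(baseline_risk, relative_reduction):
    """NNS to prevent one death = 1 / absolute risk reduction."""
    absolute_reduction = baseline_risk * relative_reduction
    return 1 / absolute_reduction


# Hypothetical risks of breast cancer death without screening
baseline_risk = {"younger age group": 0.004, "older age group": 0.006}
relative_reduction = 0.20  # assumed to be the same in both groups

nns = {group: number_needed_to_screen(risk, relative_reduction)
       for group, risk in baseline_risk.items()}

for group, value in nns.items():
    print(f"{group}: NNS = {value:.0f}")
```

With these assumed risks, about 1250 women in the younger group and about 830 in the older group would need to be screened to prevent one breast cancer death, although the relative reduction is identical.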
Methodological Issues
There are two important biases that are specifically associated with the
evaluation of screening:
1. Lead-time bias is related to earlier diagnosis because of screening.
The time between screen-detection of a preclinical detectable lesion
and the time at which the tumour would have appeared clinically in
the absence of screening is referred to as the lead-time. Because of the
lead-time, the time between diagnosis and death is longer in cases of
screen-detection than in cases of clinical detection, even if the actual
date of death is not delayed.
2. Length bias. Cancers detected at screening do not reflect a representa-
tive sample, because slow-growing tumours (which have a rather good
prognosis and longer survival) are more likely to be detected at screen-
ing than fast-growing tumours, since the slow-growing tumours are in
the preclinical detectable phase for longer.
As both types of bias are related to survival after diagnosis, randomised
trials can be designed to avoid lead-time and length bias by using cancer
mortality (rather than overall survival) as the endpoint. However, the
most extreme form of length bias relates to overdiagnosis, which can-
not be avoided and is often argued to be the major harm of screening
(discussed below).
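Both biases can be illustrated with simple arithmetic. In the sketch below all ages, tumour shares and durations of the preclinical detectable phase are hypothetical. The length bias part assumes a single screen at a random moment, so that the chance of a tumour being detectable at that moment is proportional to the length of its preclinical phase.

```python
# Lead-time bias: survival from diagnosis lengthens, age at death does not change
age_screen_detection = 57
age_clinical_diagnosis = 60
age_at_death = 65  # assumed to be unaffected by screening

lead_time = age_clinical_diagnosis - age_screen_detection
survival_if_screen_detected = age_at_death - age_screen_detection
survival_if_clinically_detected = age_at_death - age_clinical_diagnosis

print(f"Lead-time: {lead_time} years")
print(f"Survival after screen detection:   {survival_if_screen_detected} years")
print(f"Survival after clinical detection: {survival_if_clinically_detected} years")

# Length bias: slow-growing tumours are over-represented at screening
tumour_types = {
    "fast-growing": {"share_of_all_tumours": 0.5, "preclinical_years": 1.0},
    "slow-growing": {"share_of_all_tumours": 0.5, "preclinical_years": 5.0},
}

# Chance of being in the detectable phase at the screen is proportional to duration
weights = {name: t["share_of_all_tumours"] * t["preclinical_years"]
           for name, t in tumour_types.items()}
total_weight = sum(weights.values())
screen_detected_share = {name: w / total_weight for name, w in weights.items()}

for name, share in screen_detected_share.items():
    print(f"{name}: {share:.0%} of screen-detected tumours")
```

In this example survival from diagnosis rises from 5 to 8 years with no change in the date of death, and slow-growing tumours make up about 83% of screen-detected cases although they are only half of all tumours.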
Potential biases in randomised trials
Randomised trials are generally considered to deliver the most reliable esti-
mates of the effects of screening. If properly designed, randomised trials
can overcome lead-time bias and length bias. However, there are potential
methodological issues associated with randomised trials, which may lead to
biased estimates of effects. Common practical issues, which affect the inter-
nal validity of trials, are inadequate randomisation or loss to follow-up.
Other potential biases associated with randomised trials are, for example:
1. Contamination of the control group with screening (of any form),
which can lead to underestimation of the screening effect.
Example 1
Two of the mammography trials (Canadian National Breast Screening
Study-1 and -2) did not use registry-based invitations but were volun-
teer-based, leading to screening uptake by only a small proportion of
the population (Miller et al, 1992a; Miller et al, 1992b). In addition,
women in the control group in one of the two Canadian trials (Study-2)
were physically examined once a year by professionally trained nurses
(Miller et al, 1992b). This physical examination could have led to a shift
in the stage distribution of cancers detected (Rijnsburger et al, 2004).
2. Bias due to cluster randomisation. Randomised trials generally use
individual randomisation in which each participant is recruited and
randomly allocated to the intervention independently. Cluster ran-
domisation can be an alternative, when individual randomisation is
not feasible (Clarke, 2009). Although contamination of the control
group with screening might be less common using cluster randomi-
sation, bias with respect to the comparability of risk factors in the
intervention and control groups at baseline is more likely than with
individual randomisation.

Example 2
The combined effect of the randomised trials of mammography
screening, expressed as the RR of breast cancer mortality, has been
assessed by several meta-analyses. A Cochrane Review and the
Independent UK Panel on Breast Cancer Screening (Gøtzsche and
Jørgensen, 2013; Marmot et al, 2013) assessed the combined RR by
performing an intention to treat analysis and excluding the Edinburgh
trial because its cluster randomisation resulted in groups that were not
comparable with respect to socioeconomic status (Alexander et al,
1999). Statistically significant RR reductions in breast cancer mor-
tality of 19% and 20% respectively (RR 0.81, 95% CI: 0.74–0.87; RR
0.80, 95% CI: 0.73–0.89) were found.
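The percentage reductions quoted in Example 2 follow directly from the relative risks. The sketch below converts the two published estimates and their confidence limits; only the helper name and the dictionary labels are introduced for illustration.

```python
def percent_reduction(relative_risk):
    """Relative reduction in mortality implied by a relative risk."""
    return (1 - relative_risk) * 100


# (RR, lower 95% limit, upper 95% limit) as quoted in Example 2
estimates = {
    "Cochrane Review": (0.81, 0.74, 0.87),
    "Independent UK Panel": (0.80, 0.73, 0.89),
}

reductions = {}
for source, (rr, rr_lower, rr_upper) in estimates.items():
    # The upper limit of the RR gives the smallest reduction, and vice versa
    reductions[source] = (
        round(percent_reduction(rr)),
        round(percent_reduction(rr_upper)),
        round(percent_reduction(rr_lower)),
    )
    point, smallest, largest = reductions[source]
    print(f"{source}: {point}% reduction (95% CI {smallest}% to {largest}%)")
```

Note that the confidence limits swap places when a relative risk is expressed as a reduction: an RR of 0.81 (0.74–0.87) corresponds to a reduction of 19% (13% to 26%).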
Potential biases in observational studies

Observational studies are more prone to bias than randomised trials.
Adequate control of the risk of bias by study design and appropriate
analysis is therefore crucial. Biases in observational studies vary by study
type, but a common important drawback compared with randomised
trials is that screened women and control women may not be comparable,
because the study design does not determine who undergoes the
intervention and who does not. This chapter focuses on three types
of observational studies, often used for the evaluation of screening pro-
grammes: incidence-based cohort mortality studies, case-control studies
and trend studies.
Incidence-based cohort mortality studies can estimate the effect of
attending screening or being invited to screening. In these studies,
breast cancer mortality from cancers diagnosed in the cohort after first
invitation to screening is compared with expected breast cancer mor-
tality in the absence of screening. The latter is estimated using a com-
parison group, ideally consisting of women (from the same region) not
yet invited for screening. If this is not possible, a historical comparison
group may be used. However, historical data may make it difficult to
distinguish between a reduction in cancer mortality because of screening
and other temporal changes. In addition, poor adjustment for lead-time
and lack of individual follow-up of women often lead to biased estimates
in incidence-based mortality studies (Njor et al, 2012).
Case-control studies measure the effects of exposure to screening by
comparing history of screening between women who died from breast
cancer (cases) and women who did not die from breast cancer (controls).
There are several potential biases associated with case-control studies.
The most important is probably self-selection bias. Self-selection may
cause bias in favour of screening because health-conscious women are
more likely to attend screening.
Example 3
Broeders et al (2012) conducted a meta-analysis of eight case-control
studies that quantified the effects of population-based mammogra-
phy screening. Before adjustment for self-selection bias, the com-
bined odds ratio (OR) for breast cancer mortality in screened versus
unscreened women was 0.46 (95% CI: 0.40–0.54), which corresponds
to a reduction in the odds of dying of breast cancer of 54%. After
adjustment for self-selection, the reduction in breast cancer mortality
fell slightly to 48% (combined OR 0.52, 95% CI: 0.42–0.65).
The effect of screening programmes can also be estimated by studying
trends in population cancer mortality rates over time. However, trend
studies are at high risk of bias because of the gradual implementation of
population-based screening and because of the fact that deaths from can-
cer diagnosed before the implementation of screening (prevalent cases)
cannot be excluded (Moss et al, 2012). These potential biases are likely
to dilute the screening effect. They can be partially reduced by using
sufficiently long follow-up after full screening coverage, excluding the
period directly after the implementation of screening from the analysis,
and restricting the analysis to age ranges in which the benefit of screen-
ing is most likely to appear (usually 5 years above the age group invited
to screening) (Moss et al, 2012).

Reviews of European trend studies report estimates of reduction in breast
cancer mortality (annual percentage change) after implementation of
mammography screening of 1%, 2.3%–2.8% and 9% per year, excluding
studies with less than 10 years of follow-up after full screening cover-
age was reached (Broeders et al, 2012; Moss et al, 2012). Studies that
compared breast cancer mortality between time periods before and after
implementation of population-based screening estimated reductions of
28% to 36% (Broeders et al, 2012; Moss et al, 2012).
The Harms of Screening: Overdiagnosis

Alongside the beneficial effect on cancer mortality, screening is also
associated with potential harms. Overdiagnosis is one of the most
important potential adverse outcomes (Jørgensen and Gøtzsche, 2009).
It is defined as screen-detection of tumours that would never have pre-
sented clinically during an individual’s lifetime in the absence of screen-
ing. With respect to breast cancer screening, overdiagnosis could occur
because some cases of screen-detected ductal carcinoma in situ (DCIS)
or indolent invasive breast cancer may have never presented clinically,
due to slow growth, a complete lack of growth or regression of the lesion
(Biesheuvel et al, 2007; Yen et al, 2003). Overdiagnosis is also possible
with respect to lesions with average or high growth rates if the person
dies of another cause. Overdiagnosis results in more individuals being
diagnosed in the presence of screening and may lead to overtreatment in
the screening setting. Complications or side effects as a consequence of
overtreatment are undesired, since treatment of overdiagnosed cancers
will not improve survival.
As some tumours may progress too slowly to become clinically apparent
during the person’s lifetime, screening for cancers with relatively low
tumour growth rates, e.g. prostate cancer, may be more prone to overdi-
agnosis than screening for cancers with average or relatively high growth
rates, e.g. breast cancer. Overdiagnosis is also a concern for lung cancer
screening because death from competing comorbidity (often smoking-
related) is common in the population eligible for screening.
Methodological Issues
Potential biases in randomised trials
Overdiagnosis can be estimated from randomised trials in which cancer
mortality was the endpoint by calculating the number of excess cancers
in the intervention group. This is ideally carried out by comparing the
cumulative incidence in the intervention and control groups.
Inadequate follow-up after the trial may lead to biased estimates of over-
diagnosis. As cancers are detected earlier due to screening, the incidence
in individuals participating in screening will be higher during the screen-
ing period. When screening in the trial has finished, cancer incidence in
the intervention group decreases. It is expected that extra cancers will be
diagnosed in the control group once the lead-time has passed. Therefore,
overdiagnosis can be estimated if sufficiently long follow-up has passed
after cessation of the trial’s screening period, to allow for all cancers in
the control group to appear clinically. If the follow-up period is too short,
the effect of lead-time is not taken into account and the extent of over-
diagnosis due to screening is likely to be overestimated.
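The effect of follow-up length on the estimate can be shown with invented cumulative incidence figures. The numbers below are hypothetical and assume two trial arms of equal size; the excess is expressed as a percentage of the cancers in the control group, which is only one of several denominators in use.

```python
def excess_cancers_percent(intervention, control):
    """Excess cancers in the intervention arm as a percentage of control-arm cancers."""
    return (intervention - control) / control * 100


# Hypothetical cumulative numbers of cancers in two equal-sized trial arms
at_end_of_screening = {"intervention": 1000, "control": 800}
after_long_follow_up = {"intervention": 1550, "control": 1500}

short_estimate = excess_cancers_percent(**at_end_of_screening)
long_estimate = excess_cancers_percent(**after_long_follow_up)

print(f"Excess at end of screening period: {short_estimate:.1f}%")
print(f"Excess after long follow-up:       {long_estimate:.1f}%")
```

In this example most of the apparent excess at the end of the screening period is lead-time: the control group catches up during follow-up, and the remaining excess is the part that can be attributed to overdiagnosis.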
Another important potential bias is screening of the control group at the
end of a trial. For example, in several trials of mammography screening,
women in the control group were offered screening after the trial, which
may have led to overdiagnosis in the control group. This would lead to an
underestimation of the extent of overdiagnosis in the intervention group.
Example 4
The Independent UK Panel on Breast Cancer Screening estimated
overdiagnosis in the randomised trials of mammography screening to
be 11% for women invited to screening during lifetime, based on the
trials for which it is clear that the women in the control groups were
not offered screening at the end of the trial (Malmö I trial and Canadian
National Breast Screening Study-1 and -2 [Marmot et al, 2013;
Miller et al, 2000; Miller et al, 2002; Zackrisson et al, 2006]).

In summary, overdiagnosis is ideally estimated by comparing the cumu-
lative incidence in the intervention and control groups from randomised
trials with sufficiently long follow-up, in which the specific cancer mor-
tality was used as the endpoint and in which the control group was not
offered screening at the end of the trial.
Potential biases in observational studies
Estimates of overdiagnosis in the randomised trials of breast cancer
screening stem from a study setting more than 20 years ago. The impact
of overdiagnosis in current population-based screening settings can be
estimated from observational studies by comparing breast cancer inci-
dence in screened and unscreened populations.
Estimates of overdiagnosis as a result of mammography screening are
known to vary widely (Puliti et al, 2012). This variation may be partially
caused by bias in the studies.
The two most important potential biases in estimates of overdiagnosis
from observational studies are:
■ Differences in the underlying risk of breast cancer in the populations
compared
■ Failure to account for the effect of lead-time (Puliti et al, 2011)
Example 5
Puliti et al (2012) conducted a review of European observational stud-
ies that estimated overdiagnosis as a result of population-based mam-
mography screening. After exclusion of studies that failed to adjust
properly for underlying breast cancer risk and lead-time, estimates
of overdiagnosis ranged from 1% to 10% (as opposed to 0% to 54%
before exclusion) with a summary estimate of 6.5% (Paci, 2012; Puliti
et al, 2012).
To overcome bias related to lead-time, overdiagnosis is ideally estimated
using follow-up until death. Since life-long follow-up is not feasible for
trials or observational studies, it must be simulated with microsimulation
models. Using the MISCAN (MIcrosimulation SCreening ANalysis) model,
the rate of overdiagnosis as a result of breast cancer screening in the
Netherlands was calculated for different phases of the screening programme
and for different populations at risk (de Gelder et al, 2011). The estimated
overdiagnosis rate was 3.6% of all predicted cancers in women invited to
screening and older women, 5 years after the screening programme reached
full coverage. During the implementation phase of screening, overdiag-
nosis was estimated to be substantially higher (11.4% in the total female
population), which emphasises the importance of long follow-up when
seeking a reliable estimate.
Along with the aforementioned biases, variation in the estimates of over-
diagnosis is also caused by differences in the definition of the population
at risk that is used to calculate overdiagnosis (de Gelder et al, 2011).
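The effect of the denominator can be illustrated with a few lines of arithmetic. The sketch below is hypothetical: the two denominators and all counts are assumptions chosen for illustration, not figures from de Gelder et al (2011).

```python
def overdiagnosis_by_denominator(excess_cancers, cancers_screening_period,
                                 cancers_whole_followup):
    """Express the same excess of cancers against two different denominators."""
    return {
        # Cancers diagnosed only while screening was being offered.
        "pct_of_screening_period": 100 * excess_cancers / cancers_screening_period,
        # Cancers diagnosed over the whole follow-up, including after screening stopped.
        "pct_of_whole_followup": 100 * excess_cancers / cancers_whole_followup,
    }


# Hypothetical counts: the excess is identical, only the denominator differs.
estimates = overdiagnosis_by_denominator(excess_cancers=100,
                                         cancers_screening_period=500,
                                         cancers_whole_followup=1000)
print(estimates)
```

Here the same 100 excess cancers are reported as 20% or as 10%, which is why the definition of the population at risk should be checked before estimates from different studies are compared.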
Cost-effectiveness of Breast Cancer Screening Programmes
Since the randomised trials of breast cancer screening in the early 1970s
and 1980s, breast cancer screening programmes have been implemented
in many countries. In making decisions about cancer screening, which is
offered to an asymptomatic population, it is important to know whether it
has a significant effect on cancer mortality (compared with no screening)
and whether it is cost-effective. Factors which affect the cost effective-
ness of cancer screening are:
1. How the screening is organised
2. Policy of the programme
3. Country-specific factors
In European countries, organised breast cancer screening has been dem-
onstrated to be cost-effective (Carles et al, 2011; Groenewoud et al,
2007).
Screening outside a national programme is called ‘opportunistic screen-
ing’. In countries with organised screening programmes, opportunistic
screening is rarely performed (Vainio and Bianchini, 2002) but in some
countries, such as the United States (US), opportunistic screening is
common practice. Breast cancer screening practice in the US has also
been shown to be cost-effective (Stout et al, 2006), but opportunistic
mammography screening is less cost-effective than organised breast can-
cer screening, even if the screening benefit is equal (Bulliard et al, 2009;
de Gelder et al, 2009).
It is difficult to generalise the conclusions of cost-effectiveness analyses
between countries because cost-effectiveness also depends on the spe-
cific policy of the programme, including factors such as:
■ Target age range
■ Screening interval
Example 6
Although more frequent screening may lead to improved detection of
fast-growing cancers and a potential increase in the benefits of screening
(Bailey et al, 2010; Buist et al, 2004), annual screening has often been
demonstrated to be less cost-effective than biennial screening because
of a disproportional rise in costs compared to the effects for an annual
over a biennial interval (Schousboe et al, 2011; Stout et al, 2014).
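The idea of a disproportional rise in costs is commonly expressed as an incremental cost-effectiveness ratio (ICER), a standard measure that is not named in the text above. The sketch below assumes that effects are measured in life-years gained; the function name and all costs and effects are hypothetical and are not taken from the cited studies.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect <= 0:
        raise ValueError("the new strategy must add effect for the ratio to be meaningful")
    return (cost_new - cost_old) / delta_effect


# Hypothetical totals per 1000 women: (cost, life-years gained).
no_screening = (0.0, 0.0)
biennial = (400_000.0, 20.0)
annual = (800_000.0, 25.0)

# Biennial screening compared with no screening.
icer_biennial = icer(biennial[0], biennial[1], no_screening[0], no_screening[1])
# Annual screening compared with biennial screening (the next-best alternative).
icer_annual = icer(annual[0], annual[1], biennial[0], biennial[1])

print(icer_biennial, icer_annual)
```

In this invented example, annual screening doubles the cost but adds only a quarter more effect, so each additional life-year costs four times as much as under biennial screening.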
Most European breast cancer screening programmes are targeted at
women aged 50–69 years, with a screening interval of 2 years (Giordano
et al, 2012). In some European countries, this age range has been extended
downwards to 40 years, upwards to 74 years, or both. However, even if there is general
consistency among European countries with respect to their screening
policies, the benefits and costs of screening may differ between countries
because the cost-effectiveness of screening also depends on country-spe-
cific characteristics such as:
■ Cancer incidence and cancer mortality before the start of screening
■ Structure of the healthcare system
■ Coverage of the population by invitation
■ Participation rate (De Koning, 2000; van Ineveld et al, 1993).
Participation rates differ substantially between European countries,
ranging from 19% to 89% (Giordano et al, 2012).
In addition to having reliable information on cost-effectiveness, when
considering the implementation of organised screening it is important
to assess the cost-effectiveness of ongoing programmes, as the ratio of
effects and costs may change over time. Assessing the cost-effectiveness
of current screening programmes is particularly relevant when changes
to screening policies are being considered (for example, extension of
screening) or when new screening technology becomes available.
Conclusions
The benefits and harms of screening can be quantified by randomised trials
or observational studies, if potential biases are adequately accounted for.
To justify a screening programme, the evidence of its benefits needs to
be sufficient. In addition, it is essential to determine whether the benefits
of screening outweigh the harms, and whether the screening programme
would be cost-effective in a specific country or region.
This chapter helps decision makers to consider the implementation or
modification of population-level screening by discussing several poten-
tial biases associated with research into the effects of screening that may
influence the results of randomised trials and observational studies. It
also highlights important factors to consider when interpreting cost-
effectiveness analyses.
Dr Sankatsing has reported no conflict of interest.
Dr de Koning has reported no conflict of interest.
Further Reading
Biesheuvel C, Barratt A, Howard K, et al. Effects of study methods and biases
on estimates of invasive breast cancer overdetection with mammography
screening: a systematic review. Lancet Oncol 2007; 8:1129–1138.
Broeders M, Moss S, Nyström L, et al. The impact of mammographic screening
on breast cancer mortality in Europe: a review of observational studies. J
Med Screen 2012; 19 Suppl 1:14–25.

de Gelder R, Bulliard JL, de Wolf C, et al. Cost-effectiveness of opportunis-
tic versus organised mammography screening in Switzerland. Eur J Cancer
2009; 45:127–138.
de Gelder R, Heijnsdijk EA, van Ravesteyn NT, et al. Interpreting overdiagno-
sis estimates in population-based mammography screening. Epidemiol Rev
2011; 33:111–121.
Giordano L, von Karsa L, Tomatis M, et al. Mammographic screening pro-
grammes in Europe: organization, coverage and participation. J Med Screen
2012; 19 Suppl 1:72–82.
Lauby-Secretan B, Scoccianti C, Loomis D, et al. Breast-cancer screening –
viewpoint of the IARC Working Group. N Engl J Med 2015; 372:2353–2358.
Marmot MG, Altman DG, Cameron DA, et al. The benefits and harms of breast
cancer screening: an independent review. Br J Cancer 2013; 108:2205–2240.
Moss SM, Nyström L, Jonsson H, et al. The impact of mammographic screening
on breast cancer mortality in Europe: a review of trend studies. J Med Screen
2012; 19 Suppl 1:26–32.
Nyström L, Andersson I, Bjurstam N, et al. Long-term effects of mammography
screening: updated overview of the Swedish randomised trials. Lancet 2002;
359:909–919.
Stout NK, Lee SJ, Schechter CB, et al. Benefits, harms, and costs for breast
cancer screening after US implementation of digital mammography. J Natl
Cancer Inst 2014; 106:dju092.
References
Alexander FE, Anderson TJ, Brown HK, et al. 14 years of follow-up from
the Edinburgh randomised trial of breast-cancer screening. Lancet 1999;
353:1903–1908.
Bailey SL, Sigal BM, Plevritis SK. A simulation model investigating the impact
of tumor volume doubling time and mammographic tumor detectability on
screening outcomes in women aged 40-49 years. J Natl Cancer Inst 2010;
102:1263–1271.
Biesheuvel C, Barratt A, Howard K, et al. Effects of study methods and biases
on estimates of invasive breast cancer overdetection with mammography
screening: a systematic review. Lancet Oncol 2007; 8:1129–1138.
Broeders M, Moss S, Nyström L, et al. The impact of mammographic screening
on breast cancer mortality in Europe: a review of observational studies. J
Med Screen 2012; 19 Suppl 1:14–25.
Buist DS, Porter PL, Lehman C, et al. Factors contributing to mammography
failure in women aged 40-49 years. J Natl Cancer Inst 2004; 96:1432–1440.
Bulliard JL, Ducros C, Jemelin C, et al. Effectiveness of organised versus oppor-
tunistic mammography screening. Ann Oncol 2009; 20:1199–1202.
Carles M, Vilaprinyo E, Cots F, et al. Cost-effectiveness of early detection of
breast cancer in Catalonia (Spain). BMC Cancer 2011; 11:192.
Clarke M. Cluster trials: a few words on why and how to do them. Int J Epide-
miol 2009; 38:36–37.
de Gelder R, Bulliard JL, de Wolf C, et al. Cost-effectiveness of opportunis-
tic versus organised mammography screening in Switzerland. Eur J Cancer
2009; 45:127–138.
de Gelder R, Heijnsdijk EA, van Ravesteyn NT, et al. Interpreting overdiagno-
sis estimates in population-based mammography screening. Epidemiol Rev
2011; 33:111–121.
De Koning HJ. Breast cancer screening; cost-effective in practice? Eur J Radiol
2000; 33:32–37.
Giordano L, von Karsa L, Tomatis M, et al. Mammographic screening pro-
grammes in Europe: organization, coverage and participation. J Med Screen
2012; 19 Suppl 1:72–82.
Gøtzsche PC, Jørgensen KJ. Screening for breast cancer with mammography.
Cochrane Database Syst Rev 2013; 6:CD001877.
Groenewoud JH, Otten JD, Fracheboud J, et al. Cost effectiveness of different
reading and referral strategies in mammography screening in the Nether-
lands. Breast Cancer Res Treat 2007; 102:211–218.
Jørgensen KJ, Gøtzsche PC. Overdiagnosis in publicly organised mammography
screening programmes: systematic review of incidence trends. BMJ 2009;
339:b2587.
Lauby-Secretan B, Scoccianti C, Loomis D, et al. Breast-cancer screening –
viewpoint of the IARC Working Group. N Engl J Med 2015; 372:2353–2358.
Marmot MG, Altman DG, Cameron DA, et al. The benefits and harms of breast
cancer screening: an independent review. Br J Cancer 2013; 108:2205–2240.
Miller AB, Baines CJ, To T, Wall C. Canadian National Breast Screening Study:
1. Breast cancer detection and death rates among women aged 40 to 49 years.
CMAJ 1992a; 147:1459–1476.
Miller AB, Baines CJ, To T, Wall C. Canadian National Breast Screening Study:
2. Breast cancer detection and death rates among women aged 50 to 59 years.
CMAJ 1992b; 147:1477–1488.
Miller AB, To T, Baines CJ, Wall C. Canadian National Breast Screening Study-
2: 13-year results of a randomized trial in women aged 50-59 years. J Natl
Cancer Inst 2000; 92:1490–1499.

Miller AB, To T, Baines CJ, Wall C. The Canadian National Breast Screening
Study-1: breast cancer mortality after 11 to 16 years of follow-up. A rand-
omized screening trial of mammography in women age 40 to 49 years. Ann
Intern Med 2002; 137(5 Part 1):305–312.
Moss SM, Nyström L, Jonsson H, et al. The impact of mammographic screening
on breast cancer mortality in Europe: a review of trend studies. J Med Screen
2012; 19 Suppl 1:26–32.
Njor S, Nyström L, Moss S, et al. Breast cancer mortality in mammographic
screening in Europe: a review of incidence-based mortality studies. J Med
Screen 2012; 19 Suppl 1:33–41.
Nyström L, Andersson I, Bjurstam N, et al. Long-term effects of mammography
screening: updated overview of the Swedish randomised trials. Lancet 2002;
359:909–919.
Paci E. EUROSCREEN Working Group. Summary of the evidence of breast
cancer service screening outcomes in Europe and first estimate of the benefit
and harm balance sheet. J Med Screen 2012; 19 Suppl 1:5–13.
Puliti D, Duffy SW, Miccinesi G, et al. Overdiagnosis in mammographic screen-
ing for breast cancer in Europe: a literature review. J Med Screen 2012; 19
Suppl 1:42–56.
Puliti D, Miccinesi G, Paci E. Overdiagnosis in breast cancer: design and meth-
ods of estimation in observational studies. Prev Med 2011; 53:131–133.
Rijnsburger AJ, van Oortmarssen GJ, Boer R, et al. Mammography benefit in
the Canadian National Breast Screening Study-2: a model evaluation. Int J
Cancer 2004; 110:756–762.
Schousboe JT, Kerlikowske K, Loh A, Cummings SR. Personalizing mammog-
raphy by breast density and other risk factors for breast cancer: analysis of
health benefits and cost-effectiveness. Ann Intern Med 2011; 155:10–20.
Shapiro S, Strax P, Venet L. Evaluation of periodic breast cancer screening
with mammography. Methodology and early observations. JAMA 1966;
195:731–738.
Stout NK, Lee SJ, Schechter CB, et al. Benefits, harms, and costs for breast
cancer screening after US implementation of digital mammography. J Natl
Cancer Inst 2014; 106:dju092.
Stout NK, Rosenberg MA, Trentham-Dietz A, et al. Retrospective cost-effec-
tiveness analysis of screening mammography. J Natl Cancer Inst 2006;
98:774–782.
Vainio H, Bianchini F. IARC Handbooks of cancer Prevention. Volume 7: Breast
cancer screening. International Agency for Research on Cancer. Lyon: IARC
Press, 2002.
van Ineveld BM, van Oortmarssen GJ, de Koning HJ, et al. How cost-effective
is breast cancer screening in different EC countries? Eur J Cancer 1993;
29A:1663–1668.
Yen MF, Tabár L, Vitak B, et al. Quantifying the potential problem of overdi-
agnosis of ductal carcinoma in situ in breast cancer screening. Eur J Cancer
2003; 39:1746–1754.
Zackrisson S, Andersson I, Janzon L, et al. Rate of over-diagnosis of breast can-
cer 15 years after end of Malmö mammographic screening trial: follow-up
study. BMJ 2006; 332:689–692.

3
Prognosis
P.A.J. Vissers
F.N. van Erning
V.E.P.P. Lemmens
Department of Research, Netherlands Comprehensive Cancer Organisation
(IKNL), Utrecht, Netherlands
Introduction
Cancer prognosis is the patient’s expected chance of recovery from the
disease. There are several measures of prognosis, such as quality of life
and survival. In this chapter, we limit our discussion to various types of
survival. Survival rates are expressed as the proportion of patients still
alive at a specified time, usually 5 years, after diagnosis or
start of treatment. Measures of survival are usually given as an average
(mean or median) based on a large group of patients.
When using survival data, either in research or in everyday clinical prac-
tice, it is important to realise that there are different methods for expressing
the likely duration of survival. This chapter will guide you through possible
biases of these different methods in order to better interpret the study results.
We use colon cancer as our main example to illustrate the key points.
Factors Influencing Cancer Survival

There are many tumour-related, treatment-related and sociodemographic
factors that affect cancer survival.
Tumour-related Prognostic Factors
Cancer survival depends on cancer type. In Europe, the types with the best
survival, with an average 5-year survival rate of above 85%, are testicular
cancer, thyroid cancer, skin melanoma and early-stage prostate cancer. On the
other hand, the types of cancer with the poorest survival in Europe are lung,
oesophageal, liver and pancreatic cancer, with 5-year survival rates below
15% (EUROCARE, 2015). Another prognostic factor is the anatomical
extent of the disease, commonly classified according to the TNM classifica-
tion, which comprises tumour size, affected (loco)regional lymph nodes and
metastases (Brierley et al, 2016). A higher tumour stage is usually associated
with poorer survival (Maringe et al, 2013), but there are some exceptions.
For example, among colon cancer patients, a survival paradox was observed
after the introduction of the 6th TNM staging system (Sobin and Wittekind,
2002). Several studies showed that stage IIIA (T1-2N1) colon cancer is asso-
ciated with better survival than stage IIB (T4N0) disease (Kim et al, 2015;
O’Connell et al, 2004). Similar results were found in the Netherlands (Figure
1) and in the Surveillance, Epidemiology and End Results (SEER) database
(Gunderson et al, 2010). This difference may be due to the fact that patients
with stage III colon cancer are treated with adjuvant chemotherapy, whereas
stage II colon cancer patients are not. Moreover, it has been suggested that
T4N1 tumours may be understaged as T4N0 tumours or that T4N0 tumours
are more aggressive by nature (O’Connell et al, 2004). This phenomenon is
known as stage migration, which will be discussed later in this chapter. As
well as stage, there are numerous other tumour-related factors that affect sur-
vival, such as histology, mutational status and biochemical markers.
[Figure 1 graphic: percent survival (0–100) at 1, 2, 3, 4 and 5 years, with one curve each for stage I, IIA, IIB, IIIA, IIIB, IIIC and IV.]
Figure 1 Relative survival of colon cancer patients diagnosed between 2004 and
2009 in the Netherlands, stratified by cancer stage.
From Netherlands Cancer Registry, Netherlands Comprehensive Cancer Organisation. By permission of the
Netherlands Comprehensive Cancer Organisation.
Treatment-related Prognostic Factors
Survival is also related to cancer treatment: both the type of treatment
and the patient’s response to it. For instance, cytoreductive surgery and
hyperthermic intraperitoneal chemotherapy (HIPEC) have improved
the survival of patients with peritoneal carcinomatosis from colorectal
cancer significantly compared with systemic 5-fluorouracil and leuco-
vorin (Verwaal et al, 2008). Therefore, part of the international varia-
tion in survival can be explained by differences in treatment guidelines.
Interactions between tumour- and treatment-related characteristics can
also influence prognosis. For example, tumoural RAS mutations are considered
a contraindication for treatment with anti-EGFR agents in
patients with metastatic colorectal cancer (Punt et al, 2017).
Sociodemographic Prognostic Factors

Several sociodemographic factors affect cancer survival. Among the most
well-known are age, sex, race (Joosse et al, 2013; Shahir et al, 2006; Yoon
et al, 2015) and comorbidity (Figure 2). Comorbid conditions display an
independent relation with survival in most cancer types (Janssen-Heijnen et
al, 2005) and low socioeconomic status has also been associated with poorer
survival among cancer patients (Aarts et al, 2013; Ward et al, 2004). This
could be because cancer patients with low socioeconomic status have less
knowledge about where and how to access healthcare. Treatment compli-
ance could also be poorer in patients with low socioeconomic status.
[Figure 2 graphic: 5-year relative survival (%, 0–100) by age group (<65 years, 65–79 years, 80+ years) and number of comorbid conditions (0, 1, ≥2).]
Figure 2 5-year relative survival among colon cancer patients diagnosed between
2004 and 2009 in the southern region of the Netherlands, stratified by age and
number of comorbid conditions.
From South region of the Netherlands Cancer Registry, Netherlands Comprehensive Cancer Organisation. By permission
of the Netherlands Comprehensive Cancer Organisation.
Improvement Over Time
For several types of cancer, survival has improved over time. It is important
to remember this if studies from different time periods are being considered.
In the EUROCARE-5 study, data from cancer registries in 31 countries
showed improved 5-year relative survival rates in 2005–2007 compared
with 1999–2001 for prostate cancer, non-Hodgkin lymphoma, colorectal
cancer, kidney cancer and breast cancer (Figure 3) (EUROCARE, 2015).
[Figure 3 graphic: 5-year relative survival (%, 0–100) in 1999–2001 and 2005–2007, with the difference between the two periods labelled for each cancer: prostate 8.3, non-Hodgkin lymphoma 6.6, rectum 5.5, kidney 4.1, breast 4.0, colon 3.8.]
Figure 3 5-year relative survival time trends for several types of cancer in Europe 2000-2007.
From the EUROCARE-5 study (EUROCARE, 2015) presentation at the European Cancer Conference 2015,
reproduced with permission from Sant M, EUROCARE.
Prognosis and Survival

Prognosis of cancer patients is often expressed as a survival rate. For all
survival analyses, a starting point and the time interval to the event of inter-
est should be defined (see Chapter 8). As an example, in randomised tri-
als (see Chapter 6), the appropriate starting point is usually the time of
randomisation. In observational studies, the starting point will depend
on the research question and might be the date of diagnosis or start of treat-
ment. Events of interest for cancer prognosis include death (cause-specific)
or cancer recurrence, depending on the study objective (see Chapter 7).
The survival time is defined as the time from the starting point to the
event or, for patients who have not experienced the event, to the end of
their follow-up. Such patients are censored at that point, perhaps because
they left the study or were event free at the end of the study period
(Dos Santos Silva, 1999). Different methods
are available for survival analyses, to allow the calculation of results for
single groups or to compare groups, and these are discussed in Chapter 8.
The most commonly used types of survival estimate are discussed briefly
below. All may be used in reports of studies of prognosis.
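How censoring enters a survival estimate can be shown with the Kaplan-Meier (product-limit) estimator, a standard method that is not described in this section. The sketch below is a minimal pure-Python version under stated assumptions: times are years since the starting point, and the six patients are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time of each patient from the starting point
    events -- 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))  # process patients in time order
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the proportion of at-risk patients surviving this time.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing the estimate.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up of six patients (years) and their event indicators.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up time, so they inform the estimate without being counted as events.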
Overall Survival
Overall survival is the most basic form of survival, in which the time
from the starting point until death due to any cause is studied. In other
words, no distinction is made between deaths from the cancer under
investigation and deaths from other causes (Dos Santos Silva, 1999).
Cancer-specific Survival and Relative Survival

Contrary to overall survival, cancer-specific survival takes the cause of
death into account. Deaths due to the cancer under investigation are the
outcome of interest, and deaths from other causes are dealt with separately
in the analyses. There are two main approaches for reporting cancer-spe-
cific survival: net survival and crude probability of death.
Net survival represents the probability of surviving cancer in the absence
of other causes of death. Net survival is not influenced by changes in mor-
tality from other causes and therefore provides a useful measure for cancer
control over time (National Cancer Institute, 2018). On the other hand,
crude probability of death is the probability of dying of cancer in the pres-
ence of other causes of death. It provides a measure for the risk of death
from cancer when all causes of death are possible. The crude measure is
mostly reported as a cumulative probability of death from cancer instead
of a survival rate, because the survival rate contains both the probability of
surviving and the probability of dying from other causes (National Cancer
Institute, 2018).
Measures of net survival and crude probability of death can be quite dif-
ferent, for example in men over the age of 70 with prostate cancer. Using
net survival, the probability of dying of prostate cancer is 40% within 15
years of diagnosis; however, when other causes of death are taken into
account, the crude probability of dying of prostate cancer within 15 years
is 20% (Cronin and Feuer, 2000). Net survival is used when studying can-
cer progress to represent trends and compare differences between groups
of cancer patients. Crude cause-specific probability of death is mainly of
interest when assessing the impact of cancer on an individual level and is
used in predictive tools, clinical decisions and cost-effectiveness analyses.
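The gap between the two measures can be reproduced with a simple competing-risks calculation. The sketch below is not the method of Cronin and Feuer (2000): it assumes constant, independent yearly hazards of death from cancer and from other causes, and the hazard values are hypothetical.

```python
import math


def net_probability_of_death(h_cancer, years):
    """Probability of dying of cancer if cancer were the only cause of death."""
    return 1 - math.exp(-h_cancer * years)


def crude_probability_of_death(h_cancer, h_other, years):
    """Probability of dying of cancer when other causes of death compete."""
    h_total = h_cancer + h_other
    # Share of all deaths due to cancer, times the probability of dying at all.
    return (h_cancer / h_total) * (1 - math.exp(-h_total * years))


# Hypothetical yearly hazards for an elderly patient group.
h_cancer, h_other, years = 0.03, 0.08, 15
net = net_probability_of_death(h_cancer, years)
crude = crude_probability_of_death(h_cancer, h_other, years)
print(round(net, 3), round(crude, 3))
```

With these assumed hazards, the net probability of dying of cancer within 15 years is about 36%, whereas the crude probability is about 22%, because many patients die of other causes first.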
A possible limitation of studies that use cause-specific survival is that
recorded causes of death are often unreliable. For example, if a cancer
metastasises, the death certificate might incorrectly list the metastatic site
as the underlying cause of death (National Cancer Institute, 2016). Fur-
thermore, it is often difficult to establish for certain whether the cause of
death is truly related to the cancer under investigation.
If information regarding cause of death is unavailable or unreliable,
relative survival can be used. Relative survival is an estimate of cancer-
specific survival. It is calculated as the absolute survival among cancer
patients divided by the expected survival of a comparable group from the
general population (i.e. with the same age and gender distribution, and
over the same period of time). Relative survival rates reflect the excess
mortality due to cancer and are therefore higher than the corresponding
absolute survival. This effect is more pronounced among the elderly, in
whom competing causes of death are more prominent (Figure 4).
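The ratio described above can be written out directly. In this minimal sketch the function name and the survival proportions are hypothetical; in practice the expected survival would come from population life tables matched for age, sex and calendar period.

```python
def relative_survival(absolute_survival, expected_survival):
    """Absolute survival of patients divided by expected survival of a
    comparable group from the general population."""
    if not 0 < expected_survival <= 1:
        raise ValueError("expected survival must be in the interval (0, 1]")
    return absolute_survival / expected_survival


# Hypothetical 5-year proportions for a younger and an older age group.
younger = relative_survival(absolute_survival=0.60, expected_survival=0.96)
older = relative_survival(absolute_survival=0.30, expected_survival=0.60)
print(round(younger, 3), round(older, 3))
```

In this invented example, relative survival exceeds absolute survival by 2.5 percentage points in the younger group (62.5% versus 60%) and by 20 percentage points in the older group (50% versus 30%), because expected survival in the general population is lower at older ages.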
[Figure 4 graphic: percent survival (0–100) at 1, 2, 3, 4 and 5 years, with relative and absolute survival curves for each age group (<65 years, 65–79 years, 80+ years).]
Figure 4 5-year absolute versus relative survival among colon cancer patients
diagnosed between 2004 and 2009, stratified by age.
From Netherlands Cancer Registry, Netherlands Comprehensive Cancer Organisation. By permission of
the Netherlands Comprehensive Cancer Organisation.
In a population-based study among breast cancer patients, Schaffar et
al (2015) showed that net survival estimates derived using the cause of
death were very sensitive to misclassification of cause of death, while
net survival estimates derived using relative survival were more robust.
Therefore, relative survival is recommended for estimation of long-
term net survival among patients with breast cancer and, when reading
a study of survival outcome, it is important to note which method was
used (Schaffar et al, 2015).
Disease-free Survival, Progression-free Survival and Recurrence-free Survival
The terms disease-free, progression-free and recurrence-free survival
are not always consistently used (see Chapters 7 and 8). Disease-free
survival is often used in settings where the cancer has not spread beyond
the site of the primary tumour (which might be removed by surgery and
then treated with adjuvant therapy), while progression-free survival
is used in the metastatic setting. However, the exact definitions differ
between studies. For example, some adjuvant studies allow the inclusion
of patients with second primary cancers, whereas others do not. There-
fore, inclusion and exclusion criteria should always be provided in the
report of a study, to enable a correct interpretation of the survival pro-
portions. Another point to consider is how progression and recurrence
are established, because patients in research studies might be monitored
more closely than those in routine practice.
Disease-free and progression-free survival are increasingly used as pri-
mary outcomes in randomised trials (see Chapters 6 and 7). For example,
several studies of systemic therapy among patients with colorectal cancer
have shown that they provide valid surrogates for overall survival (Buyse
et al, 2007; Sargent et al, 2005; Sargent et al, 2007; Tang et al, 2007).
Conditional Survival
Survival estimates reported from the time of cancer diagnosis are not
necessarily applicable to patients who have already survived for some
time after their initial diagnosis and treatment. Conditional survival
analysis is a method for estimating the survival rate for patients who
have already survived for a certain period of time. Such survival esti-
mates appear useful for cancer survivors because they yield more rel-
evant information about their future prognosis, which can be used for
personal health-related planning and for the organisation of cancer sur-
veillance by physicians. These estimates also provide information about
excess mortality among cancer survivors compared with the general
population (van Erning et al, 2014).
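Conditional survival follows from an ordinary survival curve: the probability of surviving a further y years, given survival for x years, is S(x + y) / S(x). The sketch below assumes a hypothetical survival curve and is not taken from van Erning et al (2014).

```python
def conditional_survival(survival_by_year, already_survived, additional_years):
    """Probability of surviving `additional_years` more years, given that the
    patient has already survived `already_survived` years."""
    s_now = survival_by_year[already_survived]
    s_later = survival_by_year[already_survived + additional_years]
    return s_later / s_now


# Hypothetical cumulative survival from diagnosis, by completed year.
survival_by_year = {0: 1.00, 1: 0.80, 2: 0.70, 3: 0.65, 4: 0.62, 5: 0.60}

# Chance of reaching 5 years from diagnosis, at diagnosis and after 1 or 3 years.
at_diagnosis = conditional_survival(survival_by_year, 0, 5)
after_1_year = conditional_survival(survival_by_year, 1, 4)
after_3_years = conditional_survival(survival_by_year, 3, 2)
print(round(at_diagnosis, 2), round(after_1_year, 2), round(after_3_years, 2))
```

With this invented curve, the chance of being alive 5 years after diagnosis rises from 60% at diagnosis to 75% for patients who have survived the first year, and to about 92% for those who have survived 3 years.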
Cancer Prognosis in Trials Versus Observational Studies
Two commonly used study designs for assessing cancer prognosis are ran-
domised trials and population-based cohort studies. As discussed in Chap-
ter 6, randomised trials provide a gold standard for evaluating the relative
effects of treatments, with patients randomly assigned to the alternatives
(often, treatment and control) to create groups that are equal with respect
to all features except the treatment assignment (Booth and Tannock, 2014).
To maximise the power and minimise the potential for bias, randomised
trials of the same topic should be combined in systematic reviews (see
Chapter 9). An example of a treatment for which the efficacy was estab-
lished in randomised trials is adjuvant chemotherapy for stage III colon
cancer patients, for which one trial showed that patients who received a
combination of 5-fluorouracil and leucovorin had a significant improve-
ment in 5-year overall survival compared with patients who underwent
resection only: 74% and 63%, respectively (O’Connell et al, 1997).
Randomised trials have a superior internal validity compared with
observational studies, but their generalisability might be limited, espe-
cially for estimating prognosis. Randomised trials often use strict eligi-
bility criteria, perhaps limiting their study population to relatively young
and healthy patients and excluding patients with the poorest life expec-
tancy. In contrast, patients seen in routine practice are likely to be much
more heterogeneous. Therefore, randomised trials may not be helpful
in estimating prognosis in routine practice and knowledge gained from
randomised trials should be complemented with data from population-
based observational studies. For instance, an observational study using
data from the Netherlands Cancer Registry found that, among stage III
colon cancer patients, crude 5-year overall survival was 29% for patients
who did not receive adjuvant chemotherapy versus 62% for patients who
did receive adjuvant chemotherapy (van Steenbergen et al, 2010). The
difference between treatment groups in this observational study is much
larger than in the previously mentioned trial, which is (to a large extent)
due to the selection of the fitter patients for adjuvant chemotherapy in
routine practice, meaning that relatively unfit patients with comorbidity
and poor performance status were more common in the group who did not
receive chemotherapy.
Types of Bias
In both randomised trials and observational studies, several types of bias
can occur. Even though randomised trials are considered the gold standard
for assessing the effects of interventions, some research questions can only
be answered by population-based observational studies because of ethical
concerns. For example, the harmfulness of cigarette smoking could not be
studied in a randomised trial because it would require some of the partici-
pants to be allocated to smoking.
Some of the biases that are common in population-based observational
studies are discussed here. The most important types are selection bias,
confounding, and information bias (also often referred to as measure-
ment bias or classification bias).
Selection Bias
Selection bias is the selective recruitment of patients who are not repre-
sentative of the exposure or outcome pattern in the general population.
In true population-based studies, which would include all inhabitants of a
country or clearly defined region, the risk of selection bias is smaller than
in single or multicentre studies. Even so, in population-based cancer studies,
there is always a risk of not including patients who, for example, did not
undergo an oncological treatment. When comparing prognosis of a cer-
tain cancer type between two population-based registries (see Chapter 4),
the registry in which a higher proportion of untreated patients is miss-
ing is likely to exhibit a better prognosis than the registry which is more
complete. This is because patients with the least beneficial outlook (such
as the frail elderly or those with extensive disease) relatively often cease
treatment.
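The effect of incomplete registration of untreated patients can be shown with simple arithmetic. The sketch below is hypothetical: the function name, patient numbers and survival proportions are assumptions chosen for illustration.

```python
def observed_survival(n_treated, surv_treated, n_untreated, surv_untreated,
                      fraction_untreated_missing):
    """Overall survival proportion as it would appear in a registry that
    misses a given fraction of the untreated patients."""
    n_untreated_registered = n_untreated * (1 - fraction_untreated_missing)
    survivors = (n_treated * surv_treated
                 + n_untreated_registered * surv_untreated)
    return survivors / (n_treated + n_untreated_registered)


# Same hypothetical population seen through two registries.
complete = observed_survival(800, 0.60, 200, 0.10, fraction_untreated_missing=0.0)
incomplete = observed_survival(800, 0.60, 200, 0.10, fraction_untreated_missing=0.5)
print(round(complete, 3), round(incomplete, 3))
```

Although the underlying population is identical, the registry that misses half of the untreated patients reports a survival of about 54% instead of 50%.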
Another type of selection bias arises due to loss to follow-up. In line with
the aforementioned example, a differential completeness of vital follow-
up data leads to a seemingly worse prognosis in registries which have a
complete vital follow-up, compared with registries with less adequate
follow-up (see Chapter 4).
Confounding
Often, selection bias is confused with confounding. Confounding can be
referred to as a ‘mixing of effects’ wherein the effects of the exposure
under study on a given outcome are mixed in with the effects of an addi-
tional factor (or set of factors), resulting in a distortion of the true relation-
ship (see also Chapter 1). In a randomised trial, this can happen when the
distribution of a known prognostic factor differs between the groups being
compared (Booth and Tannock, 2014) and might be due to chance (Clarke
and Halsey, 2001). Confounding factors may mask an actual association
or, more commonly, falsely demonstrate an apparent association between
treatment and outcome when no real association between them exists.
If overall survival is used in population-based studies, it is likely that
survival rates will be influenced by confounding by indication, in that
the fittest patients receive treatment and, whether the treatment is effec-
tive or not, they will, by definition, exhibit better survival. This could
be accounted for by the use of propensity score matching to reduce
heterogeneity between groups that receive different treatments (Seeger
et al, 2007).
Information Bias (Measurement Bias, Misclassification)

Information bias may arise in a clinical study because of misclassification
of the level of exposure to the treatment being assessed, or misclassifi-
cation of the disease or its outcome. The misclassification of treatment
(exposure) or disease status can be considered as either differential or non-
differential. In non-differential (random) misclassification, the probability
of exposure being misclassified is independent of disease status and the
probability of disease status being misclassified is independent of expo-
sure status. Differential misclassification may arise due to recall bias or
observer/interviewer bias.
Other types of information bias include several time-related biases which can
occur in studies of prognosis. Immortal time is a period of follow-up during
which, by design, death (or any other study outcome) cannot occur. Immortal time
bias can arise when the period between the patient entering the cohort and
their first exposure to a drug (during which the event of interest has not
occurred) is either misclassified or simply excluded and not accounted for
in the analysis. Often, this occurs when drug exposure is compared with
non-exposure, and it produces spuriously large beneficial effects for the drug
of interest. Immortal time bias can be prevented by including time-dependent
covariables (i.e. classify participants as unexposed until the first prescrip-
tion of the specific drug and then as exposed thereafter) (Suissa, 2007).
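The time-dependent classification described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cohort is invented, and the names `Patient`, `first_rx` and `death_rates` are hypothetical, not taken from Suissa (2007) or from any registry software. The sketch compares a naive analysis, which counts all follow-up of treated patients as exposed, with one that counts the time before the first prescription as unexposed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    follow_up: float            # months from cohort entry to death or censoring
    died: bool                  # True if the patient died during follow-up
    first_rx: Optional[float]   # months from entry to first prescription (None = never treated)


def death_rates(patients, time_dependent):
    """Return (exposed, unexposed) death rates per person-month."""
    months = {"exposed": 0.0, "unexposed": 0.0}
    deaths = {"exposed": 0, "unexposed": 0}
    for p in patients:
        if p.first_rx is None:
            months["unexposed"] += p.follow_up
            deaths["unexposed"] += int(p.died)
        elif time_dependent:
            # Time before the first prescription is counted as unexposed.
            months["unexposed"] += p.first_rx
            months["exposed"] += p.follow_up - p.first_rx
            deaths["exposed"] += int(p.died)
        else:
            # Naive analysis: the immortal period is misclassified as exposed.
            months["exposed"] += p.follow_up
            deaths["exposed"] += int(p.died)
    return (deaths["exposed"] / months["exposed"],
            deaths["unexposed"] / months["unexposed"])


# Hypothetical cohort: everyone dies; treated patients start the drug at month 10.
cohort = [
    Patient(follow_up=30, died=True, first_rx=10),
    Patient(follow_up=30, died=True, first_rx=10),
    Patient(follow_up=10, died=True, first_rx=None),
    Patient(follow_up=10, died=True, first_rx=None),
]

naive_exposed, naive_unexposed = death_rates(cohort, time_dependent=False)
td_exposed, td_unexposed = death_rates(cohort, time_dependent=True)

print(f"naive rate ratio:          {naive_exposed / naive_unexposed:.2f}")
print(f"time-dependent rate ratio: {td_exposed / td_unexposed:.2f}")
```

With these invented numbers the naive analysis suggests that the drug cuts the death rate to about a third, whereas the time-dependent classification shows identical rates in exposed and unexposed person-time.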
Another time-related bias seen in cohort studies is lead-time bias. If
there has been a trend towards earlier diagnosis (e.g. through the introduction
of a screening programme), survival may appear to be improving over time,
although the gain may be due entirely to increased lead-time (i.e. patients
are being diagnosed earlier) with no change in the true mortality rate
(Dos Santos Silva, 1999) (see Chapter 2). To overcome
this bias, the comparison of prognosis between different regions should
take account of any differences in their screening programmes or the
methods used to detect and diagnose cancer, and results should be evalu-
ated stage-specifically. However, in time, bias as a result of cancer stage
migration may also occur. Stage migration occurs due to changes in the
staging system itself or due to evolving technology, which allows more
sensitive detection of the tumour and subsequent spread of the disease.
For example, in a Dutch study, survival was found to increase over time
among cancer patients with stage III colon cancer, but this could be at
least partly explained by stage migration, because some poor-prognosis
patients who would have been classified as stage III by older methods
may have been classified as stage IV in recent times (van Steenbergen
et al, 2012).
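The arithmetic behind stage migration can be made concrete with a small sketch. All patient counts below are invented for illustration and are not taken from van Steenbergen et al (2012); the group names and the `survival` helper are hypothetical. The sketch moves a group of poor-prognosis patients from stage III to stage IV, as more sensitive detection methods might, and recomputes survival per stage.

```python
def survival(groups):
    """Five-year survival proportion for a list of (patients, survivors) groups."""
    patients = sum(n for n, _ in groups)
    survivors = sum(alive for _, alive in groups)
    return survivors / patients


# Hypothetical patient groups as (patients, five-year survivors).
limited_nodal = (80, 48)       # stage III under either method
occult_metastases = (20, 4)    # metastases only detected by newer methods
overt_metastases = (100, 10)   # stage IV under either method

old_staging = {"III": [limited_nodal, occult_metastases], "IV": [overt_metastases]}
new_staging = {"III": [limited_nodal], "IV": [occult_metastases, overt_metastases]}

for label, staging in (("old", old_staging), ("new", new_staging)):
    overall = survival(staging["III"] + staging["IV"])
    print(label,
          f"III={survival(staging['III']):.0%}",
          f"IV={survival(staging['IV']):.0%}",
          f"overall={overall:.0%}")
```

In this sketch stage III survival rises from 52% to 60% and stage IV survival from 10% to about 12%, while overall survival stays at 31%: no individual patient does better, only the labels change.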
Conclusions
Survival of cancer patients has improved in recent decades, but there is
still a large variation between regions and subgroups of patients. There are
different approaches to collect, express and analyse survival data and it is
important for clinicians to be aware of these approaches and of the poten-
tial biases related to them.
Dr Vissers has reported no conflict of interest.
Dr van Erning has reported no conflict of interest.
Prof Dr Lemmens has reported no conflict of interest.
References
Aarts MJ, Koldewijn EL, Poortmans PM, et al. The impact of socioeconomic
status on prostate cancer treatment and survival in the southern Netherlands.
Urology 2013; 81:593–599.
Booth CM, Tannock IF. Randomised controlled trials and population-based
observational research: partners in the evolution of medical evidence. Br J
Cancer 2014; 110:551–555.
Brierley JD, Gospodarowicz MK, Wittekind C (Eds). TNM Classification of
Malignant Tumours, 8th edition. Oxford: John Wiley & Sons Inc., 2016.
Buyse M, Burzykowski T, Carroll K, et al. Progression-free survival is a surrogate
for survival in advanced colorectal cancer. J Clin Oncol 2007; 25:5218–5224.
Clarke M, Halsey J. DICE 2: a further investigation of the effects of chance in
life, death and subgroup analyses. Int J Clin Pract 2001; 55:240–242.
Cronin KA, Feuer EJ. Cumulative cause-specific mortality for cancer patients in
the presence of other causes: a crude analogue of relative survival. Stat Med
2000; 19:1729–1740.
Dos Santos Silva I. Cancer Epidemiology: Principles and Methods. Lyon: Inter-
national Agency for Research on Cancer, 1999; pp. 263–276.
EUROCARE. Survival of Cancer Patients in Europe, 1999–2007: The EURO-
CARE-5 Study. Eur J Cancer 2015; 51:2099–2268.
Gunderson LL, Jessup JM, Sargent DJ, et al. Revised TN categorization for
colon cancer based on national survival outcomes data. J Clin Oncol 2010;
28:264–271.
Janssen-Heijnen ML, Houterman S, Lemmens VE, et al. Prognostic impact
of increasing age and co-morbidity in cancer patients: a population-based
approach. Crit Rev Oncol Hematol 2005; 55:231–240.
Joosse A, Collette S, Suciu S, et al. Sex is an independent prognostic indicator
for survival and relapse/progression-free survival in metastasized stage III to
IV melanoma: a pooled analysis of five European Organisation for Research
and Treatment of Cancer randomized controlled trials. J Clin Oncol 2013;
31:2337–2346.
Kim MJ, Jeong SY, Choi SJ, et al. Survival paradox between stage IIB/C (T4N0)
and stage IIIA (T1-2N1) colon cancer. Ann Surg Oncol 2015; 22:505–512.
Maringe C, Walters S, Rachet B, et al. Stage at diagnosis and colorectal cancer
survival in six high-income countries: a population-based study of patients
diagnosed during 2000-2007. Acta Oncol 2013; 52:919–932.
National Cancer Institute. Measures of Cancer Survival. Available from: http://
surveillance.cancer.gov/survival/measures.html (24 January 2018, date last
accessed).
O’Connell JB, Maggard MA, Ko CY. Colon cancer survival rates with the new
American Joint Committee on Cancer sixth edition staging. J Natl Cancer
Inst 2004; 96:1420–1425.
O’Connell MJ, Mailliard JA, Kahn MJ, et al. Controlled trial of fluorouracil and
low-dose leucovorin given for 6 months as postoperative adjuvant therapy for
colon cancer. J Clin Oncol 1997; 15:246–250.
Punt CJ, Koopman M, Vermeulen L. From tumour heterogeneity to advances
in precision treatment of colorectal cancer. Nat Rev Clin Oncol 2017;
14:235–246.
Sargent DJ, Patiyil S, Yothers G, et al. End points for colon cancer adjuvant
trials: observations and recommendations based on individual patient data
from 20,898 patients enrolled onto 18 randomized trials from the ACCENT
Group. J Clin Oncol 2007; 25:4569–4574.
Sargent DJ, Wieand HS, Haller DG, et al. Disease-free survival versus overall
survival as a primary end point for adjuvant colon cancer studies: individual
patient data from 20,898 patients on 18 randomized trials. J Clin Oncol 2005;
23:8664–8670.
Schaffar R, Rachet B, Belot A, Woods L. Cause-specific or relative survival set-
ting to estimate population-based net survival from cancer? An empirical
evaluation using women diagnosed with breast cancer in Geneva between
1981 and 1991 and followed for 20 years after diagnosis. Cancer Epidemiol
2015; 39:465–472.
Seeger JD, Kurth T, Walker AM. Use of propensity score technique to account
for exposure-related covariates: an example and lesson. Med Care 2007;
45:S143–S148.
Shahir MA, Lemmens VE, van de Poll-Franse LV, et al. Elderly patients with rec-
tal cancer have a higher risk of treatment-related complications and a poorer
prognosis than younger patients: a population-based study. Eur J Cancer
2006; 42:3015–3021.
Suissa S. Immortal time bias in observational studies of drug effects. Pharma-
coepidemiol Drug Safety 2007; 16:241–249.
Sobin LH, Wittekind C (Eds). TNM Classification of Malignant Tumours,
6th edition (UICC). New York: Wiley, 2002.
Tang PA, Bentzen SM, Chen EX, Siu LL. Surrogate end points for median over-
all survival in metastatic colorectal cancer: literature-based analysis from 39
randomized controlled trials of first-line chemotherapy. J Clin Oncol 2007;
25:4562–4568.
van Erning FN, van Steenbergen LN, Lemmens VE, et al. Conditional survival
for long-term colorectal cancer survivors in the Netherlands: who do best?
Eur J Cancer 2014; 50:1731–1739.
van Steenbergen LN, Lemmens VE, Rutten HJ, et al. Increased adjuvant treat-
ment and improved survival in elderly stage III colon cancer patients in The
Netherlands. Ann Oncol 2012; 23:2805–2811.
van Steenbergen LN, Rutten HJ, Creemers GJ, et al. Large age and hospital-
dependent variation in administration of adjuvant chemotherapy for stage
III colon cancer in southern Netherlands. Ann Oncol 2010; 21:1273–1278.
Verwaal VJ, Bruin S, Boot H, et al. 8-year follow-up of randomized trial: cytore-
duction and hyperthermic intraperitoneal chemotherapy versus systemic
chemotherapy in patients with peritoneal carcinomatosis of colorectal can-
cer. Ann Surg Oncol 2008; 15:2426–2432.
Ward E, Jemal A, Cokkinides V, et al. Cancer disparities by race/ethnicity and
socioeconomic status. CA Cancer J Clin 2004; 54:78–93.
Yoon HH, Shi Q, Alberts SR, et al. Racial differences in BRAF/KRAS mutation
rates and survival in stage III colon cancer patients. J Natl Cancer Inst 2015;
107:djv186.
Cancer Registries
O. Visser
4
Department of Registration, Netherlands Comprehensive
Introduction
Cancer registries collect information on cancer cases. Hospital-based
cancer registries collect information on all cancer cases in a hospital,
while population-based cancer registries collect information on cancer
cases in a certain geographical area. This chapter is about population-
based cancer registries and how reports from these might be used when
making decisions about cancer.
Many cancer registries in Europe are national and cover the whole
country (e.g. in Denmark, Slovenia or Belgium), while other registries
only cover smaller geographical areas, such as departments in France
or provinces in Italy or Spain. Most cancer registries are general regis-
tries, which means they cover all cancer types. However, there are also
specialised cancer registries, for example those for childhood cancer or
haematological malignancies, or rare cancers (see Chapter 10).
The number of cancer registries in Europe has been increasing over the last
50 years from only a few in the middle of the 20th century to more than
200 by 2016 (European Network of Cancer Registries [ENCR], 2016). The
cancer registries of Denmark and Slovenia are among the oldest in Europe.
Notification and Completeness of Cancer Registries

As population-based cancer registries are supposed to register all cancer
cases in their catchment area, complete notification of these cases is
essential. Multiple notification sources are a prerequisite to reach a
satisfactory level of completeness. A level of completeness of at least
95% of all cases is generally considered acceptable.
Sources of notification that are used by cancer registries include pathology
laboratories, hospital discharge registries and health insurance companies
(Table 1). Nowadays, the vast majority of cancer cases are pathologi-
cally confirmed, meaning that notification by pathology laboratories is
the main source of notification in many registries. In countries where
the cancer registry has a legal basis and notification is compulsory, gen-
eral practitioners and medical specialists also report cancer cases to the
cancer registries.
Table 1 Main Notification Sources

• Pathology laboratories
• Hospital discharge registries
• Death certificates
• Haematology departments
• Radiotherapy departments
• Health insurance companies
• Medical specialists, general practitioners
Combining all cancers (excluding basal cell carcinoma of the skin) and
regions globally, approximately one in every two cancer patients dies of
their disease (Ferlay et al, 2013), which means that death certificates
from these patients provide a worthwhile additional source of notifica-
tion. Almost all cancer registries in Europe use death certificates as a
notification source, but in some countries (e.g. Sweden, Netherlands)
the use of death certificates is not possible because of national legisla-
tion. When the death certificate is the only source of notification, the
registries try to ‘trace back’ the patient in order to collect the necessary
data on their original diagnosis of cancer. The cases that are only noti-
fied by death certificates and for which trace back was not possible are
called ‘death certificate only’ (DCO). The proportion of DCO cases in a
cancer registry is generally considered as a measure of (in)completeness
of the registry. The lower the proportion of DCO cases, the higher the
level of completeness. Because (in)completeness may differ by cancer
site, incompleteness is best calculated by cancer site.
Examples
Assumption: 90% of all cases in the cancer registry are notified by
regular sources.
1. If, for a particular cancer site, all patients die of their disease and
90 out of 100 cases (all of whom die of their disease) are notified
by regular sources, the remaining 10 would be notified by DCO.
Therefore, the proportion of DCO=10% and completeness=100%
(90+10 out of 100).
2. For another cancer site, if half the patients die of their disease and 90
out of 100 cases are notified by regular sources (45 surviving cases
and 45 who die of their disease), there would be 5 DCO cases and 5
survivors with no information provided to the registry. The propor-
tion of DCO=5/95=5.3% and completeness=95% (90+5 out of 100).
3. For a third cancer site, if no patients die of their disease and 90
out of 100 cases are notified by regular sources (all of whom sur-
vive their disease), there will be no DCO cases. The proportion of
DCO=0/90=0% and completeness=90% (90 out of 100).
The examples illustrate that a high proportion of DCO cases is most prob-
lematic (as far as completeness is concerned) for cancer types with high
survival rates and that 5% DCO is roughly equivalent to a completeness
of 95%. Therefore, up to 5% DCO can be considered acceptable.
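The three examples above follow one calculation, which can be sketched as below. The function name and parameters are hypothetical, and the sketch rests on the same simplifying assumptions as the examples: cases missed by regular sources die of their disease in the same proportion as all other cases, and every such death reaches the registry through a death certificate.

```python
def dco_and_completeness(total, regular, fatality):
    """Return (DCO proportion, completeness) for one cancer site.

    total    -- true number of cases in the catchment area
    regular  -- cases notified by regular sources
    fatality -- proportion of patients who die of their disease (0 to 1)
    """
    missed = total - regular
    dco = missed * fatality          # missed cases that surface via a death certificate
    registered = regular + dco       # all cases known to the registry
    return dco / registered, registered / total


for fatality in (1.0, 0.5, 0.0):
    dco_share, completeness = dco_and_completeness(100, 90, fatality)
    print(f"fatality {fatality:.0%}: DCO {dco_share:.1%}, completeness {completeness:.0%}")
```

Running the loop reproduces the figures of the three examples: 10% DCO with 100% completeness, 5.3% DCO with 95% completeness, and 0% DCO with 90% completeness.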
Minimal Data Set

Each cancer registry should at least register the items in the
WHO/IACR (World Health Organization/International Association of
Cancer Registries) minimal data set (Table 2) (MacLennan, 1991).
Ethnicity is included in the minimal data set, but is considered a controver-
sial item and therefore not registered in most European countries. Besides,
ethnicity is not available in many of the sources used for most registries.
However, country of birth is available in some registries, such as in the
Netherlands, and several studies have been carried out using country of
birth. These include both epidemiological and quality of care studies.
Table 2 The WHO/IACR Minimal Data Set for Cancer Registries
• Personal identification (name, identification number)
• Date of birth
• Sex
• Ethnicity
• Address at diagnosis
• Vital status
• Date of death/date of follow-up
• Source of information (hospital, etc.)
• Incidence date (date of diagnosis)
• Most valid basis of diagnosis
• Topography (site of the tumour)
• Morphology, behaviour
Abbreviations: IACR, International Association of Cancer Registries; WHO, World Health Organization.
While in the past the International Classification of Diseases (ICD) was
used to classify cancer, it is now recommended to use the International
Classification of Diseases for Oncology (ICD-O) instead (Muir and
Percy, 1991). ICD-O includes morphology codes, allowing cancers to be
classified in greater detail than ICD. The current 3rd edition of ICD-O
(Fritz et al, 2000) is based on ICD-10 but includes important updates
on the classification of haematological malignancies and some other
cancers which are not yet included in ICD-10. The 4th edition of ICD-O
is expected to reflect changes of ICD-11, which is due in 2018.
Using up-to-date classifications in cancer registries is a prerequisite for
following current trends in clinical practice. The study of trends over
time is sometimes hampered by changes in the classification, especially
when it is impossible to convert between different versions of the clas-
sification system. This is the case for the majority of haematological
malignancies. Consequently, time-trend analyses of haematological
malignancies can only be performed at an aggregated level (lymphoma,
leukaemia) or for entities that did not change over time (Hodgkin
lymphoma, multiple myeloma). Changes in the classification also make
it difficult to compare results from different registries if they do not use
the same classification system in the same period of time.
Supplementary Items
Along with the minimal data set, many cancer registries collect sup-
plementary items. The most important items include stage and primary
treatment. Most registries that collect data on stage use the Union for
International Cancer Control (UICC) Tumour, Node, Metastasis (TNM)
Classification of Malignant Tumours. However, without direct access to
medical files of cancer patients, collecting TNM data for cancer regis-
tries is challenging. Only a few registries have direct access to medical
files and consequently only a few European registries have reliable TNM
data on all registered cancer patients.
Treatment data on generally specified treatment modalities (surgery,
radiotherapy, chemotherapy) are widely collected, but only a few
registries (among them specialised registries) collect detailed informa-
tion on the therapy, such as the type of surgery, chemotherapy or targeted
therapies.
Table 3 gives an overview of supplementary items that may also be col-
lected by cancer registries. They are generally collected for a subset of
selected cancers during a certain period of time, sometimes referred to
as a high-resolution study.
Table 3 Supplementary Items
• Diagnostics (e.g. computed tomography [CT] or positron emission tomography [PET] scanning)
• Number of metastatic lymph nodes
• Sites of distant metastases
• Dates of treatment (e.g. date of surgery, start and stop date of chemotherapy, etc.)
• Clinical symptoms
• Cytogenetics
• Molecular diagnostics
• Recurrence data
Coding Rules
For all registered items, there are internationally agreed coding rules.
These are necessary because even a seemingly simple item such as a
person’s gender can become a contentious issue, because gender can
change. This raises questions such as whether a person’s gender should
be registered as it is at diagnosis or as the sex they had at birth. The latter
might be preferred because people who are transgender may still have
sex-specific organs.
The IACR/International Agency for Research on Cancer (IARC) coor-
dinate the drawing up of the coding rules. Preferably, coding rules are to
be changed as little as possible, as changing rules could disturb the study
of time trends. The protocols of international collaborative studies gen-
erally refer to the international coding rules, which makes it difficult to
participate in those kinds of studies if registries do not follow those rules.
Coding rules are most relevant for coding multiple tumours and the inci-
dence date (date of diagnosis). The rules for multiple tumours (Table 4) have
an impact on the incidence of some cancers, mainly for organs where
multiple tumours per organ are not uncommon (such as cancer of the
skin, bladder, colon or breast). Incidence rates may differ considerably
if only one cancer per organ is registered or if more than one can be
registered. Cancer registries may deviate from the international coding
rules as long as they are able to apply the agreed rules when calculating
incidence.
Table 4 Main Coding Rules for Multiple Tumours

(Source: ICD-O-3)
1. Recognition of the existence of two or more primary cancers does not depend on time
2. A primary cancer is one that originates in a primary site or tissue and is not an extension, a recurrence, or a
metastasis
3. Only one tumour shall be recognised as arising in an organ or a pair of organs or a tissue
The above rules result in counting only one incident breast cancer in a woman who has bilateral breast cancer.
A cancer registry may register both cancers but reports only one incident breast cancer. Therefore, the number of
registered cancers is not always equivalent to the number of incident cases.
Abbreviation: ICD-O, International Classification of Diseases for Oncology.
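The difference between registered tumours and incident cases can be illustrated with a short sketch. It applies only the rules as summarised in Table 4 and ignores any further provisions of the full international rules; the `Tumour` record, its field names and the example patients are hypothetical.

```python
from collections import namedtuple

Tumour = namedtuple("Tumour", ["patient_id", "organ", "side"])


def incident_cases(tumours):
    """Count at most one incident case per patient and organ (or pair of organs)."""
    return len({(t.patient_id, t.organ) for t in tumours})


registered = [
    Tumour(1, "breast", "left"),
    Tumour(1, "breast", "right"),   # bilateral: same paired organ, not a new incident case
    Tumour(2, "colon", None),
    Tumour(2, "breast", "left"),    # different organ: counted separately
]

print("registered tumours:", len(registered))
print("incident cases:", incident_cases(registered))
```

Here four tumours are registered but only three incident cases are reported, because the bilateral breast cancer in the first patient counts once.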
The rules for coding the incidence date are even more complex (Table 5).
However, consistency is important because the incidence date
influences the length of survival.
Example
If the first signs of the malignancy precede the pathological confirma-
tion by 6 months and the patient dies 3 months after the pathological
confirmation, the survival would be 9 months if the date of first signs
was selected as incidence date, or only 3 months if the date of patho-
logical confirmation was registered as the incidence date.
Table 5 Order of Declining Priority for the Incidence Date

(Source: ENCR)
1. Date of first pathological confirmation
2. Date of admission to the hospital
3. Date of first consultation at the out-patient clinic
4. Other dates
5. Date of death
Whichever date is selected, the incidence date should not be later than the date of the start of the treatment.
Abbreviation: ENCR, European Network of Cancer Registries.
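A minimal sketch of how the priority order in Table 5 might be applied, and of how the choice of incidence date changes the survival time in the example above. The field names, the helper functions and the calendar dates are hypothetical. Replacing a selected date that falls after the start of treatment with the treatment date is one possible reading of the note under Table 5, not a rule quoted from the ENCR.

```python
from datetime import date

# Keys are hypothetical field names, listed in the declining priority of Table 5.
PRIORITY = (
    "pathological_confirmation",
    "hospital_admission",
    "outpatient_consultation",
    "other",
    "death",
)


def incidence_date(dates, treatment_start=None):
    """Pick the highest-priority date available in `dates` (a dict of field -> date)."""
    for field in PRIORITY:
        chosen = dates.get(field)
        if chosen is not None:
            break
    else:
        raise ValueError("no usable date recorded")
    # Assumption: a date later than the start of treatment is replaced by the treatment date.
    if treatment_start is not None and chosen > treatment_start:
        chosen = treatment_start
    return chosen


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


first_signs = date(2015, 1, 1)
confirmed = date(2015, 7, 1)    # 6 months after the first signs
died = date(2015, 10, 1)        # 3 months after pathological confirmation

record = {"pathological_confirmation": confirmed, "death": died}
print(months_between(incidence_date(record), died))   # survival from pathological confirmation
print(months_between(first_signs, died))              # survival if first signs were used instead
```

The two printed values are 3 and 9 months, matching the example: the same patient appears to survive three times as long when the earlier date is used.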
Follow-up
Follow-up of the patient’s vital status is one of the standard activities of
cancer registries. However, this generally does not include the collection
of recurrence data.
When following up for vital status, registries might use ‘active’ follow-
up (in which the status of all patients is checked periodically in relevant
hospitals or in population registers) or link at regular intervals to popula-
tion registers. The latter procedure can result in virtually 100% complete
follow-up, as long as linkages can be performed with a unique identifica-
tion number for the patient, which is possible in many Northern Euro-
pean countries, or with complete identifying data (such as name, date of
birth, etc.). In addition, population registers must supply information on
emigration because follow-up will be incomplete in registries that do not
receive information about people leaving their territory.
Epidemiological Studies with Cancer Registry Data
Numerous epidemiological studies have been performed on cancer inci-
dence, mortality and survival using data from cancer registries. The
EUROCARE studies on the survival of cancer patients in Europe are
among the most extensive of these. Registry data have shown a large variation in
the incidence of cancer in Europe (Ferlay et al, 2013), which suggests that
preventive measures should be considered in countries with high inci-
dence rates for specific cancers.
The EUROCARE studies have also found large variations in survival
rates within Europe (De Angelis et al, 2014), although not as large as for
incidence. In addition, survival rates have been shown to be improving
for almost all cancers over time (De Angelis et al, 2014). However, in
some cancers, such as lung cancer, the trend for improvement in survival
is relatively slow (Francisci et al, 2015). The EUROCARE studies have
shown that survival is poorer in Central and Eastern European coun-
tries than in countries in Northern and Western Europe (De Angelis et
al, 2014). A clear relation between the per capita income in a country
and survival is generally observed (Gatta et al, 2013). However, there
are exceptions. For example, Denmark and the United Kingdom have
relatively poor survival rates in relation to their per capita income.
Comparison of incidence and survival between registries or countries
may be hampered by variations in data quality in the cancer registries.
It should always be taken into account that (selective) incompleteness
may influence the results of any comparison.
■ Incompleteness in incidence will result in a lower incidence rate lead-
ing to a ‘more favourable’ outcome
■ Incompleteness in follow-up will lead to a lower number of patients
who died and, consequently, to higher survival rates and a ‘more
favourable’ outcome
In other words, one should bear in mind that favourable outcomes of
incidence or survival may be the result of poor data. Therefore, indica-
tors of data quality, such as the proportion of DCOs, are essential for
interpreting the results.
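The direction of both effects can be checked with simple arithmetic. The sketch below uses invented numbers and hypothetical function names; it assumes that missing cases and missing deaths are simply absent from the registry and are not corrected for in any way.

```python
def observed_incidence(true_cases, population, completeness):
    """Incidence per 100,000 as it appears when only a share of cases is registered."""
    return true_cases * completeness / population * 100_000


def observed_survival(patients, true_deaths, death_capture):
    """Survival proportion as it appears when only a share of deaths is recorded."""
    recorded_deaths = true_deaths * death_capture
    return (patients - recorded_deaths) / patients


# Hypothetical region: 500 true cases per 1,000,000 inhabitants.
print(f"{observed_incidence(500, 1_000_000, 1.0):.1f} per 100,000 with complete registration")
print(f"{observed_incidence(500, 1_000_000, 0.9):.1f} per 100,000 with 90% completeness")

# Hypothetical cohort: 100 patients, 40 of whom truly die during follow-up.
print(f"{observed_survival(100, 40, 1.0):.0%} survival with complete follow-up")
print(f"{observed_survival(100, 40, 0.9):.0%} survival when 10% of deaths are missed")
```

In this sketch a registry that misses 10% of cases reports an incidence of 45 instead of 50 per 100,000, and a registry that misses 10% of deaths reports 64% instead of 60% survival. Both results look more favourable purely because of poorer data.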
Studies on the stage distribution over time for specific cancers
(e.g. breast cancer) can evaluate the efficacy of screening programmes
for early detection (see Chapter 2). When screening is aimed at the early
detection of pre-cancerous lesions, such as in the case of cervical cancer
screening, the efficacy of the screening can be monitored by following
incidence rates over time.
Quality of Care Studies with Cancer Registry Data

A randomised controlled trial (RCT) (see Chapter 6) is the best tool
for comparing the outcomes of patient groups receiving different can-
cer treatments. However, in many instances, RCTs are not possible and,
when they are performed, they might include highly selected patients.
For example, cancer patients in RCTs are usually younger than general
cancer patients, have no or only few comorbidities and are more likely
to be treated in university hospitals or specialised cancer centres. There-
fore, the outcome data reported across the intervention groups in RCTs
may not be valid for the general cancer patient population. Studies using
cancer registry data can be supplementary to RCTs or, in the case when
an RCT is not feasible, cancer registries might be the best source of evi-
dence on the effects of a treatment.
Example
When stereotactic radiotherapy became available in the Netherlands
for elderly stage I non-small cell lung cancer (NSCLC) patients, an
RCT was planned to compare stereotactic radiotherapy with tradi-
tional treatment modalities for these patients. For several reasons, the
RCT did not take place and an alternative study using cancer registry
data was set up. In a number of publications (Haasbeek et al, 2012;
Palma et al, 2010; Palma et al, 2011), it was shown that the intro-
duction of stereotactic radiotherapy in the Netherlands was followed
by a decrease in the proportion of untreated elderly NSCLC patients,
increased access to curative treatment and increased overall survival.
Cancer registry data can also be used in volume-outcome research.
Examples of quality of care studies include studies designed to investigate:
■ The relationship between the number of treated patients and their out-
comes, where both postoperative mortality and overall survival can
be used as outcome measures. In many studies, a correlation between
small volumes and high postoperative mortality was observed
(Reames et al, 2014)
■ The relationship between the number of resected lymph nodes and
survival (Wu et al, 2016)
■ The effect on survival of introducing new targeted therapies (Thielen
et al, 2016)
■ The influence of country of birth on treatment and survival (Arnold
et al, 2013; Elferink et al, 2016)
For these types of study, it might be essential for the researcher to
have access to clinical data in the cancer registry or the ability to link
cancer registry data to clinical data. Unfortunately, only a small number
of European cancer registries provide such options at this time.
Conclusions
An extensive network of population-based cancer registries has been
established during the past 50 years in Europe. These registries can pro-
vide important information on the incidence and survival of cancer in
Europe. The effects of preventative measures can be monitored and the
effects of therapies that are introduced into clinical practice can be fol-
lowed over time. A variety of quality of care studies can be performed
based on data from cancer registries. This applies especially to clinical
treatment settings for which no RCT data are available.
Dr Visser has reported no conflict of interest.
References
Arnold M, Aarts MJ, Siesling S, et al. Diverging breast and stomach cancer inci-
dence and survival in migrants in The Netherlands, 1996-2009. Acta Oncol
2013; 52:1195–1201.
De Angelis R, Sant M, Coleman MP, et al; EUROCARE-5 Working Group.
Cancer survival in Europe 1999-2007 by country and age: results of
EUROCARE-5, a population-based study. Lancet Oncol 2014; 15:23–34.
Elferink MA, Lamkaddem M, Dekker E, et al. Ethnic inequalities in rectal can-
cer care in a universal access healthcare system: a nationwide register-based
study. Dis Colon Rectum 2016; 59:513–519.
European Network of Cancer Registries (ENCR). Registry contact list avail-
able from: http://www.encr.eu/index.php/who-we-are/registry-contact-list
Ferlay J, Soerjomataram I, Ervik M, et al. GLOBOCAN 2012 v1.0, Cancer Inci-
dence and Mortality Worldwide: IARC CancerBase No. 11 [Internet]. Lyon,
France: International Agency for Research on Cancer, 2013. Available from:
http://globocan.iarc.fr (24 January 2018, date last accessed).
Francisci S, Minicozzi P, Pierannunzio D, et al; EUROCARE-5 Working Group.
Survival patterns in lung and pleural cancer in Europe 1999-2007: Results
from the EUROCARE-5 study. Eur J Cancer 2015; 51:2242-2253.
Fritz A, Percy C, Jack A, et al. World Health Organization International Classifi-
cation of Diseases for Oncology (ICD-O), 3rd edition. Geneva, Switzerland:
World Health Organization, 2000.
Gatta G, Trama A, Capocaccia R. Variations in cancer survival and patterns of
care across Europe: roles of wealth and health-care organization. J Natl Can-
cer Inst Monogr 2013; 2013:79–87.
Haasbeek CJ, Palma D, Visser O, et al. Early-stage lung cancer in elderly patients:
a population-based study of changes in treatment patterns and survival in the
Netherlands. Ann Oncol 2012; 23:2743–2747.
MacLennan R. Items of patient information which may be collected by regis-
tries. In: Jensen OM, Parkin DM, MacLennan R, Muir CS, Skeet RS (Eds).
Cancer Registration: Principles and Methods. IARC Scientific Publications
No. 95. Lyon, France: IARC, 1991.
Muir CS, Percy C. Classification and coding of neoplasms. In: Jensen OM,
Parkin DM, MacLennan R, Muir CS, Skeet RS (Eds). Cancer Registration:
Principles and Methods. IARC Scientific Publications No. 95. Lyon, France:
IARC, 1991.
Palma D, Visser O, Lagerwaard FJ, et al. Impact of introducing stereotactic lung
radiotherapy for elderly patients with stage I non-small-cell lung cancer: a
population-based time-trend analysis. J Clin Oncol 2010; 28:5153–5159.
Palma D, Visser O, Lagerwaard FJ, et al. Treatment of stage I NSCLC in elderly
patients: a population-based matched-pair comparison of stereotactic radio-
therapy versus surgery. Radiother Oncol 2011; 101:240–244.
Reames BN, Ghaferi AA, Birkmeyer JD, Dimick JB. Hospital volume and oper-
ative mortality in the modern era. Ann Surg 2014; 260:244–251.
Thielen N, Visser O, Ossenkoppele G, Janssen J. Chronic myeloid leukaemia in
the Netherlands: a population-based study on incidence, treatment and sur-
vival in 3,585 patients from 1989-2012. Eur J Haematol 2016; 97:145–154.
Wu SG, Zhang ZQ, Liu WM, et al. Impact of the number of resected lymph
nodes on survival after preoperative radiotherapy for esophageal cancer.
Oncotarget 2016; 7:22497–22507.
5 Drug Development (Including Phase 1 Trials)
M. D’Incalci
I. Fuso Nerini
V. Fotia
Department of Oncology, IRCCS-Istituto di Ricerche Farmacologiche
Mario Negri, Milan, Italy
Introduction
Progress in understanding the molecular basis of cancer has been sup-
ported in recent years by the introduction of new automation-enabling
technologies, which have prompted the discovery of many new and poten-
tially therapeutic chemicals. Yet the development of new drugs, especially
in oncology, remains an extremely difficult and expensive process.
Drug development involves specialists from many different areas of
science, including chemists, biologists, healthcare providers, clinicians
and governmental health regulators. Practice guidelines that ensure the
proper design, performance and monitoring of drug development have
been instituted by regulatory agencies. Examples of such guidelines are:
■ Good Laboratory Practice (GLP)
■ Good Manufacturing Practice (GMP)
■ Good Clinical Practice (GCP)
Drug development consists of preclinical and clinical phases
(Phases I–IV). The preclinical phase includes:
■ The discovery of new molecular entities and their optimisation
■ The selection of suitable compounds for clinical evaluation
Clinical Phases I, II and III assess:
■ The safety and therapeutic efficacy of a new drug in patients
■ Its pharmacokinetic and pharmacodynamic properties
Phase IV, also called pharmacovigilance, is the post-marketing
surveillance for evaluation of adverse effects.
This chapter focuses on preclinical and clinical Phase I studies. Areas
which might benefit from conceptual or practical improvements are
highlighted. Clinical development processes using randomised trials
are discussed in Chapter 6.
Strategies in Drug Development
The strategy adopted for developing a new anticancer drug depends on
the novelty and origin of the compound.
We may distinguish between:
■ Novel agents originating from either natural sources or chemical
synthesis
■ Analogues of existing drugs offering improvements in therapeutic
index and/or in pharmacokinetic profile
■ Drugs already in clinical use which have acquired a novel therapeutic
application based on observations of unexpected pharmacological
effects (drug repositioning)
The past 20 years have seen a change in drug development: advances have
been made in the understanding of cancer biology, pharmacology,
informatics and chemistry. Building on initiatives such as the Cancer
Genome Project (Garber, 2005), many new drugs are now being designed to
target key components of early neoplasia and cancer progression.
Importantly, agents designed in this way have so far brought major
therapeutic improvements for only some haematological malignancies and
solid tumours, and the substantial investment of resources has not yet
led to a significant increase in survival for the majority of cancer
patients.
Targeted Therapy: A Matter of Semantics
Originally, ‘targeted therapy’ referred to inhibitors of a specific bio-
logical target that selectively drives malignant transformation. However,
current use of the term is often imprecise.
Very few approved ‘targeted drugs’ have cancer-specific targets. Most,
such as inhibitors of growth factor or angiogenic signalling, act on
proteins which also have essential functions in normal tissue. Recently
developed immunotherapeutic agents are likewise directed at the immune
cells of the patient and not at cancer-specific proteins. Oncologists
often use the term ‘targeted’ when referring to drugs that act on targets
other than nucleic acids, regardless of which target is actually being hit.
The distinction between ‘targeted therapy’ and ‘chemotherapy’ is also con-
ceptually wrong and scientifically unsound. Even cytotoxic chemothera-
peutics hit molecular targets crucial for the growth or malignant behaviour
of tumours. Here, selectivity stems from complex biological differences
between tumour and normal tissues. For example, defects in DNA repair
can render cancer cells more susceptible to DNA-directed drugs. Therefore,
it seems prudent to use the term ‘targeted therapy’ as originally defined.
Target Discovery Precedes Drug Discovery
The anticancer drug discovery process usually starts with the search for
a promising target. This is a molecule specifically involved in tumour
cell growth and survival. New genomic technologies assisted by systems
biology approaches have greatly improved the discovery of aberrant reg-
ulatory pathways in cancer. Nevertheless, the selection and validation of
optimal pharmacological targets remain challenging. Several different
assays have been implemented for target validation (Hughes et al, 2011).
They analyse:
■ The role of the target in cancer progression
■ The consequences of its modulation in tumour and normal cells
The concept of synthetic lethality, according to which a target is essential
only in neoplastic cells that carry specific mutations, has led to the search
for new targets being expanded beyond oncogenes (Canaani, 2014).
However, the search for cancer-specific targets is complicated by tumour
heterogeneity and by the development of drug resistance caused by
genomic variability and epigenetic changes. These are the major causes
of the disappointingly modest efficacy of many recently developed drugs.
Multi-targeted agents may be less prone than single-targeted drugs to
circumvention by target mutations or activation of bypass pathways. Thus,
an anticancer therapeutic should be designed to target a pathway rather
than a single molecule or, even better, to target and shut down a
biological hallmark of cancer rather than a single molecular pathway
(Dobbelstein and Moll, 2014; Hanahan and Weinberg, 2011).
Components of the tumour microenvironment display less heterogeneity
than tumour cells because they are not genetically mutated. Stromal
elements have been shown to induce a malignant phenotype in neoplastic
cells and to hinder drug penetration into the tumour. Combining cytotoxic
agents with compounds targeting the tumour microenvironment might
therefore be a promising alternative therapeutic strategy (Cairns et al,
2006).
Small Molecule Drug Discovery: Synthesis and Optimisation
Once a biological target with viable disease linkage has been validated,
the next step is to discover molecules which can inhibit this target. Drugs
can be designed to inhibit:
■ Receptors
■ Enzymatic activities
■ Protein–protein interactions
Combinatorial chemical synthesis is a relatively novel and elegant way
to rapidly assemble new molecules with suitable pharmacophores, i.e.
functional groups which offer pharmacological efficacy. Many drug
searches involve the production of chemical libraries which consist
of promising scaffolds harbouring appropriate chemical permutations
(Gershell and Atkins, 2003). Libraries of molecules of natural origin are
also investigated, since nature remains a source of active compounds that
is unsurpassed in novelty and complexity.
Molecules from these libraries are subjected to biochemical and cellular
assays to select those with the desired activities, so-called ‘hits’.
High-throughput screening (HTS) uses robotic systems capable of testing
thousands of compounds simultaneously. In silico prediction software
is often used for preliminary screening because it allows the target-led
selection of small subsets of molecules from vast chemical libraries.
Databases of small molecules together with molecular modelling of
the biological target allow virtual screening. Drug–target interactions
can be modelled, simulating efficacy, safety and/or pharmacokinetic
properties of the molecules under evaluation (Sliwoski et al, 2013). The
usefulness of virtual screening is limited by the discrepancy between
theoretically conceivable and readily synthesisable molecular structures.
Preclinical drug discovery therefore still requires extensive empirical
experimentation. Candidate hits are then analysed to define:
■ Chemical integrity
■ Synthetic accessibility
■ Functional behaviour
■ Structure–activity relationships
■ Physicochemical properties
■ Pharmacokinetic properties
A process named ‘lead optimisation’ helps compound choice by refining:
■ Potency
■ Selectivity
■ Physicochemical characteristics
Traditionally, rational drug design requires knowledge of how
substituents added to a molecular scaffold might alter drug properties.
Nowadays, the ‘hit-to-lead process’ entails a second combinatorial
chemistry step that generates compound sublibraries with a higher
probability of containing the best drug candidate.
Selection of a Drug: Preclinical Assays
Smart drug development requires the efficient elimination of all drug
candidates that are likely to fail in subsequent clinical trials. The bio-
chemical, cellular and in vivo assays described in the following sections
help researchers to investigate antitumour efficacy, pharmacokinetic
properties and toxic effects of the candidate compounds. These inves-
tigations are conducted in assays ‘in parallel’ rather than ‘in series’, in
order to optimise pharmacological properties simultaneously and iden-
tify the best drug candidates.
Preclinical Studies of Antitumour Efficacy
To investigate antitumour potential, in vitro screening assays are con-
ducted and dose–response curves are generated that compare the poten-
cies of compounds. Tumour cell lines grown in culture are commonly
used as the first line of study. The outcomes of these assays are:
■ Induction of cell death
■ Inhibition of cell proliferation
■ Modulation of cell phenotype
Reporter systems or immunodetection methods are often also used for
this purpose. In the past, drug screening approaches used panels of
human cell lines, the most famous being that of the Developmental
Therapeutics Program of the US National Cancer Institute (NCI).
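As an illustration of how dose–response data from such assays can be
summarised, the following Python sketch estimates the concentration that
reduces cell viability to 50% (often reported as IC50) by log-linear
interpolation between the two tested concentrations that bracket 50%
viability. The Hill model used to generate the example data, the function
names and the concentrations are illustrative assumptions and are not
taken from the studies cited in this chapter.

```python
import math


def hill_viability(conc, ic50, hill_slope=1.0):
    """Surviving fraction at a given concentration under a simple Hill model
    (an assumed model, used here only to generate example data)."""
    return 1.0 / (1.0 + (conc / ic50) ** hill_slope)


def estimate_ic50(concs, viabilities):
    """Estimate IC50 by log-linear interpolation between the two tested
    concentrations that bracket 50% viability.

    Returns None if the tested range does not bracket 50% viability.
    """
    pairs = sorted(zip(concs, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical tested concentrations (micromolar) and two hypothetical compounds.
tested = [0.01, 0.1, 1.0, 10.0, 100.0]
compound_a = [hill_viability(c, ic50=1.0) for c in tested]
compound_b = [hill_viability(c, ic50=5.0) for c in tested]

ic50_a = estimate_ic50(tested, compound_a)
ic50_b = estimate_ic50(tested, compound_b)
# The compound with the lower estimated IC50 is the more potent one.
more_potent = "A" if ic50_a < ic50_b else "B"
```

Interpolation is only as good as the spacing of the tested concentrations;
in practice a full curve fit is usually preferred, and the sketch is meant
only to show what a comparison of potencies involves.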
Example 1
The NCI-60 Human Tumor Cell Lines Screen was developed by the
NCI in the late 1980s and has served the global cancer research com-
munity for more than 20 years. It was designed to screen novel small
molecules for their potential anticancer activity, testing their ability
to inhibit the growth of 60 different human tumour cell lines. The
COMPARE algorithm provided an automated way of comparing the
biological response patterns of each compound and it was a valid tool
to help infer putative mechanisms of action (Shoemaker, 2006).
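The pattern-matching idea behind COMPARE can be sketched by ranking
library compounds according to the Pearson correlation between their
growth-inhibition profile and that of a ‘seed’ compound across the same
cell lines. The Python sketch below is a simplified illustration and not
the NCI implementation; the compound names and the log GI50 values are
invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_by_similarity(seed_profile, library):
    """Rank library compounds by similarity of response pattern to the seed."""
    scores = {name: pearson(seed_profile, profile) for name, profile in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log10(GI50) values across five cell lines (same order for all).
seed = [-6.0, -5.2, -7.1, -4.8, -6.5]
library = {
    "compound_X": [-5.9, -5.0, -7.3, -4.9, -6.4],  # similar pattern to the seed
    "compound_Y": [-5.0, -6.8, -4.9, -7.0, -5.1],  # roughly opposite pattern
}
ranking = rank_by_similarity(seed, library)
```

A compound whose pattern correlates strongly with that of a drug with a
known mechanism becomes a candidate for sharing that mechanism, which
then has to be confirmed experimentally.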
However, cancer cell lines in culture are fundamentally different from
cancer cells in a tumour mass. Some features that are challenging to
model in vitro are:
■ Growth rate
■ Metabolic activity
■ Cell–cell interactions
■ Cell–microenvironment interactions (chronic inflammation,
angiogenesis, evasion of immune response)
Thus in vitro screening assays cannot reliably predict human clinical
responses. The use of many cell lines from different origins might help
address cancer heterogeneity. In addition, co-culture with stromal
components or new technologies allowing three-dimensional tumour
growth, such as organoids, may help reproduce the tumour
microenvironment in vitro (Wilding and Bodmer, 2014).
Testing drugs on living organisms is an essential part of the drug devel-
opment process. Antitumour activity is evaluated as:
■ Increase in life span of an animal
■ Tumour growth inhibition
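Both endpoints are usually expressed relative to an untreated control
group. The Python sketch below uses two commonly quoted conventions,
which are assumptions here rather than definitions given in this chapter:
the percentage increase in life span is computed from median survival
times, and tumour growth inhibition from the change in mean tumour volume
over the treatment period. All numbers are invented for illustration.

```python
from statistics import median


def percent_increase_in_life_span(treated_days, control_days):
    """%ILS = (median survival treated / median survival control - 1) x 100."""
    return (median(treated_days) / median(control_days) - 1.0) * 100.0


def percent_tumour_growth_inhibition(treated_start, treated_end,
                                     control_start, control_end):
    """%TGI = (1 - growth in treated group / growth in control group) x 100."""
    delta_treated = treated_end - treated_start
    delta_control = control_end - control_start
    return (1.0 - delta_treated / delta_control) * 100.0


# Hypothetical survival times (days) and mean tumour volumes (mm^3).
ils = percent_increase_in_life_span([30, 35, 40], [20, 25, 30])
tgi = percent_tumour_growth_inhibition(100.0, 300.0, 100.0, 900.0)
```

With these invented numbers the treated animals live about 40% longer
than controls and tumour growth is inhibited by 75%.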
Modulation of levels of predictive biomarkers should be considered
in this preclinical setting. Murine models with implanted tumours are
widely employed. Tumours are distinguished as:
■ Syngeneic or xenografted
■ Originating from established cell lines or human biopsy specimens
■ Implanted ectopically or orthotopically
Xenograft animal models are particularly important to assess
antitumour activity at the preclinical level. These experiments are not
required by the regulatory agencies. Nevertheless, they are extremely
useful for deciding if the new compound should be developed in the clinic
and can indicate which tumours might be more sensitive.
However, the predictive value of these models is not always completely
satisfactory. Key aspects of human malignancies that are difficult to
reproduce in such models are:
■ Intra-tumour heterogeneity
■ Interactions with the microenvironment/immune system
■ The metastasis process
Recently, genetically engineered mouse models (GEMMs) have
emerged, in which neoplasms are initiated through activation or
overexpression of oncogenes or depletion of tumour suppressors. GEMMs
allow analysis of tumour progression from in situ to locally advanced
or even widespread metastatic disease. They are limited by significant
differences between human and murine targets and by restricted genetic
alterations (Sharpless and Depinho, 2006).
Preclinical Pharmacokinetic Studies
Some in vitro assays help to determine:
■ How the compound overcomes the barriers between the sites of
administration and the target
■ How it binds to plasma proteins
■ How it is metabolised
Studies in animal models are still necessary to evaluate how the entire
body interacts with the compound. Conventional preclinical pharmacoki-
netic studies involve the treatment of animals and the measurement of
concentrations of the drug and its metabolites at specific time points in:
■ Blood
■ Tumour
■ Normal tissues
Drug concentrations in normal organs should always be evaluated to
explain potential effects on non-target tissues or to suggest possible
applications. For example, the presence of the compound in the brain
indicates its ability to cross the blood–brain barrier, suggesting its
potential use against cerebral tumours.
The challenge lies in being able to extrapolate these results to the clini-
cal environment. Conventional allometric scaling is the most common
method for predicting human pharmacokinetic variables. It is usually
based on differences in body surface area, although other parameters
such as body and organ weight, time normalisation, liver conjugation
activity and plasma protein binding have also been used (Pritchard et al,
2003).
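A minimal Python sketch of two scaling calculations is given below. The
first converts a dose per kilogram between species on the basis of body
surface area, using conversion factors (Km, body weight divided by body
surface area) that are commonly quoted for this purpose. The second
scales clearance with body weight raised to a fixed exponent. The Km
values, the exponent of 0.75 and the function names are assumptions made
for illustration; they are not given in this chapter and should be
checked against current regulatory guidance before any real use.

```python
# Assumed Km factors (kg per m^2), commonly quoted for
# body-surface-area-based dose conversion.
KM_FACTORS = {"mouse": 3.0, "rat": 6.0, "dog": 20.0, "human": 37.0}


def human_equivalent_dose(animal_dose_mg_per_kg, species):
    """Convert an animal dose (mg/kg) to a human-equivalent dose (mg/kg)
    by normalising to body surface area."""
    return animal_dose_mg_per_kg * KM_FACTORS[species] / KM_FACTORS["human"]


def scale_clearance(animal_clearance, animal_weight_kg,
                    human_weight_kg=70.0, exponent=0.75):
    """Simple allometry: clearance scales with (body weight) ** exponent.
    The default exponent is an assumption, not a measured value."""
    return animal_clearance * (human_weight_kg / animal_weight_kg) ** exponent


# Hypothetical example: 10 mg/kg in the mouse, clearance 0.1 L/h in a 20 g mouse.
hed = human_equivalent_dose(10.0, "mouse")
predicted_cl = scale_clearance(0.1, 0.02)
```

Such predictions are starting points only; as the text notes, other
parameters such as protein binding and metabolic activity may need to be
taken into account.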
Preclinical Toxicology Studies
Safety studies help select molecules with optimal therapeutic indexes.
An example of the assessment of effects of agents on normal cells in
vitro is the human ether-à-go-go-related gene K+ channel (hERG-K+)
conductance assay. This is an electrophysiology study used to screen out
those agents likely to hamper cardiac repolarisation and thus cause
QT interval prolongation (Pritchard et al, 2003).
However, examination of adverse effects in appropriate animal models is
the only way to assess consequences on the overall organism. Common
endpoints are:
■ Drug-related death
■ Net animal weight loss
■ Behavioural changes
■ Variability in food consumption
■ Organ weight loss
■ Histopathological alterations
Toxicity is evaluated after a single or repeated drug administration.
Prediction of toxic responses in the clinic, based on preclinical studies,
is limited. Mice, in particular, tolerate high doses of many anticancer
agents. Differences in toxicity between rodents and humans are mainly
due to differences in:
■ Pharmacokinetic profile
■ Target cell sensitivity
Moreover, experiments in rodents are unsuitable for predicting:
■ Long-term toxicity, because of their brief life span
■ Human-specific side effects, such as muscle pain or fatigue
This justifies the additional use of a non-rodent species.
Retrospective studies of systemic drug exposure and toxicity in mice and
humans suggest that, for many cytotoxic agents, the area under the plasma
concentration–time curve (AUC) is a good predictor of relative toxicity in
humans (Collins et al, 1986). For some other drugs, however, threshold
concentrations or time dependencies are more closely related to the onset
of adverse effects.
Development of Anticancer Biologics
Biologics include proteins and other analogous products applicable to
the prevention or treatment of human cancer. They are usually produced
by genetically engineered cells, such as:
■ Bacteria
■ Yeast
■ Mammalian cells
Antibodies can be conjugated with other compounds, combining cytotoxic
activity with high specificity. Some examples are:
■ Radioimmunoconjugates (e.g. 131I-tositumomab)
■ Antibodies conjugated with chemotherapeutic drugs (e.g. brentuximab
vedotin, trastuzumab emtansine)
■ Immunotoxins (e.g. denileukin diftitox)
As for small molecules, the starting point for development of antitumour
antibodies is the generation of antibody libraries. Screening techniques
then select antibodies and bioengineering improves their affinity to the
desired target (for detailed information, see Hoogenboom, 2005).
It is important that animal models used to study the antitumour efficacy of
antibodies are carefully chosen. Because of their high specificity, most
antibodies do not cross the species barrier. Thus, antibodies against human
antigens are necessarily tested in mice bearing human tumours. However, the
activity of the corresponding orthologue antibodies against murine tumours
should also be assessed to examine the effects on neoplastic stroma.
Towards Phase I Clinical Trials
A Phase I trial is the first occasion on which the new candidate drug or
drug combination is administered to a human being. Its aims are:
■ The determination of safe dose levels and treatment schedules
■ The collection of preliminary data about efficacy, pharmacokinetics
and pharmacodynamics
Phase I trials of anticancer agents are conducted in cancer patients who,
usually, are no longer responding to existing therapies. This differs from
first-in-human studies of most other new therapeutics, which typically
enrol healthy volunteers. Particular challenges posed by oncology
Phase I trials are:
■ Narrow therapeutic window
■ Serious disease condition of the recruited patients
■ Inter-individual variation in drug response and toxicity
Criteria for a New Agent Eligible to Enter Phase I Clinical Development
1) It should be novel. First-in-class compounds with a novel mode of action
should be given priority. The development of ‘me-too drugs’ should be
discouraged, because these agents have the same mechanisms of action
as existing drugs and offer no clear therapeutic advantage.
2) It should have preclinical efficacy. A candidate drug should be inves-
tigated only if it has demonstrated a good therapeutic index in in
vitro and in vivo assays. Compounds should be prioritised for clinical
evaluation if they show efficacy against tumours that are resistant to
conventional therapies.
3) Its pharmacological properties should be favourable. Failures of drug
development in the clinic have often been attributed to suboptimal phar-
macokinetic properties. The absorption, distribution, metabolism and
excretion (ADME) profiles of agents in preclinical assays should be
known to be favourable, in addition to potency and selectivity (Loong
and Siu, 2013).

Several methodologies are available for Phase I trials, relevant to trial
design and choice of starting dose. They range from classic simple
rule-based designs, such as the 3+3 design, to sophisticated computational
models involving Bayesian algorithms. Many experimental schemes are based
on preclinical toxicology and pharmacokinetic data. The preclinical study
phase is often governed by the ‘bench to bedside’ philosophy. A
complementary ‘bedside to bench’ approach is also desirable, learning from
the observations made in the Phase I trial. This entails the tolerable dose
and exposure established in humans being re-tested in animal models to
confirm efficacy (Lieu et al, 2013).
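The logic of a rule-based escalation can be sketched in a few lines of
Python. The rules coded below are one common formulation of the 3+3
design and are an assumption here, since variants exist: three patients
are treated at a dose level; with no dose-limiting toxicity (DLT) the dose
is escalated; with one DLT three more patients are treated at the same
level; escalation continues only if at most one of six patients has a DLT;
otherwise escalation stops and the previous level is taken as the MTD.
The dose levels and DLT counts are invented for illustration.

```python
def decide(n_treated, n_dlt):
    """Return the next action under the assumed 3+3 rules."""
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"
        if n_dlt == 1:
            return "expand"
        return "stop"
    if n_treated == 6:
        return "escalate" if n_dlt <= 1 else "stop"
    raise ValueError("3+3 cohorts contain 3 or 6 patients")


def run_three_plus_three(dose_levels, observe_dlts):
    """Walk through the dose levels in increasing order.

    observe_dlts(dose, n_patients) must return the number of DLTs seen in a
    cohort of n_patients treated at that dose. Returns the highest dose level
    that passed the rules, or None if even the first level was too toxic.
    """
    mtd = None
    for dose in dose_levels:
        n_dlt = observe_dlts(dose, 3)
        action = decide(3, n_dlt)
        if action == "expand":
            n_dlt += observe_dlts(dose, 3)
            action = decide(6, n_dlt)
        if action == "stop":
            return mtd
        mtd = dose
    return mtd


# Hypothetical DLT counts per cohort of three patients at each dose (mg/m^2).
observed = {10: iter([0]), 20: iter([1, 0]), 40: iter([2])}
mtd = run_three_plus_three([10, 20, 40], lambda dose, n: next(observed[dose]))
```

In this invented example the 20 mg/m^2 level is expanded to six patients
after one DLT, passes, and becomes the MTD when two DLTs occur at
40 mg/m^2. Model-based designs replace these fixed rules with a
statistical model of the dose–toxicity relationship.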
Selection of Patients to be Enrolled
The selection of patients to be enrolled needs careful consideration.
Patients with early-stage disease and good performance status are more
likely to benefit from new drugs than those in a poor state. Inclusion of
patients with different cancer histotypes ensures broad sampling of clinical
heterogeneity. Expansion cohort studies, for which additional patients are
enrolled at the recommended dose, have evolved as a means of optimising
data quality. Such patient enrichment can help reveal whether a certain
subgroup is particularly sensitive to the therapy (Manji et al, 2013).
Phase I Studies
Major Aims of Phase I Studies
1) Safety assessment and dose definition. Phase I trials define the
maximum tolerated dose (MTD) and the recommended dose for
testing in Phase II (RP2D). Differences in aims exist in relation to
anti-tumour drug type:
■ For cytotoxic agents, dose-limiting toxicity is commonly defined
within the first cycle of treatment. Since therapeutic and adverse
effects are often attributed to the same mechanisms, the dose selected
is the highest that can be tolerated.
■ For molecular targeted anticancer drugs, increasing the dose does
not necessarily correlate with greater clinical benefit. Here the aim of
Phase I trials is to establish the lowest dose which achieves adequate
target modulation and clinical activity, i.e. the optimum biological dose
(OBD). Since this kind of agent is usually chronically administered,
toxicities are likely to be delayed or cumulative and may not be detected
within the first treatment cycle (Loong and Siu, 2013). Even more
difficult is the application of a dose-escalation schedule for
immunotherapeutic agents, since side effects usually occur outside the
window used to assess dose-limiting toxicity.
2) Pharmacokinetic analyses. Evaluation of pharmacokinetic param-
eters such as maximum serum concentration (Cmax), AUC, half-life
(T1/2) and clearance using different doses. Identification and determi-
nation of metabolites, assessment of the routes of elimination of the
drug/metabolites.
3) Pharmacodynamic analyses. Assessment of the effects of the drug,
possibly investigating the expression of the putative target and its
changes following drug treatment. Early introduction of biomarker
measurements for patient stratification is important to predict the
response and toxicity of the drug and thus pave the way for its further
rational development. Biomarkers include circulating tumour-derived
DNA and molecular and functional imaging technologies (Hoelder et
al, 2012).
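The pharmacokinetic parameters listed under aim 2 can be derived from a
concentration–time profile with a simple non-compartmental calculation.
The Python sketch below assumes a single intravenous dose, uses the linear
trapezoidal rule for the AUC up to the last sample, and estimates the
half-life from a log-linear fit to the last three samples. The function
names, units and data are illustrative assumptions; in particular,
dividing the dose by the AUC to the last sample only approximates
clearance, which is normally based on the AUC extrapolated to infinity.

```python
import math


def auc_trapezoid(times, concs):
    """Area under the concentration-time curve (linear trapezoidal rule)."""
    return sum(
        (t2 - t1) * (c1 + c2) / 2.0
        for t1, t2, c1, c2 in zip(times, times[1:], concs, concs[1:])
    )


def terminal_half_life(times, concs, n_points=3):
    """Half-life from a least-squares fit of ln(concentration) against time
    over the last n_points samples."""
    ts = times[-n_points:]
    logs = [math.log(c) for c in concs[-n_points:]]
    t_mean = sum(ts) / n_points
    log_mean = sum(logs) / n_points
    slope = (sum((t - t_mean) * (y - log_mean) for t, y in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    return math.log(2) / -slope


def summarise_pk(dose, times, concs):
    """Collect Cmax, AUC(0-last), half-life and an approximate clearance."""
    auc = auc_trapezoid(times, concs)
    return {
        "cmax": max(concs),
        "auc_0_last": auc,
        "half_life": terminal_half_life(times, concs),
        "approx_clearance": dose / auc,  # approximation, see lead-in
    }


# Hypothetical profile: 100 mg dose, times in hours, concentrations in mg/L,
# generated from a mono-exponential decline with a 2-hour half-life.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
concs = [10.0 * 0.5 ** (t / 2.0) for t in times]
pk = summarise_pk(100.0, times, concs)
```

Repeating the calculation at each dose level shows whether exposure
increases in proportion to dose.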
Common Pitfalls in Phase I Trials
Interpretation of the results of Phase I trials can be erroneous, as there are
many pitfalls. Some examples are:
■ A long half-life of a drug in blood does not mean a prolonged thera-
peutic effect. The drug could persist in the systemic circulation at low
levels, detectable with analytical methods but insufficient to inhibit
the pharmacological targets.
■ Target modulation could persist even if the drug is no longer present
in the circulation. Exposure–response studies are particularly important
to understand the kinetics of the pharmacodynamic effect.
■ Drug concentration in plasma is not predictive of drug concentration
in the tumour. For practical and ethical reasons there are only a
few studies in which drug levels have been examined within tumours.
The results available indicate that drug concentrations can differ
substantially between plasma, primary tumour and metastases.
Drugs can distribute heterogeneously even within a tumour mass
(Fuso Nerini et al, 2014; Garattini et al, 2018).
■ High plasma-protein binding is not necessarily detrimental. If drug
affinity for the target is higher than that for plasma proteins, protein
binding does not reduce the drug’s ability to reach its target. Protein
binding can even help stabilise drugs in blood, avoiding their dispersion
throughout the entire body fluid compartment.
■ It is important to select appropriate biomarkers of antitumour efficacy.
A predictive indicator can be difficult to identify even if the mecha-
nism of action is well understood. The heterogeneous and dynamic
nature of cancer also provides a challenge in this area.
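The first pitfall in the list above can be made concrete with a short
calculation. Assuming a simple mono-exponential decline in plasma
concentration (an assumption for illustration only), the Python sketch
below compares how long a drug remains above a hypothetical concentration
needed to inhibit its target with how long it remains above a hypothetical
analytical detection limit. All values are invented.

```python
import math


def time_above_threshold(c0, half_life_h, threshold):
    """Hours for which concentration stays above a threshold, assuming
    C(t) = c0 * 0.5 ** (t / half_life_h)."""
    if threshold >= c0:
        return 0.0
    return half_life_h * math.log2(c0 / threshold)


# Hypothetical drug: initial concentration 1.0 uM, half-life 24 h.
effective_h = time_above_threshold(1.0, 24.0, threshold=0.5)     # target inhibition
detectable_h = time_above_threshold(1.0, 24.0, threshold=0.001)  # assay detection limit
```

In this invented case the drug is detectable for roughly ten times longer
than it remains at a concentration able to inhibit its target, so the long
half-life says little about the duration of the effect.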
Example 2
For antiangiogenic therapy, a correlation between clinical outcome
and circulating levels of the proangiogenic factor VEGF has rarely been
observed. Novel imaging methods are now emerging as potential
predictive biomarkers, taking advantage of their minimal invasiveness and
of the opportunity for serial measurements (Jain et al, 2009).
Conclusions
Recent scientific and technological breakthroughs justify optimism for
the discovery of new, effective anticancer drugs. Innovative oncology
therapies have already transformed the treatment of some relatively
uncommon cancers into long-term disease management and suggest new
approaches to drug development for cancer. Efforts should be made to
reduce the attrition rate of compounds entering the clinical phase, in
order to decrease the financial consequences of failure and, more
importantly, to avoid recruiting patients into trials that are predestined
to fail. There is a need to predict drug efficacy, toxicity and ADME
profiles early in the development process, thus increasing the chance of
success of a candidate drug in late-stage clinical trials.
Dr D’Incalci has reported no conflict of interest.
Dr Fuso Nerini has reported no conflict of interest.
Dr Fotia has reported no conflict of interest.
Further Reading
Begley CG, Ellis LM. Raise standards for preclinical cancer research. Nature
2012; 483:531–533.
Ellis LM, Fidler IJ. Finding the tumor copycat. Therapy fails, patients don’t. Nat
Med 2010; 16:974–975.
Fojo T, Parkinson DR. Biologically targeted cancer therapy and marginal ben-
efits: are we making too much of too little or are we achieving too little by
giving too much? Clin Cancer Res 2010; 16:5972–5980.
Kamb A, Wee S, Lengauer C. Why is cancer drug discovery so difficult? Nat Rev
Drug Discov 2007; 6:115–120.
Lengauer C, Diaz LA Jr, Saha S. Cancer drug discovery through collaboration.
Nat Rev Drug Discov 2005; 4:375–380.
LoRusso PM, Boerner SA, Seymour L. An overview of the optimal planning,
design, and conduct of phase I studies of new therapeutics. Clin Cancer Res
2010; 16:1710–1718.
Moffat JG, Rudolph J, Bailey D. Phenotypic screening in cancer drug discovery
– past, present and future. Nat Rev Drug Discov 2014; 13:588–602.
Sledge GW Jr. What is targeted therapy? J Clin Oncol 2005; 23:1614–1615.
References
Cairns R, Papandreou I, Denko N. Overcoming physiologic barriers to cancer
treatment by molecularly targeting the tumor microenvironment. Mol Cancer
Res 2006; 4:61–70.
Canaani D. Application of the concept synthetic lethality toward anticancer
therapy: A promise fulfilled? Cancer Lett 2014; 352:59–65.
Collins JM, Zaharko DS, Dedrick RL, Chabner BA. Potential roles for preclini-
cal pharmacology in phase I clinical trials. Cancer Treat Rep 1986; 70:73–80.
Dobbelstein M, Moll U. Targeting tumour-supportive cellular machineries in
anticancer drug development. Nat Rev Drug Discov 2014; 13:179–196.
Fuso Nerini I, Morosi L, Zucchetti M, et al. Intratumor heterogeneity and its
impact on drug distribution and sensitivity. Clin Pharmacol Ther 2014;
96:224–238.
Garattini S, Fuso Nerini I, D’Incalci M. Not only tumor but also therapy
heterogeneity. Ann Oncol 2018; 29:13–19.
Garber K. Human Cancer Genome Project moving forward despite some doubts
in community. J Natl Cancer Inst 2005; 97:1322–1324.
Gershell LJ, Atkins JH. A brief history of novel drug discovery technologies. Nat
Rev Drug Discov 2003; 2:321–327.
Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;
144:646–674.
Hoelder S, Clarke PA, Workman P. Discovery of small molecule cancer drugs:
successes, challenges and opportunities. Mol Oncol 2012; 6:155–176.
Hoogenboom HR. Selecting and screening recombinant antibody libraries. Nat
Biotechnol 2005; 23:1105–1116.
Hughes JP, Rees S, Kalindjian SB, Philpott KL. Principles of early drug discov-
ery. Br J Pharmacol 2011; 162:1239–1249.
Jain RK, Duda DG, Willett CG, et al. Biomarkers of response and resistance to
antiangiogenic therapy. Nat Rev Clin Oncol 2009; 6:327–338.
Lieu CH, Tan AC, Leong S, et al. From bench to bedside: lessons learned in
translating preclinical studies in cancer drug development. J Natl Cancer Inst
2013; 105:1441–1456.
Loong HH, Siu LL. Selecting the best drugs for phase I clinical development and
beyond. Am Soc Clin Oncol Educ Book 2013; 469–473.
Manji A, Brana I, Amir E, et al. Evolution of clinical trial design in early drug
development: systematic review of expansion cohort use in single-agent
phase I cancer trials. J Clin Oncol 2013; 31:4260–4267.
Pritchard JF, Jurima-Romet M, Reimer ML, et al. Making better drugs: deci-
sion gates in non-clinical drug development. Nat Rev Drug Discov 2003;
2:542–553.
Sharpless NE, Depinho RA. The mighty mouse: genetically engineered mouse
models in cancer drug development. Nat Rev Drug Discov 2006; 5:741–754.
Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen. Nat
Rev Cancer 2006; 6:813–823.
Sliwoski G, Kothiwale S, Meiler J, Lowe EW Jr. Computational methods in drug
discovery. Pharmacol Rev 2013; 66:334–395.
Wilding JL, Bodmer WF. Cancer cell lines for drug discovery and development.
Cancer Res 2014; 74:2377–2384.
6 Randomised Trials
M. Clarke
Northern Ireland Clinical Trials Unit and Northern Ireland
Methodology Hub, Queen’s University Belfast, Belfast, UK
Introduction
There are many occasions in cancer care when more than one treatment
may be suitable for a patient, or when a promising new therapy becomes
available but there are doubts about how much better it will be than the
existing options. In choosing between the alternatives for a particular
patient, we need good evidence of the likely differences in their effects
for that patient. These can then be discussed with the patient, along with
other issues that will affect their choice. Randomised trials, and system-
atic reviews of these (see Chapter 9), provide evidence about the size of
the differences between treatments. To be reliable, this evidence needs to
come from studies which have minimised the potential impact of chance
and bias. This can be achieved through randomised trials which recruit
a sufficient number of participants to allow us to be confident that any
differences are not due to chance and which allocate patients between
the treatments in a fair way, such that the findings will be due to true
differences between the effects of the treatments and not to underlying
differences between the patients allocated to the different groups.
This chapter looks at this study design, which is common across cancer,
with tens of thousands of such studies already conducted in screening,
diagnosis, treatment and palliative care. The chapter discusses key fea-
tures of randomised trials, while Chapter 9 discusses how combining the
findings of trials in systematic reviews further decreases the potential
impact of chance and bias, and increases the likelihood that the answer
from the research is sufficiently reliable to permit well-informed deci-
sions about the care of patients with cancer.
Using Systematic Reviews When Designing a Randomised Trial
There is growing concern about waste in research (Macleod et al, 2014).
One source of this arises when new trials are conducted that have not
taken proper account of the existing evidence base. Such trials fail to
build on what is already known and might address issues that no longer
need further research or miss opportunities to resolve important uncer-
tainties (Clarke et al, 2014). A systematic review of existing randomised
trials will help to provide the scientific, ethical and environmental jus-
tification for the new study (Clarke, 2004). The review should help the
researchers to clarify the topics they wish to investigate and to formulate
a clear question for their trial. It will also make it much easier to place
the findings of the trial in the context of other relevant research when the
trial is finished (Clarke and Hopewell, 2013), thereby making it clearer
to the reader what the new study adds (Clark and Horton, 2010). How-
ever, when consulting a systematic review in the planning of a trial, it is
important to consider its quality and the potential for publication bias
to have impacted on its findings (see Chapter 9) (Hopewell et al, 2009).
Formulating the Question for a Randomised Trial

In any piece of research, it is important to ensure that the question that
requires answering has been formulated correctly. In randomised trials,
this will underpin the choice of the interventions to be compared, the
patients to recruit and the outcomes to measure (see Chapter 7). In the
most straightforward randomised trial, two interventions will be com-
pared (one of which might be ‘usual care’, ‘control’ or a placebo). The
randomisation assigns patients to one of these two treatment groups.
It is also possible to simultaneously compare more than two options,
providing additional treatment options for each patient, while still using
randomisation to assign them to one of these.
Therefore, researchers designing a trial and practitioners wishing to use
the findings need to be clear about the treatment question that is being
addressed by the study. For instance, will the new intervention be added
to existing practice or will it replace some aspect of existing practice?
To illustrate, the early randomised trials of chemotherapy for patients
with cancer sought to answer the question of whether the effects of
chemotherapy were worthwhile, compared to management without
chemotherapy. In the late 1950s, this was the design used to test single
agents such as thiotepa, which was compared with a placebo for the treat-
ment of women with breast cancer in what was the first multicentre col-
laborative trial of the National Institutes of Health Cancer Chemotherapy
National Service Center (Fisher et al, 1968). In the 1970s, a combination
of cytotoxic drugs was assessed against no chemotherapy, including the
Italian studies led by Gianni Bonadonna which investigated the effects of
the CMF regimen (cyclophosphamide, methotrexate and 5-fluorouracil)
(Bonadonna et al, 1977).
As regimens were shown to be successful, randomised trials moved on
to direct comparisons of different regimens. These included studies in
which additional agents were added to the standard chemotherapy, or
completely new drugs or regimens were compared with the standard
options. For example, the North Central Cancer Treatment Group and
Mayo Clinic compared 5-fluorouracil versus the same regimen plus leu-
covorin for advanced colorectal cancer in the 1980s (O’Connell et al,
1987). Numerous trials now directly compare different chemotherapy
regimens, sometimes recruiting thousands of patients.
The issues relating to the design of trials are also important when users
of the trials are trying to apply the results. They need to consider whether
the interventions compared in the trial are similar enough to the options
for a particular patient, if they are to use the size of the effect seen in the
trial to inform the decision.
Eligibility Criteria
Alongside this judgement on whether the treatments tested in the trial
are similar enough to those being considered for a particular patient,
people using a randomised trial need to decide if the study patients are
similar enough for the findings to be applicable to the new patient about
whom a decision is being made. As a first step, this is determined by the
inclusion and exclusion criteria for the trial, which can be broad or nar-
row (Yusuf et al, 1990). But, it is worth remembering that these are merely
the eligibility criteria and, even though certain types of patient might be
eligible, these might not have been recruited.
In an explanatory, or efficacy, trial, which might be used when a new
treatment is beginning to be tested in cancer patients, the inclusion
criteria are usually kept narrow to ensure that a homogeneous, well-
defined population is recruited. This will show whether, in ideal circum-
stances, the new treatment has different effects to the existing treatment
against which it is compared (Schwartz and Lellouch, 1967). If a new
treatment is no better than the existing treatment in the ‘ideal’ circum-
stances of an efficacy trial, it is unlikely to be better in the more hetero-
geneous population of patients encountered in routine practice.
However, if a treatment looks promising in these highly selected patients,
the question is likely to move on to how effective it might be in routine
practice, for a broader range of patients. This would lead to effectiveness
or pragmatic trials (which would usually be Phase III trials), in
which the eligibility criteria are set broadly so as to include as
many as possible of the types of patient who are likely to be
considered for the treatment in the future (Schwartz and Lellouch, 1967).
In some of the largest randomised trials, this has been translated into
the ‘uncertainty principle’ (Peto and Baigent, 1998). This means
that patients are eligible for a trial if those making the decision about the
patient’s participation, including the patient and those responsible for their
care, are uncertain about the effects of the interventions in the trial and will-
ing to accept allocation to any of the treatments being tested. In the presence
of such uncertainty, randomisation may in fact be the most appropriate solu-
tion since it uses a fair and balanced process to determine which treatment
the patient will receive, while capturing data on the effects of the treatments,
which should help to resolve uncertainty for patients like them in the future.
An example from breast cancer is provided in Example 1, and another, from
prostate cancer, is the STAMPEDE trial, which simultaneously tested a series
of treatments (James et al, 2012; James et al, 2016).
Example 1
The uncertainty principle has underpinned some of the largest ran-
domised trials of treatments for patients with cancer. For example, the
ATLAS trial of 5 years versus 10 years of tamoxifen for breast cancer
survivors had the following eligibility criteria: “Women were eligible
for randomisation if they had had early breast cancer (in which all
detected disease could be removed); they had subsequently received
tamoxifen for some years and were still on it (or had stopped in the
past year and could resume treatment with little interruption); they
appeared clinically free of disease (with any local recurrence removed
and no distant recurrence detected); follow-up seemed practicable;
and substantial uncertainty was shared by the woman and her doctor
as to whether to stop tamoxifen or continue for about 5 more years.
No restrictions were placed on age, type of initial surgery or histology,
hormone receptor status, nodal status or other treatments” (Davies et
al, 2013).
Outcome Selection
The importance of the careful selection of outcomes is discussed in
Chapter 7, along with the value of using a core outcome set when plan-
ning the outcomes to be measured in a trial and the role of the COMET
Initiative (Gargon et al, 2014; Williamson et al, 2017). Choosing the
outcomes carefully allows the researchers to focus their efforts on meas-
uring the effects that are likely to be influential for future decision mak-
ers, and likely to detect any important differences between the treat-
ments being compared. In cancer, this might include the recurrence or
the progression of the cancer, as well as mortality, and perhaps cancer-
specific mortality; but care needs to be taken when using surrogate out-
comes (see Chapter 7). Other outcomes are also likely to be needed, to
investigate side effects of the treatment, quality of life, as well as costs
and resource use. Appropriate methods need to be used to measure the
outcomes, including validated instruments for patient-reported out-
come measures (PROM). The timing of the measurements will also be
important to maximise the usefulness of the findings on any differences
between the treatments, while minimising the burden on those who will
measure and report the outcomes, including the patients themselves and
those involved in the delivery of care.
Example 2
A subcommittee of the Previously Untreated, Locally Advanced
(PULA) Task Force of the Head and Neck Steering Committee of the
Coordinating Center for Clinical Trials at the National Cancer Institute
(NCI) identified 18 main areas of concern for the measurement of out-
comes in clinical trials for head and neck cancer. They recommended
measures suitable for use in multicentre clinical trials on the basis of
validity, feasibility and clinical acceptance (Ringash et al, 2015).
One or a small number of outcomes are likely to be chosen as the
primary outcomes for the trial. These are those that are judged to be
the most likely to reveal how well the treatments perform against each
other, and are likely to be those that the researchers will work hardest to
collect in full.
Sample Size
Before embarking on a randomised trial, a sample size calculation allows
the researcher to estimate how many patients will be needed to detect
or refute a difference between the treatments that would be regarded as
worthwhile. This may have consequences for the feasibility of the trial
and will help users of the results to determine whether or not the trial was
of the right size to provide a reliable answer. Calculating the sample size
typically depends on an estimate for what will happen to patients in the
control group, how variable the results will be for patients within each
group, and how different the outcomes will be for patients receiving the
alternative treatment (see Chapter 8).
Example 3
In a current Dutch trial for patients with advanced colorectal cancer,
those with RAS wild type tumours are treated with doublet chemother-
apy (FOLFOX or FOLFIRI) and randomised between the addition of
either bevacizumab or panitumumab, while patients with RAS mutant
tumours are randomised between doublet chemotherapy (FOLFOX or
FOLFIRI) plus bevacizumab or triple chemotherapy (FOLFOXIRI)
plus bevacizumab. The sample size has been calculated: “The median
PFS [progression-free survival] in patients with RAS wildtype and
RAS mutant tumours is estimated to be 10 months. The treatment is
assumed to reduce the hazard rate for PFS by 30%. To detect such
an improvement in PFS with 80% power and a two-sided logrank
test at 5%, 247 events need to be observed. This requires an inclusion
of approximately 640 patients, which are expected to be accrued in
4 years” (Huiskens et al, 2015).
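The number of events quoted in Example 3 can be checked with a short calculation. The sketch below assumes the widely used Schoenfeld approximation for a two-sided logrank test with 1:1 allocation; the protocol does not state which formula was used, and the function name required_events is purely illustrative. Under these assumptions, a hazard ratio of 0.70, 80% power and a two-sided 5% significance level give 247 events, matching the figure in the quotation. The 640 patients to be recruited depend on further assumptions about accrual and follow-up, which are not reproduced here.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Events needed for a two-sided logrank test with 1:1 allocation.

    Uses the Schoenfeld approximation (an assumption; the source does
    not say which formula the trial statisticians applied).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# A 30% reduction in the hazard rate corresponds to a hazard ratio of 0.70.
events = required_events(hazard_ratio=0.70)
print(events)  # 247
```

A smaller assumed treatment effect increases the requirement sharply: with a hazard ratio of 0.80, the same function returns a much larger number of events.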
Randomising Patients
The feature of randomised trials that distinguishes them from other
prospective studies in which the effects of different interventions are
compared is the randomisation. There are a variety of ways in which
randomisation can be achieved and different methods to apply this. The
key elements are the use of a random sequence and protection against
the manipulation of the allocation or participation for a specific patient
before they have entered the trial. If either of these aspects breaks down,
the trial and its findings are likely to be compromised, introducing bias
into the results and negating the value of comparing the treatments in a
randomised trial.
There are many ways in which randomisation can be done. In mod-
ern cancer trials, this will typically be through the use of a computer
program. The underlying principle is that the groups of patients that
will be created by the randomisation will differ only by that allocation
and not by other factors such as their baseline prognosis, the day on
which they presented or the distance they live from the care facility. At its
simplest, randomisation for a two-group trial would use a tool that gives
a 50:50 probability of the patient being allocated to either treatment,
such as flipping a coin. It might also be done by rolling a dice, draw-
ing lots or taking the next in a series of envelopes that have been well
shuffled. An advantage of simple randomisation is that it is completely
unpredictable, providing that the allocation for an individual patient is
concealed up until the point that he or she enters the trial. However, it
can also have the disadvantage, particularly in a small trial, of leading to
large, chance imbalances between the groups. As an example, if a coin
is flipped for each of 100 patients to allocate them to one of two treat-
ments in a trial, it is likely that a consecutive series of 6, 7 or 8 patients
will receive the same allocation at some point in the sequence. If this
occurred early and the trial did not recruit enough patients, it could lead
to a sizeable imbalance in the number of patients in the two groups,
making analysis of the trial difficult. It might also cause a problem if
the run occurred for patients in a particular prognostic subgroup, such
that, for example, the 10 worst prognosis patients are allocated 8 versus
2 between the two groups. By increasing the size of the trial, the effects
of these chance imbalances will be minimised, because it is unlikely that
the imbalances will get worse as the trial gets larger. An imbalance of 8
versus 2 is very unlikely to become 24 versus 6 when the next 20 patients
in that prognostic group are randomised. Rather, it is most likely to drift
towards 18 versus 12.
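The figures in this paragraph can be checked directly. The sketch below is illustrative only (the function name prob_run_at_least is hypothetical): it computes the exact probability that 100 fair coin flips contain a run of at least 6, 7 or 8 identical allocations, and then the chance that an 8 versus 2 imbalance becomes 24 versus 6 after 20 further patients.

```python
from math import comb


def prob_run_at_least(k, n_flips):
    """Probability that n_flips fair coin flips contain a run of >= k identical results."""
    # state[r] = probability that the current run has length r
    # and no run of length k has occurred so far
    state = [0.0] * k
    state[1] = 1.0  # after the first flip the run length is 1
    hit = 0.0
    for _ in range(n_flips - 1):
        new = [0.0] * k
        for r in range(1, k):
            p = state[r]
            new[1] += p / 2          # next flip differs: the run restarts
            if r + 1 == k:
                hit += p / 2         # next flip matches: a run of k is completed
            else:
                new[r + 1] += p / 2  # next flip matches: the run grows
        state = new
    return hit


for k in (6, 7, 8):
    print(k, round(prob_run_at_least(k, 100), 3))

# 8 versus 2 becomes 24 versus 6 only if exactly 16 of the next 20
# patients are allocated to the larger group.
p_24_vs_6 = comb(20, 16) / 2 ** 20
# On average the next 20 patients split 10:10, giving 18 versus 12.
expected_totals = (8 + 20 / 2, 2 + 20 / 2)
print(p_24_vs_6, expected_totals)
```

The second calculation shows why the absolute imbalance tends to persist while the relative imbalance shrinks as recruitment continues.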
Methods exist to try to force the allocations to be balanced within dif-
ferent groups of patients even when the numbers are small. This is usu-
ally achieved by using blocked randomisation and when this is done
for particular subgroups of patients it is called stratification. In blocked
randomisation, the number of patients allocated to each treatment will be
the same after each block of allocations has been used. If a block size of
four is used, two in every four patients will be allocated to one treatment
and the other two will be allocated to the alternative. The sequence of
allocations within these blocks of four would be random, and there are
six possible series for each set of four patients: AABB, ABAB, ABBA,
BBAA, BABA and BAAB. Using a block size of four will ensure that
the maximum difference in the number of patients in the two treatment
groups will be two. This achieves tight balance through the trial, but
needs to be accompanied by processes to ensure that people recruiting
patients do not know the current position in the block. If, for example,
they knew that they were at the fourth position in a block of four, and
knew what had been allocated to the three previous patients, they would
know the next allocation before recruiting the patient. Therefore, people
designing a trial might not reveal the block size being used or might ran-
domly vary it through the study.
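Blocked randomisation with a block size of four can be sketched as follows. This is a minimal illustration, not a production randomisation system: the function names and the seed are hypothetical, and a real trial would generate and conceal the sequence centrally. The code lists the six possible blocks and confirms that the running difference between the two groups never exceeds two.

```python
import random
from itertools import permutations

# The six distinct orderings of two As and two Bs.
BLOCKS = sorted({"".join(p) for p in permutations("AABB")})
print(BLOCKS)


def blocked_sequence(n_patients, rng):
    """Build an allocation sequence by drawing whole blocks of four at random."""
    seq = []
    while len(seq) < n_patients:
        seq.extend(rng.choice(BLOCKS))
    return seq[:n_patients]


def max_imbalance(seq):
    """Largest absolute difference between the groups at any point in the sequence."""
    diff = worst = 0
    for arm in seq:
        diff += 1 if arm == "A" else -1
        worst = max(worst, abs(diff))
    return worst


sequence = blocked_sequence(22, random.Random(2018))
print("".join(sequence), max_imbalance(sequence))
```

Varying the block size at random, as suggested above, would need only a choice between block lengths (for example four and six) before each block is drawn.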
Stratified randomisation might be used to ensure that a similar num-
ber of patients within key prognostic groups is allocated to the treat-
ments being tested. For example, it may be important to ensure that both
women and men are balanced between the treatments, or that people
with stage I cancer are divided 50:50 between the treatments and that
the same is true for those with stage II cancer. This does not mean that
the same number of men and women, or the same number of stage I and
stage II patients, need to be recruited to the trial. Rather, it means that if,
for example, there were 16 men and 36 women in a trial, they are likely
to be distributed 8:8 and 18:18.
A potential difficulty with stratified randomisation is that if there are
several strata (e.g. sex and stage) and many levels within these strata
(e.g. men and women, and stages I, II and III), there might be too many
subgroups for the number of patients within specific strata levels to com-
plete the blocks and reach balance. With two sex groups and three stages
there are six separate strata levels: women with stage I, men with stage
I, women with stage II, men with stage II, women with stage III and
men with stage III. If additional strata were needed for, for example, age
(divided into four groups) and the treatment centre (with four centres
involved), the number of strata would increase to 96 (i.e. 2 × 3 × 4 × 4).
If these different strata did not recruit a number of patients equal to the
block size used, small imbalances could build up by chance across these
fine strata, leading to large imbalances for some of the top-level strata
(e.g. for men allocated to each treatment).
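The growth in the number of strata, and the way incomplete blocks can accumulate, can be illustrated with a small simulation. Everything below is hypothetical: the labels for age groups and centres, the choice of 200 patients, the seed and the assumption that patients are spread evenly at random across strata. Each stratum uses its own blocks of four, so the imbalance within any single stratum is at most two, but the imbalance among all men is the sum across 48 strata and can be larger.

```python
import random
from collections import Counter
from itertools import product

sexes = ["women", "men"]
stages = ["I", "II", "III"]
age_groups = ["age1", "age2", "age3", "age4"]          # hypothetical labels
centres = ["centre1", "centre2", "centre3", "centre4"]  # hypothetical labels

strata = list(product(sexes, stages, age_groups, centres))
print(len(strata))  # 2 x 3 x 4 x 4 = 96

rng = random.Random(7)
open_blocks = {}


def next_allocation(stratum):
    """Take the next allocation from this stratum's current block of four."""
    if not open_blocks.get(stratum):
        block = list("AABB")
        rng.shuffle(block)
        open_blocks[stratum] = block
    return open_blocks[stratum].pop()


per_stratum = Counter()
for _ in range(200):
    stratum = rng.choice(strata)  # assumption: patients arrive evenly across strata
    per_stratum[(stratum, next_allocation(stratum))] += 1

# Collapse the fine strata to the top-level stratum "men".
men = Counter()
for (stratum, arm), n in per_stratum.items():
    if stratum[0] == "men":
        men[arm] += n
print(dict(men))
```

With about two patients per stratum on average, most blocks are never completed, which is the situation the paragraph above warns about.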
Example 4
In the BeTa randomised trial for patients with recurrent or refractory non-
small cell lung cancer, the 636 patients recruited at 177 study sites in 12
countries during 2005 to 2008 were randomly allocated to either erlotinib
plus bevacizumab or erlotinib plus placebo, according to a computer-gen-
erated randomisation sequence by use of an interactive voice response
system. The patients were “stratified by sex, baseline Eastern Cooperative
Oncology Group performance status score (0 or 1 vs 2), smoking history
(never vs current or previous), and study site” (Herbst et al, 2011).
One way to overcome this is to use minimisation, which is a more complex
way to ensure balance across subgroups or strata in a randomised trial
(Treasure and MacRae, 1998). It weights the allocation of each patient in
favour of an assignment which will lead to the least imbalance across the
different types of patient and setting for which balance is being sought. In
its simplest form, minimisation allocates the next patient to the treatment
that would give rise to the smallest overall imbalance, but there are sev-
eral variations on the method (Pocock and Simon, 1975). Some of these
use simple randomisation until a particular level of imbalance is reached
and then allocate patients in a deterministic way so that the imbalance is
reduced. Others use a weighted randomisation when a particular level of
imbalance is reached, so that there is still a possibility that the specific
patient will be allocated to the treatment that would worsen the imbalance.
Example 5
In the STAMPEDE trial of celecoxib plus hormone therapy versus hor-
mone therapy alone for men with hormone-sensitive prostate cancer,
“computer-based randomisation was done centrally (via telephone)
using minimisation with a random element of 80% allocation towards
minimising arms, balancing on minimisation factors of randomising
centre, metastases, nodal involvement, age at randomisation, World
Health Organization (WHO) performance status, type of hormone
therapy, regular aspirin or non-steroidal anti-inflammatory drugs use at
baseline and planned use of radiotherapy” (James et al, 2012).
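Minimisation with a random element can be sketched in a few lines. The code below is a simplified illustration in the spirit of Pocock and Simon (1975), not the algorithm used in STAMPEDE: the function name minimise, the two factors and the way imbalance is scored (the sum, over the patient's factor levels, of the difference between the groups) are all assumptions made for this example. The parameter p_best plays the role of the 80% weighting towards the minimising arm described in Example 5.

```python
import random


def minimise(patient, counts, arms=("A", "B"), p_best=0.8, rng=random):
    """Allocate one patient, favouring the arm that gives the least imbalance.

    patient: dict mapping factor -> level, e.g. {"sex": "men", "stage": "II"}
    counts:  dict mapping (factor, level, arm) -> number already allocated
    """
    def imbalance_if(arm):
        # Total imbalance across this patient's factor levels if given `arm`.
        total = 0
        for factor, level in patient.items():
            n = {a: counts.get((factor, level, a), 0) + (1 if a == arm else 0)
                 for a in arms}
            total += max(n.values()) - min(n.values())
        return total

    scores = {arm: imbalance_if(arm) for arm in arms}
    best = min(scores.values())
    best_arms = [a for a in arms if scores[a] == best]
    if len(best_arms) == len(arms):
        choice = rng.choice(arms)       # tie: simple randomisation
    elif rng.random() < p_best:
        choice = rng.choice(best_arms)  # weighted towards the minimising arm
    else:
        choice = rng.choice([a for a in arms if a not in best_arms])

    for factor, level in patient.items():
        key = (factor, level, choice)
        counts[key] = counts.get(key, 0) + 1
    return choice


rng = random.Random(42)
counts = {}
for _ in range(200):
    patient = {"sex": rng.choice(["women", "men"]),
               "stage": rng.choice(["I", "II", "III"])}
    minimise(patient, counts, rng=rng)
print(sorted(counts.items()))
```

Setting p_best to 1.0 gives the deterministic form of minimisation, which balances most tightly but is also the easiest for a recruiter to predict.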
The use of restricted randomisation methods such as stratification and
minimisation can introduce bias if those who are recruiting patients are
able to use their knowledge to predict what the next patient will receive.
Preventing this foreknowledge of the allocation is achieved through
allocation concealment. This is not the same as blinding or masking
the treatment, which happens after the patient has entered the trial and
might be achieved by the use of a placebo, but is not possible in all trials.
Allocation concealment takes place before the patient enters the trial,
by hiding the allocation that she will receive until she is in the trial,
and is possible for all trials. Increasingly, researchers achieve allocation
concealment by using independent third-party randomisation systems
through a remote login to a computer or a telephone call. These tech-
niques allow information on the patient to be captured centrally before
the allocation is provided, and these data can be used for stratification or
minimisation. It has been shown that if adequate allocation concealment
is not used, the effect of the intervention being investigated might be
overestimated, which may lead to conclusions that a treatment is ben-
eficial even when it is harmful. However, the direction of the bias when
non-random methods of allocation are used can be unpredictable. Some-
times the bias will overestimate the true effect, sometimes it will under-
estimate the true effect (Odgaard-Jensen et al, 2011).
Blinding or Masking
In some trials, it is important to make sure that the people who are involved
in the trial do not know which intervention a patient is receiving. This is
‘blinding’ and it can be applied to one or more of the different types of peo-
ple involved in the trial, such as the patient, practitioner, outcome asses-
sor or analyst. It is used to try to ensure that knowledge of the allocated
treatment does not lead to changes in behaviour which would not happen
outside of the trial and which could lead to difficulties in detecting the true
effect of treatment. However, it is not always possible to blind the partici-
pants in research, because the treatments being compared might need to
be administered in very different ways, the treatment or its side effects
might be obvious to the outcome assessors, or the effect on some outcomes
might reveal the treatment groups to the person doing the analyses. In such
cases, it might be especially important to choose an unequivocal primary
endpoint, such as overall survival.
Statistical Analysis
The common types of statistical analysis in cancer research are discussed
in Chapter 8, but one feature in particular is important to randomised tri-
als. This is the conduct of an analysis which is consistent with the random
allocation. Moving patients between groups in the analysis (for example,
because they switched between the treatments being compared in the
trial) means that the benefits of the use of randomisation to allocate them
initially will be lost. This does not mean that patients must be forced to
continue with the treatment to which they were allocated, but rather that
they are analysed on the basis of the intention for them to receive that
treatment. This is the intention to treat principle (Hollis and Campbell,
1999), and requires that patient outcomes are analysed in accordance
with the treatment the patients were allocated to receive by the randomi-
sation. As well as planning to analyse the patients in the group to which
they were assigned, it is also important that their outcomes are measured
so that these can actually be analysed. If a patient leaves the trial and
contributes no data for the outcome assessment, the researcher will need
to make assumptions about these missing data, which might not be reli-
able. Therefore, researchers need to make strenuous efforts to ensure
that, at least, the primary outcomes are measured and collected.
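The consequence of moving patients between groups can be seen in a toy example. The eight patients below are entirely hypothetical and were chosen so that the two allocated groups have the same event rate; the function name event_rate_by is also illustrative. Analysing by allocated treatment (intention to treat) shows no difference, whereas analysing by treatment received produces an apparent difference that is created only by which patients switched.

```python
# Hypothetical data: two patients allocated to A switched to B.
patients = [
    {"allocated": "A", "received": "A", "event": False},
    {"allocated": "A", "received": "A", "event": False},
    {"allocated": "A", "received": "B", "event": True},
    {"allocated": "A", "received": "B", "event": True},
    {"allocated": "B", "received": "B", "event": True},
    {"allocated": "B", "received": "B", "event": False},
    {"allocated": "B", "received": "B", "event": False},
    {"allocated": "B", "received": "B", "event": True},
]


def event_rate_by(key, records):
    """Proportion of patients with an event, grouped by the given key."""
    totals, events = {}, {}
    for rec in records:
        group = rec[key]
        totals[group] = totals.get(group, 0) + 1
        events[group] = events.get(group, 0) + (1 if rec["event"] else 0)
    return {group: events[group] / totals[group] for group in totals}


itt = event_rate_by("allocated", patients)        # consistent with randomisation
as_treated = event_rate_by("received", patients)  # breaks the randomised comparison
print(itt)
print(as_treated)
```

In this constructed example the patients who switched were those with the worse outcomes, so the analysis by treatment received flatters treatment A.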
Reporting
After completion, trials need to be reported in full and in a timely man-
ner if the resources that have been invested in them by the researchers
and their funders and by the patients and practitioners who took part are
not to be wasted (Glasziou et al, 2014). Furthermore, trials should be
registered in advance of any data being available so that users of the
research know what studies have been done (Ghersi and Pang, 2009).
These actions will help future users of the trial to determine its relevance
to them and minimise the problems of selective reporting, which can lead
to bias in the availability of the findings of trials and misleading informa-
tion for decision makers. Trialists should follow reporting guidelines
for randomised trials (see www.equator-network.org): for example, the
SPIRIT guidelines for the trial’s design (its protocol) (Chan et al, 2013)
and the CONSORT guidelines for reporting of the results in abstract
(Hopewell et al, 2008) or full form (Moher et al, 2010). Research has
shown that the quality of reports of cancer trials has improved since the
introduction of such guidelines, but much still remains to be done to
ensure that all trials are clearly and fully reported (Péron et al, 2012).
Publication bias arises when the results of a study have an influence
over its publication. This usually means that studies with results that
are favourable to a new treatment are more likely to be written up and
published in journals, compared to those in which the new treatment did
worse or did not appear to be different to the established intervention.
A Cochrane Methodology Review found that for every 100 trials with
positive findings, 73 would be published; whereas, only 41 of 100 trials
with negative or null results would make it into the literature (Hopewell
et al, 2009). Other work has also shown that, when trials are reported,
they might be selective about which outcomes and analyses to present,
with a greater focus on the positive results, which is called selective
reporting bias (Dwan et al, 2014). This neatly ends this chapter, by
highlighting how users of a randomised trial need to use their knowledge
of good trial design to do a careful appraisal of the report, to determine
whether the trial was conducted to a satisfactory standard, to assess the
risk of bias for the results that are presented and to determine the appli-
cability of its results to patients seen in their routine practice.
Conclusions
Randomised trials are vital to evidence-based cancer care. They, and the
systematic reviews that incorporate them, provide a means to estimate
the likely effects of different treatments for a future patient. However,
in order to do this, they need to be well conducted and clearly and fully
reported. This chapter highlights several key features of randomised tri-
als in cancer, and how users of that research need to consider these fea-
tures when deciding if a trial is sufficiently reliable and robust for its
findings to inform a choice between treatments.
Professor Clarke has reported no conflict of interest.
Further Reading
Chan AW, Tetzlaff JM, Gøtzsche PC, et al. SPIRIT 2013 explanation and elabo-
ration: guidance for protocols of clinical trials. BMJ 2013; 346:e7586.
Clarke M. Doing new research? Don’t forget the old. PLoS Med 2004; 1:e35.
Clarke M. Ovarian ablation in breast cancer, 1896 to 1998: milestones along
hierarchy of evidence from case report to Cochrane review. BMJ 1998;
317:1246–1248.
Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elabo-
ration: updated guidelines for reporting parallel group randomised trials.
BMJ 2010; 340:c869.
References
Bonadonna G, Rossi A, Valagussa P, et al. The CMF program for operable
breast cancer with positive axillary nodes. Updated analysis on the disease-
free interval, site of relapse and drug tolerance. Cancer 1977; 39(6 Suppl):
2904–2915.
Chan A-W, Tetzlaff JM, Gøtzsche PC, et al. SPIRIT 2013 explanation and elabo-
ration: guidance for protocols of clinical trials. BMJ 2013; 346:e7586.
Clark S, Horton R. Putting research into context – revisited. Lancet 2010;
376:10–11.
Clarke M, Brice A, Chalmers I. Accumulating research: a systematic account of
how cumulative meta-analyses would have provided knowledge, improved
health, reduced harm and saved resources. PLoS One 2014; 9:e102670.
Clarke M, Hopewell S. Many reports of randomised trials still don’t begin or end with a
systematic review of the relevant evidence. J Bahrain Med Soc 2013; 24:145–148.
https://pure.qub.ac.uk/portal/files/141168172/Islands_2012_JBMS_2013.pdf.
(3 April 2018, date last accessed)
Clarke M. Doing new research? Don’t forget the old. PLoS Med 2004; 1:e35.
Davies C, Pan H, Godwin J, et al; Adjuvant Tamoxifen: Longer Against Shorter
(ATLAS) Collaborative Group. Long-term effects of continuing adjuvant
tamoxifen to 10 years versus stopping at 5 years after diagnosis of oestrogen
receptor-positive breast cancer: ATLAS, a randomised trial. Lancet 2013;
381:805–816.
Dwan K, Altman DG, Clarke M, et al. Evidence for the selective reporting of
analyses and discrepancies in clinical trials: a systematic review of cohort
studies of clinical trials. PLoS Med 2014; 11:e1001666.
Fisher B, Ravdin RG, Ausman RK, et al. Surgical adjuvant chemotherapy in
cancer of the breast: results of a decade of cooperative investigation. Ann
Surg 1968; 168:337–356.
Gargon E, Gurung B, Medley N, et al. Choosing important health outcomes for
comparative effectiveness research: a systematic review. PLoS One 2014;
9:e99111.
Ghersi D, Pang T. From Mexico to Mali: four years in the history of clinical trial
registration. J Evid Based Med 2009; 2:1–7.
Glasziou P, Altman DG, Bossuyt P, et al. Reducing waste from incomplete or
unusable reports of biomedical research. Lancet 2014; 383:267–276.
Herbst RS, Ansari R, Bustin F, et al. Efficacy of bevacizumab plus erlotinib ver-
sus erlotinib alone in advanced non-small-cell lung cancer after failure of
standard first-line chemotherapy (BeTa): a double-blind, placebo-controlled,
phase 3 trial. Lancet 2011; 377:1846–1854.
Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of
published randomised controlled trials. BMJ 1999; 319:670–674.
Hopewell S, Clarke M, Moher D, et al; CONSORT Group. CONSORT for
reporting randomised trials in journal and conference abstracts. Lancet 2008;
371:281–283.
Hopewell S, Loudon K, Clarke MJ, et al. Publication bias in clinical trials due
to statistical significance or direction of trial results. Cochrane Database Syst
Rev 2009; (1):MR000006.
Huiskens J, van Gulik TM, van Lienden KP, et al. Treatment strategies in colorec-
tal cancer patients with initially unresectable liver-only metastases, a study
protocol of the randomised phase 3 CAIRO5 study of the Dutch Colorectal
Cancer Group (DCCG). BMC Cancer 2015; 15:365.
James ND, Sydes MR, Clarke NW, et al. Addition of docetaxel, zoledronic acid,
or both to first-line long-term hormone therapy in prostate cancer (STAM-
PEDE): survival results from an adaptive, multiarm, multistage, platform
randomised controlled trial. Lancet 2016; 387:1163–1177.
James ND, Sydes MR, Mason MD, et al; for the STAMPEDE investigators.
Celecoxib plus hormone therapy versus hormone therapy alone for hormone-
sensitive prostate cancer: first results from the STAMPEDE multiarm, multi-
stage, randomised controlled trial. Lancet Oncol 2012; 13:549–558.
Macleod MR, Michie S, Roberts I, et al. Biomedical research: increasing value,
reducing waste. Lancet 2014; 383:101–104.
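Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elabo-
ration: updated guidelines for reporting parallel group randomised trials.
BMJ 2010; 340:c869.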
O’Connell MJ, Klaassen DJ, Everson LK, et al. Clinical studies of biochemical
modulation of 5-fluorouracil by leucovorin in patients with advanced colo-
rectal cancer by the North Central Cancer Treatment Group and Mayo Clinic.
NCI Monogr 1987; 185–188.
Odgaard-Jensen J, Vist GE, Timmer A, et al. Randomisation to protect against
selection bias in healthcare trials. Cochrane Database Syst Rev 2011;
MR000012.
Péron J, Pond GR, Gan HK, et al. Quality of reporting of modern randomized
controlled trials in medical oncology: a systematic review. J Nat Cancer Inst
2012; 104:982–989.
Peto R, Baigent C. Trials: the next 50 years. Large scale randomised evidence of
moderate benefits. BMJ 1998; 317:1170–1171.
Pocock SJ, Simon R. Sequential treatment assignment with balancing for prog-
nostic factors in the controlled clinical trial. Biometrics 1975; 31:103–115.
Ringash J, Bernstein LJ, Cella D, et al. Outcomes toolbox for head and neck
cancer research. Head Neck 2015; 37:425–439.
Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutical
trials. J Chronic Dis 1967; 20:637–648.
Treasure T, MacRae KD. Minimisation: the platinum standard for trials?
Randomisation doesn’t guarantee similarity of groups; minimisation does.
BMJ 1998; 317:362–363.
Williamson PR, Altman DG, Bagley H, et al. The COMET Handbook: version
1.0. Trials 2017; 18(suppl 3):280.
Yusuf S, Held P, Teo KK, Toretsky ER. Selection of patients for randomized
controlled trials: implications of wide or narrow eligibility criteria. Stat Med
1990; 9:73–86.
7 Choice of Outcomes
(Including Core Outcome Sets
and Surrogate Outcomes)
M. Bellei¹
A. Guida²
¹ Dipartimento di Medicina Diagnostica, Clinica e di Sanità Pubblica,
Università di Modena e Reggio Emilia, Modena, Italy
² Dipartimento di Oncologia ed Ematologia, Azienda Ospedaliero-universitaria
Policlinico di Modena, Modena, Italy
The choice of outcomes in clinical trials (see Chapters 5 and 6) is crucial
to answering the research questions posed by the trial. Reproducible,
valid and appropriate endpoints must be carefully selected to optimise
the design, ensure patient safety and produce meaningful results (Wilson
et al, 2015a; Wilson et al, 2015b).
Outcomes need to be relevant to the trial objectives, cancer type, clinical
situation (e.g. initial versus salvage treatment), trial phase and type of
treatment under investigation. It is also fundamental that consistent defi-
nitions of outcomes are used in order to allow comparison of results of
different studies with similar objectives (Cheson et al, 2007).
Outcomes (or Endpoints)

The objectives of a trial must be stated in specific terms to achieve valid
results. The outcomes (often referred to as endpoints) are clearly defined,
measurable, clinical and sometimes biological findings that are used for
the development and assessment of treatment options (Fiteni et al, 2014).
Outcomes may be based on efficacy, safety or other study objectives. An
endpoint can be numerical (e.g. serum level of a tumour marker, blood
cell count), dichotomous or categorical (e.g. death, severity of disease)
or time-to-event (e.g. time to disease progression) with censored obser-
vations (see Chapter 8).
Primary and Secondary Endpoints

Usually, one endpoint is chosen as the primary. The primary endpoint is
regarded as the most appropriate endpoint to answer the key question being
asked by the trial. The power needed to detect a difference between the
interventions being compared, and therefore the sample size calculation,
is based on the primary endpoint.
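To illustrate how the primary endpoint drives the sample size, the sketch below applies Schoenfeld's widely used approximation for the number of events needed by a two-sided log-rank test on a time-to-event primary endpoint. This formula is not taken from this chapter; the function name, the default significance level and power, and the hazard ratios in the usage lines are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate number of events for a two-sided log-rank test.

    Schoenfeld's approximation:
        d = (z_{1-alpha/2} + z_{power})^2 / (p * (1 - p) * ln(HR)^2)
    where p is the proportion of patients allocated to one arm.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (
        allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    )
    return math.ceil(events)


# Hypothetical target effects: a modest and a large reduction in the hazard.
events_modest_effect = required_events(0.75)
events_large_effect = required_events(0.50)
```

The calculation returns events rather than patients: the rarer the event during follow-up, the more patients (or the longer the follow-up) needed to observe that many events, which is one reason why the choice of primary endpoint has such a strong influence on trial size.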
Many recent studies have used multiple primary endpoints as part of
a composite outcome. This can be especially useful for exploratory
clinical investigations (e.g. studies exploring the pharmacodynamics or
pharmacokinetics of a candidate drug in an early stage of development,
see Chapter 5). Some clinical endpoints frequently used in clinical tri-
als are composite outcomes such as progression-free survival (PFS) or
event-free survival. Composite outcomes have the advantage of allow-
ing evaluation of a new treatment in a more comprehensive way with a
smaller number of patients. They can integrate different relevant events
in one endpoint. Their limitations are that the included outcomes rarely
have equal importance, or are equally meaningful, and the outcomes may
give inconsistent results.
A clinical trial can also address secondary, additional or exploratory
objectives to answer further relevant questions about the topic being
studied. Hence, secondary endpoints should also be defined.
When secondary endpoints are judged important, the trial might need
to be powered sufficiently to detect a difference in both primary and
secondary endpoints, and expert statistical and design advice may be
needed.
Core Outcome Sets

The selection of the most appropriate endpoints across all trials in the same
area allows decision makers to compare, contrast or combine the findings
from different trials dealing with the same condition in a useful way.
In recent years, the Core Outcome Measures in Effectiveness Trials
(COMET) Initiative (http://www.comet-initiative.org) has encouraged
and facilitated the development and application of core outcome sets
(COSs) in order to minimise difficulties in trial comparison (Gargon et
al, 2015; Gorst et al, 2016; Williamson et al, 2017).
A COS is an agreed minimum set of outcomes that should be measured
and reported in all trials of a specific condition. It allows the results of
trials to be brought together as appropriate and ensures that all trials
provide usable evidence (Gorst et al, 2016). Besides the recommenda-
tion to measure and report the outcomes in the COS, researchers are also
encouraged to continue exploring other outcomes of relevance to their
research question. However, if the primary endpoint is not one included
in the COS, the investigator should explain the decision in the trial pro-
tocol and subsequent reports concerning the trial.
Categories of Outcomes
Outcomes used in oncology trials can be grouped into two general cat-
egories (Fiteni et al, 2014).
1. Patient-centred Clinical Outcomes

Treatment effectiveness has been defined as a clinically meaningful ben-
efit to the patient with the objective of the patient living longer or with
a better quality of life, or both, than without the treatment. These end-
points reflect a patient’s feeling of well-being or their survival (Biomark-
ers Definitions Working Group, 2001) by providing the most reliable
information on clinical benefit.
Overall survival (OS): Widely regarded as the ‘gold standard’ of pri-
mary endpoints in cancer trials, OS can be reported as a median survival
or survival probabilities at pre-specified time points using time-to-event
analyses (see Chapter 8).
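As a rough illustration of how a median survival and a survival probability at a pre-specified time point are obtained from censored data, the following is a minimal Kaplan-Meier sketch. The patient data are invented, the function names are hypothetical, and a real analysis would normally rely on a validated statistical package and report confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] steps at each distinct death time.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated probability of surviving beyond time t."""
    prob = 1.0
    for time, surv in curve:
        if time > t:
            break
        prob = surv
    return prob


def median_survival(curve):
    """First time at which the estimate falls to 0.5 or below (None if not reached)."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


# Hypothetical follow-up in months; 1 = died, 0 = censored (alive at last contact).
months = [6, 7, 10, 15, 19, 25, 30, 34, 40, 46]
died = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(months, died)
median_os = median_survival(curve)
os_at_24_months = survival_at(curve, 24)
```

Censored patients contribute to the number at risk until their last contact, which is why the estimate differs from a simple proportion of patients known to be alive.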
Advantages of OS are that it is precise and easy to measure; investigator
bias and subjective interpretation are not possible (Driscoll and Rixe,
2009). Possible disadvantages are that deaths might be uncommon during
the study period (if the prognosis of patients in a trial is good) and a trial with
sufficient power to detect a difference in OS may require a very large
number of patients. The availability of increasingly effective salvage
treatments for many cancers results in longer survival, thus requiring pro-
longed observation to detect a treatment effect and delaying the release
of the trial’s results, resulting in increased costs for the research. Impor-
tantly, the outcomes of the study may become less relevant because of
these delays (Meyer et al, 2012). Furthermore, it becomes difficult to
separate the effect of the investigational strategy from other treatments
the patient receives, since cancer patients will often receive other treat-
ments before and after a trial.
Health-related quality of life (HR-QoL) endpoints are important since
many patients with cancer survive for a prolonged period (Wilson et al,
2015a). HR-QoL assessments may be used in several circumstances,
from assessing primary adjuvant therapy to palliative treatment (Osoba,
2011). Demonstrating an improvement in HR-QoL is of particular value
when the benefit in OS is small (Wilson et al, 2015b). HR-QoL might
also capture the effects of adverse events, albeit indirectly. The proportion
of studies incorporating HR-QoL is increasing, owing to the establishment
of standardised questionnaires that evaluate physical, mental and social
well-being and patient satisfaction with treatment and its outcomes.
Possible problems:
■ A large number of instruments are available, making different studies
difficult to compare if they have used different instruments (Wilson et
al, 2015b)
■ If a trial has a high drop-out rate, many patients do not complete the
HR-QoL instrument
■ Measuring HR-QoL can be expensive and time consuming
2. Tumour-centred Clinical Endpoints

Tumour-centred clinical endpoints include biomarkers (mainly imaging,
laboratory or histological) that are used to define response to a
therapeutic intervention, and some time-to-event endpoints (Biomarkers
Definitions Working Group, 2001).
Examples:
■ Tumour response (Cheson et al, 2007; Cheson et al, 2014; Eisenhauer
et al, 2009)
■ Circulating tumour cells (de Bono et al, 2008; Hayes et al, 2006;
Krebs et al, 2010; Punnoose et al, 2012)
■ Cell-free clonotypic assay (Roschewski et al, 2015)
■ Disease-free survival (DFS) (Robinson et al, 2014a)
■ PFS (Robinson et al, 2014b)
■ Event-free survival (EFS) (Maurer et al, 2014)
However, some of these endpoints might not always reflect clinical benefit
to the patients. Some tumour-centred endpoints are often used as surro-
gates for patient-centred endpoints, particularly OS (Fiteni et al, 2014).
Surrogate Endpoints
A surrogate endpoint is ‘a biomarker intended to substitute for a clinical
endpoint’, the latter being ‘a characteristic or variable that reflects how
a patient feels, functions, or survives’ (NIH Definitions Working Group,
2000). These surrogate endpoints provide information earlier than most
time-dependent endpoints such as PFS, DFS or OS. However, many of
them lack a standardised definition, thus preventing cross comparison
among different studies. HR-QoL constitutes a valid surrogate endpoint,
but is not sufficient to demonstrate efficacy of the treatment.
Surrogate endpoints in oncology have been intensively studied in recent
years, and are important from two distinct perspectives:
1. As endpoints in trials used for new drug approval by regulatory agencies
2. From the perspective of clinical treatment and patient care
A paper published in 2003, reporting on US Food and Drug Administration
(FDA) approvals of drugs between 1990 and 2002 (Johnson et al, 2003),
found that OS was not the endpoint of choice for the approval of many
drugs during this period. It was often replaced by response rate, PFS,
time to progression (TTP), complete response (CR) rate, duration of
response (DoR) and DFS.

In 2005–2012, the FDA approved 41 indications for cancer drugs (20%
of the drugs approved during this period) based on 55 pivotal trials,
84% of which used surrogate endpoints (Downing et al, 2014). An
evaluation by Martell and co-workers of 76 FDA indications granted in
2006–2011 reported that the primary endpoint was a time-to-event
endpoint in 33 cases, response rate in 32 cases and OS in only 11 cases
(Martell et al, 2013).
In the period 2014 to mid-2015, 16 of the 17 approvals of oncology drugs
were based on an endpoint other than OS: PFS (6 approvals); EFS (1); over-
all response rate (ORR) (8); and CR with partial haematological recovery
rate (1) (FDA, 2007). A recent study that analysed surrogate endpoints and
OS in 65 trials found that the strength of the correlation between them was
either medium or strong in 48% of the trials (Prasad et al, 2015).
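The counts quoted in the two paragraphs above can be turned into proportions with a few lines of arithmetic. The sketch below only re-uses the numbers reported in the text (Martell et al, 2013, and the 2014 to mid-2015 approvals); the variable and function names are illustrative.

```python
# Counts as quoted in the text; names are illustrative, not from the cited papers.
martell_primary_endpoints = {"time-to-event": 33, "response rate": 32, "OS": 11}
non_os_approvals_2014_2015 = {
    "PFS": 6,
    "EFS": 1,
    "ORR": 8,
    "CR with partial haematological recovery": 1,
}
TOTAL_APPROVALS_2014_2015 = 17


def percentage_shares(counts, total=None):
    """Percentage of `total` (default: the sum of counts) per category, to 1 decimal."""
    total = sum(counts.values()) if total is None else total
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


martell_shares = percentage_shares(martell_primary_endpoints)
non_os_share = round(
    100 * sum(non_os_approvals_2014_2015.values()) / TOTAL_APPROVALS_2014_2015, 1
)
```

On these figures, OS was the primary endpoint for roughly one in seven of the 76 indications evaluated by Martell and co-workers, and all but one of the 17 approvals in 2014 to mid-2015 rested on another endpoint.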
Example:
■ Recently proposed surrogate endpoints for follicular lymphoma include
fludeoxyglucose–positron emission tomography (FDG-PET) and minimal
residual disease (MRD) after initial treatment (Galimberti et al, 2014;
Luminari et al, 2016; Trotman et al, 2014) and complete response at
30 months (CR30) (Shi et al, 2017); clonoSEQ™ assays have been proposed
for diffuse large B-cell lymphoma (Roschewski et al, 2015).
The main advantage of using a surrogate endpoint is that a smaller sample
size is needed, which reduces the duration and cost of a trial. However, an
important limitation is that a surrogate endpoint is only useful if it is a good
predictor of the clinical outcome of interest. If this relationship is not clearly
defined, surrogate endpoints could be misleading (Bakhai et al, 2006).
Endpoint Definition
Although guidelines exist for defining the different endpoints (EMA,
2012; FDA, 2007) (Table 1), investigators in many clinical trials use
definitions that differ from those proposed, and standardised definitions
of endpoints are needed. Guidance on the most common primary endpoints
in clinical trials in solid tumours is also provided in the Revised
RECIST (v.1.1) publication (Eisenhauer et al, 2009).
Table 1 Efficacy Endpoints for Clinical Trials, Advantages and Limitations
(Source: EMA, 2012; FDA, 2007)

Overall survival (OS)
Definition: Time from randomisation* until death from any cause
Advantages:
• Universally accepted measure of direct benefit
• Easily and precisely measured
Limitations:
• May require a larger trial population and longer follow-up to show statistical difference between groups
• May be affected by crossover or subsequent therapies
• Includes deaths unrelated to cancer

Progression-free survival (PFS)
Definition: Time from randomisation* until disease progression or death from any cause
EMA: also defines PFS2 as time from randomisation* to objective tumour progression on next-line treatment or death from any cause

Time to progression (TTP)
Definition:
FDA: Time from randomisation* until objective tumour progression; does not include deaths
EMA: Time from randomisation* to observed tumour progression, censoring for death not related to the underlying malignancy

Advantages (PFS and TTP):
• Requires small sample size and shorter follow-up time compared with OS
• Not affected by crossover or subsequent therapies
• Generally based on objective and quantitative assessment
• A methodologically better endpoint than TTP would be the cumulative incidence of progression, treating death without progression as a competing event
Limitations (PFS and TTP):
• Validation as a surrogate for survival can be difficult in some treatment settings
• Not precisely measured (i.e. measurement may be subject to bias)
• Definition may vary among trials
• Requires frequent radiological or other assessments
• Requires balanced timing of assessment among treatment arms

Time to treatment failure (TTF)
Definition:
FDA: Time from randomisation* to discontinuation of treatment for any reason, including disease progression, treatment toxicity, and death
EMA: also includes add-on of new anticancer therapy
Advantages:
• Useful in settings in which toxicity is potentially as serious as disease progression (e.g. allogeneic stem cell transplant)
Limitations:
• Does not adequately distinguish efficacy from other variables, such as toxicity

Disease-free survival (DFS)
Definition: Definition may vary. Time from documentation of first response until recurrence of tumour or death as a result of the disease or the acute toxicity of treatment
Advantages:
• Requires small sample size and shorter follow-up time compared with OS
Limitations:
• Not statistically validated as a surrogate endpoint in all settings; not precisely measured; subject to bias, especially in open-label studies; definitions vary between studies

Event-free survival (EFS)
Definition:
FDA: Time from randomisation* to disease progression, death, or discontinuation of treatment for any reason (e.g. toxicity, patient preference, or initiation of a new treatment without documented progression)
EMA: Time from randomisation* to lack of achievement of CR, relapse or death without relapse. Patients not achieving a CR with the induction phase will be considered as having an event at the time of restaging after induction therapy
Advantages:
• Similar to PFS; may be useful in evaluation of highly toxic therapies
Limitations:
• Initiation of next therapy is subjective. Usually not encouraged by regulatory agencies because it combines efficacy, toxicity, and patient withdrawal

Time to next treatment (TTNT)
Definition:
FDA: Time from end of primary treatment to institution of next therapy
EMA: No definition
Advantages:
• For incurable diseases, may provide an endpoint meaningful to patients
Limitations:
• Not commonly used as a primary endpoint
• Subject to variability in practice patterns

Complete response/remission (CR)
Definition: Disappearance of all signs and symptoms of cancer
Advantages:
• Can be assessed in single-arm studies
• Durable complete responses can represent clinical benefit
• Assessed earlier and in smaller studies compared with survival studies
Limitations:
• Not a direct measure of benefit in all cases
• Not a comprehensive measure of drug activity
• Small subset of patients with benefit

Objective/overall response rate (ORR)
Definition: Proportion of patients with reduction in tumour burden of a predefined amount (CR and PR)

Duration of response (DoR)
Definition:
FDA: Time from documentation of tumour response to disease progression
EMA: No definition

Advantages (ORR and DoR):
• Can be assessed in single-arm trials
• Requires a smaller population and can be assessed earlier, compared with survival trials
• Effect is attributable directly to the drug, not the natural history of the disease
Limitations (ORR and DoR):
• Not a comprehensive measure of drug activity

*Not all trials are randomised. In non-randomised trials, time from study enrolment is commonly used.
Where the FDA and EMA definitions coincide, no distinction is made.
Abbreviations: CR, complete response/remission; EMA, European Medicines Agency; FDA, Food and Drug
Administration; OS, overall survival; PR, partial response/remission.
For lymphoma trials, the definition of endpoints is a little more standardised
since most investigators follow the recommendations of Cheson et al
(2007) (Table 2). This paper is a source of COS for lymphoma trials, and
suggests both OS and PFS as primary endpoints.
Table 2 Efficacy Outcomes for Clinical Trials in Lymphoma
Modified from Cheson BD, Fisher RI, Barrington SF, et al. Recommendations for initial evaluation, staging, and
response assessment of Hodgkin and non-Hodgkin lymphoma: The Lugano classification. J Clin Oncol 2014;
32:3059-3068. Reprinted with permission. © 2007 American Society of Clinical Oncology. All rights reserved.

Primary outcomes

Overall survival
Patients: All
Definition of relevant endpoints: Death as a result of any cause
Measured from: Entry onto study

Progression-free survival
Patients: All
Definition of relevant endpoints: Disease progression or death as a result of any cause
Measured from: Entry onto study

Secondary outcomes

Event-free survival
Patients: All
Definition of relevant endpoints: Failure of treatment or death as a result of any cause
Measured from: Entry onto study

Time to progression
Patients: All
Definition of relevant endpoints: Time to progression or death as a result of lymphoma
Measured from: Entry onto study

Disease-free survival
Patients: In CR
Definition of relevant endpoints: Time to relapse or death as a result of lymphoma or acute toxicity of treatment
Measured from: Documentation of response

Response duration
Patients: In CR or PR
Definition of relevant endpoints: Time to relapse or progression
Measured from: Documentation of response

Lymphoma-specific survival
Patients: All
Definition of relevant endpoints: Time to death as a result of lymphoma
Measured from: Entry onto study

Time to next treatment
Patients: All
Definition of relevant endpoints: Time to new treatment
Measured from: End of primary treatment

Abbreviations: CR, complete response/remission; PR, partial response/remission.
Most Common Individual Outcomes Used in Oncology Clinical Trials
For the definition of individual outcomes refer to Tables 1 and 2.
Objective/overall response rate (ORR): Complete response/remission (CR)
and partial response/remission (PR) combined. The definition of ORR
excludes stable disease and minimal responses. These exclusions may enable
ORR to be directly attributable to drug effect (McKee et al, 2010). ORR
is often used in single-arm, Phase II trials in refractory cancer (FDA, 2007).
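As a minimal sketch of the ORR definition given above, the code below counts CR and PR as responses, excludes stable disease (SD) and progressive disease (PD), and adds a Wilson score confidence interval, which is one common choice for a proportion in a single-arm trial. The response labels, the patient counts and the choice of interval are assumptions for illustration; they are not prescribed by this chapter.

```python
import math
from statistics import NormalDist


def objective_response_rate(best_responses, confidence=0.95):
    """ORR = (CR + PR) / all evaluable patients, with a Wilson score interval."""
    n = len(best_responses)
    if n == 0:
        raise ValueError("no evaluable patients")
    # Only complete and partial responses count; SD and PD do not.
    responders = sum(1 for r in best_responses if r in ("CR", "PR"))
    p = responders / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical single-arm Phase II trial with 50 evaluable patients.
responses = ["CR"] * 5 + ["PR"] * 15 + ["SD"] * 20 + ["PD"] * 10
orr, lower, upper = objective_response_rate(responses)
```

With only 50 patients the interval around an observed ORR of 40% is wide, which is worth keeping in mind when response rates from small single-arm trials are compared.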
Duration of response (DoR): DoR may be reported in conjunction with
ORR. Its measurement is influenced by many factors (e.g. frequency of
follow-up depending on disease types and stages, treatment periodicity,
and standard practice). This limitation should be considered when com-
paring the results of different trials (Eisenhauer et al, 2009).
PFS and TTP: PFS may be preferred over TTP because it includes
deaths from any cause. Thus, it may correlate better with OS
(Pazdur, 2008), and may capture fatal treatment-related toxicities.
As with OS, PFS can be reported either as median or probabilities at
pre-specified time points in a time-to-event analysis. PFS seems to
be most useful as a surrogate for OS when the median survival after
progression is relatively short, as in some advanced solid tumours
(Ciani et al, 2014; Wilson et al, 2015b).
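The difference between OS, PFS and TTP lies in which occurrences count as events. The sketch below derives a (time, event) pair for each endpoint from a simple patient record, following the definitions in Table 1. The record layout and function names are hypothetical, and censoring TTP at the time of a death without progression is an assumption about how "does not include deaths" is usually handled in practice.

```python
from typing import NamedTuple, Optional


class Patient(NamedTuple):
    progression: Optional[float]  # months from randomisation to progression, if observed
    death: Optional[float]        # months from randomisation to death, if observed
    last_follow_up: float         # months from randomisation to last contact


def os_outcome(p):
    """OS: the event is death from any cause."""
    return (p.death, 1) if p.death is not None else (p.last_follow_up, 0)


def pfs_outcome(p):
    """PFS: the event is progression or death from any cause, whichever is first."""
    candidates = [t for t in (p.progression, p.death) if t is not None]
    return (min(candidates), 1) if candidates else (p.last_follow_up, 0)


def ttp_outcome(p):
    """TTP: the event is progression only; death without progression is censored."""
    if p.progression is not None:
        return (p.progression, 1)
    censor_time = p.death if p.death is not None else p.last_follow_up
    return (censor_time, 0)


# Hypothetical patients.
died_without_progression = Patient(progression=None, death=14.0, last_follow_up=14.0)
progressed_then_died = Patient(progression=8.0, death=20.0, last_follow_up=20.0)
event_free = Patient(progression=None, death=None, last_follow_up=30.0)
```

For the patient who died without documented progression, PFS records an event at 14 months whereas TTP records a censored observation, which is how PFS can capture fatal treatment-related toxicities that TTP misses.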
With respect to lymphoma trials, PFS is often considered the preferred
endpoint, particularly in trials of incurable histological subtypes
(e.g. follicular lymphoma, other low-grade lymphoma, or mantle cell
lymphoma) (Cheson et al, 2007), but also for curable tumours such as
Hodgkin lymphoma (Radford et al, 2015; Raemaekers et al, 2014) or
diffuse large B-cell lymphoma (Lee et al, 2011). However, PFS might
not adequately predict survival in incurable tumours, such as follicular
lymphoma. The lack of correlation between PFS and OS is generally
due to patients rapidly progressing and dying after progression, effective
salvage treatments and toxicities in the progression-free interval.
Example 1
The Follicular Lymphoma Analysis of Surrogacy Hypothesis
(FLASH) analysed data of 3837 individual patients, from 18 first-
line randomised studies with PFS as the primary endpoint. This large
meta-analysis of chemo/immunotherapy trials establishes CR30 as
a robust surrogate endpoint for PFS in first-line follicular lymphoma
trials, and supports its use to expedite therapeutic development
(Shi et al, 2017).
Time to treatment failure (TTF) and EFS: These similar endpoints are
composites that measure time from randomisation to treatment discon-
tinuation (see Table 1 and Table 2). The FDA no longer recommends
TTF as an endpoint for drug approval because it fails to clearly distin-
guish efficacy from toxicity, intolerance and withdrawal from the study
(FDA, 2007). TTF and EFS are instead used in clinical trials from a
clinical treatment and patient care perspective.
Example 2
In a study of patients with diffuse large B-cell lymphoma enrolled
in three trials from the University of Iowa/Mayo Clinic and in the
NCCTG-N0489 trial, validated with data of patients from the GELA
LNH2003B programme, it emerged that EFS at 24 months (EFS24)
is a robust endpoint for disease-related outcome. For patients achieving
EFS24, the risk of relapse in the following 5 years was the
same as the risk of death as a result of unrelated causes (8%). This
means that these patients have subsequent survival comparable to that
of the general population (i.e. a normal life expectancy). The authors
recommend using EFS24 for outcome studies or clinical trials in the
first-line setting (Maurer et al, 2014).
DFS: Definitions vary but DFS is usually defined according to the FDA
definition (Table 1). The most frequent use of DFS as a primary end-
point is in the adjuvant setting, after definitive surgery or radiotherapy
and in situations where survival may be prolonged, making an OS
endpoint impractical (McKee et al, 2010). DFS can also be an important
endpoint when a large percentage of patients achieve complete responses
with therapy.

Example 3
The ACCENT group analysed data from 20 898 patients from 18
randomised trials demonstrating that DFS after 2–3 years’ median
follow-up was an excellent predictor of 5-year OS. The association
between 3-year DFS and 5-year OS was strong (rank correlation coef-
ficient=0.88). Within-trial logrank testing using both DFS and OS
provided the same conclusion in 23 (92%) of 25 cases. The authors
recommend DFS after 2–3 years to be considered as the primary
endpoint in future colon cancer adjuvant trials, since it can pro-
vide information on the effects of treatments 2–3 years before OS
(Sargent et al, 2007).
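To show what a rank correlation between a surrogate and OS measures, the sketch below computes Spearman's coefficient between 3-year DFS and 5-year OS across a handful of trial arms. The figures are invented for illustration and are not the ACCENT data, and the function names are hypothetical; the ACCENT analysis itself used individual patient data and additional methods.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical trial arms: 3-year DFS and 5-year OS proportions.
dfs_3yr = [0.62, 0.70, 0.66, 0.75, 0.58, 0.73]
os_5yr = [0.68, 0.73, 0.74, 0.80, 0.66, 0.76]
rho = spearman(dfs_3yr, os_5yr)
```

A coefficient close to 1 means that arms ranked higher on DFS are also ranked higher on OS; it does not by itself show that a treatment effect on DFS translates into an effect of similar size on OS.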
Disease-specific survival (DSS): This endpoint relates to death from the
cancer. Deaths from other causes are not considered events or competing
events. In some instances, outcome is better characterised by DSS
than OS, because OS underestimates the impact of treatment in patients
with favourable features, who are more likely to die of causes other than
their cancer (Gschwend et al, 2002). However, DSS can be an unreliable
measure of the effects of a treatment on mortality because it ignores
deaths from treatment toxicity.
MRD: This is an emerging tumour-centred endpoint. It detects the
remaining tumour burden at the end of treatment by means of new, highly
sensitive technologies that are able to detect persisting cancer cells at low
levels. MRD can be incorporated into a variety of endpoints, depending
on the type of cancer and the technology used. MRD negativity may be
an important criterion to evaluate treatment efficacy in haematological
tumours. It has been shown to correlate with survival in several clinical
studies (FDA, 2012).
Conclusions
The choice of the most appropriate outcomes is key to the success of
the research study and for providing clinical practice with reliable and
relevant information on new, innovative therapies. However, further efforts
are needed to standardise definitions of different outcomes to make stud-
ies more comparable and their results more useful to decision makers. The
use of COSs would help people to compare, contrast and combine studies
using outcomes that matter most to decision makers, including patients and
practitioners. The validation of currently used surrogate endpoints and the
development of new ones would make research in oncology less time-consuming
and less costly, and make results available to routine practice more
quickly.
Dr Bellei has reported no conflict of interest.
Dr Guida has reported no conflict of interest.
Further Reading
Cheson BD, Pfistner B, Juweid ME, et al. Revised response criteria for malignant
lymphoma. J Clin Oncol 2007; 25:579–586.
Ciani O, Davis S, Tappenden P, et al. Validation of surrogate endpoints in
advanced solid tumors: systematic review of statistical methods, results,
and implications for policy makers. Int J Technol Assess Health Care 2014;
30:312–324.
Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in
solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;
45:228–247.
Fiteni F, Westeel V, Pivot X, et al. Endpoints in cancer clinical trials. J Visc Surg
2014; 151:17–22.
Gorst SL, Gargon E, Clarke M, et al. Choosing important health outcomes for
comparative effectiveness research: an updated review and user survey. PLoS
One 2016; 11:e0146444.
Johnson JR, Williams G, Pazdur R. End points and United States Food and Drug
Administration approval of oncology drugs. J Clin Oncol 2003; 21:1404–1411.
Luminari S, Galimberti S, Versari A, et al. Positron emission tomography
response and minimal residual disease impact on progression-free survival
in patients with follicular lymphoma. A subset analysis from the FOLL05
trial of the Fondazione Italiana Linfomi. Haematologica 2016; 101:e66–e68.
McKee AE, Farrell AT, Pazdur R, Woodcock J. The role of the U.S. Food and
Drug Administration review process: clinical trial endpoints in oncology.
Oncologist 2010; 15(Suppl 1):13–18.

Prasad V, Kim C, Burotto M, Vandross A. The strength of association between
surrogate end points and survival in oncology. JAMA Intern Med 2015;
175:1389–1398.
Wilson MK, Karakasis K, Oza AM. Outcomes and endpoints in trials of cancer
treatment: the past, present, and future. Lancet Oncol 2015; 16:e32–e42.
References
Bakhai A, Chhabra A, Wang D. Endpoints. In Wang D, Bakhai A (Eds). Clini-
cal Trials: A Practical Guide to Design, Analysis, and Reporting. London:
Remedica, 2006; pp37–46.
Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints:
preferred definitions and conceptual framework. Clin Pharmacol Ther 2001;
69:89–95.
Cheson BD, Fisher RI, Barrington SF, et al. Recommendations for initial evalu-
ation, staging, and response assessment of Hodgkin and non-Hodgkin lym-
phoma: The Lugano classification. J Clin Oncol 2014; 32:3059–3068.
Cheson BD, Pfistner B, Juweid ME, et al. Revised response criteria for malignant
lymphoma. J Clin Oncol 2007; 25:579–586.
Ciani O, Davis S, Tappenden P, et al. Validation of surrogate endpoints in
advanced solid tumors: systematic review of statistical methods, results,
and implications for policy makers. Int J Technol Assess Health Care 2014;
30:312–324.
de Bono JS, Scher HI, Montgomery RB, et al. Circulating tumor cells predict
survival benefit from treatment in metastatic castration-resistant prostate
cancer. Clin Cancer Res 2008; 14:6302–6309.
Downing NS, Aminawung JA, Shah ND, et al. Clinical trial evidence support-
ing FDA approval of novel therapeutic agents, 2005-2012. JAMA 2014;
311:368–377.
Driscoll JJ, Rixe O. Overall survival: still the gold standard: why overall sur-
vival remains the definitive end point in cancer clinical trials. Cancer J 2009;
15:401–405.
Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in
solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;
45:228–247.
EMA Guideline on the evaluation of anticancer medicinal products in man,
December 2012. http://www.ema.europa.eu/docs/en_GB/document_library/
Scientific_guideline/2013/01/WC500137128.pdf (24 January 2018, date
last accessed).
FDA Guidance for industry: Clinical Trial Endpoints for the Approval of Cancer
Drugs and Biologics, May 2007. http://www.fda.gov (24 January 2018, date
last accessed).
FDA Minimal Residual Disease (MRD) as a Surrogate Endpoint in Acute
Lymphoblastic Leukemia (ALL) Workshop. April 18, 2012.
Fiteni F, Westeel V, Pivot X, et al. Endpoints in cancer clinical trials. J Visc Surg
2014; 151:17–22.
Galimberti S, Luminari S, Ciabatti E, et al. Minimal residual disease after con-
ventional treatment significantly impacts on progression-free survival of
patients with follicular lymphoma: the FIL FOLL05 Trial. Clin Cancer Res
2014; 20:6398–6405.
Gargon E, Williamson PR, Altman DG, et al. The COMET initiative database:
progress and activities update (2014). Trials 2015; 16:515.
Gorst SL, Gargon E, Clarke M, et al. Choosing important health outcomes for
comparative effectiveness research: an updated review and user survey. PLoS
One 2016; 11:e0146444.
Gschwend JE, Dahm P, Fair WR. Disease specific survival as endpoint of out-
come for bladder cancer patients following radical cystectomy. Eur Urol
2002; 41:440–448.
Hayes DF, Cristofanilli M, Budd GT, et al. Circulating tumor cells at each fol-
low-up time point during therapy of metastatic breast cancer patients predict
progression-free and overall survival. Clin Cancer Res 2006; 12:4218–4224.
Johnson JR, Williams G, Pazdur R. End points and United States Food and
Drug Administration approval of oncology drugs. J Clin Oncol 2003;
21:1404–1411.
Krebs MG, Hou JM, Ward TH, et al. Circulating tumour cells: their utility in
cancer management and predicting outcomes. Ther Adv Med Oncol 2010;
2:351–365.
Lee L, Wang L, Crump M. Identification of potential surrogate end points in
randomized clinical trials of aggressive and indolent non-Hodgkin’s lym-
phoma: correlation of complete response, time-to-event and overall survival
end points. Ann Oncol 2011; 22:1392–1403.
Luminari S, Galimberti S, Versari A, et al. Positron emission tomography
response and minimal residual disease impact on progression-free survival
in patients with follicular lymphoma. A subset analysis from the FOLL05
trial of the Fondazione Italiana Linfomi. Haematologica 2016; 101:e66–e68.
Martell RE, Sermer D, Getz K, Kaitin KI. Oncology drug development and
approval of systemic anticancer therapy by the U.S. Food and Drug Admin-
istration. Oncologist 2013; 18:104–111.
Maurer MJ, Ghesquières H, Jais JP, et al. Event-free survival at 24 months is a
robust end point for disease-related outcome in diffuse large B-cell lymphoma
treated with immunochemotherapy. J Clin Oncol 2014; 32:1066–1073.
McKee AE, Farrell AT, Pazdur R, Woodcock J. The role of the U.S. Food and
Drug Administration review process: clinical trial endpoints in oncology.
Oncologist 2010; 15 Suppl 1:13–18.
Meyer RM, Gospodarowicz MK, Connors JM, et al. ABVD alone versus radi-
ation-based therapy in limited-stage Hodgkin’s lymphoma. N Engl J Med
2012; 366:399–408.
NIH Definitions Working Group. Biomarkers and Surrogate Endpoints. Amster-
dam: Elsevier, 2000; 1–9.
Osoba D. Health-related quality of life and cancer clinical trials. Ther Adv Med
Oncol 2011; 3:57–71.
Pazdur R. Endpoints for assessing drug activity in clinical trials. Oncologist
2008; 13 Suppl 2:19–21.
Prasad V, Kim C, Burotto M, Vandross A. The strength of association between
surrogate end points and survival in oncology: a systematic review of trial-
level meta-analyses. JAMA Intern Med 2015; 175:1389–1398.
Punnoose EA, Atwal S, Liu W, et al. Evaluation of circulating tumor cells and
circulating tumor DNA in non-small cell lung cancer: association with clini-
cal endpoints in a phase II clinical trial of pertuzumab and erlotinib. Clin
Cancer Res 2012; 18:2391–2401.
Radford J, Illidge T, Counsell N, et al. Results of a trial of PET-directed therapy
for early-stage Hodgkin’s lymphoma. N Engl J Med 2015; 372:1598–1607.
Raemaekers JM, André MP, Federico M, et al. Omitting radiotherapy in early
positron emission tomography-negative stage I/II Hodgkin lymphoma is
associated with an increased risk of early relapse: clinical results of the pre-
planned interim analysis of the randomized EORTC/LYSA/FIL H10 trial.
J Clin Oncol 2014; 32:1188–1194.
Robinson AG, Booth CM, Eisenhauer EA. Disease-free survival as an end-point
in the treatment of solid tumours – perspectives from clinical trials and clini-
cal practice. Eur J Cancer 2014a; 50:2298–2302.
Robinson AG, Booth CM, Eisenhauer EA. Progression-free survival as an end-
point in solid tumours – perspectives from clinical trials and clinical practice.
Eur J Cancer 2014b; 50:2303–2308.
Roschewski M, Dunleavy K, Pittaluga S, et al. Circulating tumour DNA and
CT monitoring in patients with untreated diffuse large B-cell lymphoma:
a correlative biomarker study. Lancet Oncol 2015; 16:541–549.
Sargent DJ, Patiyil S, Yothers G, et al. End points for colon cancer adjuvant
trials: observations and recommendations based on individual patient data
from 20,898 patients enrolled onto 18 randomized trials from the ACCENT
Group. J Clin Oncol 2007; 25:4569–4574.
Shi Q, Flowers CR, Hiddemann W, et al. Thirty-month complete response as a
surrogate end point in first-line follicular lymphoma therapy: an individual
patient-level analysis of multiple randomized trials. J Clin Oncol 2017;
35:552–560.
Trotman J, Luminari S, Boussetta S, et al. Prognostic value of PET-CT after first-
line therapy in patients with follicular lymphoma: a pooled analysis of central
scan review in three multicentre studies. Lancet Haematol 2014; 1:e17–e27.
Williamson PR, Altman DG, Bagley H, et al. The COMET Handbook: version
1.0. Trials 2017; 18 (Suppl 3):280.
Wilson MK, Collyar D, Chingos DT, et al. Outcomes and endpoints in cancer
trials: bridging the divide. Lancet Oncol 2015a; 16:e43–52.
Wilson MK, Karakasis K, Oza AM. Outcomes and endpoints in trials of cancer
treatment: the past, present, and future. Lancet Oncol 2015b; 16:e32–e42.
8 Statistical Issues (Including Subgroups, Time-To-Event Analyses, Multiplicity)
E. Hoster
Department of Internal Medicine III, University Hospital Munich;
IBE - Institute for Medical Information Processing, Biometry and
Epidemiology, Ludwig-Maximilians-University Munich, Munich, Germany
Introduction
Clinical research in oncology is performed through clinical studies,
which collect data from groups of patients with specific characteristics
(see Chapters 5 and 6). Statistical methods inform every stage of such a
study: which data are collected, how large the sample must be to have an
adequate probability of detecting a clinically relevant effect if one
exists, the number and type of variables to be analysed (see Chapter 7),
and the analyses carried out once the planned accrual is reached. Used
properly, these methods yield results that can enhance our knowledge.
The variables and effects or associations are commonly subject to
random variation or variations due to unknown factors. For example,
although there might be evidence that clinical variables such as age and
stage are associated with survival in a certain cancer type, unexplained or
random variation makes it impossible to predict the individual survival
time of a specific patient. Statistical methods applied in clinical research
are useful because they help us to distinguish systematic effects from
unexplained or random variation observed in empirical data. There
are many statistical methods, but some basic concepts are encountered
repeatedly. These are discussed in this chapter, because knowledge and
understanding of these concepts is essential for interpreting publications
of oncology research.
While reading a report of an oncology study, the following questions
related to statistical aspects can guide you in judging the validity and the
generalisability or applicability of the results.
■ Are the data adequately described?
■ Which quantities have been estimated?
■ Which statistical tests have been performed?
■ How is the Type-I error controlled?
■ Is the statistical power adequate?
For the purpose of illustration, these questions are examined using the pri-
mary publication of the RESONATE trial (Byrd et al, 2014). This was an
international randomised Phase III trial (see Chapter 6), investigating the
efficacy and safety of ibrutinib compared with ofatumumab in patients
with relapsed or refractory chronic lymphoid leukaemia (CLL) or small
lymphocytic lymphoma (SLL) who were at risk for a poor outcome. The
primary endpoint was progression-free survival (PFS), with overall
survival (OS) and best overall response (BOR) as secondary endpoints.
Are the Data Adequately Described?

The description of the baseline variables characterises the groups of
patients who were included in the trial. In randomised trials, the descrip-
tion of baseline characteristics by treatment groups aims to show whether
randomisation succeeded in generating groups with comparable prognostic
profiles, although with simple randomisation some characteristics may be
imbalanced between the groups by chance. Table 1 of the RESONATE
publication summarises the characteristics of the patient cohort
(reproduced here as Table 1). As for
many statistical methods, the way in which variables of interest (both base-
line and follow-up) are described primarily depends on the data type.
Categorical variables like sex or type of previous therapy were described
by absolute counts and relative frequencies; numerical values like age or
lymphocyte count were described by median and range. The outcome
variables PFS and OS have a different data type (see Chapter 7); these
are so-called right-censored time-to-event variables and were described

Table 1 Characteristics of the Patients at Baseline in the RESONATE Trial*
From Byrd JC, Brown JR, O’Brien S, et al. Ibrutinib versus ofatumumab in previously treated chronic
lymphoid leukemia. N Engl J Med 2014; 371:213-223. Copyright © 2014 Massachusetts Medical Society.
Reprinted with permission from Massachusetts Medical Society.
Characteristic                                            Ibrutinib (N = 195)        Ofatumumab (N = 196)
Patients with small lymphocytic lymphoma — no. (%)        10 (5)                     8 (4)
Median age (range) — years                                67 (30–86)                 67 (37–88)
Male sex — no. (%)                                        129 (66)                   137 (70)
Cumulative Illness Rating Scale score >6 — no. (%)†       38 (32)                    39 (32)
Creatinine clearance <60 ml/min — no. (%)                 62 (32)                    61 (31)
Median haemoglobin (range) — g/dL                         11 (7–16)                  11 (6–16)
Median platelet count (range) — per mm³                   116 500 (20 000–441 000)   122 000 (23 000–345 000)
Median lymphocyte count (range) — per mm³                 29 470 (90–467 700)        29 930 (290–551 030)
ECOG performance status — no. (%)‡
  0                                                       79 (41)                    80 (41)
  1                                                       116 (59)                   116 (59)
Bulky disease ≥5 cm — no. (%)§                            124 (64)                   101 (52)
Interphase cytogenetic abnormalities — no. (%)
  Chromosome 11q22.3 deletion                             63 (32)                    59 (30)
  Chromosome 17p13.1 deletion¶                            63 (32)                    64 (33)
β2-microglobulin >3.5 mg/litre — no. (%)                  153 (78)                   145 (74)
Previous therapies
  Median no. (range)                                      3 (1–12)                   2 (1–13)
  ≥3 — no. (%)                                            103 (53)                   90 (46)
  Type of therapy — no. (%)
    Alkylator                                             181 (93)                   173 (88)
    Bendamustine                                          84 (43)                    73 (37)
    Purine analogue                                       166 (85)                   151 (77)
    Anti-CD20                                             183 (94)                   176 (90)
    Alemtuzumab                                           40 (21)                    33 (17)
    Allogeneic transplantation                            3 (2)                      1 (1)
Median time from last therapy (range) — months            8 (1–140)                  12 (0–184)
Resistance to purine analogues — no. (%)‖                 87 (45)                    88 (45)

* There were no significant differences between the two groups at baseline, except with respect to the presence of bulky
disease of 5 cm or more (P = 0.04) and the median time from last therapy (P = 0.02).
† Scores on the Cumulative Illness Rating Scale range from 0 to 52, with higher scores indicating worse health status. Scores on
this test were required only for patients 65 years of age or older, and coexisting illnesses were not included in the scoring.
‡ Scores on the Eastern Cooperative Oncology Group (ECOG) performance status range from 0 to 5, with higher scores
indicating greater disability.
§ Measurement was based on the largest diameter of the longest lymph node at screening, according to the assessment of the
independent review committee.
¶ Patients were stratified at randomisation according to the presence or absence of this genetic abnormality.
‖ Resistance was defined as no response or a relapse within 12 months after the last dose of a CD20-based
chemoimmunotherapy regimen that included a purine analogue.
by Kaplan-Meier estimates. In principle, PFS is numerical, measured
as the time between randomisation and the occurrence of progression or
death. However, waiting until a PFS event has occurred in every patient
would require an unacceptable study duration, as there are very often a
few exceptionally good responders who will live for a long time. Methods
such as Kaplan-Meier estimation and Cox regression are therefore used:
they analyse time-to-event data while taking into account that the event
has not yet been observed in all patients. For these patients, instead of the
unobserved survival time, the total observation time during the study is
used for the analysis. Because this observation time is a lower bound of the
unobserved total survival time, it is called a right-censored survival time.
To obtain unbiased results with this method, individuals with censored sur-
vival time need to have had the same risk of failure at the time of censoring
as all individuals under observation at that time, i.e. the reason for censor-
ing must not be associated with the event probability. This assumption is
called ‘independent censoring’ (Kalbfleisch and Prentice, 1980), which
can be assumed when individuals enter their treatment group randomly
throughout the duration of the recruitment period and are followed until a
pre-specified data cut-off. In contrast, censoring due to loss to follow-up,
drop-out or documentation delays can give seriously biased results, even
with Kaplan-Meier estimation and Cox regression.
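To make the handling of right-censored observations concrete, the following is a minimal sketch of the Kaplan-Meier product-limit calculation. The function name `kaplan_meier` and the toy data are hypothetical and chosen for illustration only; they are not taken from the RESONATE trial, and a real analysis would use validated statistical software.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- observation time per patient (any consistent unit)
    events -- 1 if the event was observed at that time, 0 if right-censored
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)        # every patient leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d > 0:
            # Only observed events reduce the survival estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored observations both shrink the risk set afterwards.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data (months); 0 marks a right-censored observation.
toy_times = [2, 3, 3, 5, 7, 8, 8, 10]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their censoring time and then simply drop out of the denominator, which is only appropriate if the independent censoring assumption described above holds.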
In the Kaplan-Meier plot of OS stratified by treatment arm for the RESO-
NATE trial (Figure 1), the distribution of censored survival times, varying
between 7 and 17 months, reflects the recruitment period from June 2012
until April 2013. Furthermore, almost all patients had been followed for at
least seven months, suggesting a fixed data cut-off calendar date at the end
of 2013, so independent censoring can be assumed for OS. Similarly, the
Kaplan-Meier curves for PFS (Figure 2) show censoring times depicting
the recruitment period and a fixed data cut-off for the reported analysis. In
contrast to OS, events and censoring for PFS mainly occurred around fixed
observation times of approximately 3, 5, 8, 11 and 14 months. This pattern
suggests that, in contrast to the survival status, which is assessed at each
patient contact (and deaths can occur and be recorded at any time between
patient contacts), the progression status was only assessed and recorded at
patient visits that were scheduled every 12 weeks.

Figure 1 Overall survival by treatment arm in the RESONATE trial.
From Byrd JC, Brown JR, O’Brien S, et al. Ibrutinib versus ofatumumab in previously treated chronic lymphoid
leukemia. N Engl J Med 2014; 371:213-223. Copyright © 2014 Massachusetts Medical Society. Reprinted with
permission from Massachusetts Medical Society. Abbreviations: CI, Confidence interval; NE, not estimable.
[Plot for Figure 2: Kaplan-Meier curves for the ibrutinib and ofatumumab groups. Vertical axis: Progression-free
Survival (%), 0 to 100; horizontal axis: Months, 0 to 15. Hazard ratio for progression or death, 0.22
(95% CI, 0.15–0.32); P<0.001 by log-rank test.]
No. at Risk (at 3-month intervals from month 0)
Ibrutinib 195 183 116 38 7
Ofatumumab 196 161 83 15 1 0
Figure 2 Progression-free survival by treatment arm in the RESONATE trial.

From Byrd JC, Brown JR, O’Brien S, et al. Ibrutinib versus ofatumumab in previously treated chronic lymphoid
leukemia. N Engl J Med 2014; 371:213-223. Copyright © 2014 Massachusetts Medical Society. Reprinted with
permission from Massachusetts Medical Society. Abbreviation: CI, Confidence interval.
In summary, information on the recruitment period, data cut-off, amount
of and reasons for censoring of survival times, as well as the distribution
of censored survival times stratified by treatment group are important to
identify whether the analysis of right-censored time-to-event endpoints
might be biased.
Which Quantities Have Been Estimated?

In the RESONATE trial (Byrd et al, 2014), estimates of PFS and OS
at different time points and of BOR rates were reported, stratified by
treatment group. For example, the 6-month PFS rate in the ibrutinib group was
estimated as 88% compared with 65% in the ofatumumab group. Furthermore,
the treatment effect was estimated as a PFS hazard ratio of 0.22,
corresponding to a relative reduction of 78% (1 − 0.22 = 0.78) in the hazard
for progression or death in the ibrutinib group compared with the ofatumumab
group. The ultimate aim of the statistical description of outcome data like PFS
is usually to generalise the results from the study cohort to the underlying
patient population. This generalisation is only justified if the patients are
representative of the whole patient population. Furthermore, the reported PFS
hazard ratio of 0.22 between the treatment groups is a point estimate that
depends on the specific patient cohort in the trial. If the trial was repeated,
it might recruit by chance a somewhat different sample of the underlying
population and might obtain a different estimate for the PFS hazard ratio.
Therefore, to generalise the results from one trial to the underlying popula-
tion, we need to know how variable the results are, taking into account that
we have observed just one sample.
Statistical estimation usually provides an estimate of the variability
of the point estimate, as well as the point estimate itself. This variability
is best reported by a confidence interval. For the RESONATE trial
(Byrd et al, 2014), the 95% confidence interval for the PFS hazard ratio
was reported as 0.15 to 0.32. This means that if the trial was repeated
many times with different random samples of the same size, and a 95%
confidence interval was calculated each time, about 19 out of 20 of these
intervals would include the true (but unknown) value of the PFS hazard
ratio. For the single interval from a single trial, this translates into
95% confidence that the true PFS hazard ratio is covered, or a 5% risk
that the statement ‘the true hazard ratio for patients similar to those in
the trial lies between 0.15 and 0.32’ is wrong. This is based on the notion
of randomly sampling another group of patients similar to those observed
in the trial. The probability connected with the confidence interval
(e.g. 95% for the 95% confidence interval) therefore describes the
reliability of the estimation procedure, that is, the error probability
(5%) or confidence (95%) with which the calculated interval covers the
true value. It does not describe a probability distribution of the true
hazard ratio itself, which is a fixed, though unknown, quantity.
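The repeated-sampling interpretation of a confidence interval can be checked with a small simulation. The sketch below is illustrative only: it uses a hypothetical normally distributed measurement with known standard deviation rather than a hazard ratio, and the function name, sample size and seed are assumptions made for the example.

```python
import random
from statistics import NormalDist


def coverage_of_95_ci(true_mean=0.0, sd=1.0, n=50, n_trials=4000, seed=1):
    """Share of simulated 'trials' whose 95% interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)      # about 1.96
    half_width = z * sd / n ** 0.5       # SD treated as known, for simplicity
    covered = 0
    for _ in range(n_trials):
        # Each loop is one hypothetical repetition of the study.
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            covered += 1
    return covered / n_trials


coverage = coverage_of_95_ci()
print(f"share of intervals covering the true value: {coverage:.3f}")
```

The printed share should lie close to 0.95: the 95% describes how often the procedure succeeds across repetitions, while each individual interval either contains the true value or does not.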
Which Statistical Tests Have Been Performed?

For the RESONATE trial (Byrd et al, 2014), statistical hypothesis tests
were performed for the group comparisons of PFS, OS and the BOR rate.
The conclusion of the trial was that ibrutinib, as compared with ofatu-
mumab, improved PFS, OS and BOR among patients with previously
treated CLL or SLL. Figure 2 shows that individual variability of PFS is
substantial. Thus, when we compare PFS between treatment groups in a
randomised trial, we should always expect PFS differences that are due to
random variation as well as due to the true treatment effect. This random
variation in the data means that it is generally not possible to definitively
prove the hypothesis of interest for the study. Rather, the statistical test
allows us to decide for or against that hypothesis with controlled error
probabilities associated with this decision.
In the setting of a statistical hypothesis test, the hypothesis that the
observed differences are only the result of random variation is called
the null hypothesis (i.e. the assumption that there is no true difference
in the effects of the treatments being compared). In the RESONATE
trial, the null hypothesis for the primary question can be expressed as
‘ibrutinib has the same effects on PFS as ofatumumab’. The typical deci-
sion following a statistical hypothesis test is to either reject or fail to reject
the null hypothesis. Rejecting the null hypothesis means accepting the
alternative hypothesis, the converse of the null hypothesis (i.e. that there
is a difference in the effects of the treatments). In the primary comparison
of the RESONATE trial, the alternative hypothesis can be described as
‘ibrutinib and ofatumumab have different effects on PFS’.
The decision we take following a statistical test may be correct or incor-
rect, but it is in general impossible to prove whether we were right, even
if we could re-run the trial many times. Instead, the statistical test enables
us to control the error probabilities associated with the decision. We distin-
guish two types of error: Type I and Type II.
■ A Type-I error occurs when we reject a true null hypothesis (i.e. we
conclude that there is a difference when the truth is that the treatments
have the same effect).
■ A Type-II error occurs when we fail to reject a false null hypothesis
(i.e. we fail to reject the hypothesis that the effects of the treatments are
the same, when they are truly different).
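In symbols, the two error types can be summarised as follows. The notation is introduced here for illustration only: $H_0$ stands for the null hypothesis, and $\alpha$ and $\beta$ are the conventional symbols for the two error probabilities.

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}), \qquad
\text{power} = 1 - \beta .
```

Both quantities are probabilities of a decision given a state of the truth; neither is the probability that the null hypothesis itself is true.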
How is the Type-I Error Controlled?

In a statistical test, the likelihood of a Type-I error is usually directly
controlled by specifying in advance the significance level, often referred
to as α (alpha). If the significance level is set to 0.05, we accept a 5%
probability that the test rejects the null hypothesis when it is true. Similarly,
if the significance level is set to 0.01, we accept a 1% probability that
the test rejects the null hypothesis when it is true. With a pre-specified
null hypothesis and significance level, we call a test result ‘statistically
significant’ if we reject the null hypothesis.
In the RESONATE trial, the pre-specified significance level was α=0.05
(Byrd et al, 2014). This means that if ibrutinib truly had the same PFS effect
as ofatumumab, the probability that the test would nevertheless lead us to
wrongly conclude that ibrutinib has a different effect was limited to 5%. This
probability is a property of the test procedure fixed in advance, not a
quantity calculated from the observed data.
One interim analysis of the RESONATE trial was pre-planned to allow
early stopping of the trial if there had been unexpectedly large differences
between the treatment groups. Interim analyses are an example of multi-
ple testing. Other examples are the confirmatory evaluation of multiple
endpoints related to the same question, or the comparison of more than
two treatment groups. When more than one statistical test is performed to
examine one hypothesis, we need to have a decision strategy based on the
results of the single tests. One such strategy is to reject the null hypothesis
if at least one single test rejects its null hypothesis. This strategy is often
used for interim analyses: as soon as one interim analysis rejects the null
hypothesis, the global null hypothesis of the trial is rejected. If this hap-
pens before the planned final analysis, the trial might be stopped earlier
than intended. With multiple testing, the control of the Type-I error rate
needs to be adapted, because each single test contributes to the Type-I error
for the whole analysis. For example, if the same significance level of 0.05
is used for both a single interim analysis and the eventual final analysis,
the overall probability of falsely rejecting a true null hypothesis would be
about 8.3% (McPherson, 1974). Therefore, to preserve the overall Type-I
error probability when the same hypothesis is tested more than once,
the significance levels for the single tests need to be reduced.
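The inflation of the Type-I error with one interim look can be reproduced numerically. The sketch below assumes two two-sided z-tests at the same nominal level, with the interim analysis performed on half of the final information, so that the two test statistics are correlated with coefficient √0.5. The function name and the use of Simpson's rule are choices made for this illustration, not the method of the cited reference.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def overall_type_one_error(alpha_per_look=0.05, info_fraction=0.5, steps=2000):
    """P(at least one of two two-sided z-tests rejects) when the null is true."""
    c = STD_NORMAL.inv_cdf(1 - alpha_per_look / 2)   # critical value, about 1.96
    rho = sqrt(info_fraction)          # correlation of interim and final statistic
    resid_sd = sqrt(1 - rho ** 2)

    def accept_both(z1):
        # Density of the interim statistic z1 times the conditional probability
        # that the final statistic also stays inside (-c, c).
        upper = STD_NORMAL.cdf((c - rho * z1) / resid_sd)
        lower = STD_NORMAL.cdf((-c - rho * z1) / resid_sd)
        return STD_NORMAL.pdf(z1) * (upper - lower)

    # Simpson's rule over the interim acceptance region (-c, c); steps is even.
    h = 2 * c / steps
    total = accept_both(-c) + accept_both(c)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * accept_both(-c + i * h)
    return 1 - total * h / 3


overall = overall_type_one_error()
print(f"overall Type-I error with two looks at 0.05 each: {overall:.3f}")
```

Under these assumptions the result should be close to the 8.3% quoted above, clearly above the nominal 5% of a single test.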
In the RESONATE trial (Byrd et al, 2014), the statistical test for the pri-
mary endpoint at the interim analysis used a reduced significance level of
0.028. This does not mean that the overall significance level was 0.028,
but the formal significance level for the interim analysis itself was set to
0.028 to maintain the pre-specified, overall Type-I error probability of
5% for the procedure including one interim analysis. More specifically,
the RESONATE trial used the O’Brien-Fleming boundary (DeMets
and Lan, 1994) to control the Type-I error probability. These boundaries
are popular because the sample size for the trial hardly needs to be
increased to maintain the statistical power, even with multiple interim
analyses. The price is that these boundaries allow early stopping of a trial
only when the effects are substantially larger than anticipated.
Is the Statistical Power Adequate?

Any oncology trial is expected to have an adequate probability to detect a
clinically relevant effect if one exists. In a statistical test, the probability
of rejecting the null hypothesis when there truly is a difference in the
effects of the treatments is called the statistical power. This is the com-
plement of the probability for a Type-II error. A statistical test restricts
the probability of a Type-I error by means of the significance level but it
does not control the probability of a Type-II error. Rather, the statistical
power largely depends on the true (but unknown and unalterable) effect
size and on the trial’s sample size. To assess the power of a statistical
test, one must have an idea about the smallest difference in the effects
of the treatments that would be considered clinically relevant. Statistical
methods allow the calculation of the sample size needed to detect such
a clinically relevant effect, with a pre-specified statistical power and
significance level.
For the RESONATE trial (Byrd et al, 2014), the sample size calculation
used a 90% probability to decide against the null hypothesis if the true
hazard ratio is 0.60, where this would correspond to an improvement in
6-month PFS rates from 65% to 77.2%. Table 2 shows the approximate
number of events or observations needed to detect selected effect sizes,
with a power of 90% in a two-group comparison and with a significance
level of 0.05. For example, the approximate number of events needed to
detect a PFS hazard ratio of 0.60 with 90% statistical power using the
logrank test with a significance level of 0.05 can be roughly calculated
to be 160. Since the RESONATE trial stopped at the interim analysis, the
observed number of events was probably lower, but it still had adequate
statistical power by the use of the O’Brien-Fleming boundary.
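The two numbers in this paragraph can be retraced with a few lines of arithmetic. The sketch below applies the Schoenfeld approximation quoted with Table 2, using the rounded quantiles 1.96 and 1.28, and converts a hazard ratio into a survival rate under the assumption of proportional hazards. The function names are hypothetical.

```python
from math import log


def events_needed(hazard_ratio, z_alpha=1.96, z_beta=1.28):
    """Schoenfeld approximation for a two-group comparison with 1:1 allocation."""
    return 4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2


def survival_under_hr(control_rate, hazard_ratio):
    """Survival rate in the experimental arm, assuming proportional hazards."""
    return control_rate ** hazard_ratio


n_events = events_needed(0.60)
pfs_6m = survival_under_hr(0.65, 0.60)
print(f"events needed for HR 0.60: {n_events:.1f}")
print(f"6-month PFS implied by HR 0.60 starting from 65%: {pfs_6m:.1%}")
```

Rounded to whole events, this gives the 161 events listed in Table 2 for a hazard ratio of 0.60, and the implied 6-month PFS rate of about 77.2% matches the figure in the text. Other cells of Table 2 may differ from this approximation by a few events.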
In the report of a trial’s results, any time a p-value is reported or a dif-
ference is called ‘statistically significant’ or ‘not statistically significant’,
a statistical hypothesis test has been performed, but this does not neces-
sarily mean that a sample size calculation was done in advance for the
specific question. A rough check of the statistical power to detect a rel-
evant difference with the obtained sample can be helpful to distinguish
the lack of a relevant effect from lack of statistical power in the inter-
pretation of those exploratory (hypothesis-generating) results, especially
when ‘non-significant’ results are reported without a preceding sample
size calculation. For example, while its sample size calculation was
based on the primary endpoint PFS, the sample size actually achieved
in the RESONATE trial (391 patients) was sufficient to detect with 90%
power a difference in grade 3/4 infection rates of at least 15 percentage
points (e.g. 20% vs. 35%, see Table 2).
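A rough check of this kind can be sketched as follows. The calculation uses the normal approximation for comparing two proportions followed by a continuity correction; this is an assumed stand-in for the exact calculation behind the Fisher's exact test rows of Table 2, so agreement with the table is only approximate. The function name is hypothetical.

```python
from math import sqrt


def total_sample_size(p1, p2, z_alpha=1.96, z_beta=1.28):
    """Approximate total N (1:1 allocation) to compare two proportions.

    Normal approximation with continuity correction; significance level 5%
    (two-sided) and power 90% by default, via the rounded quantiles.
    """
    delta = abs(p1 - p2)
    p_bar = (p1 + p2) / 2
    # Uncorrected sample size per group.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    # Continuity correction, which enlarges the per-group sample size.
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return 2 * n_corrected


n_infection = total_sample_size(0.20, 0.35)
print(f"approximate total sample size for 20% vs. 35%: {n_infection:.0f}")
```

For 20% vs. 35% the result is close to the 394 observations given in Table 2, which is why a trial of 391 patients has roughly 90% power for a difference of this size.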

Table 2 Approximate Number of Events/Observations Needed to Detect an Effect of
the Indicated Size with a Power of 90% in a Two-group Comparison (Allocation Ratio 1:1
Unless Indicated Otherwise) with a Significance Level of 5%
Calculated using the freely available computer program PS-Power and Sample Size (Dupont and Plummer,
1990; available at http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize) or, for the number of events
needed, applying the formula $\text{No. of events} = 4(1.96+1.28)^2 / [\ln(\text{hazard ratio})]^2$ (Schoenfeld, 1983).
Reading examples:
1) To detect a hazard ratio of 0.60, with significance level 5% and statistical power 90%, in two groups of equal
size, 161 observed events are needed.
2) To detect a difference in response rates of 40% vs. 60%, with a significance level of 5% and a statistical
power of 90% in two groups of equal size, 278 observations are needed.
3) To detect a difference in response rates of 40% vs. 60% with a significance level of 5% and a statistical
power of 90% in two groups of sizes 3:1, 372 observations are needed.
4) To detect a difference of half a standard deviation with a significance level of 5% and 90% power in two
groups of equal sizes, 170 observations are needed.
Each cell gives the effect size followed by the sample size needed.

Statistical test (type of endpoint)
Logrank test (time-to-event); sample size = number of events
  HR 0.90: 3786        HR 0.80: 844         HR 0.75: 508         HR 0.67: 262         HR 0.60: 161         HR 0.50: 87
Fisher’s exact test (binary)
  40% vs. 45%: 4186    40% vs. 50%: 1076    40% vs. 55%: 488     40% vs. 60%: 278     40% vs. 70%: 124     40% vs. 80%: 68
  30% vs. 35%: 3766    30% vs. 40%: 992     30% vs. 45%: 460     30% vs. 50%: 268     30% vs. 60%: 124     30% vs. 70%: 72
  20% vs. 25%: 3008    20% vs. 30%: 824     20% vs. 35%: 394     20% vs. 40%: 236     20% vs. 50%: 114     20% vs. 60%: 68
  10% vs. 15%: 1916    10% vs. 20%: 572     10% vs. 25%: 292     10% vs. 30%: 184     10% vs. 40%: 96      10% vs. 50%: 60
  5% vs. 10%: 1240     5% vs. 15%: 414      5% vs. 20%: 226      5% vs. 25%: 150      5% vs. 35%: 82       5% vs. 45%: 54
  1% vs. 6%: 642       1% vs. 11%: 270      1% vs. 16%: 168      1% vs. 21%: 118      1% vs. 31%: 70       1% vs. 41%: 50
  40% vs. 60%, by allocation ratio*
  (6:1)*: 560          (5:1)*: 498          (4:1)*: 430          (3:1)*: 372          (2:1)*: 312          (1:1)*: 278
t-test (normally distributed)
  1/4 SD: 674          1/3 SD: 380          1/2 SD: 170          2/3 SD: 96           1/1 SD: 44           3/2 SD: 20

*allocation ratio. Abbreviations: HR, hazard ratio; SD, standard deviation.
Subgroup Analyses
For the RESONATE trial, PFS effects were shown for subgroups defined by
demographic, clinical, biological and pre-treatment characteristics (Fig-
ure 3). Since an effect in the full patient group was observed, subgroup
analyses ask the question whether the effect is of different size in differ-
ent subgroups of patients. For example, the PFS hazard ratio was similar
in men (0.22) and women (0.21), but the observed effect was larger for
patients treated in the USA (0.12) compared with those treated in Europe
and Australia (0.34). A heterogeneity test can be used to judge whether
such a difference in effect size between subgroups could plausibly have
arisen from random variation alone. For the RESONATE trial, the
heterogeneity test for geographical regions was the only statistically significant
one reported among all the heterogeneity tests for the subgroup analyses.
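One simple way to examine heterogeneity between two subgroups is an approximate z-test on the difference of the log hazard ratios, with standard errors recovered from the reported 95% confidence intervals. The sketch below applies this to the two geographic regions, using the values shown in Figure 3. It is not the method used in the publication, and because the inputs are rounded it should not be expected to reproduce the published heterogeneity p-value exactly. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Log hazard ratio and its standard error, recovered from a 95% CI."""
    return log(hr), (log(ci_high) - log(ci_low)) / (2 * z)


def interaction_test(subgroup_a, subgroup_b):
    """Approximate z-test for a difference between two subgroup hazard ratios."""
    est_a, se_a = log_hr_and_se(*subgroup_a)
    est_b, se_b = log_hr_and_se(*subgroup_b)
    z_stat = (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# (hazard ratio, lower 95% limit, upper 95% limit) as displayed in Figure 3
usa = (0.12, 0.07, 0.23)
europe_other = (0.34, 0.21, 0.56)
z_stat, p_value = interaction_test(usa, europe_other)
print(f"z = {z_stat:.2f}, approximate p = {p_value:.3f}")
```

The approximate test points in the same direction as the reported one: the difference between regions is larger than would usually be expected from random variation alone, although with many subgroups examined such a finding remains hypothesis generating.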
The statistical issues outlined so far in this chapter also apply to subgroup
analyses. Since subgroups are smaller than the total group, the statistical
power will usually not be sufficiently high to test the trial’s primary hypoth-
esis in a subgroup in a confirmatory way. Therefore, p-values for effects in
the individual subgroups should be interpreted with great caution. Instead,
one should look for results of heterogeneity analyses to assess whether the
observed variations in the effects in the different subgroups are likely to be
due to random variation, rather than true differences between the groups.
However, there might be insufficient statistical power to detect heterogene-
ity in a trial powered to detect the effect in the whole sample. Since several
subgroups are often considered, multiple testing is also a critical issue for
the interpretation of these results. For example, it is misleading if a trial
only reports the subgroups with ‘significant’ results without specifying the
number of tests performed (Kirkham et al, 2010).
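The effect of multiple testing on the chance of a spurious finding can be illustrated with simple arithmetic. The sketch below assumes that the tests are independent and that every null hypothesis is true, which is a simplification; the function name is hypothetical. The value 12 corresponds to the number of subgroup factors displayed in Figure 3.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """P(at least one 'significant' result) among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 12, 20):
    print(f"{k:2d} tests: {chance_of_false_positive(k):.1%}")

fwer_12 = chance_of_false_positive(12)
```

With 12 independent tests at the 5% level, the chance of at least one false-positive result is already close to one half, which is why the number of tests performed needs to be reported alongside any 'significant' subgroup result.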
Subgroup analyses can give an indication that results are consistent across
subgroups or identify differences that suggest potential predictive factors
for treatment efficacy. Statistical tests for subgroups, including statistical
tests for heterogeneity of effects, should generally be considered hypoth-
esis generating. Finally, subgroup analyses should be interpreted with
greatest caution when the null hypothesis was not rejected in the trial as
a whole, because the probability of then finding significant effects in sub-
groups just by random variation is expected to be highly inflated.
Subgroup                                  No. of Patients   Hazard Ratio (95% CI)
All patients                              391               0.21 (0.14–0.31)
Disease refractory to purine analogues
  Yes                                     175               0.18 (0.10–0.32)
  No                                      216               0.24 (0.15–0.40)
Chromosome 17p13.1 deletion
  Yes                                     127               0.25 (0.14–0.45)
  No                                      264               0.19 (0.12–0.32)
Age
  <65 yr                                  152               0.17 (0.09–0.31)
  ≥65 yr                                  239               0.24 (0.15–0.40)
Sex
  Male                                    266               0.22 (0.13–0.35)
  Female                                  125               0.21 (0.11–0.40)
Race
  White                                   351               0.21 (0.14–0.31)
  Nonwhite                                40                0.27 (0.07–0.96)
Geographic region
  United States                           192               0.12 (0.07–0.23)
  Europe or other                         199               0.34 (0.21–0.56)
Rai stage at baseline
  0, I, or II                             169               0.19 (0.10–0.37)
  III or IV                               222               0.22 (0.13–0.35)
ECOG score at baseline
  0                                       159               0.26 (0.14–0.48)
  1                                       232               0.18 (0.11–0.30)
Bulky disease
  <5 cm                                   163               0.24 (0.13–0.44)
  ≥5 cm                                   225               0.19 (0.12–0.31)
No. of prior treatment regimens
  <3                                      198               0.19 (0.10–0.36)
  ≥3                                      193               0.21 (0.13–0.34)
Chromosome 11q22.3 deletion
  Yes                                     122               0.14 (0.06–0.29)
  No                                      259               0.26 (0.16–0.40)
β2-microglobulin at baseline
  ≤3.5 mg/liter                           58                0.05 (0.01–0.39)
  >3.5 mg/liter                           298               0.21 (0.14–0.33)
[Forest plot: hazard ratio axis, with the side below 1 labelled ‘Ibrutinib Better’ and the side above 1 labelled
‘Ofatumumab Better’.]
Figure 3 Subgroup analyses for progression-free survival in the RESONATE trial. Shown
are forest plots of hazard ratios for death or disease progression among subgroups
of patients in the ibrutinib group and the ofatumumab group. The size of the circle is
proportional to the size of the subgroup. The dashed vertical line indicates the overall
treatment effect for all patients. The only test for heterogeneity that was significant was
for geographical region (p = 0.02), although the treatment effect remained significant
within each region (p <0.001).
From Byrd JC, Brown JR, O’Brien S, et al. Ibrutinib versus ofatumumab in previously treated chronic lymphoid leukemia.
N Engl J Med 2014; 371:213-223. Copyright © 2014 Massachusetts Medical Society. Reprinted with permission from
Massachusetts Medical Society. Abbreviations: CI, Confidence interval; ECOG, Eastern Cooperative Oncology Group.
Conclusions
Statistical methods are used with the data observed in clinical studies
to distinguish random variation from true differences between groups
in the effects of interest. The specific methods used should correspond
to the scientific questions and the type of the data analysed. Knowledge
of the general principles of statistical hypothesis tests is important to
judge the validity of the reported results. In addition, with respect to the
applicability and reliability of the results for future patients, one should
check whether the group of patients in the study is representative of
the future population and whether study conduct and data analysis were
performed as pre-specified.
Dr Hoster has reported no conflict of interest.
Further Reading
Bender R, Bunce C, Clarke M, et al. Attention should be given to multiplicity
issues in systematic reviews. J Clin Epidemiol 2008; 61:857–865.
Clarke M, Halsey J. DICE 2: a further investigation of the effects of chance in
life, death and subgroup analyses. Int J Clin Pract 2001; 55:240–242.
DeMets DL, Lan KK. Interim analysis: the alpha spending function approach.
Stat Med 1994; 13:1341–1352; discussion 1353–1356.
EMA, ICH Topic E 9 Note for Guidance on Statistical Principles for Clinical
Trials (CPMP/ICH/363/96) 1998. Available from: http://www.ema.europa.eu
Kleinbaum DG, Klein M. Survival Analysis, A Self-Learning Text, 3rd edition.
New York: Springer Verlag, 2012.
Matthews JN, Altman DG. Interaction 2: Compare effect sizes not P values. BMJ
1996; 313:808.
Matthews JN, Altman DG. Interaction 3: How to examine heterogeneity. BMJ
1996; 313:862.
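Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration:
updated guidelines for reporting parallel group randomised trials.
BMJ 2010; 340:c869.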
Peto R. Current misconception 3: that subgroup-specific trial mortality results
often provide a good basis for individualising patient care. Br J Cancer 2011;
104:1057–1058.
Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 state-
ment: updated guidelines for reporting parallel group randomised trials. BMJ
2010; 340:c332.
Sterne JA, Davey Smith G. Sifting the evidence – what’s wrong with significance
tests? BMJ 2001; 322:226–231.
Sun X, Ioannidis JP, Agoritsas T, et al. How to use a subgroup analysis: users’
guide to the medical literature. JAMA 2014; 311:405–411.
Wang R, Lagakos SW, Ware JH, et al. Statistics in medicine – reporting of sub-
group analyses in clinical trials. N Engl J Med 2007; 357:2189–2194.
References
Byrd JC, Brown JR, O’Brien S, et al. Ibrutinib versus ofatumumab in previously
treated chronic lymphoid leukemia. N Engl J Med 2014; 371:213–223.
DeMets DL, Lan KK. Interim analysis: the alpha spending function approach.
Stat Med 1994; 13:1341–1352; discussion 1353–1356.
Dupont WD, Plummer WD Jr. Power and sample size calculations. A review and
computer program. Control Clin Trials 1990; 11:116–128.
Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New
York: John Wiley & Sons, 1980.
Kirkham JJ, Dwan KM, Altman DG, et al. The impact of outcome reporting bias
in randomised controlled trials on a cohort of systematic reviews. BMJ 2010;
340:c365.
McPherson K. Statistics: the problem of examining accumulating data more than
once. N Engl J Med 1974; 290:501–502.
Schoenfeld DA. Sample-size formula for the proportional-hazards regression
model. Biometrics 1983; 39:499–503.
9 Systematic Reviews: A Key to Support Evidence-Informed Decision Making
I.D. Florez¹
O. Levine²
M.C. Brouwers³
¹ Health Research Methodology Program, McMaster University, Hamilton, Canada;
Department of Pediatrics, Universidad de Antioquia, Medellin, Colombia
² Department of Oncology, McMaster University, Hamilton, Canada
³ Department of Oncology and Department of Health Research Methods,
Evidence and Impact, McMaster University, Hamilton, Canada; Escarpment
Cancer Research Institute, Hamilton, Canada
Introduction
A systematic review is a knowledge synthesis method that collects and
critically analyses multiple research studies on a specific topic (Higgins
and Green, 2011). The key steps of a good-quality systematic review as
described in Table 1 are: a clear research question; explicit eligibility crite-
ria; comprehensive search for and selection of studies; critical evaluation;
and interpretation of results. As shown in Table 2, a systematic review can
be performed for different scientific purposes. Some myths about system-
atic reviews are listed in Table 3. The aim of this chapter is to help readers to
better interpret the results of a systematic review and to recognise the steps
required to conduct a high-quality systematic review.
When is a Systematic Review Needed?

A comprehensive search for a systematic review is warranted when looking
for evidence to inform your clinical practice or when deciding to conduct
a review yourself. Most bibliographic search engines will provide an
option to limit a search by publication type, while others are specifically
for reviews (e.g. Cochrane Library, epistemonikos.org). In addition, there
is a prospective registry of health-related systematic reviews that are in
development (http://www.crd.york.ac.uk/PROSPERO/). Finding an appropriate
existing review, or one in development, may avoid duplication in effort and
save a considerable amount of work, time and expense.

Table 1 Key Steps to Perform a Good Systematic Review
• PICO (details in Case 1): definition of research question
• Precise description and application of eligibility criteria
• Summary of literature search and definition of study selection process
• Explicit definition of data to be extracted
• Application of a critical quality assessment strategy for the individual selected studies (e.g. RoB tool)
• Appropriate strategies of synthesis of the results (e.g. qualitative description, meta-analysis)
• Evaluation of the overall body of evidence (e.g. GRADE approach)
• Interpretation and conclusions in balance with body of the evidence
• Further:
  • Inclusion of search strategy (often in the appendix) and flow diagram of the search and selection process of studies
  • Inclusion of tables with description of study characteristics, quality of studies, study findings and quality of body of evidence
  • Inclusion of funnel plots and forest plots, when appropriate
Abbreviations: PICO, Patients/population, Intervention, Control/comparator, and Outcomes; RoB, Risk of Bias;
GRADE, Grading of Recommendations, Assessment, Development and Evaluation.

Table 2 Use and Significance of a Systematic Review
• Determine current knowledge in the field of the body of research. Particularly helpful when individual studies on
the same topic provide inconsistent findings or include low numbers of participants
• Identify research gaps and duplications in research effort
• Form evidence base for clinical practice guidelines or health care policy
• Justify the need for a new trial
• Faster and cheaper than conducting a new study

Table 3 Myths about Systematic Reviews
• You cannot conduct a systematic review if you do not have randomised trials. FALSE. Systematic reviews are
not dependent on a particular study design. You need to choose study designs that are most appropriate to
answer the research question. However, in reviews of the effects of interventions, selection of randomised
trials can facilitate formulation of more definitive conclusions.
• Systematic reviews and meta-analyses are the same thing. FALSE. Meta-analysis is a quantitative technique used
to synthesise results of studies in a review. Many systematic reviews do not include meta-analyses because of
the high degree of heterogeneity between the selected studies.
• Systematic reviews can compensate for problems inherent in the poor-quality primary studies they include. FALSE.
Even a really well-conducted systematic review cannot fix problems of poorly conducted primary studies.
• Systematic reviews are the only knowledge synthesis method available. FALSE. There are many knowledge
synthesis methods, including those that focus on synthesising qualitative studies or those aimed at looking at
the literature to build a new concept or idea. See Further Reading for more information.
If a relevant systematic review exists, you will need to ask yourself if it
is appropriate to use by considering its quality, currency and relevance.
■ Is it of good quality? Important methodological elements that can dif-
ferentiate systematic reviews of varying quality are described below.
■ Does it apply to the current situation? Have new primary studies emerged
since the most recent search? Do the results of these studies change the
conclusions or summary effect estimates of the systematic review?
■ Is it relevant in the current clinical landscape? Is the object of study,
such as a treatment, clinically relevant? For example, an influential post-
operative radiotherapy (PORT) meta-analysis in resected lung cancer
demonstrated that radiation is detrimental for patients with early-stage
disease (PORT Meta-analysis Trialists Group, 2005). However, the use
of cobalt-60 (an outdated radiation technology) in some of the included
trials raises a question of whether these conclusions are still valid and
whether modern radiation therapy techniques could perform better.
After searching and appraising the existing literature, you may find that a
systematic review is still needed to answer a clinical question.
Example 1
Until a few years ago, conflicting study results caused much contro-
versy around the use of bisphosphonate medications in the adjuvant
setting for women with early-stage breast cancer. Ultimately, a well-
conducted systematic review of the literature and individual patient
data meta-analysis including data for more than 18 000 women pro-
vided some clarity. Pooled results showed benefit in recurrence and
survival outcomes with use of bisphosphonates among postmenopau-
sal women, whereas benefit in premenopausal women was not estab-
lished (Early Breast Cancer Trialists’ Collaborative Group, 2015a).

Example 2
A systematic review evaluating surgical management of patients with
lymph node metastases from cutaneous melanoma of the trunk or
extremities was used as the evidentiary basis for a clinical practice
guideline, which was released in 2012 and endorsed in 2016 (Easson
et al, 2012). In 2017, a new study publication contradicted the findings
of the review (Faries et al, 2017): this high-quality trial found that
complete lymph node dissection was not a good option for patients with a
positive sentinel lymph node, contrary to what had been concluded earlier
in the systematic review.
These examples show that systematic reviews require updating over time
to ensure the completeness of the evidence base and validity of the con-
clusions that can be made.
Formulating the Question

A well-defined clinical question is essential for a high-quality system-
atic review. When defining the question, much attention should be paid
to definition of the Population, Interventions or exposures, Comparators
and Outcomes (PICO) (see Case 1).
Defining Eligibility Criteria

To identify which studies and sources of data would be eligible for the
review, a set of criteria is required. Inclusion criteria depend on the
defined question and are based on the elements of the PICO research ques-
tions (see Case 1). Furthermore, characteristics of eligible studies should
be specified, such as study design (randomised trials [and what phase],
observational studies [and what kind]); year of conduct; publication sta-
tus (full papers alone or with conference abstracts or other types of ‘grey’
literature); and language of publication. The choice of eligibility criteria
and their parameters have a major impact on the complexity of the review,
the time needed to complete it, the degree of difficulty in the interpretation
of the data, and the kind of the conclusions that can be drawn.
122 Florez et al.

Case 1 PICO and Generation of a Good Systematic Review Question
PICO criteria
Population: Should define the disease parameters (e.g. tumour site,
stage), patient characteristics (e.g. age, gender) and other clinical fea-
tures (e.g. performance status, presence of comorbid conditions, prior
exposure to therapies).
Interventions or exposures: Should define the condition being inves-
tigated (e.g. diagnostic procedure, treatment). The options of inter-
est should be similar enough to enable synthesis. For example, some
technologies (X-ray and low-dose computed tomography [CT] scans)
are in principle similar but different enough that combining them
together is, from a clinical point of view, not meaningful. In contrast,
classes of medications such as aromatase inhibitors may be similar
enough that all drugs in the class could be regarded as sufficiently
similar and synthesis performed (Early Breast Cancer Trialists’ Col-
laborative Group, 2015b).
Comparator: What is the most appropriate and relevant option to
which the intervention should be compared? If a new treatment option
is being considered, it might be compared to placebo or to the current
standard of care. If a standard of care exists, will a comparison of the
new treatment to placebo yield clinically useful information?
Outcomes: The question should indicate the primary and second-
ary outcomes of interest. Defining and applying common methods to
measure the outcomes are essential. Systematic reviews of cancer tri-
als often assess survival outcomes (e.g. overall survival, recurrence-
free survival, progression-free survival), response (e.g. complete
and partial responses), quality of life and adverse effects. Criteria
have emerged to help to better define outcomes (e.g. RECIST, Eisen-
hauer et al, 2009) and should be used when possible. Initiatives such
as COMET (Core Outcome Measures in Effectiveness Trials) are
facilitating the development of core outcome sets for specific areas
of health care and research.

Example: Clinical scenario
In the midst of a busy clinic, you are seeing Mrs CC, a 55-year-old
woman with metastatic colon cancer. She wishes to discuss the results
of a restaging CT scan after completing 6 months of first-line combi-
nation chemotherapy. You are satisfied with the results showing stable
disease and you recommend that she continues with the treatment.
However, she wants to know if a chemo-holiday would be a reason-
able option for her. She discloses that 6 months of chemotherapy
have been exhausting. With recent advances in chemotherapy and
targeted drugs for advanced colorectal cancer, the median survival
exceeds 2 years. Many studies have been conducted to investigate stop-start
and de-intensified regimens and their impact on several outcomes.
PICO application
What would be a good research question for a systematic review that
could address this clinical challenge?
Population: Patients with advanced colorectal cancer (CRC). Since oligo-
metastatic disease is often managed with curative-intent resection, we
specify that we are interested only in patients with unresectable disease.
Intervention: Interventions involving a treatment break or a period
of maintenance therapy with a de-intensified regimen. Since multiple
treatment regimens are used for advanced CRC, we should not restrict
our search to any specific protocol.
Comparison: The comparison of interest is the use of continuous
treatment without a chemo-holiday.
Outcomes: In the presence of metastatic disease, earlier progression would
not be surprising for patients taking a chemo-holiday but does it have a
detrimental effect on overall survival? Does taking a chemo-holiday con-
tribute to an improvement of quality of life? We therefore could conclude
that the outcomes of interest are overall survival and quality of life.
A recent systematic review and meta-analysis by Berry et al (2015)
summarises the literature relating to this clinical challenge.
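The PICO elements of the chemo-holiday question can also be written down as a small structured record, which makes it harder to leave one element undefined. The following is only an illustrative sketch: the class name `PicoQuestion` and the method `as_question` are hypothetical, and the wording of the fields is paraphrased from the case above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """Hypothetical container for the four PICO elements of a review question."""
    population: str
    intervention: str
    comparator: str
    outcomes: tuple

    def as_question(self) -> str:
        # Assemble the four elements into a single answerable question.
        return (
            f"In {self.population}, does {self.intervention} "
            f"compared with {self.comparator} affect "
            f"{' and '.join(self.outcomes)}?"
        )


# Elements paraphrased from the clinical scenario in Case 1.
chemo_holiday = PicoQuestion(
    population="patients with unresectable advanced colorectal cancer",
    intervention="a treatment break or de-intensified maintenance therapy",
    comparator="continuous treatment without a chemo-holiday",
    outcomes=("overall survival", "quality of life"),
)

print(chemo_holiday.as_question())
```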

Example 3
A systematic review that only includes randomised trials usually
facilitates more definitive conclusions on the effects of an interven-
tion, but, depending on the study question, data from observational
studies could also be informative, for instance in the case of long-term
adverse effects.
Search Strategy
When the research question and eligibility criteria have been defined,
a literature search is carried out to identify relevant studies. The search
strategy should be comprehensive, transparent and reproducible. A
careful selection of relevant databases to search is required and usu-
ally includes MEDLINE, Embase and the Cochrane Central Register of
Controlled Trials (CENTRAL). Additional relevant data may be obtained
by checking the references of eligible publications, searching conference
abstracts and making personal contact with experts in the field.
When conducting a search, one should be aware of publication bias:
low-volume, indeterminate or negative trials are less likely to be pub-
lished in peer-reviewed literature compared with high-volume, defini-
tive and positive trials. Trials with major impact and those showing
positive results are more likely to be published in English-language
journals. This may translate into an over-estimate of any benefit of
the intervention in a pooled analysis. A funnel plot should be used to
check for publication bias (Higgins et al, 2011). In this type of graph, a
measure of precision (such as the standard error or the width of the 95%
confidence interval [CI]) is plotted against a point estimate of the effect
(such as hazard ratio or odds ratio) to explore the likelihood of missing
studies of a particular size or with a particular result (see Case 2). A
full and symmetrical distribution of data points in the triangle shape
suggests a low risk of publication bias or selective outcome reporting.

Case 2 Funnel Plot
Funnel plot
A funnel plot helps to visually assess the risk of publication bias. In the
example below, a meta-analysis was conducted to assess disease-free
survival (DFS) and risk associated with positive margin after resection
of liver metastasis in patients with colorectal cancer. A measure of pre-
cision (standard error of log odds ratio of DFS) is plotted against effect
estimate (log odds ratio for DFS) for each of the 18 studies included in
the meta-analysis. The funnel plot shows a symmetrical and triangular
distribution suggesting the absence of publication bias. If the distribu-
tion was not symmetrical, there would be concern that the published
literature was not comprehensive and the validity of the meta-analysis
would be questioned. Small, indeterminate or negative trials are less
likely to be published in peer-reviewed literature than those that are
large, definitive and showing positive results. This pattern can often be
identified with the use of a funnel plot.
Funnel plot: Assessment for publication bias in meta-analysis of effect of positive margin
on disease-free survival after resection of liver metastasis in patients with colorectal cancer.
From Dhir M, Lyden ER, Smith LM, Are C. Influence of margins on disease free survival following hepatic
resection for colorectal metastasis: a meta-analysis. Indian J Surg Oncol 2012; 3:321–329. By permission
of Springer Nature, Indian Journal of Surgical Oncology.
[Figure: Funnel plot for 18 studies included in meta-analysis. Horizontal axis: log odds ratio (-1.5 to 1). Vertical axis: standard error of log odds ratio (0.1 to 0.5).]
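The two coordinates plotted for each study in a funnel plot of this kind can be derived from the study's 2×2 table. The sketch below is illustrative only: the function name `funnel_point` is hypothetical, the counts are invented and are not taken from Dhir et al, and the standard large-sample formula for the standard error of a log odds ratio is assumed.

```python
import math


def funnel_point(events_a, no_events_a, events_b, no_events_b):
    """Return (log odds ratio, standard error) for one 2x2 table.

    Assumes no cell is zero; zero cells would need a continuity correction.
    """
    log_or = math.log((events_a * no_events_b) / (no_events_a * events_b))
    se = math.sqrt(
        1 / events_a + 1 / no_events_a + 1 / events_b + 1 / no_events_b
    )
    return log_or, se


# Hypothetical counts: (events, non-events) in group A, then in group B.
studies = {
    "Study A": (10, 90, 20, 80),
    "Study B": (30, 270, 45, 255),
    "Study C": (5, 45, 6, 44),
}

points = {name: funnel_point(*counts) for name, counts in studies.items()}
for name, (log_or, se) in points.items():
    # Each study becomes one dot: x = log odds ratio, y = standard error.
    print(f"{name}: log OR = {log_or:.3f}, SE = {se:.3f}")
```

Larger studies have smaller standard errors and therefore sit near the top of the plot, while small studies scatter more widely at the bottom, which is what produces the funnel shape.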

Study Selection
To identify studies that meet eligibility criteria, titles and abstracts
retrieved by the search must be checked in detail. Those articles that are
potentially eligible must be obtained in full and reviewed to assess the
final eligibility for inclusion in the review. The process of study selection
is often carried out independently by two researchers, using a consensus
strategy to resolve any disagreements. Study flow diagrams should be
used to describe the process of study selection.
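The numbers reported in a study flow diagram follow from simple arithmetic on the screening steps. The sketch below uses invented counts and a hypothetical function name, `flow_counts`, purely to illustrate how the figures at each stage relate to one another.

```python
def flow_counts(identified, duplicates, excluded_on_abstract, excluded_on_full_text):
    """Derive the counts shown at each stage of a study flow diagram."""
    screened = identified - duplicates
    full_text_assessed = screened - excluded_on_abstract
    included = full_text_assessed - excluded_on_full_text
    if min(screened, full_text_assessed, included) < 0:
        raise ValueError("exclusions exceed the number of records available")
    return {
        "identified": identified,
        "screened": screened,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }


# Hypothetical search: 1200 records, 200 duplicates, 900 excluded on
# title/abstract and 80 excluded after full-text review.
flow = flow_counts(1200, 200, 900, 80)
for stage, n in flow.items():
    print(f"{stage}: {n}")
```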
Assessing the Quality of the Studies and the Body of Evidence
In order to determine the reliability of the results of eligible studies,
quality assessments should be carried out on studies that meet eligibility
criteria. Both the quality of the individual selected studies and the quality
of the whole body of evidence should be assessed.
1. Quality of the studies: The Cochrane Risk of Bias (RoB) tool is
often used for individual randomised trials (Higgins et al, 2011). This
tool evaluates the risk of key biases, namely systematic differences
between the tested groups in the trial that could influence the estimate:
selection, performance, measurement, attrition and reporting biases (see
Case 3). For each bias, the risk is categorised as low, unclear or high.
Results of these assessments of study quality should be illustrated by
figures and tables in the systematic review. Critical evaluation is
warranted for all study designs, not just randomised trials, and
appropriate tools for each study design should be used. For example, the
Newcastle-Ottawa scale is widely used for observational studies (Higgins and
Green, 2011) and a Cochrane RoB tool for non-randomised trials has
recently been developed (Sterne et al, 2016).
2. The evaluation of the whole body of evidence is generally performed
per outcome. This assessment takes into account not only the usual RoB
criteria but also additional elements. The Grading of Recommendations,
Assessment, Development and Evaluation (GRADE) approach has
emerged as an accepted method to assess the quality of the body of the
evidence; it is described later in this chapter (Guyatt et al, 2011a).
Case 3 Risk of Bias: Critical Quality Assessment of Individual Trials
Risk of Bias (RoB) criteria

Selection bias: When patients in the intervention and comparison
groups differ at baseline on important features. For example, if patients
in the control group have worse performance status than those in the
intervention group, the effects of the intervention may be overesti-
mated. To prevent this, robust randomisation and allocation proce-
dures should be implemented.
Performance bias: When clinicians are aware of the allocation of patients
and may therefore provide different types of care (e.g. additional tests
performed for the control groups). To avoid this, patients and providers
should be blinded to the allocation.
Measurement bias: When outcome measurements are consciously or
unconsciously altered because the investigator is aware of the group to
which a patient is allocated. To reduce this bias, the outcome
investigators should be blinded to the allocation.
Attrition bias: When reasons for losses of patients during follow-up
are not clear or are related to the side effects of the interventions. Ide-
ally, losses should be low, balanced among the groups, disclosed and
strategies to mitigate them described (e.g. intention to treat analyses
and imputation methods).
Reporting bias: When outcome results are not fully reported in the
publication of the study.
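Per-domain judgements of this kind are often recorded in a table and then summarised for each trial. The sketch below is one possible way to do this and is not part of the RoB tool itself: the names `DOMAINS`, `RISK_LEVELS` and `overall_risk` are hypothetical, and the summary rule (any high-risk domain gives a high overall rating, otherwise any unclear domain gives an unclear rating) is an assumption that treats all five domains as equally important.

```python
DOMAINS = ("selection", "performance", "measurement", "attrition", "reporting")
RISK_LEVELS = ("low", "unclear", "high")


def overall_risk(judgements):
    """Summarise per-domain judgements into one rating (assumed rule)."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    values = [judgements[d] for d in DOMAINS]
    if any(v not in RISK_LEVELS for v in values):
        raise ValueError("each judgement must be low, unclear or high")
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"


# Hypothetical trial: open-label design, otherwise well conducted.
trial = {
    "selection": "low",
    "performance": "high",
    "measurement": "unclear",
    "attrition": "low",
    "reporting": "low",
}
print(overall_risk(trial))
```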
Data Extraction from Studies

Extracting data from studies requires careful planning, designing and
testing of a data extraction form tailored to the study question and out-
comes of interest (see Chapter 7). The extraction form should contain:
■ General information about the study (e.g. author, title, journal, year, design)
■ Quality assessment
■ Population characteristics
■ Intervention characteristics
■ Outcomes
It is common practice for reviewers to perform data extraction inde-
pendently with a pre-planned strategy to resolve any disagreements.
To improve transparency, the completed data extraction forms should
be included in the final report of the systematic review. In some cases,
the reviewers may need to contact the authors of the selected studies to
obtain additional information.
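When two reviewers extract data independently, their completed forms can be compared field by field so that only the discrepancies need to be discussed. The sketch below is a minimal illustration: the form fields are a hypothetical subset of the items listed above, and the names `ExtractionForm` and `disagreements`, as well as the example values, are invented.

```python
from dataclasses import dataclass, asdict


@dataclass
class ExtractionForm:
    """Hypothetical, much-reduced data extraction form."""
    study_id: str
    design: str
    n_patients: int
    intervention: str
    primary_outcome: str


def disagreements(form_a, form_b):
    """Return the fields on which two reviewers' forms differ."""
    a, b = asdict(form_a), asdict(form_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


# Two reviewers extract the same (invented) study independently.
reviewer_1 = ExtractionForm("Trial-01", "randomised trial", 120,
                            "intermittent chemotherapy", "overall survival")
reviewer_2 = ExtractionForm("Trial-01", "randomised trial", 102,
                            "intermittent chemotherapy", "overall survival")

# Only the mismatching field is flagged for resolution.
print(disagreements(reviewer_1, reviewer_2))
```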
Synthesis
Once the data extraction is finished and the dataset is complete, it is
ready for the synthesis. The included studies should be listed in a sum-
mary table, along with their most important information. Text summaris-
ing this body of data should also be provided. This is called qualitative
synthesis and is important for all systematic reviews.
Sometimes, the individual studies are sufficiently homogeneous to justify
a quantitative synthesis using meta-analysis to provide an overall, average
estimate of the effect of the intervention. Meta-analysis is the
statistical combination of the results in order to obtain one summary
effect estimate for each outcome. Information about each study (e.g.
number of events or means and standard deviations of the outcome, in each
treatment group) and the final combined estimate with its CI are displayed
in a forest plot (see Case 4) (Lewis and Clarke, 2001).
The major issue when combining results across studies is heterogeneity,
i.e. the degree of inconsistency across the studies. Heterogeneity can
be assessed visually (using the forest plot), by the Q test, or by the I²
index (Higgins and Green, 2011). When the results of the included studies
are very heterogeneous (e.g. forest plot patterns are not consistent; the
Q statistic is significant or the I² percentage is large), it is
recommended not to present the results of a meta-analysis. Instead, the
results of the studies can be presented narratively (qualitative synthesis)
or causes of heterogeneity might be explored statistically through subgroup
analyses. These analyses separate the studies into categories based on
predetermined patient (e.g. male versus female) or intervention
characteristics (e.g. high dose versus low dose) and meta-analysis is
performed on each category of studies separately. If the resulting subgroup
I² percentages are substantially smaller than those in the overall
meta-analysis (or between the subgroups), it can be assumed that these
features explain heterogeneity, to some degree.
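The Q statistic and the I² index can be computed from the study effect estimates and their variances. The sketch below assumes the usual inverse-variance definitions (Q as the weighted sum of squared deviations from the pooled estimate, and I² as the percentage of Q in excess of its degrees of freedom); the function name `heterogeneity` and the input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I2 in percent) for a set of studies.

    effects:   study effect estimates on an additive scale (e.g. log risk ratios)
    variances: the corresponding within-study variances
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Two hypothetical studies with very different results and equal precision.
q, df, i2 = heterogeneity([0.0, 2.0], [0.25, 0.25])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```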
After the results have been combined, the overall quality of the body of
evidence for each outcome needs to be assessed. The GRADE approach
is a customary methodological strategy for this process (Guyatt et al,
2011a). It is based on five criteria: RoB, imprecision, indirectness,
inconsistency and publication bias (Higgins et al, 2011; Guyatt et al,
2011a, 2011b, 2011c, 2011d):
1. RoB evaluates the risk of key biases in the individual selected studies.
2. Imprecision can be determined by looking at the width of the CI
around the point estimates, and analysing how results could change if
the true effect falls to either extreme of the interval.
3. Indirectness of the evidence considers how study features compare
to the populations and interventions of interest.
4. Inconsistency refers to the heterogeneity (described above) of the
results of the individual studies.
5. Publication bias refers to the likelihood of bias because of incom-
plete inclusion of all the available evidence, for example because the
results of a study are not available or results are not available from
all studies for some of the outcomes, because of selective outcome
reporting.
The body of evidence can be graded up or down depending on the extent
to which these features are exhibited. With the use of all this information,
conclusions about the body of evidence can be drawn.
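The grading logic can be illustrated with a simple counter. This is a deliberately simplified sketch and not the GRADE method itself: it assumes the four GRADE quality levels, that evidence from randomised trials starts as high and evidence from observational studies as low, and that each of the five criteria lowers the rating by the number of levels entered. The names `GRADE_LEVELS`, `CRITERIA` and `grade_quality` are hypothetical, and real GRADE assessments rely on judgement rather than arithmetic.

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]
CRITERIA = ("risk of bias", "imprecision", "indirectness",
            "inconsistency", "publication bias")


def grade_quality(randomised, downgrades, upgrades=0):
    """Return a quality level for one outcome (simplified illustration)."""
    unknown = [c for c in downgrades if c not in CRITERIA]
    if unknown:
        raise ValueError(f"unknown criteria: {unknown}")
    start = 3 if randomised else 1  # index into GRADE_LEVELS
    index = start - sum(downgrades.values()) + upgrades
    # The rating cannot go below "very low" or above "high".
    return GRADE_LEVELS[max(0, min(3, index))]


# Hypothetical outcome from randomised trials with wide confidence intervals.
print(grade_quality(True, {"imprecision": 1}))
```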

Case 4 Forest Plot
The forest plot (Lewis and Clarke, 2001) is the most important figure of
a systematic review when a meta-analysis is conducted. In a review by
Teuffel et al (2011), 14 studies were included to determine the efficacy
of the outpatient management of patients with febrile neutropaenia. The
results for the treatment failure outcome (6 studies) are displayed in the
figure below. Each row is a study and displays the number of events,
the number of patients in each group and the effect estimate (risk ratio
[RR]), with its 95% CI. At the bottom of the figure is a final measure of
the pooled estimate (shown as the diamond) and 95% CI derived from the
6 included studies: RR 0.81; 95% CI 0.55–1.19. Note that the CI crosses
the line of no effect (RR=1). We therefore conclude that there is not
a statistically significant difference between inpatient and outpatient
management in terms of treatment failure.
The forest plot also shows the subgroup analysis based on age, displaying
results for adults and children separately. Finally, the I² statistic
is displayed at the bottom of the plot for the subgroups (children and
adults) and for the total population estimate. It is 0% in all cases, which
indicates no detectable statistical heterogeneity among the results.
Forest plot: Treatment failure in management of febrile neutropaenia, inpatient versus outpatient.
Teuffel O, Ethier MC, Alibhai SM, et al. Outpatient management of cancer patients with febrile neutropenia: a
systematic review and meta-analysis. Ann Oncol 2011; 22:2358-2365. By permission of Oxford University
Press on behalf of the European Society for Medical Oncology.
                        Inpatient        Outpatient                Risk ratio
Study or subgroup       Events  Total    Events  Total    Weight   M-H, Random, 95% CI   Year
Adults
  Innes                      6     60        10     66     16.5%   0.66 [0.26, 1.71]     2003
  Rapoport                   2     42         4     38      5.5%   0.45 [0.09, 2.33]     1999
  Hidalgo                    6     47        10     48     17.3%   0.61 [0.24, 1.55]     1999
  Malik                     19     85        19     84     47.6%   0.99 [0.56, 1.73]     1995
  Subtotal (95% CI)               234               236     86.9%   0.79 [0.52, 1.20]
  Total events              33               43
  Heterogeneity: Tau² = 0.00; Chi² = 1.49, df = 3 (P = 0.68); I² = 0%
  Test for overall effect: Z = 1.10 (P = 0.27)
Children
  Ahmed                      2     58         3     61      4.9%   0.70 [0.12, 4.05]     2007
  Santolaya                  4     71         4     78      8.2%   1.10 [0.29, 4.23]     2004
  Subtotal (95% CI)               129               139     13.1%   0.93 [0.32, 2.71]
  Total events               6                7
  Test for overall effect: Z = 0.13 (P = 0.89)
Total (95% CI)                    363               375    100%     0.81 [0.55, 1.19]
Total events                39               50
Test for overall effect: Z = 1.08 (P = 0.28)
[Plot: risk ratios on a logarithmic axis (0.01, 0.1, 1, 10, 100); left side labelled Inpatient, right side labelled Outpatient.]
Abbreviations: CI, confidence interval; M-H, Mantel-Haenszel.
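The pooled estimate in this forest plot can be approximated from the event counts in the table. The sketch below is not the method used in the published analysis, which reports a Mantel-Haenszel random-effects model: it assumes an inverse-variance random-effects model with the DerSimonian-Laird estimate of between-study variance, and the function names `log_rr` and `pool_random_effects` are hypothetical. Because the between-study variance is estimated as zero for these six studies, the simplified calculation gives the same rounded result as the plot.

```python
import math

# (study, inpatient events, inpatient total, outpatient events, outpatient total)
DATA = [
    ("Innes", 6, 60, 10, 66),
    ("Rapoport", 2, 42, 4, 38),
    ("Hidalgo", 6, 47, 10, 48),
    ("Malik", 19, 85, 19, 84),
    ("Ahmed", 2, 58, 3, 61),
    ("Santolaya", 4, 71, 4, 78),
]


def log_rr(a, n1, c, n2):
    """Log risk ratio and its approximate variance for one study."""
    return math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2


def pool_random_effects(data):
    """Return (RR, lower 95% limit, upper 95% limit, I2 in percent)."""
    ys, vs = zip(*(log_rr(*row[1:]) for row in data))
    w = [1 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    # DerSimonian-Laird estimate of between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), i2)


rr, lower, upper, i2 = pool_random_effects(DATA)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), I2 = {i2:.0f}%")
# prints: RR 0.81 (95% CI 0.55 to 1.19), I2 = 0%
```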

Conclusions
Systematic reviews are important scientific studies that are crucial
in facilitating the use of evidence in clinical practice with the aim of
improving population health, clinical outcome of cancer patients and the
strength of cancer healthcare systems. Like any study design, the qual-
ity of systematic reviews can be variable. This chapter should help with
assessing the quality of systematic reviews and recognising the key steps
in the process of conducting a high-quality systematic review.
Dr Florez has reported no conflict of interest.
Dr Levine has reported no conflict of interest.
Dr Brouwers has reported no conflict of interest.
Further Reading and Resources

Conducting a systematic review and meta-analysis
Cochrane. http://www.cochrane.org (24 January 2018, date last accessed).
• Methods: http://methods.cochrane.org
• RevMan Software: http://tech.cochrane.org/revman
• Training: http://training.cochrane.org
Reporting a systematic review and meta-analysis

Enhancing the QUAlity and Transparency Of health Research (EQUATOR)
Network. PRISMA: http://www.equator-network.org/reporting-guidelines/
Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting sys-
tematic reviews and meta-analyses of studies that evaluate health care interven-
tions: explanation and elaboration. Ann Intern Med 2009; 151:W65–W94.
Moher D, Liberati A, Tetzlaff J, Altman DG, and the PRISMA Group. Preferred
reporting items for systematic reviews and meta-analyses: the PRISMA state-
ment. Ann Intern Med. 2009; 151:264–269.
Synthesis resources
Canadian Institutes of Health Research. http://www.cihr-irsc.gc.ca (24 January
2018, date last accessed).

History of systematic reviews
Chalmers I, Hedges LV, Cooper H. A brief history of research synthesis. Eval
Health Prof 2002; 25:12–37.
Clarke M. History of evidence synthesis to assess treatment effects: personal reflec-
tions on something that is very much alive. J R Soc Med 2016; 109:154–163.
Page MJ, Shamseer L, Altman DG, et al. Epidemiology and reporting character-
istics of systematic reviews of biomedical research: a cross-sectional study.
PLoS Med 2016; 13:e1002028.
References
Berry SR, Cosby R, Asmis T, et al. Continuous versus intermittent chemotherapy
strategies in metastatic colorectal cancer: a systematic review and meta-analy-
sis. Ann Oncol 2015; 26:477–485.
Dhir M, Lyden ER, Smith LM, Are C. Influence of margins on disease free survival
following hepatic resection for colorectal metastasis: a meta-analysis. Indian J
Surg Oncol 2012; 3:321–329.
Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Adjuvant bispho-
sphonate treatment in early breast cancer: meta-analyses of individual patient
data from randomised trials. Lancet 2015a; 386:1353–1361.
Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Aromatase inhibi-
tors versus tamoxifen in early breast cancer: patient-level meta-analysis of the
randomised trials. Lancet 2015b; 386:1341–1352.
Easson AM, Cosby R, McCready DR, et al. Surgical management of patients with
lymph node metastases from cutaneous melanoma of the trunk or extremities.
Easson A, Salerno J (Reviewers). Toronto, ON: Cancer Care Ontario; 2012
Dec 4 [Endorsed 2016 Oct 3]. Program in Evidence-based Care Evidence-
Based Series No.: 8-6 Version 2 ENDORSED.
Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in
solid tumours: Revised RECIST guideline (version 1.1). Eur J Cancer 2009;
45:228–247.
Faries MB, Thompson JF, Cochran AJ, et al. Completion dissection or observation
for sentinel-node metastasis in melanoma. N Engl J Med 2017; 376:2211–2222.
Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction—
GRADE evidence profiles and summary of findings tables. J Clin Epidemiol
2011a; 64:383–394.
Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 6. Rating the quality of
evidence—imprecision. J Clin Epidemiol 2011b; 64:1283–1293.
Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 7. Rating the quality of
evidence—inconsistency. J Clin Epidemiol 2011c; 64:1294–1302.

Guyatt GH, Oxman AD, Montori V, et al. GRADE guidelines: 5. Rating the quality
of evidence—publication bias. J Clin Epidemiol 2011d; 64:1277–1282.
Higgins JP, Altman DG, Gøtzsche PC, et al. The Cochrane Collaboration’s tool for
assessing risk of bias in randomised trials. BMJ 2011; 343:d5928.
Higgins JPT, Green S (Editors). Cochrane Handbook for Systematic Reviews of
Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collabo-
ration, 2011. Available from http://handbook-5-1.cochrane.org (2 February
2018, date last accessed).
Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001;
322:1479–1480.
PORT Meta-analysis Trialists Group. Postoperative radiotherapy for non-small
cell lung cancer. Cochrane Database Syst Rev 2005; 18:CD002142.
Sterne JA, Hernán MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of
bias in non-randomised studies of interventions. BMJ 2016; 355:i4919.
Teuffel O, Ethier MC, Alibhai SM, et al. Outpatient management of cancer patients
with febrile neutropenia: a systematic review and meta-analysis. Ann Oncol
2011; 22:2358–2365.

10 Clinical Research in Rare Cancers
I.M.E. Desar¹
A. Constantinidou²
W.T.A. van der Graaf¹,³
¹ Department of Medical Oncology, Radboud University Medical Centre,
Nijmegen, Netherlands
² Medical School University of Cyprus and BoC Oncology Centre, Nicosia, Cyprus
³ The Institute of Cancer Research & The Royal Marsden Hospital, Sutton, UK
Introduction
Rare cancers are defined as cancers with an annual incidence of less
than 6 per 100 000. Altogether, rare cancers account for 22% of all cancer
diagnoses in Europe and 24% of people living with cancer in Europe
have a ‘rare’ cancer (Gatta et al, 2011).
Clinical research in rare cancers is challenging. Firstly, industry-
sponsored research focusses primarily on cancers associated with the high-
est potential for making financial profit. Other funders (e.g. governmental
or charity institutions) also prefer to focus on common malignancies, which
have the largest social impact. Secondly, less preclinical work is done with
rare cancers than with the more common cancers, which hampers progress
towards novel biologically-driven clinical studies (see Chapter 5).
One of the main hurdles in clinical research encompassing rare cancers is
the difficulty of conducting large randomised trials (see Chapter 6) within
reasonable timelines. Consequently, different subtypes of the rare cancer
are often grouped together, which could mask a clinical benefit limited to a
specific cancer subtype. As a result, treatment of rare cancers is commonly
based on insufficient evidence. Patients with rare cancers do not imme-
diately benefit from innovations within the cancer treatment field, as
innovations can be relatively slow to become treatment options for rare
cancer. On average, the 5-year relative survival is lower for patients with
rare cancers (47%) than for patients with more common cancers (65%)
(Gatta et al, 2011).
This chapter discusses the challenges and pitfalls associated with clinical
research in rare cancers, based on past and present experience. It discusses
the interpretation of trials and describes approaches for trial methodology
that are most appropriate for rare cancers.
Challenges and Limitations in Clinical Research in Rare Cancers
The Example of Sarcomas
The application of most classical designs for randomised trials is
extremely difficult in rare cancers, due to low numbers of patients and
the heterogeneity of the disease.
In sarcoma, for example, for many years patients with advanced disease have
received systemic chemotherapy comprising doxorubicin alone or in com-
bination with ifosfamide. There was no evidence in favour of or against the
combination compared with doxorubicin alone. The first study on doxorubicin
treatment in 36 patients with advanced sarcoma was published in 1975 (Ben-
jamin et al, 1975). Almost 40 years later the results of the first prospective
multicentre Phase III randomised trial, comparing doxorubicin alone versus
the combination with ifosfamide at a dose of 10 g/m² plus granulocyte
colony-stimulating factor (G-CSF), were published (Judson et al, 2014).
Remarkably, it took this study 7 years to recruit 455 patients. For this
academic study, coordinated by the European Organisation for Research and
Treatment of Cancer (EORTC) Soft Tissue and Bone Sarcoma Group, multinational
collaboration was
of critical importance. Without the participation of 38 centres in 10 countries,
the necessary sample size would never have been reached. When the study
was designed, in 2003, many histological sarcoma subtypes were put together
without stratification for the main subtypes. The primary endpoint was
overall survival (OS), but this was later criticised as a complex and easily
confounded measure of therapeutic efficacy compared with progression-free
survival (PFS) and response rate (see Chapters 7 and 8). Perhaps the bar of
treatment success should not have been set so high. The pace of the study was
hampered by differences throughout Europe in the preference for giving either
doxorubicin or the combination, which dampened enthusiasm for a randomised
design. The study showed a significant benefit in PFS
(7.4 months [95% confidence interval (CI) 6.6–8.3] versus 4.6 months
[2.9–5.6]) for the combination arm (hazard ratio: 0.74; 95% CI 0.60–0.90),
as well as a higher response rate (26% versus 15%) for the combination arm,
but failed to show an improved OS. Toxicity was worse in the combination
arm and no patient-reported outcomes had been measured. As the trial’s design
had no stratification per subtype, its results should guide rather than dominate
the discussion of optimal treatment approaches for individual patients.
The EORTC sarcoma study represents a typical example of how diffi-
cult it is to appraise and interpret studies performed in rare tumours, even
when their design is considered ‘classic’ (Phase III randomised trial). For
example, the study was not blinded because of high costs and practical
implications. Furthermore, the ‘one size fits all’ approach in which differ-
ent tumour types are grouped together to achieve the necessary statistical
power has been shown to be suboptimal, which is certainly the case in
metastatic soft tissue sarcomas due to their heterogeneity of subtypes.
Despite some advances in drug development (see Chapter 5) and clinical trial
design, the need to further improve the understanding of the underlying molec-
ular mechanisms driving rare cancers remains critical in the development of
clinical research. International collaboration is the only solution to make any
progress. There is also a need for prospective databases with meaningful phe-
notype–genotype, histological and patient-reported outcome data. Finally,
methodologically smarter trial strategies for rare cancers are warranted.
Future Directions for Clinical Research in Rare Cancers
Several key aspects play a role when designing a clinical trial in rare cancer:
1. The epidemiological data should be well known.
2. The estimation of patient accrual should be realistic.
3. Trial participation should be made as feasible as possible for the patient
(finding ways to minimise treatment burden, number of visits, travel distance).
4. Patient accrual is usually easier for trials testing innovative medication.
For example, the PALETTE study (metastatic soft tissue sarcoma) with
the first oral targeted agent, pazopanib – randomised against placebo as
second-line therapy – had a far higher recruitment: 372 patients within
15 months in 72 centres worldwide (van der Graaf et al, 2012). The
recruitment was ahead of the pre-planned schedule, which was remark-
able as no crossover to active treatment was foreseen. The major attrac-
tion of this study was the opportunity to receive an oral and relatively
less toxic new drug for this disease. Moreover, the only way of receiving
this drug for free was within the context of the PALETTE study.
In addition to organisational aspects, improvements in the methodology
of trials are also necessary:
More Biologically Relevant Inclusion Criteria and Fewer Exclusion Criteria
for Early Phase Clinical Trials
The introduction of basket studies (studies focussing on the key drivers of
different tumour types as a target for therapy) represents a new approach
in developing clinical research, which can be particularly relevant for
rare malignancies. An example is the CREATE study (NCT01524926)
which tested the efficacy of crizotinib, which targets anaplastic lym-
phoma kinase and hepatocyte growth factor receptor in a diversity of
rare malignancies (i.e. anaplastic large cell lymphoma, inflammatory
myofibroblastic tumour, papillary renal cell carcinoma type 1, alveolar
soft part sarcoma, clear cell sarcoma and alveolar rhabdomyosarcoma).
At present, the use of sequencing platforms enables histology-agnostic
studies in specific, molecularly characterised subgroups of patients.
Another way to improve the recruitment of patients with rare malignancies
in early clinical trials is to increase the span of the inclusion criteria, e.g.
by expanding age limits. Nowadays, a substantial proportion of paediatric
cancer trials either omit the upper age limit, or expand it far beyond the
classical limit of 18 years.
Selection of Endpoints
The selection of clinically relevant endpoints is crucial in conducting studies
in rare cancers (see Chapter 7). Apart from commonly used endpoints such
as objective response rate, median PFS and OS, other endpoints such as
quality of life and improvement of range-of-motion assessments or use of
analgesics in diseases like pigmented villonodular synovitis (PVNS) could
also be considered. The acceptance of these unusual and novel endpoints
must be agreed with health registration authorities, such as the European
Medicines Agency and the US Food and Drug Administration.
As is the case for more common cancers, there is a need for biomarkers,
which are helpful to select the patients who will be likely to benefit from
treatment and those who will not. Collaboration between institutions
could lead to earlier validation of such biomarkers. For example,
programmed death-ligand 1 (PD-L1) is recognised as a potential biomarker
for immune checkpoint inhibitors such as nivolumab or pembrolizumab,
but studies are needed to harmonise companion diagnostics for accurate
clinical assessment and application of PD-1/PD-L1 inhibitors (Ma et al, 2016).
Finally, the road from study to reimbursement to regular prescription is a var-
ied and difficult one and, especially in the situation of rare cancers, timely
planning of registration and implementation procedures should be considered.
Smarter Trial Design

Improving trial design may help to optimise patient accrual in rare can-
cer clinical trials.
Crossover designs: In this study type, participants are randomly assigned
to the study drug or placebo for a certain intervention period. Subse-
quently, they switch to the alternative for a second intervention period.
Both the patients and the investigators are blinded to whether the drug or
placebo is being used in each period. This type of design is most suitable
for chronic conditions where no other treatment options are available. In
cancer, it may be most applicable to slowly progressing tumour types, but
is not commonly used. Recruitment is easier, because each patient is guar-
anteed exposure to the study drug. Furthermore, the study can be carried
out with half the number of patients who would be needed for a classical,
parallel-design randomised trial. However, the study duration is longer and
patients may not reach the crossover due to clinical deterioration.
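The statement that a crossover trial needs about half the patients of a
parallel trial can be checked with a rough calculation. The sketch below is
not taken from the chapter: it assumes a continuous outcome, a normal
approximation, no carry-over or period effects, and a hypothetical
within-patient correlation (rho) between the two periods. The function names
and the example values (a difference of 0.5 standard deviations) are
illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def _z_sum_sq(alpha: float, power: float) -> float:
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def parallel_total_n(delta, sigma, alpha=0.05, power=0.80):
    """Total patients (both arms, 1:1) for a two-arm parallel trial."""
    return ceil(4 * sigma**2 * _z_sum_sq(alpha, power) / delta**2)


def crossover_total_n(delta, sigma, rho=0.0, alpha=0.05, power=0.80):
    """Total patients for a 2x2 crossover; each patient is their own control.

    The within-patient difference is assumed to have variance
    2 * sigma^2 * (1 - rho).
    """
    return ceil(2 * sigma**2 * (1 - rho) * _z_sum_sq(alpha, power) / delta**2)


if __name__ == "__main__":
    print(parallel_total_n(0.5, 1.0))            # parallel design
    print(crossover_total_n(0.5, 1.0, rho=0.0))  # crossover, uncorrelated periods
    print(crossover_total_n(0.5, 1.0, rho=0.5))  # crossover, correlated periods
```

Under these assumptions, a crossover with uncorrelated periods needs half the
parallel-trial total, and any positive within-patient correlation lowers the
number further. Drop-out before the second period, which the text flags as a
risk, is not modelled here.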

Adaptive designs: In an adaptive design, researchers use data collected
during the study to dynamically adapt the design without undermining the
trial’s validity or integrity (Brown et al, 2009). Most commonly, the adaptive
design is used within one study, or for sequential parts of a study. Adaptive
designs may be analysed using classical statistics or Bayesian statistics
(Adamina et al, 2009; Renfro et al, 2016). In classical statistics, a choice
within predetermined options can be made at fixed interim analyses (see
Chapter 8), for example the exclusion of a subgroup that meets the futility
threshold in a previously unselected study population. Bayesian statistics
continuously use already gathered multi-source study data in adapting the
study protocol, calculating the probability of a treatment effect
(Adamina et al, 2009; Renfro et al, 2016).
A particular example of an adaptive design is the multi-arm, multistage
(MAMS) design. In this design, several regimens are assessed simultane-
ously against a common control group. The multistage character encom-
passes discontinuation of patient recruitment to treatment arms that are
not showing sufficient effect, based on pre-planned lack of benefit sta-
tistics. The MAMS design reduces the number of patients and thus the
length of the total study time. A well-known trial that used this concept is
the STAMPEDE trial in prostate cancer (James et al, 2016).
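The interim decision in a MAMS trial can be sketched as a simple filter over
the research arms. This is an illustration only: the arm names, the interim
hazard ratios and the lack-of-benefit threshold are hypothetical and are not
taken from STAMPEDE or any other trial.

```python
def arms_to_continue(interim_hr, threshold):
    """Return research arms whose interim hazard ratio versus the shared
    control is below the pre-planned lack-of-benefit threshold.

    interim_hr: dict mapping arm name -> estimated hazard ratio vs control
                (values below 1 favour the research arm).
    threshold:  pre-planned critical hazard ratio for this stage.
    """
    return sorted(arm for arm, hr in interim_hr.items() if hr < threshold)


if __name__ == "__main__":
    # Hypothetical stage-1 results for three research arms.
    stage1 = {"A": 0.85, "B": 1.05, "C": 0.97}
    print(arms_to_continue(stage1, threshold=1.00))  # arm B stops recruiting
```

In a real MAMS design the thresholds for each stage are fixed in advance so
that the overall error rates of the trial are controlled; that calculation is
not shown here.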
Elements prone to modification in adaptive designs are sample size,
enrolment design, optimal dose and early stopping rules. These are of
importance in rare cancers and are discussed in more detail here.
Sample Size
The normal approach for large randomised trials is to determine sample size by
defining an effect size for the intervention (e.g. hazard ratio 0.70) that, if true,
would achieve a statistical power of 0.80 (see Chapter 8). In a two-stage
adaptive design, first-stage data are used to estimate either the effect size
or its variance, and an appropriate sample size is then re-computed.
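For a time-to-event endpoint, one common way to turn a target hazard ratio
into a sample size is Schoenfeld's approximation for the required number of
events. The chapter does not name a formula, so this choice is an assumption,
as are 1:1 allocation and a two-sided alpha of 0.05. The second call mimics a
re-computation after first-stage data suggest a smaller effect; the value 0.75
is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Events needed to detect `hazard_ratio` (Schoenfeld, 1:1 allocation)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return ceil(4 * z**2 / log(hazard_ratio) ** 2)


if __name__ == "__main__":
    print(required_events(0.70))  # design assumption from the text
    print(required_events(0.75))  # hypothetical re-estimate after stage 1
```

Under these assumptions a hazard ratio of 0.70 requires 247 events and a
hazard ratio of 0.75 requires 380. The result is a number of events, not
patients: the number of patients also depends on accrual rate and follow-up,
which is one reason small effect sizes are so hard to test in rare cancers.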
Enrolment Design
In cancer, response-adaptive allocation can be used. The success or failure
of previous patients in each intervention group modifies the allocation
probabilities for new participants in the trial, steering successive patients
to the perceived preferable treatment or, alternatively, enforcing a
restriction on the number who have a poor outcome in both arms. In
conventional statistics, a two-stage design with enrichment of the second
stage, based on results of the first stage, is used. A Bayesian example is
discussed in the Example below (Maki et al, 2007). Ranking and selection
designs can also be used, which typically incorporate two stages, with
statistical testing occurring only at the end of the second stage
(Gupta et al, 2011).
Example
In 2007, the Sarcoma Alliance for Research Through Collaboration pub-
lished the results of an open-label Phase II study comparing single-agent
gemcitabine with the combination of gemcitabine and docetaxel in patients
with metastatic soft tissue sarcoma (STS) (Maki et al, 2007). The primary
endpoint of the study was tumour response, defined as complete or par-
tial response within 24 weeks, or stable disease lasting at least 24 weeks.
A Bayesian adaptive randomisation (AR) procedure was used to produce
an imbalance in the randomisation in favour of the superior treatment,
accounting for the treatment–subgroup interactions. Treatment success (S)
was defined as complete or partial response according to RECIST, and
failure (F) as progressive disease or death. Denoting the probabilities of
S and F by PS and PF, respectively, the AR procedure was based on the weighted
average P = 0.435(PS) + 0.565(1 − PF); the weights reflect the utilities
elicited from the investigators, who judged non-failure to be 30% more
important than treatment success. The AR method allowed for possible
treatment–subgroup interactions for the four subgroups: leiomyosarcoma
(LMS) + prior pelvic radiation (PPR); non-LMS + PPR; LMS + no PPR; or
non-LMS + no PPR. Thus, the value of P was permitted to vary among the four
subgroups depending on treatment–subgroup interactions.
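The weighting can be made concrete with a small calculation. The first
function below implements the weighted average exactly as given in the text.
The second part is a deliberately simplified sketch of how such a criterion
could drive adaptive randomisation: it assumes a flat Dirichlet prior on the
three outcomes (success, failure, neither) in each arm, ignores the subgroup
interactions and regression models used in the actual trial, and uses
hypothetical patient counts.

```python
import random

W_SUCCESS, W_NONFAILURE = 0.435, 0.565  # weights quoted in the text


def ar_criterion(p_success, p_failure):
    """Weighted average P = 0.435*PS + 0.565*(1 - PF)."""
    return W_SUCCESS * p_success + W_NONFAILURE * (1 - p_failure)


def _draw_criterion(counts, rng):
    """One posterior draw of P for an arm under a Dirichlet(1, 1, 1) prior.

    counts = (successes, failures, neither); an assumed simplification.
    """
    gammas = [rng.gammavariate(c + 1, 1.0) for c in counts]
    total = sum(gammas)
    return ar_criterion(gammas[0] / total, gammas[1] / total)


def prob_first_arm_better(counts_a, counts_b, draws=10_000, seed=0):
    """Monte Carlo estimate of Pr(P_a > P_b) given the data so far."""
    rng = random.Random(seed)
    wins = sum(
        _draw_criterion(counts_a, rng) > _draw_criterion(counts_b, rng)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    print(round(W_NONFAILURE / W_SUCCESS, 2))  # ratio of the two weights
    # Hypothetical interim counts: (success, failure, neither) per arm.
    p = prob_first_arm_better((10, 5, 5), (3, 12, 5))
    print(p)  # one possible allocation probability for the next patient
```

The ratio 0.565/0.435 is about 1.30, which matches the statement that
non-failure was weighted 30% more heavily than success. Using the posterior
probability directly as the allocation probability is only one option;
practical schemes often temper it to avoid extreme imbalance early in the
trial.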
Data were collected and analysed continuously during the trial, since
the AR is based on the previously collected data. After equal random
assignment of the first 30 patients to the two treatment regimens, sub-
sequent patients were assigned treatment using the AR procedure.
Two log-normal regression models were fit. The first model included
only main effects: treatment + LMS + PPR + performance status. The
second model added treatment–covariate interactions.

As a result, 122 patients with STS were randomly assigned to gem-
citabine–docetaxel (n = 73, 60%) and to single-agent gemcitabine
(n = 49, 40%). The percentage of patients with LMS was higher in the
combination arm (40% versus 18%). The posterior probability that
the AR criterion for combination therapy was larger than for gem-
citabine alone showed superiority of combination therapy in all four
AR subgroups, in terms of the 24-week outcome. The probability that
gemcitabine–docetaxel is superior to gemcitabine is 0.98 for median
PFS and 0.97 for median OS (larger probability corresponds to greater
superiority, with 1 being the highest probability).
The primary endpoint (PR or CR after 24 weeks) was reached by 27%
versus 32% for gemcitabine alone versus combination therapy. Median
PFS was 3 months versus 6.2 months, and median OS 11.5 months
versus 17.9 months.
The AR procedure assigned 24 more patients to the combination arm than to
gemcitabine alone (73 versus 49), which is 12 fewer patients in the inferior
arm than the 61 expected under equal allocation, and also shortened accrual
time and efforts. Despite its advantages, Bayesian AR is infrequently used.
It requires very fast feedback between data input and action, which makes it
relatively time-consuming and less feasible for large international trials.
Optimal Dose
More flexible designs are now also available for Phase I trials (see
Chapter 5) which use increasing dose steps according to dose-limiting
toxicity. These designs help the researchers to define the optimal rather
than the maximal tolerated dose. Adaptive designs use biomarkers and
pharmacokinetic data, as well as toxicity data, to define next dose steps.
The dose of an approved drug can also be adjusted according to a clinical
biomarker. A good
example is the dosing of axitinib in metastatic renal cell carcinoma
(mRCC) based on the occurrence of hypertension (Rini et al, 2011).
Early Stopping
Early stopping in sequential designs is based on planned interim analy-
ses with predefined criteria (see Chapter 8). Only highly significant find-
ings will lead to early cessation of a study in these interim analyses. One
example is the randomised EORTC TRUSTS study that had a seamless
Phase II/III design. It compared a classical doxorubicin schedule with
two schedules of trabectedin in first-line metastatic soft tissue sarcoma.
Although a Phase III trial was planned with the optimal trabectedin
schedule, the study was discontinued after the Phase II part because of
the poor efficacy of both trabectedin schedules (Bui-Nguyen et al, 2015).
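As an illustration of the principle that only highly significant findings
stop a trial, the sketch below applies a Haybittle-Peto style rule, in which
an interim analysis stops the trial only if the two-sided P value is below
0.001. The chapter does not say which stopping rule any of the cited trials
used, so the rule, the threshold and the example test statistics are
assumptions for illustration. Stopping for futility, as in TRUSTS, uses
different criteria and is not shown.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def stop_early(z, interim_threshold=0.001):
    """True if the interim result is extreme enough to stop the trial."""
    return two_sided_p(z) < interim_threshold


if __name__ == "__main__":
    for z in (2.5, 3.5):  # hypothetical interim test statistics
        print(z, round(two_sided_p(z), 4), stop_early(z))
```

A result that would count as conventionally significant at the final analysis
(for example z = 2.5) does not trigger stopping at the interim look under
this rule, which keeps the overall type I error close to its nominal level.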
Patient Involvement and Off-label Use

Patient advocates should be part of the team that develops a clinical trial.
Patients and researchers increasingly recognise this need, and the trial
should be feasible for the participating patients. Research into rare
cancers should also consider the importance of compassionate use and
off-label use of treatments. This can generate new evidence and should
be an option in the planning of care for the future.
Conclusions
The design and conduct of clinical research in rare cancers remain challenging.
Patients with rare cancers should preferably be enrolled in exploratory proof-
of-mechanism and biomarker-led early phase studies, which are more likely
to produce meaningful results in a shorter period of time. Trial methodology
should be optimised (see Chapter 7) to assist the appraisal of clinical
studies in rare tumours. Finally, registration of patient data within prospec-
tive databases and registries (see Chapter 4) may provide us with meaningful
observational data.
Dr Desar has reported no conflict of interest.
Dr Constantinidou has reported no conflict of interest.
Professor van der Graaf has received research grants from
GlaxoSmithKline and Novartis.

Further Reading
Sydes MR, Parmar MK, Masson MD, et al. Flexible trial design in practice
- stopping arms for lack-of-benefit and adding research arms mid-trial in
STAMPEDE: a multi-arm multi-stage randomized controlled trial. Trials
2012; 13:168.
References
Adamina M, Tomlinson G, Guller U. Bayesian statistics in oncology: a guide for
the clinical investigator. Cancer 2009; 115:5371–5381.
Benjamin RS, Wiernik PH, Bachur NR. Adriamycin: a new effective agent in the
therapy of disseminated sarcomas. Med Pediatr Oncol 1975; 1:63–76.
Brown CH, Ten Have TR, Jo B, et al. Adaptive designs for randomized trials in
public health. Annu Rev Public Health 2009; 30:1–25.
Bui-Nguyen B, Butrynski JE, Penel N, et al. A phase IIb multicentre study com-
paring the efficacy of trabectedin to doxorubicin in patients with advanced or
metastatic untreated soft tissue sarcoma: the TRUSTS trial. Eur J Cancer 2015;
51:1312–1320.
Gatta G, van der Zwan JM, Casali PG, et al; RARECARE Working Group. Rare
cancers are not so rare: the rare cancer burden in Europe. Eur J Cancer 2011;
47:2493–2511.
Gupta S, Faughnan ME, Tomlinson GA, Bayoumi AM. A framework for apply-
ing unfamiliar trial designs in studies of rare diseases. J Clin Epidemiol 2011;
64:1085–1094.
James ND, Sydes MR, Clarke NW, et al. Addition of docetaxel, zoledronic acid, or
both to first-line long-term hormone therapy in prostate cancer (STAMPEDE):
survival results from an adaptive, multiarm, multistage, platform randomised
controlled trial. Lancet 2016; 387:1163–1177.
Judson I, Verweij J, Gelderblom H, et al; European Organisation for Research
and Treatment of Cancer Soft Tissue and Bone Sarcoma Group. Doxorubicin
alone versus intensified doxorubicin plus ifosfamide for first-line treatment of
advanced or metastatic soft tissue sarcoma: a randomised controlled phase 3
trial. Lancet Oncol 2014; 15:415–423.
Ma W, Gilligan BM, Yuan J, Li T. Current status and perspectives in translational
biomarker research for PD-1/PD-L1 immune checkpoint blockade therapy.
J Hematol Oncol 2016; 9:47.
Maki RG, Wathen JK, Patel SR, et al. Randomized phase II study of gemcitabine
and docetaxel compared with gemcitabine alone in patients with metastatic
soft tissue sarcomas: results of sarcoma alliance for research through collabo-
ration study 002. J Clin Oncol 2007; 25:2755–2763.
Renfro LA, Mallick H, An MW, et al. Clinical trial designs incorporating predic-
tive biomarkers. Cancer Treat Rev 2016; 43:74–82.
Rini BI, Escudier B, Tomczak P, et al. Comparative effectiveness of axitinib versus
sorafenib in advanced renal cell carcinoma (AXIS): a randomised phase 3 trial.
Lancet 2011; 378:1931–1939.
van der Graaf WT, Blay JY, Chawla SP, et al; EORTC Soft Tissue and Bone
Sarcoma Group; PALETTE study group. Pazopanib for metastatic soft-
tissue sarcoma (PALETTE): a randomised, double-blind, placebo-controlled
phase 3 trial. Lancet 2012; 379:1879–1886.

11 How to Become a Researcher
A.J. Templeton¹
A. Ocana²
I.F. Tannock³
¹ Department of Medical Oncology, St. Claraspital Basel and Faculty
of Medicine, University of Basel, Basel, Switzerland
² Department of Medical Oncology and Translational Research Unit,
Albacete University Hospital, Albacete, Spain
³ Division of Medical Oncology & Hematology, Princess Margaret Cancer
Centre, Department of Medicine, University of Toronto, Toronto, Canada
Introduction
The ultimate goal of research in oncology is to contribute to improve-
ments in the quality and duration of life for cancer patients by generating
relevant scientific and clinical data. This can be achieved through basic,
translational and clinical research. It is essential that young oncologists
receive training in biomedical research if some of them are to go on to
produce high-quality research and if all of them are to make best use of
this research. During this training, they will experience emotions includ-
ing frustration and joy, but ultimately it can lead to personal satisfaction
by making real contributions to the management of patients, and under-
pin a successful career and international recognition. In this chapter, we
illustrate the journey to becoming a researcher by discussing the careers
of two fictional young oncologists, Marco and Florence.
Case Study 1: Marco

Marco is a 32-year-old man who will complete his oncology training at
a well-known Italian institution in around 15 months. He is enthusiastic
about his speciality training and works hard. With the help of his
supervisor, he has published a case report of an unusual presentation of
metastatic breast cancer and a review of management of oesophageal cancer,
and he manages patients in industry-sponsored Phase II and Phase III
trials at his institution. His desire is to develop an academic career in
clinical research. The head of his department encourages him to con-
tact two or three academic institutions overseas to apply for a clinical
research fellowship and gives him the names of former residents who
spent time in North America and Australia.
Marco contacts two former residents and both describe their time abroad
as a positive experience due to the personal supervision received, the
research techniques learned and their ability to generate publications. He
is warned about the considerable paperwork needed to obtain a licence to
see patients. In comparing their experience with others, both emphasise
the importance of choosing a good mentor and ensuring that there is pro-
tected time for research. They advise Marco to plan at least a two-year
fellowship as research takes time and most papers will not be written
until the second year, or even later. Marco’s current mentors advise him
to select an area of research that interests him and to identify potential
supervisors who are publishing in these areas, rather than simply apply-
ing for a fellowship without stipulating his interests (Table 1).
Marco regularly skims through Annals of Oncology, Lancet Oncology
and the Journal of Clinical Oncology, reads articles that seem interest-
ing, and has decided that he would like to undertake research aimed at
improving the quality of clinical trials. He contacts two individuals who
have been corresponding authors of several articles dealing with bias in
the reporting of trials and critical evaluation of their endpoints. In his
email, he mentions his strong interest in their research and that he would
like to apply for a fellowship to work with them in this field. The clini-
cal speciality of his potential supervisors is not breast cancer – in which
Marco has a special interest – but he realises that learning new strategies
and techniques is more important than the site of the disease to which the
research is applied. His current mentor tells him: ‘If you know how to
treat breast cancer, you can learn quickly how to treat colorectal cancer,
but you can’t so quickly learn how to do good research’. He attaches
his curriculum vitae (CV) to the email and a letter outlining his career
goals and his intention to apply for a grant in Italy as well as submitting
funding applications to international organisations, such as the European
Society for Medical Oncology (ESMO) and the Union for International
Cancer Control (UICC), to support his fellowship.
Table 1 Do’s and Don’ts of Applying for Research Training
Do…
• Define your own range of research interests
• Search PubMed and journals for investigators publishing in your preferred
area of research
• Aim to learn skills in a specific area of research
• Send a letter of interest to potential supervisors outlining your career
goals with your CV
• Contact former trainees of potential supervisors to ask about their
experiences
• In the interview, politely ask specifically about supervisor’s time,
research meetings and protected time
• Indicate willingness to apply for financial support
Don’t…
• Apply for any research position that might be available
• Insist on studying a particular disease site
• Send a letter of interest to the department head or education secretary
Both of the potential supervisors that Marco has contacted agree to meet
with him during an international conference. He prepares for the inter-
views and plans to ask politely about the availability of his supervisor and
the projects that have been done by their former mentees. The meetings
go very well, research that the former fellows have published is discussed
(examples are given in Table 2) and Marco is invited to visit the institution
of one of the potential supervisors. During this visit, several people from
the institution’s fellowship committee interview him and arrange for him
to speak with current fellows, who describe their projects and experience.
Eventually, Marco receives a letter offering him a clinical research fellow-
ship with instructions as to the administrative steps he needs to undertake.
Marco arrives at the host institution one year later. He sees patients with
gastrointestinal (GI) malignancies during 3 half-day clinics per week and
women with breast cancer in another clinic. The rest of his time is pro-
tected for his research. His mentor discusses with him ideas for various
projects related to clinical trial methodology, and together they decide
that he will investigate how surrogate and secondary endpoints can
lead to biased reporting of clinical trials. Marco learns how to undertake
a systematic review of the literature, applies this to his selected topic
and then discusses his findings with his mentor, who also recommends
noting ideas for potential further research questions in an ‘idea book’.
Table 2 Examples of Important Questions That Can Be Addressed by Clinical
Research Studies That Are Not Classical Phase I, II or III Trials (and Require
Minimal Funding)
• Do patients receiving chemotherapy have cognitive decline?
• Is toxicity of new agents well established from pivotal randomised trials?
• Do patients in general practice have similar outcomes to those who received
the same treatment in large randomised trials?
• What is the probability that women receiving adjuvant hormonal therapy for
breast cancer will have hair loss?
• Is a new treatment leading to a small gain in survival cost effective?
• What proportion of papers describing randomised trials of treatments for men
with prostate cancer contain biases that favour the new treatment?
• What is the agreement between physicians, caregivers and patients in
evaluating cancer-related symptoms?
Marco is also given the opportunity to develop early clinical trials and,
based on preclinical work in his host institution, he works on a concept
for a clinical Phase II trial. His mentor proposes that he apply for the
ECCO-AACR-EORTC-ESMO Workshop on Methods in Clinical Cancer
Research (formerly known as the Flims workshop), and gives him details
of various other opportunities to learn about clinical research (see Table
3). His application is successful and during the workshop Marco writes a
protocol with the help of skilled facilitators.
The study is opened for accrual 6 months later, after receiving approval
from the ethics committee of his host institution and agreement of a phar-
maceutical company to supply an anticancer drug. Marco understands that
‘his’ trial may be completed only after the end of his fellowship. He attends
local trial meetings and enrols patients in other clinical trials during routine
clinics. In his protected time, he researches his major topic on reporting
bias, develops some related projects in discussion with his supervisor, helps
another fellow with a literature search for a systematic review and meta-
analysis (see Chapter 9) and drafts a grant application for a laboratory sub-
study of the clinical trial for which he wrote the protocol. He submits an
abstract describing his research to a scientific meeting, which is accepted
as a poster presentation. His mentor advises him on the content of the
abstract and emphasises the need to present a clear and easily read
message in the poster (for suggestions, see Table 4) and the value of following
reporting guidelines appropriate to his research design when writing the
paper (see www.equator-network.org). Marco also prepares a manuscript
describing his research, which is edited thoroughly by his mentor before
it is sent to all co-authors. Thus, he learns the skills of presenting and
writing research, as well as the principles of authorship. Marco is the first author
of the papers describing projects which he led and on which he did most of
the work, and is a co-author of papers where he contributed substantively
to the work of others.
Table 3 Where Can Clinical Research be Learnt?
• Workshop on Methods in Clinical Cancer Research: This week-long workshop supported by several cancer
societies (ECCO, AACR, EORTC, ESMO) is held annually. Junior oncologists in any subspeciality learn
about the principles of good clinical trial design and lectures are designed to cover most aspects of clinical
research. Applicants arrive with a concept for a clinical research project and are guided by a faculty that
includes academic oncologists, scientists and statisticians (facilitators) to convert this (or sometimes a
different) concept into a full protocol. It is expected that the research will be undertaken and supported
at the host institution.
For more information: www.ecco-org.eu/Events/MCCR-Workshop and http://vailworkshop.org/Pages/AboutWorkshop.aspx
• AACR workshops:
■ Integrative Molecular Epidemiology Workshop
■ Translational Cancer Research for Basic Scientists Workshop
■ AACR-NCI-EORTC International Conference on Molecular Targets and Cancer Therapeutics
For more information: www.aacr.org
• International Clinical Trials Workshops: Two- to three-day workshops in collaboration with ASCO aiming to
further develop cancer research in economically emerging countries by teaching early career researchers
about trial design, practices in implementing clinical trials, patient accrual strategies, ethics and publishing
research findings.
For more information: https://www.asco.org/meetings/international-meetings-educational-opportunities
• Courses in clinical epidemiology and statistics at local universities
• Sessions during major conferences (ESMO, ECCO, ASCO, AACR etc.) directed towards young
oncologists addressing research methodology, grant writing, presentation techniques and other topics.
Abbreviations: AACR, American Association for Cancer Research; ASCO, American Society of Clinical Oncology; ECCO,
European CanCer Organisation; EORTC, European Organisation for Research and Treatment of Cancer; ESMO, European
Society for Medical Oncology; NCI, National Cancer Institute.
Towards the end of his fellowship, Marco applies for a staff position
in the institution where he did his oncology speciality training, mak-
ing sure to negotiate protected time for research so that he can continue
to pursue this. He becomes one of the local leaders in GI oncology;
he participates actively in the GI group of the European Organisation
for Research and Treatment of Cancer (EORTC) and becomes the
local principal investigator of a collaborative EORTC group study.
He also takes the lead for a translational subproject of markers of host and
tumour-associated inflammation, after discussion with the study chair,
whom he had met during the ECCO-AACR-EORTC-ESMO Workshop.
He builds a relationship with pharmaceutical companies to obtain access
to new drugs for clinical research, and limits financial conflicts of inter-
est by having consultation fees paid to his institution, where he can use
them to support his research and that of the fellows that he now mentors.
Table 4 Recommendations for Presentation of Data from Research Studies
Abstract (what to include)
• Rationale for the study
• Explicit definition of primary endpoint and duration of follow-up
• Planned sample size or power of study
• Sample size and brief description of participants
• Magnitude of difference for primary endpoint with 95% confidence interval (or 2-sided p-value)
• Description of major toxicity
• Conclusion (based on primary endpoint)
• Source of funding
Poster
• You must be able to read every word from a distance of 3 metres
• Minimum font size is 24 or 20 bold
• Use point-form and not long sentences
• Make no more than 3 main points
Presentation
• Plan talk to match audience
• Tell a story: follow similar order to a written paper
• Minimum font size of 24 (including figures) and minimum words on slides (e.g. rule of 7: a maximum of 7 lines per slide and a maximum of 7 words per line)
• Talk slowly and allow one data slide per minute
• Include the points described for an abstract
• Avoid the same things as for written paper
Paper
• Follow published guidelines (e.g. www.equator-network.org)
• Avoid:
■ Unnecessary length (e.g. too general an introduction)
■ Conclusions not based on the primary endpoint
■ Multiple significance tests and analyses of subgroups
■ Criticism of others
Case Study 2: Florence

Florence is completing her training in medical oncology in a teaching hospital
in Belgium. She has long had an interest in basic science. She has attended
elective courses related to the basic science of oncology at her university and,
while at medical school, she worked for two summers in the laboratory of a sci-
entist studying the molecular genetics of cancer. Although the demands of her
residency training meant that she had limited exposure to laboratory research
during these years, she remains strongly motivated to become a physician-
scientist and to undertake laboratory-based translational research in oncology.

After seeking advice from the two clinician-scientists in the oncology
department of her hospital (Table 1), Florence realises that, in order to
be successful, she should undertake in-depth laboratory-based research
training leading to a PhD. She contemplates two options, either to apply
for a national or regional grant that would permit her to do a PhD in
her own country or to seek opportunities overseas. Both options have
attractions, but the main objective is to be admitted to a laboratory with
a strong scientific reputation. She would like to continue to do a small
amount of clinical work during the laboratory research training, but
recognises that this may be more difficult if she goes abroad. However,
she also realises that the common language of academic research is
English, and training in an English-speaking country would improve her
language skills.
Florence decides to obtain research training in cancer immunology,
which she recognises to be an expanding field with many opportunities.
She reviews the literature and contacts three potential supervisors work-
ing in this field in the United States, asking about the possibility of pur-
suing research leading to a PhD in their laboratories. A head of a research
group corresponds with her, reviews her CV and letters of support, and
they discuss her interests, motivation, and mid-term and long-term goals
by Skype. The potential supervisor also provides the contact details for
students and recent graduates from her laboratory so that Florence can
contact them. She does so and is pleased to hear about the good experi-
ences they had during their training. Both Florence and her supervisor
feel positive about her joining the laboratory. She is offered a provisional
position, because the laboratory needs her to obtain partial support to
fund her stipend. She is successful in obtaining a UICC fellowship and
receives the formal offer of a position.
After several months, Florence arrives in California to work in her
supervisor’s laboratory to obtain her PhD. Her research project seeks
to increase understanding of factors that modulate anti-tumour immune
responses. She accepts that she will not participate in patient care but
her supervisor encourages her to attend a half-day clinic per week as an
observer and to attend various clinical and research seminars. During
this time, she enjoys a multicultural and international environment with
students and post-doctoral fellows from several countries and she shares
both professional and social interactions with other team members. She
has her own primary project, but some components are shared with other
team members. She also receives some assistance from laboratory staff
as she learns a range of laboratory techniques. She reads extensively
about basic science, and contributes to a project where correlative stud-
ies are incorporated into a clinical trial evaluating immunotherapy for
bladder cancer.
Florence completes her PhD after spending 4 years in California. She
has published her studies in journals with a high impact factor, and has
become bilingual. Now, she wishes to return to Europe, and this turns
out to be the most difficult task. She needs to design her own project and
experiments, obtain competitive funding, and form a laboratory group
that will be competitive. It is easier to publish under the umbrella of
a large laboratory with many resources and Florence is encouraged to
work closely with a collaborative research group in France that shares
her interests. It is often difficult to find a position as a physician-scien-
tist in Europe that offers limited clinical practice with a large compo-
nent of protected time for research, while also facilitating interactions
with more basic scientists who can continue to provide mentorship and
collaboration.
Fortunately, Florence finds such a position at a large cancer centre in
Paris. Five years later she is leading a small laboratory group that is able
to do high-quality research, has attracted competitive funding and col-
laborates intensively with other groups in her country and overseas. She
holds a faculty position as a physician-scientist and is a talented mid-
career professional with a great future in cancer research.
Conclusions
These two fictional young oncologists illustrate important considera-
tions for those wishing to undertake in-depth research training in order to
develop an academic career that combines experience in clinical oncol-
ogy with cancer research. The journeys that they took and the guidance
given to them, as outlined in the tables, should help young oncologists
who are thinking of a future in research. If you decide to follow similar
paths, we hope that you will find it as stimulating and rewarding as
Marco and Florence, and that you will provide those making decisions
about the care of people with cancer with high-quality evidence from
studies such as those described in other chapters in this book.
Dr Templeton has performed consultancy for Astellas, Janssen, Sanofi,
and Bristol-Myers Squibb (all without personal compensation).
Dr Ocana has received research funding from Entrechem.
Dr Tannock is Chair of IDMC committees for trials sponsored by
Janssen and Roche.
Further Reading
Fletcher RH, Fletcher SW, Fletcher GS. Clinical Epidemiology – The essentials,
5th edition. Alphen aan den Rijn: Wolters Kluwer, 2012.
Tannock IF, Hill RP, Bristow RG, Harrington L. The Basic Science of Oncology,
5th edition. New York: McGraw Hill Education, 2013.

Glossary
Glossary term Definition
3+3 models A design for Phase I trials that uses cohorts of three patients.
The first cohort is treated at a starting dose that is considered
to be safe based on animal studies. Subsequent cohorts of three
patients are treated at increasing dose levels.
Adaptive designs Designs for clinical trials that include modification to the trial
protocol in light of the data accumulating in the trial.
Adjusted ratio The effect calculated following adjustment of the data to take
account of variables or confounders that might have an impact
on the effect. See also: Unadjusted ratio.
Adverse events Any unfavourable and unintended sign (e.g. an abnormal
laboratory finding), symptom, or disease associated with the use
of an intervention (such as a drug), without any judgement about
causality or relationship to the drug.
Allocation concealment Process used to ensure that those involved in the decision that
a patient will join a trial do not know the allocated treatment
before the randomisation is done.
Allometric scaling Scaling of dose rates of drugs relative to the size or part of the
animal being treated.
α (alpha) value The threshold for statistical significance in an analysis. It is
typically set at 0.05, so that the results are judged to be statistically
significant when the probability of obtaining the result (or something
more extreme) by chance alone, if there were truly no difference, is
less than 5%.
Alternative hypothesis The assumption that there is a difference in the effects of the
treatments being compared. See also: Null hypothesis.
Applicability (also generalisability) The degree to which the results of a study are likely to hold true
in other practice settings.
Area under the curve (AUC) In drug development, the area under the plasma drug
concentration versus time curve. A measure of drug exposure.
Attributable risk Proportion of the disease or outcomes in a group who have
been exposed to a particular factor, which can be attributed to
exposure to that factor.
Baseline variables Patient characteristics assessed at the beginning of the
observation or before the start of treatment in a study that
collects data at multiple time points.
Basket study A study that tests the effects of a treatment on a single mutation
in a variety of tumour types, at the same time.
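To illustrate the 'Area under the curve (AUC)' entry above, the following minimal Python sketch approximates the AUC with the linear trapezoidal rule. The sampling times and concentrations are invented for illustration and the function name is hypothetical; pharmacokinetic analyses in practice may use other integration rules.

```python
def auc_trapezoid(times, concentrations):
    """Approximate the area under a concentration-time curve (linear trapezoidal rule)."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matched time/concentration points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        area += dt * (concentrations[i] + concentrations[i - 1]) / 2
    return area


# Hypothetical sampling times (hours) and plasma concentrations (mg/L).
times = [0, 1, 2, 4, 8]
conc = [0.0, 4.0, 3.0, 2.0, 0.0]
auc = auc_trapezoid(times, conc)  # units: mg*h/L
print(auc)
```

The result depends on how densely the curve is sampled: sparse sampling around the peak concentration will tend to misestimate drug exposure.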
Bayesian algorithm A classification technique based on Bayes’ Theorem, which
assumes that the predictors are independent (i.e. the presence of
a particular feature in a class is unrelated to the presence of any
other feature).
Bayesian statistics A theory in the field of statistics in which the evidence gathered
in a study is expressed in terms of degrees of belief (Bayesian
probabilities).
Best overall response (BOR) The best response recorded from the start of a treatment until
disease progression or recurrence.
Bias Distortion in the data that can lead to conclusions that are
systematically incorrect.
Biomarkers Measurable indicators of some biological state or condition.
Blinded / blinding (also masked / masking) Practice of keeping some or all of the people involved in a
study (e.g. patients, investigators and site staff, healthcare
practitioners, research team, sponsor, statistician, etc.) unaware
of which group of the trial each patient is assigned to.
Blocked randomisation Method used for stratified randomisation to ensure that the
allocation of patients to the different groups in the trial is
balanced for specific characteristics of the patients. Patients are
randomised within each block so that there will be the same
number of patients in each intervention group after all the
allocations in the block have been used.
Cancer prognosis Expected outcomes for the patients following a diagnosis
of cancer.
Case-control study Observational study in which the effect of an exposure is
measured by comparing the history of exposure between cases
(individuals who have, or die of, the disease) and controls
(individuals without, or who do not die of, the disease).
Cell-free clonotypic assay Adaptive Biotechnologies' clinical diagnostic assay
(clonoSEQ™) for measuring minimal residual disease in
lymphoid malignancies.
Censoring In a time-to-event analysis, (right-)censoring uses the observation
time at the most recent contact without the event (in patients
without observed events) as the duration of their exposure to the
risk of having the event. See also: Time-to-event analysis.
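As a rough illustration of the 'Blocked randomisation' entry above, the sketch below generates an allocation sequence in permuted blocks, so that the groups are balanced after every complete block. The arm labels, block size and seed are assumptions made for the example, and the function name is hypothetical. In a stratified design, a separate sequence of this kind would be generated for each stratum.

```python
import random


def blocked_allocation(n_blocks, arms=("A", "B"), per_arm=2, seed=None):
    """Return an allocation sequence built from randomly permuted blocks.

    Each block contains every arm exactly `per_arm` times, so the arms
    are balanced once all allocations in a block have been used.
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm  # e.g. ["A", "B", "A", "B"]
        rng.shuffle(block)            # random order within the block
        sequence.extend(block)
    return sequence


# Hypothetical example: three blocks of four patients, two arms.
allocations = blocked_allocation(n_blocks=3, seed=1)
print(allocations)
```

Fixed, small block sizes can make the final allocations in a block predictable, which is one reason why allocation concealment matters alongside the randomisation method.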
Clinical benefit Improvement in at least one important symptom or element of the
quality of life of a patient that directly results from treatment.
Clinical trial Research studies designed to answer specific questions about
health-related interventions.
Clinical variables Patient characteristics observed by diagnostic procedures in the
context of medical diagnosis and treatment.
Cluster randomisation Randomisation by group rather than by individual, for example
by geographical area or hospital.
Cochrane Methodology Review Systematic review produced by the Cochrane Methodology
Review Group which examines the evidence on the methods
used for randomised trials, systematic reviews and other
evaluations of health and social care.
Cochrane Risk of Bias Tool Framework developed by Cochrane (formerly, the Cochrane
Collaboration) for a systematic assessment of risk of bias in a
study.
Cohort study Analytic study of a group (cohort) defined by exposure
characteristics or a process of recruitment. Outcomes are
ascertained and compared in all members of the cohort.
COMET Initiative Core Outcome Measures in Effectiveness Trials: international
initiative to facilitate the development and use of core outcome
sets in health and social care.
Complete response/remission (CR) Disappearance of all signs of cancer in response to treatment.
Composite outcome A combination of multiple endpoints or outcomes used to assess
the effects of an intervention or exposure taking account of all
the outcomes in a single analysis.
Conditional survival Probability of surviving given that a patient has already survived
a specific period of time after the diagnosis of a chronic disease.
Confidence interval A statistical measure of precision for an estimate of a population
parameter. Various levels of confidence in the point estimate can
be defined, but the 95% confidence interval is commonly used.
The interval shows the range of values in which the true value of
a parameter should occur 95 times out of 100 if the population
of interest is sampled repeatedly.
Confirmatory evaluation Statistical hypothesis test that is intended to confirm or reject a
scientific hypothesis with controlled error probabilities.
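The repeated-sampling reading of the 'Confidence interval' entry above can be checked with a small simulation. This is a sketch under stated assumptions: the data are drawn from a normal distribution with invented parameters, the interval uses the normal approximation (mean ± 1.96 standard errors), and the function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def ci_95_mean(sample):
    """Normal-approximation 95% confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.975)  # approximately 1.96
    m = mean(sample)
    se = stdev(sample) / len(sample) ** 0.5
    return m - z * se, m + z * se


def coverage(true_mean=10.0, sd=2.0, n=50, repeats=2000, seed=0):
    """Proportion of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = ci_95_mean(sample)
        hits += low <= true_mean <= high
    return hits / repeats


observed = coverage()
print(observed)  # expected to be close to 0.95
```

With small samples the normal approximation gives intervals that are slightly too narrow, so the observed coverage tends to fall a little below 95%; a t-distribution-based interval would address this.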
Confounded / confounder / confounding A source of error in interpretation which occurs when the effect
of an exposure on an outcome is affected by another exposure,
which is correlated with the first exposure.
CONSORT guidelines Consolidated Standards of Reporting Trials: guidelines on
reporting the results of a randomised trial.
Control group Group of patients who receive usual care, which acts as the
comparator for the group receiving the new intervention or the
exposure.
Conventional allometric scaling Scaling of dose rates of drugs relative to the size or part of the
animal being treated.
Core outcome set (COS) An agreed standardised set of outcomes that should be measured
and reported, as a minimum, in all clinical trials in a specific
area of health or social care.
Cox regression Multivariable statistical model to assess the association of one
or more variables with a time-to-event endpoint allowing for
independent censoring.
Crossover design Studies in which patients receive different interventions one
after the other, usually in a randomised order.
Crude probability of death Probability of dying of cancer in the presence of other causes
of death.
Data cut-off Date up to which the data of a follow-up study are collected for
a specific analysis or study report.
Demographic characteristics Characteristics used to describe human populations
(e.g. age, sex, race, geographical region, religion, etc.).
Differential bias Bias that arises in the reporting of factors (such as exposure to
risk factors for cancer) between people that have and have not
been diagnosed with the condition, because of their knowledge
of the diagnosis. See also: Non-differential bias.
Disease-free survival (DFS) Length of time after treatment ends that the patient survives
without any signs or symptoms of the disease, such as cancer.
Drop-out rate Proportion of patients (perhaps over a specific time period) who
leave the study without completing all the outcome measures.
Duration of response (DoR) Length of time between the initial response to treatment and
subsequent disease progression or relapse.
Early stopping Closure of recruitment or stopping of further data collection in a
study before the planned end.
Ecological studies Studies of the effects of risk-modifying factors on outcomes based
on populations that are defined geographically or by time period.
Effectiveness trial (also pragmatic trial) A trial which seeks to test whether the treatments being compared
will have different effects when used in routine practice.
Efficacy / efficacy trial (also explanatory trial) A trial which seeks to test whether, in ideal circumstances,
a new intervention has different effects to existing interventions
or a placebo.
Empirical data Information observed or measured in a patient or a group of
patients in a study.
Endpoint A measurable variable used to assess an outcome of interest.
Error probability Probability of making an incorrect decision.
Event-free survival (EFS) Length of time after treatment ends that the patient survives
free of certain complications or events that the treatment was
intended to prevent or delay, including the return of the disease
or other certain symptoms.
Exclusion criteria Criteria that would exclude an otherwise eligible individual from
participating in a study. Reasons for considering exclusion include
safety issues, potential difficulties in the management of particular
participants or the need to control variables within the trial.
Explanatory trial (also efficacy trial) A trial which seeks to test whether, in ideal circumstances,
a new intervention has different effects to existing interventions
or a placebo.
Exploratory clinical investigations Studies to identify candidate drug molecules in an early stage of
development that show optimal or adequate pharmacodynamic
or pharmacokinetic characteristics in humans.
Exposure An element of behaviour or lifestyle, environmental exposure,
or genetic characteristic that is investigated as a modifier of an
outcome.
Exposure-response studies In drug development, studies that investigate the relationship
between the dose of the drug and the response to it.
Field trial Clinical trial carried out in a community setting.
Follow-up The monitoring of participants in a study for a period of time.
Forest plot A graphical representation of the data from a meta-analysis,
showing a line of data for each included study and the overall
estimate from combining the results of the individual studies.
Funnel plot A graphical representation of effect estimates plotted against
a measure of size or precision for individual studies pertaining
to a research question. The resulting plot should have a
symmetrical, triangular distribution in the absence of biases
related to study size.
Futility threshold Threshold for the effect of an intervention below which it would
be concluded that the intervention is unlikely to produce a
clinically important benefit.
Generalisability (also applicability) The degree to which the results of a study are likely to hold true
in other practice settings.
Good Clinical Practice (GCP) International quality standard for the design, conduct, recording
and reporting of trials that involve the participation of humans.
Good Laboratory Practice (GLP) Quality guidelines and regulations for the organisation, process
and conditions under which laboratory studies are planned,
performed, monitored, recorded and reported.
Good Manufacturing Practice (GMP) Quality guidelines and regulations for the design, monitoring
and control of manufacturing processes and facilities.
GRADE Grading of Recommendations, Assessment, Development and
Evaluation: a systematic approach to making judgements about
the quality of evidence and strength of recommendations.
Hazard ratio A statistical measure of how many times more (or less) likely a
participant is to have an event at a particular point in time if they
receive the experimental rather than the control intervention.
Heterogeneity test Statistical test of the null hypothesis that the difference between
two groups being compared (e.g. treatment and control) is equal
across all patient or other subgroups.
High-resolution study Study that collects detailed data on each participant.
High throughput screening (HTS) Method in drug development for testing a large number of
compounds against optimised assays in a relatively short period
of time, usually through automated or robotic technologies.
Hit-to-lead process Process in the early stage of drug development where small
molecule hits from high throughput screening are evaluated to
identify promising lead compounds.
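For the 'Hazard ratio' entry above, a deliberately simplified numerical sketch may help. It assumes that the hazard in each group is constant over time, in which case the hazard is simply the number of events divided by the total follow-up time, and the hazard ratio is the ratio of these two rates. Published trials usually estimate the hazard ratio with a model such as Cox regression instead; the figures and function names below are invented for illustration.

```python
def hazard_rate(events, person_time):
    """Events per unit of follow-up time, assuming a constant hazard."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time


def hazard_ratio(events_exp, time_exp, events_ctrl, time_ctrl):
    """Ratio of the experimental-group hazard to the control-group hazard."""
    return hazard_rate(events_exp, time_exp) / hazard_rate(events_ctrl, time_ctrl)


# Hypothetical data: 30 deaths over 400 patient-years (experimental group)
# versus 45 deaths over 360 patient-years (control group).
hr = hazard_ratio(30, 400.0, 45, 360.0)
print(round(hr, 2))
```

A value below 1 indicates a lower event rate in the experimental group; the estimate should always be read together with its confidence interval.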
Hypotheses Scientific statements that are postulated and can be investigated by
means of empirical data.
I² index Quantitative measure of heterogeneity, which describes the
percentage of the variability of the effect estimate in a meta-
analysis that is due to heterogeneity rather than chance. I²=0%
means absence of statistical heterogeneity and, as statistical
heterogeneity increases, I² increases to 100%.
Immortal time bias Period of time that participants in a cohort are followed up
during which death (or an outcome that determines end of
follow-up) cannot occur.
Imprecision One of the GRADE criteria for assessing the quality of the body
of evidence, which refers to how precise the effect estimate
will be based on the width of the confidence interval, and if our
judgement on the effect changes when we consider the opposite
sides of the confidence interval.
In silico Research conducted using computer modelling or computer
simulation.
Incidence Number of new cases of a particular disease diagnosed during a
certain period of time.
Incidence-based cohort mortality study Study that measures the effect of an intervention by comparing
incidence-based mortality in a cohort subsequent to the
introduction of the intervention to the expected incidence-
based mortality in the absence of the intervention (e.g. by
extrapolating historical patterns of incidence-based mortality).
Inclusion criteria Criteria that would make an individual eligible to participate in
a study.
Inconsistency One of the GRADE criteria for assessing the quality of the body
of evidence, which assesses how similar and consistent were
the results of the individual studies included. It is based on the
degree of statistical heterogeneity.
Independent censoring Censoring of time-to-event variables for reasons that are not
related to the actual hazard for the event.
Indirectness One of the GRADE criteria for assessing the quality of the
body of evidence, which refers to how direct the evidence in
the included studies is in relation to the evidence that was
being sought.
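The I² index defined earlier on this page is commonly computed from Cochran's Q statistic as I² = 100% × (Q − df)/Q, truncated at zero, where df is the number of studies minus one. The sketch below applies this formula to inverse-variance weighted effect estimates. The study effects and standard errors are invented, and the function name is hypothetical.

```python
def i_squared(effects, std_errors):
    """Return the I² index (in %) from study effect estimates and standard errors."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matched inputs")
    weights = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical log hazard ratios from three studies and their standard errors.
print(i_squared([-0.30, -0.10, -0.45], [0.12, 0.15, 0.20]))
```

Because Q has low power when there are few studies, a small I² from a handful of trials does not rule out important heterogeneity.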
Intention to treat An analysis in which participants are analysed in the group to
which they were originally allocated, regardless of whether they
adhered to this.
Interim analysis Evaluation of the data from a study before the planned final
analysis.
Kaplan-Meier estimates Method for statistical estimation of event-free probabilities
(e.g. survival probabilities) allowing for independent censoring.
Lead optimisation Process in drug development by which a candidate drug is
designed after an initial lead compound is identified.
Lead-time bias Interval between the screen-detection of a preclinical detectable
disease and the time at which the disease would have been
detected clinically. This lead-time means that the time between
diagnosis and death is longer as a result of screen-detection
even if the actual date of death is not delayed, and leads to bias
when comparing, for example, cancer survival rates between
intervention and control groups.
Length bias Bias caused by the length of the preclinical detectable phase
of tumours: slow-growing tumours, which have a rather good
prognosis and thus rather long survival, are more likely to be
detected at screening than fast-growing tumours, because of
their longer preclinical detectable phase.
Logrank test Statistical test to compare time-to-event endpoints between
groups with the null hypothesis of similar event-free
probabilities (e.g. survival probabilities).
Masking / masked (also blinding / blinded) Practice of keeping some or all of the people involved in a study
(e.g. patients, investigators and site staff, healthcare practitioners,
research team, sponsor, statistician, etc.) unaware of which
group of the trial each patient is assigned to.
Maximum tolerated dose (MTD) Drug dose established from the occurrence of limiting toxicities
in a given proportion of people given the drug.
Median The middle value in a distribution of values.
Median survival Length of time by which 50% of patients in the trial have died.
Meta-analysis Statistical combination of data from a series of studies (usually
in a systematic review) to obtain one summary effect estimate.
Results are often displayed in a forest plot.
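The 'Kaplan-Meier estimates' and 'Median survival' entries above can be made concrete with a short sketch of the product-limit calculation. At each time at which a death occurs, the running survival probability is multiplied by the proportion of patients at risk who survive that time; censored patients leave the risk set without changing the estimate. The follow-up data are invented, the function names are hypothetical, and censoring is assumed to be independent.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed event (e.g. death) and 0 for a
    censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same observation time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 50% or lower (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months; 1 = died, 0 = censored.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(median_survival(curve))
```

If the estimated survival curve never falls to 50%, the median survival is not reached and cannot be reported, which is why the second function can return `None`.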
Me-too drug Drug with the same formulation and stated indications as a
previously approved drug.
Minimal residual disease (MRD) Presence of residual cancer cells, even when so few are present
that they cannot be found by routine means.
Minimisation A computer-based technique to ensure that the allocation of
patients to the different groups in a trial is balanced across a
range of specific characteristics of the patients.
Mortality Deaths of participants in the trial, usually within a specified
time period.
Multi-arm, multistage (MAMS) design A form of adaptive design for a clinical trial in which
comparisons may be between various interventions and a control
treatment, and where individual intervention groups
may be stopped earlier than others.
Nested case-control study Case-control study in which both cases and controls are drawn
from a pre-existing cohort study.
Net survival Theoretical probability of surviving cancer in the absence of
other causes of death.
Newcastle-Ottawa scale A framework for assessing the quality of non-randomised
studies (case-control or cohort studies), using three domains:
selection of study groups, comparability of groups and
ascertainment of exposure or outcome of interest.
Non-differential bias Bias that causes misclassification of exposure but which is not
related to knowledge of the diagnosis of a condition. See also:
Differential bias.
Null hypothesis The assumption that there is no true difference in the effects of
the treatments being compared. See also: Alternative hypothesis.
O’Brien-Fleming boundary Method for adjusting the significance level for interim analyses
to maintain the global statistical significance level.
Objective/overall response A measurable response.
Objectives The questions a study is intended to answer.
Observational study A research study to measure the effect of an exposure/
intervention by observing the participants in their natural setting.
Odds ratio The ratio of the odds that an event occurred in one group
(usually the intervention or exposure group) to the odds of the
event in a second group (usually the control group).
Optimum biological dose (OBD) Dose demonstrated to be effective on a specific biomarker.
Outcome Any health-related event, the causation of which is being
studied.
Overall survival (OS) Length of time from either the date of diagnosis or the start of
treatment for a disease, such as cancer, that patients diagnosed
with the disease remain alive.
Partial response/ Decrease in the size of a tumour or in the extent of cancer in a
remission patient.
Patient-centred endpoints Variables that reflect a patient’s feeling of well-being or their
survival.
Patient-reported outcome measures (PROM) Outcomes that come directly from patients, such as
measurements of quality of life and the economic impact of the
treatment on the patient.
Phase I The first step in testing a new treatment in humans, to assess
safety and side effects, and the best way to administer the new
treatment.
Phase II Study testing whether a new treatment works for a specific
condition, such as cancer.
Phase III Study comparing the effects of interventions in a group of
patients with a condition, such as cancer.
Phase IV Study of the side effects caused over time by a new treatment
after it has been approved and is on the market.
Placebo Dummy treatment that might be given to participants in the
control group of a trial to ensure that they (and others) do not
know whether they are receiving the new treatment.
Point estimate Value derived from a patient sample that might be generalised to
a population similar to the study population.
Power Probability of avoiding a Type-II error of concluding that there
is no difference between the outcomes in the treatment groups
when there is a difference in the effects of the treatments.
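To make the definition concrete, the sketch below approximates the power of a two-sided comparison of two proportions with equal group sizes. It is an illustration under stated assumptions: the normal approximation, the function name and the example proportions are not taken from the handbook, and the small contribution of the opposite tail is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_treatment * (1 - p_treatment) / n_per_group)
    # Probability that the test statistic exceeds the critical value
    # when the true difference is p_treatment - p_control.
    return NormalDist().cdf(abs(p_treatment - p_control) / se - z_crit)


# Hypothetical event proportions of 30% and 50%, with 100 patients per group.
print(round(approx_power(0.30, 0.50, 100), 2))
```

Increasing `n_per_group` in this sketch raises the power, which is why power and sample size are usually planned together.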
Pragmatic trial (also effectiveness trial) A trial which seeks to test whether the treatments being compared
will have different effects when used in routine practice.
Prevalence Number of cases with a particular disease who are alive on a
certain date.
Primary endpoint / outcome Typically the outcome around which a study is designed,
and which would be the most appropriate to answer the main
research question.
Progression-free survival (PFS) Length of time during and after the treatment of a disease,
such as cancer, that a patient lives with the disease without it
getting worse.
Propensity score matching A statistical matching technique that estimates the effect of
an intervention by accounting for covariates that predict that
someone will receive the intervention.
Publication bias Selective reporting of a research study based on its findings.
For clinical trials, this usually means that trials with positive
results are more likely to be published than those with null or
negative results.
p-value Probability of getting the observed data or data deviating even
more from the expected values if the null hypothesis is true.
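As a worked illustration of this definition, the sketch below computes an exact one-sided p-value for a binomial outcome. The function name and the numbers are hypothetical, and a one-sided upper tail is an assumption made here for simplicity.

```python
from math import comb


def binomial_upper_p_value(successes, n, p_null):
    """P(X >= successes) for X ~ Binomial(n, p_null), i.e. under the null."""
    return sum(
        comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical data: 9 responses in 10 patients, null response probability 0.5.
print(binomial_upper_p_value(9, 10, 0.5))
```

The sum covers the observed count and every count that deviates further from the expected value in the same direction, which matches the wording "the observed data or data deviating even more".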
Q test Statistical test (a chi-squared test) to assess the presence of
statistical heterogeneity among the results of the individual
studies.
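A minimal sketch of how such a statistic can be computed is given below, assuming the commonly used Cochran's Q with inverse-variance weights. The function name and the example values are hypothetical, and the handbook does not prescribe this implementation.

```python
def cochran_q(effects, variances):
    """Return Cochran's Q and its degrees of freedom for a set of studies."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical effect estimates (e.g. log odds ratios) and their variances.
q, df = cochran_q([0.0, 1.0], [1.0, 1.0])
print(q, df)
```

Q is then compared with a chi-squared distribution on `df` degrees of freedom; that step is omitted here because it needs a chi-squared distribution function from a statistics library.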
Qualitative synthesis Narrative summary of the findings of the studies included in a
systematic review.
Quantitative synthesis Statistical summary of the results of the studies in a systematic
review. A meta-analysis is a common form of quantitative
synthesis.
Random error Error due to the inherent unpredictability of events or to the
inherent difference between a sample and the whole
population.
Random variation Variability of observations that arises by chance.
Randomisation Process by which chance is used to determine the group to
which a research participant is allocated.
Randomised (controlled) trial (RCT) Study in which patients are allocated randomly to one of the
groups being compared.
Rare cancers Cancers with an incidence of less than 6 per 100 000 persons
per year.
Recall bias Bias due to differences in the accuracy or completeness of the
information from the past that is provided by study participants,
which might arise from, for example, the fact that they have
been diagnosed with a cancer.
Recommended dose for testing in Phase II (RP2D) Dose that should be used in a Phase II study.
Recurrence-free survival Length of time that the treated patient survives without any
recurrence of their cancer.
Registration of trials Clinical trials should be registered in a publicly accessible
register of research before the first patient is recruited or
randomised. This increases transparency in what research is
being done and can help to minimise the impact of publication
and selective reporting bias because information would be
available on all trials, regardless of their results.
Relative risk / risk ratio (RR) The ratio of the risk that an event occurred in one group (usually
the intervention or exposure group) to the risk of the event in a
second group (usually the control group).
Relative survival Relative survival of a disease, such as cancer, is calculated by
dividing the overall survival after diagnosis by the survival as
observed in a similar population who have not been diagnosed
with the disease.
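The division described above can be sketched as follows. The function name and the survival proportions are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_survival(observed_survival, expected_survival):
    """Observed survival in patients divided by survival expected in a
    comparable population without the disease."""
    return observed_survival / expected_survival


# Hypothetical 5-year figures: 60% of patients alive, 80% expected alive.
print(round(relative_survival(0.60, 0.80), 2))
```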
Reporting guidelines Guidance intended to improve the reliability and value of reports
of research studies.
Response rate The proportion of patients whose cancer shrinks or disappears
after treatment.
Right-censored survival time Observation time until the most recent contact in a patient who
has not had the event of interest, used as a lower bound of the
unobserved time to event in estimations of time-to-event variables.
Risk The proportion of patients in a group who have the event of
interest.
Risk difference The difference in the risk that an event occurred in one group
(usually the intervention or exposure group) and the risk of the
event in a second group (usually the control group).
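The definitions of risk, risk ratio and risk difference can be illustrated with one small sketch. The function names and the counts are hypothetical and are not taken from the handbook.

```python
def risk(events, total):
    """Proportion of patients in a group who have the event of interest."""
    return events / total


def risk_ratio(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    return risk(events_a, total_a) / risk(events_b, total_b)


def risk_difference(events_a, total_a, events_b, total_b):
    """Risk in group A minus risk in group B."""
    return risk(events_a, total_a) - risk(events_b, total_b)


# Hypothetical counts: 20 of 100 in the exposure group, 10 of 100 controls.
print(risk_ratio(20, 100, 10, 100))
print(round(risk_difference(20, 100, 10, 100), 2))
```

With the same hypothetical counts the odds ratio would be 2.25 rather than 2.0, which shows that the two ratios diverge when the event is not rare.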
Risk factor An element of behaviour or lifestyle, environmental exposure,
or genetic characteristic that is associated with the occurrence of
disease or other condition.
Risk of bias An assessment of the possibility that specific features of a study
will lead to bias in its findings.
Risk ratio / relative risk (RR) The ratio of the risk that an event occurred in one group (usually
the intervention or exposure group) to the risk of the event in a
second group (usually the control group).
Safety In healthcare, safety covers the reporting, analysis and
prevention of adverse events arising from use of an intervention.
Salvage treatment Treatment given after a disease such as cancer has failed to
respond to other treatments, or to patients who are unable to
tolerate other available therapies.
Sample size The number of participants in the trial. The intended sample size
is the number of participants planned to be included in the trial,
which is usually determined through a statistical calculation
before the trial begins.
Secondary endpoints / outcomes Outcomes that will be measured in a study, in addition to the
primary outcomes.
Selection bias Bias in choosing the individuals or groups to take part in a study,
which might make them systematically different from those who
do not take part.
Selective reporting bias The selective reporting of the specific findings of a study
because of those findings. For clinical trials, this usually means
that findings of subgroup analysis, or particular outcome
measures with positive results, are more likely to be published
than those with null or negative results.
Significance level Accepted probability for the Type-I error, in which the null
hypothesis would be rejected when it is true.
Small molecule Organic compounds with a low molecular mass (the usual upper
limit is 500–1000 Da) that bind to specific targets, thus altering
their function.
Sociodemographic factors Factors used to describe human populations that relate to
both personal characteristics (e.g. age, sex, race) and social
characteristics (e.g. income).
Source population A hypothetical population from which the cases and controls in
a case-control study are drawn.
SPIRIT guidelines Standard Protocol Items: Recommendations for Interventional
Trials: guidelines on reporting the protocol of a randomised
trial.
Stage migration Apparent improvements in the survival of patients with
diseases, such as cancer, that arise by reclassifying them into
different prognostic groups, recognising more subtle disease
manifestations, or using diagnostic modalities that allow the
disease to be diagnosed at an earlier stage.
Statistical estimation Generalisation of values calculated from the sample of patients
in a study to a population similar to the study population.
Statistical power Probability of correctly rejecting the null hypothesis.
Statistical (hypothesis) test Statistical process to determine whether to reject or accept the
null hypothesis based on the degree of the deviation of observed
data from those expected if the null hypothesis is true.
Statistically significant Result of a statistical test that rejects the null hypothesis based
on the deviation of observed data from those expected if the null
hypothesis is true.
Stratification Technique to ensure that patients in groups that are being
compared are balanced for specific characteristics of the
patients.
Stratified randomisation Method of randomisation that uses stratification to minimise
differences between the intervention groups being compared in
a trial.
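One common way to implement this is permuted-block randomisation within each stratum, sketched below. This is an assumption made for illustration: the handbook does not specify a method, and the function name, arm labels, block size and seed are hypothetical.

```python
import random


def stratified_block_randomisation(participants, block_size=4, seed=0):
    """Allocate (participant_id, stratum) pairs to arm 'A' or 'B'.

    Each stratum draws from its own shuffled block with equal numbers of
    'A' and 'B', so the arms stay balanced within every stratum.
    block_size is assumed to be even.
    """
    rng = random.Random(seed)
    open_blocks = {}  # stratum -> allocations left in the current block
    allocation = {}
    for participant_id, stratum in participants:
        block = open_blocks.get(stratum)
        if not block:  # no block yet, or the current block is used up
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)
            open_blocks[stratum] = block
        allocation[participant_id] = block.pop()
    return allocation


# Hypothetical participants stratified by age group.
people = [(i, "under 65" if i < 4 else "65 and over") for i in range(8)]
print(stratified_block_randomisation(people))
```

A fixed seed is used only to make the sketch reproducible; in a real trial the allocation sequence would be generated and concealed independently of the recruiting staff.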
STROBE guidelines Strengthen the Reporting of Observational studies in
Epidemiology: guidelines on reporting the results of
observational studies.
Subgroup analyses Restriction of analyses of a specific hypothesis to a subgroup of
the participants in a study.
Surrogate outcomes A biomarker or a physical sign that is intended to be used as a
substitute for a clinically meaningful endpoint.
Survival rate Proportion of patients diagnosed with a condition, such as
cancer, who are still alive after a certain period of time following
their diagnosis.
Syngeneic models Animal models that are genetically identical to the tumours that
they are to be transplanted with.
Synthetic lethality Cellular condition in which two (or more) non-essential
mutations, which are not lethal on their own, become lethal
when present simultaneously within the same cell.
Systematic error Consistent error in either the study population or the information
gathered, leading to a measured value which deviates from the
true value.
Systematic review A rigorous method for knowledge synthesis that collects and
critically analyses multiple research studies on a specific
topic. Defining features of a good quality systematic review
are: a clear researchable question, explicit eligibility criteria,
comprehensive search for, and selection of, studies, critical
appraisal and synthesis of findings.
Targeted therapy/agent Treatment designed to inhibit a biological target.
Time-to-event analysis Statistical analysis to investigate the time elapsing before an
event is experienced, rather than simply whether or not an event
is experienced.
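One widely used time-to-event method is the Kaplan-Meier (product-limit) estimate, which handles right-censored observations. The sketch below is a minimal illustration; the function name and the data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    observation was right-censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up times in months; 0 marks a censored observation.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 2))
```

Censored patients contribute to the number at risk up to their last contact but never count as events, which is how the estimate uses the censoring time as a lower bound.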
Time to progression (TTP) Length of time between the initial response to treatment and
subsequent disease progression.
Toxicity Extent to which something, such as a drug, is poisonous or
harmful.
Trend studies Analysis of the trends in age-standardised population disease-
specific mortality rates over time.
Tumour cell lines Cancer cells that keep dividing and growing over time in a
laboratory. Tumour cell lines are used in research to study the
biology of cancer and to test treatments.
Tumour-centred endpoints Include biological markers, mainly laboratory or histological,
used to define response to an intervention, and time-to-event
endpoints.
Tumour microenvironment Normal cells and molecules that surround and feed tumour cells.
Type-I error A conclusion that there is a difference between the outcomes in
the intervention groups when there is no difference in the effects
of the interventions (i.e. a false positive).
Type-II error A conclusion that there is no difference between the
outcomes in the intervention groups when there is a difference
in the effects of the interventions (i.e. a false negative).
Unadjusted ratio The effect calculated simply from the data with no adjustment for
any variables or confounders that might have an impact on the
effect. See also: Adjusted ratio.
Uncertainty principle An approach to setting the inclusion and exclusion criteria for a
trial such that patients are eligible if the people deciding about
their treatment (including the patient) are substantially uncertain
about which of the treatments in the trial would be better for them.
Validity Degree to which the results of a study correctly reflect the data
and adequately answer the underlying scientific questions.
Virtual screening Computational approach used to identify chemical structures
that are predicted to have particular properties.
Volume-outcome research Studies to investigate the relationship between the number of
patients treated in a hospital or by a surgeon and the outcomes
of those patients.
Xenograft models Animal models created by injecting human tumour cells into
immunocompromised mice.
Index
Abbreviations used in the index are listed on pages xiv-xvi
References to figures are indicated by ‘f’.
References to tables are indicated by ‘t’.
A
Abstract, recommendations, 151t
ACCENT group, 98
Adaptive designs, trials, 139–140, 155
Adjusted ratio, 7, 8t, 155
Absorption, distribution, metabolism and excretion (ADME) profiles, 65
Adverse events, 90, 155
Aetiology, cancer see Causation of cancer
Age
  breast cancer screening, 14, 23
  colon cancer survival, 31f
    absolute vs relative survival, 34, 34f
Alcohol, high consumption, confounders and, 6–7
Allocation concealment, 78, 81, 155
Allometric scaling, 63, 155, 158
Alpha (α) value, 111, 155
Alternative hypothesis, 110, 155
American Association for Cancer Research (AACR) workshops, 150t
Anaplastic lymphoma kinase (ALK), 138
Animal models, 61–62
  antitumour efficacy of antibodies, 64–65
  extrapolation to human tumours, 62–63
  toxicology studies, 63–64
  see also Murine models
Anti-EGFR agents, 31
Antiangiogenic therapy, 57, 68
Antibodies
  anticancer biologics and, 64
  libraries, 64
Anticancer biologics, 64
  development, 64–65
  Phase I studies, 66–67
  screening, 64
Anticancer drugs see Drug(s)
Antitumour antibodies, 64–65
Antitumour efficacy of drugs
  biomarkers, Phase I trials, 68
  preclinical studies, 60–62
Applicability of trial results, 73, 83, 105, 109
  definitions, 155, 160
Area under the curve (AUC), 64, 67, 155
Aromatase inhibitors, 123
ATLAS trial, 75
Attributable risk, 2, 155
Attrition bias, 128
Axitinib, 142

B
Baseline variables, 105, 155
Basket study, 138, 155
Bayesian adaptive randomisation, 141–142
Bayesian algorithms, 66, 156
Bayesian statistics, 140–141, 156
‘Bedside to bench’ philosophy, 66
‘Bench to bedside’ philosophy, 66
Best overall response (BOR), 156
  RESONATE trial, 105, 109–110
BeTa randomised trial, 80
Bevacizumab, 77, 80
Bias, 5–6
  attrition, 128
  in cancer prognosis, 29, 37–39
    confounding, 38
    information (measurement), 38–39
    selection bias, 37–38
    trials vs observational studies, 36–39
  in cancer screening studies/trials, 15
    due to cluster randomisation, 16–17
    observational studies, 17–19
    randomised trials, 16–17
  cancer stage migration and, 39
  in case-control studies, 4t, 6, 18
  censoring and, 105
  classification (information/measurement bias), 5–6, 37–39, 128
  Cochrane Risk of Bias (RoB) tool, 127–128, 130, 157
  in cohort studies, 4t, 6
  definition, 5, 156
  differential, 6, 39, 158
  exposure measurement, 6
  immortal time, 39, 161
  information (classification/measurement bias), 5–6, 37–39, 128
  investigator, 89
  lead-time see Lead-time bias
  length, 15–16
  measurement, 5–6, 37–39, 128
  minimisation, 7, 71, 128
    systematic reviews, 36
  misclassification, 37–39
  non-differential, 6, 38–39, 163
  non-random allocation in trials, 81
  observational studies see Observational studies
  observer/interviewer, 39
  in outcome measurement, 6, 89
    overall survival (OS) preventing, 89
  performance, 128
  publication see Publication bias
  randomisation for trials, 77, 81, 128
  randomised trials, 4t, 37–38, 71
    risk in, 167
    screening studies/trials, 16–17
    see also Publication bias
  recall, 4t, 39, 166
  reporting, 82–83, 128
  risk, assessment, 17, 83, 167
  Risk of Bias (RoB) criteria, 127–128, 157
  selection see Selection bias
  selective reporting, 82–83, 128, 165, 167
  in self-reported exposures, 6
  self-selection, 18
  systematic reviews and, 36, 72, 125, 127–128, 130
    see also Publication bias
  time-related, 39
  time-to-event data and, 105, 107
  see also Confounding
Bioengineering, 64
Biologics, anticancer see Anticancer biologics
Biomarkers, 61, 67, 156
  antitumour efficacy, Phase I study, 68
  rare cancer trials and optimal dose, 142–143
  for rare cancers, 139
  as surrogate endpoint, 91
  as tumour-centred clinical endpoint, 90
Bisphosphonates, 121
Blinding, in randomised trials, 81–82, 128, 156, 162
  absent in sarcoma trial, 137
Body of evidence, quality, for systematic reviews, 127–128, 130
Bradford Hill’s Criteria for causation, 3, 3t
Brain cancer, mobile phones and, 3
Breast cancer
  ductal carcinoma in situ (DCIS), 19
  early-stage, bisphosphonates, 121
  incidence by age, 15
  indolent invasive, 19
  mortality reduction, 15
    incidence-based cohort mortality studies, 17–18
    observational studies, 14–15, 17–18
    randomised screening trials, 14–15
    trend studies, 18–19
  net survival estimates, 35
  overtreatment, 19
  population-based screening, 14
  randomised trials
    ATLAS, 75
    early chemotherapy trials, 73
    screening for cancer, 14–15, 17, 20–21
    uncertainty principle, 74–75
  risk, observational studies and bias in, 21
  risk/risk ratio, 2, 2t
  screening, 14–15
    age range and intervals, 14, 23
    annual vs biennial, 23
    benefit, mortality reduction, 14
    biases in observational studies, 17–19
    biases in randomised trials, 16–17
    cost-effectiveness of programmes, 22–24
    overdiagnosis, 19–22
    participation rates, 23
    randomised trials, 14–15, 17, 20–21
    see also Mammography screening; Screening (cancer)
  survival improvement over time, 32, 32f

C
Canadian National Breast Screening Study-1/-2, 16, 20
Cancer cell lines, 60–61, 169
Cancer Genome Project, 56
Cancer prevention, 10
Cancer prognosis see Prognosis (cancer)
Cancer registries, 43–54
  coding rules, 47–49, 48t
    incidence date, 48, 49t
    multiple tumours, 48, 48t
  completeness, 43–45
    5% acceptance, 45
  data in volume-outcome research, 52
  data quality variations, 50
  ‘death certificate only’ (DCO) cases, 44–45, 50
  epidemiological studies with data from, 50–51
  in Europe (national), 43
  follow-up, 49
  gender and, 47–48
  hospital-based, 43
  incidence and survival comparisons, 50–51
  incompleteness, 44, 50
    calculation, 44–45
  minimal data set, 45–46, 46t
    cancer classification (ICD-O), 46
    classification changes, 46
    time-trend analysis, 46
  notification, 43–45
    compulsory, 44
    sources, 44, 44t
  number in Europe, 43
  population-based, 43
    completeness, 43–45
  prognosis comparisons, 37–38
  quality of care studies with data, 51–52
  screening efficacy evaluation, 51
  supplementary items, 47, 47t
    cancer stage data, 47
    treatment data, 47
  uses of data, 51–52
Cancer-specific drug targets, 57–58
Cancer-specific survival, 33–35
Case-control studies, 3, 4t, 5
  advantages/disadvantages, 4t
  bias in, 4t, 6, 18
  cancer screening, 14, 18
  definition, 5, 156
  nested, 5
Causation of cancer, 2–10
  Bradford Hill’s Criteria, 3, 3t
  establishing, 3
  see also Risk factors
Celecoxib, 80
Cell culture, 60–61
Cell-free clonotypic assay, 91, 156
Censoring, 156
  independent, 107
  right-censored survival time, 107, 109
  survival time definition, 32–33, 107
Cervical cancer, population-based screening, 14, 51
Chemical libraries, 58–59
Chemo-holiday, 124
Chemotherapy, 57
  antibodies conjugated to drugs, 64
  breast cancer, 73
  colon cancer see Colon cancer
  combination, early randomised trials, 73
  Phase I studies, 66
    see also Phase I trials
  PICO application for systematic review, 124
  sarcomas, 136
  single agent, early randomised trials, 73
  targeted therapy vs, 57
Chronic conditions, crossover trial design, 139
Chronic lymphoid leukaemia (CLL), 105, 106t, 108t, 110
  see also RESONATE trial
Classical statistics, 140
Classification bias, 37–39
Clinical benefit, 89, 91, 94t, 135, 157
  molecular targeted anticancer drugs, 66
Clinical investigations, 88
Clinical trials, 157
  international workshops, 150t
  rare cancers, 137–143
  smart drug development and, 60
  see also Randomised trials
Clinical variables, 104, 157
Cluster randomisation, 16–17, 157
CMF (cyclophosphamide, methotrexate, 5-fluorouracil) regimen, 73
Cobalt-60, 121
Cochrane Central Register of Controlled Trials (CENTRAL), 125
Cochrane Methodology Review, 83, 157
Cochrane Risk of Bias (RoB) tool, 127–128, 130, 157
Coding rules, for registries, 47–49, 48t
Cohort, definition, 4
Cohort studies, 3–4, 4t, 157
  advantages/disadvantages, 4t
  cancer prognosis assessment, 36–37
  definition, 4, 157
  expansion, 66
  lead-time bias, 39
Colon cancer/colorectal cancer
  absolute vs relative survival, 34, 34f
  adjuvant chemotherapy, 31
    observational study, 36–37
    overall survival (OS), 36–37
    randomised trials, 36, 73
  advanced, systematic review and PICO, 124
  chemo-holiday, systematic review, 124
  disease-free/progression-free survival, 35
  insulin exposure, risk, 8t
  liver metastases resection, DFS, 126, 126f
  population-based screening, 14
  primary endpoint of trials, 98
  prognosis, trials vs observational studies, 36–37
  prognostic factors
    comorbid conditions and, 31, 31f
    sociodemographic, 31, 31f
    treatment-related, 31
    tumour-related, 30, 30f
  RAS mutant, randomised trial, 77
  stage migration, 30, 39
  survival improvement over time, 32, 32f
  TNM stages, survival rates, 30, 30f
  understaging, 30
Combinatorial chemical synthesis, 58
COMET Initiative, 75, 89, 123, 157
Comorbid conditions, colon cancer survival, 31, 31f
Comparators, in PICO criteria, 122–123
COMPARE algorithm, 60
Compassionate use, 143
Complete remission/response (CR) rate, 91, 95
  advantages/limitations, 94t
  definition, 94t, 157
  FDA approval of cancer drugs, 92
  lymphoma trial, 96
  rare cancer (sarcoma) trial, 141–142
  for systematic review question, 123
Composite outcome, 88, 97, 157
Compound libraries, 58–59
  sublibraries, 59
Conditional survival, 35–36, 157
Confidence interval (CI), 129, 157
  95%, 7, 8t, 125, 131
  mammography screening trials, 14
  variability of point estimate, 109–110
  width, imprecision and, 130
Confirmatory evaluation, 111–112, 157
Confounding, 5–7
  avoiding in randomised screening trials, 13–14
  definition, 5–6, 38, 158
  minimisation, 7
  observational studies, on prognosis, 37–38
  sarcoma (rare cancer) trial, 136
Confounding factors, 6–7, 38
CONSORT guidelines, 83, 158
Control group, 5, 158
  case-control studies, 5, 18
  contamination with screening, 1
  multi-arm, multistage (MAMS) design, 140
  randomised trials, 36, 76
    bias, overdiagnosis, 20,
    contamination, underestimation of screening effect, 16
    screening trials, 13–14, 16, 20
  sample size calculation and, 76–77
  selection bias, 6, 128
Conventional allometric scaling, 63, 155, 158
Core Outcome Measures in Effectiveness Trials (COMET), 75, 89, 123, 157
Core outcome set (COS), 75, 88–89, 99, 123
  breast cancer screening, 22–24
  definition, 89, 158
  lymphoma trials, 95, 95t
Cox regression, 107, 158
CREATE study, 138
Crizotinib, 138
Crossover design, trials, 139, 158
Crude probability of death, 33–34, 158
  cause-specific, 34–35
Cumulative incidence of progression, 93t
Cumulative probability of death from cancer, 33
Curable tumours, endpoints for trials, 96

D
Data, extraction, for systematic reviews, 128–129
Data cut-off, 107, 109, 158
Data extraction forms, 128–129
Databases, small molecules, 59
Death
  causes
    misclassification, 34–35
    unreliable in death certificates, 34
  crude probability of, 33–34, 158
  cumulative probability, from cancer, 33
  see also Mortality
Death certificate, 34, 44
  as notification source for registries, 44
‘Death certificate only’ (DCO) cases, 44–45, 50
Demographic characteristics, 115, 158
Denmark
  cancer registry, 43
  survival rates by per capita income, 50
Developmental Therapeutic Program, 60
DFS see Disease-free survival (DFS)
Differential bias, 6, 39, 158
Differential misclassification, 39
Disease-free survival (DFS), 35, 91, 158
  advantages/limitations, 93t
  definition, 93t, 97, 158
  liver metastasis resection, colorectal cancer, 126, 126f
  lymphoma trials, 95t
  uses, 97–98
Disease-specific survival (DSS), 98
DNA repair defects, 57
Docetaxel, 141–142
Documentation delays, censoring due to, 107
Dose escalation, 67
Doxorubicin, 136–137, 143
Drop-out
  censoring due to, bias, 107
  see also Follow-up, loss to
Drop-out rate, 90, 158
Drug(s)
  analogues, 56
  candidate, 65
  concentration
    in plasma, 67
    preclinical pharmacokinetic studies, 62
    in tumour, 67–68
  design, 59
  FDA approval of indications, 91–92
  me-too, 65, 163
  mechanisms of action, 57–58
  multi-targeted, 58
  new, development see Drug development
  novel agents, 56, 65, 135
  off-label use, 143
  plasma-protein binding, 68
  protein binding, 68
  repositioning, 56
  safety, 56
    preclinical toxicology studies, 63–64
  screening, 60
    virtual, small molecules, 59
    see also Preclinical assays
  selection, 60
  target discovery, 57–58
Drug development, 55–70
  anticancer biologics see Anticancer biologics
  clinical phase(s), 55–56
    see also Entries beginning Phase (e.g. Phase I trials)
  practice guidelines, 55
  preclinical assays see Preclinical assays
  preclinical phase see Preclinical phase/studies
  for rare cancers, 135
  recent advances, 56
  smart, 60
  strategies, 56–65
  target discovery preceding, 57–58
Drug discovery, 55
  drug selection, 60
    see also Preclinical assays
  small molecule, 58–59
    analysis of candidate hits, 59
    optimisation, 55, 59
    preclinical assays see Preclinical assays
    synthesis, and preliminary screening, 58–59
    target discovery preceding, 57–58
    validation, 57–58
Drug–target interactions, 58–59
Duration of response (DoR), 91, 96, 158
  advantages/limitations, 94t, 96
  definition, 94t, 158
  lymphoma trials, 95t

E
Early detection of cancer, 13
  lead-time bias in survival, 39
Early stopping of trials, 111–112, 159
  rare cancer trials, 143
ECCO-AACR-EORTC-ESMO Workshop, 149, 150t, 151
Ecological studies, 14, 159
Effect
  importance of, factors determining, 8
  size of, 8
Effectiveness (of treatment), definition, 89
Effectiveness trials, 74, 159, 165
  core outcome measures (COMET), 75, 89, 123, 157
  see also Phase III trials
Efficacy (of treatment), 36, 56, 159
  febrile neutropaenia, forest plot, 131
  of new agents for Phase I trials, 65
  RESONATE trial, 105
  sarcoma (rare cancer) trial, 136
Efficacy trials, 74, 159
EFS see Event-free survival (EFS)
Elderly
  cancer registry data, lung cancer, 51
  relative vs absolute survival, 34, 34f
Electrophysiological study, 63
Eligibility criteria, 36, 73–75
Embase, 125
Emigration, 49
Empirical data, 104, 159
Endpoint(s), 15, 87–88
  clinical, 88
  definition, 92, 93t–94t, 159
  efficacy, for clinical trials, 92, 93t–94t
  health-related quality of life (HR-QoL), 90–91
  multiple, 111
  novel, for rare cancer trials, 138–139
  patient-centred, 164
  primary, 82, 88–89, 92, 165
    core outcome sets and, 89
    lymphoma trials, 95, 95t, 96
    multiple, 88
    overall survival, as ‘gold standard’, 89–90
    RESONATE trial, 105
    sarcoma (rare cancer) trial, 136
    see also Outcome(s), primary; Overall survival (OS); Progression-free survival (PFS)
  rare cancer trials, 136, 138–139
  secondary, 88, 105, 167
    lymphoma trials, 95, 95t
    rare cancer trials, 138–139
    researcher case study, 148–149
    see also Outcome(s), secondary
  surrogate see Outcome(s), surrogate
  tumour-centred clinical, 90–91, 169
  see also Outcome(s)
Enrolment, design for rare cancer trials, 140–142
EORTC Soft Tissue Bone Sarcoma Group, 136–137
EORTC TRUSTS study, 143
Epidemiological studies
  with cancer registry data, 50–51
  design/types, 3–5, 4t, 36
    see also Case-control studies; Cohort studies; Randomised trials
Epidemiology, 1
  evidence of causation and, 3
Erlotinib, 80
Error(s)
  in interpretation, confounding and, 6–7
  random, 5, 7, 165
  source in risk factor studies, 5–7
  systematic, 5–6, 169
  Type-I see Type-I error
  Type-II see Type-II error
Error probability, 110–111, 159
Ethical issues, 37
Ethnicity, in minimal data set for registry, 45
EUROCARE-5 study, 30, 32, 32f
EUROCARE studies, 50
European Code Against Cancer, 10
European Medicines Agency (EMA), 92, 93f, 94f
European Network of Cancer Registries (ENCR), 43
European Union (EU)
  breast cancer screening, 22–23
  cancer survival rates, 32, 32f
Event-free survival (EFS), 88, 91, 97
  at 24 months (EFS24), 97
  advantages/limitations, 94t
  definition, 94t, 159
  FDA approval of cancer drugs, 92
  lymphoma trials, 95t
  time to treatment failure and, 97
Exclusion criteria, 35, 73, 159
  rare cancer trials, 138
Explanatory trials, 74, 159
Exploratory clinical investigations, 88, 113, 159
Exposure(s), 1, 159
  drug, immortal time bias, 39
  effect of, 1
  measurement, bias, 6
  outcome relationship, 7
  in PICO criteria, 122–123
  rare, cohort studies, 4
  selection bias, 37
Exposure-response studies, 67, 159

F
Familial cancers, 1
FDG-PET, 92
Febrile neutropaenia, 131, 131f
Fellowship (2-year), 147–148, 150
Field trials, 4, 159
First-in-class compounds, 65
Fisher’s exact test, 114t
Flims workshop, 149
5-fluorouracil, 31, 36, 73
FOLFIRI regimen, 77
FOLFOX regimen, 77
Follicular lymphoma, 92, 96
Follicular Lymphoma Analysis of Surrogacy Hypothesis (FLASH), 96
Follow-up, 49, 159
  ‘active’, in cancer registries, 49
  affecting duration of response (DoR), 96
  cancer registry data, 49
  data, description in trials, 105
  death after, immortal time bias, 39
  emigration, 49
  inadequate, biased estimate of overdiagnosis, 20
  incompleteness, outcome and, 50
  length, trend studies and bias reduction, 18
  life-long
    bias reduction in observational studies, 21–22
    simulation models, 21–22
  link to population registers, 49
  loss to, 16
    attrition bias, 128
    censoring due to, bias, 107
    selection bias, 38
  outcome measurement and, 6
  survival time definition, 32
Food and Drug Administration (FDA), 91–92, 93t–94t, 97
Forest plot, 129, 131, 131f, 160
  prostate cancer risk in first-degree relatives, 9, 9f
Funnel plot, 125–126, 126f, 160
  prostate cancer risk in first-degree relatives, 9, 9f
Futility threshold, 140, 160

G
Gambia Hepatitis Intervention Study, 5
GELA LNH2003B programme, 97
Gemcitabine, 141–142
Gender, cancer registries, 47–48
Generalisability of results, 105, 109, 155, 160
Genetically engineered cells, 64
Genetically engineered mouse models (GEMMs), 62
Good Clinical Practice (GCP), 55, 160
Good Laboratory Practice (GLP), 55, 160
Good Manufacturing Practice (GMP), 55, 160
GRADE approach, 127, 130, 160

H
Haematological cancers
  minimal residual disease negativity, 98
  time-trend analyses, 46
Half-life (T1/2), 67
Hazard ratio, 109, 113–114, 125, 160
  sarcoma (rare cancer) trial, 137
Head and neck cancer, 76
Health-related quality of life (HR-QoL) endpoints, 90–91
  problems, 90
‘Healthy worker’ effect, 6
Hepatitis B virus (HBV), vaccination, 5
hERG-K+ conductance assay, 63
Heterogeneity of studies, systematic reviews, 129–130
Heterogeneity test, 115, 116f, 160
High-resolution study, 47, 160
High throughput screening (HTS), 59, 160
‘Hits’, 59
Hits-to-lead process, 59, 160
Hodgkin lymphoma, 46, 95t, 96
Hormonal therapy, 80
Hormone replacement therapy (HRT), breast cancer risk/risk ratio, 2, 2t
Human ether-a-go-go K+ (hERG-K+) conductance assay, 63
Hyperthermic intraperitoneal chemotherapy (HIPEC), 31
Hypothesis (hypotheses), 3, 7, 110, 161
  alternative, 110
  null see Null hypothesis

I
I² index, 129–131, 161
Ibrutinib, 105, 106t, 108f, 109–111
  see also RESONATE trial
Ifosfamide, 136
Immortal time bias, 39, 161
Immunotherapeutic agents, 57, 67
Immunotoxins, 64
Imprecision, 130, 161
in silico prediction software, 59, 161
in vitro screening assays, 60–62
Incidence, 161
  cancer in Europe, 50
  incompleteness, cancer registry data, 50
  multiple tumours, cancer registries, 48
  rare cancers, 135
  studies using cancer registry data, 50–51
Incidence-based cohort mortality studies, 14, 161
  bias, 17–18
Incidence date, 48, 49t
Incidence rates, 1, 48
  coding rules, 48
Inclusion criteria, 35, 73–74, 161
  rare cancer trials, 138
  systematic reviews, 122
Inconsistency, 130, 161
Incurable tumours, endpoints for trials, 96
Independent censoring, 107, 161
Independent UK Panel on Breast Cancer Screening, 20
Indirectness, 130, 161
Information bias (measurement bias), 5–6, 37–39, 128
Insulin, exposure, colorectal cancer risk, 8t
Intention to screen, 14
Intention to treat, 14, 82, 128, 162
Interim analysis, 111–113, 162
  rare cancer trials, adaptive design, 140
International Agency for Research on Cancer (IARC)
  coding rules for cancer registries, 48
  mammography screening, breast cancer mortality reduction, 14
International Classification of Diseases (ICD), 46
  ICD-10, 46
International Classification of Diseases for Oncology (ICD-O), 46
International collaboration, 136–137
International researchers, 147, 150t, 152–153
INTERPHONE Study, 3
Interpretation
  errors in, 6–7
  importance of effect, 8
  Phase I trials, 67–68
Interventions
  choice of, in randomised trials, 72
  in PICO criteria, 122–124

K
Kaplan-Meier estimates, 107, 162
Kidney cancer
  metastatic, 142
  survival improvement over time, 32, 32f
Knowledge synthesis methods, 119, 120t

L
Lead optimisation, 59, 162
Lead-time, 20–21
Lead-time bias, 14–15, 18, 20, 162
  cohort studies, on survival, 39
  overdiagnosis and, 20–21
Length bias, 15, 162
Leucovorin, 31, 36, 73
Libraries
  antibodies, 64
  compound/small molecule, 58–59
Literature search
  researcher case study, 149
  systematic reviews, 125
Liver cancer
  prevention, HBV vaccination, 5
  survival rates, 30
Log odds ratio, 126, 126f
Logrank test, 77, 98, 113, 114t, 162
Lung cancer
  alcohol and, smoking as confounder, 6–7
  non-small cell see Non-small cell lung cancer (NSCLC)
  post-operative radiotherapy (PORT) meta-analysis, 121
  screening, overdiagnosis, 19
  survival rates, 29–30
    slow trend for improvement, 50
Lymph node resection
  melanoma, systematic review, 122
  survival relationship, 52
Lymphoma
  Hodgkin, 46, 95t, 96
  non-Hodgkin
    chronic lymphocytic, 105, 106t, 110
    diffuse large B-cell, 92, 96–97
    follicular, 92, 96
    survival improvement over time, 32, 32f
  trials, endpoints, 95, 95t, 96
    EFS, 97
    PFS, 96
Lymphoma-specific survival, 95t

M
Malignant phenotype, 57–58
Malmö I trial, 20
Mammography screening
  breast cancer mortality reduction, 14–15, 17
  cost-effectiveness, 22–24
  observational studies, 14
    bias in, 17–19, 21
    overdiagnosis estimates, 21–22
    trend studies, 18–19
  opportunistic, 22–23
  overdiagnosis, 15, 19–22
    rate, 22
  randomised trials, 14–15, 17
    cluster randomisation bias, 16–17
    overdiagnosis estimates, 20–21
    underestimation of effect, 16
  see also Breast cancer, screening
Masking, in randomised trials, 81–82, 156, 162
Maximum serum concentration (Cmax), 67
Maximum tolerated dose (MTD), 66, 162
Me-too drug, 65, 163
Measurement bias, 5–6, 37–39
  Risk of Bias (RoB) assessment, 128
Median, 29, 105, 162
  baseline data, 105, 106t
Median overall survival (OS), 139, 142
Median progression-free survival, 77, 96, 138, 142
Median survival, 29, 89, 96, 124, 162
MEDLINE, 125
Melanoma, metastatic, 122
Meta-analysis, 10
  bias (publication), 9
  definition, 129, 162
  FLASH (follicular lymphoma), 96
  mammography screening trials, 14
  post-operative radiotherapy (PORT), 121
  researcher case study, 149
  systematic review comparison, 120t
Metastases/metastatic tumours
  drug concentration, 67–68
  lymph node, in melanoma, 122
  renal cell carcinoma, 142
  soft tissue sarcoma, 141–143
Mice, models see Murine models
Microsimulation models, life-long follow-up simulation, 21–22
Minimal data set, for cancer registry, 45–46, 46t
Minimal residual disease (MRD), 92, 163
  as tumour-centred endpoint, 98
Minimisation, 80, 163
  bias, 7, 36, 71, 89, 128
  confounding, 7
  randomisation, 80–81
MISCAN (MIcrosimulation SCreening ANalysis) model, 21–22
Misclassification bias, 37–39
Misclassification of treatment/disease status, 38–39
‘Mixing of effects’ see Confounding
Mobile phones, brain cancer and, 3
Mortality, 3, 13, 163
  cancer, 44
  non-cancer-related, 33
  as outcome in randomised trial, 75
  postoperative, 52
  studies using cancer registry data, 50–51
  see also Death
Multi-arm, multistage (MAMS) design, 140, 163
Murine models, 61–62
  toxicology studies, 63–64
    limitations, 64
  see also Animal models

N
National Cancer Institute (NCI), US, 60, 76
National Institutes of Health Cancer Chemotherapy National Service Center, 73
NCCTG-N0489 trial, 97
NCI-60 Human Tumor Cell Lines Screen, 60
Negative studies, under-reporting, 9, 83, 125, 165
  selective reporting bias, 83, 165, 167
  see also Publication bias
Nested case-control studies, 5, 163
Net survival, 33–34, 163
  breast cancer, 35
Netherlands Cancer Registry, 30f–31f, 34f, 36–37
Newcastle-Ottawa scale, 127, 163
Nivolumab, 139
Non-differential bias, 6, 38–39, 163
Non-differential (random) misclassification, 38–39
Non-Hodgkin’s lymphoma see Lymphoma, non-Hodgkin
Non-small cell lung cancer (NSCLC)
  BeTa randomised trial, 80
  stereotactic radiotherapy, 51
Notification, cancer registries see Cancer registries
Novel agents, 56, 65, 135
Null hypothesis, 7, 110–112, 163
  false, Type-II error and, 111–113
  true, rejection (Type-I error), 111–112
Number needed to screen (NNS), absolute, 15

O
Objective of study, 32, 163
  randomised trials, 87
Objective response rate see Overall response rate (ORR)
Objectives, definition, 163
O’Brien-Fleming boundary, 112–113, 163
Observation time, for study, censoring and, 107
Observational studies, 3, 4t, 163
  advantages, 4t, 37
  bias, 6, 13, 17–19, 37–40
    cancer prognosis, 37–39
    cancer screening, 17–19, 21–22
    confounding, 37–38
    information bias, 37
    selection, 37–38
    trend studies, 18
  cancer prognosis, vs randomised trials, 36–37
  cancer screening see Screening (cancer)
  disadvantages, 4t
  ecological studies, 14
  eligibility criteria for systematic reviews, 122
  interpretation/reporting, STROBE initiative, 8
  Newcastle-Ottawa scale, 127
  population-based screening and see Screening (cancer)
  starting point for survival analyses, 32
  see also Case-control studies; Cohort studies
Odds ratio (OR), 7, 125, 164
  log odds ratio, 126, 126f
  self-selection bias, mammography screening, 18
  unadjusted/adjusted, 7, 8t
Oesophageal cancer, survival rates, 30
Ofatumumab, 105, 106t, 108f, 109–111
  see also RESONATE trial
Off-label use of drugs, 143
Oncogenes, 57, 62
Oncologists
  becoming researchers see Researcher, becoming a
  training see Training, oncologists
Opportunistic screening, 22–23
Optimisation, in drug discovery, 55, 59
Optimum biological dose (OBD), 66, 164
Organoids, 61
ORR see Overall response rate (ORR)
OS see Overall survival (OS)
Outcome(s), 1, 87–103, 164
  adverse, 90
    overdiagnosis as see Overdiagnosis
  antiangiogenic therapy, 68
  cancer-specific survival analysis, 33
  categories/types, 87–91
    patient-centred clinical, 89–91
    tumour-centred clinical, 90–91, 98
    see also specific outcomes/endpoints
  choice, 87, 98–99
  comparison, 51
  composite, 88, 97, 157
  core sets see Core outcome set (COS)
  definitions, 87–88, 92, 93t–94t, 164
  disease-free survival see Disease-free survival (DFS)
  health-related quality of life, 90–91
  incompleteness in cancer registry data, 50
  lymphoma trials, 95, 95t
  measurement, 82
    bias, 6
    head and neck cancer, 76
  most common, 95–98
  multiple, 111
  patient-centred clinical, 89–91
  patient-reported measures (PROM), 75–76
  primary, 35, 76, 82, 88–89, 165
    disease-free survival, 35
    FDA approval of cancer drugs, 91–92
    lymphoma trials, 95, 95t, 96
    OS as see Overall survival (OS)
    PFS as see Progression-free survival (PFS)
    systematic reviews, 123
    see also Endpoint(s), primary
  randomised trial design, 72
  relevance to trial objectives, 87
  secondary, 88, 167
    lymphoma trials, 95, 95t
    systematic reviews, 123
    see also Endpoint(s), secondary
  selection, 75–76
  surrogate, 35, 75, 91–92, 168
    advantages, 91–92
    FDA approval of cancer drugs, 91–92
    limitations, 92
    lymphoma trials, 95, 95t, 96
    researcher case study, 148–149
    validation, 99
  systematic review questions (PICO), 122–124
  tumour-centred clinical, 90–91, 98
  see also Endpoint(s); specific outcomes (e.g. progression-free survival)
Overall response rate (ORR), 95–96, 163
  advantages/limitations, 94t
  definition, 94t, 163
  FDA approval of cancer drugs, 92
  rare cancer trials, 138–139
Overall survival (OS), 15, 33, 82, 89–90, 164
  confounding by indication (treatment), 38
  disease-free survival (DFS) use vs, in colon cancer trials, 98
  disease-specific survival (DSS) vs, 98
  Kaplan-Meier estimates, 105, 107, 108f
  NSCLC and stereotactic radiotherapy, 51
  PFS as surrogate for, 96
  as primary endpoint of trial, 89
    advantage as, 89–90, 93t
    alternatives, 91
    definition, 93t
    FDA approval of cancer drugs, 91–92
    limitations as, 89–90, 93t
    lymphoma trials, 95, 95t
  PFS relationship, 96
  rare cancer trials, 136, 138–139
  RESONATE trial, 105, 110
  sarcoma (rare cancer) trial, 136
  for systematic review question, 123
Overdiagnosis, 6, 15
  breast cancer screening, 19–22
  control group (randomised trials), 20
  definition, 19
  lead-time bias and, 20–21
  lung cancer screening, 19
  prostate cancer screening, 19
  screening harm, 19–22
    estimation, 20–21
    observational studies, bias, 21–22
    populations at risk, definition, 22
    randomised trials, biases, 20–21

P
p-value, 7, 113, 115, 165
Paediatric cancer trials, 138
PALETTE study, 138
Pancreatic cancer, survival rates, 30
Panitumumab, 77
Partial response/remission (PR), 95, 164
  rare cancer (sarcoma) trial, 141–142
  for systematic review question, 123
Patient(s)
  accrual, rare cancer trials, 138
  characteristics, description in reports, 105–107, 106t
  involvement in rare cancer trials, 143
  recruitment, rare cancer trials, 137–139
  stratification see Stratification
Patient-centred clinical outcomes, 89–91
Patient-centred endpoints, 164
Patient-reported outcome measures (PROM), 75–76, 164
Pazopanib, 138
Peer-reviewed literature, publication bias, 125–126
Pembrolizumab, 139
Performance bias, in Risk of Bias (RoB) assessment, 128
Peritoneal carcinomatosis, 31
PFS see Progression-free survival (PFS)
Pharmacodynamic analyses, Phase I studies, 67
Pharmacokinetic properties, of new agents for Phase I trials, 65
Pharmacokinetic studies
  Phase I studies, 67
  preclinical, 62–63
Pharmacophores, 58
Pharmacovigilance, 56
Phase(s), of drug development, 55–56
  see also Randomised trials; specific phases
Phase I trials, 56, 65–68, 164
  aims, 65–67
  challenges, 65
  criteria for agent entering, 65–66
  description, 65
  dose definition, 66–67
  interpretation, 67–68
  methodologies for, 66
  optimal dose in rare cancers, 142
  patient selection, 65–66
  pitfalls, 67–68
  properties of new agents for, 65–66
  rare cancer trials, 142
  safety assessment, 66–67
Phase II trials, 56, 164
  common outcomes used, 95–96
  recommended dose for, 66
  sarcoma trial, 141–142
Phase III trials, 56, 74, 164
  RESONATE see RESONATE trial
  in sarcoma/rare cancers, 136–137
Phase IV trials, 56, 164
PhD research, 152–153
Physicians, responsibility to give advice, 1
PICO (Population, Interventions or exposures, Comparators and Outcomes) criteria, 122–124
Pigmented villonodular synovitis (PVNS), 138–139
Placebo, 72, 123, 138, 164
Point estimate, 109, 125, 164
Population, in PICO criteria, 122–123
Population-based studies, 37
  see also Observational studies; Screening (cancer)
Population registers, follow-up, 49
Post-marketing surveillance, 56
Post-operative radiotherapy (PORT), meta-analysis, 121
Poster presentation, 151t
Postoperative mortality, 52
Power, 36, 164
  randomised trials, 36, 77, 88, 90
  see also Statistical power
PR see Partial response/remission (PR)
Pragmatic trials, 74, 165
Preclinical assays, 60–66
  animal models see Animal models
  antitumour efficacy, 60–62
  assays ‘in parallel’, 60
  in vitro screening assays, 61–62
  pharmacokinetic, 62–63
  toxicology studies, 63–64
Preclinical phase/studies, 55
  rare cancers, 135
  see also Drug discovery
Presentations, of research, 151t
Prevalence, 4, 165
Primary endpoint see Endpoint(s), primary
Primary outcome see Outcome(s), primary
Prognosis (cancer), 29–42
  definition, 29, 156
  improvement over time, 32, 32f
  measures of, 29
  as survival rate, 32
  trials vs observational studies, 36–37
    bias, types, 37–39
    confounding, 37–38
    immortal time bias, 39
    information (measurement) bias, 38–39
    lead-time bias, 39
    selection bias, 37–38
  see also Survival
Prognostic factors
  sociodemographic, 31, 31f
  treatment-related, 30–31
  tumour-related, 29–30, 30f
    anatomical extent, 30
    cancer type, 29–30
Programmed death-ligand 1 (PD-L1), 139
  inhibitors, 139
Progression-free survival (PFS), 35, 91
  definition, 93t, 165
  endpoints of trials, 88, 91, 96
    advantages/limitations, 93t
    lymphoma trials, 95, 95t, 96
  hazard ratio, 109, 113, 114t, 115
  Kaplan-Meier estimates, 105, 107
  median, 77, 96, 138, 142
    rare cancer trials, 138–139
    sample size calculation and, 77
  overall survival (OS) relationship, 96
  reporting in trials, 107, 108f
  RESONATE trial see RESONATE trial
  sarcoma (rare cancer) trial, 136–137
  for systematic review question, 123
Propensity score matching, 38, 165
PROSPERO, 121
Prostate cancer
  crude probability of death, 33–34
  net survival, 33–34
  risk in first-degree relatives, 9, 9f
  screening, overdiagnosis, 19
  STAMPEDE trial, 74, 80, 140
  survival improvement over time, 32, 32f
Protein binding, of drugs, 68
Public health, 8
Publication bias, 9, 83, 125, 130, 165
  definition, 83, 125, 130, 165
  randomised trial design and, 72
  risk assessment, funnel plot, 9, 9f, 125–126, 126f
  search of studies for systematic review, 125, 130

Q
Q test, 129, 165
Qualitative synthesis, 129, 165
Quality
  of clinical trials, research into, 147–148
  of evidence, for systematic reviews, 127–128
  of studies, for systematic reviews, 127–128
Quality of care studies, 51–52
  uses/aims, 52
Quantitative synthesis, 129, 165
Questions
  clinical research studies, 149t
  evaluation and reading a report, 105
  formulation
    randomised trial design, 72–73
    systematic reviews, 122–124

R
Radioimmunoconjugates, 64
Radiotherapy, stereotactic for NSCLC, 51
Random error, 5, 7, 165
Random sampling, 110
Random variation, 104, 110, 115, 165
Randomisation, 72, 77–81, 165
  allocation concealment, 78, 81, 155
  baseline variables, description, 105
  Bayesian adaptive, 141–142
  bias in, 77, 81, 128
  bias minimisation, 128
  blocked, 78–79, 156
  cluster, 16–17, 157
  computer-generated, 77, 80–81
  inadequate, 16
  individual, 16
  key elements, 77
  key principle, 77–78
  methods, 77–78
    to force balanced allocations, 78–79
  minimisation, 80–81
  rare cancer (sarcoma) trial, 141
  restricted methods, 81
  simple, 78
    disadvantages, 78, 105
  stratified (stratification), 78–79, 81, 168
    difficulties, several strata/levels, 79–80
  switch between treatments and, 82
  third-party systems, 81
  time to treatment failure and, 97
  uncertainty principle and, 74
  weighted, 80
Randomised controlled trials (RCT), 51
  limitations, 51
  see also Randomised trials
Randomised trials, 3–4, 51, 71–86
  advantages/disadvantages, 4t, 51
  aims and objectives, 13, 71, 87
  application of results of, 73, 83
  baseline data description, 105–109, 106t
  baseline variables, description, 105
  bias see Bias
  blinding/masking, 81–82, 128
  cancer prognosis, vs observational studies, 36–37
  cancer screening see Screening (cancer)
  chemotherapy efficacy, 36
  chemotherapy regimen comparisons, 73
  combination chemotherapy vs placebo, 73
  comparison of two interventions, 72–73
  control group see Control group
  definitions, 4, 165
  designing, 3, 72–73, 83
    crossover design, 139
    eligibility criteria, 73–75
    outcome selection, 75–76
    question formulation for, 72–73
    rare cancers see Rare cancers, clinical research
    sarcoma trial, 136–137, 141–142
    smarter, for rare cancers, 139–143
    use of systematic reviews for, 72
  difficulties in rare cancers, 135–136
  drug development and, 56, 71
  early, for chemotherapy, 73
  early stopping, 111–113, 159
    rare cancer trials, 140, 143
  effectiveness (pragmatic), 74, 123, 159, 165
    Core Outcome Measures in, 75, 89, 123, 157
  efficacy trial, 74
  eligibility criteria, 36, 73–75
    for inclusion in systematic reviews, 122
  endpoints see Endpoint(s)
  evaluation, 83
  explanatory trial, 74
  generalisability of results, 105, 109, 155, 160
    limited, 36
  as gold standard, 36–37
  head and neck cancer, 76
  hypotheses, testing, 3
  inclusion/exclusion criteria, 35, 73–74
    rare cancers, 138
    paediatric, 138
  intention to treat principle, 82
  missing data/patient loss, 82
  mortality (cancer), 13
  multinational, 136–137
  negative results, reporting, 9, 83
  outcomes see Outcome(s)
  patient number, 76–77, 90
  patient randomisation see Randomisation
  quality assessment, 127–128
  registration, 82, 139, 166
  reporting see Reporting, randomised trials
  sample size see Sample size
  single agents vs placebo, 73
  starting point for survival analyses, 32
  statistical analysis, 82
  stereotactic radiotherapy for NSCLC, 51
  stratification (patients), 67, 78–79, 81, 168
    rare cancer (sarcoma) trial, 136–137
  subgroups, and subgroup analyses, 115–117, 116t
  systematic reviews see Systematic reviews
  validity see Validity of trials
  see also Drug development
Rare cancers
  clinical research, 135–145
    challenges and limitations, 136–137
    designing clinical trials, 137–143
      adaptive designs, 139–140, 155
      basket studies, 138
      crossover design, 139
      early stopping, 140, 143
      endpoint selection, 138–139
      enrolment design, 140–143
      inclusion/exclusion criteria, 138
      methodology, 138–143
      multi-arm, multistage (MAMS), 140, 163
      optimal dose, 142
      organisational aspects, 137–138
      sample size, 140
      smarter trials, 139–143
    development of research, 137
    future directions, 137–143
    grouping subtypes, masking clinical benefits, 135
    off-label use of drugs, 143
    patient involvement, 143
  definition, 135
  registries, 43
RAS gene mutations, 31, 77
Recall bias, 4t, 39, 166
Recommended dose, 166
  for Phase II testing, 66
Recurrence-free survival, 35, 166
Registration of trials, 82, 139, 166
Registries
  cancer see Cancer registries
  health-related systematic reviews, 121
Relative risk (RR), 166–167
  mammography screening trials, 14, 17
  see also Risk ratio (RR)
Relative survival, 32–35, 32f, 34f, 166
  rare cancers, 136
  rates, 32
Renal cell carcinoma, metastatic, 142
Reporter systems, 60
Reporting, randomised trials, 82–83
  CONSORT guidelines, 83, 158
  guidelines, 82–83, 166
  publication bias and see Publication bias
  selective, 82–83
  selective reporting bias, 82–83, 128, 165, 167
  SPIRIT guidelines, 83, 168
Reporting bias, in Risk of Bias (RoB) assessment, 128
Reporting guidelines, 82–83, 150, 166
Representativeness, 8
Research
  common malignancies, approach, 135
  data presentation, recommendations, 151t
  learning, workshops, 149, 150t
  pharma-based, 135
  rare cancers see Rare cancers, clinical research
  waste of resources, 72
Researcher, how to become a, 146–154
  case studies, 146–154
  fellowship (2-year), 147–148, 150
  international experience, 147, 152
  PhD, 152–153
  research training do’s and don’ts, 148t
RESONATE trial, 105
  baseline data, description, 105–109, 106t
  best overall response (BOR), 105, 109
  heterogeneity test, 115, 116f
  interim analysis, 111–112
  null hypothesis, 110
  O’Brien-Fleming boundary, 112–113
  overall survival (OS) estimations, 109
    Kaplan-Meier plots, 107, 108f
  progression-free survival (PFS), 105, 109–110
    hazard ratio, 109, 113, 114t, 115
    individual variability in, 108f, 110
    Kaplan-Meier plots, 107, 108f
    significance level, 111
    subgroup analyses, 115–117, 116f
  quantities estimated, 109–110
  sample size calculation, 113, 114t
  significance level, 111–112
  statistical power, 113
  statistical tests, 110–111
  Type-1 error control, 111–112
Response rates, 91, 166
  sarcoma (rare cancer) trial, 136–137
Right-censored survival time, 107, 109, 166
Right-censored time-to-event variables, 105, 107, 109
Risk, 166
  attributable (excess), 2, 155
  of breast cancer by HRT preparation type, 2, 2t
  of death from cancer, 33
  definition, 1, 166
  of failure, right-censored survival time, 107
  measurement, 1–2
    see also Incidence rates
  of relapse, lymphomas, 97
Risk difference, 2, 166
  breast cancer by HRT preparation type, 2, 2t
Risk factors, 1–12, 167
  behavioural, 1
  bias, cluster randomisation, 16
  cancer prevention, 10
  epidemiology, 2–3
  importance, reasons, 1
  representativeness of effects, 8
  sources of error in studies, 5–7
Risk of bias, 17, 83, 167
Risk ratio (RR), 1–2, 7, 131
  breast cancer by HRT preparation type, 2, 2t
  definitions, 2, 166–167
  mammography screening trials, 14, 17
RoB tool (Cochrane Risk of Bias), 127–128, 130, 157
Rule-based models, 66

S
Safety, 167
  drugs, 56
  Phase I trials, 66–67
  preclinical toxicology studies, 63–64
  RESONATE trial, 105
Salvage treatment, 87, 90, 167
Sample size, 104, 167
  calculation, 76–77, 88, 113
    RESONATE trial, 113
  O’Brien-Fleming boundary and, 112
  randomised trials, 76–77, 88, 140
  rare cancer trials, 140
Sarcomas
  PALETTE study, 138
  Phase II study, 141–142
  Phase III trial, 136–137
Screening (cancer), 13–28
  benefit, 13–19
  breast cancer see Breast cancer; Mammography screening
  cervical cancer, 14
  colorectal cancer, 14
  cost-effectiveness, 22–24
  current programmes, 24
  efficacy, evaluation, 51
  harm, 19–22
    see also Overdiagnosis
  intention to screen, principle, 14
  lead-time bias and, 15, 20–22
  length bias and, 15
  observational studies, 13–14, 17–18
    biases in, 17–19, 21–22
    case-control studies, 18
    overdiagnosis estimation, 21–22
    trend studies, 18
    types of studies, 17–18
  opportunistic, 22–23
  policies, 23–24
  population-based, 13–14
    overdiagnosis, 21
    trend studies, bias, 18
  randomised trials, 13–15
    bias due to cluster randomisation, 16–17
    breast cancer mortality reduction, 14
    confounding, avoidance, 13–14
    control group contamination, 16
    overdiagnosis estimation, 20–21
    potential biases, 16–17, 20
    tumours with low growth rates, 19
    underestimation of effect, 16
Screening (drug development) see Drug development
Search engines, 119
Search strategy, for systematic reviews, 125
Secondary endpoints, 88
SEER database, colon cancer survival rates, 30
Selection bias, 5–6, 37–38, 128
  in case-control study, 4t
  definition, 37, 128, 167
  loss to follow-up, 38
  in Risk of Bias (RoB) assessment, 128
Selective reporting, 82–83
Selective reporting bias, 82–83, 128, 165, 167
Self-reported exposures, bias, 6
Self-selection bias, 18
Significance level, 111–112, 167
Small lymphocytic lymphoma (SLL), 105, 106t, 110
  see also RESONATE trial
Small molecule(s), 64, 167
  databases, 59
Small molecule drug discovery, 58–59
Smart drug development, 60
Smart trial design, for rare cancers, 139–143
Smoking
  as confounder, 6–7
  randomised trial unethical, 37
Sociodemographic factors, prognostic, 29, 31, 31f, 167
Socioeconomic status
  colon cancer survival, 31
  survival rates and, 50
Soft tissue sarcoma, metastatic, 138, 141–143
Source population, 5, 168
SPIRIT guidelines, 83, 168
Stage migration, 30, 39, 168
STAMPEDE trial, 74, 80, 140
Statistical analysis, 104–118
  baseline data description adequacy, 105–109
  Kaplan-Meier plots of OS and PFS, 107–109, 108f
  quantities estimated, 109–110
  randomised trial data, 82
  statistical power adequacy, 112–113, 114f
  subgroup analysis, 115–117
  tests performed, choice, 110–111
  Type-I error control, 111–112
  variability of results, 109–110
  see also Statistical tests/testing
Statistical estimation, 109–110, 162, 168
Statistical hypothesis test, 7, 113, 168
  see also Statistical tests/testing
Statistical power, 105, 164, 168
  adequacy in trials, 112–113, 114t
  definition, 112, 168
  effect size and sample size, 112–113, 140
  sarcoma/rare cancer trial, 137
  subgroup analyses and, 115
  see also Power
Statistical tests/testing, 7, 8t, 105, 110–111, 113, 168
  adaptive design, rare cancer trials, 140
  more than one test, decision strategy, 111–112
  multiple, 112, 115
  rare cancer trials, 141
  subgroup analyses, 115
  see also Statistical analysis
Statistically significant effects, 8, 17, 111, 113, 131, 168
Stratification, 67, 78–79, 81, 168
  rare cancer trials, 136–137
Stratified randomisation, 78–79, 81, 168
STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) initiative, 8, 168
Study design, 3, 4t, 8
  see also Case-control studies; Cohort studies
Subgroup analyses, 115–117, 129–130, 168
Sublibraries, molecules, 59
Surrogate outcomes/endpoints see Outcome(s), surrogate
Survival, 29, 32–36
  absolute, 34, 34f
  analyses, 32–36
    conditional survival, 35–36
  cancer-specific, 33–35
    relative survival, 34–35, 34f
  cancer types, 29–30
  cause-specific, 34
  conditional, 35–36, 157
  disease-free see Disease-free survival (DFS)
  disease-specific (DSS), 98
  event-free see Event-free survival (EFS)
  factors influencing, 29–32
    sociodemographic, 29, 31, 31f
    treatment-related, 30–31
    tumour-related, 29–30, 30f
  improvement over time, 32, 32f
  influence of country of birth, 52
  lead-time bias and, 39
  length, incidence date and, 48, 49t
  lymph node resection and, 52
  lymphoma-specific, 95t
  median, 29, 89, 96, 162
    advanced colorectal cancer, 124
  net see Net survival
  overall see Overall survival (OS)
  patient-centred clinical outcomes, 89–91
  progression-free see Progression-free survival (PFS)
  quality of care studies, 52
  random variation and, 104
  recurrence-free, 35, 166
  relative see Relative survival
  studies using cancer registry data, 50–51
  targeted therapy and, 52
  see also Prognosis (cancer)
Survival rates, 29, 32, 168
  cancer types with best rates, 29
  cancer types with poor rates, 29–30
  high, death certificate only (DCO) cases, 45
  relative survival, 32
  variations in Europe, 50
Survival time
  definition, 32
  right-censored, 107, 109, 166
Syngeneic models/tumours, 61, 169
Synthesis, systematic review data, 129–131
  qualitative, 129, 165
  quantitative, 129, 165
Synthetic lethality, 57, 169
Systematic errors, 5–6, 169
  see also Bias; Confounding
Systematic reviews, 10, 36, 71, 119–134
  bias minimisation, 36, 72
  body of evidence, quality, 127–128
  clinical relevance, 121
  data extraction from studies, 128–129
  definition, 119, 169
  for designing randomised trials, 72, 83
  eligibility criteria, defining, 122
  evaluation, 121
  examples, 121–122
  formulation of question (PICO), 122–124
  health-related, registry, 121
  heterogeneity of studies, 129–130
  impact of publication bias, 72
  key steps, 119, 120t
  knowledge synthesis method, 119, 120t
  meta-analyses comparison, 120t
  myths about, 120t
  need for, reasons, 71, 119, 120t, 121–122
  primary study problems and, 120t
  quality of, 10, 121
  quality of studies, assessment, 127–128
  randomised trials for, 36, 71, 83, 120t, 125
  randomised trials power maximisation, 36
  researcher case study, 148–149
  search strategy, 125–126
  significance, 71, 120t, 121
  study designs for, 120t
  study selection, 127
  subgroup analyses, 129–130
  synthesis, 129–131
  validity of conclusions, 121–122
  waiting for consensus in, 10

T
T-test, 114t
Tamoxifen, 75
Target (drug) discovery, 57–58
Targeted therapy, 47, 52, 57, 124
  definition, 57, 169
  mechanism of action, 57
  PALETTE study, 138
  Phase I studies, 66–67
Therapeutic effect, half-life relationship, 67
Therapeutic index, 63, 65
Thiotepa, 73
3+3 models, 66, 155
Time-to-event analysis, 88–90, 169
Time-to-event data, 107
Time-to-event variables, right-censored, 105, 107
Time to next treatment (TTNT), 94t
  advantages/limitations, 94t
  lymphoma trials, 95t
Time to progression (TTP), 91, 169
  advantage/limitations in trials, 93t
  definition, 93t, 169
  endpoints of trial, 96
  lymphoma trials, 95t
Time to treatment failure (TTF), 97
  advantages/limitations, 93t
  definition, 93t
TNM classification, 30, 47
Toxicity, 63, 169
  dose-limiting, cytotoxic agents, 66
  molecular targeted anticancer drugs, 67
  murine models, 63–64
  prediction in humans, 63–64
  sarcoma (rare cancer) trial, 137
  time to treatment failure and, 97
Toxicology studies, preclinical, 63–64
Trabectedin, 143
‘Trace back’, 44
Training, oncologists, 146
  research training
    do’s and don’ts, 148t
    where to learn, 149, 150t
  see also Researchers, becoming
Treatment-related prognostic factors, 30–31
Trend studies, 169
  breast cancer mortality, 18–19
  cancer screening, 14
TRUSTS (EORTC) study, 143
Tumour(s)
  low growth rates, screening, 19
  see also specific cancers
Tumour, Node, Metastasis (TNM) classification, 30, 47
Tumour cell(s), circulating, 91, 98
Tumour cell lines, 60, 169
Tumour-centred endpoints, 90–91, 169
Tumour microenvironment, 58, 61, 169
Tumour-related prognostic factors, 29–30, 30f
Tumour suppressor genes, 62
Type-I error, 105, 111, 169
  control, 111–112
  multiple testing, 112
Type-II error, 111–113, 170
  probability, 112–113

U
Unadjusted ratio, 7, 8t, 170
Uncertainty principle, 74–75, 170
Under-reporting, negative studies see Negative studies, under-reporting
United Kingdom, survival rates by per capita income, 50
United States (US)
  breast cancer screening, 22–23
  opportunistic screening, 22–23

V
Validation
  drug targets, 57
  surrogate outcomes (endpoints), 99
Validity, 170
Validity of results, 105, 170
  systematic reviews, 122
Validity of trials, 16, 36
  adaptive design, for rare cancers, 139–140
  outcome measures, 76
Variability of results, 109–110
Vascular endothelial growth factor (VEGF), 68
Virtual screening, 59, 170
Volume-outcome research, 52, 170
W
Weighted randomisation, 80
WHO/IACR Minimal Data Set, 45–46, 46t
Workshops, research and clinical trials, 149, 150t, 151

X
Xenograft models, 61, 170
Xenografted tumours, 61
