The field of genetic epidemiology is still quite young. Our journal is now 15
years old, the International Genetic Epidemiology Society a mere 7. As our field
enters its adolescence, perhaps the time is ripe for an identity crisis, as it
tries to assert itself as an entity distinct from its parents, genetics and epidemiology.
In this address, I want to examine the ways in which I think our field is truly unique and the
directions in which I think it should be heading.
Contract grant sponsor: National Institutes of Health; contract grants: CA 52862 and ES-07048.
*Correspondence to: Duncan C. Thomas, Department of Preventive Medicine, University of Southern
California, 1540 Alcazar Street, CHP-220, Los Angeles, CA 90089-9011. E-mail: dthomas@usc.edu
Presented as the Presidential Address at the Annual Meeting of the International Genetic Epidemiology
Society, St. Louis, MO, September 9-10, 1999.
Received for publication 6 October 1999; revision accepted 2 January 2000
Rao [1985] addressed the uniqueness question in an editorial accompanying the
first issue of Genetic Epidemiology, where he wrote: "Genetic epidemiology differs
from epidemiology by its explicit consideration of genetic factors and family resemblance; it differs from population genetics by its focus on disease; it also differs
from medical genetics by its emphasis on population aspects."
These are key concepts in any definition of our field, but by focusing on the
differences between the fields, he did not explicitly mention the environment. In a
second editorial, Neel [1984] highlighted four developments:
1. "the explosion of knowledge concerning the structure of DNA. Understanding the function... will not yield to classical Mendelian genetics but will require the epidemiologic approach"
2. "an aging population, whose diseases are often a complex mixture of the inherited and the acquired..."
3. "the troublesome issue of screening for the workplace..."
4. "the genetic impact of a wide variety of potentially mutagenic agents being introduced into the environment..."
and he concluded that "the genetic epidemiologist is slave to the concept of multifactorial causation" (emphasis added). But theory and practice do not always go hand
in hand. Too many of the papers presented at our Annual Meeting and published in our Journal simply
reflect the narrow focus of our parent disciplines, without the integration our field is
supposed to provide.
So what do I see as the distinguishing characteristics of genetic epidemiology?
1. A focus on population-based research;
2. A focus on studying the joint effects of genes and the environment; and
3. Incorporation of the underlying biology of the disease into its conceptual models.
By population-based research, I mean study designs aimed at estimating parameters and testing hypotheses that can be generalized to the population at
large, rather than to narrowly defined subpopulations such as population isolates or heavily
loaded or inbred pedigrees. Such special populations can be highly informative for
gene mapping, but there is much more to genetic epidemiology than just mapping
genes. Estimation of population parameters generally requires study samples that
can be referred to their target populations by means of some well-defined statistical
sampling scheme, however complex that might be.
The study of the joint effects of genes and environment requires a serious attempt to assess environmental risk factors, not simply treating them as a binary indicator variable for "exposed or not" or as a "black box" of correlated residual variation. It
entails serious consideration of exogenous exposures as complex, multivariate, time-dependent variables, and an understanding of the metabolic pathways that influence
the internal environment and of the genes that regulate these pathways. It also involves
disentangling residual familiality that could be due to either. It requires the investigator to confront the practical and conceptual difficulties of measuring complex environments in family studies, including the problem of missing or inaccurate data
that are non-randomly distributed. And it must consider the many ways in which the two can combine to influence a phenotype, including independent main effects,
gene-environment interactions, and gene-environment correlations.
Finally, incorporation of disease biology involves a real understanding of the
causal pathways leading from genes and environment to disease, and the expression of these concepts in our mathematical models for penetrance.
To illustrate these basic principles, I review the genetic epidemiology of breast
and ovarian cancer, probably the two cancers for which we have the most complete
story to tell. I begin by discussing some conceptual models for these diseases, both
descriptive and mechanistic, to illustrate what I mean by incorporation of disease
biology, then discuss the need for population-based study designs in the search for
major susceptibility genes and their characterization, and finally address gene-environment interactions.
THE GENETIC EPIDEMIOLOGY OF BREAST AND OVARIAN CANCER
Conceptual Models for the Joint Effect of Genes and Environment
From the beginning, it has been recognized that both environmental and genetic
factors play a role in these cancers. Established environmental risk factors for breast
cancer include hormonal factors and radiation, amongst others [Feigelson and
Henderson, 1998; Kelsey, 1993; Kelsey and Bernstein, 1996]. Menstrual and reproductive history variables suggest that endogenous estrogen levels play a role, along
with factors like diet, body build, and physical activity that can also influence hormone levels [Wu et al., 1999]. The effect of oral contraceptive (OC) use and estrogen
replacement therapy is more controversial, but there is evidence for a modest effect
of high-dose OC use in young women. Obviously, genetic factors are also important
in these cancers, but perhaps the historical focus on BRCA1 and BRCA2 has distracted attention from the role of other genes, particularly those involved in hormone
regulation. Until recently, however, genetic and environmental factors have seldom
been considered in combination.
Descriptive Models. There are a variety of conceptual models that might be
considered to account for the joint effects of genes and environment in breast cancer,
but we still do not have a good quantitative model that would be useful for, say,
genetic counseling. The Claus model [Claus et al., 1991] and variants thereof based
on actual mutation testing data [e.g., Struewing et al., 1997] provide a good description of the penetrance of the major genes BRCA1 and BRCA2, but take no account of
the possible modifying effects of environmental factors.
The Gail model [Gail et al., 1989] was the first serious attempt to provide predictive models of breast cancer risk in relation to the established epidemiologic risk
factors, but its treatment of genetic factors is very limited. Logistic models for 10-,
20-, and 30-year risk were fitted to the Breast Cancer Detection Demonstration Project
data as a function of initial age, age at menarche, age at first full-term pregnancy,
and the number of biopsies, as well as the number of affected first-degree relatives
and interactions with age and other risk factors.
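To make the structure of such a model concrete, the sketch below computes a fixed-horizon risk from a logistic linear predictor in the covariates listed above, with one interaction between family history and age. It is a minimal illustration only: every coefficient and centering constant is a hypothetical placeholder, not an estimate from the Gail et al. [1989] analysis.

```python
import math

# HYPOTHETICAL coefficients chosen only to illustrate the model's form;
# they are not the published Breast Cancer Detection Demonstration Project estimates.
COEFS = {
    "intercept": -3.5,
    "age": 0.04,            # per year of initial age above 50
    "menarche": -0.05,      # per year of age at menarche above 13
    "first_birth": 0.03,    # per year of age at first full-term birth above 25
    "biopsies": 0.25,       # per previous breast biopsy
    "affected_fdr": 0.60,   # per affected first-degree relative
    "fdr_x_age": -0.01,     # assumed weakening of the family-history effect with age
}


def logistic_risk(age, age_menarche, age_first_birth, n_biopsies, n_affected_fdr,
                  coefs=COEFS):
    """Risk over a fixed horizon (e.g. 10 years) from a logistic linear predictor."""
    age_c = age - 50.0
    eta = (coefs["intercept"]
           + coefs["age"] * age_c
           + coefs["menarche"] * (age_menarche - 13.0)
           + coefs["first_birth"] * (age_first_birth - 25.0)
           + coefs["biopsies"] * n_biopsies
           + coefs["affected_fdr"] * n_affected_fdr
           + coefs["fdr_x_age"] * n_affected_fdr * age_c)
    return 1.0 / (1.0 + math.exp(-eta))


# Modeled risk for a 45-year-old with 0, 1, or 2 affected first-degree relatives.
for fdr in (0, 1, 2):
    print(fdr, round(logistic_risk(45, 12, 28, 1, fdr), 4))
```

The interaction term is what lets the family-history effect change with age; without it, the model would impose a constant odds ratio per affected relative at every age.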
esis, in which ovarian cancer risk is related to the cumulative number of ovulatory
cycles, the ovarian analogue of breast tissue age. We now think that ovarian cancer
originates in inclusion cysts, epithelial cells that become trapped in the lining of the
ovary following ovulation, thereby exposing them to estrogen-rich follicular fluid
[Casagrande et al., 1979; Dubeau, 1999].
The two-event clonal expansion model [Moolgavkar and Knudson, 1981] addresses the joint effect of genes and the environment much more directly. In this
model, cancer again results from a single cell undergoing a sequence of mutational
events, but now only two rate-limiting mutational events are required, such as loss
of both alleles of a tumor suppressor gene. Also, intermediate cells resulting from
the first mutation undergo a birth-and-death process leading to clonal expansion.
Individuals with a hereditary predisposition to cancer start life with all cells in the
intermediate state, so all that remains is for any one of these cells to undergo the
second mutation. The rate of mutational events or the rates of intermediate cell proliferation can be influenced by exposures to carcinogens or promoters. The model
was initially developed to account for the genetics of retinoblastoma [Knudson, 1971],
but has subsequently been applied to many other cancers and environmental agents,
including breast cancer and reproductive factors. The model has been shown to provide an excellent fit to the observed age-specific breast cancer incidence rates in six
populations with quite different patterns [Moolgavkar et al., 1980].
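A rough way to see how the two-event model links inheritance to age-specific incidence is a first-order approximation that ignores clone extinction and is valid only while the hazard is small. Writing N for the number of susceptible cells, ν for the first-mutation rate, μ for the second-mutation rate, and α − β for the net clonal growth rate, the incidence is approximately μ times the expected number of intermediate cells. For a carrier all N cells start in the intermediate state, so that number is N e^{(α−β)t}; for a non-carrier intermediate cells arise at rate νN and each clone grows at rate α − β. The sketch below codes this approximation; the parameter values are hypothetical, picked only to keep the hazards in a plausible range, and are not fitted estimates.

```python
import math


def intermediate_cells(t, n_normal, nu, growth, carrier):
    """Expected number of intermediate cells at age t (first-order approximation).

    carrier=True: every susceptible cell starts life already carrying the first hit.
    carrier=False: intermediate cells arise at rate nu * n_normal, and each clone
    then expands at net rate growth = alpha - beta.
    """
    if carrier:
        return n_normal * math.exp(growth * t)
    if growth == 0:
        return nu * n_normal * t
    return nu * n_normal * (math.exp(growth * t) - 1.0) / growth


def hazard(t, n_normal, nu, mu, growth, carrier):
    """Approximate incidence: second-mutation rate times expected intermediate cells."""
    return mu * intermediate_cells(t, n_normal, nu, growth, carrier)


# Hypothetical, illustrative parameter values (per cell per year where relevant).
PARAMS = {"n_normal": 1e6, "nu": 1e-2, "mu": 1e-9, "growth": 0.05}

for age in (30, 50, 70):
    h1 = hazard(age, carrier=True, **PARAMS)
    h0 = hazard(age, carrier=False, **PARAMS)
    print(age, round(h1, 5), round(h0, 5), round(h1 / h0, 2))
```

Under these assumptions the carrier/non-carrier ratio equals (α−β)e^{(α−β)t} / [ν(e^{(α−β)t} − 1)], which falls with age whatever the parameter values, a pattern in the same direction as the declining genetic relative risk discussed below.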
Both models provide explicit predictions for the age-specific incidence as a function of genotype and time-dependent exposure histories. Krailo et al. [1987]
fitted both the breast tissue aging and two-mutation models to case-control data on
breast cancer in young women [Pike et al., 1981], treating family history as either a
determinant of the cell proliferation rates or the probability that one event was inherited. Family history was found to have a strong effect in either model but its effect
was not appreciably altered by including the other risk factors.
Need for Population-Based Study Designs: Search for Major Susceptibility Genes
Now to some genetics. For this audience, I hardly need review the BRCA1 and
BRCA2 story, but I would like to make a few points about the process from a genetic
epidemiology viewpoint. The story begins with the observation of family history as a
risk factor. Perhaps the most important of the familial aggregation studies was the
CDC Cancer and Steroid Hormone (CASH) study [Wingo et al., 1988], a large, multicenter case-control study with cases identified from various population-based cancer
registries and controls selected by random digit dialing. Data on OC use and the
standard epidemiological risk factors were obtained, together with a history of breast
cancer in mothers and sisters.
The three key papers on family history took different approaches to the analysis: Sattin et al. [1985] used a standard case-control approach to look at family history as a predictor of the disease status of the probands, while Claus et al. [1990]
compared the age-specific breast cancer incidence rates in the cohorts of family members of cases and controls. Claus et al. [1991] later revisited these same data with a
segregation analysis. The Sattin et al. analysis revealed that a positive family history
was a risk factor for breast cancer, more so if there was an affected first-degree
relative and particularly if both a mother and sister were affected; it also revealed a
higher risk if the affected relative was young. The Claus et al. family history analysis
pursued this observation further in relation to the age of both the proband and her
relatives. Finally, the familiar estimates of penetrance from the Claus segregation
model show a dramatic decline in genetic relative risk with age.
Although the CASH data are certainly population based, they are not without
problems. The family history data were reported only by the probands and were not verified. False positives can be eliminated by medical confirmation (as some current studies, like the NCI Cooperative Family Registries, are doing [Anton-Culver et
al., 1996]), but false negatives are harder to identify. In the CASH data, the rate of
reported breast cancer in relatives of case probands was no higher than that in the general
population, and in relatives of control probands it was about half the expected rate. Future genetic
epidemiology studies need to address this problem more carefully, perhaps with validation substudies to estimate both the false-positive and the false-negative rates and
incorporate them into the analysis of the main study.
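One simple way such validation estimates could enter the analysis is a standard misclassification correction of the reported proportion with a positive family history, p = (p_obs + Sp − 1) / (Se + Sp − 1), applied separately in cases and controls so that reporting accuracy may differ between them. The sketch below assumes a proband-level binary family-history indicator; all input values are hypothetical, and this is not presented as the method used in any of the studies cited.

```python
def corrected_prevalence(p_obs, sensitivity, specificity):
    """Correct a reported proportion for misclassification (Rogan-Gladen form)."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (p_obs + specificity - 1.0) / denom
    return min(max(p, 0.0), 1.0)  # truncate to the valid probability range


def odds_ratio(p_cases, p_controls):
    """Odds ratio for a binary exposure from its prevalence in cases and controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


# Hypothetical reported prevalences of a positive family history.
obs_cases, obs_controls = 0.15, 0.08
# Hypothetical validation-substudy estimates, allowed to differ by case status
# (here, control probands under-report affected relatives more often).
se_cases, sp_cases = 0.85, 0.99
se_controls, sp_controls = 0.60, 0.99

or_observed = odds_ratio(obs_cases, obs_controls)
or_corrected = odds_ratio(corrected_prevalence(obs_cases, se_cases, sp_cases),
                          corrected_prevalence(obs_controls, se_controls, sp_controls))
print(round(or_observed, 3), round(or_corrected, 3))
```

With differential under-reporting of this kind, the corrected odds ratio can move in either direction, which is why both false-positive and false-negative rates need to be estimated by case status rather than assumed equal.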
Now the hunt was on to find these major susceptibility genes. As is well known,
the most efficient study design for this purpose is not population based, but exploits
large, multiple-case families. The first convincing linkage to chromosome 17q was
reported by Hall et al. [1990], who found stronger evidence for linkage in young
breast cancer families. Pooling 214 families from many groups, the Breast Cancer
Linkage Consortium [Easton et al., 1993] was able to localize the gene down to
about a 13-cM region of 17q21. The year 1994 saw the cloning of BRCA1 in this
region [Miki et al., 1994], followed in short order by the cloning of BRCA2 on 13q
[Wooster et al., 1995].
The search might now continue for BRCA3, but first it would be worth re-examining the evidence for the existence of other major genes after accounting for
BRCA1 and BRCA2. This again requires a return to population-based research.
Antoniou et al. [2000] recently described modified segregation analyses of two series of ovarian cancer families, one high-risk and one population-based, taking measured BRCA1 and BRCA2 genotypes into account and found no convincing evidence
for the segregation of a third major gene. Zhao et al. [1997] described this circle of
research from familial aggregation, to segregation, to linkage, to association, and
back again, and argued cogently for a unified framework of population-based research for accomplishing this entire agenda. Furthermore, if interacting factors
(genetic and/or environmental) are important, power to find other major genes can
be increased if they are taken into account [Cox et al., 1999; Thomas and Gauderman,
2000]. But perhaps we are entering an era in which efforts would be better devoted to
understanding the role of the more common, low-penetrance metabolic genes.
Need for Population-Based Study Designs: Characterizing BRCA1 and BRCA2
Darpoux and Bonaïti-Pellié [1992], who coined the term "mod score." The theoretical
basis for the technique is described by Hodge and Elston [1994]. The estimates of
BRCA1 penetrance from the linkage consortium using this approach were approximately 87% for breast cancer and 44% for ovarian cancer to age 70 [Easton et al.,
1995]. A variety of other estimates have been reported since, some based on actual
BRCA1 mutation testing, some based on ovarian cancer families who are highly likely
to be BRCA1 carriers, and all showed similarly high penetrances.
The first population-based estimates were reported by Struewing et al. [1997],
based on genotyping an unselected series of volunteers from the Ashkenazi Jewish
community of Washington, DC, and comparing the incidence of breast and ovarian
cancer in first-degree relatives of carrier and non-carrier probands, a design that has
subsequently been dubbed the "kin cohort" design [Wacholder et al., 1998]. The
estimated penetrance by age 70 was only 56% for breast cancer and 16% for ovarian
cancer, both significantly lower than those estimated from the linkage families. Data
from the general population of Australia (not limited to Ashkenazi Jews) yield an
even lower breast cancer penetrance estimate of 40% by age 70 [Hopper et al., 1999].
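The logic of the kin cohort design can be sketched as a two-equation method-of-moments inversion. The observed cumulative risk in relatives of carrier probands is a mixture of carrier and non-carrier penetrance weighted by p_c, the probability that such a relative is a carrier (roughly one half for first-degree relatives and a rare dominant allele); the risk in relatives of non-carrier probands uses weight p_n (roughly the allele frequency). Solving the two equations gives both penetrances. This is a simplified sketch, not necessarily the estimator used in the studies cited; the default p_n and the example risks are hypothetical.

```python
def kin_cohort_estimates(risk_rel_carriers, risk_rel_noncarriers, p_c=0.5, p_n=0.01):
    """Method-of-moments penetrance estimates from a kin cohort design.

    risk_rel_carriers: observed cumulative risk (e.g. to age 70) among first-degree
        relatives of carrier probands.
    risk_rel_noncarriers: the same quantity among relatives of non-carrier probands.
    p_c, p_n: probability that a relative is a carrier, given a carrier or a
        non-carrier proband (assumed known from Mendelian rules and allele frequency).

    Solves
        risk_rel_carriers    = p_c * F1 + (1 - p_c) * F0
        risk_rel_noncarriers = p_n * F1 + (1 - p_n) * F0
    for carrier penetrance F1 and non-carrier risk F0.
    """
    det = p_c - p_n  # determinant of the 2x2 mixing matrix
    if det == 0:
        raise ValueError("p_c and p_n must differ")
    f1 = ((1 - p_n) * risk_rel_carriers - (1 - p_c) * risk_rel_noncarriers) / det
    f0 = (p_c * risk_rel_noncarriers - p_n * risk_rel_carriers) / det
    return f1, f0


# Hypothetical observed cumulative risks in the two groups of relatives.
f1, f0 = kin_cohort_estimates(0.30, 0.10)
print(round(f1, 3), round(f0, 3))
```

Because the carrier penetrance is recovered from a difference divided by (p_c − p_n), modest reporting errors in the relatives' disease histories are amplified roughly twofold, one reason the accuracy of family history data matters so much for this design.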
Why are these two groups of estimates so different? The confidence limits are
too narrow for chance to be the explanation, but of course biases such as under- or
over-reporting of family history could be involved. What about other explanations,
such as ascertainment bias, locus or allelic heterogeneity, or model misspecification?
In principle, the mod score approach should correctly deal with the restriction to
heavily loaded families, if their ascertainment is only through the phenotypes, not the
markers. But some analyses were restricted to linked families, on the grounds that only some of
the families would be segregating BRCA1 and the others would simply add noise.
Simulation studies [Siegmund et al., 1999a] showed that this kind of ascertainment can
indeed over-estimate penetrance. Another possible explanation is allelic heterogeneity.
Maybe different BRCA1 mutations have different penetrances, and the common mutations in Ashkenazim could have lower penetrance. Linkage analyses have suggested
allelic heterogeneity, with some alleles more strongly associated with breast and some
with ovarian cancer [Easton et al., 1995], and mutation testing data [Gayther et al.,
1997] suggest that the breast/ovarian ratio is associated with the location of the mutation. There are also data that suggest that missense mutations in BRCA1 could be associated with some increase in breast cancer risk, but not as large as for truncating
mutations [Durocher et al., 1996; Dunning et al., 1997].
Yet another possibility is residual heterogeneity in the model parameters. Standard methods of segregation and linkage analysis assume that penetrances and allele
frequencies are homogeneous across the population, but why should they be? For
one thing, environmental risk factors as well as genes would tend to cluster in heavily
loaded families. If not allowed for in the model, the penetrance that would be estimated from such families would be higher than that estimated from random families
from the general population. Kraft and Thomas [2000] compared the relative efficiencies of various likelihood formulations, showed how their estimates can be biased if baseline rates or allele frequencies are heterogeneous, and described a
mixed-models approach to allowing for such heterogeneity.
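The direction of this bias can be seen in a small simulation. Suppose each family has its own carrier penetrance, drawn from a Beta distribution that stands in for shared environmental or genetic modifiers, and families are kept only if several carriers are affected. The average underlying penetrance among the kept families then exceeds the population average. This sketch shows only that heavily loaded families are enriched for high-penetrance families; it is not an implementation of the Kraft and Thomas mixed-model approach, and the Beta(2, 2) distribution, family size, and selection rule are hypothetical.

```python
import random


def mean_family_penetrance(n_families=20000, carriers_per_family=4, min_affected=3,
                           a=2.0, b=2.0, seed=1):
    """Average carrier penetrance in all families vs. in heavily loaded families.

    Each family draws its own penetrance p_f ~ Beta(a, b); each of its carriers is
    affected independently with probability p_f. A family counts as heavily loaded
    if at least min_affected carriers are affected.
    """
    rng = random.Random(seed)
    all_p, selected_p = [], []
    for _ in range(n_families):
        p_f = rng.betavariate(a, b)
        n_affected = sum(rng.random() < p_f for _ in range(carriers_per_family))
        all_p.append(p_f)
        if n_affected >= min_affected:
            selected_p.append(p_f)
    return sum(all_p) / len(all_p), sum(selected_p) / len(selected_p)


population_mean, loaded_mean = mean_family_penetrance()
print(round(population_mean, 3), round(loaded_mean, 3))
```

Under these assumptions the population mean is about 0.5, while the mean among families with at least three of four carriers affected is about two-thirds; a model that assumes a single homogeneous penetrance has no way to separate the two.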
These considerations illustrate the difficulties in interpreting the results from
heavily loaded families and the importance of population-based designs for the purpose of characterizing genes. It is often argued that population-based designs are
inefficient for mapping rare major genes, and there is some merit in this position.
But careful attention to sampling strategies can go a long way to improving the
efficiency of population-based designs. Two examples are a recent extension of the
kin cohort design to larger pedigrees with additional genotyping [Gail et al., 1999]
and multi-stage sampling designs [Whittemore and Halpern, 1997] that are being
applied to the design of one of the Cooperative Cancer Family Registries for Colorectal
Cancer Studies [Siegmund et al., 1999b].
Gene-Environment Interactions in Breast and Ovarian Cancer
weaker: independence conditional on parental genotype, i.e., within rather than between families [Witte et al., 1999]. For late-onset diseases, sibling or cousin controls
are an attractive alternative, although there are still practical difficulties with them
too: not every case will have an eligible sibling, or even a cousin, and age-matching
is also difficult [Witte et al., 1999]. These study design issues were the major focus
of a recent NCI workshop [Thomas, 1999].
In my view, a study of the joint effects of OCs with BRCA1 and BRCA2 makes little
sense without at the same time considering the various polymorphic genes on the estrogen pathway, as it is highly plausible that these genes might interact with OC use. Each
of the genes on this pathway (Cyp17, Cyp19, 17HSD1, and ER) has polymorphisms whose prevalence differs markedly between ethnic groups, in parallel with the groups' mean hormone levels and
their breast cancer rates [Feigelson et al., 1996; Feigelson and Henderson, 1998].
We are currently studying these genes and their interactions with OC use in a
large multi-ethnic cohort study in Los Angeles and Hawaii [Kolonel et al., 2000].
Each of the 215,251 cohort members completed a questionnaire on dietary and other
risk factors, including family history, and is being followed by linkage to the statewide cancer registries. The design of the nested case-control study of genetic effects
involves identifying sibships with at least two affected members and at least one unaffected
member (one of the affected members could be a parent, and the original cohort member
could be either affected or unaffected). We anticipate enrolling approximately 500
such sibships. Power calculations indicate that the design should be capable of detecting twofold interactions of OC use with the various estrogen pathway genes, but
interactions with BRCA1 or BRCA2 would have to be at least 10-fold to be detectable.
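Why rare genotypes demand much larger interaction effects can be illustrated with a deliberately simpler design than the sibship study described above. In a case-only analysis (swapped in here purely for simplicity, and valid only if genotype and exposure are independent in the source population), the genotype-exposure odds ratio among cases estimates the multiplicative interaction ψ, and its variance is roughly the sum of reciprocal cell counts. When the genotype is rare, the carrier cells are tiny and the standard error large. All inputs below are hypothetical, and the result is a normal-approximation power, not the calculation behind the figures quoted.

```python
import math
from statistics import NormalDist


def case_only_power(n_cases, p_geno, p_exp, psi, rr_g=1.0, rr_e=1.0, alpha=0.05):
    """Approximate power of a case-only Wald test for a multiplicative G x E interaction.

    Expected case counts are proportional to P(G=g) P(E=e) rr_g^g rr_e^e psi^(g*e),
    assuming G and E are independent in the population; the case-only odds ratio
    then equals psi, and Var(log OR) is approximated by the sum of 1/count.
    """
    weights = {}
    for g in (0, 1):
        for e in (0, 1):
            weights[(g, e)] = ((p_geno if g else 1 - p_geno)
                               * (p_exp if e else 1 - p_exp)
                               * rr_g ** g * rr_e ** e * psi ** (g * e))
    total = sum(weights.values())
    counts = [n_cases * w / total for w in weights.values()]
    se = math.sqrt(sum(1.0 / c for c in counts))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(math.log(psi)) / se - z_crit)


# Hypothetical comparison: a common polymorphism vs. a rare high-risk genotype.
for p_geno in (0.40, 0.01):
    for psi in (2.0, 5.0, 10.0):
        print(p_geno, psi, round(case_only_power(1000, p_geno, 0.4, psi), 3))
```

The same qualitative pattern, near-certain power for a twofold interaction with a common variant but low power with a rare one, is what drives the asymmetry between the estrogen-pathway genes and BRCA1/BRCA2, although the sibship design's actual numbers would differ.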
FUTURE DIRECTIONS FOR THE SOCIETY
With this brief tour of the genetic epidemiology of breast and ovarian cancer, I
return to the question I posed at the outset of how to define an identity for our field.
I suggest that the best way to promote our field is for each of us to practice these
principles in our own research. The International Genetic Epidemiology Society
can help by recruiting more investigators whose research illustrates these principles and
by encouraging submissions of such papers to the Society's journal. Surprisingly few
epidemiologists have joined our Society, despite the fact that they are turning in droves
to the study of molecular and genetic factors. We are currently exploring affiliations
with molecular epidemiology sections of other societies. Furthermore, more than half
of the papers in last year's volume of Genetic Epidemiology were methodological, and
not one of them mentioned gene-environment interactions! It seems that most mainstream epidemiologists and geneticists prefer to publish substantive papers in leading
epidemiology, genetics, or disease-oriented journals. In my view, our field is more
than just a collection of methods, and our journal should try to do a better job of reflecting successful applications of these principles. I also think the Society could take a
more active role in helping the National Institutes of Health restructure study sections
and identify reviewers for grants in our field. We need to do more to foster creative,
high-risk, and multi-disciplinary studies.
Bioethics Initiatives
During the past year, the Society formed a committee to consider ethical, legal,
and social issues relating to our field. The charge to the committee includes the following items:
1. Ethical issues in the conduct of genetic epidemiology research, such as informed consent, disclosure of genetic information to subjects, relatives, third parties, etc.
2. Legal issues, such as access to medical records for research purposes and protection of confidentiality.
3. National/cultural disparities in ethics standards and practices, including a philosophical examination of the arguments for universality versus cultural relativity and the justification for perceived differences between countries as they relate to the field of genetic epidemiology.
4. Implications of genetic epidemiology research for genetic counseling, including the uncertainties and complexities of genetic risk estimates, the frequent lack of population-based estimates, the role of environmental modifying factors, and how to deal with new research findings in a counseling situation.
5. Implications for public policy, such as reproductive freedom and problems of disclosure, discrimination, and access to care resulting from genetic testing information, particularly in countries without a universal health care system.
These are important issues that individual investigators and the Society should
be concerned with. In particular, many of you may be aware of the widespread discussion of a proposal by the U.S. Office of Management and Budget to extend the
Freedom of Information Act access to raw data from any federally funded research
project. On behalf of the membership, the IGES Board submitted a letter to the OMB
outlining four concerns about the proposal: the protection of confidentiality for subjects of genetic research; the impact on willingness of subjects to participate in family studies; the impact on investigators' willingness to participate in multi-center
studies, particularly our foreign colleagues; and the protection of intellectual property rights. This issue is still far from resolved. Last spring, the proposal was returned to Congress for reconsideration, and a revised proposal is currently open for
public comment. The Committee is also considering the implications of a recent
ruling requiring that family members whose medical histories are being reported on
by study subjects be treated as human subjects themselves, possibly requiring individual informed consent.
This is an exciting time for our field. Extraordinary advances in molecular biology, statistical methods, and computing power, together with the creation of data and
biological sample resources like the NCI Cooperative Family Registries, offer unique
opportunities for discovering genes and for characterizing their interactions with environmental factors. I think the time is now ripe to start doing some real Genetic
Epidemiology.
REFERENCES
Andrieu N. 1996. Étude des interactions entre les facteurs génétiques et les facteurs de la reproduction
dans l'étiologie du cancer du sein [Study of interactions between genetic and reproductive factors in the etiology of breast cancer]. Doctoral dissertation. Paris: l'Université de Paris XI, Centre
d'Orsay: 116 plus 5 annexes.
Andrieu N, Clavel F, Auquier A, et al. 1993. Variations in the risk of breast cancer associated with a
family history of breast cancer according to age at onset and reproductive factors. J Clin Epidemiol
46:973-80.
Andrieu N, Duffy S, Rohan T, et al. 1995. Familial risk, abortion and their interactive effect on the risk
of breast cancer: a combined analysis of six case-control studies. Br J Cancer 72:744-51.
Kraft P, Thomas DC. 2000. Bias and efficiency in family-matched gene association studies: conditional, prospective, retrospective, and joint likelihoods. Am J Hum Genet 63:1119-31.
Krailo M, Thomas D, Pike M. 1987. Fitting models of carcinogenesis to a case-control study of breast
cancer. J Chron Dis 40:181S-9.
Miki Y, Swensen J, Shattuck-Eidens D, et al. 1994. A strong candidate for the breast and ovarian
cancer susceptibility gene BRCA1. Science 266:66-71.
Moolgavkar S, Day N, Stevens R. 1980. Two-stage model for carcinogenesis: epidemiology of breast
cancer in females. J Natl Cancer Inst 63:559-69.
Moolgavkar S, Knudson A Jr. 1981. Mutation and cancer: a model for human carcinogenesis. J Natl
Cancer Inst 66:1037-52.
Narod S, Risch H, Moslehi R, et al. 1998. Oral contraceptives and the risk of hereditary ovarian cancer.
N Engl J Med 339:424-8.
Neel J. 1984. Editorial. Genet Epidemiol 1:5-6.
Pike M. 1987. Age-related factors in cancers of the breast, ovary, and endometrium. J Chron Dis
40:59S-69.
Pike MC, Henderson BE, Casagrande JT, Rosario I, Gray GE. 1981. Oral contraceptive use and early
abortion as risk factors for breast cancer in young women. Br J Cancer 43:72-6.
Pike M, Krailo M, Henderson B, Casagrande J, Hoel D. 1983. Hormonal risk factors, breast tissue
age and the age-incidence of breast cancer. Nature 303:767-70.
Rao D. 1985. Editorial comment. Genet Epidemiol 1:3.
Risch N. 1984. Segregation analysis incorporating linkage markers. I. Single-locus models with an
application to type I diabetes. Am J Hum Genet 36:363-86.
Sattin R, Rubin G, Webster L, Huezo P. 1985. Family history and the risk of breast cancer. JAMA
253:1908-13.
Schaid D. 1999. Case-parents design for gene-environment interaction. Genet Epidemiol 16:261-73.
Siegmund KD, Gauderman WJ, Thomas DC. 1999a. Gene characterization using high-risk families: a
sensitivity analysis of the MOD score approach (abstract 2251). Am J Hum Genet 65:A398.
Siegmund K, Whittemore A, Thomas D. 1999b. Multistage sampling for disease family registries. In:
Schaid D, Thomas D, Whittemore A, editors. Monogr Natl Cancer Inst 26:43-8.
Struewing J, Hartge P, Wacholder S, et al. 1997. The risk of cancer associated with specific mutations
of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med 336:1401-8.
Swift M. 1994. Ionizing radiation, breast cancer, and ataxia-telangiectasia. J Natl Cancer Inst 86:1571-2.
Thomas D. 1999. Gene characterization studies: an overview. Monogr Natl Cancer Inst 26:17-23.
Thomas D, Gauderman W. 2000. The role of interacting determinants in the localization of genes. In:
Rao D, editor. Genetic dissection of complex traits: challenges for the next millennium. New
York: Academic Press (in press).
Ursin G, Henderson B, Haile R, Zhou N, Diep A, Bernstein L. 1997. Is oral contraceptive use more
common in women with BRCA1/BRCA2 mutations than in other women with breast cancer?
Cancer Res 57:3678-81.
Wacholder S, Hartge P, Struewing J, et al. 1998. The kin cohort study for estimating penetrance. Am J
Epidemiol 48:623-30.
Whittemore A, Gong G, Itnyre J. 1997. Prevalence and contribution of BRCA1 mutations in breast
cancer and ovarian cancer: results from three U.S. population-based case-control studies of ovarian cancer. Am J Hum Genet 60:496-504.
Whittemore A, Halpern J. 1997. Multi-stage sampling in genetic epidemiology. Stat Med 16:153-67.
Wingo P, Ory H, Layde P, Lee N, Cancer and Steroid Hormone Group. 1988. The evaluation of the
data collection process for a multicenter, population-based, case-control design. Am J Epidemiol
128:206-17.
Witte JS, Gauderman WJ, Thomas DC. 1999. Bias and efficiency in case-control studies of candidate
genes and gene-environment interactions: basic family designs. Am J Epidemiol 148:693-705.
Wooster R, Bignell G, Lancaster J, Swift S. 1995. Identification of the breast cancer susceptibility
gene BRCA2. Nature 378:789-92.
Wu AH, Pike MC, Stram DO. 1999. Meta-analysis: dietary fat intake, serum estrogen levels, and the
risk of breast cancer. J Natl Cancer Inst 91:529-34.
Zhao L, Hsu L, Davidov O, Potter J, Elston R, Prentice R. 1997. Population-based family study designs:
an interdisciplinary research framework for genetic epidemiology. Genet Epidemiol 14:365-88.