
JOURNAL OF VIROLOGY, Aug. 2000, p. 7079-7084. 0022-538X/00/$04.00+0. Copyright © 2000, American Society for Microbiology. All Rights Reserved.

Vol. 74, No. 15

A Hypothesis for DNA Viruses as the Origin of Eukaryotic Replication Proteins


LUIS P. VILLARREAL¹* AND VICTOR R. DEFILIPPIS²

Departments of Molecular Biology and Biochemistry¹ and Ecology and Evolutionary Biology,² University of California, Irvine, California 92697
Received 16 December 1999/Accepted 1 May 2000

The eukaryotic replicative DNA polymerases are similar to those of large DNA viruses of eukaryotes and of bacterial T4 phages but not to those of eubacteria. We develop and examine the hypothesis that DNA virus replication proteins gave rise to those of eukaryotes during evolution. We chose the DNA polymerase from phycodnavirus (which infects microalgae) as the basis of this analysis, as it represents a virus of a primitive eukaryote. We show that it has significant similarity with replicative DNA polymerases of eukaryotes and certain of their large DNA viruses. Sequence alignment confirms this similarity and establishes the presence of highly conserved domains in the polymerase amino terminus. Subsequent reconstruction of a phylogenetic tree indicates that these algal viral DNA polymerases are near the root of the clade containing all eukaryotic DNA polymerase delta members but that this clade does not contain the polymerases of other DNA viruses. We consider arguments for the polarity of this relationship and present the hypothesis that the replication genes of DNA viruses gave rise to those of eukaryotes rather than the reverse.

Divergence of the bacterial and eukaryotic lineages appears to represent the deepest split in the tree of life (22). Because the DNA replication proteins of these groups are of fundamental importance and interact through complex mechanisms, it seems likely that the genome replication system, like the translational system, would contain the most highly conserved, coevolved genes among all related lineages. Obvious functional homologues of replication genes are found in bacteria, eukaryotes, and archaea, including proteins involved in origin recognition, helicases, DNA-binding proteins, DNA synthesis, sliding clamp processivity factors (PCNA), ligation, and primer removal (see reference 7 and references therein).
However, clear differences in sequence similarity separate the replication proteins of bacteria from those of archaea and eukaryotes (7). The bacterial replication genes thus appear evolutionarily unrelated to those of eukaryotes and archaea. For example, the replicative DNA polymerase (Pol) III of Escherichia coli belongs to the family C DNA Pols and has no similarity to either of the two mammalian replicative family B DNA Pols (alpha, the priming Pol, and delta, the extending Pol; see reference 30). As a result, phylogenetic analysis of these replicative DNA Pols yields polyphyletic groupings that contradict accepted species trees (6). The widespread occurrence of functionally equivalent yet nonorthologous genes presents a dilemma when such genes are used to connect the branches of the universal tree of life, and this has led some to propose that the cenancestor of bacteria, archaea, and eukaryotes had an RNA genome (7, 17). However, it is now clear that perhaps several hundred functional genes (e.g., DNA synthesis genes) are homologous between bacteria and eukaryotes. This suggests that the putative prokaryotic-eukaryotic ancestor possessed many genes inherited by both lineages (see references 8 and 14 and references therein). Faithful replicative transmission of such a large number of essential genes seems unlikely given the small size of RNA genomes and the error-prone nature of their replication (14). It therefore appears more likely that the common ancestor had a DNA genome, which leaves unexplained how the replication systems came to differ during the divergence of bacteria from archaea and eukaryotes.

DNA viruses, however, also encode complete sets of independent DNA replication and repair proteins, including members of the family A and B DNA Pols (12). When phage T4 DNA Pol was first sequenced, it was noteworthy how similar it was to DNA Pols alpha and delta of eukaryotes and to the Pols of Epstein-Barr virus, human cytomegalovirus, and other DNA viruses of eukaryotes, but not to those of adenoviruses or to E. coli Pol I or III (23). This similarity includes the conservation of five of six sequential domains (31), as well as resistance to various family B-specific inhibitors (3). Other phage DNA Pols, however, such as that of T7, are similar to bacterial DNA Pol I but not to the Pols of eukaryotes. When the entire T4 genome was sequenced, it was additionally surprising that this strictly lytic bacteriophage had more genes similar to those of eukaryotes (including genes for self-splicing RNA [13]) than to bacterial genes (4). Viruses are usually thought to impose negative selection on their hosts. In addition, recombination between host and viral genomes is commonly observed, as when retroviruses acquire cellular proto-oncogenes (5, 28). Yet viruses are rarely considered a source of host genes, and viral sequences are therefore not taken into account when the tree of life is reconstructed. However, a viral genome can evolve up to a million times faster than that of its host. If a DNA virus could establish a stable persistent (or genomic) infection in its host, it might then also provide genes that alter host evolution, as we have previously reasoned (29). This raises the question: could a DNA virus have been the origin of the replicative eukaryotic DNA Pols?

* Corresponding author. Mailing address: Department of Molecular Biology and Biochemistry, 3205 Bio Sci II, University of California, Irvine, Irvine, CA 92697. Phone: (949) 824-6074. Fax: (949) 824-8551. E-mail: lpvillar@uci.edu.
In this report, we consider the hypothesis of a viral origin for eukaryotic replication proteins in the context of DNA viruses that infect host species likely to be representative of the earliest eukaryotes. We examine DNA Pols from two families of DNA viruses prevalent as acute infections of parasitic microalgae (Chlorella-like viruses) (27) and as persistent infections of filamentous brown algae (Feldmania species virus) (9, 15, 16, 21, 27). These algal species represent some of the earliest eukaryotes for which clear geological data exist (11). We perform sequence similarity and phylogenetic analyses which indicate that these viral proteins appear related to the progenitor of all eukaryotic Pol delta sequences, and we consider arguments that a DNA virus may have been the origin of the eukaryotic DNA replication system.
MATERIALS AND METHODS
The open reading frames encoding the DNA Pol or Pol-like proteins of Chlorella virus (NT2A; GenBank M86836; 913 amino acids [a.a.]) and Feldmania species virus (GenBank AF013260; 996 a.a.) were retrieved from GenBank. Using these sequences as probes, a gapped TBLASTN (version 2.0.4) search of the translated nonredundant database was performed. Essentially all of the replicative family B DNA Pols from eukaryotes showed similarity to both probes. The DNA Pol sequences of most large DNA viruses of animals were also identified. Although the search indicated that all eukaryotic replicative DNA Pols (alpha and delta) are similar, the DNA Pol delta genes were the most similar to these phycodnavirus genes. Interestingly, although Feldmania virus and Chlorella virus are both DNA viruses of algae, each of their DNA Pol sequences was more similar to a DNA Pol gene of a lower eukaryotic host (Schizosaccharomyces pombe, Candida albicans, Glycine max, or Saccharomyces cerevisiae) than to each other. In addition, the DNA Pols of several lytic phages (T4 and RB69) were identified, as were the DNA Pol II genes of various archaeal and bacterial species (i.e., the nonreplicative Pol II of E. coli). Absent were the replicative DNA Pols (Pol III) and Pol I of bacteria, as well as the DNA Pols of other lytic phages (T7), adenoviruses, and related linear plasmids of fungi. After redundant and incomplete proteins were eliminated, the remaining sequences were aligned using ClustalW to aid in the identification of homologous regions. In this alignment, four regions of high conservation (labeled I, II, III, and IV) were easily identifiable in most of the taxa; they are shown in Fig. 1, with amino acids color coded by side-chain properties and taxa grouped by biological relationship. As had previously been established, family B polymerase sequences contain up to six specific domains (23, 31).
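The conserved regions above were picked out from the ClustalW alignment, and Fig. 1 groups residues by side-chain properties. As one hypothetical way to make "high conservation" concrete, the sketch below scores each alignment column by the share of residues falling in that column's most common Fig. 1 property group and reports runs of high-scoring columns. The scoring rule, the `min_len` and `threshold` values, the gap handling, and the toy sequences are all assumptions for illustration; this is not the procedure used to define regions I to IV.

```python
from collections import Counter

# Side-chain property groups taken from the Fig. 1 color scheme.
GROUPS = {
    "DE": "negative", "HKR": "positive", "NQ": "amide", "ST": "alcohol",
    "LIV": "aliphatic", "FYW": "aromatic", "AG": "small", "MC": "sulfur",
    "P": "proline",
}
RESIDUE_GROUP = {aa: name for residues, name in GROUPS.items() for aa in residues}


def column_score(column: str) -> float:
    """Share of residues in the column's most common property group.

    Assumption: a column containing any gap (or unrecognized symbol) scores 0.
    """
    groups = [RESIDUE_GROUP.get(aa) for aa in column]
    if None in groups:
        return 0.0
    return Counter(groups).most_common(1)[0][1] / len(groups)


def conserved_windows(alignment: list[str], min_len: int = 4, threshold: float = 0.75):
    """Half-open (start, end) column ranges of at least min_len consecutive
    columns that all score >= threshold (both parameters are hypothetical)."""
    ncols = len(alignment[0])
    if any(len(seq) != ncols for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    scores = [column_score("".join(seq[i] for seq in alignment)) for i in range(ncols)]
    windows, start = [], None
    for i, s in enumerate(scores + [0.0]):  # trailing sentinel closes a final run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    return windows


# Made-up sequences, not data from this study.
toy = ["MKYGDTDSLF--", "AKYGDTDSIWPE", "GRYGDSDSLYPQ"]
print(conserved_windows(toy))  # [(1, 10)]
```

Scoring by property group rather than strict identity mirrors the way Fig. 1 highlights conservative substitutions (e.g., T/S or L/I) as similar.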
We compared our conserved domains with those previously identified and determined that our regions II, III, and IV corresponded roughly to regions II, III, and IV, respectively, identified in DNA Pol alpha by Wang et al. (31), and that our region I had previously been identified as the phosphonoacetic acid-resistant domain of herpes simplex virus type 1 DNA Pol in the study of T4 DNA Pol by Spicer et al. (23). Because the lengths of these DNA Pol genes vary widely, the sequences are shown in Fig. 2 as a roughly proportional line drawing indicating the locations of the four highly conserved domains, with the sequences centered on the most highly conserved domain, region II. The two shortest sequences correspond to fragments of Micromonas pusilla virus and Chrysochromulina species virus (phycodnaviruses). The next shortest was the full gene (313 a.a.) for the Pol alpha of Endotrypanum (Leishmania) monterogeni, followed by the Helicoverpa armigera nuclear polyhedrosis virus DNA Pol (623 a.a.); all other genes were complete sequences. The largest gene (encoding 1,855 a.a.) was the DNA Pol alpha of Plasmodium falciparum. In general, domains I and II are adjacent to each other and occur at variable distances from the amino terminus, although in some archaeal Pol II genes region I is displaced well toward the amino terminus. With the exception of the DNA Pol alpha of Halteria species (a hypotrichous ciliate), the order of the domains was conserved, although the DNA Pol alpha genes of hypotrichous species often lacked domains II and IV. In addition, the DNA Pol II genes of several archaea (lineage A) had domains III and IV displaced well toward the carboxy terminus. These highly conserved regions were then used to guide the alignment of the remaining regions, as follows. First, using the sequence editor GeneDoc version 2.5 (18), each taxon was examined to determine which, if any, of the four domains were present in the protein sequence.
Next, these regions were used as anchors from which to optimize the alignment of amino acids in the intervening sections: the interregion sequences were extracted and aligned using ClustalW. The resulting alignments were again optimized by eye, focusing mostly on similarity within each of the major clades. Once an overall alignment was obtained, a phylogenetic tree was constructed from the more conserved amino-terminal portion of the protein sequences, comprising region I and the amino acids thereafter. Phylogenetic analysis was performed with the neighbor-joining algorithm (20) and 500 bootstrap replications, as implemented in PAUP version 4.0b2 (25). Pairwise distances were calculated as mean observed substitutions per site. The unrooted tree is shown in Fig. 3 and is color coded to distinguish the major clades.
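The distance and tree-building steps can be illustrated in miniature. The sketch below is not the PAUP implementation used here: it computes uncorrected p-distances (the proportion of differing sites, i.e., mean observed substitutions per site), builds a neighbor-joining topology with the Saitou-Nei Q criterion, and estimates bootstrap support by resampling alignment columns with replacement. Assumptions: a column is skipped for a pair if either sequence has a gap, branch lengths are discarded, support is reported for the splits of the full-data tree (one common convention), the helper names are hypothetical, and the five sequences are invented.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Share of compared sites that differ; gapped columns are skipped pairwise."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nj_splits(names: list[str], dist: dict) -> set:
    """Neighbor-joining topology as a set of nontrivial splits.

    dist maps (name_i, name_j) to a distance. Each split is stored as the side
    that excludes names[0], so one unrooted split always gets one key.
    """
    nodes = [frozenset([n]) for n in names]
    D = {}
    for a, b in combinations(nodes, 2):
        (x,), (y,) = a, b
        D[a, b] = D[b, a] = dist[x, y]
    clusters = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D[a, b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p] - r[p[0]] - r[p[1]])
        u = a | b
        for c in nodes:
            if c != a and c != b:
                D[u, c] = D[c, u] = (D[a, c] + D[b, c] - D[a, b]) / 2
        nodes = [c for c in nodes if c != a and c != b] + [u]
        clusters.append(u)
    leaves = frozenset(names)
    return {c if names[0] not in c else leaves - c for c in clusters}


def tree_for(names, seqs, cols):
    """NJ splits for the alignment restricted to the given (possibly repeated) columns."""
    aln = ["".join(s[i] for i in cols) for s in seqs]
    dist = {(x, y): p_distance(aln[i], aln[j])
            for i, x in enumerate(names) for j, y in enumerate(names) if i != j}
    return nj_splits(names, dist)


def bootstrap_support(names, seqs, reps=500, seed=1):
    """Percent of column-resampled replicates containing each split of the full-data tree."""
    rng = random.Random(seed)
    ncols = len(seqs[0])
    reference = tree_for(names, seqs, range(ncols))
    counts = dict.fromkeys(reference, 0)
    for _ in range(reps):
        cols = [rng.randrange(ncols) for _ in range(ncols)]  # sample columns with replacement
        for split in tree_for(names, seqs, cols) & reference:
            counts[split] += 1
    return {split: 100 * k / reps for split, k in counts.items()}


def summarize(support, high=90, low=50):
    """Share of nodes with support >= high, and whether every node reaches low."""
    values = list(support.values())
    return sum(v >= high for v in values) / len(values), all(v >= low for v in values)


# Invented toy alignment: A/B and D/E are near-identical pairs.
names = ["A", "B", "C", "D", "E"]
seqs = ["MKVLAGHTDE", "MKVLAGHTDQ", "MKVIAGNTEQ", "LRVISGNSEQ", "LRVISGNSEK"]
support = bootstrap_support(names, seqs, reps=200)
for split, pct in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(split), f"{pct:.0f}%")
print(summarize(support))
```

On the full toy alignment the tree groups {D, E} and {C, D, E} apart from A and B. PAUP's bootstrap output is typically summarized as a majority-rule consensus tree, so numbers from this sketch are not directly comparable to those in Fig. 3; `summarize` only shows how a statement such as "a given share of nodes had at least 90% support" is computed from per-node values.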

RESULTS
The results suggest that the inferred relationships are robust: 68% of the nodes had ≥90% bootstrap support, and all nodes had ≥50%. The unrooted tree places the DNA Pol sequences into seven clades that correspond to biologically coherent gene sets. The two largest clades correspond to variants of DNA Pol alpha (pink) and DNA Pol delta (black), respectively. In the DNA Pol delta clade, the DNA Pol of Feldmania species virus (which causes a prevalent persistent infection of filamentous brown algae) lies near the base (labeled pol delta), and the Pol genes of the Chlorella-like viruses are slightly more derived. The other Pol delta proteins appear to correspond roughly to accepted evolutionary relationships. The topology of the DNA Pol alpha group is more complex. Near its root, the trypanosomes and Leishmania species branch first, followed by insects and mammals, which, interestingly, are grouped separately from Saccharomyces and Schizosaccharomyces pombe. Also branching near the base of this clade are the macronuclear genes of various binucleated hypotrich species. There are three distinct clades of viral DNA Pols. Two of these correspond to the poxvirus family (light gray) and to the insect baculoviruses, including the nuclear polyhedrosis viruses (green). Both groups branch from the least resolved region at the center of the tree. The third clade corresponds to the animal herpesviruses (red). Interestingly, the herpesviruses appear to share an ancestor with the Feldmania DNA Pol, which lies at the base of the cellular DNA Pol delta clade. The herpesviruses are further divided into three monophyletic subgroups corresponding to the alphaherpesviruses, gammaherpesviruses, and cytomegaloviruses. The placement of the herpesvirus ancestor near the unresolved center of the tree suggests a very ancient origin for these genes.
The remaining two groups include the replicative DNA Pol II genes of various archaea (methanogens and Thermococcus, Pyrococcus, and Sulfolobus species), which are known to be similar to family B DNA Pols (19). Archaeal DNA Pol II appears to exist as two distinct lineages, both of which are thought to be involved in genome replication (7, 26). The larger of these groups appears to share an ancestor with the DNA Pol alpha genes (blue). The smaller clade (gold) corresponds to DNA Pols found in Sulfolobus and Pyrodictium species. The archaeal DNA Pols on this smaller branch are closer to, but not directly connected to, the Pol delta group; this cluster is rooted near the unresolved center of the tree. Also originating near the unresolved center are the Pols of lytic phages T4 and RB69 and E. coli DNA Pol II (a nonessential Pol).

FIG. 1. Amino acid alignment of four highly conserved DNA Pol protein regions. Taxon names are color coded according to clade as in Fig. 3 and are labeled A0 to L5 according to the branch tips therein. Gaps inserted to improve the alignment are indicated by a dash (-). Amino acids are color coded according to side-chain properties using the following scheme: red, negatively charged (D or E); orange, positively charged (H, K, or R); light green, amide (N or Q); blue, alcohol (S or T); purple, aliphatic (L, I, or V); gray, aromatic (F, Y, or W); brown, small (A or G); dark green, sulfur-containing (M or C); white, proline (P). Abbreviations: Hu, human; VZV, varicella-zoster virus; HSV, herpes simplex virus; cytomeg., cytomegalovirus; HHV, human herpesvirus.

FIG. 2. Protein map indicating proportional lengths of DNA Pols (black lines) and relative locations of the four conserved Pol protein domains (labeled I to IV). Proteins are mostly centered so that region II is aligned.

FIG. 3. Unrooted neighbor-joining phylogeny based on the amino-terminal portion of DNA Pol protein sequences, as discussed in the text. Labels at branch tips represent taxa as presented in Fig. 1. Numbers at branch nodes indicate percent bootstrap support for each node based on 500 replications.

DISCUSSION
Using DNA Pols of DNA viruses that infect microalgae and filamentous brown algae as probes in a similarity search, we obtained sequences that yielded a phylogeny in which the base of the monophyletic group containing the replicative DNA Pol delta of eukaryotes resembles viral sequences. Although an earlier analysis of DNA Pol genes gave rise to similar patterns, the authors did not attempt to explain this result (6). Because it is unrooted, the phylogeny does not directly establish the polarity, or direction, of evolutionary change. It therefore remains formally possible that the phycodnaviruses acquired their DNA Pol genes from their algal hosts and, for unknown reasons, maintained similarity to them. As the DNA Pol genes of the algal hosts have not been sequenced, we cannot place them on this tree. Even if they were later placed phylogenetically near the phycodnavirus genes, this would still be unlikely to resolve the direction of evolution. However, we believe several considerations argue that transmission was from virus to host. First, only under this circumstance could the dilemma of the dissimilar replication genes now present in bacteria and eukaryotes be resolved. In addition, all of the other viral DNA Pols examined form distinct monophyletic groups (herpesviruses, poxviruses, and baculoviruses) that do not include host Pols; these other viruses therefore do not appear to have acquired their Pol genes from a host species. The DNA Pol delta clade is clearly monophyletic yet includes all of the diverse phycodnavirus Pols of both microalgal and filamentous algal hosts. Thus, the phycodnaviruses are clearly evolutionarily exceptional DNA viruses. The simplest way to account for these observations is to propose that host Pol delta genes are derived from an early DNA viral gene resembling the one present in Feldmania virus.

Trees of life generated from different genes have yielded multiple evolutionary histories (8). Phylogenetic analysis of DNA Pol sequences presents patterns inconsistent with accepted organismal phylogenies. These disparities are difficult to explain if most genetic variation during the evolution of species arises from random genetic change and vertical gene transmission. Genomic analysis has suggested that horizontal transfer of gene sets may have been more prevalent than previously believed, especially in bacterial species. Horizontal transmission of DNA replication genes, however, would imply the transfer of fundamental and complex cellular components and suggests the involvement of a DNA virus. We have argued that the persistence of a genetic parasite (a virus or its defective derivatives) is a life strategy that can allow the superimposition of complex molecular genetic control systems onto its host (29). As such, a persistent agent (like Feldmania virus) can potentially provide new systems of genetic control, including genome replication, to its host, particularly if it is integrated into the host genome. We suggest that, at least in the case of DNA Pol delta, bacteria and eukaryotes (and archaea) are evolutionarily linked through the DNA Pol of an ancient DNA virus rather than through the replicative host genes. Our analysis also suggests that DNA Pol alpha may share with archaeal DNA Pol II an ancestor that diverged after the initial divergence of bacteria from eukaryotes and archaea.

Two other DNA Pols resemble the family B replicative Pols of eukaryotes and archaea: the nonessential Pol II of E. coli and the Pol of lytic phages T4 and RB69. Both branch from the largely unresolved center of the tree. Because phages represent a much more transmissible system than E. coli Pol II, and because T-like phages infect both bacteria and archaea (kingdom Euryarchaeota [32]), substitution of functional homologues for DNA replication genes is easier to envision if such a virus was involved. Other DNA replication genes may also fit this pattern, since DNA viruses are known to encode various ligases, helicases, and PCNA-like proteins, as well as repair-like DNA Pols such as the DNA Pol beta found in entomopoxvirus (1). Many of the crucial regulatory genes of DNA viruses, such as the T antigens of polyomaviruses, have no known host analogues, even though these viruses are phylogenetically congruent with their host species over long periods of time (29). Thus, these regulatory genes, at least, are viral, not host, creations. Viral genomes can evolve much faster than host genomes, and viral populations are known to exhibit much greater genetic variability, as demonstrated by the frequent occurrence of mutants and defectives. Viral systems therefore have an enhanced capacity to produce genetic novelty. Although some examples of virus-mediated horizontal gene transfer have recently been proposed (2), most of these proposals suggest that the host, not the virus, is the original source of the transferred gene. We now suggest that such infectious and/or persisting agents may be a general source for the acquisition of complex molecular systems and phenotypes.
ACKNOWLEDGMENT
This research was supported by the Irvine Research Unit in Animal Virology.
REFERENCES
1. Afonso, C. L., E. R. Tulman, Z. Lu, E. Oma, G. F. Kutish, and D. L. Rock. 1999. The genome of Melanoplus sanguinipes entomopoxvirus. J. Virol. 73:533-552.
2. Baldo, A. M., and M. A. McClure. 1999. Evolution and horizontal transfer of dUTPase-encoding genes in viruses and their hosts. J. Virol. 73:7710-7721.
3. Bernad, A., A. Zaballos, M. Salas, and L. Blanco. 1987. Structural and functional relationships between prokaryotic and eukaryotic DNA polymerases. EMBO J. 6:4219-4225.
4. Bernstein, H., and C. Bernstein. 1989. Bacteriophage T4 genetic homologies with bacteria and eucaryotes. J. Bacteriol. 171:2265-2270.
5. Bishop, J. M. 1983. Cellular oncogenes and retroviruses. Annu. Rev. Biochem. 52:301-354.
6. Braithwaite, D. K., and J. Ito. 1993. Compilation, alignment, and phylogenetic relationships of DNA polymerases. Nucleic Acids Res. 21:787-802.
7. Edgell, D. R., H. P. Klenk, and W. F. Doolittle. 1997. Gene duplications in evolution of archaeal family B DNA polymerases. J. Bacteriol. 179:2632-2640.
8. Forterre, P. 1999. Displacement of cellular proteins by functional analogues from plasmids or viruses could explain puzzling phylogenies of many DNA informational proteins. Mol. Microbiol. 33:457-465.
9. Goldbach, R., and P. De Haan. 1994. RNA viral supergroups and the evolution of RNA viruses, p. 105-119. In S. S. Morse (ed.), The evolutionary biology of viruses. Raven Press, Ltd., New York, N.Y.
10. Kapp, M. 1998. Viruses infecting marine brown algae. Virus Genes 16:111-117.
11. Knoll, A. H. 1992. The early evolution of eukaryotes: a geological perspective. Science 256:622-627.
12. Knopf, C. W. 1998. Evolution of viral DNA-dependent DNA polymerases. Virus Genes 16:47-58.
13. Kutter, E., K. Gachechiladze, A. Poglazov, E. Marusich, M. Shneider, P. Aronsson, A. Napuli, D. Porter, and V. Mesyanzhinov. 1995. Evolution of T4-related phages. Virus Genes 11:285-297.
14. Lake, J. A., R. Jain, and M. C. Rivera. 1999. Mix and match in the tree of life. Science 283:2027-2028.
15. Mueller, D. G., M. Braeutigam, and R. Knippers. 1996. Virus infection and persistence of foreign DNA in the marine brown alga Feldmannia simplex (Ectocarpales, Phaeophyceae). Phycologia 35:61-63.
16. Muller, D. G., M. Sengco, S. Wolf, M. Brautigam, C. E. Schmid, M. Kapp, and R. Knippers. 1996. Comparison of two DNA viruses infecting the marine brown algae Ectocarpus siliculosus and E. fasciculatus. J. Gen. Virol. 77:2329-2333.
17. Mushegian, A. R., and E. V. Koonin. 1996. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl. Acad. Sci. USA 93:10268-10273.
18. Nicholas, K. B., H. B. Nicholas Jr., and D. W. Deerfield II. 1997. GeneDoc: analysis and visualization of genetic variation. EMBNEW.NEWS 4:14. (http://www.cris.com/~ketchup/genedoc.shtml) annotating multiple sequence alignments, version 2.5.
19. Pisani, F. M., C. De Martino, and M. Rossi. 1992. A DNA polymerase from the archaeon Sulfolobus solfataricus shows sequence similarity to family B DNA polymerases. Nucleic Acids Res. 20:2711-2716.
20. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
21. Sengco, M. R., M. Braeutigam, M. Kapp, and D. G. Mueller. 1996. Detection of virus DNA in Ectocarpus siliculosus and E. fasciculatus (Phaeophyceae) from various geographic areas. Eur. J. Phycol. 31:73-78.
22. Sogin, M. L., J. H. Gunderson, H. J. Elwood, R. A. Alonso, and D. A. Peattie. 1989. Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science 243:75-77.
23. Spicer, E. K., J. Rush, C. Fung, L. J. Reha-Krantz, J. D. Karam, and W. H. Konigsberg. 1988. Primary structure of T4 DNA polymerase. Evolutionary relatedness to eucaryotic and other procaryotic DNA polymerases. J. Biol. Chem. 263:7478-7486.
24. Staskawicz, B. J., F. M. Ausubel, B. J. Baker, J. G. Ellis, and J. D. Jones. 1995. Molecular genetics of plant disease resistance. Science 268:661-667.
25. Swofford, D. L. 1993. PAUP: a computer program for phylogenetic inference using maximum parsimony. J. Gen. Physiol. 102:9A.
26. Uemori, T., Y. Ishino, H. Doi, and I. Kato. 1995. The hyperthermophilic archaeon Pyrodictium occultum has two alpha-like DNA polymerases. J. Bacteriol. 177:2164-2177.
27. Van Etten, J. L. 1994. Algal viruses, p. 35-40. In R. G. Webster and A. Granoff (ed.), Encyclopedia of virology, vol. 1. Academic Press, Inc., San Diego, Calif.
28. Varmus, H. E. 1984. The molecular genetics of cellular oncogenes. Annu. Rev. Genet. 18:553-612.
29. Villarreal, L. P. 1999. DNA virus contribution to host evolution, p. 391-420. In E. Domingo, R. G. Webster, and J. J. Holland (ed.), Origin and evolution of viruses. Academic Press, San Diego, Calif.
30. Wang, C. C., L. S. Yeh, and J. D. Karam. 1995. Modular organization of T4 DNA polymerase: evidence from phylogenetics. J. Biol. Chem. 270:26558-26564.
31. Wang, T. S., S. W. Wong, and D. Korn. 1989. Human DNA polymerase alpha: predicted functional domains and relationships with viral DNA polymerases. FASEB J. 3:14-21.
32. Zillig, W., D. Prangishvilli, C. Schleper, M. Elferink, I. Holz, S. Albers, D. Janekovic, and D. Gotz. 1996. Viruses, plasmids and other genetic elements of thermophilic and hyperthermophilic Archaea. FEMS Microbiol. Rev. 18:225-236.
