Genetics, Vol. 148, 1461-1473, April 1998, Copyright © 1998
A Species Barrier Between Bacteriophages T2 and T4: Exclusion, Join-Copy and Join-Cut-Copy Recombination and Mutagenesis in the dCTPase Genes
Todd P. Gary, Nancy E. Colowick, and Gisela Mosig
Department of Molecular Biology, Vanderbilt University, Nashville, Tennessee 37235
Corresponding author:
Gisela Mosig, Department of Molecular Biology, Vanderbilt University, Box 1820 Sta. B, Nashville, TN 37235, mosigg{at}ctrvax.vanderbilt.edu (E-mail).
 | ABSTRACT |
|---|
Bacteriophage T2 alleles are excluded in crosses between T2 and T4 because of genetic isolation between these two virus species. The severity of exclusion varies in different genes, with gene 56, encoding an essential dCT(D)Pase/dUT(D)Pase of these phages, being most strongly affected. To investigate reasons for such strong exclusion, we have (1) sequenced the T2 gene 56 and an adjacent region, (2) compared the sequence with the corresponding T4 DNA, (3) constructed chimeric phages in which T2 and T4 sequences of this region are recombined, and (4) tested complementation, recombination, and exclusion with gene 56 cloned in a plasmid and in the chimeric phages in Escherichia coli CR63, in which growth of wild-type T2 is not restricted by T4. Our results argue against a role of the dCTPase protein in this exclusion and implicate instead DNA sequence differences as major contributors to the apparent species barrier. This sequence divergence exhibits a remarkable pattern: a major heterologous sequence counterclockwise from gene 56 (and downstream of the gene 56 transcripts) replaces in T2 DNA the T4 gene 69. Gene 56 base sequences bordering this substituted region are significantly different, whereas sequences of the dam genes, adjacent in the clockwise direction, are similar in T2 and in T4. The gene 56 sequence differences can best be explained by multiple compensating frameshifts and base substitutions, which result in T2 and T4 dCTPases whose amino acid sequences and functions remain similar. Based on these findings we propose a model for the evolution of multiple sequence differences concomitant with the substitution of an adjacent gene by foreign DNA: invasion by the single-stranded segments of foreign DNA, nucleated from a short DNA sequence that was complementary by chance, has triggered recombination-dependent replication by "join-copy" and "join-cut-copy" pathways that are known to operate in the T-even phages and are implicated in other organisms as well. 
This invasion, accompanied by heteroduplex formation between partially similar sequences, and perhaps subsequent partial heteroduplex repair, simultaneously substituted T4 gene 69 for foreign sequences and scrambled the sequence of the dCTPase gene 56. We suggest that similar mechanisms can mobilize DNA segments for horizontal transfer without necessarily requiring transposase or site-specific recombination functions.
ACCUMULATION and fixation of multiple mutations resulting in sequence divergence can ultimately lead to the emergence of new species. Inhibition of recombination because of sequence divergence is a major factor contributing to genetic isolation, that is, species barriers (RADMAN 1991). SOS-inducible recombination and repair proteins play important opposing roles in such genetic isolation (VULIC et al. 1997; ZAHRT and MALOY 1997) both in bacteria and in eukaryotes (DE WIND et al. 1995; FOSTER et al. 1996; HARRIS et al. 1996; DATTA et al. 1997, and references therein). The accumulation of multiple mutations generating sufficient divergence to establish species barriers is less well understood. It is thought to occur during rare physiological conditions of stress (ROSENBERG 1994; ARBER 1995; DE WIND et al. 1995; ROSENBERG et al. 1995; HUNTER et al. 1996; LECLERC et al. 1996; MATIC et al. 1997; TORKELSON et al. 1997, and references therein).
Mutagenesis studies in phage T4, reviewed in detail by DRAKE and RIPLEY (1994), have elucidated multiple substrates, pathways, and enzymes that play important roles in generating the raw material for evolutionary changes. It is now obvious that even mutations belonging to a given class, for example, frameshift mutations, large deletions, or duplications, can be generated by more than one pathway. T4 DNA polymerase and its accessory proteins play important roles in the fidelity of all these processes. Potential contributions of recombination to T4 mutagenesis are more ambiguous.
Exquisite and insightful sequence analyses of mutations in genes e (lysozyme), rII, ac, and others have shown that many frameshift mutations, base substitutions, deletions, or duplications can be explained by STREISINGER's slipped mispairing model (STREISINGER et al. 1966) and its variations (DRAKE and RIPLEY 1994; WANG and RIPLEY 1998, accompanying article). Slipped mispairing that is potentially mutagenic can occur between simple or complicated repeats (PRIBNOW et al. 1981), in imperfect palindromes, or between nearly homologous sequences located at considerable distances, that is, so-called ectopic sequences (RIPLEY 1990). Moreover, topoisomerase-dependent staggered cuts, with associated synthesis or degradation, can lead to additions or deletions of a few base pairs (RIPLEY et al. 1988; RIPLEY 1990; BROWN et al. 1993). Nevertheless, certain frameshift mutations are still unexplained. DRAKE and RIPLEY (1994) have discussed the possibility that pairing of T4 DNA with ectopic foreign DNA sequences can generate mutations. This possibility is attractive because there is ever-increasing evidence for transfer of genes or gene segments between phages, resident plasmids, and host genomes, which may contain complete or defective prophages (HERSHEY 1962; CAMPBELL 1988; BRUSSOW and BRUTTIN 1995; HILL et al. 1995; BROWN et al. 1996; KUTTER et al. 1996; WALDOR and MEKALANOS 1996; CALENDAR et al. 1998; NELSON et al. 1997; TYNDALL et al. 1997; WALDOR et al. 1997). The evolution of tail fiber genes and the transfer of mobile intron DNA are particularly good examples of such exchanges in T4 (CLYMAN et al. 1994; HENNING and HASHEMOL-HOSSEINI 1994; TETART et al. 1996). REPOILA et al. (1994) and KUTTER et al. (1996) have summarized and discussed the similarities and differences of the many T4-related phages and the apparent horizontal transfer of genes or gene segments to generate their present-day mosaic genomes.
There are now numerous examples demonstrating that moderate sequence divergence poses barriers against genetic exchanges (RADMAN 1991). KUTTER et al. (1996) point out that "current concepts of homologous recombination cannot account for the formation of such chimeric genes, and the recombination mechanisms responsible are not known."
Here we have investigated the interrelation of recombination, mutagenesis, and exclusion in the genetic region encompassing gene 56 of the two closely related bacteriophage species, T2 and T4. Because this region has been reported to be important for exclusion of phage T2 by T4 (STREISINGER and WEIGLE 1956; RUSSELL and HUSKEY 1974; OKKER 1981; OKKER et al. 1981), we suspected that such experiments might yield insights into how species barriers evolve and how they are maintained.
Gene 56 codes for a dCTPase, dCDPase/dUTPase, dUDPase (called dCTPase hereafter), an enzyme essential for normal development of all T-even phages (KUTTER and WIBERG 1969). These phages replace deoxycytosine with deoxy-hydroxymethyl-cytosine in their DNA and then glucosylate these residues to various extents. These modifications protect T-even DNA from restriction enzymes that would degrade cytosine-containing DNA (for review see CARLSON et al. 1994). Gp 56 prevents the incorporation of dC into phage DNA by hydrolyzing dCTP and dCDP; its associated dUTPase activity also prevents incorporation of U instead of T. It has been postulated that T4 dCTPase is responsible for the apparent restriction or exclusion of T2 development by co-infecting T4 phage (OKKER 1981; OKKER et al. 1981).
Electron microscopy of heteroduplexes has revealed a major heterology between T2 and T4 DNA in or near gene 56 (KIM and DAVIDSON 1974; YEE and MARSH 1981). Our comparison of the T2 gene 56 and its adjacent regions with T4 DNA indicates that the major heterology results from replacement of T4 gene 69 by a completely different sequence in T2. Sequences bordering this heterologous region are partially diverged. They contain, among other features, evolutionary evidence for multiple base substitutions and frameshifts suggestive of slipped mispairings during recombination. We propose a model to explain this unusual pattern based on mispairing of ectopic sequences (STREISINGER et al. 1966) and recombination-dependent initiation of DNA replication (LUDER and MOSIG 1982; MOSIG et al. 1991; MOSIG 1994), combined with a strong selection for a functional dCTPase.
To distinguish potential effects of differences between the proteins from those of differences between the base sequences of their genes on present-day exclusion of T2 by T4, we have constructed and tested chimeric T4/T2 derivatives of gene 56. We found that at least two mechanisms contribute to the apparent exclusion of T2 by T4, one of which depends on yet unknown host functions. Results of our experiments probing the host-independent mechanism, seen in CR63 bacteria, are consistent with the idea that sequence divergence now generates genetic barriers to recombination. Other results argue against a role of the T4 dCTPase protein in either the host-dependent or the host-independent exclusion of phage T2 by T4.
 | MATERIALS AND METHODS |
|---|
Bacteriophages are listed in Table 1. The T2 gene 56am mutation was obtained from S. HATTMAN in a T2 double mutant gt-56am (REVEL et al. 1965) that we crossed with wild-type T2 to isolate the single T2 56am.
Bacteria: E. coli B strains B, B/2, B/4, and S/6 (restrictive for am mutants) and the K strain CR63 (supD) (permissive for am mutants) were initially obtained from A. H. DOERMANN, then at Vanderbilt University, and have been maintained by us since 1965. B/2 is resistant to T2 and B/4 is resistant to T4. The K strain UT481 (supD), permissive for am mutants, was constructed and kindly sent by CYNTHIA LARK, University of Utah, in 1985. The K strain M5219 (N+, cI857, supo) and the plasmid vector pPLc2833 (REMAUT et al. 1981) were kindly supplied by WALTER FIERS, University of Ghent, Belgium. The expression plasmid pLAM71* was constructed by MICHAEL TRUPIN in our lab by standard methods (SAMBROOK et al. 1989) by inserting a 1.9-kb T4 fragment with in vitro attached BamHI linkers into the BamHI site downstream of the lambda PL promoter of plasmid pPLc2833. During or after construction of pLAM71*, T4 gene 69 acquired two mutations: one of the five A's preceding position 970 (Figure 1B) was deleted, and the C at position 1515 (between two T runs) was changed to T. Thus, pLAM71* contains the wild-type T4 gene 56, gene 69 with a frameshift mutation, and parts of dam and soc. It was used to transform E. coli M5219.

Figure 1.
Base sequences and predicted amino acid sequences of deduced proteins in the region between the middle promoter of gene 56 and the late gene soc. (A) T2. (B) T4. (C) A map comparing the gene arrangements in T2 and T4. The -10 regions of the middle promoters upstream of gene 56 are doubly underlined. The late promoter upstream of soc is boxed. The two possible T2 ORFs between genes 56 and soc are called soc.1 and soc.2. The T2 56am mutation at position 137 of the T2 sequence, the T4 amE51 mutation at position 512, the tsA90 mutation at position 614 and the amC153 mutation at position 817 of the T4 sequence are underlined. The numbering of the T4 sequence corresponds to the numbers in our Genbank submission with accession number 30001.
Phage crosses were done in bacteria in H-broth at 37°, using equal multiplicities of 3 to 6 of each of two parents, and burst sizes and recombinant frequencies were determined as described (MOSIG et al. 1977).
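As a rough illustration of why multiplicities of 3 to 6 per parent are suitable for crosses, the sketch below estimates the fraction of cells that receive at least one phage of each parent. It assumes Poisson-distributed, independent adsorption, which is a textbook simplification and not a procedure taken from MOSIG et al. (1977); the function names are hypothetical.

```python
import math


def fraction_infected(moi: float) -> float:
    """Poisson probability that a cell adsorbs at least one phage at this multiplicity."""
    return 1.0 - math.exp(-moi)


def fraction_coinfected(moi_a: float, moi_b: float) -> float:
    """Probability of at least one phage of each parent, assuming independent adsorption."""
    return fraction_infected(moi_a) * fraction_infected(moi_b)


def burst_size(progeny_titer: float, infected_cell_titer: float) -> float:
    """Average progeny phage per infected cell (both titers in the same units)."""
    if infected_cell_titer <= 0:
        raise ValueError("infected_cell_titer must be positive")
    return progeny_titer / infected_cell_titer


# Multiplicities at the two ends of the range used in the crosses.
for moi in (3, 6):
    print(moi, round(fraction_coinfected(moi, moi), 3))
```

Under these assumptions roughly 90% of cells are mixedly infected at a multiplicity of 3 per parent, and nearly all at 6, so most progeny come from cells in which recombination between the parents was possible.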
Thermocycle DNA sequencing was done without prior amplification as described (MOSIG and COLOWICK 1995), using DNA released from phage and denatured by heating to 95° as template and oligonucleotides end-labeled with [32P]- or [33P]-ATP as primers. DNA Inspector and Gene Inspector programs (Textco, West Lebanon, NH) were used to analyze and align DNA sequences. National Center for Biotechnology Information databases were searched with the Basic Local Alignment Search Tool (BLAST) and Position-Specific Iterated BLAST (PSI-BLAST) (ALTSCHUL et al. 1997).
 | RESULTS |
|---|
The gene 56 DNA sequences are partially different, and sequences downstream of it are completely different, in T2 and T4:
We determined phage T2 DNA sequences directly from virion DNA without prior amplification (MOSIG and COLOWICK 1995). This sequence (Figure 1A) is compared with the corresponding T4 sequence and the deduced open reading frames (ORFs) starting at the -10 region of a middle promoter at position 16.819 of the T4 genome (MACDONALD and MOSIG 1984; KUTTER et al. 1994; Figure 1B). Two T4 am mutations, and one T2 am mutation that we sequenced, confirm the reading frames of gene 56 in T4 and in T2. A T4 ts mutation (A90) affects a conserved amino acid of gp56.
Gene 69 of T4, located between genes 56 (dCTPase) and soc (small outer capsid protein), appears to be replaced by two completely different short ORFs (T2 soc.1 and T2 soc.2) in T2 DNA. Genes 56 and soc are partially homologous in the two phages (Figure 2). The deduced amino acid sequences of T4 and T2 dCTPase are 66% identical (Figure 3), and a BLAST search (ALTSCHUL et al. 1997) shows them to be the closest relatives in the nonredundant GenBank database. Closer inspection of Figure 2 shows that the T2 and T4 sequences are almost identical between positions 1 and 91 (numbering of the T2 sequence), but they diverge considerably beyond this position in a remarkable pattern. All bases apparently added in T2 (relative to T4) are matched by equal numbers of deleted bases nearby, as in compensating frameshift mutations, except for a triplet that adds an entire codon in T2. In four internal segments of gene 56 (positions 140–160, 163–185, 319–349, and 491–519) T2 and T4 sequences are identical for at least 20 bp, the length of perfect homology required in yeast to initiate heteroduplex formation (DATTA et al. 1997), but less than 50 bp, the apparent size limit for efficient T4 recombination (GOLDBERG 1966; BAUTZ and BAUTZ 1967; DRAKE 1967; SINGER et al. 1982; SINGER 1988). After the stop codon of gene 56 (position 559) the sequences diverge completely. They converge again at position 948, the first base of the late promoter of soc (MACDONALD et al. 1984). There are several differences within soc, but fewer than within gene 56: the addition of an A at position 962, upstream of the coding sequence; the addition of an A at position 1019 with a compensating deletion of an A at position 1027; and the deletion of GAA at position 1038 in T2 as compared with T4.
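The segment-length comparison above can be sketched in a few lines. The helper below scans two gapped, aligned sequences for runs of identical columns; it is a hypothetical illustration, not the DNA Inspector procedure used for Figure 2, and the toy sequences are invented. The four coordinate pairs are the gene 56 segments reported in the text.

```python
def identical_runs(top: str, bottom: str, min_len: int = 20):
    """Return (start, end, length) for runs of identical aligned columns.

    Coordinates are 1-based, inclusive alignment columns. A gap ('-') in
    either row never counts as a match.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(top, bottom)):
        match = a == b and a != "-"
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i, i - start))
            start = None
    if start is not None and len(top) - start >= min_len:
        runs.append((start + 1, len(top), len(top) - start))
    return runs


# Invented toy alignment: 25 identical columns, one mismatch, 10 identical columns.
toy_top = "A" * 25 + "C" + "G" * 10
toy_bottom = "A" * 25 + "T" + "G" * 10
print(identical_runs(toy_top, toy_bottom, min_len=20))

# The four identical gene 56 segments given in the text (T2 numbering).
reported = [(140, 160), (163, 185), (319, 349), (491, 519)]
lengths = [end - start + 1 for start, end in reported]
print(lengths)                              # [21, 23, 31, 29]
print(all(20 <= n < 50 for n in lengths))   # True
```

Each reported segment clears the roughly 20-bp threshold cited for yeast but falls short of the roughly 50 bp cited for efficient T4 recombination, which is the comparison the paragraph draws.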

Figure 2.
Alignments of the DNA sequences of T2 (upper) and T4 (lower) in genes 56 and soc that are partially homologous in T2 and T4 DNA. The completely heterologous region between these two genes is omitted and indicated by two asterisks. Mismatches are printed in red; unmatched bases are indicated by dashes. In the chimeric phage T2(T4 56+69-) the sequence up to position 160 is replaced by T4 sequence.

Figure 3.
Alignments of the amino acid sequences of T2 (upper) and T4 (lower) dCTPases. Amino acids that differ are printed in red; an additional amino acid in T2 is matched with a dash in T4.
A cloned T4 gene 56 complements a T2 gene 56 am mutant without excluding it:
Several independent investigators have reported that T4 phage exclude T2 and that different T2 genes are differentially excluded (STREISINGER and WEIGLE 1956; RUSSELL and HUSKEY 1974; OKKER et al. 1981). The distinction between T2 and T4 is based on their differential adsorption to B/2 and B/4 bacteria, which are resistant to T2 and T4, respectively. These differences reflect differences in the tail fiber genes and in the molecules used as receptors in the cell wall of different bacterial strains (HENNING and HASHEMOL-HOSSEINI 1994). Differential exclusion became apparent when T4 am mutants in different genes were crossed with T2 containing the corresponding wild-type allele, and the progeny phage were plated on B/2 or B/4 bacteria to distinguish T2 and T4. In this situation there appeared to be fewer T2 than T4 progeny. In general, the apparent exclusion was stronger when the am mutations were located in the clockwise direction from the tail fiber genes. This locus-related exclusion has been explained by assuming that splice recombination yields lethal combinations of certain genes whose products interact (RUSSELL and HUSKEY 1974). However, the correlations are not strict. A few alleles, foremost among them those in gene 56, appeared to be excluded even from patch recombinants. To explain this effect, OKKER et al. (1981) have proposed that the dCTPases of T2 and T4 are incompatible and that this incompatibility is responsible for the exclusion.
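The plating scheme described above lends itself to a small calculation. In the sketch below, plaques on B/4 (resistant to T4) are taken as T2 progeny and plaques on B/2 (resistant to T2) as T4 progeny. The plaque counts, dilution, and plated volume are invented for illustration, and the calculation ignores host-range recombinants; it is an assumption about how such percentages can be derived, not the authors' stated procedure.

```python
def titer(plaques: int, dilution: float, volume_ml: float) -> float:
    """Plaque-forming units per ml from a plate count at a given dilution."""
    if dilution <= 0 or volume_ml <= 0:
        raise ValueError("dilution and volume_ml must be positive")
    return plaques / (dilution * volume_ml)


def percent_t2(titer_on_b4: float, titer_on_b2: float) -> float:
    """Percentage of progeny scored as T2 (plaques on B/4) among T2 plus T4 progeny."""
    total = titer_on_b4 + titer_on_b2
    if total == 0:
        raise ValueError("no progeny counted")
    return 100.0 * titer_on_b4 / total


# Hypothetical counts: 30 plaques on B/4 and 270 on B/2 at the same dilution.
t2_titer = titer(30, 1e-6, 0.1)
t4_titer = titer(270, 1e-6, 0.1)
print(round(percent_t2(t2_titer, t4_titer), 1))  # about 10 percent T2 with these made-up counts
```

A value well below 50% from a cross with equal input multiplicities is what the text refers to as apparent exclusion of T2.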
To test whether the T4 and T2 dCTPases are functionally equivalent, we cloned the wild-type T4 gene 56 in an expression vector (REMAUT et al. 1981) under control of a temperature-sensitive phage lambda cI repressor to give plasmid pLAM71*, and we asked whether it complements a T2 56am mutant. In this vector, T4 gene 56 is repressed at 28°, and it can be induced by raising the temperature, usually to 42°. Overproduction of gp56 after induction (GARY 1992, data not shown) required co-expression of the lambda N (antiterminator) gene, as expected from the regulation of gene 56 in T4-infected cells (LINDER and SKOLD 1980). The middle T4 promoter upstream of gene 56 is inactive in the plasmid-bearing bacteria. Transcripts that are initiated further upstream, either from an early T4 promoter (MACDONALD and MOSIG 1984) or from the lambda promoter in pLAM71*, are apparently subject to rho-dependent transcription termination (LINDER and SKOLD 1980) and require antitermination factors or RNA-stabilizing factors (MOSIG and HALL 1994; STITT and HINTON 1994) to yield sufficient gp56 for phage growth.
The cloned T4 gene 56 produces dCTPase that is active after purification to homogeneity (GARY 1992). It synthesizes sufficient gp56 at 30° to partially complement and allow plaque formation of the T4 gene 56am mutants E51 and C153. Importantly, it complements the T2 gene 56am mutant as well (Table 2), indicating that the dCTPases of the two phages are largely compatible with each other. This inference was confirmed by swapping the promoter-proximal and promoter-distal sequences of T2 and T4 gene 56 by recombination and showing that the chimeric gene product is functional in vivo. From the progeny of pLAM71* plasmid-bearing bacteria that were infected with the T2 gene 56am mutant we isolated several T2 am+ recombinants. Sequencing of the DNA revealed that in one of the recombinants, T2(T4 56+69-), there was an exchange within the gene 56 sequence at or before position 160 of the T2 sequence, generating a chimeric dCTPase gene, without substituting T4 gene 69 for the T2 sequence (see Figure 2 legend). In the other recombinant, T2(T4 56+69+), T4 genes 56 and 69 have replaced the T2 sequences. [Note that no other T4 sequences, except for some dam sequences and three codons of soc, could have recombined because they were not present in the plasmid. The T2(T4 56+69+) recombinant contained the two gene 69 mutations of pLAM71* described in MATERIALS AND METHODS, and it had acquired an additional A in the A run of an untranslated region at positions 1647–1649 (Figure 1B). All three mutations are consistent with Streisinger's model for frameshift mutations.] Both recombinants gave the same plating efficiencies as wild-type T2. We conclude that incompatibility of dCTPases is not the reason for exclusion of T2 by T4 phages.
Table 2.
Burst sizes of T2 and T4 phages from E. coli, with or without a cloned T4 gene 56, at 30°
Exclusion of wild-type T2 by wild-type T4 depends on the host bacteria:
During the course of our experiments we found unexpectedly that wild-type T2 is not excluded by wild-type T4 in E. coli CR63, a K strain. In agreement with previous reports by other investigators, wild-type T2 is excluded by wild-type T4 in E. coli B, which is restrictive for T4 am mutants, as well as in the am-suppressing B su1 and the K strains UT481 and M5219, that is, regardless of the presence of amber suppressors and regardless of whether they are B or K strains (Table 3, and data not shown). We surmise that two separate mechanisms contribute to the apparent exclusion of T2 sequences. Hereafter we distinguish them as "restriction" and "exclusion." We suspect that the first mechanism, restriction, depends on host enzymes that probably cooperate with T4 functions. This aspect requires further investigation, which is ongoing. Nevertheless, the lack of restriction in CR63 allowed us to investigate the second mechanism, allele-specific exclusion, without the complications due to restriction.
Table 3.
Percentage of T2 progeny produced in different E. coli strains after coinfection with wild-type T2 and T4
Sequence divergence excludes T2 gene 56 alleles from the progeny of crosses with T4:
Recombination frequencies decline rapidly with decreasing numbers of identical base pairs below a certain minimal length. In T4 this minimal length is approximately 50 bp (GOLDBERG 1966; BAUTZ and BAUTZ 1967; DRAKE 1967; SINGER et al. 1982). Figure 2 shows clearly that there is no segment of 50 identical base pairs in T2 and T4 DNA bracketing a T4 gene 56 mutation. Therefore, we suspected, as proposed as one possible explanation by RUSSELL and HUSKEY (1974), that exclusion, at least in part, is because of low probabilities of recombination between T4 and T2 sequences in the genetic segment carrying the gene 56am mutation. The recombination hypothesis predicts that improving the degree of homology should enhance recombination and reduce exclusion. We tested this prediction by crossing T4 gene 56 am mutants with three different strains: (1) wild-type T2; (2) the T2(T4 56+69-) hybrid, which has more homology than wild-type T2 with sequences upstream of, but not adjacent to, the amber mutations; and (3) the T2(T4 56+69+) hybrid, which has perfect homology around the amber mutations except for the mutated sites. These crosses were done in CR63 bacteria (to eliminate the contribution of host-dependent restriction). In support of our hypothesis, the proportions of T4 am+ progeny (which must have recombined with T2) were 100 times higher in the third set of crosses, in which homology was restored, than in the two former sets (Table 4). In fact, the proportions of T4 am+ progeny in the third set of crosses were similar to those in crosses with other genes in this area of the genome (RUSSELL and HUSKEY 1974). The overall proportion of viable am+ recombinants in crosses of T4 am mutants with wild-type T2 from any gene in this area is low, a finding that has been attributed to the inviability of splice recombinants in which incompatible T2 and T4 genes were joined to give lethal combinations (RUSSELL and HUSKEY 1974). We consider this a satisfactory explanation.
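The prediction tested here can be phrased as a simple check on an alignment: how much uninterrupted identity flanks a mutated site. The sketch below is a hypothetical illustration with invented sequences; the bracketing criterion it uses (identity on both sides of the site, summing to at least 50 bp) is one possible reading of "a segment of 50 identical base pairs bracketing a mutation", not a definition taken from the cited studies.

```python
def flanking_identity(top: str, bottom: str, site: int):
    """Lengths of uninterrupted identity immediately left and right of a site.

    `site` is a 0-based alignment column; the site itself is not counted.
    """
    def same(i: int) -> bool:
        return top[i] == bottom[i] and top[i] != "-"

    left = 0
    i = site - 1
    while i >= 0 and same(i):
        left += 1
        i -= 1
    right = 0
    i = site + 1
    while i < len(top) and same(i):
        right += 1
        i += 1
    return left, right


def bracketed(left: int, right: int, min_len: int = 50) -> bool:
    """Assumed criterion: identity on both sides, totalling at least min_len."""
    return min(left, right) > 0 and left + right >= min_len


# Invented example: a marker at column 60 flanked by 60 identical columns on each side.
flank = "ACGT" * 15
restored_top = flank + "T" + flank
restored_bottom = flank + "C" + flank

# Same pair with two extra mismatches near the marker, mimicking diverged sequence.
diverged = list(restored_bottom)
diverged[50] = "N"
diverged[70] = "N"
diverged_bottom = "".join(diverged)

print(flanking_identity(restored_top, restored_bottom, 60), bracketed(60, 60))
print(flanking_identity(restored_top, diverged_bottom, 60), bracketed(9, 9))
```

In this toy case the restored pair passes the check while the diverged pair does not, mirroring the contrast between the T2(T4 56+69+) hybrid and wild-type T2 in the crosses.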
Table 4.
Proportion of T4 am+ progeny in crosses of T4 gene 56am mutants with wild-type or chimeric T2
 | DISCUSSION |
|---|
Our results can be summarized as follows:
- A heterologous DNA segment differing in length (KIM and DAVIDSON 1974) and sequence (our results) in T2 and T4 is bracketed by two genes that code for proteins (dCTPase and gp soc) that are similar in the two phages but show considerably more sequence divergence than the dam gene immediately upstream. In T4 the heterologous DNA segment encodes gene product 69, which has some resemblance to MobD, a member of a putative mobile endonuclease family of T4 (E. KUTTER, personal communication). In T2 the heterologous segment contains two short possible ORFs (T2 soc.1 and T2 soc.2, Figure 1) with no significant similarities with other proteins in the nonredundant GenBank database. We have not yet tested the expression of the two T2 ORFs.
- No obvious significant inverted or direct repeats, such as those that exist at or near ends of most transposable elements, bracket the heterologous regions. However, there is a complex palindrome, possibly forming a pseudoknot (Figure 4A), near the T4 gene 56/69 junction, and a different palindromic sequence at the junction of 69 and soc (Figure 4B). In RNA the latter structure regulates expression of soc (MACDONALD et al. 1984); the former structure is probably involved in ribosomal frameshifting to allow synthesis of a fusion protein combining dCTPase and gp69 of T4 (MOSIG and MACDONALD 1986; A. CHANG, L. DAVENPORT and G. MOSIG, unpublished results).

Figure 4.
Possible folding into secondary structures of single-stranded nucleic acid of sequences preceding the stop codon of gene 56 (A) and adjacent to the late promoter of soc (B), that is, near the junctions of the substituted heterologous T2 and T4 sequences.
- In gene 56, four segments each sharing more than 20 bp identity in T2 and T4 are interrupted by segments with many apparently mismatched or unpaired bases. Apparent additions of multiple bases in one sequence (relative to the other) are compensated by apparent deletions of the same numbers of bases nearby (Figure 2). Nevertheless, the deduced amino acid sequences of T2 and T4 dCTPases (Figure 3) are 66% identical, the genes complement each other, and the two proteins are the closest relatives in the current database.
- The extreme exclusion of T2 am+ alleles by T4 gene 56 mutants is relieved ~100-fold (i.e., 100 times more T2 am+ alleles appear among viable T4 progeny) by improving the homology of the DNA surrounding the gene 56 mutations.
Exclusion of T2 gene 56 alleles by T4 is related to sequence divergence:
Our results indicate that the severe exclusion of T2 gene 56 in crosses with T4 gene 56am mutants is in part because of the sequence divergence of these genes. This is consistent with the reduction of recombination when the distance between mismatched base pairs is less than 50 base pairs (GOLDBERG 1966; BAUTZ and BAUTZ 1967; DRAKE 1967; SINGER et al. 1982), because restoring sequence identity (except in the mutated site) restores the proportions of the T2 56+ alleles, that is, wild-type recombinants, to the levels seen in neighboring genes.
This interpretation does not exclude the possibility that the product of gene 69 may play additional roles in exclusion or restriction or both in other E. coli strains. For example, gp69 might activate a host nuclease or direct it to a T2 sequence. Gp69 does not contain one of the three consensus motifs found in other mobile endonucleases (for reviews see BELFORT and ROBERTS 1997, and references therein), including those predicted for T4 (GORBALENYA 1994; SHUB et al. 1994; E. KUTTER, personal communication). Suspected interactions of gp69 with host functions and its possible role in exclusion require further investigation, which is in progress.
In other systems, recombination between diverged sequences is reduced by mismatch repair proteins (RADMAN 1991; DATTA et al. 1997; KIRKPATRICK and PETES 1997; SUGAWARA et al. 1997; VULIC et al. 1997; ZAHRT and MALOY 1997). Recent results of DATTA et al. (1997) indicate that in yeast ~20 bp of perfect homology are needed to initiate heteroduplex formation and that mismatches within 600 bp or more trigger the antirecombination activities of the mismatch repair enzymes. Our observation that gene 56 T2-T4 recombination is restored only when homology with both genes 56 and 69 is restored supports the notion that similar length constraints exist for T4 recombination.
How did T4 and T2 gene 56 sequences diverge?
Electron microscopy of heteroduplexes made in vitro from T2 and T4 phage DNA has revealed several so-called substitution loops and many insertion loops (HOMYK and WEIL 1974; KIM and DAVIDSON 1974; YEE and MARSH 1981). These heteroduplex loops were aligned with the genetic and physical maps by KUTTER et al. (1996). Insertion loops are readily explained by deletions, insertions, or transpositions of relatively large DNA segments. Acquisition or loss of mobile introns or endonuclease genes (for review see CLYMAN et al. 1994, and BELFORT and ROBERTS 1997) and illegitimate recombination between short ectopic sequence repeats residing in the same genome (e.g., HOMYK and WEIL 1974; PRIBNOW et al. 1981; SINGER 1988; WU et al. 1991; DRAKE and RIPLEY 1994; TETART et al. 1996, for review) are prominent examples of sequence rearrangements evident as insertion loops.
The origins of heterologous regions revealed by substitution loops and the remarkable coincident sequence divergence in neighboring genes that we describe here require additional explanations. We surmise that this coincidence is related to the mechanisms that generated such sequence exchanges on an evolutionary scale. A current model of the interrelationship of T4 recombination, recombination-dependent DNA replication (MOSIG 1994), and probably DNA packaging (FRANKLIN and MOSIG 1996) can be adapted to explain this striking pattern as depicted in Figure 5. For the present discussion, the most important aspects of this model are as follows: (1) the formation of D-loops (displacement loops) by invasion of partially single-stranded regions at ends of DNA molecules into homologous double-stranded DNA; (2) the initiation of DNA replication from an intermediate of recombination by two different modes, that is, from the 3' end of a single-stranded DNA segment invading a duplex ("join-copy" recombination) to copy the invaded parent (LUDER and MOSIG 1982), and from a 3'-ended nick in the invaded DNA ("join-cut-copy" recombination) to copy the invading parent and thereby join it covalently to the invaded parent (MOSIG et al. 1991; MOSIG 1994); and (3) enlargements of heteroduplex regions by branch migration. Partial repair may then occur in the heteroduplex regions (WOMACK 1963; SHCHERBAKOV and PLUGINA 1991; SHCHERBAKOV et al. 1995).
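The consequence of partial heteroduplex repair followed by replication can be made concrete with a toy model. The sketch below is a deliberately simplified illustration, not part of the published model: both strands are written in the same orientation (complementarity is ignored), only mismatches are modeled (no bulges), and the repair tracts and sequences are invented.

```python
def repair_heteroduplex(strand_a: str, strand_b: str, tracts):
    """Apply patchy repair to a heteroduplex and return the two segregants.

    `tracts` is a list of (start, end, template) with 0-based half-open
    coordinates; `template` is "a" or "b" and names the strand whose sequence
    is copied onto the other strand within the tract. After repair, each
    strand templates one daughter duplex, so the two returned strings are the
    genotypes that segregate at the next round of replication.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    a, b = list(strand_a), list(strand_b)
    for start, end, template in tracts:
        if template not in ("a", "b"):
            raise ValueError("template must be 'a' or 'b'")
        source = a[start:end] if template == "a" else b[start:end]
        a[start:end] = source
        b[start:end] = source
    return "".join(a), "".join(b)


# Invented example: two fully mismatched strands and two short repair tracts.
daughter_a, daughter_b = repair_heteroduplex(
    "AAAAAAAAAA",
    "CCCCCCCCCC",
    [(2, 4, "a"), (6, 8, "b")],
)
print(daughter_a)  # AAAAAACCAA
print(daughter_b)  # CCAACCCCCC
```

Inside each repaired tract the two segregants agree; outside the tracts they retain the parental differences. This produces the kind of interrupted pattern of identical and diverged segments that the model invokes for gene 56.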

Figure 5.
A model to explain the sequence divergence of T2 and T4 genes 56 adjacent to the substituted regions and the conservation of the adjacent dam genes (A–D). A single-stranded DNA segment (generated by replication or by partial digestion), homologous to dam or to the beginning of gene 56, invades homologous double-stranded DNA. For reasons of simplicity, we have drawn the invaded molecule (ancestor of T2) straight and the invading molecule containing gene 69 with a bend. (The reciprocal situation would give equivalent exchange patterns.) The initial pairing region can be short, because DNA replication initiated from the 3' end of this invading single-stranded segment can stabilize the hybrid and can eventually copy all DNA in the rightward direction. Because the dam genes of T2 and T4 are nearly identical (MINER and HATTMAN 1988), we cannot determine exactly where the invasion occurred. In the leftward direction the invading single strand of the other DNA can be further assimilated into the duplex by branch migration, generating several mismatches. If there are extensive mismatches, partial heteroduplex repair may convert some gene 56 sequences to homoduplexes, leaving intermittent unrepaired sequences as heteroduplexes. Subsequent replication will fix these sequence alterations. In an essential gene, such as gene 56, only those scrambled sequences that still encode a functional gene product can survive.
To explain the unusual pattern of sequence divergence between T2 and T4 in the gene 56-69 region, we propose that in the dCTPase gene of an ancestor T-even phage, there was ectopic pairing with a foreign DNA element, which contained a gene with some sequence similarity to the T4 dCTPase gene 56 adjacent to an entirely different gene in the region now occupied by gene 69. A single-stranded terminus of this element invaded this ancestral gene 56 or dam within a short segment of perfect homology. In one direction, for example, toward dam, which has nearly identical sequences in T2 and T4 (MINER and HATTMAN 1988), the 3' end of the invading foreign element primed leading strand DNA replication, thereby stabilizing the hybrid. In the opposite direction, single-strand assimilation by branch migration generated a heteroduplex with numerous mismatches and bulges. Partial heteroduplex repair may have generated four short homoduplex regions separated by unrepaired heteroduplex segments in the hybrid region evident in Figure 2. Alternatively, the homoduplex regions were identical in the two parental genes that recombined. We assume that strand assimilation was terminated by an endonuclease cut at the Y junction, and that this cut generated a 3' end in the invaded DNA. Such a cut might have been triggered by the failure of base-pairing in the completely heterologous region or by the formation of a complex secondary structure, such as the one drawn in Figure 4A, in the displaced single-stranded DNA near the junction, or both. From this 3' end in the invaded strand DNA synthesis was initiated, copying the unassimilated segment of the invading strand, and thereby covalently joining the ancestral T4 gene 56 sequence and a copy of the completely heterologous sequence of the invading element, for example, gene 69. Similar series of recombinational events may have occurred on the other side of the heterologous region of the foreign element, that is, in the soc region (Figure 2). The recombination intermediate, still containing heteroduplex regions, was subsequently replicated, and the unrepaired different versions of the heteroduplex sequences segregated. The resulting sequence heterogeneity now contributes to the apparent present-day partial barrier between these two phage species. The initial pairing could have been initiated by the recombinogenic ends of T-even chromosomes (DOERMANN and BOEHNER 1963; MOSIG 1963; WOMACK 1963; MOSIG et al. 1971), whose ends become single-stranded as a result of replication (LUDER and MOSIG 1982; DANNENBERG and MOSIG 1983; MOSIG 1987) or by endonucleolytic cuts in or near the homologous DNA (CLYMAN et al. 1994; KREUZER and MORRICAL 1994; MOSIG 1994; KREUZER et al. 1995; GEORGE and KREUZER 1996).
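As a purely illustrative aid, the repair and segregation steps of this model can be mimicked with a toy Python sketch. It is not part of the original analysis. The function name `partial_repair_and_segregate`, the example sequences, and the patch coordinates are all hypothetical. Both strands are written in the same sense (complementarity is ignored), and each repaired patch is simply converted to one randomly chosen parent.

```python
import random


def partial_repair_and_segregate(invaded, invading, patches, rng):
    """Toy model of partial heteroduplex repair followed by replication.

    invaded, invading: equal-length parental sequences that form the two
        strands of a heteroduplex (both written in the same sense).
    patches: (start, end) ranges, end exclusive, that undergo repair.
    rng: a random.Random instance that picks the repair direction per patch.

    Returns the two daughter sequences obtained after replication, one
    templated from each strand of the partially repaired heteroduplex.
    """
    if len(invaded) != len(invading):
        raise ValueError("parental sequences must have equal length")
    strand_a = list(invaded)
    strand_b = list(invading)
    for start, end in patches:
        donor = rng.choice((invaded, invading))
        # Repair converts the patch to a homoduplex of the chosen parent.
        strand_a[start:end] = donor[start:end]
        strand_b[start:end] = donor[start:end]
    # Unrepaired mismatches end up in different daughters after replication.
    return "".join(strand_a), "".join(strand_b)


# Hypothetical parental sequences that differ at several positions.
parent_1 = "ACGTACGTACGTACGT"
parent_2 = "ACCTACGAACGTTCGT"
daughters = partial_repair_and_segregate(
    parent_1, parent_2, patches=[(0, 4), (10, 14)], rng=random.Random(0)
)
print(daughters)
```

Within the repaired patches the two daughters are identical, whereas outside them each daughter retains the sequence of the strand it was copied from, giving a mosaic of homoduplex-derived and segregated segments.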
In a mirror image of this model, a T4 3' single-stranded end would at first invade the promoter-distal end of the ancestral gene 56 and initiate DNA synthesis in the direction of the ancestral soc.1 and soc.2. The latter possibility cannot be excluded but we consider it less likely, mainly because there is less sequence identity, necessary to initiate the invasion, between T2 and T4 at the promoter-distal segment than at the promoter-proximal segment of gene 56, and also because the T2-T4 sequence differences in soc are larger than in dam, but smaller than in gene 56.
We surmise that formation of mismatched heteroduplexes and partial repair, together with the selective pressure in the T-even phages for a functional dCTPase, have led to the remarkable divergence of the T2 and T4 dCTPase genes 56.
More complicated models can be envisioned and cannot be excluded. However, the model described here provides the simplest interpretation within the framework of recombination and replication mechanisms that have been shown to operate in T-even phages and that are now implicated in other organisms as well (HABER 1995; LLOYD and LOW 1996; KOGOMA 1997). We suspect that different, apparently illegitimate, recombinations between different segments of ancient T-even genomes, evident from the present-day divergence (KIM and DAVIDSON 1974; HENNING and HASHEMOL-HOSSEINI 1994; REPOILA et al. 1994; KUTTER et al. 1996), may have occurred, perhaps in part at different times, by similar mechanisms.
The results of our T2-T4 crosses are consistent with evidence from other organisms (RADMAN 1991; RAYSSIGUIER et al. 1991) that, although most mismatched heteroduplex regions in vivo are aborted, such barriers to recombination break down occasionally, allowing sequence divergence and evolution of new species barriers (ARBER 1995).
Mutagenic consequences of slipped mispairing in recombinational heteroduplex regions: Streisinger's model revisited:
The alignments of T2 and T4 genes 56 (Figure 2) suggest that one coding sequence for functioning dCTPase was converted into another by multiple compensating frameshift mutations in addition to simple base substitutions. In other words, all additions or deletions of base pairs in T2 (as compared with T4) that are not multiples of three are compensated by nearby deletions or additions, respectively, of the same number of base pairs, restoring the reading frame. Our explanation for this pattern of mispairing of partially homologous sequences and partial heteroduplex repair of the apparently mispaired segments is based on an original proposal by STREISINGER et al. (1966) to explain acridine-induced frameshift mutations (Figure 6). The diagrams in STREISINGER's original paper (STREISINGER et al. 1966), as well as my perhaps faded memory of conversations with GEORGE STREISINGER, indicate that he assumed that both erroneous slipping of DNA polymerases during DNA replication and mispairing in recombinational intermediates could cause bulges in mispaired heteroduplexes as precursors of frameshift mutations. STREISINGER's original model pictured bulging by misalignment of short simple repeats. RIPLEY et al. (1988), RIPLEY (1990), and WANG and RIPLEY (1998; accompanying article) have demonstrated directly that misaligned bases in potential stem-loop structures with no direct repeats are also hot spots of frameshift mutations. Taken together, these results suggested that bulged mispairing of any single-stranded DNA with any complementary sequence, regardless of the reasons for single-strandedness, can generate mispaired heteroduplexes and thus be mutagenic (DRAKE and RIPLEY 1994). The keen insights of GEORGE STREISINGER and JAN DRAKE have provided the framework for our current understanding of mutagenesis in general and in phage T4 specifically.
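The bookkeeping behind compensating frameshifts can be illustrated with a short Python sketch. It is an illustration only, not the procedure used to produce Figure 2. The function names `frame_offsets` and `out_of_frame_windows` and the toy alignment are hypothetical, and the sketch assumes a gapped pairwise alignment in which `-` marks a missing base.

```python
def frame_offsets(ref_aln, alt_aln):
    """Running reading-frame offset (mod 3) of alt relative to ref,
    reported after each column of a gapped pairwise alignment."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    offsets = []
    shift = 0
    for ref_base, alt_base in zip(ref_aln, alt_aln):
        if ref_base == "-" and alt_base != "-":
            shift += 1  # base added in alt
        elif alt_base == "-" and ref_base != "-":
            shift -= 1  # base deleted in alt
        offsets.append(shift % 3)
    return offsets


def out_of_frame_windows(ref_aln, alt_aln):
    """Column ranges (start, end), end exclusive, in which alt is read
    out of frame relative to ref."""
    windows = []
    start = None
    for column, offset in enumerate(frame_offsets(ref_aln, alt_aln)):
        if offset != 0 and start is None:
            start = column
        elif offset == 0 and start is not None:
            windows.append((start, column))
            start = None
    if start is not None:
        # The frame was never restored before the end of the alignment.
        windows.append((start, len(ref_aln)))
    return windows


# Toy alignment: a one-base deletion in alt at column 4 is compensated
# by a one-base addition at column 12.
ref = "ATGAAACCCGGG-TTTTAA"
alt = "ATGA-ACCCGGGCTTTTAA"
print(out_of_frame_windows(ref, alt))
```

A pair of compensating frameshifts yields a bounded out-of-frame window, whereas an uncompensated frameshift leaves a window that runs to the end of the alignment.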

Figure 6.
Streisinger's model for frameshift mutations (from STREISINGER et al. 1966). (A) Origin of a frameshift mutation at the end of a molecule. Line 1 shows the normal end of a molecule, line 2 shows an end in which one chain has been digested by an exonuclease followed by mispairing, and line 3 shows the appearance of the molecule after resynthesis of the digested chain. (B) Origin of a frameshift mutation in a heterozygous region. The lengths of the various regions of overlap are meant to be indicated schematically only. The contributions of the two parental DNA molecules to the heterozygote are distinguished by light vs. heavy print and newly synthesized material is indicated by smaller print. The heterozygote is shown as a joint molecule with a set of mispaired bases in the first line and, after synthesis, as a hybrid molecule in which a mutation has occurred, in the second line. (C) A terminal redundancy heterozygote of T4 that is also heteroduplex for a frameshift mutation and for fictitious markers at the same end of the molecule.
Because the T-even DNA polymerase is by far the major determinant of fidelity (REHA-KRANTZ 1994; DRESSMAN et al. 1997), one might assume that most mutations arise as consequences of errors in replication. However, DNA polymerase binding to the 3' ends of invading single strands (LUDER and MOSIG 1982) may also excise misaligned bases in recombinational heteroduplexes with its proofreading exonuclease activity (BLOOM et al. 1994; REHA-KRANTZ 1994; OTTO 1997).
DRAKE's pioneering experiments on frameshift mutations in T4 (DRAKE 1966) contributed significantly to formulating STREISINGER's model. Moreover, results obtained by DRAKE and his collaborators (DRAKE 1966; LINDSTROM and DRAKE 1970) showed clearly that in T4 most acridine-induced frameshift mutations are first evident as heterozygotes (Figure 6C) and that they do not occur at random sites, but preferentially near tips of T4 chromosomes, which are randomly permuted over the T4 map (MOSIG 1963, 1968; STREISINGER 1966; MOSIG et al. 1971). To explain otherwise paradoxical results, particularly the differences in mutagenesis spectra of different acridine derivatives in different organisms, JAN DRAKE had already suggested at that time that certain enzymes might actively participate in acridine-induced mutagenesis (LINDSTROM and DRAKE 1970). This suggestion is now generally accepted and has been confirmed for T4 by two lines of evidence:
- Amsacrine-resistant T4 mutations occur in a T4 topoisomerase gene (HUFF et al. 1990), and certain hot spots of frameshift mutations correspond to topoisomerase cutting sites (RIPLEY et al. 1988; BROWN et al. 1993). Moreover, certain enzyme-specific acridines that affect the cleavable intermediates are specific frameshift mutagens (HOWARD et al. 1994).
- The existence of acridine-resistant gene 17 mutants (PIECHOWSKI and SUSMAN 1967) indicates that the large subunit of the T4 terminase (gp17) is an additional target of acridines. This finding may be related to the preferred generation of frameshift mutations as heteroduplexes near chromosomal tips (Figure 6C; LINDSTROM and DRAKE 1970), because our recent results indicate that the largest of the several products of gene 17 binds preferentially to single-stranded DNA segments at junctions with double-stranded DNA, junctions that occur in recombinational intermediates (Figure 5) (FRANKLIN and MOSIG 1996; FRANKLIN et al. 1998). If unrepaired heteroduplexes in the double-stranded segments of these intermediates contain bulged mismatches, perhaps stabilized by acridines, they would be packaged as terminal redundancy heteroduplex heterozygotes (Figure 6C), exactly as found by LINDSTROM and DRAKE (1970).
 | CONCLUSIONS |
|---|
Our results have shown that the partial species barrier between T2 and T4 genes 56, seen even in hosts in which wild-type T2 is not restricted by T4, results from DNA sequence divergence, which limits recombination between these genes. In spite of the DNA sequence divergence and corresponding multiple amino acid changes, the products of gene 56 of T2 and T4 (dCTPases) are functionally interchangeable. Our results show that evolutionary exchanges of large heterologous segments are correlated with unusual patterns of sequence divergence in adjacent genes. Most remarkably, this divergence does not extend into the promoter-proximal segment of gene 56 or into the upstream dam gene (MINER and HATTMAN 1988). These results have led us to propose a model in which evolution of such extensive DNA sequence divergence is a direct consequence of illegitimate pairing between heterologous DNA segments containing only a few complementary bases, and initiation of DNA replication from such a recombinational intermediate. This model can explain the mosaic arrangements of other genes in different phage genomes mentioned earlier as well as in bacterial genomes (FALKOW 1996; LECLERQUE et al. 1996; MILKMAN 1996; NELSON et al. 1997; TYNDALL et al. 1997). It suggests one general and potentially mutagenic way to accomplish horizontal gene transfer, independent of transposons, by a variation on a theme of homologous recombination. Incidentally, our results and interpretations have obvious consequences for constructions of phylogenetic trees based on sequence divergence, and the apparent different tempos of evolution of different genes in the same organisms.
 | ACKNOWLEDGMENTS |
|---|
We thank MICHAEL TRUPIN for constructing pLAM71*, and HELEN REVEL and BETTY KUTTER for constructive advice. This work was supported by a grant from the National Institutes of Health (GM-13221) to G.M., the Natural Science Fund and a Special Fund of the Dean of the College of Arts and Science of Vanderbilt University, and a Biological Core Facilities Grant and Shared Equipment Grant BIR-9419667 from the National Science Foundation.
 | LITERATURE CITED |
|---|
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. ZHANG, Z. ZHANG et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
ARBER, W., 1995 The generation of variation in bacterial genomes. J. Mol. Evol. 40:7-12.
BAUTZ, F. A. and E. K. F. BAUTZ, 1967 Transformation in phage T4: minimal recognition length between donor and recipient DNA. Genetics 57:887-895.
BELFORT, M. and R. J. ROBERTS, 1997 Homing endonucleases: keeping the house in order. Nucleic Acids Res. 25:3379-3388.
BLOOM, L. H., M. R. OTTO, R. ERITJA, L. J. REHA-KRANTZ, M. F. GOODMAN et al., 1994 Pre-steady state kinetic analysis of sequence dependent nucleotide excision by the 3'-exonuclease activity of bacteriophage T4 DNA polymerase. Biochemistry 33:7576-7586.
BROWN, H. J., H. W. STOKES, and R. M. HALL, 1996 The integrons In0, In2, and In5 are defective transposon derivatives. J. Bacteriol. 178:4429-4437.
BROWN, M. D., L. S. RIPLEY, and D. H. HALL, 1993 A proflavin-induced frameshift hotspot in the thymidylate synthase gene of bacteriophage T4. Mutat. Res. 286:189-197.
BRÜSSOW, H. and A. BRUTTIN, 1995 Characterization of a temperate Streptococcus thermophilus bacteriophage and its genetic relationship with lytic phages. Virology 212:632-640.
CALENDAR, R., S. YU, H. MYUNG, V. BARREIRO, R. ODEGRIP et al., 1998 The lysogenic conversion genes of coliphage P2 have unusually high AT content, pp. 241-252 in Symposium on Horizontal Gene Transfer, edited by M. SYVAANEN. Chapman & Hall, London.
CAMPBELL, A. C., 1988 Phage evolution and speciation, pp. 1-14 in The Bacteriophages, edited by R. CALENDAR. Plenum Press, New York.
CARLSON, K., E. RALEIGH and S. HATTMAN, 1994 Restriction and modification, pp. 369-381 in Molecular Biology of Bacteriophage T4, edited by J. D. KARAM, J. W. DRAKE, K. N. KREUZER, G. MOSIG, D. H. HALL, F. A. EISERLING, L. W. BLACK, E. K. SPICER, E. KUTTER, K. CARLSON and E. S. MILLER. ASM Press, Washington, DC.
CLYMAN, J., S. QUIRK and M. BELFORT, 1994 Mobile introns in the T-even phages, pp. 83-88 in Molecular Biology of Bacteriophage T4, edited by J. KARAM, J. W. DRAKE, K. N. KREUZER, G. MOSIG, D. H. HALL, F. A. EISERLING, L. W. BLACK, E. K. SPICER, E. KUTTER, K. CARLSON and E. S. MILLER. ASM Press, Washington, DC.
DANNENBERG, R. and G. MOSIG, 1983 Early intermediates in bacteriophage T4 DNA replication and recombination. J. Virol. 45:813-831.
DATTA, A., M. HENDRIX, M. LIPSITCH, and S. JINKS-ROBERTSON, 1997 Dual roles for DNA sequence identity and the mismatch repair system in the regulation of mitotic crossing-over in yeast. Proc. Natl. Acad. Sci. USA 94:9757-9762.
DE WIND, N., M. DEKKER, A. BERNS, M. RADMAN, and H. TE RIELE, 1995 Inactivation of the mouse Msh2 gene results in mismatch repair deficiency, methylation tolerance, hyperrecombination, and predisposition to cancer. Cell 82:321-330.
DOERMANN, A. H. and L. BOEHNER, 1963 An experimental analysis of bacteriophage T4 heterozygotes. I. Mottled plaques from crosses involving six rII loci. Virology 21:551-567.
DRAKE, J. W., 1966 Heteroduplex heterozygotes in bacteriophage T4 involving mutations of various dimensions. Proc. Natl. Acad. Sci. USA 55:506-512.
DRAKE, J. W., 1967 The length of the homologous pairing region for genetic recombination in phage T4. Proc. Natl. Acad. Sci. USA 58:962-966.
DRAKE, J. W., and L. S. RIPLEY, 1994 Mutagenesis, pp. 98-124 in Molecular Biology of Bacteriophage T4, edited by J. D. KARAM, J. W. DRAKE, K. N. KREUZER, G. MOSIG, D. H. HALL, F. A. EISERLING, L. W. BLACK, E. K. SPICER, E. KUTTER, K. CARLSON and E. S. MILLER. ASM Press, Washington, DC.
DRESSMAN, H. K., C.-C. WANG, J. D. KARAM, and J. W. DRAKE, 1997 Retention of replication fidelity by a DNA polymerase functioning in a distantly related environment. Proc. Natl. Acad. Sci. USA 94:8042-8046.
FALKOW, S., 1996 The evolution of pathogenicity in Escherichia, Shigella, and Salmonella, pp. 2723-2729 in Escherichia coli and Salmonella, edited by F. C. NEIDHARDT, R. CURTISS III, J. L. INGRAHAM, E. C. C. LIN, K. B. LOW, B. MAGASANIK, W. S. REZNIKOFF, M. RILEY, M. SCHAECHTER and H. E. UMBARGER. ASM Press, Washington, DC.
FOSTER, P. L., J. M. TRIMARCHI, and R. A. MAURER, 1996 Two enzymes, both of which process recombination intermediates, have opposite effects on adaptive mutation in Escherichia coli. Genetics 142:25-37.
FRANKLIN, J. G. and G. MOSIG, 1996 Expression of the bacteriophage T4 DNA terminase genes 16 and 17 yields multiple proteins. Gene 177:179-189.
FRANKLIN, J. L., D. HASELTINE, L. DAVENPORT, and G. MOSIG, 1998 The largest (70kDa) product of the bacteriophage T4 DNA terminase gene 17 binds to single-stranded DNA segments and digests them towards junctions with double-stranded DNA. J. Mol. Biol. 277 (in press).
GARY, T. P., 1992 The dCTPase/dUTPase genes of bacteriophages T2 and T4 and a test of their role in the exclusion of T2 by T4. Ph.D. Dissertation, Vanderbilt University, Nashville, TN.
GEORGE, J. W. and K. N. KREUZER, 1996 Repair of double-strand breaks in bacteriophage T4 by a mechanism that involves extensive DNA repair. Genetics 143:1507-1520.
GOLDBERG, E. B., 1966 The amount of DNA between genetic markers in phage T4. Proc. Natl. Acad. Sci. USA 56:1457-1463.