Genetics, Vol. 148, 305-316, January 1998, Copyright © 1998, Genetics Society of America

The Role of Gene Conversion in Determining Sequence Variation and Divergence in the Est-5 Gene Family in Drosophila pseudoobscura

Lynn Mertens King
Department of Biology, University of Miami, Coral Gables, Florida 33124

Corresponding author: Lynn Mertens King, Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892.

Communicating editor: A. G. CLARK


ABSTRACT

Nucleotide sequences of eight Est-5A and Est-5C genes corresponding to previously sequenced Est-5B genes in Drosophila pseudoobscura were determined to compare patterns of polymorphism and divergence among members of this small gene family. The three esterase genes were also sequenced from D. persimilis and D. miranda for interspecific comparisons. The data provide evidence that gene conversion between loci contributes to polymorphism and to the homogenization of the Est-5 genes. For Est-5B, which encodes one of the most highly polymorphic proteins in Drosophila, 12% of the segregating amino acid variants appear to have been introduced via gene conversion from other members of the gene family. Interlocus gene conversion can also explain high sequence similarity, especially at synonymous sites, between Est-5B and Est-5A. Tests of neutrality using interspecific comparisons show that levels of polymorphism conform to neutral expectations at each Est-5 locus. However, McDonald-Kreitman tests based on intraspecific gene comparisons indicate that positive selection on amino acids has accompanied Est-5 gene duplication and divergence in D. pseudoobscura.


The X-linked Esterase-5 locus in Drosophila pseudoobscura is one of the most polymorphic allozyme loci in Drosophila (LEWONTIN and HUBBY 1966; COYNE et al. 1978; KEITH 1983). As such, there has been considerable interest in addressing the adaptive significance of Est-5 variation, especially in reference to the high-frequency allozyme variants (YAMAZAKI 1971; ARNASON 1982, 1991; KEITH 1983). However, studies of allozyme variants are not likely to be appropriate for Est-5, because members of a single protein electrophoretic class may be heterogeneous in amino acid composition and thus an assemblage of "true" alleles (VEUILLE and KING 1995).

A molecular characterization of the Est-5 gene region revealed three closely linked genes called Est-5C, Est-5B, and Est-5A (arranged 5' to 3'; BRADY et al. 1990). An analysis of gene expression showed that Est-5A is transcribed in third instar larvae, that Est-5B is expressed in adults of both sexes and is the structural locus for the major adult EST5 protein (hereafter called EST5B), and that Est-5C expression was not detected (BRADY et al. 1990). BRADY and RICHMOND 1990 proposed an evolutionary history of the Est-5 gene duplications in D. pseudoobscura, with reference to Est-6 and Est-P in Drosophila melanogaster, based on comparisons of nucleotide sequences, patterns of gene expression, and properties of the enzymes. They proposed that the first gene duplication predated the divergence of D. pseudoobscura and D. melanogaster and gave rise to the Est-5A–Est-P lineage and to the Est-5B/C–Est-6 lineage. A second duplication in the D. pseudoobscura lineage gave rise to Est-5B and Est-5C. Based on this scenario, there is a lower than expected level of sequence divergence between Est-5A and Est-5B (17.5% nucleotide and 18.6% amino acid differences), compared with the orthologous Est-P and Est-6 loci (32.7% nucleotide and 35.8% amino acid differences), which is attributed to gene conversion (or reciprocal recombination) between Est-5A and Est-5B (BRADY and RICHMOND 1990).

Many studies have now shown that members of multigene families do not evolve independently, and various mechanisms of homogenization, including unequal crossing over and gene conversion, have been proposed to explain the concerted evolution of the gene family members (ARNHEIM 1983). Although gene conversion is considered to be a homogenizing mechanism, there is evidence that gene conversion can also generate variability among members of multigene families (XIONG et al. 1988; KUHNER et al. 1991; WINES et al. 1991; OHTA 1992, 1995).

Although the evolutionary history of the Est-5/6 gene family shows evidence of gene conversion (and/or reciprocal recombination) and concerted evolution, it is unclear whether interlocus gene conversion generates genetic variability in the gene family members, and especially whether this mechanism generates Est-5B sequence variation and amino acid polymorphism in D. pseudoobscura. Previous work indicates that sequence variation among members of different EST5B protein electrophoretic classes does not deviate from neutral expectations, suggesting that the considerable amino acid polymorphism is selectively neutral (VEUILLE and KING 1995). Thus, an analysis of sequence variation including Est-5A and Est-5C is likely to provide a more complete picture of the evolutionary forces and molecular mechanisms influencing polymorphism and divergence at Est-5B.

In this study, eight Est-5A and Est-5C alleles corresponding to previously sequenced Est-5B alleles were sequenced in D. pseudoobscura, and the three genes were sequenced in Drosophila persimilis and Drosophila miranda for interspecific comparisons. The goals of this study were to describe patterns of polymorphism and divergence in this gene family and to examine if gene conversion contributes to sequence variation, especially to the highly polymorphic Est-5B locus in D. pseudoobscura. The data also allow examination of amino acid divergence, which may accompany functional divergence of the duplicated genes. The interspecific comparisons allow tests of neutrality and examination of putative gene conversion tracts within a phylogenetic context.


MATERIALS AND METHODS

Sampling:
D. pseudoobscura isofemale lines were established by KEITH 1983 from collections made in 1979 in the James Reserve in the San Jacinto Mountains in southern California and near the Gundlach-Bundschu Winery in the Sonoma Valley of northern California, and they have been maintained in the laboratory since then. Nucleotide sequences of Est-5A and Est-5C were determined from lines representing eight different EST5B protein electrophoretic classes: three lines from the James Reserve (J3, J5, and J10) and five lines from the Gundlach-Bundschu (G2–G6) populations. Line G3 represents the most common EST5B electrophoretic class. The Est-5B lines were chosen originally to characterize the nature of electrophoretic classes, and they are a nonrandom population sample (VEUILLE and KING 1995).

Cloning and sequencing:
The Est-5 genes in D. pseudoobscura were isolated from λZAPII (Stratagene, La Jolla, CA) subgenomic libraries, which were constructed by cloning 8- or 3-kb EcoRI restriction fragments that include the Est-5C and Est-5B gene regions or the Est-5A gene region, respectively. A D. persimilis genomic library in λEMBL3 and a D. miranda genomic library in λEMBL4 were provided by R. NORMAN. Clones were isolated by plaque hybridization using D. pseudoobscura Est-5 clones provided by J. BRADY. The clones were purified, and the three gene regions were individually subcloned into either pUC19 or pBSKS- (Stratagene) using standard procedures (SAMBROOK et al. 1989).

Either plasmids or PCR-amplified templates were sequenced using oligonucleotide primers designed from published sequences (BRADY and RICHMOND 1992). Plasmid templates were manually sequenced using Sequenase version 2.0 (United States Biochemical, Cleveland, OH). PCR templates were amplified using a thermal cycler (MJ Research, Watertown, MA). The reaction components were 1x rTth buffer, 240 µM dNTPs, 5 U rTth polymerase (Perkin Elmer, Norwalk, CT), 50 nM primers, 625 ng genomic DNA, and 1.25 mM MgCl2; the reaction profile was 30 cycles of 94° for 30 sec, 54° for 1 min, and 72° for 2 min, followed by 72° for 5 min and a 4° hold. The PCR products were purified with spin columns (Centricon 100; Amicon, Beverly, MA), and ~300 ng of template was used in the automated sequencing dye termination reaction (model 373A; Applied Biosystems, Foster City, CA). Complete and overlapping coverage was obtained in both directions for all sequences.

Sequence analysis:
Sequences were assembled using the GAP and PRETTY programs of the University of Wisconsin Genetics Computer Group (DEVEREUX et al. 1984) or the Genetic Data Environment programs (SMITH et al. 1994). The sequences have the following accession numbers in the GenBank database: D. pseudoobscura Est-5A, AF016135–AF016142; D. pseudoobscura Est-5C, AF016143–AF016160; D. persimilis Est-5C and Est-5B, AF016110; D. persimilis Est-5A, AF016111; D. miranda Est-5C and Est-5B, AF016109; and D. miranda Est-5A, AF016108. Sequence alignments were made using ClustalW (THOMPSON et al. 1994), followed by manual adjustments based on amino acid alignments. The alignments included Est-6 and Est-P of D. melanogaster (GenBank accession numbers M33780 and M33781, respectively). Nucleotide diversity and the number of net nucleotide substitutions per site between populations (loci) were estimated by the method of NEI 1987 using the computer program DnaSP, version 2.0 (ROZAS and ROZAS 1995). Estimates of nucleotide substitution were made using the Jukes-Cantor correction, and numbers of synonymous and nonsynonymous sites were estimated by the method of NEI and GOJOBORI 1986 using the computer program MEGA (KUMAR et al. 1993). Alignment gaps were excluded in pairwise comparisons.
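The two quantities named above can be illustrated with a minimal sketch. It is not the DnaSP or MEGA implementation; it assumes nucleotide diversity is taken as the average pairwise proportion of differing sites, with gapped positions skipped in each pairwise comparison, and it applies the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3). The function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import log


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping positions with a gap ('-')."""
    compared = 0
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # alignment gaps are excluded from the comparison
        compared += 1
        if a != b:
            diffs += 1
    return diffs / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not Est-5 data).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACG-AC"]
pi = nucleotide_diversity(toy)
corrected = jukes_cantor(pi)
```

The correction matters little at the low within-species values reported here, but it grows with divergence, which is why it is relevant for the between-locus comparisons.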

Phylogenetic analysis:
The genealogical relationships of genes and alleles were estimated using maximum parsimony (PAUP; SWOFFORD 1992) and by the neighbor-joining method using the Jukes-Cantor distance estimation in MEGA (KUMAR et al. 1993). For the parsimony analysis, heuristic searches with 100 random addition replicates, TBR branch swapping, and MULPARS options were invoked. Strict consensus trees were constructed from the multiple equally parsimonious trees. The tree topologies were evaluated by 100 bootstrap replicates. Three data sets were analyzed: coding regions, 5' flanking regions, and 3' flanking regions.


RESULTS

Nucleotide sequence variation:
Figure 1 shows the location of the polymorphic nucleotide sites and the interspecific differences in the total region sequenced of each Est-5 gene. Est-5A is the only one of the three genes that shows length variation in the coding region both within and between species. In D. pseudoobscura, Est-5A is polymorphic for a CTA (Leu) deletion from 73 to 75 bp (Figure 1A), relative to Est-5A in D. persimilis and D. miranda. CTA is duplicated in this region, and the polymorphism involves the presence or absence of one of the duplications. The polymorphism is at intermediate frequency, with half of the lines having the deletion. Based on an alignment of the three Est-5 genes, the CTA duplication is present in Est-5A but not Est-5B in these species. Est-5A in D. pseudoobscura is presumably functional because no stop codons occur in the coding regions, and putative regulatory sequences are conserved in these eight lines; although no EST5A proteins have been identified, there is evidence that the gene is transcribed (BRADY et al. 1990).




Figure 1. —Polymorphic sites in D. pseudoobscura Est-5, including sequences of D. persimilis (per5) and D. miranda (mir5). Dots indicate sequence identity, and dashes indicate deletions compared with line J5. The numbering of sites above the sequence is relative to the initiation codon, which begins with +1. Domains of the gene regions are noted above the numbered nucleotide sites. (A) Est-5A, (B) Est-5B, and (C) Est-5C. I, intron regions.

In D. miranda, Est-5A encodes a protein seven amino acids longer than in D. pseudoobscura, D. persimilis, and the putatively homologous Est-P gene in D. melanogaster. Thus, assuming that the shorter Est-5A gene is ancestral, the increase in gene length results from a T to C substitution at position 1701 that changes the UAG stop codon to a CAG (Gln) sense codon (Figure 1A). A UAG stop codon is present 18 nucleotides downstream from the CAG codon [from 1722 to 1724 base pairs (bp)], and the EST5A protein in D. miranda is extended by seven amino acids (Figure 2A).
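The seven-residue extension follows from the coordinates given above, and the arithmetic can be checked with a short sketch. The function name is hypothetical; the only inputs are the start of the new CAG codon (1701) and the start of the downstream UAG stop codon (1722) reported in the text.

```python
def extension_length(readthrough_codon_start, next_stop_start):
    """Number of amino acids added when a stop codon becomes a sense codon.

    Counts the new sense codon itself plus every in-frame codon up to,
    but not including, the next stop codon.
    """
    span = next_stop_start - readthrough_codon_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("the two codons must lie in the same reading frame")
    return span // 3


# CAG occupies 1701-1703; the 18 nucleotides 1704-1721 encode six more
# residues; the next stop codon occupies 1722-1724.
extra_residues = extension_length(1701, 1722)
```

Here the 21 nucleotides from 1701 through 1721 correspond to the Gln codon plus six further codons, which matches the seven additional amino acids described.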



Figure 2. —EST5 amino acid polymorphism in D. pseudoobscura, including D. persimilis (per) and D. miranda (mir). Dots indicate sequence identity, and dashes indicate deletions compared to line J5. Numbering is relative to the initial Met. (A) EST5A, (B) EST5B, and (C) EST5C.

Table 1 and Table 2 summarize Est-5 variation in D. pseudoobscura by gene region and class of site. The complete intergenic region between Est-5C and Est-5B was sequenced and was divided into two regions of equal length to compare 5' and 3' flanking regions. Comparisons of the level of polymorphism across genes at functionally different classes of sites show several significant differences in the patterns of variation.


 
Table 1. Nucleotide diversity among eight D. pseudoobscura haplotypes


 
Table 2. Nucleotide diversity at synonymous and nonsynonymous sites in D. pseudoobscura

For all three Est-5 genes, noncoding sites are significantly less polymorphic than synonymous sites (Est-5A, G = 17.70, 2 d.f., P < 0.001; Est-5B, G = 31.09, 2 d.f., P < 0.001; Est-5C, G = 21.49, 2 d.f., P < 0.001). This may be a general pattern in Drosophila: MORIYAMA and POWELL 1996 report an average nucleotide diversity (π) of 0.028 at synonymous sites and 0.017 at noncoding sites for five nuclear genes in D. pseudoobscura. The following estimates of Est-5 variation are multiplied by 4/3 to compare the X-linked Est-5 genes to autosomal genes. Averaging over the three Est-5 genes, π = 0.039 for synonymous sites and π = 0.013 for noncoding sites. It appears that the nonrandom sample of Est-5B sequences does not inflate these estimates. Based on a random sample of 16 sequences, π = 0.016 for a 504-bp intergenic region between Est-5B and Est-5C in D. pseudoobscura (BABCOCK and ANDERSON 1996). This compares with π = 0.015 for the same (808 bp) intergenic region in this study. Thus, the Est-5 genes show more variation at synonymous sites than other genes in D. pseudoobscura.

The 3' flanking regions are significantly more polymorphic than the 5' flanking regions summing across all three genes (G = 8.80, 1 d.f., P = 0.003), but if the regions are compared by gene, Est-5A and Est-5C show significant differences, but not Est-5B (G = 0.046, 1 d.f., P = 0.830). Considering the two intergenic regions, the 808-bp region between Est-5C and Est-5B is significantly more polymorphic than the 774-bp region (of the ~1100-bp region in total) sequenced between Est-5B and Est-5A (G = 5.12, 1 d.f., P = 0.024).

Each gene shows significantly different levels of polymorphism at synonymous and nonsynonymous sites (Est-5A, G = 27.10, 1 d.f., P < 10⁻⁶; Est-5B, G = 49.17, 1 d.f., P < 10⁻⁶; Est-5C, G = 51.95, 1 d.f., P < 10⁻⁶); however, the three genes have similar levels of polymorphism at synonymous sites (G = 3.47, 2 d.f., P = 0.176) and nonsynonymous sites (G = 3.05, 2 d.f., P = 0.218). Although the genes show similar levels of polymorphism, estimates of nucleotide diversity, the average pairwise number of differences per nucleotide site, are lowest at Est-5A for synonymous sites and lowest at Est-5C for nonsynonymous sites, and both classes of sites show the highest nucleotide diversity at Est-5B (Table 2). The original nonrandom sample of Est-5B sequences will cause an upward bias in estimates of variation at nonsynonymous sites at this locus, but this is not expected to influence variation at synonymous sites.

The distribution of nucleotide polymorphism was tested for heterogeneity by the variance test of GOSS and LEWONTIN 1996. This method measures the distance between polymorphic sites and compares the observed variance with the expected values. The test was applied to the following: (1) the coding plus intron regions of each gene, (2) the intergenic region between Est-5C and Est-5B, and (3) the intergenic region between Est-5B and Est-5A. All tests show a highly significant nonrandom spatial distribution of polymorphism. The observed variances of interval length for these regions are Est-5A = 0.00567, P < 0.001; Est-5B = 0.00106, P < 0.001; Est-5C = 0.00545, P < 0.001; intergenic Est-5C and Est-5B = 0.01277, P < 0.001; and intergenic Est-5B and Est-5A = 0.05363, P < 0.001.
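The idea behind a variance-of-interval-length test can be sketched as follows. This is only an illustration in the spirit of the test described above, not the published procedure: it assumes that intervals between consecutive polymorphic sites are scaled by the region length, and that significance is judged by a Monte Carlo comparison with randomly placed sites. The function names, the site positions, and the number of replicates are all hypothetical.

```python
import random
from statistics import pvariance


def interval_variance(sites, length):
    """Variance of gaps between consecutive polymorphic sites, scaled by length."""
    ordered = sorted(sites)
    gaps = [(b - a) / length for a, b in zip(ordered, ordered[1:])]
    return pvariance(gaps)


def variance_test(sites, length, replicates=2000, seed=1):
    """Return (observed variance, Monte Carlo P value).

    The P value is the fraction of random placements of the same number
    of sites whose interval variance is at least as large as observed.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = interval_variance(sites, length)
    hits = 0
    for _ in range(replicates):
        simulated = rng.sample(range(1, length + 1), len(sites))
        if interval_variance(simulated, length) >= observed:
            hits += 1
    return observed, hits / replicates


# Hypothetical positions: two tight clusters plus one distant site.
clustered = [10, 11, 12, 13, 14, 500, 501, 502, 503, 990]
observed_var, p_value = variance_test(clustered, 1000)
```

Clustered polymorphisms produce a mix of very short and very long intervals, so their variance exceeds that of most random placements, which is the signature of spatial heterogeneity.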

Amino acid variation:
Figure 2 shows the amino acid polymorphisms in the EST5 proteins, which were determined from the nucleotide sequences. EST5A has 3.1% amino acid polymorphism, and the proteins differ by an average of 5.8 amino acids (Figure 2A). EST5B has 4% amino acid polymorphism, and the proteins differ by an average of 8.9 amino acids (Figure 2B). All 16 EST5B amino acid sequences show 6.1% polymorphism, and they differ by an average of 7.7 amino acids (VEUILLE and KING 1995). EST5C has 2% amino acid polymorphism, and the average number of amino acid differences among the sequences is 3.8 (Figure 2C).

Tests of gene conversion:
The method of BETRÁN et al. 1997 was used to detect gene conversion events between the Est-5 loci. Their method uses the relative frequency of a nucleotide at a site to determine if the site is informative of a conversion event between two groups of sequences. A segregating nucleotide is informative if its relative frequency in a group of "converted" sequences is 20% or less and its relative frequency in the group of "converting" sequences is three or more times higher than in the group of "converted" sequences. The two outermost informative sites determine the length of the observed conversion tract. Conversion tracts of 1 bp in length are not considered because they cannot be distinguished from parallel mutation events.
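The informative-site rule described above can be sketched in a few lines. This is a simplified illustration of the stated criterion only (frequency of 20% or less in the converted group, and at least three times that frequency in the converting group), not the full published method, which also models true tract length. The function names, the way runs of informative sites are grouped into tracts, and the toy sequences are assumptions made for the example.

```python
def informative_sites(converted, converting, max_freq=0.20, min_ratio=3.0):
    """Return {site index: nucleotide} for sites informative of conversion."""
    result = {}
    for i in range(len(converted[0])):
        col_in = [s[i] for s in converted]
        col_out = [s[i] for s in converting]
        if len(set(col_in)) < 2:
            continue  # the site must be segregating in the converted group
        for nuc in set(col_in):
            f_in = col_in.count(nuc) / len(col_in)
            f_out = col_out.count(nuc) / len(col_out)
            if f_in <= max_freq and f_out >= min_ratio * f_in:
                result[i] = nuc
    return result


def observed_tracts(seq, informative):
    """Runs of informative sites carrying the converting nucleotide.

    Each tract is reported by its two outermost informative sites;
    runs with a single site (1-bp tracts) are discarded.
    """
    tracts, run = [], []
    for i in sorted(informative):
        if seq[i] == informative[i]:
            run.append(i)
            continue
        if len(run) > 1:
            tracts.append((run[0], run[-1]))
        run = []
    if len(run) > 1:
        tracts.append((run[0], run[-1]))
    return tracts


# Hypothetical toy groups: one of five "converted" sequences carries GG.
converted = ["AAAAAAAA", "AAAAAAAA", "AAAAAAAA", "AAAAAAAA", "AAGGAAAA"]
converting = ["CCGGCCCC", "CCGGCCCC", "CCGGCCCC"]
sites = informative_sites(converted, converting)
tracts = observed_tracts(converted[4], sites)
```

In the toy data, sites 2 and 3 (zero-based) are informative, and the fifth sequence carries an observed tract spanning both.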

This method detected six interlocus gene conversion events (Table 3). In addition, visual inspection of the data showed that nucleotide sites 132–143 and 942–960 in Est-5C and Est-5B, respectively, have segregating nucleotides in higher frequency (25%) than considered by the method of BETRÁN et al. (which is based on a minimum of an informative nucleotide pair) that are shared with another locus. These nucleotides may also be interpreted as resulting from gene conversion rather than from parallel mutation events, and they are included in Table 3.


 
Table 3. Interlocus Est-5 gene conversion in D. pseudoobscura

Considering information from interspecific comparisons, there are two ways to explain the Est-5A AG/CA haplotype variation at nucleotide sites 414–415. In D. persimilis and D. miranda, these sites are CA at all three loci, suggesting that these nucleotides are the ancestral state. Therefore, the low-frequency CA polymorphism in Est-5A can be explained either by unique mutations (AG) that have increased in population frequency plus the maintenance of ancestral variation (CA), or by the conversion of AG to CA by either Est-5B or Est-5C.

The observed gene conversion tracts between nucleotides 639 and 1044, where Est-5A putatively converted Est-5B, are coincident with a region of few fixed nucleotide differences between these two genes (Figure 3A). For example, in the 500–1200-bp region, there are only 13 fixed differences (5 at synonymous sites, 8 at nonsynonymous sites) between Est-5A and Est-5B. This contrasts with 110 fixed differences in the first 500 nucleotides (57 at synonymous sites, 53 at nonsynonymous sites) and 140 fixed differences (77 at synonymous sites, 63 at nonsynonymous sites) in the last 447 bp of the coding region. This pattern of fixed nucleotide differences does not occur between Est-5A and Est-5C (Figure 3B) or between Est-5B and Est-5C (Figure 3C). The region of few fixed differences between Est-5A and Est-5B does not correspond to a region of low polymorphism in either gene, so it does not seem likely that constraint on sequence divergence is maintaining the similarity.





Figure 3. —The number of fixed differences between Est-5 genes at synonymous and nonsynonymous sites in 50-bp intervals along the coding sequence. (A) Est-5A vs. Est-5B (B) Est-5A vs. Est-5C, and (C) Est-5B vs. Est-5C.
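Counts of fixed differences in consecutive intervals, as plotted in Figure 3, can be produced with a sketch like the one below. It assumes a fixed difference is a site at which the two groups of sequences share no nucleotide, and it does not separate synonymous from nonsynonymous sites. The function name and toy sequences are hypothetical.

```python
def fixed_differences(group_a, group_b, interval=50):
    """Count fixed differences between two aligned groups per interval."""
    length = len(group_a[0])
    counts = [0] * ((length + interval - 1) // interval)
    for i in range(length):
        states_a = {s[i] for s in group_a}
        states_b = {s[i] for s in group_b}
        # Fixed difference: no nucleotide is shared between the groups.
        if states_a.isdisjoint(states_b):
            counts[i // interval] += 1
    return counts


# Hypothetical toy groups with a 3-bp interval for readability.
locus_a = ["AAAAAA", "AAAAAT"]
locus_b = ["CAAAAT", "CAAAAA"]
per_interval = fixed_differences(locus_a, locus_b, interval=3)
```

Site 0 is a fixed difference, whereas the last site is a polymorphism shared by both groups and is therefore not counted.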

The patterns of fixed differences between the Est-5 genes in D. pseudoobscura are similar to patterns of divergence between the Est-5 genes in D. miranda and D. persimilis (not shown). However, Est-6 and Est-P in D. melanogaster do not show the same pattern of divergence as their putative homologs (Est-5B and Est-5A, respectively; BRADY and RICHMOND 1992). Three regions where there are no fixed differences between the Est-5 genes in all pairwise comparisons, centered on intervals 575, 1125, and 1175 bp, suggest that the lack of divergence is related to functional constraint. Two of these regions are near but not entirely coincident with amino acid residues Ser-210 and Glu-340 (the positions are based on alignment of the three EST5 proteins), corresponding to codons at nucleotide sites 628–630 and 1028–1030, respectively, which are thought to be involved in the catalytic function of the enzyme (KAROTAM et al. 1993). The region encompassing the third residue of a proposed catalytic triad, His-470, corresponding to nucleotide sites 1408–1410, is not conserved.

Thus, the length of the region of similarity between Est-5A and Est-5B may be explained partly by functional constraint but perhaps mostly by a single gene conversion (and/or reciprocal recombination) event that predates the divergence of D. pseudoobscura, D. persimilis, and D. miranda. The accumulation of unique polymorphisms and fixed differences in this region is also evidence that the event was not recent. The putative converted Est-5B regions in D. pseudoobscura may then be remnants of one old conversion event that have been reshuffled by interallelic recombination.

The lengths of the observed converted gene regions in Table 3 range from 2 to 28 bp, or up to 405 bp if the region between nucleotide sites 639 and 1044 is considered to result from a single event between Est-5A and Est-5B. The lengths of the true gene conversion tracts are difficult to estimate. From the model of BETRÁN et al. 1997, the estimates of true tract length are 10 bp for Est-5A and Est-5B and 16 bp for Est-5B and Est-5C (A. BARBADILLA, personal communication). These estimates, however, are based on the assumption that conversion events have not been broken up by subsequent recombination events. This assumption does not appear to be valid for Est-5, which shows considerable intragenic recombination (Figure 1), so these estimates are not likely to be meaningful. In D. melanogaster, the mean length of gene conversion tracts within the rosy locus, based on intragenic recombination using strains with known molecular markers, is estimated to be 352 bp (HILLIKER et al. 1994). Thus, it seems plausible that a single gene conversion event may account for the 405-bp region of Est-5B haplotype variation in D. pseudoobscura. Longer tract lengths may not be observed in Est-5 because intragenic recombination would quickly reshuffle the polymorphisms, except for those that are very close together.

Tests of neutral molecular evolution:
Tests of neutral molecular evolution applied to the Est-5 data fail to reject the neutral model. TAJIMA'S D-statistic (Est-5A, D = -0.60; Est-5B, D = -0.73; Est-5C, D = -0.42) is not significant for any locus, although the values of D are negative and suggest purifying selection (TAJIMA 1993). The HKA-test (HUDSON et al. 1987), using the intergenic region between Est-5C and Est-5B as the reference locus and D. miranda for the interspecific comparison, yields nonsignificant χ² values: Est-5A, χ² = 0.269; Est-5B, χ² = 1.094; Est-5C, χ² = 0.375, P > 0.10 (1 d.f.). Application of the test to all sites in the Est-5 coding regions also failed to reject the neutral model, as did pairwise comparisons of synonymous sites between Est-5 genes.
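For readers unfamiliar with the D statistic, the sketch below computes it from the sample size, the number of segregating sites, and the mean number of pairwise differences, using the standard formulation of Tajima's D. The function name and the example inputs are hypothetical and are not the Est-5 values, whose underlying counts appear in the tables rather than in the text.

```python
from math import sqrt


def tajimas_d(n, segregating_sites, mean_pairwise_diffs):
    """Tajima's D from sample size n, S segregating sites, and mean pairwise differences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = segregating_sites
    # Numerator: difference between the two estimators of theta.
    # Denominator: estimated standard deviation of that difference.
    return (mean_pairwise_diffs - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical inputs for a sample of eight sequences.
d_example = tajimas_d(n=8, segregating_sites=40, mean_pairwise_diffs=13.5)
```

A negative D arises when the mean pairwise difference is smaller than expected from the number of segregating sites, that is, when there is an excess of low-frequency variants.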

Finally, the ratios of nonsynonymous to synonymous polymorphisms in D. pseudoobscura (0.65, Est-5A; 0.51, Est-5B; 0.36, Est-5C) are not significantly different from the ratios of nonsynonymous to synonymous fixed differences between D. pseudoobscura and D. miranda (0.53, Est-5A; 0.40, Est-5B; 0.29, Est-5C; P > 0.07 for each gene comparison), based on the test of MCDONALD and KREITMAN 1991. In contrast, the results of this test applied to between-gene comparisons are significant for Est-5A vs. Est-5B and Est-5A vs. Est-5C (Table 4). There is an excess of fixed differences at nonsynonymous sites in these comparisons, which indicates selective amino acid divergence between Est-5A and Est-5B and between Est-5A and Est-5C.
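The comparison at the heart of this test is a 2 x 2 contingency table (fixed vs. polymorphic, nonsynonymous vs. synonymous). A minimal sketch using a G-test of independence is shown below; whether a G-test, with or without a continuity correction, was the exact statistic used here is an assumption, and the counts are hypothetical rather than taken from Table 4. The P value uses the identity that the upper tail of a chi-square distribution with 1 d.f. equals erfc(sqrt(x/2)).

```python
from math import erfc, log, sqrt


def g_test_2x2(a, b, c, d):
    """G-test of independence for the table [[a, b], [c, d]] (1 d.f.)."""
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G
                expected = rows[i] * cols[j] / total
                g += 2.0 * o * log(o / expected)
    p = erfc(sqrt(max(g, 0.0) / 2.0))
    return g, p


# Hypothetical counts: fixed (nonsyn, syn) then polymorphic (nonsyn, syn).
g_value, p_value = g_test_2x2(30, 20, 10, 25)
```

An excess of nonsynonymous fixed differences relative to nonsynonymous polymorphisms, as in this made-up table, yields a large G and is the pattern interpreted as adaptive amino acid divergence.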


 
Table 4. Polymorphism and fixed differences at nonsynonymous and synonymous sites between Est-5 genes in D. pseudoobscura

Genealogical inference:
The phylogenetic analyses were based on an alignment of 1695 nucleotide sites of the coding regions of the Est-5 genes in D. pseudoobscura, D. miranda, and D. persimilis, and of the Est-6 and Est-P genes in D. melanogaster. The maximum parsimony analysis showed that the genes of the obscura group species (D. pseudoobscura, D. miranda, and D. persimilis) clustered within each locus (Figure 4). In this species group, separate analyses of the 5' flanking region (268 nucleotide sites) and the 3' flanking region (376 nucleotide sites) showed similar relationships between genes and loci, although the nearest neighbors within each Est-5 gene cluster differed, most likely as a result of recombination (not shown).



Figure 4. —A 50% majority rule bootstrap consensus tree. Numerals adjacent to each branch refer to bootstrap support from 100 replicates. Forty-two equally parsimonious trees were found. The tree length of the strict consensus tree is 1799, with a consistency index of 0.638.

The clustering of species within genes indicates that the Est-5 gene duplications predate the divergence of the three sibling species and that mechanisms of concerted evolution have not homogenized the genes since the species diverged from a common ancestor. This is in contrast to the relationship between the Est-5 genes in D. pseudoobscura and Est-6 and Est-P in D. melanogaster: the Est-6 and Est-P genes cluster with one another and not with the putatively orthologous Est-5B and Est-5A genes, respectively, so the esterase gene family members have been homogenized within the melanogaster and obscura species groups since the time of their divergence (e.g., ~25 mya; RUSSO et al. 1995). The relationships among clusters of esterase genes were the same for the neighbor-joining tree; only the relationships of the D. miranda and the D. persimilis genes with respect to the D. pseudoobscura alleles differed (not shown).


DISCUSSION

The data on Est-5A and Est-5C polymorphism and haplotype variation, in addition to previous data on Est-5B nucleotide sequence polymorphism, contribute to understanding the factors that influence Est-5B polymorphism and to understanding the evolution of this small multigene family. The polymorphism data provide statistical support for the hypothesis that interlocus gene conversion contributes to amino acid polymorphism, and may partly explain why Est-5B is a highly polymorphic allozyme locus. Gene conversion was detected in the coding regions between Est-5A and Est-5B, and between Est-5B and Est-5C, but not between the two outer loci, Est-5A and Est-5C. The flanking regions were not examined for evidence of gene conversion (between loci) because they are not alignable much beyond 250–350 bp. The interlocus conversion events can explain at least 4 of the 33 (12.1%) polymorphic amino acid positions in EST5B (16 sequences), 1 of the 17 (5.9%) polymorphic amino acid positions in EST5A, and 1 of the 12 (8.3%) polymorphic amino acid positions in EST5C. Interlocus gene conversion can also explain the following proportions of polymorphic synonymous sites: 1/26 (3.8%) in Est-5A, 12/67 (17.9%) in Est-5B (16 sequences), and 5/34 (14.7%) in Est-5C.

The levels of polymorphism in the coding regions are similar for all three genes and fit neutral theory expectations. However, the polymorphic sites have a significantly heterogeneous distribution in the coding and intron regions of each gene. Figure 5 shows nucleotide diversity in sliding window intervals across the coding region of each gene. The magnitude of variation does not always correspond to the same location in the three genes, and comparisons of the location of conversion tracts (Table 3) and peaks of nucleotide diversity show that they are related. In Est-5B, at least three peaks of nucleotide diversity, at intervals centered at 250, 700, and 950 bp (Figure 5B), correspond to gene conversion tracts at sites 255–257, 699–705, and 942–947. In Est-5C, the intervals with the highest nucleotide diversity at 100–150 bp (Figure 5C) correspond to the conversion tract at nucleotide sites 132–134. In Est-5A, nucleotide diversity at the 400-bp interval corresponds to the tract at sites 414–415. The heterogeneity is also likely to be influenced by regions of functional constraint, for example, at residues putatively involved in the catalytic mechanism of esterases (noted above) and at six cysteine residues involved in disulfide bridges (BRADY et al. 1990) that are conserved in the three genes and three obscura group species studied here, as well as in Est-6 of D. melanogaster, D. simulans, and D. mauritiana (KAROTAM et al. 1993).





Figure 5. —Nucleotide diversity in the coding regions of the Est-5 genes in sliding windows of 100 bp, step size 25 bp, with centers on 50-bp intervals. (A) Est-5A, (B) Est-5B, and (C) Est-5C.
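A sliding-window profile of the kind shown in Figure 5 can be sketched as below, using the 100-bp window and 25-bp step from the caption. The sketch assumes an ungapped alignment and uses the average pairwise proportion of differences within each window as the diversity measure; the function name and toy sequences are hypothetical, and this is not the program used to draw the figure.

```python
from itertools import combinations


def window_diversity(seqs, window=100, step=25):
    """Return (window center, nucleotide diversity) pairs along an alignment."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    profile = []
    for start in range(0, length - window + 1, step):
        end = start + window
        # Total differences summed over all sequence pairs in this window.
        diffs = sum(
            sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)
            for a, b in pairs
        )
        profile.append((start + window // 2, diffs / (len(pairs) * window)))
    return profile


# Hypothetical 200-bp pair differing at zero-based positions 10 and 160.
seq1 = "A" * 200
seq2 = "A" * 10 + "C" + "A" * 149 + "C" + "A" * 39
profile = window_diversity([seq1, seq2])
```

Because adjacent windows overlap by 75 bp, a single cluster of polymorphisms raises several consecutive points, which is why peaks in such plots are broader than the conversion tracts they are compared with.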

The maintenance of functional regulatory sequences may explain the low level and pattern of variation in the intergenic regions (Figure 6). One regulatory motif, ACTGGT, identified in D. pseudoobscura (HEALY et al. 1996), corresponds to sites 693–698 bp in the intergenic Est-5C/Est-5B region, where there is no sequence variation (Figure 6A). This motif is also conserved in D. persimilis and D. miranda. The motif is present in Est-5A (sites 667–673 bp in Figure 6B) in D. pseudoobscura and D. persimilis, but not in D. miranda, where a 9-bp deletion is located. Est-5C shows an imperfect motif, ATTGGT, at sites -89 to -90 bp from the translation start site in all three sibling species. In the 3' flanking regions, polyadenylation signal sequences (BRADY and RICHMOND 1992) are conserved in the three genes and species. These are located beginning at 263 bp in Est-5C (Figure 6A) and at either 110 or 119 bp in Est-5B (Figure 6B; the 3' region of Est-5A is not shown).




Figure 6. —Nucleotide diversity in the noncoding regions between the Est-5 genes in sliding windows of 30 bp, step size 15 bp, with centers on 15-bp intervals. (A) Intergenic region between Est-5C and Est-5B. (B) Intergenic region between Est-5B and Est-5A.

Evidence of positive selection on Est-5 genes:
OHTA (1994) suggests that an acceleration of amino acid changes between duplicated genes in conjunction with functional differentiation is evidence of positive selection. Although the total number of fixed differences at nonsynonymous sites is not greater than the total number of fixed differences at synonymous sites in pairwise comparisons of the Est-5 genes, the ratios of nonsynonymous to synonymous variation for both fixed differences and polymorphism show evidence of adaptive amino acid divergence between Est-5A and Est-5B/Est-5C. The gene duplication resulting in Est-5B and Est-5C is putatively the most recent event in the evolution of the Est-5 gene family (BRADY and RICHMOND 1992). The number of net nucleotide substitutions per site is lower between Est-5B and Est-5C (0.16450 ± 0.0210) than between Est-5A and Est-5B (0.1989 ± 0.0251) or between Est-5A and Est-5C (0.1989 ± 0.0251), which supports this hypothesis. Therefore, most of the amino acid divergence between Est-5A and Est-5B/C may have predated the Est-5B/C duplication. The evidence of differential gene expression of Est-5A and Est-5B (BRADY et al. 1990) is consistent with the interpretation of positive selection on functional divergence, although it is unknown if the difference in amino acid composition is associated with a difference in enzyme function.
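The comparison of nonsynonymous to synonymous ratios for fixed differences and polymorphism is the logic of a McDonald-Kreitman 2 x 2 contingency test. The sketch below evaluates such a table with a two-sided Fisher exact test, which is a choice made here for illustration and is not necessarily the statistic used in the analysis. The function name is hypothetical, and the counts are placeholder values, not the Est-5 data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: fixed differences, polymorphisms.
    Columns: nonsynonymous, synonymous.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x nonsynonymous fixed differences.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Placeholder counts (not from this study): fixed 7 nonsynonymous and
    # 17 synonymous; polymorphic 2 nonsynonymous and 42 synonymous.
    p = fisher_exact_two_sided(7, 17, 2, 42)
    print(f"two-sided p = {p:.4f}")
```

An excess of nonsynonymous fixed differences relative to nonsynonymous polymorphism, together with a small p-value, is the pattern interpreted as adaptive amino acid divergence.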

Variation at Est-5B is higher than variation at Est-5A and Est-5C, and is relatively high compared to other D. pseudoobscura genes (MORIYAMA and POWELL 1996), although there is so far no evidence of selective mechanisms operating on amino acid or total nucleotide sequence variation. Perhaps what makes this gene most unusual is that it lies between Est-5A and Est-5C, so that in the short term, gene conversion involving two different loci contributes to Est-5B sequence variation and amino acid polymorphism. Mutation and intragenic recombination also contribute to haplotype diversity, so these three factors may explain the considerable Est-5B allozyme variation. Over longer periods of time, gene conversion and/or reciprocal recombination has homogenized the Est-5 genes, based on a phylogenetic analysis of esterase genes (Figure 4), although there appears to be selection for amino acid divergence between the EST5 proteins.


*  ACKNOWLEDGMENTS

I thank B. RICHTER for assistance in sequencing and A. BARBADILLA for assistance in the analysis of gene conversion tract lengths. A. BARBADILLA, M. P. CUMMINGS, R. C. LEWONTIN, and anonymous reviewers provided useful comments on the manuscript. This work was supported by National Institutes of Health grant GM-21179 to R. C. LEWONTIN, and National Science Foundation grant DEB 95-24595 and University of Miami General Research Awards to L. M. KING.

Manuscript received January 10, 1997; Accepted for publication September 26, 1997.


*  LITERATURE CITED

ARNHEIM, N., 1983 Concerted evolution of multigene families, pp. 38–61 in Evolution of Genes and Proteins, edited by M. NEI and R. K. KOEHN. Sinauer, Sunderland, MA.

ÁRNASON, E., 1991 Perturbation-reperturbation test of selection vs. hitchhiking of the two major alleles of esterase-5 in Drosophila pseudoobscura. Genetics 129:145–168.

ÁRNASON, E., 1982 An experimental study of neutrality at the Malic Dehydrogenase and Esterase-5 loci in Drosophila pseudoobscura. Hereditas 96:13–27.

BABCOCK, C. S. and W. W. ANDERSON, 1996 Molecular evolution of the sex-ratio inversion complex in Drosophila pseudoobscura: analysis of the esterase-5 gene region. Mol. Biol. Evol. 13:287–308.

BETRÁN, E., J. ROZAS, A. NAVARRO, and A. BARBADILLA, 1997 The estimation of the number and the length distribution of gene conversion tracts from population DNA sequence data. Genetics 146:89–99.

BRADY, J. P. and R. C. RICHMOND, 1990 Molecular analysis of evolutionary changes in the expression of Drosophila esterases. Proc. Natl. Acad. Sci. USA 87:8217–8221.

BRADY, J. P. and R. C. RICHMOND, 1992 An evolutionary model for the duplication and divergence of esterase genes in Drosophila. J. Mol. Evol. 34:506–521.

BRADY, J. P., R. C. RICHMOND, and J. G. OAKESHOTT, 1990 Cloning of the esterase-5 locus from Drosophila pseudoobscura and comparison with its homologue in D. melanogaster. Mol. Biol. Evol. 7:525–546.

COYNE, J. A., A. A. FELTON, and R. C. LEWONTIN, 1978 Extent of genetic variation at a highly polymorphic esterase locus in Drosophila pseudoobscura. Proc. Natl. Acad. Sci. USA 75:5090–5093.

DEVEREUX, J., P. HAEBERLI, and O. SMITHIES, 1984 A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 14:623–633.

GOSS, P. J. E. and R. C. LEWONTIN, 1996 Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143:589–602.

HEALY, M. J., M. M. DUMANCIC, A. CAO, and J. G. OAKESHOTT, 1996 Localization of sequences regulating ancestral and acquired sites of esterase 6 activity in Drosophila melanogaster. Mol. Biol. Evol. 13:784–797.

HILLIKER, A. J., G. HARAUZ, A. G. REAUME, M. GRAY, and S. H. CLARK et al., 1994 Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137:1019–1026.

HUDSON, R. R., M. KREITMAN, and M. AGUADÉ, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

KAROTAM, J., A. C. DELVES, and J. G. OAKESHOTT, 1993 Conservation and change in structural and 5' flanking sequences of esterase 6 in sibling Drosophila species. Genetica 88:11–28.

KEITH, T. P., 1983 Frequency distribution of esterase-5 alleles in two populations of Drosophila pseudoobscura. Genetics 105:135–155.

KUHNER, M. K., D. A. LAWLOR, P. D. ENNIS, and P. PARHAM, 1991 Gene conversion in the evolution of the human and chimpanzee MHC class I loci. Tissue Antigens 38:152–164.

KUMAR, S., K. TAMURA and M. NEI, 1993 MEGA Molecular Evolutionary Genetics Analysis. Version 1.01. The Pennsylvania State University, University Park, PA.

LEWONTIN, R. C. and J. L. HUBBY, 1966 A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54:595–609.

MCDONALD, J. and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

MORIYAMA, E. N. and J. R. POWELL, 1996 Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261–277.

NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.

NEI, M. and T. GOJOBORI, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

OHTA, T., 1994 Further evidence of evolution by gene duplication revealed through DNA sequence comparisons. Genetics 138:1331–1337.

ROZAS, J. and R. ROZAS, 1995 DnaSP: DNA sequence polymorphism, an interactive program for estimating population genetics parameters from DNA sequence data. Comput. Appl. Biosci. 11:621–625.

RUSSO, C. A. M., N. TAKEZAKI, and M. NEI, 1995 Molecular phylogeny and divergence times of Drosophilid species. Mol. Biol. Evol. 12:391–404.

SAMBROOK, J., E. F. FRITSCH and T. MANIATIS, 1989 Molecular Cloning: A Laboratory Manual, Ed. 2, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

SMITH, S. W., R. OVERBEEK, C. R. WOESE, W. GILBERT, and P. M. GILLEVET, 1994 The genetic data environment an expandable GUI for multiple sequence analysis. Comput. Appl. Biosci. 10:671–675.

SWOFFORD, D. L., 1992 PAUP: Phylogenetic analysis using parsimony, portable version (Unix) 3.0r+4 (pre-release 0.4). Illinois Natural History Survey, Champaign.

TAJIMA, F., 1993 Measurement of DNA polymorphism, pp. 37–59 in Mechanisms of Molecular Evolution, edited by N. TAKAHATA and A. G. CLARK. Sinauer, Sunderland, MA.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994 CLUSTALW: improving the sensitivity of progressive sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

VEUILLE, M. and L. M. KING, 1995 Molecular basis of polymorphism at the esterase-5B locus in Drosophila pseudoobscura. Genetics 141:255–262.

WINES, D. R., J. M. BRADY, E. M. SOUTHARD, and R. J. MACDONALD, 1991 Evolution of the rat kallikrein gene family: gene conversion leads to functional diversity. J. Mol. Evol. 32:476–492.

XIONG, Y., B. SAKAGUCHI, and T. H. EICKBUSH, 1988 Gene conversion can generate sequence variants in the late chorion multigene families of Bombyx mori. Genetics 120:221–231.

YAMAZAKI, T., 1971 Measurement of fitness at the esterase-5 locus in Drosophila pseudoobscura. Genetics 67:579–603.



