Genetics, Vol. 165, 1901-1914, December 2003, Copyright © 2003

Nucleotide Variation of the Est-6 Gene Region in Natural Populations of Drosophila melanogaster

Evgeniy S. Balakirev (a, b, c) and Francisco J. Ayala (a)
(a) Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525
(b) Institute of Marine Biology, Vladivostok 690041, Russia
(c) Academy of Ecology, Marine Biology and Biotechnology, Far Eastern State University, Vladivostok 690600, Russia

Corresponding author: Francisco J. Ayala, 321 Steinhaus Hall, University of California, Irvine, CA 92697-2525. E-mail: fjayala@uci.edu

Communicating editor: M. ASMUSSEN


ABSTRACT

We have investigated nucleotide polymorphism in the Est-6 gene region in four samples of Drosophila melanogaster derived from natural populations of East Africa (Zimbabwe), Europe (Spain), North America (California), and South America (Venezuela). There are two divergent sequence types in the North and South American samples, which are not perfectly (North America) or not at all (South America) associated with the Est-6 allozyme variation. Less pronounced or no sequence dimorphism occurs in the European and African samples, respectively. The level of nucleotide diversity is highest in the African sample, lower (and similar to each other) in the samples from Europe and North America, and lowest in the sample from South America. The extent of linkage disequilibrium is low in Africa (1.23% significant associations), but much higher in non-African populations (22.59, 21.45, and 37.68% in Europe, North America, and South America, respectively). Tests of neutrality with recombination are significant in non-African samples but not significant in the African sample. We propose that demographic history (bottleneck and admixture of genetically different populations) is the major factor shaping the nucleotide patterns in the Est-6 gene region. However, positive selection modifies the pattern: balancing selection creates elevated levels of nucleotide variation around functionally important (target) polymorphic sites (RsaI-/RsaI+ in the promoter region and F/S in the coding region) in both African and non-African samples; and directional selection, acting during the geographic expansion phase of D. melanogaster, creates an excess of very similar sequences (RsaI- and S allelic lineages, in the promoter and coding regions, respectively) in the non-African samples.


From the very beginning of the "allozyme era," esterase 6 (Est-6) has been one of the most investigated and informative molecular markers in Drosophila population, evolutionary, and developmental genetics (reviewed by OAKESHOTT et al. 1989, 1993, 1995; KOROCHKIN et al. 1990; RICHMOND et al. 1990). WRIGHT (1961, 1963) described two main allozymes (Fast and Slow) of EST-6, showed their Mendelian inheritance, found a differential response to an organophosphate inhibitor, and raised questions about the adaptive significance of the polymorphism. The main allozymes show large-scale latitudinal clines (OAKESHOTT et al. 1981), with the Slow allozyme more common at higher latitudes. This, together with other data on the temporal and geographic allozyme variation in natural populations and results of laboratory experiments, suggests that the EST-6 polymorphism is maintained by some form of positive selection (reviewed by OAKESHOTT et al. 1989, 1993, 1995; RICHMOND et al. 1990).

The Est-6 gene is on the left arm of chromosome 3 of Drosophila melanogaster, at cytogenetic map position 69A1–A3 (PROCUNIER et al. 1991). OAKESHOTT et al. (1987) first obtained the nucleotide sequence and characterized the exon-intron structure of the Est-6 gene. Using available information on nine other eukaryotic esterases, they identified the active site and other functionally important regions of the gene. The coding region of Est-6 is 1686 bp long and consists of two exons (1387 and 248 bp) and a small (51-bp) intron. The gene encodes the major β-carboxyl esterase (EST-6) that is transferred by D. melanogaster males to females in the seminal fluid during copulation (RICHMOND et al. 1980) and affects the female's subsequent behavior and mating proclivity (GROMKO et al. 1984). The Est-6 gene is duplicated (COLLET et al. 1990), but there is evidence that the adjacently located duplicate, referred to as Est-P (COLLET et al. 1990) or Est-7 (DUMANCIC et al. 1997), may be a pseudogene (ψEst-6; BALAKIREV and AYALA 1996; BALAKIREV et al. 2003; but see DUMANCIC et al. 1997). The β-esterase gene cluster in other Drosophila species also includes two (three in D. pseudoobscura) closely linked genes (YENIKOLOPOV et al. 1989; BRADY et al. 1990; EAST et al. 1990; OAKESHOTT et al. 1993, 1995).

The expression of Est-6 in D. melanogaster has been investigated using P-element-mediated transformation (LUDWIG et al. 1993; HEALY et al. 1996; TAMARINA et al. 1997). Within the ~1.2 kb of the 5'-flanking region, several independently acting cis-regulatory promoter elements that control the expression of the gene in different tissues have been identified. GAME and OAKESHOTT (1990) investigated restriction site polymorphism and its association with functional variation within a 21.5-kb region including the Est-6 gene and found that a restriction polymorphism at an RsaI site in the 5'-flanking region of Est-6 shows a significant association with the amount and activity of EST-6 in males. Given other evidence showing that differences in male EST-6 activity affect the reproductive success of their mates (RICHMOND et al. 1990), GAME and OAKESHOTT (1990) concluded that Est-6 cis-acting regulatory polymorphisms may be important contributors to adaptive variation. Indeed, OAKESHOTT et al. (1994) and SAAD et al. (1994) have detected significant associations between the fitness components (preadult viability, development time, time to mating, remating frequency, egg production, and fertility) of D. melanogaster and the EST-6 activity level.

COOKE and OAKESHOTT (1989) sequenced the complete coding region of Est-6 in 13 D. melanogaster lines in an Australian population (chosen so as to include all allozyme variants known in the population). They suggested that the main Fast and Slow allozymes differ by two amino acids (Asn/Asp at position 237 and Thr/Ala at position 247; but see HASSON and EANES 1996 and BALAKIREV et al. 1999) and considered these two amino acid replacements as the most likely targets for selection underlying the previously detected latitudinal clines (OAKESHOTT et al. 1981). ODGERS et al. (1995) sequenced 974 bp of the Est-6 5'-flanking region in D. melanogaster and identified a nucleotide substitution responsible for the RsaI polymorphism (T → G at -531). They also revealed the presence of two highly diverged haplotype groups and a peak of polymorphism around the RsaI site. ODGERS et al. (1995) showed that the RsaI+ haplotype group yields ~25% more EST-6 enzyme activity in adult males than does the RsaI- haplotype and detected weak disequilibrium between the promoter polymorphism and the Fast/Slow allozyme polymorphism. Later, ODGERS et al. (2002) carried out P-element-mediated germ-line transformation, fusing representative promoter alleles to an identical Est-6 coding region. They found a twofold difference in EST-6 activity in the male anterior sperm ejaculatory duct. ODGERS et al. (2002) also conducted restriction fragment length polymorphism (RFLP) and sequencing of the promoter region in populations from Africa, America, Asia, and Australia and detected significant deviation from neutral expectations in the non-African samples but not in the African one.
HASSON and EANES (1996) investigated the nucleotide polymorphism of the Est-6 coding region in 16 lines from disparate parts of the world, selected on the basis of the presence/absence of the cosmopolitan inversion In(3L)Payne, and detected shared polymorphisms between St and In(3L)Payne chromosomes, indicating extensive genetic exchange between arrangements. BALAKIREV et al. (1999) sequenced 15 alleles of the Est-6 coding region from a Californian population and found two highly differentiated haplotypes, one encompassing the Fast alleles and the other consisting of Slow alleles. They also detected a distinct peak of increased variation in the region surrounding the replacement site responsible for the EST-6 Fast/Slow allozyme polymorphism and suggested that balancing selection might be involved in the polymorphism. All these studies involve samples that are too small (COOKE and OAKESHOTT 1989; HASSON and EANES 1996; BALAKIREV et al. 1999), nonrandom, or both (COOKE and OAKESHOTT 1989; HASSON and EANES 1996) and thus unsuitable for certain population genetic tests (HUDSON et al. 1994; SIMONSEN et al. 1995).

We (BALAKIREV et al. 2002) increased the sample size and the length of the region sequenced to carry out statistically meaningful tests of neutrality and to analyze the possible association between the regulatory and structural nucleotide polymorphism, seeking also to test for linkage disequilibrium within the gene region, a possibility suggested by the patterns observed in our previous study (BALAKIREV et al. 1999). We investigated the 5'-flanking, coding, and 3'-flanking regions of the Est-6 gene (3062 bp total) in a random sample of 30 lines (large enough for the population genetic tests; see HUDSON et al. 1994; SIMONSEN et al. 1995) of D. melanogaster from a natural population of California. We detected a highly structured pattern of variability, with distinctive features in the coding and 5'-flanking regions. We discovered two distinct allelic lineages for the promoter and coding region of the Est-6 gene. The pattern of variability was complex and differed between the coding and the 5'-flanking regions, although the level of nucleotide diversity was very similar in the two regions. We detected strong linkage disequilibrium within the 5'-flanking region and within the Est-6 coding region separately, but it was much less pronounced between these two functional regions of the gene. The neutrality tests of KELLY (1997) and WALL (1999) incorporating recombination were highly significant for the studied regions. We suggested that the Est-6 nucleotide polymorphism is shaped by a combination of directional and balancing selection acting on the promoter and coding region polymorphisms and by interactions between the two regions due to different degrees of hitchhiking (BALAKIREV et al. 2002).

We now present the analysis of nucleotide variation of the Est-6 gene region in three additional samples of D. melanogaster derived from the natural populations of East Africa (Zimbabwe), Europe (Spain), and South America (Venezuela). The motivation for examining this gene in different populations is to analyze the pattern of nucleotide variation in the ancestral (African) and derived (European and American) D. melanogaster populations; we attempt further to clarify the question concerning the evolutionary forces shaping the regulatory (RsaI+/RsaI-) and structural [Fast/Slow (F/S)] nucleotide polymorphisms. ODGERS et al. (1995, 2002) could not analyze the association between the regulatory and structural nucleotide polymorphisms, because they did not sequence the Est-6 coding region in the same lines of D. melanogaster for which they obtained the promoter region sequences.


MATERIALS AND METHODS

Drosophila strains:
D. melanogaster strains were derived from random samples of wild flies collected in Europe (Spain), North America (California), and South America (Venezuela). The strains were made fully homozygous for the third chromosome by crosses with balancer stocks, as described by SEAGER and AYALA (1982). The strains were named in accordance with the electrophoretic alleles they carry for esterase-6 (the letter before the hyphen) and superoxide dismutase (the letter after the number): Ultra Slow (US), Slow (S), and Fast (F) (Fig 1). Chung-I Wu kindly provided the D. melanogaster strains from East Africa (Sengwa and Harare, Zimbabwe). The strain Zim S-44F (Zimbabwe) is from F. J. Ayala's laboratory.




Figure 1. The lines of D. melanogaster from East Africa [Zimbabwe (Zim)], Europe [Barcelona, Spain (Bar)], North America [El Rio, California (ER)], and South America [Caracas, Venezuela (Ven)] are presented sequentially. The lines within each population are grouped according to their genetic similarity. The S, US, and F letters before the line numbers refer to the EST-6 allozymes, Slow, Ultra-Slow, and Fast. The S and F after the numbers refer to the allozyme polymorphism at the Sod locus (except in Zimbabwe, where Sod has not been investigated) and have been previously used to tag these lines. The second letter in the Zimbabwe lines (not in Zim S-44F) refers to the locality of collection (Sengwa or Harare). The line Zim S-44F is from F. J. Ayala's laboratory. The numbers above the top sequence represent the position of segregating sites and the start of a deletion or insertion. Nucleotides are numbered from the beginning of our sequence (position 32 in COLLET et al. 1990). The coding region (exon I and exon II) of the Est-6 gene is underlined below the reference sequence (ER S-26F). Amino acid replacement polymorphisms are marked with asterisks. The RsaI polymorphism is determined by site 653, where RsaI+ has T and RsaI- has G; the S-F allozyme polymorphism is determined by site 1959, where S has A (asparagine) and F has G (aspartic acid). Dots indicate the same nucleotide as the reference sequence. Hyphens represent deleted nucleotides. Question marks indicate missing data. ▲ denotes a deletion; † denotes the absence of a deletion; ▼ denotes an insertion; ‡ denotes the absence of an insertion.
▲1, 5-bp deletion of CTTTT; ▲2, 19-bp deletion of TTCTATTTTGTCGCAAGCA; ▲3, single-nucleotide deletion of T; ▼1, 35-bp insertion of AGTAATTGTAATAATAATATAATAGTAATTTTGAT; ▼2, single-nucleotide insertion of A; ▲4, 2-bp deletion of AA; ▲5, 9-bp deletion of CAAACCTAA; ▲6, 3-bp deletion of GAT; ▼3, 3-bp insertion of TGT.

DNA extraction, amplification, and sequencing:
Methods are as previously described (BALAKIREV et al. 2003). The sequences of both strands were determined for each line, using 12 overlapping internal primers spaced, on average, 350 nucleotides apart. (See GenBank accessions AF526538–AF526559, AF150809–AF150815, AF147095–AF147102, and AF217624–AF217645.) At least two independent PCR amplifications were sequenced for each polymorphic site in all D. melanogaster strains to guard against possible PCR and sequencing errors.

DNA sequence analysis:
The Est-6 sequences were assembled using the program SeqMan (Lasergene, 1994–1997; DNASTAR, Madison, WI). The computer programs DnaSP, version 3.4 (ROZAS and ROZAS 1999), and PROSEQ, version 2.4 (FILATOV and CHARLESWORTH 1999), were used for most intraspecific analyses. Departures from neutral expectations were investigated using KELLY's (1997) and WALL's (1999) tests, which are based on linkage disequilibrium between segregating sites and incorporate recombination. The permutation approach of HUDSON et al. (1992a,b) was used to estimate the significance of sequence differences between populations and haplotype families. Simulations based on the algorithms of the coalescent process with recombination (HUDSON 1990) were performed with the PROSEQ program to estimate the probabilities of the observed values of Kelly's ZnS and Wall's B and Q statistics. The coalescent approach was also used to estimate confidence intervals for the nucleotide diversity values. The program Geneconv version 1.81 (SAWYER 1999) was used to detect gene conversion events. The population recombination rate was analyzed with the permutation-based approach (MCVEAN et al. 2002) on the basis of the approximate-likelihood coalescent method of HUDSON (2001).


RESULTS

Nucleotide polymorphism and recombination:
The sequenced region consists of 3066 bp (2498 bp in the African sample). Fig 1 shows a total of 121 polymorphic sites (124 mutations because of three different nucleotides at each of positions 763, 1391, and 2396) in a sample of 78 sequences of the Est-6 gene from four populations of D. melanogaster: 45 sites (46 mutations) in the 5'-flanking region (3 sites, positions 329, 405, and 424, are associated with deletions), 49 sites (51 mutations) in exon I, 2 sites in the intron, 5 sites in exon II, and 20 sites in the intergenic region. Within the Est-6 exons we detected 20 replacement and 34 synonymous polymorphic sites. Nine length polymorphisms (six deletions, ▲1–▲6, and three insertions, ▼1–▼3) occur within the whole sequenced region (Fig 1).

The length of the 5'-flanking region sequenced in the East-African sample is 619 bp but 1183 bp in the other samples. To obtain comparable estimates of nucleotide variation in all samples, we restrict the analysis of the 5'-flanking region to the 619 bp ("standard length") sequenced in all. Table 1 shows estimates of nucleotide diversity for the standard length of the Est-6 gene and flanking regions. The π value for the full sequence is 0.0060 ± 0.0005, which is within the range of values observed in other highly recombining gene regions of D. melanogaster (MORIYAMA and POWELL 1996). The π value is very similar in the 5'-flanking (0.0060 ± 0.0007) and Est-6 regions (0.0057 ± 0.0005), but higher in the intergenic region (0.0094 ± 0.0018). The synonymous variation (0.0160) is 6.7 times higher than the nonsynonymous variation (0.0024) in the Est-6 coding region. This sort of difference is expected if there is selective constraint on the Est-6 nonsynonymous substitution rate. The level of silent divergence is at least 2.0 times higher for the Est-6 gene than for the 5'-flanking or intergenic region (Table 1). The level of nucleotide diversity is highest in the African sample (π = 0.0092 ± 0.0008) and lowest in the sample from South America (π = 0.0034 ± 0.0007). Intermediate (and very similar) values of nucleotide diversity are observed in the European (π = 0.0055 ± 0.0008) and North American (π = 0.0060 ± 0.0008) samples (Table 1).
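For readers unfamiliar with π, the following is a minimal sketch of how nucleotide diversity can be computed as the average number of pairwise differences per site. It is not the DnaSP implementation used in this study; the function name `nucleotide_diversity` and the toy alignment are hypothetical, and the sketch assumes gap-free aligned sequences of equal length.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi).

    Assumes equal-length, gap-free aligned sequences (a simplification;
    real analyses must handle indels and missing data).
    """
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    # Sum the number of mismatching sites over every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Hypothetical toy alignment, not data from this study.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT", "ACGTACGTAC"]
pi = nucleotide_diversity(toy)
```

The same function applied separately to synonymous and nonsynonymous sites would give the two classes of variation compared in the text, although partitioning sites by class requires codon-aware logic that is not shown here.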


 
Table 1. Nucleotide diversity and divergence in the Est-6 gene region of D. melanogaster

Previously, we detected in the California population lower polymorphism in the coding region of the S haplotypes than in that of the F haplotypes and lower polymorphism in the promoter region of the RsaI- haplotypes than in that of the RsaI+ haplotypes. We also noted that the "double sweep" (RsaI-/S) haplotypes (the haplotypes that have the more common mutations in both the promoter and coding region) were least variable (BALAKIREV et al. 2002). A similar tendency is observed in the East-African and European samples but not in the South American sample (Table 2). The South American sample is unique in the sense that it has no F allelic lineage at Est-6 (see Fig 1). [We note that this population also lacks the S allele at the Sod locus (HUDSON et al. 1994).]


 
Table 2. Nucleotide diversity in different allelic lineages of the Est-6 gene

The method of HUDSON and KAPLAN (1985) reveals a minimum of 20 recombination events in the whole region analyzed: 3 for the 5'-flanking region, 16 for the Est-6 gene, and 1 between them. The population recombination rate (MCVEAN et al. 2002) is 0.0216 for the combined data set (Table 3), which is about three times less than the laboratory estimate of recombination rate (0.0664) based on the physical and genetic maps of D. melanogaster (J. M. COMERON, personal communication; COMERON et al. 1999; BALAKIREV et al. 2002). The rate of recombination is several times greater in the African than in the non-African samples (Table 3). The lowest recombination occurs in the South American sample. There is a positive correlation between nucleotide variation and recombination rate, as observed elsewhere (e.g., BEGUN and AQUADRO 1992; see Table 1 and Table 3).
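The idea behind a minimum count of recombination events can be sketched with the four-gamete test: if all four allele combinations occur at a pair of biallelic sites, at least one recombination event must have occurred between them (under an infinite-sites assumption). The sketch below counts a maximal set of non-overlapping such intervals with a greedy pass. It is written in the spirit of the HUDSON and KAPLAN approach but is not claimed to reproduce the program used here; the function names and toy haplotypes are hypothetical.

```python
def four_gametes(seqs, i, j):
    """True if all four two-site haplotypes occur at biallelic sites i and j."""
    return len({(s[i], s[j]) for s in seqs}) == 4


def min_recombination_events(seqs, sites):
    """Lower bound on recombination events among the given biallelic sites.

    Collects every site pair that fails the four-gamete test, then greedily
    selects non-overlapping intervals (sorted by right endpoint). Intervals
    that only share an endpoint are treated as disjoint.
    """
    intervals = [
        (sites[a], sites[b])
        for a in range(len(sites))
        for b in range(a + 1, len(sites))
        if four_gametes(seqs, sites[a], sites[b])
    ]
    intervals.sort(key=lambda iv: iv[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count


# Hypothetical toy haplotypes over three biallelic sites.
toy = ["AAA", "ATA", "TAT", "TTT", "AAT"]
rm = min_recombination_events(toy, [0, 1, 2])
```

A bound of this kind typically underestimates the true number of events, because many recombination events leave no four-gamete signature.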


 
Table 3. Recombination estimate

The method of SAWYER (1989, 1999) detects gene conversion events within the Est-6 gene in all samples except Venezuela. The number of significant fragments varies from 1 (Africa) to 14 (North America). The average length of fragments is 636 bp (range 314–1183 bp). The conversion events are less pronounced in the protein alignment (only 2 significant fragments, 1 in Africa and 1 in North America), which suggests the involvement mostly of silent sites in significant fragments of the nucleotide alignment.

Haplotype structure and differentiation of populations:
Maximum haplotype diversity occurs in East Africa (Hdiv = 1.000; no identical sequence pairs); less occurs in Europe (Hdiv = 0.895; 16 identical sequence pairs) and North America (Hdiv = 0.947; 20 identical sequence pairs); and the minimum occurs in South America (Hdiv = 0.621; 72 identical sequence pairs).
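The link between haplotype diversity and the number of identical sequence pairs can be made explicit: the usual unbiased estimator n/(n − 1) × (1 − Σp²) equals one minus the fraction of sequence pairs that are identical. A minimal sketch follows; the function name `haplotype_diversity` and the toy labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Return (Hdiv, number of identical pairs) for a list of haplotypes.

    Hdiv is computed as 1 - identical_pairs / total_pairs, which is
    algebraically the same as n/(n-1) * (1 - sum of squared frequencies).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return 1 - identical_pairs / total_pairs, identical_pairs


# Hypothetical toy sample: two copies of one haplotype plus two singletons.
h, k = haplotype_diversity(["h1", "h1", "h2", "h3"])
```

When every sequence is distinct the function returns 1.0 with zero identical pairs, as reported for East Africa. As a purely arithmetic illustration, 72 identical pairs among 190 possible pairs (which would correspond to 20 sequences, an assumption not stated in this section) gives 1 − 72/190 ≈ 0.621.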

Fig 2 shows a neighbor-joining tree of the Est-6 sequences (standard length). Due to recombination and gene conversion, this tree is not a good reflection of the genealogical process, but it serves to show the genetic structure of the data. The tree shows a relative absence of geographic structure: the sequences from a given population do not all group together. However, recombination has not completely erased all information, since there are two clusters of haplotypes related to RsaI polymorphism (data not shown). The first cluster includes the sequences with the RsaI- haplotypes (all strains from Ven S-10F at the top to ER F-1461S at the bottom); the second cluster contains the RsaI+ haplotypes (all strains from ER S-255S down to Ven S-2F). The RsaI-/RsaI+ clusters are even more apparent in the tree for the promoter region only (data not shown). If we restrict the analysis only to the coding region, the two clusters obtained differ to some extent (but not exclusively) with respect to the S and F haplotypes (data not shown).



Figure 2. Neighbor-joining tree of the Est-6 haplotypes of D. melanogaster, based on Kimura's two-parameter distance. The numbers are bootstrap probability values based on 10,000 replications. The tree is based on the Est-6 standard length.
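Kimura's two-parameter distance corrects the observed proportion of differences for multiple hits while treating transitions and transversions separately: d = −(1/2) ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below computes this distance for one pair of sequences; the function name is hypothetical, sites with characters other than A, C, G, or T are simply skipped (an assumption), and tree building itself is not shown.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    # Keep only sites where both sequences carry an unambiguous base.
    sites = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    n = len(sites)
    # Transition: purine<->purine or pyrimidine<->pyrimidine change.
    transitions = sum(
        1 for a, b in sites if a != b and (a in PURINES) == (b in PURINES)
    )
    # Transversion: purine<->pyrimidine change.
    transversions = sum(
        1 for a, b in sites if a != b and (a in PURINES) != (b in PURINES)
    )
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical pair differing by a single transition (A <-> G) in 10 sites.
d = k2p_distance("AAAAACCCCC", "GAAAACCCCC")
```

The corrected distance slightly exceeds the raw proportion of differences (0.1 in this toy pair), which is the intended effect of the correction.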

ODGERS et al. (1995) described two groups of haplotypes for the 5'-flanking region of the Est-6 gene of D. melanogaster from Australia. We detected two groups of haplotypes both for the Est-6 gene (including the 5'-flanking region) and for the ψEst-6 putative pseudogene from North America (California; BALAKIREV and AYALA 1996; BALAKIREV et al. 1999, 2002, 2003). Two significantly divergent sequence types are also detected in South America (Fig 3A), where only the Slow Est-6 allozyme occurs. The average number of nucleotide differences (K) between the two haplotypes is 11.286. This is comparable with the differences between RsaI+/RsaI- (K = 6.720) and F/S (K = 11.809) allelic lineages in California (BALAKIREV et al. 2002). The permutation test (HUDSON et al. 1992a) of the Venezuelan haplotypes is highly significant, K*st = 0.5867 (P = 0.0000). Sequence dimorphism is less pronounced in the European sample (Fig 3B). The two divergent sequence types are not associated with Est-6 allozyme variation (South America) or imperfectly associated (Europe, North America). The African sample (Fig 3C) has no clear sequence dimorphism (although all S haplotypes but one cluster together).





Figure 3. Neighbor-joining tree of the Est-6 haplotypes of D. melanogaster from South America (A), Europe (B), and Africa (C) based on Kimura's two-parameter distance. The numbers are bootstrap probability values based on 10,000 replications.

The estimates of population differentiation (HUDSON et al. 1992a) are fairly similar between the pairs Zim-Bar (Fst = 0.0653), Zim-ER (Fst = 0.0398), Bar-Ven (Fst = 0.1093), and ER-Ven (Fst = 0.0920) (for locality abbreviations see Fig 1). The maximal and minimal Fst values are obtained, respectively, for the pairs Zim-Ven (Fst = 0.1508) and Bar-ER (Fst = -0.0059). We assess the statistical significance of the Fst values with the permutation method of HUDSON et al. (1992b), with 10,000 permutations. The differences are significant (P < 0.05) between Africa and all other samples (Europe, North America, and South America), a result consistent with other data (BEGUN and AQUADRO 1993, 1995). The differences between European and the North or South American samples are not significant (P > 0.05).
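A slightly negative Fst, such as the Bar-ER value, is not an error: sequence-based estimators of the form Fst = 1 − Hw/Hb become negative whenever the mean within-population difference (Hw) exceeds the mean between-population difference (Hb). The following sketch illustrates this with one common estimator of that form; whether it matches the exact estimator used in this study is an assumption, and all function names and toy populations are hypothetical.

```python
from itertools import combinations, product


def hamming(s1, s2):
    """Number of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2))


def mean_within(pop):
    """Mean pairwise difference among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise difference between sequences of two populations."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """Fst = 1 - Hw/Hb, with Hw the unweighted mean of within-population values."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = mean_between(pop1, pop2)
    return 1 - hw / hb


# Hypothetical, strongly differentiated toy populations.
fst_diff = fst(["AAAA", "AAAT"], ["TTTT", "TTTA"])
# Hypothetical populations with identical composition: Fst is negative.
fst_same = fst(["AAAA", "TTTT"], ["AAAA", "TTTT"])
```

Values near zero, whether slightly positive or slightly negative, are therefore read as an absence of detectable differentiation.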

Sliding-window analysis:
Fig 4 shows the distribution of polymorphism along the Est-6 sequences. There is a distinct peak in the 5'-flanking region, which includes the RsaI+/RsaI- site (position 653 in Fig 1). ODGERS et al. (1995) detected this peak of variation in an Australian population of D. melanogaster. We also detected this peak (BALAKIREV et al. 1999, 2002, 2003) in the Californian population of D. melanogaster. Another distinct peak of variation occurs around the F/S site except in Venezuela. We detected this peak (BALAKIREV et al. 1999, 2002, 2003) in our Californian data and also in the data of HASSON and EANES (1996) and COOKE and OAKESHOTT (1989) and suggested that it may reflect the effect of balancing selection (STROBECK 1983; HUDSON and KAPLAN 1988) between the F and S haplotypes, rather than within them (BALAKIREV et al. 2002). The absence of the peak in Venezuela may be a consequence of the absence of F haplotypes in this sample (Fig 1). The strong presence of both the promoter (RsaI+/RsaI-) and coding region (F/S) peaks in the African sample (Fig 4) suggests that these polymorphic sites were targets of balancing selection already in the African population (from which the others derive by migration).



Figure 4. Sliding-window plots of nucleotide diversity (π) along the Est-6 gene region of D. melanogaster. A schematic of the Est-6 gene is displayed at bottom. Exons are indicated by open boxes; the intron and the 5'- and 3'-flanking regions are shown by thin lines. Window sizes are 100 nucleotides with 1-nucleotide increments. The locations of the RsaI and allozyme polymorphisms are marked.
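A sliding-window profile of this kind can be sketched as follows: compute the per-site contribution to π, then average it over windows of fixed width that advance by a fixed step. The defaults below mirror the window size and increment given in the caption, but the function names are hypothetical and the sketch assumes a gap-free alignment rather than reproducing the software used for the figure.

```python
from itertools import combinations


def per_site_diversity(seqs):
    """Fraction of sequence pairs that differ at each alignment position."""
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    values = []
    for pos in range(len(seqs[0])):
        column = [s[pos] for s in seqs]
        diffs = sum(a != b for a, b in combinations(column, 2))
        values.append(diffs / n_pairs)
    return values


def sliding_window_pi(seqs, window=100, step=1):
    """Return (window midpoint, mean pi per site) for each window."""
    site_pi = per_site_diversity(seqs)
    profile = []
    for start in range(0, len(site_pi) - window + 1, step):
        mean_pi = sum(site_pi[start:start + window]) / window
        profile.append((start + window / 2, mean_pi))
    return profile


# Hypothetical toy alignment with one polymorphic site (index 3).
toy = ["AAAAAA", "AAATAA", "AAATAA"]
profile = sliding_window_pi(toy, window=2, step=1)
```

Peaks in such a profile mark windows that contain an excess of segregating sites at intermediate frequency; valleys mark windows with few or no polymorphisms.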

The valley regions located between the peaks of nucleotide variation are centered around positions 350, 1200, and 1800 (Fig 4). The first valley region includes nearly 400 bp upstream of the Est-6 coding region. KAROTAM et al. (1993, 1995) and ODGERS et al. (1995) detected strong conservation and low nucleotide variation of this region in D. melanogaster, D. simulans, and D. mauritiana. The region is under strong functional constraint because it contains several regulatory elements that are essential for Est-6 expression (LUDWIG et al. 1993). Another valley region (1100–1300 bp) corresponds to amino acid residues Arg-159, Asp-181, and Ser-209 (codons at nucleotide sites 475–477, 541–543, and 625–627; positions 1094–1096, 1160–1162, and 1244–1246 in our coordinates). These residues (along with the surrounding sequences) are highly conserved in different esterases and are likely to be important for esterase enzymatic function (MYERS et al. 1988). A third valley region encompasses the potential N-linked glycosylation site, corresponding to codon position 1258–1260 (1877–1879 in our coordinates). The correspondence between the level of polymorphism and the locations of functionally important sites implicated in the catalytic mechanism suggests that the observed valley regions reflect functional constraint.

We have measured heterogeneity in the distribution of silent polymorphic sites along the Est-6 sequence and discordance between the level of within-melanogaster polymorphism and the melanogaster-simulans divergence by means of GOSS and LEWONTIN's (1996) and MCDONALD's (1996, 1998) statistics and have assessed their significance by Monte Carlo simulations of the coalescent model incorporating recombination (MCDONALD 1996, 1998). On the basis of 10,000 simulations, with the recombination parameters varying from 1 to 64, the tests are not significant for any of the separate samples or for the combined data set (data not shown).

Linkage disequilibrium:
Linkage disequilibrium (LD) is assessed by calculating the P value of Fisher's exact test in all pairwise comparisons between polymorphic sites. For the whole standard region (2498 bp) there are 1485 pairwise comparisons and 467 (31.45%) of them are significant. (With the Bonferroni correction, 11.92% remain significant; in the ensuing sentences, the second percentage in each parenthesis is the Bonferroni-corrected value.) For the 5'-flanking region 25 of 78 (32.05%; 23.08%) pairwise comparisons are significant. For the Est-6 coding region (including the intron) 219 of 528 (41.48%; 23.11%) comparisons are significant. Between the 5'-flanking region and the Est-6 coding region, 19.58% (1.17% after Bonferroni correction) of the associations are significant. The proportion of pairs of sites with LD values significantly different from zero, at the 5% level, is much higher within the 5'-flanking region and Est-6 coding region (244 of 606 pairwise comparisons) than between them (84 of 429, Fisher's exact test, P < 0.001; Fisher's criterion F = 52.919; P < 0.001). This observation corroborates our hypothesis (BALAKIREV et al. 2002) that the promoter and coding regions are subject to separate selection processes.
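The within-region versus between-region comparison can be checked with a small two-sided Fisher's exact test on the 2 × 2 table of significant and nonsignificant pairs (244 vs. 362 within regions, 84 vs. 345 between them). The implementation below is a plain hypergeometric sketch using only the standard library; the function name is hypothetical, and it is not the software used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)


# Within-region pairs: 244 significant, 606 - 244 = 362 not significant.
# Between-region pairs: 84 significant, 429 - 84 = 345 not significant.
p_regions = fisher_exact_two_sided(244, 362, 84, 345)

# Bonferroni-adjusted per-comparison threshold for 1485 comparisons.
bonferroni_alpha = 0.05 / 1485
```

The Bonferroni threshold shows why far fewer associations remain significant after correction: each individual comparison must reach roughly P < 0.00003 rather than P < 0.05.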

Linkage disequilibrium is notably low in the African sample, where only 1.23% of the pairwise associations are significant, compared with 22.59, 21.45, and 37.68% in the European, North American, and South American samples, respectively. Fig 5 shows the distribution of D values along the whole region studied. There is a notable peak around the F/S site and a less pronounced peak around the RsaI-/RsaI+ site.



Figure 5. Sliding-window plot of linkage disequilibrium (measured by D) along the Est-6 gene region of D. melanogaster. A schematic of the Est-6 putative pseudogene is displayed at bottom. Window sizes are 130 nucleotides with 60-nucleotide increments.

The significance of Pearson's correlation coefficient between LD and physical distance between sites is estimated by 10,000 permutations (MCVEAN et al. 2002). For all samples, except South America, there is a significant decline in LD with increasing distance (Table 4). The strong haplotype structure and pattern of linkage disequilibrium suggest that the South American population originated from a recent admixture of genetically differentiated populations.
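A permutation test of this kind can be sketched as below. It is an illustration under stated assumptions rather than the procedure of MCVEAN et al. (2002): the null distribution is generated here by randomly reassigning the physical coordinates among sites, the P value is one-sided (decline of LD with distance), and the names pearson and ld_distance_permutation_test and the toy LD values are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ld_distance_permutation_test(positions, ld, n_perm=10_000, seed=0):
    """positions: site coordinates; ld: dict {(i, j): LD value} with i < j.

    Returns the observed correlation between LD and distance and a one-sided
    permutation P value for a decline of LD with distance.
    """
    pairs = sorted(ld)
    ld_values = [ld[p] for p in pairs]

    def corr(pos):
        return pearson([abs(pos[i] - pos[j]) for i, j in pairs], ld_values)

    observed = corr(positions)
    rng = random.Random(seed)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign coordinates to sites at random
        if corr(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: LD decaying with distance among six sites.
positions = [0, 10, 20, 30, 40, 50]
ld = {(i, j): 1 / (1 + abs(positions[i] - positions[j]))
      for i, j in combinations(range(len(positions)), 2)}
r, p = ld_distance_permutation_test(positions, ld)
print(f"r = {r:.3f}, permutation P = {p:.4f}")
```

Permuting coordinates keeps the set of LD values intact and breaks only their relation to distance, so a sample with strong haplotype structure and no decay, as described for South America, would not yield a small P value.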


 
Table 4. Correlation between linkage disequilibrium and physical distance between the Est-6 (full-sequence) polymorphic sites

Tests of neutrality:
The tests of HUDSON et al. (1987), TAJIMA (1989), and DEPAULIS and VEUILLE (1998) do not reveal any significant deviation from neutrality for the Est-6 gene region in any of the four populations of D. melanogaster (see also BALAKIREV et al. 2002). However, KELLY's (1997) ZnS and WALL's (1999) B and Q tests detect significant deviations from neutrality in the non-African samples, with the population recombination rate ranging from 0.005 to 0.010 (Table 5; data for B and Q are not shown). The tests fail to detect any significant deviation from neutrality for the African sample, even when using 0.0664 as the recombination rate (a laboratory estimate based on the physical and genetic maps of D. melanogaster; J. M. COMERON, personal communication; COMERON et al. 1999; BALAKIREV et al. 2002), which is at least 2.5 times higher than the value of recombination obtained by the method of MCVEAN et al. (2002) (Table 3). The significant values of Kelly's and Wall's statistics are grouped around the peaks of linkage disequilibrium and centered around the functionally important sites within both the 5'-flanking region (RsaI site) and the coding region (F/S polymorphism) of the Est-6 gene (data not shown), which has been interpreted as evidence that these sites are targets of balancing selection (AYALA et al. 2002; BALAKIREV et al. 2002, 2003).
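For orientation, Kelly's ZnS is the average squared allelic correlation (r squared) over all pairs of segregating sites. The sketch below computes only the statistic. Its significance in the text is assessed against simulations with recombination, which are not reproduced here. The function names r_squared and zns and the toy haplotypes are hypothetical, and all sites are assumed to be biallelic and polymorphic.

```python
from itertools import combinations


def r_squared(haplotypes, i, j):
    """Squared allelic correlation between biallelic sites i and j."""
    n = len(haplotypes)
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]  # assumed reference alleles
    p_i = sum(h[i] == ref_i for h in haplotypes) / n
    p_j = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    d = p_ij - p_i * p_j  # the disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def zns(haplotypes):
    """Kelly's ZnS: mean r^2 over all pairs of segregating sites."""
    pairs = list(combinations(range(len(haplotypes[0])), 2))
    return sum(r_squared(haplotypes, i, j) for i, j in pairs) / len(pairs)


# Hypothetical example: four haplotypes, three segregating sites.
print(zns(["AAT", "AAT", "GGT", "GGC"]))
```

High values of ZnS indicate that segregating sites are more strongly associated than expected, which is why the statistic is sensitive to the haplotype structure discussed above.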


 
Table 5. Kelly's (1997) test of neutrality for the Est-6 gene region


*  DISCUSSION

We have investigated nucleotide polymorphism in the Est-6 gene region in four populations of D. melanogaster from Zimbabwe, Spain, California, and Venezuela. A dimorphic haplotype structure exists in the North American sample, where it is not perfectly associated with the Est-6 allozyme variation (S/F), and in the South American sample, where there are no Est-6 F haplotypes. The presence of two or more highly diverged haplotypes has been interpreted as a result of positive selection in D. melanogaster (see, e.g., HUDSON et al. 1994, 1997; BENASSI et al. 1999; LABATE et al. 1999). TEETER et al. (2000) investigated single-nucleotide polymorphism in 66 sequences of D. melanogaster spaced at 5- to 20-cM intervals and generated a map with no gaps greater than one-half of a chromosome arm. Two-thirds of all sequences were dimorphic. If the dimorphism resulted from positive selection, TEETER et al. (2000) estimate that one site for every few kilobases would be subject to strong positive selection, which seems improbable. They suggest instead that admixture between two differentiated populations of D. melanogaster is a more appropriate explanation of the dimorphism. Suggestions of admixture have also been made on the basis of nucleotide sequencing, RFLP, and allozyme analyses of D. melanogaster populations (e.g., DAVID and CAPY 1988; SINGH and LONG 1992; RICHTER et al. 1997; HASSON et al. 1998).

Our Est-6 data are compatible with this proposal. We have found a strong dimorphic haplotype structure in three other D. melanogaster genes on the third chromosome, Sod (HUDSON et al. 1997), tinman, and bagpipe (E. S. BALAKIREV and F. J. AYALA, unpublished data), which may also have resulted from population admixture. Nevertheless, the Est-6 data suggest that positive selection may also contribute to the observed patterns: balancing selection would account for the elevated nucleotide variation and linkage disequilibrium around the target polymorphic sites (RsaI-/RsaI+ in the promoter region and F/S in the coding region), while directional selection would yield an excess of very similar sequences exhibiting a very low level of variability (RsaI- and S allelic lineages, in the promoter and coding regions, respectively).

The African sample has the highest level of nucleotide diversity and the lowest level of linkage disequilibrium. The non-African samples show a pattern of haplotype distribution consistent with hypotheses of selective sweeps in the history of the species. The distribution of haplotype frequencies in the non-African samples is highly asymmetric: of a total of 66 sequences, 52 belong to the S haplotype and 48 belong to the RsaI- haplotype. The haplotype test (HUDSON et al. 1994) is significant for the North and South American samples (excluding the recombinant strain Ven S-13F from the latter), but not significant for the European sample. We conclude that bottlenecks have been an important evolutionary factor changing the genetic composition of colonizing D. melanogaster populations. The haplotype structure and polymorphism of the Est-6 gene region are in accordance with the general pattern of relationships between the African and non-African populations of D. melanogaster (ANDOLFATTO 2001; AQUADRO et al. 2001). However, the peaks of nucleotide variation in the African sample, centered on functionally important sites (Fig 4), suggest that this population is not in mutation-drift equilibrium. The footprints of directional selection have been previously shown in African populations (e.g., MOUSSET et al. 2003).

We found lower polymorphism in the S than in the F haplotypes (coding region) and lower polymorphism in the RsaI- than in the RsaI+ haplotypes (promoter region) in the California population (BALAKIREV et al. 2002). The same pattern occurs in the other populations (excluding Venezuela, where no F haplotypes occur), as well as in the total data set encompassing all four populations (Table 2): π is six times higher for the RsaI+ than for the RsaI- haplotypes; for the coding region, π is twice as large for the F haplotypes (0.00695) as for the S haplotypes. Thus the lower variability among RsaI- and S haplotypes is not limited to the California population. The differences are smaller in the African sample, however, which could indicate that the RsaI- and S haplotypes increased in frequency in Europe and America after their colonization.
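The comparison of π between haplotype classes can be illustrated with a short sketch. It computes uncorrected nucleotide diversity (mean pairwise differences per site) for each class of aligned sequences. The estimator actually used for Table 2 may differ (for example, by applying a multiple-hit correction), and the function names, class labels, and toy sequences below are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site (uncorrected pi).

    sequences: aligned strings of equal length, at least two of them.
    """
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


def diversity_by_class(sequences, labels):
    """Compute pi separately for each haplotype class (e.g. 'F' and 'S')."""
    classes = {}
    for seq, label in zip(sequences, labels):
        classes.setdefault(label, []).append(seq)
    return {label: nucleotide_diversity(seqs)
            for label, seqs in classes.items() if len(seqs) > 1}


# Hypothetical toy alignment: three 'F' and three 'S' sequences.
seqs = ["AAAA", "AAAT", "AATT", "GGGG", "GGGG", "GGGC"]
labels = ["F", "F", "F", "S", "S", "S"]
print(diversity_by_class(seqs, labels))
```

A class dominated by near-identical sequences, as described for the RsaI- and S lineages, gives a low π even when it is the most frequent class in the sample.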

We propose that the RsaI+/F (zero-sweep) haplotypes may represent the ancestral condition (BALAKIREV et al. 2002). The frequency of these haplotypes is higher in Africa (0.333) than elsewhere (0.091). We also suggest that the RsaI-/S (double-sweep) haplotypes have evolved under directional selection, since they are less variable but more frequent in the non-African samples (0.606) than in the African sample (0.250). Directional selection, however, does not lead to fixation of the double-sweep haplotypes in the derived populations, because balancing selection maintains both divergent haplotypes (RsaI-/RsaI+ and F/S) in the promoter and coding regions (BALAKIREV et al. 2002).

The population data available suggest two different migrations of D. melanogaster during the expansion period from the African continent: (1) Africa -> Europe -> North America and (2) Africa -> South America (see also DAVID and CAPY 1988; SINGH and LONG 1992). The second migration is supported by the fact that the East African and South American samples share a deletion (▲6, Fig 1) that is absent in the other samples. This deletion is present in 5 of 12 East African strains but absent in Europe and North America (Fig 1). Gaps constitute a valuable source of phylogenetic information (GIRIBET and WHEELER 1999). The absence of the F Est-6 allele (and of the S Sod allele; HUDSON et al. 1994) also suggests that the South American population does not derive from Europe or America. The South American population might represent an admixture of migrants from North America and Africa. The most common haplotype (RsaI-/S) is from North America, while the haplotype RsaI+/S clusters with most of the African strains (Fig 2). The admixture would have been recent, since the strong haplotype structure has not been eroded by recombination (linkage disequilibrium is highest in the South American sample).


*  ACKNOWLEDGMENTS

We are grateful to G. McVean, D. A. Filatov, J. K. Kelly, J. H. McDonald, J. D. Wall, J. M. Comeron, F. Depaulis, and J. Rozas for useful advice on analyses and for providing computer programs. We thank Elena Balakireva, Andrei Tatarenkov, Victor DeFilippis, Martina Zurovkova, and Carlos Márquez for encouragement and help; and W. M. Fitch, B. Gaut, R. R. Hudson, A. Long, and two anonymous reviewers for detailed and valuable comments. This work is supported by National Institutes of Health grant GM42397 to F. J. Ayala.

Manuscript received February 27, 2003; Accepted for publication August 20, 2003.


*  LITERATURE CITED

ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18:279-290.

AQUADRO, C. F., V. B. DUMONT, and F. A. REED, 2001 Genome-wide variation in the human and fruitfly: a comparison. Curr. Opin. Genet. Dev. 11:627-634.

AYALA, F. J., E. S. BALAKIREV, and A. G. SÁEZ, 2002 Genetic polymorphism at two linked loci, Sod and Est-6, in Drosophila melanogaster. Gene 300:19-29.

BALAKIREV, E. S. and F. J. AYALA, 1996 Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511-1518.

BALAKIREV, E. S., E. I. BALAKIREV, F. RODRIGUEZ-TRELLES, and F. J. AYALA, 1999 Molecular evolution of two linked genes, Est-6 and Sod, in Drosophila melanogaster. Genetics 153:1357-1369.

BALAKIREV, E. S., E. I. BALAKIREV, and F. J. AYALA, 2002 Molecular evolution of the Est-6 gene in Drosophila melanogaster: contrasting patterns of DNA variability in adjacent functional regions. Gene 288:167-177.

BALAKIREV, E. S., V. R. CHECHETKIN, V. V. LOBZIN, and F. J. AYALA, 2003 DNA polymorphism in the β-esterase gene cluster of Drosophila melanogaster. Genetics 164:533-544.

BEGUN, D. J. and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356:519-520.

BEGUN, D. J. and C. F. AQUADRO, 1993 African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365:548-550.

BEGUN, D. J. and C. F. AQUADRO, 1995 Evolution at the tip and base of the X chromosome in an African population of Drosophila melanogaster. Mol. Biol. Evol. 12:382-390.

BENASSI, V., F. DEPAULIS, G. K. MEGHLAOUI, and M. VEUILLE, 1999 Partial sweeping of variation at the Fbp2 locus in a West African population of Drosophila melanogaster. Mol. Biol. Evol. 16:347-353.

BRADY, J. P., R. C. RICHMOND, and J. G. OAKESHOTT, 1990 Cloning of the esterase-5 locus from Drosophila pseudoobscura and comparison with its homologue in D. melanogaster. Mol. Biol. Evol. 7:525-546.

COLLET, C., K. M. NIELSEN, R. J. RUSSELL, M. KARL, J. G. OAKESHOTT et al., 1990 Molecular analysis of duplicated esterase genes in Drosophila melanogaster. Mol. Biol. Evol. 7:9-28.

COMERON, J. M., M. KREITMAN, and M. AGUADÉ, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239-249.

COOKE, P. H. and J. G. OAKESHOTT, 1989 Amino acid polymorphisms for esterase-6 in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 86:1426-1430.

DAVID, J. R. and P. CAPY, 1988 Genetic variation of Drosophila melanogaster natural populations. Trends Genet. 4:106-111.

DEPAULIS, F. and M. VEUILLE, 1998 Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.

DUMANCIC, M. M., J. G. OAKESHOTT, R. J. RUSSELL, and M. J. HEALY, 1997 Characterization of the EstP protein in Drosophila melanogaster and its conservation in Drosophilids. Biochem. Genet. 35:251-271.

EAST, P. D., A. GRAHAM, and G. WHITINGTON, 1990 Molecular isolation and preliminary characterisation of a duplicated esterase locus in Drosophila buzzatii, pp. 389–406 in Ecological and Evolutionary Genetics of Drosophila, edited by J. S. F. BARKER, W. STARMER, and R. J. MACINTYRE. Plenum Press, New York.

FILATOV, D. A. and D. CHARLESWORTH, 1999 DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-1434.

GAME, A. Y. and J. G. OAKESHOTT, 1990 Associations between restriction site polymorphism and enzyme activity variation for esterase 6 in Drosophila melanogaster. Genetics 126:1021-1031.

GIRIBET, G. and W. C. WHEELER, 1999 On gaps. Mol. Phylogenet. Evol. 13:132-143.

GOSS, P. J. E. and R. C. LEWONTIN, 1996 Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143:589-602.

GROMKO, M. H., D. F. GILBERT, and R. C. RICHMOND, 1984 Sperm transfer and use in the multiple mating system of Drosophila, pp. 371–426 in Sperm Competition and the Evolution of Animal Mating Systems, edited by R. L. SMITH. Academic Press, New York.

HASSON, E. and W. F. EANES, 1996 Contrasting histories of three gene regions associated with In(3L)Payne of Drosophila melanogaster. Genetics 144:1565-1575.

HASSON, E., I. N. WANG, L. W. ZENG, M. KREITMAN, and W. EANES, 1998 Nucleotide variation in the Triosephosphate isomerase (Tpi) locus of Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 15:756-769.

HEALY, M. J., M. M. DUMANCIC, A. CAO, and J. G. OAKESHOTT, 1996 Localization of sequences regulating ancestral and acquired sites of esterase 6 activity in Drosophila melanogaster. Mol. Biol. Evol. 13:784-797.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Biol. 7:1-44.

HUDSON, R. R., 2001 Two-locus sampling distributions and their application. Genetics 159:1805-1817.

HUDSON, R. R. and N. KAPLAN, 1985 Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.

HUDSON, R. R. and N. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120:831-840.

HUDSON, R. R., M. KREITMAN, and M. AGUADÉ, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.

HUDSON, R. R., D. BOOS, and N. L. KAPLAN, 1992a A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138-151.

HUDSON, R. R., M. SLATKIN, and W. P. MADDISON, 1992b Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589.

HUDSON, R. R., K. BAILEY, D. SKARECKY, J. KWIATOWSKI, and F. J. AYALA, 1994 Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics 136:1329-1340.

HUDSON, R. R., A. G. SÁEZ, and F. J. AYALA, 1997 DNA variation at the Sod locus of Drosophila melanogaster: an unfolding story of natural selection. Proc. Natl. Acad. Sci. USA 94:7725-7729.

JUKES, T. H. and C. R. CANTOR, 1969 Evolution of protein molecules, pp. 21–120 in Mammalian Protein Metabolism, edited by H. M. MUNRO. Academic Press, New York.

KAROTAM, J., A. C. DELVES, and J. G. OAKESHOTT, 1993 Conservation and change in structural and 5' flanking sequences of esterase 6 in sibling Drosophila species. Genetica 88:11-28.

KAROTAM, J., T. M. BOYCE, and J. G. OAKESHOTT, 1995 Nucleotide variation at the hypervariable esterase 6 isozyme locus of Drosophila simulans. Mol. Biol. Evol. 12:113-122.

KELLY, J. K., 1997 A test of neutrality based on interlocus associations. Genetics 146:1197-1206.

KOROCHKIN, L., M. Z. LUDWIG, N. A. TAMARINA, I. USPENSKY, G. YENIKOLOPOV et al., 1990 Molecular genetic mechanisms of tissue-specific esterase isozymes and protein expression in Drosophila, pp. 399–440 in Isozymes: Structure, Function, and Use in Biology and Medicine, edited by C. MARKERT and J. SCANDALIOS. Wiley-Liss, New York.

LABATE, J. A., C. H. BIERMANN, and W. F. EANES, 1999 Nucleotide variation at the runt locus in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 16:724-731.

LUDWIG, M. Z., N. A. TAMARINA, and R. C. RICHMOND, 1993 Localization of sequences controlling the spatial, temporal, and sex-specific expression of the esterase 6 locus in Drosophila melanogaster adults. Proc. Natl. Acad. Sci. USA 90:6233-6237.

MCDONALD, J. H., 1996 Detecting non-neutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 13:253-260.

MCDONALD, J. H., 1998 Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 15:377-384.

MCVEAN, G., P. AWADALLA, and P. FEARNHEAD, 2002 A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241.

MORIYAMA, E. N. and J. R. POWELL, 1996 Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261-277.

MOUSSET, S., L. BRAZIER, M.-L. CARIOU, F. CHARTOIS, F. DEPAULIS et al., 2003 Evidence of a high rate of selective sweeps in African Drosophila melanogaster. Genetics 163:599-609.

MYERS, M., R. C. RICHMOND, and J. G. OAKESHOTT, 1988 On the origins of esterases. Mol. Biol. Evol. 5:113-119.

NEI, M., 1987 Molecul