Genetics, Vol. 165, 1901-1914, December 2003, Copyright © 2003
Nucleotide Variation of the Est-6 Gene Region in Natural Populations of Drosophila melanogaster
Evgeniy S. Balakirev (a, b, c) and
Francisco J. Ayala (a)
a Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525,
b Institute of Marine Biology, Vladivostok 690041, Russia
c Academy of Ecology, Marine Biology and Biotechnology, Far Eastern State University, Vladivostok 690600, Russia
Corresponding author:
Francisco J. Ayala, 321 Steinhaus Hall, University of California, Irvine, CA 92697-2525. E-mail: fjayala{at}uci.edu
Communicating editor: M. ASMUSSEN
 | ABSTRACT |
|---|
We have investigated nucleotide polymorphism in the Est-6 gene region in four samples of Drosophila melanogaster derived from natural populations of East Africa (Zimbabwe), Europe (Spain), North America (California), and South America (Venezuela). There are two divergent sequence types in the North and South American samples, which are not perfectly (North America) or not at all (South America) associated with the Est-6 allozyme variation. Less pronounced or no sequence dimorphism occurs in the European and African samples, respectively. The level of nucleotide diversity is highest in the African sample, lower (and similar to each other) in the samples from Europe and North America, and lowest in the sample from South America. The extent of linkage disequilibrium is low in Africa (1.23% significant associations), but much higher in non-African populations (22.59, 21.45, and 37.68% in Europe, North America, and South America, respectively). Tests of neutrality with recombination are significant in non-African samples but not significant in the African sample. We propose that demographic history (bottleneck and admixture of genetically different populations) is the major factor shaping the nucleotide patterns in the Est-6 gene region. However, positive selection modifies the pattern: balanced selection creates elevated levels of nucleotide variation around functionally important (target) polymorphic sites (RsaI-/RsaI+ in the promoter region and F/S in the coding region) in both African and non-African samples; and directional selection, acting during the geographic expansion phase of D. melanogaster, creates an excess of very similar sequences (RsaI- and S allelic lineages, in the promoter and coding regions, respectively) in the non-African samples.
From the very beginning of the "allozyme era," esterase 6 (Est-6) has been one of the most investigated and informative molecular markers in Drosophila population, evolutionary, and developmental genetics (reviewed by OAKESHOTT et al. 1989, 1993, 1995; KOROCHKIN et al. 1990; RICHMOND et al. 1990). WRIGHT (1961, 1963) described two main allozymes (Fast and Slow) of EST-6, showed their Mendelian inheritance, found a differential response to the organophosphate inhibitor, and raised questions about the adaptive significance of the polymorphism. The main allozymes show large-scale latitudinal clines (OAKESHOTT et al. 1981), with the Slow allozyme more common at higher latitudes. This, together with other data on the temporal and geographic allozyme variation in natural populations and results of laboratory experiments, suggests that the EST-6 polymorphism is maintained by some form of positive selection (reviewed by OAKESHOTT et al. 1989, 1993, 1995; RICHMOND et al. 1990).
The Est-6 gene is on the left arm of chromosome 3 of Drosophila melanogaster, at cytogenetic map position 69A1–A3 (PROCUNIER et al. 1991). OAKESHOTT et al. (1987) first obtained the nucleotide sequence and characterized the exon-intron structure of the Est-6 gene. Using available information on nine other eukaryotic esterases, they identified the active site and other functionally important regions of the gene. The coding region of Est-6 is 1686 bp long and consists of two exons (1387 and 248 bp) and a small (51-bp) intron. The gene encodes the major β-carboxyl esterase (EST-6) that is transferred by D. melanogaster males to females in the seminal fluid during copulation (RICHMOND et al. 1980) and affects the female's consequent behavior and mating proclivity (GROMKO et al. 1984). The Est-6 gene is duplicated (COLLET et al. 1990) but there is evidence that the adjacently located duplicate, referred to as Est-P (COLLET et al. 1990) or Est-7 (DUMANCIC et al. 1997), may be a pseudogene (ψEst-6; BALAKIREV and AYALA 1996; BALAKIREV et al. 2003; but see DUMANCIC et al. 1997). The β-esterase gene cluster in other Drosophila species also includes two (three in D. pseudoobscura) closely linked genes (YENIKOLOPOV et al. 1989; BRADY et al. 1990; EAST et al. 1990; OAKESHOTT et al. 1993, 1995).
The expression of Est-6 in D. melanogaster has been investigated using P-element-mediated transformation (LUDWIG et al. 1993; HEALY et al. 1996; TAMARINA et al. 1997). Within the ~1.2 kb of the 5'-flanking region, several independently acting cis-regulatory promoter elements that control the expression of the gene in different tissues have been identified. GAME and OAKESHOTT (1990) investigated restriction site polymorphism and its association with functional variation within a 21.5-kb region including the Est-6 gene and found that a restriction polymorphism at an RsaI site in the 5'-flanking region of Est-6 shows a significant association with the amount and activity of EST-6 in males. Given other evidence showing that differences in male EST-6 activity affect the reproductive success of their mates (RICHMOND et al. 1990), GAME and OAKESHOTT (1990) concluded that Est-6 cis-acting regulatory polymorphisms may be important contributors to adaptive variation. Indeed, OAKESHOTT et al. (1994) and SAAD et al. (1994) have detected significant associations between the fitness components (preadult viability, development time, time to mating, remating frequency, egg production, and fertility) of D. melanogaster and the EST-6 activity level.
COOKE and OAKESHOTT (1989) sequenced the complete coding region of Est-6 in 13 D. melanogaster lines in an Australian population (chosen so as to include all allozyme variants known in the population). They suggested that the main Fast and Slow allozymes differ by two amino acids (Asn/Asp at position 237 and Thr/Ala at position 247; but see HASSON and EANES 1996 and BALAKIREV et al. 1999) and considered these two amino acid replacements as the most likely targets for selection underlying the previously detected latitudinal clines (OAKESHOTT et al. 1981). ODGERS et al. (1995) sequenced 974 bp of the Est-6 5'-flanking region in D. melanogaster and identified a nucleotide substitution responsible for the RsaI polymorphism (T → G at -531). They also revealed the presence of two highly diverged haplotype groups and a peak of polymorphism around the RsaI site. ODGERS et al. (1995) showed that the RsaI+ haplotype group yields ~25% more EST-6 enzyme activity in adult males than does the RsaI- haplotype and detected weak disequilibrium between the promoter polymorphism and the Fast/Slow allozyme polymorphism. Later, ODGERS et al. (2002) carried out P-element-mediated germ-line transformation, fusing representative promoter alleles to an identical Est-6 coding region. They found a twofold difference in EST-6 activity in the male anterior sperm ejaculatory duct. ODGERS et al. (2002) also conducted restriction fragment length polymorphism (RFLP) and sequence analyses of the promoter region in populations from Africa, America, Asia, and Australia and detected significant deviation from neutral expectations in the non-African samples but not in the African one. HASSON and EANES (1996) investigated the nucleotide polymorphism of the Est-6 coding region in 16 lines from disparate parts of the world, selected on the basis of the presence/absence of the cosmopolitan inversion In(3L)Payne, and detected shared polymorphisms between St and In(3L)Payne chromosomes, indicating extensive genetic exchange between arrangements. BALAKIREV et al. (1999) sequenced 15 alleles of the Est-6 coding region from a Californian population and found two highly differentiated haplotypes, one encompassing the Fast alleles and the other consisting of Slow alleles. They also detected a distinct peak of increased variation in the region surrounding the replacement site responsible for the EST-6 Fast/Slow allozyme polymorphism and suggested that balancing selection might be involved in the polymorphism. All these studies involve samples that are too small (COOKE and OAKESHOTT 1989; HASSON and EANES 1996; BALAKIREV et al. 1999), nonrandom, or both (COOKE and OAKESHOTT 1989; HASSON and EANES 1996) and thus unsuitable for certain population genetic tests (HUDSON et al. 1994; SIMONSEN et al. 1995).
We (BALAKIREV et al. 2002) increased the sample size and the length of the region sequenced to carry out significant tests of neutrality and to analyze the possible association between the regulatory and structural nucleotide polymorphism, seeking also to test for linkage disequilibrium within the gene region, a possibility suggested by the patterns observed in our previous study (BALAKIREV et al. 1999). We investigated the 5'-flanking, coding, and 3'-flanking regions of the Est-6 gene (3062 bp total) in a random sample of 30 lines (and thus large enough for the population genetic tests; see HUDSON et al. 1994; SIMONSEN et al. 1995) of D. melanogaster from a natural population of California. We detected a highly structured pattern of variability, with distinctive features in the coding and 5'-flanking regions. We discovered two distinct allelic lineages for the promoter and coding region of the Est-6 gene. The pattern of variability was complex and differed between the coding and the 5'-flanking regions, although the level of nucleotide diversity was very similar in the two regions. We detected strong linkage disequilibrium within the 5'-flanking region and Est-6 coding region separately but it was much less pronounced between these two functional regions of the gene. The neutrality tests of KELLY (1997) and WALL (1999) incorporating recombination were highly significant for the studied regions. We suggested that the Est-6 nucleotide polymorphism is shaped by a combination of directional and balancing selection acting on the promoter and coding region polymorphisms and by interactions between the two regions due to different degrees of hitchhiking (BALAKIREV et al. 2002).
We now present the analysis of nucleotide variation of the Est-6 gene region in three additional samples of D. melanogaster derived from the natural populations of East Africa (Zimbabwe), Europe (Spain), and South America (Venezuela). The motivation for examining this gene in different populations is to analyze the pattern of nucleotide variation in the ancestral (African) and derived (European and American) D. melanogaster populations; we attempt further to clarify the question concerning the evolutionary forces shaping the regulatory (RsaI+/RsaI-) and structural [Fast/Slow (F/S)] nucleotide polymorphisms. ODGERS et al. (1995, 2002) could not analyze the association between the regulatory and structural nucleotide polymorphisms, because they did not sequence the Est-6 coding region in the same lines of D. melanogaster for which they obtained the promoter region sequences.
 | MATERIALS AND METHODS |
|---|
Drosophila strains:
D. melanogaster strains were derived from random samples of wild flies collected in Europe (Spain), North America (California), and South America (Venezuela). The strains were made fully homozygous for the third chromosome by crosses with balancer stocks, as described by SEAGER and AYALA (1982). The strains were named, in accordance with the electrophoretic alleles they carry for esterase-6 (the letter before the hyphen) and superoxide dismutase (the letter after the number), Ultra Slow (US), Slow (S), and Fast (F) (Fig 1). Chung-I Wu kindly provided the D. melanogaster strains from East Africa (Sengwa and Harare, Zimbabwe). The strain Zim S-44F (Zimbabwe) is from F. J. Ayala's laboratory.


Figure 1.
The lines of D. melanogaster from East Africa [Zimbabwe (Zim)], Europe [Barcelona, Spain (Bar)], North America [El Rio, California (ER)], and South America [Caracas, Venezuela (Ven)] are presented sequentially. The lines within each population are grouped according to their genetic similarity. The S, US, and F letters before the line numbers refer to the EST-6 allozymes, Slow, Ultra-Slow, and Fast. The S and F after the numbers refer to the allozyme polymorphism at the Sod locus (except in Zimbabwe, where Sod has not been investigated) and have been previously used to tag these lines. The second letter in the Zimbabwe lines (not in Zim S-44F) refers to the locality of collection (Sengwa or Harare). The line Zim S-44F is from F. J. Ayala's laboratory. The numbers above the top sequence represent the position of segregating sites and the start of a deletion or insertion. Nucleotides are numbered from the beginning of our sequence (position 32 in COLLET et al. 1990). The coding region (exon I and exon II) of the Est-6 gene is underlined below the reference sequence (ER S-26F). Amino acid replacement polymorphisms are marked with asterisks. The RsaI polymorphism is determined by site 653, where RsaI+ has T and RsaI- has G; the S-F allozyme polymorphism is determined by site 1959, where S has A (asparagine) and F has G (aspartic acid). Dots indicate the same nucleotide as the reference sequence. Hyphens represent deleted nucleotides. Question marks indicate missing data. Separate symbols denote a deletion, the absence of a deletion, an insertion, and the absence of an insertion. Deletions: 1, 5-bp deletion of CTTTT; 2, 19-bp deletion of TTCTATTTTGTCGCAAGCA; 3, single-nucleotide deletion of T; 4, 2-bp deletion of AA; 5, 9-bp deletion of CAAACCTAA; 6, 3-bp deletion of GAT. Insertions: 1, 35-bp insertion of AGTAATTGTAATAATAATATAATAGTAATTTTGAT; 2, single-nucleotide insertion of A; 3, 3-bp insertion of TGT.
DNA extraction, amplification, and sequencing:
Methods are as previously described (BALAKIREV et al. 2003). The sequences of both strands were determined for each line, using 12 overlapping internal primers spaced, on average, 350 nucleotides apart (see GenBank accessions AF526538–AF526559, AF150809–AF150815, AF147095–AF147102, and AF217624–AF217645). At least two independent PCR amplifications were sequenced for each polymorphic site in all D. melanogaster strains to prevent possible PCR and sequencing errors.
DNA sequence analysis:
The Est-6 sequences were assembled using the program SeqMan (Lasergene, 1994–1997; DNASTAR, Madison, WI). The computer programs DnaSP, version 3.4 (ROZAS and ROZAS 1999), and PROSEQ, version 2.4 (FILATOV and CHARLESWORTH 1999), were used for most intraspecific analyses. Departures from neutral expectations were investigated using KELLY's (1997) and WALL's (1999) tests on the basis of linkage disequilibrium between segregating sites and incorporating recombination. The permutation approach of HUDSON et al. (1992a,b) was used to estimate the significance of sequence differences between populations and haplotype families. Simulations based on the algorithms of the coalescent process with recombination (HUDSON 1990) were performed with the PROSEQ program to estimate the probabilities of the observed values of Kelly's ZnS and Wall's B and Q statistics. The coalescent approach was also used to estimate confidence intervals for the nucleotide diversity values. The program Geneconv version 1.81 (SAWYER 1999) was used to detect gene conversion events. The population recombination rate was analyzed with the permutation-based approach (MCVEAN et al. 2002) on the basis of the approximate-likelihood coalescent method of HUDSON (2001).
 | RESULTS |
|---|
Nucleotide polymorphism and recombination:
The sequenced region consists of 3066 bp (2498 bp in the African sample). Fig 1 shows a total of 121 polymorphic sites (124 mutations because of three different nucleotides at each of positions 763, 1391, and 2396) in a sample of 78 sequences of the Est-6 gene from four populations of D. melanogaster: 45 sites (46 mutations) in the 5'-flanking region (3 sites, positions 329, 405, and 424, are associated with deletions), 49 sites (51 mutations) in exon I, 2 sites in the intron, 5 sites in exon II, and 20 sites in the intergenic region. Within the Est-6 exons we detected 20 replacement and 34 synonymous polymorphic sites. Nine length polymorphisms, six deletions (1–6) and three insertions (1–3), occur within the whole sequenced region (Fig 1).
The length of the 5'-flanking region sequenced in the East-African sample is 619 bp but 1183 bp in the other samples. To obtain comparable estimates of nucleotide variation in all samples, we restrict the analysis of the 5'-flanking region to the 619 bp ("standard length") sequenced in all. Table 1 shows estimates of nucleotide diversity for the standard length of the Est-6 gene and flanking regions. The π value for the full sequence is 0.0060 ± 0.0005, which is within the range of values observed in other highly recombining gene regions of D. melanogaster (MORIYAMA and POWELL 1996). The π value is very similar in the 5'-flanking (0.0060 ± 0.0007) and Est-6 regions (0.0057 ± 0.0005), but higher in the intergenic region (0.0094 ± 0.0018). The synonymous variation (0.0160) is 6.7 times higher than the nonsynonymous variation (0.0024) in the Est-6 coding region. This sort of difference is expected if there is selective constraint on the Est-6 nonsynonymous substitution rate. The level of silent divergence is at least 2.0 times higher for the Est-6 gene than for the 5'-flanking or intergenic region (Table 1). The level of nucleotide diversity is highest in the African sample (π = 0.0092 ± 0.0008) and lowest in the sample from South America (π = 0.0034 ± 0.0007). Intermediate (and very similar) values of nucleotide diversity are observed in the European (π = 0.0055 ± 0.0008) and North American (π = 0.0060 ± 0.0008) samples (Table 1).
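As a rough illustration of the quantity reported above, nucleotide diversity (π) can be computed as the average number of pairwise differences per site among aligned sequences. The sketch below assumes equal-length, gap-free sequences; the function name and the toy alignment are hypothetical, and this is not the DnaSP implementation used in the study.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites in every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)

# Toy alignment (hypothetical data): 3 sequences, 10 sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
pi = nucleotide_diversity(toy)  # 4 differences over 3 pairs x 10 sites
```

Applied separately to synonymous and nonsynonymous sites, the same averaging underlies ratios such as the 0.0160/0.0024 (about 6.7) contrast quoted in the text.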
Previously, we detected in the California population lower polymorphism in the coding region of the S haplotypes than in that of the F haplotypes and lower polymorphism in the promoter region of the RsaI- haplotypes than in that of the RsaI+ haplotypes. We also noted that the "double sweep" (RsaI-/S) haplotypes (the haplotypes that have the more common mutations in both the promoter and coding region) were least variable (BALAKIREV et al. 2002). A similar tendency is observed in the East-African and European samples but not in the South American sample (Table 2). The South American sample is unique in the sense that it has no F allelic lineage at Est-6 (see Fig 1). [We note that this population also lacks the S allele at the Sod locus (HUDSON et al. 1994).]
The method of HUDSON and KAPLAN (1985) reveals a minimum of 20 recombination events in the whole region analyzed: 3 for the 5'-flanking region, 16 for the Est-6 gene, and 1 between them. The population recombination rate (MCVEAN et al. 2002) is 0.0216 for the combined data set (Table 3), which is about three times less than the laboratory estimate of recombination rate (0.0664) based on the physical and genetic maps of D. melanogaster (J. M. COMERON, personal communication; COMERON et al. 1999; BALAKIREV et al. 2002). The rate of recombination is several times greater in the African than in the non-African samples (Table 3). The lowest recombination occurs in the South American sample. There is a positive correlation between nucleotide variation and recombination rate, as observed elsewhere (e.g., BEGUN and AQUADRO 1992; see Table 1 and Table 3).
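The minimum number of recombination events rests on the four-gamete logic: two biallelic sites that show all four two-site haplotypes imply at least one recombination event between them (under an infinite-sites assumption). The sketch below counts the largest set of non-overlapping incompatible intervals, in the spirit of the Hudson and Kaplan bound. It assumes biallelic sites only (the three triallelic positions mentioned earlier would have to be excluded first); the function names and toy haplotypes are hypothetical.

```python
from itertools import combinations

def four_gametes(col_i, col_j):
    """True if two biallelic sites show all four two-site haplotypes."""
    return len(set(zip(col_i, col_j))) == 4

def min_recombination_events(haplotypes):
    """Lower bound on recombination events from incompatible site pairs."""
    n_sites = len(haplotypes[0])
    columns = [[h[k] for h in haplotypes] for k in range(n_sites)]
    # Each incompatible pair (i, j) implies >= 1 event between sites i and j.
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gametes(columns[i], columns[j])
    ]
    # Greedy selection by right end gives the largest set of intervals that
    # do not overlap (intervals sharing only an endpoint are allowed).
    intervals.sort(key=lambda ij: ij[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count

# Toy data (hypothetical): 0/1 code the two alleles at each site.
rm_two_sites = min_recombination_events(["00", "01", "10", "11"])
rm_three_sites = min_recombination_events(["000", "011", "101", "110"])
```

Because the bound only counts events that leave a detectable four-gamete signature, it is expected to underestimate the true number of recombination events.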
The method of SAWYER (1989, 1999) detects gene conversion events within the Est-6 gene in all samples except Venezuela. The number of significant fragments varies from 1 (Africa) to 14 (North America). The average length of fragments is 636 bp (range 314–1183 bp). The conversion events are less pronounced in the protein alignment (only 2 significant fragments, 1 in Africa and 1 in North America), which suggests the involvement mostly of silent sites in significant fragments of the nucleotide alignment.
Haplotype structure and differentiation of populations:
Haplotype diversity is highest in East Africa (Hdiv = 1.000; no identical sequence pairs), lower in Europe (Hdiv = 0.895; 16 identical sequence pairs) and North America (Hdiv = 0.947; 20 identical sequence pairs), and lowest in South America (Hdiv = 0.621; 72 identical sequence pairs).
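The link between haplotype diversity and the number of identical sequence pairs can be made explicit. Assuming Hdiv is the usual unbiased estimator, n/(n - 1) × (1 - Σp²), it equals one minus the fraction of sequence pairs that are identical, which is why a sample with no identical pairs has Hdiv = 1.000. The sketch below checks this identity on hypothetical haplotype labels; that this exact estimator was used by the analysis software is an assumption.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    counts = Counter(haplotypes).values()
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))

def identical_pairs(haplotypes):
    """Number of sequence pairs that carry the same haplotype."""
    return sum(c * (c - 1) // 2 for c in Counter(haplotypes).values())

# Toy sample (hypothetical labels): one haplotype seen three times.
toy = ["h1", "h1", "h1", "h2", "h3"]
n = len(toy)
hd = haplotype_diversity(toy)
same = identical_pairs(toy)
# Equivalent form: 1 - (identical pairs) / (all pairs).
hd_from_pairs = 1 - same / (n * (n - 1) // 2)
```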
Fig 2 shows a neighbor-joining tree of the Est-6 sequences (standard length). Due to recombination and gene conversion, this tree is not a good reflection of the genealogical process, but it serves to show the genetic structure of the data. The tree shows a relative absence of geographic structure: the sequences from a given population do not all group together. However, recombination has not completely erased all information, since there are two clusters of haplotypes related to RsaI polymorphism (data not shown). The first cluster includes the sequences with the RsaI- haplotypes (all strains from Ven S-10F at the top to ER F-1461S at the bottom); the second cluster contains the RsaI+ haplotypes (all strains from ER S-255S down to Ven S-2F). The RsaI-/RsaI+ clusters are even more apparent in the tree for the promoter region only (data not shown). If we restrict the analysis only to the coding region, the two clusters obtained differ to some extent (but not exclusively) with respect to the S and F haplotypes (data not shown).

Figure 2.
Neighbor-joining tree of the Est-6 haplotypes of D. melanogaster, based on Kimura's two-parameter distance. The numbers are bootstrap probability values based on 10,000 replications. The trees are based on the Est-6 standard length.
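For reference, the Kimura two-parameter distance named in the Figure 2 caption corrects the observed proportions of transitions (P) and transversions (Q) as d = -½ ln[(1 - 2P - Q)√(1 - 2Q)]. The sketch below is a minimal illustration under the assumption of two aligned sequences, skipping any site with a gap or missing data; the function name and toy sequences are hypothetical, and the tree-building step itself is not shown.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and missing data
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises ValueError (math domain error) for saturated divergence.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy pair (hypothetical): one transition (A<->G) in 10 sites.
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```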
ODGERS et al. (1995) described two groups of haplotypes for the 5'-flanking region of the Est-6 gene of D. melanogaster from Australia. We detected two groups of haplotypes both for the Est-6 gene (including the 5'-flanking region) and for the ψEst-6 putative pseudogene from North America (California; BALAKIREV and AYALA 1996; BALAKIREV et al. 1999, 2002, 2003). Two significantly divergent sequence types are also detected in South America (Fig 3A), where only the Slow Est-6 allozyme occurs. The average number of nucleotide differences (K) between the two haplotypes is 11.286. This is comparable with the differences between RsaI+/RsaI- (K = 6.720) and F/S (K = 11.809) allelic lineages in California (BALAKIREV et al. 2002). The permutation test (HUDSON et al. 1992a) of the Venezuelan haplotypes is highly significant, K*st = 0.5867 (P = 0.0000). Sequence dimorphism is less pronounced in the European sample (Fig 3B). The two divergent sequence types are not associated with Est-6 allozyme variation (South America) or imperfectly associated (Europe, North America). The African sample (Fig 3C) has no clear sequence dimorphism (although all S haplotypes but one cluster together).
The estimates of population differentiation (HUDSON et al. 1992a) are fairly similar between the pairs Zim-Bar (Fst = 0.0653), Zim-ER (Fst = 0.0398), Bar-Ven (Fst = 0.1093), and ER-Ven (Fst = 0.0920) (for locality abbreviations see Fig 1). The maximal and minimal Fst values are obtained, respectively, for the pairs Zim-Ven (Fst = 0.1508) and Bar-ER (Fst = -0.0059). We assess the statistical significance of the Fst values with the permutation method of HUDSON et al. (1992b), with 10,000 permutations. The differences are significant (P < 0.05) between Africa and all other samples (Europe, North America, and South America), a result consistent with other data (BEGUN and AQUADRO 1993, 1995). The differences between European and the North or South American samples are not significant (P > 0.05).
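To make the logic of a sequence-based Fst and its permutation test concrete, the sketch below uses one common estimator, Fst = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean between populations. Whether this exact form matches the estimator used above is an assumption, and the function names and toy data are hypothetical. The sketch also shows why slightly negative values such as the Bar-ER estimate can arise: when two samples are no more different from each other than sequences within a sample, Hw can exceed Hb.

```python
import random
from itertools import combinations, product

def n_diffs(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def mean_within(pop):
    pairs = list(combinations(pop, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)

def mean_between(pop1, pop2):
    pairs = list(product(pop1, pop2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)

def fst(pop1, pop2):
    """Fst = 1 - Hw/Hb; each population needs at least two sequences."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = mean_between(pop1, pop2)
    return 1 - hw / hb if hb > 0 else 0.0

def permutation_p(pop1, pop2, n_perm=10000, seed=0):
    """Fraction of random relabelings with Fst >= the observed value."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return hits / n_perm

# Toy populations (hypothetical data).
fixed_difference = fst(["AAAA", "AAAA"], ["TTTT", "TTTT"])
mixed_samples = fst(["AAAA", "TTTT"], ["AAAA", "TTTT"])
p_value = permutation_p(["AAAA", "AAAA"], ["TTTT", "TTTT"], n_perm=200)
```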
Sliding-window analysis:
Fig 4 shows the distribution of polymorphism along the Est-6 sequences. There is a distinct peak in the 5'-flanking region, which includes the RsaI+/RsaI- site (position 653 in Fig 1). ODGERS et al. (1995) detected this peak of variation in an Australian population of D. melanogaster. We also detected this peak (BALAKIREV et al. 1999, 2002, 2003) in the Californian population of D. melanogaster. Another distinct peak of variation occurs around the F/S site except in Venezuela. We detected this peak (BALAKIREV et al. 1999, 2002, 2003) in our Californian data and also in data of HASSON and EANES (1996) and COOKE and OAKESHOTT (1989) and suggested that it may reflect the effect of balancing selection (STROBECK 1983; HUDSON and KAPLAN 1988) between the F and S haplotypes, rather than within them (BALAKIREV et al. 2002). The absence of the peak in Venezuela may be a consequence of the absence of F haplotypes in this sample (Fig 1). The strong presence of both the promoter (RsaI+/RsaI-) and coding region (F/S) peaks in the African sample (Fig 4) suggests that these polymorphic sites were targets of balancing selection already in the African population (from which the others derive by migration).
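A sliding-window profile of the kind discussed above can be sketched as a moving average of per-site diversity. The example below assumes aligned, gap-free sequences and uses the window and step sizes given in the Figure 4 caption (100 nucleotides, 1-nucleotide increments) as defaults; the function names and toy alignment are hypothetical, and this is not the program used to draw the figure.

```python
from itertools import combinations

def per_site_diversity(seqs):
    """Average pairwise mismatch at each alignment column."""
    n_pairs = len(seqs) * (len(seqs) - 1) / 2
    profile = []
    for column in zip(*seqs):
        diffs = sum(a != b for a, b in combinations(column, 2))
        profile.append(diffs / n_pairs)
    return profile

def sliding_window_pi(seqs, window=100, step=1):
    """Mean per-site diversity per window, as (start index, pi) pairs."""
    site_pi = per_site_diversity(seqs)
    results = []
    for start in range(0, len(site_pi) - window + 1, step):
        chunk = site_pi[start:start + window]
        results.append((start, sum(chunk) / window))
    return results

# Toy alignment (hypothetical data): one polymorphic site at index 3.
toy = ["AAAAAA", "AAATAA", "AAATAA"]
windows = sliding_window_pi(toy, window=3, step=1)
```

Peaks in such a profile mark windows that contain polymorphic sites segregating at intermediate frequency, and valleys mark windows with few or no polymorphisms.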

Figure 4.
Sliding-window plots of nucleotide diversity (π) along the Est-6 gene region of D. melanogaster. A schematic of the Est-6 gene is displayed at bottom. Exons are indicated by open boxes; the intron and the 5'- and 3'-flanking regions are shown by thin lines. Window sizes are 100 nucleotides with 1-nucleotide increments. The locations of the RsaI and allozyme polymorphisms are marked.
The valley regions located between the peaks of nucleotide variation are centered around positions 350, 1200, and 1800 (Fig 4). The first valley region includes nearly 400 bp upstream of the Est-6 coding region. KAROTAM et al. (1993, 1995) and ODGERS et al. (1995) detected strong conservation and low nucleotide variation of this region in D. melanogaster, D. simulans, and D. mauritiana. The region is under strong functional constraint because it contains several regulatory elements that are essential for Est-6 expression (LUDWIG et al. 1993). Another valley region (1100–1300 bp) corresponds to amino acid residues Arg-159, Asp-181, and Ser-209 (codons at nucleotide sites 475–477, 541–543, and 625–627; positions 1094–1096, 1160–1162, and 1244–1246 in our coordinates). These residues (along with the surrounding sequences) are highly conserved in different esterases and are likely to be important for esterase enzymatic function (MYERS et al. 1988). A third valley region encompasses the potential N-linked glycosylation site, corresponding to codon position 1258–1260 (1877–1879 in our coordinates). The correspondence between the level of polymorphism and localities of functionally important sites implicated in the catalytic mechanism suggests that the observed valley regions reflect functional constraint.
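The coordinate pairs quoted above follow a simple arithmetic pattern, sketched below. A residue number maps to codon sites 3(k - 1) + 1 through 3k, and the alignment positions given in the text are those sites shifted by a constant. The offset of 619 is not stated by the authors; it is inferred here from the quoted pairs (for example 1094 - 475) and is assumed to hold only within exon I, upstream of the intron. All names in the code are hypothetical.

```python
# Offset between coding-sequence sites and the alignment coordinates.
# Assumption: inferred from the text's own pairs (1094 - 475 = 619);
# valid only for sites in exon I, before the intron.
CODING_OFFSET = 619

def codon_sites(residue):
    """First and last 1-based nucleotide sites of a residue's codon."""
    first = 3 * (residue - 1) + 1
    return first, first + 2

def to_alignment(site, offset=CODING_OFFSET):
    """Shift a coding-sequence site into the alignment coordinates."""
    return site + offset

conserved = {
    name: tuple(to_alignment(site) for site in codon_sites(number))
    for name, number in [("Arg-159", 159), ("Asp-181", 181), ("Ser-209", 209)]
}
# The glycosylation-site codon (sites 1258-1260) under the same offset.
glycosylation = (to_alignment(1258), to_alignment(1260))
```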
We have measured heterogeneity in the distribution of silent polymorphic sites along the Est-6 sequence and discordance between the level of within-melanogaster polymorphism and the melanogaster-simulans divergence by means of GOSS and LEWONTIN's (1996) and MCDONALD's (1996, 1998) statistics and have assessed their significance by Monte Carlo simulations of the coalescent model incorporating recombination (MCDONALD 1996, 1998). On the basis of 10,000 simulations, with the recombination parameters varying from 1 to 64, the tests are not significant for any of the separate samples or for the combined data set (data not shown).
Linkage disequilibrium:
Linkage disequilibrium (LD) is measured by calculating the P value of Fisher's exact test in all pairwise comparisons between polymorphic sites. For the whole standard region (2498 bp) there are 1485 pairwise comparisons and 467 (31.45%) of them are significant. (With the Bonferroni correction, 11.92% remain significant; where two percentages are given in the ensuing sentences, the second is the Bonferroni-corrected value.) For the 5'-flanking region 25 of 78 (32.05%; 23.08%) pairwise comparisons are significant. For the Est-6 coding region (including the intron) 219 of 528 (41.48%; 23.11%) comparisons are significant. There are 19.58% (1.17% after Bonferroni correction) significant associations between the 5'-flanking region and the Est-6 gene coding region. The proportion of pairs of sites with LD values significantly different from zero, at the 5% level, is much higher within the 5'-flanking region and Est-6 coding region (244 of 606 pairwise comparisons) than between them (84 of 429, Fisher's exact test, P < 0.001; Fisher's criterion F = 52.919; P < 0.001). This observation corroborates our hypothesis (BALAKIREV et al. 2002) that the promoter and coding regions are subject to separate selection processes.
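The within-region versus between-region comparison can be reproduced from the counts in the text (244 of 606 versus 84 of 429 significant pairs). The sketch below implements a two-sided Fisher's exact test directly from the hypergeometric distribution, so that it needs no external library; the function name is hypothetical and the two-sided convention used (summing all tables no more probable than the observed one) is an assumption about the test variant. The site counts of 13 and 33 in the comments are likewise inferred from the pair counts, not stated in the text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "successes" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the tables that are no more probable than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))

# Counts from the text: significant vs. nonsignificant site pairs.
within = (244, 606 - 244)    # within 5'-flanking and coding regions
between = (84, 429 - 84)     # between the two regions
p_value = fisher_exact_two_sided(*within, *between)

# Pair counts are consistent with 13 and 33 polymorphic sites (inferred):
# C(13, 2) = 78, C(33, 2) = 528, and 13 * 33 = 429 between-region pairs.
bonferroni_alpha = 0.05 / 1485  # per-comparison threshold for 1485 tests
```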
Linkage disequilibrium is notably low in the African sample: only 1.23% significant associations are in this sample, but 22.59, 21.45, and 37.68% are in the European, North American, and South American samples, respectively. Fig 5 shows the distribution of D values along the whole region studied. A notable peak is around the F/S site and a less pronounced peak is around the RsaI-/RsaI+ site.

Figure 5.
Sliding-window plot of linkage disequilibrium (measured by D) along the Est-6 gene region of D. melanogaster. A schematic of the Est-6 putative pseudogene is displayed at bottom. Window sizes are 130 nucleotides with 60-nucleotide increments.
The significance of Pearson's correlation coefficient between LD and physical distance between sites is estimated by 10,000 permutations (MCVEAN et al. 2002). For all samples, except South America, there is significant decline in LD with increasing distance (Table 4). The strong haplotype structure and pattern of linkage disequilibrium suggest that the South American population originated from a recent admixture of genetically differentiated populations.
Table 4.
Correlation between linkage disequilibrium and physical distance between the Est-6 (full-sequence) polymorphic sites
Tests of neutrality:
The tests of HUDSON et al. (1987), TAJIMA (1989), and DEPAULIS and VEUILLE (1998) do not reveal any significant deviation from neutrality for the Est-6 gene region in any of the four populations of D. melanogaster (see also BALAKIREV et al. 2002). However, KELLY's (1997) ZnS and WALL's (1999) B and Q tests detect significant deviations from neutrality in the non-African samples, with the population recombination rate ranging from 0.005 to 0.010 (Table 5; data for B and Q are not shown). The tests fail to detect any significant deviation from neutrality for the African sample, even when using 0.0664 as the recombination rate (a laboratory estimate based on the physical and genetic maps of D. melanogaster; J. M. COMERON, personal communication; COMERON et al. 1999; BALAKIREV et al. 2002), which is at least 2.5 times higher than the recombination estimate obtained by the method of MCVEAN et al. (2002) (Table 3). The significant values of Kelly's and Wall's statistics are grouped around the peaks of linkage disequilibrium and centered on the functionally important sites within both the 5'-flanking region (RsaI site) and the coding region (F/S polymorphism) of the Est-6 gene (data not shown), which has been interpreted as evidence that these sites are targets of balancing selection (AYALA et al. 2002; BALAKIREV et al. 2002, 2003).
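For readers unfamiliar with Kelly's statistic, the following Python sketch shows how ZnS can be computed as the mean squared allelic correlation (r²) over all pairs of segregating sites. It covers only the statistic itself; the significance tests reported here additionally require a null distribution under neutrality with recombination, which the sketch does not attempt. The 0/1 haplotype coding and the function names (`r_squared`, `zns`) are illustrative assumptions, not the software used in this study.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic sites coded 0/1."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def zns(haplotypes):
    """ZnS: average r^2 over all pairs of segregating sites.

    haplotypes: list of equal-length sequences of 0/1 alleles,
    one sequence per sampled chromosome (hypothetical coding).
    """
    columns = list(zip(*haplotypes))
    # Keep only sites where both alleles are present in the sample.
    segregating = [c for c in columns if 0 < sum(c) < len(c)]
    pairs = list(combinations(segregating, 2))
    if not pairs:
        raise ValueError("need at least two segregating sites")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)
```

High values of ZnS indicate that polymorphic sites are strongly associated, which is why the statistic responds to the haplotype structure seen in the non-African samples.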
 | DISCUSSION |
|---|
We have investigated nucleotide polymorphism in the Est-6 gene region in four populations of D. melanogaster from Zimbabwe, Spain, California, and Venezuela. A dimorphic haplotype structure exists in the North American sample, where it is not perfectly associated with the Est-6 allozyme variation (S/F), and in the South American sample, where there are no Est-6 F haplotypes. The presence of two or more highly diverged haplotypes has been interpreted as a result of positive selection in D. melanogaster (see, e.g., HUDSON et al. 1994, 1997; BENASSI et al. 1999; LABATE et al. 1999). TEETER et al. (2000) investigated single-nucleotide polymorphism in 66 sequences of D. melanogaster spaced at 5- to 20-cM intervals and generated a map with no gaps greater than one-half of a chromosome arm. Two-thirds of all sequences were dimorphic. If the dimorphism resulted from positive selection, TEETER et al. (2000) estimate that one site for every few kilobases would be subject to strong positive selection, which seems improbable. They suggest instead that admixture between two differentiated populations of D. melanogaster is a more appropriate explanation of the dimorphism. Suggestions of admixture have also been made on the basis of nucleotide sequencing, RFLP, and allozyme analyses of D. melanogaster populations (e.g., DAVID and CAPY 1988; SINGH and LONG 1992; RICHTER et al. 1997; HASSON et al. 1998).
Our Est-6 data are compatible with this proposal. We have found a strong dimorphic haplotype structure in three other D. melanogaster genes on the third chromosome, Sod (HUDSON et al. 1997), tinman, and bagpipe (E. S. BALAKIREV and F. J. AYALA, unpublished data), which may also have resulted from population admixture. Nevertheless, the Est-6 data suggest that positive selection may also contribute to the observed patterns: balanced selection would account for the elevated nucleotide variation and linkage disequilibrium around the target polymorphic sites (RsaI-/RsaI+ in the promoter region and F/S in the coding region), while directional selection would yield an excess of very similar sequences exhibiting a very low level of variability (RsaI- and S allelic lineages, in the promoter and coding region, respectively).
The African sample has the highest level of nucleotide diversity and the lowest level of linkage disequilibrium. The non-African samples show a pattern of haplotype distribution consistent with selective sweeps in the history of the species. The distribution of haplotype frequencies in the non-African samples is highly asymmetric: of a total of 66 sequences, 52 belong to the S haplotype and 48 belong to the RsaI- haplotype. The haplotype test (HUDSON et al. 1994) is significant for the North and South American samples (the latter excluding the recombinant strain Ven S-13F), but not for the European sample. We conclude that bottlenecks have been an important evolutionary factor changing the genetic composition of colonizing D. melanogaster populations. The haplotype structure and polymorphism of the Est-6 gene region are in accordance with the general pattern of relationships between the African and non-African populations of D. melanogaster (ANDOLFATTO 2001; AQUADRO et al. 2001). However, the peaks of nucleotide variation in the African sample, centered on functionally important sites (Fig 4), suggest that this population is not in mutation-drift equilibrium. The footprints of directional selection have been previously shown in African populations (e.g., MOUSSET et al. 2003).
We found lower polymorphism in the S than in the F haplotypes (coding region) and lower polymorphism in the RsaI- than in the RsaI+ haplotypes (promoter region) in the California population (BALAKIREV et al. 2002). The same pattern occurs in the other populations (excluding Venezuela, where no F haplotypes occur), as well as in the total data set encompassing all four populations (Table 2): in the promoter region, nucleotide diversity is six times higher for the RsaI+ than for the RsaI- haplotypes; in the coding region, it is twice as large for the F haplotypes (0.00695) as for the S haplotypes. Thus the lower variability among RsaI- and S haplotypes is not limited to the California population. The differences are smaller in the African sample, however, which could indicate that the RsaI- and S haplotypes increased in frequency in Europe and America after their colonization.
We propose that the RsaI+/F (zero-sweep) haplotypes may represent the ancestral condition (BALAKIREV et al. 2002). The frequency of these haplotypes is higher in Africa (0.333) than elsewhere (0.091). We also suggest that the RsaI-/S (double-sweep) haplotypes have evolved under directional selection, since they are less variable but more frequent in the non-African samples (0.606) than in the African sample (0.250). Directional selection, however, does not lead to fixation of the double-sweep haplotypes in the derived populations, because balancing selection maintains both divergent haplotypes (RsaI-/RsaI+ and F/S) in the promoter and coding regions (BALAKIREV et al. 2002).
The population data available suggest two different migrations of D. melanogaster during the expansion period from the African continent: (1) Africa → Europe → North America and (2) Africa → South America (see also DAVID and CAPY 1988; SINGH and LONG 1992). The second migration is supported by the fact that the East African and South American samples share a deletion (6, Fig 1) that is absent in other samples. This deletion is present in 5 of 12 East African strains but absent in Europe and North America (Fig 1). Gaps constitute a valuable source of phylogenetic information (GIRIBET and WHEELER 1999). The absence of the F Est-6 allele (and of the S Sod allele; HUDSON et al. 1994) also suggests that the South American population does not derive from Europe or North America. The South American population might represent an admixture of migrants from North America and Africa. The most common haplotype (RsaI-/S) is from North America, while the haplotype RsaI+/S clusters with most of the African strains (Fig 2). The admixture would have been recent, since the strong haplotype structure has not been eroded by recombination (linkage disequilibrium is highest in the South American sample).
 | ACKNOWLEDGMENTS |
|---|
We are grateful to G. McVean, D. A. Filatov, J. K. Kelly, J. H. McDonald, J. D. Wall, J. M. Comeron, F. Depaulis, and J. Rozas for useful advice on analyses and for providing computer programs. We thank Elena Balakireva, Andrei Tatarenkov, Victor DeFilippis, Martina Zurovkova, and Carlos Márquez for encouragement and help; and W. M. Fitch, B. Gaut, R. R. Hudson, A. Long, and two anonymous reviewers for detailed and valuable comments. This work is supported by National Institutes of Health grant GM42397 to F. J. Ayala.
Manuscript received February 27, 2003; Accepted for publication August 20, 2003.
 | LITERATURE CITED |
|---|
ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18:279-290.
AQUADRO, C. F., V. B. DUMONT, and F. A. REED, 2001 Genome-wide variation in the human and fruitfly: a comparison. Curr. Opin. Genet. Dev. 11:627-634.
AYALA, F. J., E. S. BALAKIREV, and A. G. SÁEZ, 2002 Genetic polymorphism at two linked loci, Sod and Est-6, in Drosophila melanogaster. Gene 300:19-29.
BALAKIREV, E. S. and F. J. AYALA, 1996 Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511-1518.
BALAKIREV, E. S., E. I. BALAKIREV, F. RODRIGUEZ-TRELLES, and F. J. AYALA, 1999 Molecular evolution of two linked genes, Est-6 and Sod, in Drosophila melanogaster. Genetics 153:1357-1369.
BALAKIREV, E. S., E. I. BALAKIREV, and F. J. AYALA, 2002 Molecular evolution of the Est-6 gene in Drosophila melanogaster: contrasting patterns of DNA variability in adjacent functional regions. Gene 288:167-177.
BALAKIREV, E. S., V. R. CHECHETKIN, V. V. LOBZIN, and F. J. AYALA, 2003 DNA polymorphism in the β-esterase gene cluster of Drosophila melanogaster. Genetics 164:533-544.
BEGUN, D. J. and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356:519-520.
BEGUN, D. J. and C. F. AQUADRO, 1993 African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365:548-550.
BEGUN, D. J. and C. F. AQUADRO, 1995 Evolution at the tip and base of the X chromosome in an African population of Drosophila melanogaster. Mol. Biol. Evol. 12:382-390.
BÉNASSI, V., F. DEPAULIS, G. K. MEGHLAOUI, and M. VEUILLE, 1999 Partial sweeping of variation at the Fbp2 locus in a West African population of Drosophila melanogaster. Mol. Biol. Evol. 16:347-353.
BRADY, J. P., R. C. RICHMOND, and J. G. OAKESHOTT, 1990 Cloning of the esterase-5 locus from Drosophila pseudoobscura and comparison with its homologue in D. melanogaster. Mol. Biol. Evol. 7:525-546.
COLLET, C., K. M. NIELSEN, R. J. RUSSELL, M. KARL, J. G. OAKESHOTT et al., 1990 Molecular analysis of duplicated esterase genes in Drosophila melanogaster. Mol. Biol. Evol. 7:9-28.
COMERON, J. M., M. KREITMAN, and M. AGUADÉ, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239-249.
COOKE, P. H. and J. G. OAKESHOTT, 1989 Amino acid polymorphisms for esterase-6 in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 86:1426-1430.
DAVID, J. R. and P. CAPY, 1988 Genetic variation of Drosophila melanogaster natural populations. Trends Genet. 4:106-111.
DEPAULIS, F. and M. VEUILLE, 1998 Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.
DUMANCIC, M. M., J. G. OAKESHOTT, R. J. RUSSELL, and M. J. HEALY, 1997 Characterization of the EstP protein in Drosophila melanogaster and its conservation in Drosophilids. Biochem. Genet. 35:251-271.
EAST, P. D., A. GRAHAM, and G. WHITINGTON, 1990 Molecular isolation and preliminary characterisation of a duplicated esterase locus in Drosophila buzzatii, pp. 389-406 in Ecological and Evolutionary Genetics of Drosophila, edited by J. S. F. BARKER, W. STARMER, and R. J. MACINTYRE. Plenum Press, New York.
FILATOV, D. A. and D. CHARLESWORTH, 1999 DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-1434.
GAME, A. Y. and J. G. OAKESHOTT, 1990 Associations between restriction site polymorphism and enzyme activity variation for esterase 6 in Drosophila melanogaster. Genetics 126:1021-1031.
GIRIBET, G. and W. C. WHEELER, 1999 On gaps. Mol. Phylogenet. Evol. 13:132-143.
GOSS, P. J. E. and R. C. LEWONTIN, 1996 Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143:589-602.
GROMKO, M. H., D. F. GILBERT, and R. C. RICHMOND, 1984 Sperm transfer and use in the multiple mating system of Drosophila, pp. 371-426 in Sperm Competition and the Evolution of Animal Mating Systems, edited by R. L. SMITH. Academic Press, New York.
HASSON, E. and W. F. EANES, 1996 Contrasting histories of three gene regions associated with In(3L)Payne of Drosophila melanogaster. Genetics 144:1565-1575.
HASSON, E., I. N. WANG, L. W. ZENG, M. KREITMAN, and W. EANES, 1998 Nucleotide variation in the Triosephosphate isomerase (Tpi) locus of Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 15:756-769.
HEALY, M. J., M. M. DUMANCIC, A. CAO, and J. G. OAKESHOTT, 1996 Localization of sequences regulating ancestral and acquired sites of esterase 6 activity in Drosophila melanogaster. Mol. Biol. Evol. 13:784-797.
HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Biol. 7:1-44.
HUDSON, R. R., 2001 Two-locus sampling distributions and their application. Genetics 159:1805-1817.
HUDSON, R. R. and N. KAPLAN, 1985 Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.
HUDSON, R. R. and N. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120:831-840.
HUDSON, R. R., M. KREITMAN, and M. AGUADÉ, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
HUDSON, R. R., D. BOOS, and N. L. KAPLAN, 1992a A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138-151.
HUDSON, R. R., M. SLATKIN, and W. P. MADDISON, 1992b Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589.
HUDSON, R. R., K. BAILEY, D. SKARECKY, J. KWIATOWSKI, and F. J. AYALA, 1994 Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics 136:1329-1340.
HUDSON, R. R., A. G. SÁEZ, and F. J. AYALA, 1997 DNA variation at the Sod locus of Drosophila melanogaster: an unfolding story of natural selection. Proc. Natl. Acad. Sci. USA 94:7725-7729.
JUKES, T. H. and C. R. CANTOR, 1969 Evolution of protein molecules, pp. 21-120 in Mammalian Protein Metabolism, edited by H. M. MUNRO. Academic Press, New York.
KAROTAM, J., A. C. DELVES, and J. G. OAKESHOTT, 1993 Conservation and change in structural and 5' flanking sequences of esterase 6 in sibling Drosophila species. Genetica 88:11-28.
KAROTAM, J., T. M. BOYCE, and J. G. OAKESHOTT, 1995 Nucleotide variation at the hypervariable esterase 6 isozyme locus of Drosophila simulans. Mol. Biol. Evol. 12:113-122.
KELLY, J. K., 1997 A test of neutrality based on interlocus associations. Genetics 146:1197-1206.
KOROCHKIN, L., M. Z. LUDWIG, N. A. TAMARINA, I. USPENSKY, G. YENIKOLOPOV et al., 1990 Molecular genetic mechanisms of tissue-specific esterase isozymes and protein expression in Drosophila, pp. 399-440 in Isozymes: Structure, Function, and Use in Biology and Medicine, edited by C. MARKERT and J. SCANDALIOS. Wiley-Liss, New York.
LABATE, J. A., C. H. BIERMANN, and W. F. EANES, 1999 Nucleotide variation at the runt locus in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 16:724-731.
LUDWIG, M. Z., N. A. TAMARINA, and R. C. RICHMOND, 1993 Localization of sequences controlling the spatial, temporal, and sex-specific expression of the esterase 6 locus in Drosophila melanogaster adults. Proc. Natl. Acad. Sci. USA 90:6233-6237.
MCDONALD, J. H., 1996 Detecting non-neutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 13:253-260.
MCDONALD, J. H., 1998 Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 15:377-384.
MCVEAN, G., P. AWADALLA, and P. FEARNHEAD, 2002 A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241.
MORIYAMA, E. N. and J. R. POWELL, 1996 Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261-277.
MOUSSET, S., L. BRAZIER, M.-L. CARIOU, F. CHARTOIS, F. DEPAULIS et al., 2003 Evidence of a high rate of selective sweeps in African Drosophila melanogaster. Genetics 163:599-609.
MYERS, M., R. C. RICHMOND, and J. G. OAKESHOTT, 1988 On the origins of esterases. Mol. Biol. Evol. 5:113-119.
NEI, M., 1987 Molecul