Genetics, Vol. 153, 1717-1729, December 1999, Copyright © 1999

Large Number of Replacement Polymorphisms in Rapidly Evolving Genes of Drosophila: Implications for Genome-Wide Surveys of DNA Polymorphism

Karl J. Schmid (a,c), Loredana Nigro (b), Charles F. Aquadro (c), and Diethard Tautz (1,a)
(a) Zoologisches Institut, Universität München, 80333 München, Germany
(b) Dipartimento di Biologia, University of Padua, 35122 Padua, Italy
(c) Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853

Corresponding author: Karl J. Schmid, Section of Genetics and Development, 403 Biotechnology Bldg., Cornell University, Ithaca, NY 14853-2703. E-mail: kjs21@cornell.edu

Communicating editor: A. G. CLARK



*  ABSTRACT

We present a survey of nucleotide polymorphism of three novel, rapidly evolving genes in populations of Drosophila melanogaster and D. simulans. Levels of silent polymorphism are comparable to other loci, but the number of replacement polymorphisms is higher than that in most other genes surveyed in D. melanogaster and D. simulans. Tests of neutrality fail to reject neutral evolution with one exception. This concerns a gene located in a region of high recombination rate in D. simulans and in a region of low recombination rate in D. melanogaster, due to an inversion. In the latter case it shows a very low number of polymorphisms, presumably due to selective sweeps in the region. Patterns of nucleotide polymorphism suggest that most substitutions are neutral or nearly neutral and that weak (positive and purifying) selection plays a significant role in the evolution of these genes. At all three loci, purifying selection of slightly deleterious replacement mutations appears to be more efficient in D. simulans than in D. melanogaster, presumably due to different effective population sizes. Our analysis suggests that current knowledge about genome-wide patterns of nucleotide polymorphism is far from complete with respect to the types and range of nucleotide substitutions and that further analysis of differences between local populations will be required to understand the forces more completely. We note that rapidly diverging and nearly neutrally evolving genes cannot be expected only in the genome of Drosophila, but are likely to occur in large numbers also in other organisms and that their function and evolution are little understood so far.


THE question of which evolutionary forces are responsible for the evolution of genes and proteins has been a contentious issue among molecular evolutionists. Many sequence comparisons of homologous proteins seem to confirm that the sequence evolution of proteins results mainly from the random fixation of neutral sequence variants, because the overwhelming majority of proteins exhibit fewer replacement than silent substitutions. According to the neutral theory of molecular evolution, functional and structural constraints determine what proportions of new variants are deleterious, thereby causing rate differences between different proteins. The rapidly growing database of DNA sequences provides evidence for both neutral and adaptive patterns in sequence data, but positive selection may be more frequent than thought previously (KREITMAN and AKASHI 1995). Most molecular evolutionists now agree that most new mutations in proteins are deleterious; there is still disagreement about what proportions of nondeleterious mutant alleles are neutral, nearly neutral, or advantageous (KREITMAN 1996; OHTA 1996; LI 1997). There is also some debate about the relative role of drift and positive selection under weak selection because both nearly neutral and episodic selection models are able to produce identical patterns of polymorphism with certain parameter assumptions (GILLESPIE 1994).

Rapidly evolving proteins are particularly interesting for this discussion. Three scenarios may explain why proteins evolve rapidly. The first may be a lack of strong functional or structural constraints. In this case, a large number of amino acid residues can be mutated without impairing the function of the protein and it evolves in a neutral fashion. The second may be positive selection for sequence divergence. Some classes of proteins appear to be affected predominantly by positive selection. Such proteins are involved in pathogen-host interaction and the immune system (HUGHES et al. 1990; FITCH et al. 1991; SMITH et al. 1995; HUGHES 1997), sex determination (WHITFIELD et al. 1993; SUTTON and WILKINSON 1997), and reproduction (LEE et al. 1995; METZ and PALUMBI 1996; TSAUR and WU 1997). A final explanation may be a mixture of the first two explanations: neutral evolution of some residues and positive selection at others.

A major limitation in understanding the factors governing protein evolution is a lack of knowledge about the distribution of evolutionary rates among the vast majority of genes in a genome. Most proteins whose evolution has been studied so far are functionally and structurally well characterized and evolutionarily conserved. They constitute a nonrandom sample of all genes in a genome and may give a biased picture of the relative roles of mutation, selection, and drift. This is contrasted by the output from genome sequencing projects, where thousands of novel proteins are being identified whose structure, function, and molecular evolution remain largely unknown. As long as there are no complete genome sequences from closely related species available, it is necessary to use a random sample of genes for evaluating the range of evolutionary rates and the factors affecting sequence evolution in a genome.

Previously, we performed such a genome-wide survey and examined the sequence conservation of ~100 different, randomly isolated nonidentical clones from an embryonic cDNA library of Drosophila melanogaster to estimate the range and distribution of evolutionary divergence in the Drosophila genome by genomic filter hybridization (SCHMID and TAUTZ 1997). In this screen, about one-third of these clones were classified as fast evolving, because they did not hybridize against genomic DNA from Drosophila virilis (40 million year evolutionary distance). More detailed sequence comparisons of 10 fast evolving cDNA clones between D. melanogaster and the closely related species D. yakuba (12 million year evolutionary distance) revealed that the numbers of amino acid replacement substitutions are among the highest of currently known Drosophila genes.

Here we describe a survey of nucleotide polymorphism in populations of D. melanogaster and D. simulans at three fast evolving loci that were isolated in our previous screen. The goal of this study is to test whether the amino acid sequences of the proteins are also variable within species and to use the polymorphism data for tests of neutral evolution. The work described here extends the initial population survey of SCHMID and TAUTZ (1997) because two additional loci and larger numbers of lines were analyzed. Results are compared to other genes that were surveyed in populations of both species to identify differences between fast evolving and conserved genes. Furthermore, we compare levels of polymorphism and divergence among loci and between lineages to differentiate between locus-specific and lineage-specific effects.


*  MATERIALS AND METHODS

Surveyed genes:
Three genes that were classified as fast evolving in our screen were chosen for this analysis. They constitute novel, putative protein coding genes and are characterized by large numbers of nonsynonymous substitutions in comparisons between D. melanogaster and D. yakuba (SCHMID and TAUTZ 1997). Note that their names are derived from their location in microtiter plates and do not reflect their cytological location in the genome of D. melanogaster. Although the genetic and biochemical functions of these genes are not known, there is strong evidence that all three of them are functional genes and not pseudogenes: (1) the ratio of nonsynonymous to synonymous substitutions (Ka/Ks) between D. melanogaster and D. yakuba is <1, indicating purifying selection; (2) all insertion/deletion mutations between the two species are in frame; (3) the open reading frame (ORF) and expression patterns (K. J. SCHMID and D. TAUTZ, unpublished data) are conserved between species. Figure 1 shows a schematic structure of the genes and the regions that were surveyed in this study.



Figure 1. Sequenced regions of the three loci surveyed in this study. A schematic representation of the cDNA clones and additional introns that were discovered after sequencing of genomic PCR fragments are depicted (the clones are oriented from 5' to 3'). Gray boxes show the coding regions and white boxes show noncoding regions of the cDNA. The black boxes in anon1E9 show the zinc-finger domains. Sequenced regions are outlined by the bars above each gene (lengths are given for the aligned D. melanogaster and D. simulans sequences).

Clone anon1A3 encodes a protein of 489 amino acids and is characterized by a highly negative net charge. The gene has no similarity to other sequences in database searches and there are no close homologs in the Drosophila genome, as evaluated by Southern blotting. The gene is expressed in different tissues during embryogenesis: until gastrulation, the transcript is homogeneously distributed in the embryo and then becomes restricted to the developing mesoderm and central nervous system.

The protein encoded by clone anon1E9 has a length of 588 amino acids and contains six C2H2 zinc-finger motifs. Four zinc-finger motifs are arrayed as a tandem in the center of the protein and the other two at the C terminus (Figure 1). Database searches reveal no close similarity to other zinc-finger proteins, and only those residues necessary for maintaining the structure of the fold are identical between anon1E9 and the best matches. This gene is only maternally expressed during embryogenesis, and the transcript is homogeneously distributed in the early embryo. The transcript can be detected until the cellular blastoderm stage.

Clone anon1G5 is the fastest evolving among the three genes. The putative protein has a length of 337 amino acids, does not exhibit sequence similarity to other genes, and is a single copy gene. The central region is very divergent between D. melanogaster and D. yakuba and also contains several insertions and deletions. This gene is expressed throughout embryogenesis and shows no developmental regulation at the transcriptional level.

Lines:
Isofemale lines from the following locations were used. The survey of anon1A3 in D. melanogaster includes four lines from Australia, five from North America, five from Asia (Iraq, Japan, and China), nine from Europe (Cyprus, France, Italy, Spain, and the former Soviet Union), and three from East Africa (Kenya and Zimbabwe). The D. simulans sample of anon1A3 includes two lines from the United States, three from Mexico, one from Uruguay, and six from Zimbabwe. Gene anon1E9 was surveyed in three lines of D. melanogaster from Australia, four from North America, one from Asia (Iraq), four from Europe, and three from East Africa. The D. simulans sample of anon1E9 consists of three lines from North America, two from Mexico, one from Uruguay, and two from Zimbabwe. The D. melanogaster sample of gene anon1G5 comprises three lines from Australia, five from North America, one from South America (Peru), two from Asia (Iraq and Japan), three from Europe, and two from East Africa. In the D. simulans sample are three lines from North America, four from Mexico, one from South America, and six from Zimbabwe.

The lines were collected by various researchers and given to us by M. Kidwell (D. melanogaster) and M. Turelli (D. simulans) or maintained at the University of Padua. The number of lines varies between genes, mainly because polymerase chain reaction (PCR) did not work well in all lines or high quality sequences could not be obtained. If only those lines are used for analysis for which we have sequences from all three genes, essentially the same results are observed; we therefore include all sequences from the different lines in the following analysis.

DNA preparation, PCR, and sequencing:
DNA was prepared from single flies by phenol-chloroform extraction and ethanol precipitation (SAMBROOK et al. 1989). The loci were amplified by PCR in 50-µl reactions. Reaction conditions were as suggested by the manufacturer of the AmpliTaq DNA polymerase (Perkin-Elmer, Foster City, CA). Cycling conditions were: 2 min at 95°, then 35 cycles of 1 min at 94°, 1 min at 48°, and 2 min at 72°, followed by a final extension of 10 min at 72°. The following primers were used for amplification: anon1A3-1, 5'-GGAGGAGGCGAGGAAGATGT-3'; anon1A3-2, 5'-GTTGGCAACATCAGACCAACT-3'; anon1E9-PR3, 5'-AATATATGCTAGCGCACCATG-3'; anon1E9-PR2, 5'-ATTTCAACGTTTGCATTTGG-3'; anon1G5-PR3, 5'-AAGTATCTAGCCGACGAGGAC-3'; anon1G5-PR4, 5'-TACCCAGCTCTCATTCATCTC-3'. The PCR products were gel purified with the Jetsorb kit (Genomed, Germany) and directly used for sequencing. Sequencing was carried out on an ABI 377 sequencer with DyeTerminator and AmpliTaq FS chemistry (Perkin-Elmer). Internal primers were used to sequence every base from both directions. Sequences were edited and aligned with ABI Factura, AutoAssembler, and Sequence Navigator programs.
GenBank accession numbers are AA433202, AA433203, AA433204, AA433205, AA433206, AA433207, AA433208, AA433209, AA433210, AA433211, AA433212, AA433213, AA433214, AA433215, AA433216, AA433217, AA433218, AA433219, AA433220, AA433221, AA433222, AA433223, AA433224, AA433225, AA433226, AA433227, AA433228, AA433229, AA433230, AA433231, AA433232, AA433233, AA433234, AA433235, AA433236, AA433237, AA433238, AA433239, AA433240, AA433241, AA433242, AA433243, AA433244, AA433245, AA433246, AA433247, AA433248, AA433249, AA433250, AA433251, AA433252, AA433253, AA433254, AA433255, AA433256, AA433257, AA433258, AA433259, AA433260, AA433261, AA433262, AA433263, AA433264, AA433265, AA433266, AA433267, AA433268, AA433269, AA433270, AA433271, AA433272, AA433273, AA433274, AA433275, AA433276, AA433277, AA433278, AA433279, AA433280, AA433281, AA433282, AA433283, AA433284, AA433285, AA433286, AA433287, AA433288, AA433289, AA433290 and AF161723, AF161724, AF161725, AF161726, AF161727, AF161728, AF161729, AF161730, AF161731, AF161732, AF161733, AF161734, AF161735, AF161736, AF161737, AF161738, AF161739, AF161740, AF161741, AF161742, AF161743, AF161744, AF161745, AF161746, AF161747, AF161748, AF161749, AF161750, AF161751, AF161752, AF161753, AF161754, AF161755, AF161756, AF161757, AF161758, AF161759, AF161760, AF161761, AF161762, AF161763, AF161764, AF161765, AF161766, AF161767, AF161768, AF161769, AF161770, AF161771, AF161772, AF161773, AF161774, AF161775, AF161776, AF161777, AF161778, AF161779, AF161780, AF161781, AF161782, AF161783, AF161784, AF161785, AF161786, AF161787, AF161788, AF161789, AF161790, AF161791, AF161792, AF161793, AF161794, AF161795, AF161796. Aligned sequences and figures of variable sites are available at http://www.mbg.cornell.edu/aquadro/sequences.html.

Chromosomal in situ hybridization:
Chromosomes were prepared from Oregon-R lines from D. melanogaster and Soda Lake populations from D. simulans according to the protocol of LIM (1993). cDNA inserts (1 µg; cloned into pBluescript) were biotinylated with the BioNick nick translation kit (Gibco BRL, Gaithersburg, MD). Signal detection was achieved with Vectastain (Vector Laboratories, Burlingame, CA) and Detek Hrp (ENZO, Farmingdale, NY) kits. Photographs were taken on a Zeiss microscope with a Pixera digital camera and processed with the GNU image manipulator 1.0 program.

Analysis:
The analysis of polymorphism and divergence was carried out using the program DnaSP 3.0 (ROZAS and ROZAS 1999). Numbers of substitutions per site were computed with the program Kestim (COMERON 1995). θ, an estimate of the mutation parameter 4Neµ (WATTERSON 1975), and π, the average number of pairwise differences (NEI 1987), were estimated as measures of nucleotide diversity. Several tests for neutral evolution were applied. Tajima's D statistic compares the two different estimates of nucleotide diversity, θ and π, which have the same expectation under a neutral model, so that D is expected to be zero (TAJIMA 1989). D is then tested for a significant difference from zero. A related test is Fu and Li's D (FU and LI 1993), which counts the number of singletons in a population sample and tests whether this number is significantly different from the expected number under a neutral model. The HKA test (HUDSON et al. 1987) tests whether observed levels of polymorphism and sequence divergence are consistent with a neutral equilibrium model. Regional differences in the ratio of polymorphic sites to fixed differences of the sequence data were tested with the program DNA Slider, which employs various statistical procedures (see MCDONALD 1998). The McDonald-Kreitman test was used to compare ratios of silent and replacement substitutions within and between species (MCDONALD and KREITMAN 1991).
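The diversity estimators and Tajima's D described above can be illustrated with a minimal sketch. It is not the DnaSP implementation: the function names and the toy alignment are hypothetical, the sequences are assumed to be aligned, of equal length, and free of gaps, and the variance constants are the standard ones for Tajima's D. The statistic is only defined for at least four sequences and at least one segregating site.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns that carry more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def diversity_and_tajimas_d(seqs):
    """Return per-site pi, per-site Watterson theta, and Tajima's D."""
    n, length = len(seqs), len(seqs[0])
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("Tajima's D needs >= 4 sequences and >= 1 segregating site")
    k = mean_pairwise_differences(seqs)
    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D contrasts the two estimates of diversity, scaled by their standard deviation.
    d = (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return k / length, s / a1 / length, d


# Hypothetical toy alignment, for illustration only (not data from this study).
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "AAAAATTT"]
pi, theta, d = diversity_and_tajimas_d(toy)
print(f"pi = {pi:.4f}, theta_W = {theta:.4f}, Tajima's D = {d:.3f}")
```

A negative D indicates an excess of low-frequency variants relative to the neutral expectation; significance would still have to be assessed against the appropriate null distribution.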

Lineage-specific fixed differences and polymorphisms were assigned to either D. melanogaster or D. simulans lines by comparison to the D. yakuba outgroup sequence. The following GenBank accessions of D. yakuba homologs were used: AF005844 (anon1A3), AF005848 (anon1E9), and AF005852 (anon1G5). Essentially the same parsimony criteria as described by AKASHI (1997) were applied to infer the ancestral state. The relative-rate test of TAJIMA (1993) was calculated to test whether the number of fixed substitutions differs between the two lineages. The relative-rate test of MUSE and GAUT (1994) was calculated with single, randomly chosen alleles from the D. melanogaster and D. simulans samples and the homologous D. yakuba sequence as outgroup.

The spatial distribution of substitutions along the coding sequence was tested with the test of TANG and LEWONTIN (1999), which is based on the empirical cumulative distribution function (ECDF) statistics. This test compares the difference between the observed cumulative distribution of distances between substitutions and a theoretical, homogeneous distribution. Critical values of the test statistic for significance tests are obtained by Monte Carlo simulations of the null model (see TANG and LEWONTIN 1999 for details). We applied the test to analyze the clustering of silent and replacement polymorphisms and fixed substitutions to identify differences between silent and replacement substitutions and between lineages.

We compared the frequency distributions of silent and replacement polymorphisms in the population samples to detect effects of weak selection (AKASHI 1997, 1999; AKASHI and SCHAEFFER 1997). First, we determined whether silent substitutions change a codon from a preferred to an unpreferred one, or vice versa. Codons were classified into preferred and unpreferred codons according to AKASHI (1995) under the assumption that the same codons are preferred in D. melanogaster and D. simulans (AKASHI and SCHAEFFER 1997). Second, the frequency distributions of preferred, unpreferred, and replacement substitutions were determined essentially as described by AKASHI (1997) and compared by Mann-Whitney U tests. We used two different variants of the tests: the fdMWU test (AKASHI and SCHAEFFER 1997), where only polymorphisms are included in calculating the frequency distribution of the different mutational classes, and the fddMWU test (AKASHI 1997), which also includes fixed differences.
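The comparison of two frequency distributions by a Mann-Whitney U test can be sketched as follows. This is a generic illustration, not the fdMWU or fddMWU procedure itself: the function names and the example frequencies are hypothetical, ties receive midranks, and the two-sided P value uses a normal approximation without tie correction, which is only a rough guide for small samples.

```python
from math import erfc, sqrt


def midranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y, approximate two-sided P value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Normal approximation to the null distribution of U (no tie correction).
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, erfc(abs(z) / sqrt(2))


# Hypothetical derived-allele counts per polymorphic site in a sample of 14 lines.
replacement = [1, 1, 1, 2, 2, 3]
silent = [1, 2, 4, 5, 7, 9]
u_rep, u_sil, p = mann_whitney_u(replacement, silent)
print(f"U(replacement) = {u_rep}, U(silent) = {u_sil}, P ~ {p:.3f}")
```

A smaller U for one class means that its variants tend to segregate at lower frequencies than those of the other class.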


*  RESULTS

A schematic representation of the sequenced regions is shown in Figure 1. Sequence alignments showing polymorphic sites and fixed differences can be found in an appendix provided at our web site (see METHODS).

Locus anon1A3:
This locus was sequenced from 26 lines of D. melanogaster and 12 lines of D. simulans; 930 bp were obtained from the ORF (63% of 1467 bp). The only intron within the surveyed region has a length of 58 bp and is located close to the 3' end of the ORF. Sixteen polymorphisms were detected in D. melanogaster (π = 0.0023), of which 5 are synonymous and 11 nonsynonymous; 18 polymorphisms (5 synonymous, 11 nonsynonymous, and 2 noncoding) occur in the D. simulans sample (π = 0.0045; Table 1). In D. melanogaster, a deletion polymorphism affecting a single amino acid (Val) was found in the Iraq line. There are also two independent, fixed indel mutations; a comparison with the sequence of D. yakuba shows that they are caused by an insertion of Glu and Thr, respectively, in D. melanogaster. In D. melanogaster and D. simulans, the gene is located in 71A, on the left arm of chromosome 3.


 

 
Table 1. Nucleotide diversity in D. melanogaster and D. simulans

Locus anon1E9:
At this locus, little nucleotide polymorphism is observed in 15 lines from D. melanogaster (π = 0.0007), but a much higher level is observed among the 8 lines of D. simulans (π = 0.0158). In D. melanogaster, 3 of the segregating variants are synonymous, and 4 are nonsynonymous; in D. simulans, the numbers are 31 for synonymous and 33 for nonsynonymous variants. In both species, anon1E9 harbors a small variable trinucleotide microsatellite with 5–9 repeat units of the GAG codon (coding for glutamate). Two alleles with 6 and 7 repeats were observed in D. melanogaster and four alleles with 5, 6, 7, and 9 repeat units in D. simulans. A second 6-bp deletion polymorphism (deleting Cys and Asn) is found in one strain of D. simulans. There are two fixed deletions, a 3-bp deletion in D. melanogaster (loss of a Ser) and a 6-bp deletion in D. simulans (loss of Ala and Val). In both species, nucleotide polymorphism at noncoding positions is not significantly different from silent positions in the coding region (Table 1). The physical location in D. melanogaster is 85B/C, on the right arm of chromosome 3. This region is inverted in D. simulans (see below).

Locus anon1G5:
This locus was sequenced in 16 lines from D. melanogaster (π = 0.0042) and 14 lines from D. simulans (π = 0.0125). There are 6 silent and 4 replacement polymorphic sites among the 16 lines of D. melanogaster; there are 17 silent and 20 replacement polymorphic sites in the 14 lines of D. simulans (Table 1). Nucleotide diversity is lower in the intron (Table 1), but the difference from silent polymorphism is not significant in either D. melanogaster or D. simulans. Total polymorphism is threefold higher in D. simulans than in D. melanogaster. Three indel mutations are fixed between the two species. One deletion (2 bp) is found in the intron; the other two occur in the coding sequences of D. melanogaster (insertion of three residues: Ser-Phe-Arg) and D. simulans (deletion of two residues: Ser-Val). In D. simulans, an indel polymorphism affecting two residues (Ala-Arg) segregates with a frequency of ~50%. The gene maps to 95D/E on the right arm of chromosome 3 in D. melanogaster and D. simulans.

Nucleotide polymorphism in D. melanogaster and D. simulans:
The data in Table 1 show that nucleotide diversity differs among genes and also between D. melanogaster and D. simulans. Still, the polymorphism estimates are well within the range observed for other genes from both species (see Table 1). Note, however, that the level of nucleotide polymorphism between the species varies among the three loci: at anon1A3 total nucleotide polymorphism (π) is about two times higher in D. simulans than in D. melanogaster, at anon1G5 three times higher, and at anon1E9 23 times higher (Table 1). In the coding regions within each species, nucleotide diversity at silent sites is on average only threefold higher than at replacement sites. Total nucleotide diversity in D. simulans is about five times higher than in D. melanogaster; this difference has been noted before (e.g., AQUADRO 1992; MORIYAMA and POWELL 1996). In Drosophila, nucleotide polymorphism is correlated with recombination rate (BEGUN and AQUADRO 1992; AQUADRO et al. 1994). In regions of low recombination, hitchhiking combined with selective sweeps (MAYNARD SMITH and HAIGH 1974; KAPLAN et al. 1989; STEPHAN et al. 1992) or background selection (CHARLESWORTH et al. 1993) is hypothesized to remove nucleotide variation at linked loci. The chromosomal location of all three genes was determined by in situ hybridization; a measure of recombination rate in D. melanogaster (adjusted coefficient of exchange, ACE) was obtained from KINDAHL (1994) (anon1A3, 1.569; anon1E9, 0.727; anon1G5, 1.739). The observed levels of nucleotide polymorphism at the three loci show a positive correlation with recombination rate in D. melanogaster.
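The fold differences and the relationship with recombination rate quoted above can be checked with a few lines of arithmetic. The π and ACE values are those given in the text; the variable names and the use of a Pearson coefficient are assumptions made for illustration, and with only three loci the coefficient is descriptive rather than a significance test.

```python
from math import sqrt

# Total nucleotide diversity (pi) per locus, as quoted in the text.
pi_mel = {"anon1A3": 0.0023, "anon1E9": 0.0007, "anon1G5": 0.0042}
pi_sim = {"anon1A3": 0.0045, "anon1E9": 0.0158, "anon1G5": 0.0125}
# Adjusted coefficient of exchange (ACE) in D. melanogaster, as quoted in the text.
ace_mel = {"anon1A3": 1.569, "anon1E9": 0.727, "anon1G5": 1.739}

# Fold difference in pi between D. simulans and D. melanogaster at each locus.
fold = {locus: pi_sim[locus] / pi_mel[locus] for locus in pi_mel}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


loci = sorted(pi_mel)
r = pearson_r([ace_mel[l] for l in loci], [pi_mel[l] for l in loci])

for locus in loci:
    print(f"{locus}: pi(sim) / pi(mel) = {fold[locus]:.1f}")
print(f"Pearson r between ACE and pi in D. melanogaster = {r:.2f}")
```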

Particularly strong evidence for the effect of recombination rate on the level of intraspecific nucleotide polymorphism is observed at locus anon1E9. At this locus, nucleotide diversity (π) is 23 times higher in D. simulans than in D. melanogaster, which is much more than the average difference between both species (MORIYAMA and POWELL 1996). This difference is consistent with variation in the recombination rate between the two species at this locus (STURTEVANT 1929; OHNISHI and VOELKER 1979). In D. melanogaster, anon1E9 maps to 85B/C in the centromeric region of chromosome 3 (Figure 2A). Two reports described a large inversion of this region between D. melanogaster and D. simulans. The studies disagree about the exact breakpoints: 84B3 to 92C in OHNISHI and VOELKER (1979), and 84F1 to 93F6-7 in LEMEUNIER et al. (1986). Figure 2B shows that this inversion translocated the anon1E9 locus away from the centromere into a region of a higher recombination rate. This might explain the much higher nucleotide polymorphism at this locus in D. simulans.



Figure 2. Chromosomal in situ hybridization of gene anon1E9 in D. melanogaster and D. simulans. (A) Location of anon1E9 on the third chromosome of the D. melanogaster Oregon R strain. Major polytene band divisions are marked according to the maps in SORSA (1988). (B) Location of anon1E9 on the third chromosome of a D. simulans strain captured at Soda Lake, California. One of the two inversion breakpoints is marked by an arrow.

Tests of neutral evolution:
Results of tests of neutral evolution are summarized in Table 2 and Table 3. The observed levels of sequence variation at loci anon1A3 and anon1G5 in D. melanogaster and D. simulans and at locus anon1E9 in D. simulans do not reject a neutral model of molecular evolution in the TAJIMA (1989), FU and LI (1993), and HKA (HUDSON et al. 1987) tests. The only significant deviation from neutrality is observed at locus anon1E9 in D. melanogaster. Variation at this locus shows a significant difference from neutrality in the Tajima (D = -2.156, P < 0.01), Fu and Li (D = -2.504, P < 0.05), and HKA tests. In the latter test, a comparison with the 5' Adh region of KREITMAN and HUDSON (1991), which is often used as a supposedly neutral control region, rejects neutral evolution due to a lack of polymorphic sites (Table 2). We also applied the tests of MCDONALD (1998) to detect deviation from neutrality in subregions of the three genes. Across a wide range of recombination rates used in these tests, we have not uncovered a significant deviation from neutrality in any of the three loci in either species (analyses not shown).


 

 
Table 2. Tests of neutral evolution using estimates of total nucleotide diversity


 

 
Table 3. McDonald-Kreitman test of neutral evolution

Neutral theory predicts that the ratio of silent to replacement substitutions should be identical for polymorphisms within species and for fixed differences between species. This prediction is tested in the McDonald-Kreitman (MK) test (MCDONALD and KREITMAN 1991). Table 3 shows that the MK test does not reject the null hypothesis of neutral evolution in any of the three loci. The test at locus anon1E9 is close to significance (G = 2.98, P = 0.08), because the ratio of replacement to silent substitutions is higher for fixed differences than for polymorphisms. The MK test can be modified with respect to length of regions analyzed. Such tests were carried out with subregions of loci anon1E9 and anon1G5, because in these genes replacement substitutions cluster in certain regions (see below). The coding sequence of gene anon1E9 was partitioned in four subregions: the N-terminal domain, the first zinc-finger cluster, the linker between the two zinc-finger clusters, and the second zinc-finger cluster (Figure 1). None of the subregion MK tests were significant. The same result was obtained with anon1G5, which is characterized by two conserved N- and C-terminal regions and a very rapidly evolving central domain (analyses not shown).
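The logic of the MK test can be sketched as a G test of independence on a 2 x 2 table of silent and replacement counts for polymorphisms and fixed differences. The sketch below is an illustration under stated assumptions: the counts are hypothetical (Table 3 is not reproduced here), the function names are invented, and no continuity or Williams correction is applied, so the values need not match those of any particular program. Only the conversion of the reported G = 2.98 into a P value uses a number from the text.

```python
from math import erfc, log, sqrt


def g_statistic(table):
    """G statistic for a 2 x 2 table [[a, b], [c, d]]; empty cells contribute 0."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_sums[i] * col_sums[j] / total
                g += observed * log(observed / expected)
    return 2 * g


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square distribution with 1 d.f."""
    return erfc(sqrt(x / 2))


# Hypothetical counts: rows are polymorphic and fixed, columns are silent and replacement.
counts = [[20, 15],
          [10, 20]]
g = g_statistic(counts)
print(f"G = {g:.2f}, P = {chi2_sf_1df(g):.3f}")

# P value corresponding to the G reported for anon1E9 in the text.
print(f"G = 2.98 gives P = {chi2_sf_1df(2.98):.2f}")
```

In this layout, an excess of replacement substitutions among fixed differences relative to polymorphisms is the pattern usually read as a sign of positive selection, whereas the reverse points to slightly deleterious replacement polymorphisms.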

Lineage effects:
We used D. yakuba as outgroup to assign fixed substitutions to either the D. melanogaster or D. simulans lineages. The number of these substitutions was then compared between lineages using the relative-rate test described by TAJIMA (1993). Under the null hypothesis of neutral evolution, there should be no significant differences in the number of substitutions between D. melanogaster and D. simulans lineages. Table 4 shows that significant rate differences were observed only for locus anon1A3. There are three times as many replacement substitutions in the D. melanogaster as in the D. simulans lineage (18:6, χ2 = 6.0, P < 0.05). Identical results were obtained with the relative-rate test of MUSE and GAUT (1994) using a randomly selected allele from the D. melanogaster and D. simulans samples and the D. yakuba sequence as outgroup (Table 4).
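The χ2 value reported for anon1A3 follows directly from the two lineage-specific counts. In Tajima's relative-rate test, the statistic for counts m1 and m2 is (m1 - m2)^2 / (m1 + m2), compared with a chi-square distribution with 1 d.f. The short sketch below reproduces the calculation for the 18:6 ratio given in the text; the function name is hypothetical.

```python
from math import erfc, sqrt


def tajima_relative_rate(m1, m2):
    """Chi-square statistic (1 d.f.) and P value for lineage-specific counts m1, m2."""
    if m1 + m2 == 0:
        raise ValueError("at least one lineage-specific substitution is required")
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Replacement substitutions in the D. melanogaster and D. simulans lineages at anon1A3.
chi2, p_value = tajima_relative_rate(18, 6)
print(f"chi2 = {chi2:.1f}, P = {p_value:.3f}")
```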


 

 
Table 4. Relative-rate tests

The test by TANG and LEWONTIN (1999) detected differences in the spatial distribution of substitutions along the coding region in the D. melanogaster and D. simulans lineages (Table 5). At loci anon1E9 and anon1G5, both replacement polymorphisms and fixed differences are significantly clustered in the D. simulans, but not in the D. melanogaster lineage. At anon1E9, the replacement substitutions are clustered in the linker regions between the zinc-finger domains and at anon1G5 in the central domain of the protein. No difference between the two lineages was seen at anon1A3. The test shows a homogeneous distribution of silent polymorphisms and silent fixed differences in five out of six comparisons. The only significant clustering of synonymous substitutions is seen at locus anon1G5 in D. melanogaster. There, silent polymorphisms are absent in the region that shows a large number of replacement polymorphisms.


 

 
Table 5. Test for heterogeneity in the location of lineage-specific substitutions along the coding sequence (TANG and LEWONTIN 1999)

The comparison of the frequency spectra of replacement, unpreferred, and preferred silent substitutions in different lines provides further evidence for the nature and direction of weak selection within populations. Since the three different types of mutation are interspersed along the sequence, identical frequency distributions of polymorphisms in each class are expected under a neutral model. This prediction forms the basis of tests for neutrality developed by Akashi, which are powerful for detecting weak selection if the assumptions of the test are met (AKASHI 1997, 1999; AKASHI and SCHAEFFER 1997). Preferred and unpreferred polymorphisms do not appear to have different fitness effects at any of the three loci, and there is little evidence for the strong major codon usage observed in many other Drosophila genes (AKASHI 1995). The numbers of preferred and unpreferred silent substitutions are relatively similar to each other in D. melanogaster and D. simulans (Table 6). In most other Drosophila genes studied so far, the number of unpreferred silent substitutions exceeds preferred substitutions in the D. melanogaster lineage. This is supported by comparisons of frequency distributions of preferred and unpreferred silent substitutions in the fdMWU and fddMWU tests. Frequency distributions are somewhat biased toward low frequencies and are not significantly different from each other at any of the three loci in either species. Similarly, no significant differences between frequency distributions of replacement and preferred or unpreferred silent polymorphisms are observed, although frequencies of replacement polymorphisms tend to be lower (results not shown).


 
Table 6. Changes in codon preference at fixed silent substitutions


*  DISCUSSION

The present survey in D. melanogaster and D. simulans demonstrates that the proteins encoded by loci anon1A3, anon1E9, and anon1G5 exhibit a large degree of amino acid sequence variation not only between (SCHMID and TAUTZ 1997) but also within species. The common characteristic of all three loci is that, in their coding regions, more replacement than silent substitutions are segregating within populations and are fixed between closely related species. At most loci that were studied in Drosophila, the opposite pattern was observed, namely an excess of silent over replacement substitutions within populations and between species. For example, in a survey of nucleotide polymorphism in Drosophila (22 loci from D. melanogaster, 12 loci from D. simulans; MORIYAMA and POWELL 1996), and in more recent studies of Gld (HAMBLIN and AQUADRO 1997), white (KIRBY and STEPHAN 1995, 1996), Tpi (HASSON et al. 1998), and hunchback (TAUTZ and NIGRO 1998), more silent than replacement polymorphisms in the coding region are segregating in populations of D. melanogaster and D. simulans. In the study of MORIYAMA and POWELL (1996), 26.4% of all polymorphisms in D. melanogaster and 11.6% in D. simulans were replacement polymorphisms. Only at loci encoding the accessory gland protein Acp26Aa (AGUADE et al. 1992; TSAUR and WU 1997; AGUADE 1998; TSAUR et al. 1998) and the viral resistance protein ref(2)p (WAYNE et al. 1996) were more replacement than silent polymorphisms observed, and they evolve under positive selection. Therefore, it is interesting that all three loci surveyed in this study show a high proportion of replacement polymorphisms in both species. Three different hypotheses could explain this: a high mutation rate, a lack of constraints (high rate of neutral evolution), or positive selection. These factors will be discussed in turn.

No evidence for a higher mutation rate:
It has been suggested that mutation rates may be variable in the genome of Drosophila. Interspecific DNA-DNA hybridization revealed a substantial fraction of single-copy DNA in the Drosophila genome that evolves rapidly (WERMAN et al. 1990). Sequencing of a boundary of fast and slowly evolving genomic regions led to the notion that the differences are not due to selection but to different mutation rates (MARTIN and MEYEROWITZ 1986). However, a high mutation rate is not supported as a plausible explanation for the rapid sequence divergence at the loci surveyed in this study. A high mutation rate should also affect silent sites of a locus and, consequently, a high silent substitution rate (in the absence of codon usage bias, which is the case at all three loci) would be expected. Compared to the silent divergence between D. melanogaster and D. simulans in the genes surveyed by MORIYAMA and POWELL (1996), no larger numbers of silent substitutions per site are observed in interspecific comparisons of the three loci in this study (Table 7). Additionally, in our earlier screen (SCHMID and TAUTZ 1997), 18 pairs of homologous sequences (including the three loci of this study) were compared between D. melanogaster and D. yakuba. Among all genes, the numbers of synonymous substitutions per site varied only 4-fold, while the numbers for replacement substitutions varied 30-fold. Since the number of silent substitutions per site is similar among all genes and is not correlated with the number of nonsynonymous substitutions, it is unlikely that the rapid evolution of these genes is driven by a high locus-specific mutation rate.


 
Table 7. Number of nonsynonymous and synonymous substitutions per site between D. melanogaster, D. simulans, and D. yakuba

No evidence for strong positive selection:
The other two hypotheses, namely neutral evolution and positive selection, were analyzed with various tests for neutral evolution. KREITMAN and AKASHI (1995) reviewed evidence that patterns of polymorphism and divergence seen at many loci under study in Drosophila are not in accord with the hypothesis that the variation seen is strictly neutral or unaffected by linked sites. Positive selection, purifying selection, and differences in recombination must be taken into account to explain the data. In fact, in the survey of MORIYAMA and POWELL (1996), about half of the loci from D. melanogaster and D. simulans failed one of the tests for neutrality. Other studies also uncovered certain deviations from neutrality in a number of loci (Gld, HAMBLIN and AQUADRO 1997; concertina, WAYNE and KREITMAN 1996; hunchback, TAUTZ and NIGRO 1998; white, KIRBY and STEPHAN 1995; ref(2)p, WAYNE et al. 1996). At the three loci surveyed in this study, despite the high level of amino acid polymorphism and divergence, neutrality was not rejected by the tests, with the exception of locus anon1E9 in D. melanogaster. Clearly, the rapid evolution of their amino acid sequences is not driven by strong selection for sequence divergence, which, for example, was implicated in the rapid evolution of the accessory gland protein, Acp26Aa (TSAUR and WU 1997; AGUADE 1998; TSAUR et al. 1998). All nucleotide polymorphisms at locus anon1E9 in Drosophila melanogaster are singletons and cause negative Tajima's D and Fu and Li's D values, which suggest that the excess of rare polymorphisms is due to a recent selective sweep at this locus. However, anon1E9 may not have been the target of this selective sweep. First of all, the MK test at this locus is not significant, so there is no evidence for selection in the protein. Further, this gene resides in a region of very low recombination, and the lack of polymorphic sites may result from hitchhiking with a recent selective sweep at another linked locus (MAYNARD SMITH and HAIGH 1974; BERRY et al. 1991). As recent theoretical studies on selection incorporating the effects of recombination suggest, background selection may also be strong enough to decrease the level of polymorphism in centromeric regions as seen at locus anon1E9 (HUDSON and KAPLAN 1995; NORDBORG et al. 1996). But Tajima's D is highly (and significantly) negative, which is not predicted by background selection (CHARLESWORTH et al. 1995). The most compelling evidence against selection-driven divergence at locus anon1E9 comes from the fact that the region harboring this gene is inverted in D. simulans relative to D. melanogaster. Because of this chromosomal inversion, anon1E9 is located in the middle of chromosomal arm 3R in D. simulans where recombination rates are higher than in the centromeric region. The observed level of polymorphism in D. simulans is 10-fold higher, and in this species, the tests for neutrality do not give any evidence for the hypothesis that the rapid evolution at anon1E9 results from continuous positive selection.
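
The connection between an excess of singletons and a negative Tajima's D can be illustrated with a minimal sketch. The function below follows the standard formula for Tajima's D (Tajima 1989), computed from the number of sequences and, for each segregating site, the count of one of the two alleles. The sample size of 10 sequences and the five segregating sites are hypothetical values chosen for illustration; they are not the data of this survey.

```python
import math

def tajimas_d(n, allele_counts):
    """Tajima's D for a sample of n sequences.

    allele_counts holds, for each biallelic segregating site, the number
    of sequences carrying one of the two alleles (1 = singleton).
    """
    s = len(allele_counts)  # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    # Watterson's estimator based on the number of segregating sites.
    theta_w = s / a1

    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical sample: 10 sequences, 5 segregating sites.
d_singletons = tajimas_d(10, [1, 1, 1, 1, 1])    # every variant is a singleton
d_intermediate = tajimas_d(10, [5, 5, 5, 5, 5])  # every variant at 50%
print(d_singletons < 0, d_intermediate > 0)
```

With only singletons, the pairwise diversity is small relative to the number of segregating sites, so D is negative; variants at intermediate frequency push D in the opposite direction.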

Nearly neutral polymorphisms:
The fixation rate of completely neutral mutations is determined only by the mutation rate (KIMURA 1983), while the fixation of nearly neutral mutations is also dependent on the effective population size. In small populations, nearly neutral mutations behave as effectively neutral if Nes < 1 (where Ne is the effective population size and s the selection coefficient), and their fate is determined mainly by random drift (OHTA 1973, 1992). Different average heterozygosities of D. melanogaster and D. simulans genes suggest that the effective total population size of D. melanogaster is three to six times smaller than that of D. simulans (AQUADRO et al. 1988; AQUADRO 1992; MORIYAMA and POWELL 1996). Under a nearly neutral model, slightly deleterious mutations are expected to be more efficiently removed from D. simulans than D. melanogaster populations, and slightly advantageous mutations should be more frequently fixed in D. simulans. Both the relative-rate test and the test by TANG and LEWONTIN (1999) detect lineage-specific differences at the three loci, supporting the hypothesis that a substantial number of segregating replacement polymorphisms are not neutral but slightly deleterious. The relative-rate test reveals a significantly larger number of replacement substitutions at locus anon1A3 in the D. melanogaster lineage. The Tang and Lewontin test shows that nonsynonymous substitutions are clustered at anon1E9 and anon1G5 in D. simulans, but not in D. melanogaster (Table 5). A similar pattern was also found in the G6pd gene, where a larger number of replacement substitutions could be observed in the D. simulans lineage (EANES et al. 1996). The MK test was highly significant in this case due to an excess of fixed replacement substitutions, indicating the occurrence of positive selection in the D. simulans lineage. At anon1G5, the number of replacement substitutions is also larger in the D. simulans than in the D. melanogaster lineage, but the difference is not significant in the relative-rate test, and the MK test gives no evidence for an excess of replacement substitutions. Replacement substitutions are also clustered at anon1E9 in the D. simulans sample, but the number of replacement substitutions in the D. simulans lineage is smaller than in the D. melanogaster lineage. The lineage effects at the anon1A3 and anon1E9 loci are probably due to the smaller effective population size in D. melanogaster. A certain proportion of the substitutions appears to be slightly deleterious with selection coefficients too small to be "seen" by selection (Nes < 1) in D. melanogaster, but large enough to be removed from D. simulans populations, particularly if they occur in constrained regions of the protein. This conclusion is supported by a comparison of the frequency distributions of replacement and silent (preferred and unpreferred) substitutions. In comparison to silent polymorphisms, the distribution of replacement polymorphisms tends to be skewed toward low frequencies, suggesting that most of them are slightly deleterious.
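
The dependence of fixation on effective population size can be sketched numerically. The example below uses Kimura's diffusion approximation for the fixation probability of a new mutation with additive fitness effects, under the simplifying assumption that the effective size equals the census size. The population sizes (a fivefold difference, within the three- to sixfold range mentioned above) and the selection coefficients are hypothetical and serve only to illustrate the direction of the effect.

```python
import math

def fixation_probability(ne, s):
    """Fixation probability of a single new mutation in a diploid population.

    Diffusion approximation with initial frequency 1/(2*ne), assuming that
    effective and census size are equal; s is the selective advantage
    (negative for deleterious mutations).
    """
    p0 = 1.0 / (2.0 * ne)
    if s == 0:
        return p0  # strictly neutral case
    # (1 - exp(-4*ne*s*p0)) / (1 - exp(-4*ne*s)), written with expm1
    # for numerical accuracy when the exponents are small.
    return math.expm1(-4.0 * ne * s * p0) / math.expm1(-4.0 * ne * s)

def relative_to_neutral(ne, s):
    """Fixation probability divided by the neutral expectation 1/(2*ne)."""
    return fixation_probability(ne, s) * 2.0 * ne

# Hypothetical effective sizes for a smaller and a fivefold larger population.
NE_SMALL, NE_LARGE = 100_000, 500_000

for s in (-1e-6, 1e-6):  # slightly deleterious, slightly advantageous
    print(s, relative_to_neutral(NE_SMALL, s), relative_to_neutral(NE_LARGE, s))
```

Under these assumptions, a slightly deleterious mutation fixes at a rate closer to the neutral expectation in the smaller population, while a slightly advantageous one fixes more readily in the larger population, which is the asymmetry invoked in the text.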

Nucleotide polymorphism and interspecific divergence:
Sequences that evolve under a neutral model are expected to show a correlation between interspecific divergence and intraspecific polymorphism (KIMURA 1983). This prediction was not met in several studies of polymorphism and divergence in Drosophila, where polymorphism was lower (particularly in regions of low recombination) than expected from the interspecific divergence (BEGUN and AQUADRO 1991, 1992; BERRY et al. 1991; LANGLEY et al. 1993). For example, a survey of the cubitus interruptus Dominant (ciD) locus on the fourth chromosome did not uncover a single polymorphism in D. melanogaster and only one in D. simulans (BERRY et al. 1991). Yet, the level of sequence divergence between both species is ~5%. This lack of correlation was explained by genetic hitchhiking with selective sweeps or background selection that removed most or all polymorphism within regions linked to the affected one.

The results of this survey are consistent with the findings of the earlier studies. Levels of nucleotide polymorphism among the three loci are different and correlate with the recombination rate. Under a neutral model, divergence between species should correspond to the observed level of nucleotide polymorphism. This is not observed; rather, the synonymous (Ks) and nonsynonymous divergences (Ka) are very similar among the three loci between D. melanogaster and D. simulans (Table 7). This is particularly evident at locus anon1E9, where D. melanogaster exhibits much less polymorphism (e.g., silent sites: π = 0.0001) than D. simulans (π = 0.0032), yet the numbers of substitutions per site of D. melanogaster and D. simulans are similar when compared to D. yakuba (D. melanogaster vs. D. yakuba, Ks = 0.2834; D. simulans vs. D. yakuba, Ks = 0.2987).

Limitations of neutrality tests:
Although tests for neutral evolution suggest that most sequence evolution in these genes is neutral or nearly neutral, our results need to be interpreted with caution. The main goal of this study was to determine whether the large variation of amino acids we observed between species also exists within populations of Drosophila. This is achieved most easily by comparing individuals sampled from across the whole geographic distribution of a species. Therefore, we sequenced alleles from worldwide collections of D. melanogaster and D. simulans lines and only small numbers of alleles from the same local populations. Such a sample, however, does not allow an analysis of the geographic population structure of species or an identification of different patterns of selection in local populations. For example, population-specific sweeps for certain loci were detected in a study of microsatellite variation in separate populations across the world (SCHLOTTERER et al. 1997). Also, more detailed analyses of populations of D. melanogaster and D. simulans have revealed that both species indeed exhibit a considerable amount of population structure (BEGUN and AQUADRO 1993; HAMBLIN and VEUILLE 1999). Nucleotide polymorphism of surveyed loci can vary significantly between different populations and affect tests of neutrality if they assume a mutation-drift equilibrium. For example, at the Gld locus in D. melanogaster (HAMBLIN and AQUADRO 1997), the ratio of replacement to silent substitutions is significantly elevated (in an MK test) in the Chinese population sample, but not in two samples from Africa or a third sample from North America. In our sample, singletons may not necessarily be rare alleles (although they are treated as such in Tajima's test, therefore rendering D negative), but could segregate at high frequency in their local populations. A more comprehensive survey might reveal significant population differentiation at the three genes.
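
The logic of the MK test referred to above can be sketched as a 2 x 2 contingency test on counts of fixed and polymorphic replacement and silent substitutions. The example below evaluates the table with a two-sided Fisher's exact test, which is one common choice and not necessarily the statistic used in the studies cited here. The function names and all counts are hypothetical and only illustrate how a skewed ratio produces a significant result.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))

def mk_test(fixed_repl, fixed_silent, poly_repl, poly_silent):
    """P value for equal replacement:silent ratios among fixed and polymorphic sites."""
    return fisher_exact_two_sided(fixed_repl, fixed_silent, poly_repl, poly_silent)

# Hypothetical counts: equal ratios in both classes versus a strong excess
# of fixed replacement substitutions.
p_balanced = mk_test(10, 10, 10, 10)
p_skewed = mk_test(20, 2, 2, 20)
print(p_balanced, p_skewed)
```

Under neutrality the replacement:silent ratio is expected to be the same within and between species, so only the skewed table gives a small P value. With small counts, as in samples of a few alleles per population, the test has little power, which is one reason why differences among local populations can go undetected.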

An additional problem is that current tests of neutral evolution are useful for detecting strong positive selection, but do not reject the null hypothesis of neutral evolution if selection coefficients are small. Power analyses have shown that Tajima's D and Fu and Li's D fail to detect a selective sweep when it occurred in the distant past or very recently and that their power is low with small sample sizes (SIMONSEN et al. 1995). Similar results were obtained in an analysis of the HKA test (M. FORD and C. F. AQUADRO, unpublished results). This situation becomes even more complicated because weak and episodic selection models produce patterns of nucleotide polymorphism under realistic parameters that are indistinguishable from neutral evolution in a test like Tajima's D (GILLESPIE 1994). The existence of weak selection and the problems associated with detecting it are now widely acknowledged (AKASHI 1996; KREITMAN 1996; OHTA 1996; OHTA and GILLESPIE 1996; WAYNE and SIMONSEN 1998).

Although strong positive selection does not seem to drive the rapid evolution of the three loci, we do not entirely exclude (for reasons discussed above) the possibility that at least a certain proportion of the large number of replacement polymorphisms may be subject to weak positive or balancing selection. For example, in the complete absence of positive selection, one would expect a higher nonsynonymous rate in the D. melanogaster lineage, because of its smaller effective population size; not only completely neutral but also slightly deleterious substitutions should get fixed in this lineage. Indeed, at loci anon1A3 and anon1E9, more replacement substitutions occur in the D. melanogaster lineage. In the most rapidly evolving gene anon1G5, however, more replacement substitutions occur in the D. simulans lineage (Table 4). Although the relative-rate test and the other tests for neutral evolution do not reject neutral evolution, the existence of some positive selection cannot be entirely excluded.

Implications for genome-wide surveys of nucleotide polymorphism:
The three loci we surveyed for this study constitute a random sample of protein coding genes from the genome of Drosophila with regard to phenotypic effects. Although their biochemical functions are probably very different, their common characteristic is the fast evolution of their amino acid sequence as shown in our previous screen (SCHMID and TAUTZ 1997 Down) and in this study. Because of the random isolation of these clones, it is possible to estimate the fraction of genes in the Drosophila genome that are expected to show similar rates of evolution. In the original screen, about one-third of ~100 clones was scored as fast evolving by genomic cross-hybridization experiments. Sequence comparisons of 10 clones with their D. yakuba homologs lead to the estimate that ~20% of the Drosophila genes are fast evolving and exhibit a large number of replacement polymorphisms. Since the Drosophila genome probably has a similar number of genes as Caenorhabditis elegans (~19,000; C. ELEGANS SEQUENCING CONSORTIUM 1998), several thousand Drosophila genes can be expected to evolve with few evolutionary constraints.
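
The arithmetic behind "several thousand" genes can be made explicit with a short sketch. It uses only the two figures given above (~20% fast-evolving genes and ~19,000 genes); the function name is hypothetical, and the result is an order-of-magnitude estimate rather than a measured count.

```python
GENE_COUNT = 19_000    # approximate gene number quoted above for C. elegans
FAST_PERCENT = 20      # estimated percentage of fast-evolving genes

def expected_fast_evolving(gene_count, percent):
    """Expected number of fast-evolving genes, using integer arithmetic."""
    return gene_count * percent // 100

print(expected_fast_evolving(GENE_COUNT, FAST_PERCENT))
```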

We propose that a similar proportion of rapidly evolving genes can be expected in the genomes of other eukaryotes. All three genes of this study have no or only low sequence similarity to genes from other species and therefore are "orphans." Since orphans are also common in other eukaryotes whose genomes have been partially or completely sequenced (GOFFEAU et al. 1996; BEVAN et al. 1998; C. ELEGANS SEQUENCING CONSORTIUM 1998), it is probable that these fast evolving genes are ubiquitous components of eukaryotic genomes. It will be interesting to explore the long-term evolution of these rapidly evolving genes and their utility for phylogenetic analyses of closely related taxa. It will also be of critical importance to understand the relationship between the rapid sequence evolution and the structure and function of the proteins encoded by these genes. If there is only little conservation at the sequence level, it may not be possible to identify homologs in other phyla (if they exist there at all). For example, we were not able to detect significant sequence similarity between anon1A3 or anon1G5 and the genes from the C. elegans genome. In these cases, additional studies such as a genetic analysis or a determination of the protein structure will be necessary for identifying the function of these proteins. It will also be important to study whether these genes contribute to the phenotypic differences between species (TAUTZ and SCHMID 1997).


*  FOOTNOTES

1 Present address: Institut für Genetik, Universität zu Köln, Weyertal 121, 50931 Köln, Germany.


*  ACKNOWLEDGMENTS

This article is dedicated to the memory of our collaborator Loredana Nigro, who sadly died in October 1998. We thank M. Hamblin for advice about in situ hybridization and the members of the Aquadro lab for discussion. This work was supported by a postdoctoral fellowship of the Deutsche Forschungsgemeinschaft (DFG) to K.J.S., a European Molecular Biology Organization short-term fellowship to L.N., a National Institutes of Health grant to C.F.A., and various DFG grants to D.T.

Manuscript received May 13, 1999; Accepted for publication August 3, 1999.


*  LITERATURE CITED

AGUADÉ, M., 1998 Different forces drive the evolution of the Acp26Aa and Acp26Ab accessory gland genes in the Drosophila melanogaster species complex. Genetics 150:1079-1089.

AGUADÉ, M., N. MIYASHITA, and C. H. LANGLEY, 1992 Polymorphism and divergence in the Mst26A male accessory gland gene region in Drosophila. Genetics 132:755-770.

AKASHI, H., 1995 Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067-1076.

AKASHI, H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297-1307.

AKASHI, H., 1997 Codon bias in Drosophila: population genetics of mutation-selection drift. Gene 205:269-278.

AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238.

AKASHI, H. and S. W. SCHAEFFER, 1997 Natural selection and the frequency distribution of "silent" DNA polymorphism in Drosophila. Genetics 146:295-307.

AQUADRO, C. F., 1992 Why is the genome variable? Insights from Drosophila. Trends Genet. 8:355-362.

AQUADRO, C. F., K. M. LADO, and W. A. NOON, 1988&#