Genomic Effects of Nucleotide Substitutions in Drosophila simulans
Andrew D. Kern (2,a), Corbin D. Jones (2,a), and David J. Begun (a)
(a) Center for Population Biology, University of California, Davis, California 95616
Corresponding author: Andrew D. Kern, University of California, 1 Shields Ave., Davis, CA 95616. E-mail: adkern{at}ucdavis.edu
Communicating editor: S. W. SCHAEFFER
| ABSTRACT |
|---|
Selective fixation of beneficial mutations reduces levels of linked, neutral variation. The magnitude of this "hitchhiking effect" is determined by the strength of selection and the recombination rate between selected and neutral sites. Thus, depending on the values of these parameters and the frequency with which directional selection occurs, the genomic scale over which directional selection reduces levels of linked variation may vary widely. Here we present a permutation-based analysis of nucleotide polymorphisms and fixations in Drosophila simulans. We show evidence of pervasive small-scale hitchhiking effects in this lineage. Furthermore, our results reveal that different types of fixations are associated with different levels of linked variation.
FIXATION of beneficial mutations results in reductions of linked, neutral variation. The scale of this hitchhiking effect (![]()
We can study the importance of small-scale hitchhiking effects by determining if levels of polymorphism are reduced near nucleotide sites that have fixed in the recent past. If some fraction of such sites fixed under directional selection, we may observe less heterozygosity in regions flanking fixations compared to randomly selected regions of DNA. Moreover, a priori categorization of fixations allows us to investigate possible heterogeneity of the substitution process across mutant classes. For example, if a greater fraction of amino acid fixations result from selection (compared to silent or noncoding fixations), the level of polymorphism in regions near amino acid fixations may be reduced relative to that in regions near silent fixations. More generally, sites experiencing stronger or more recent directional selection should be associated with regions of lower heterozygosity.
Here we develop a permutation-based test for detecting small-scale reductions of heterozygosity. Using standard methods from meta-analysis (![]()
| MATERIALS AND METHODS |
|---|
Sequence data:
The names, physical locations, and summary statistics of variation for the loci used in our analysis are in Table 1. Most of the sequence data we used are from ![]()
All D. simulans-specific fixations were identified by parsimony using D. melanogaster and D. yakuba outgroup data. Parsimony should reliably identify ancestral states given the relatively low levels of sequence divergence between these species (![]()
Permutation analysis:
The goal of the permutation analysis was to use DNA polymorphism data to empirically generate null distributions of nucleotide heterozygosities for windows of defined size. The null hypothesis is that levels of DNA polymorphism in regions near sites that fixed in a gene along the D. simulans lineage ("test sites") are the same as levels of polymorphism observed in randomly selected regions within the same gene. The alternative hypothesis is that regions near test sites have reduced heterozygosity. Each test site was assigned to a category of fixation (i.e., unpreferred, preferred, replacement, silent). For each gene, we estimated π and θ
for a window of 200 bp centered on each test site of a given category and then calculated the mean heterozygosity across test sites. Test site windows could overlap if fixations were close to one another. We then permuted the locations of test sites within each gene by centering windows of the same size on "randomly" selected sites (permuting locations of test sites rather than polymorphic sites preserves any underlying physical heterogeneity of polymorphic sites in a gene). Given that third positions of codons tend to be more variable than first and second positions, randomly selected sites in a gene may not reflect the underlying distribution of test sites across codon positions in real data. Thus, not all sites are exchangeable (![]()
π and θ were then estimated for these randomly selected windows. This process was repeated several hundred times for each gene to generate a distribution of heterozygosities for windows of a given size. The total number of permutations depended on the number of exchangeable sites, which is approximately equal to the number of codons in the sample.
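The per-window estimates described above can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes the two heterozygosity estimators are nucleotide diversity (π; NEI 1987) and Watterson's θ (WATTERSON 1975), both of which appear in the reference list, and it assumes a gap-free alignment of equal-length sequences. The function names `pi_per_site`, `theta_w_per_site`, and `window` are hypothetical.

```python
from itertools import combinations

def pi_per_site(seqs):
    """Nucleotide diversity: mean pairwise differences, per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length

def theta_w_per_site(seqs):
    """Watterson's estimator: S / a_n per site, with a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n / length

def window(seqs, center, size=200):
    """Slice the alignment to a window of `size` bp centered on `center`.

    Returns None if the window would extend past either end of the
    sequence (an assumption about how edge windows are handled).
    """
    half = size // 2
    start, end = center - half, center + half
    if start < 0 or end > len(seqs[0]):
        return None
    return [s[start:end] for s in seqs]
```

For a test site at alignment position `c`, `pi_per_site(window(seqs, c))` and `theta_w_per_site(window(seqs, c))` would give the two window estimates, provided the window is not `None`.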
The observed mean heterozygosity across test sites for each gene was compared to the permutation-generated distribution. This yields the probability that the observed π and θ for windows around test sites were lower than expected under the null hypothesis (thus, this is a one-tailed test). A potential problem with our approach could be that the ends of surveyed DNA sequences were undersampled during the permutations, as windows exceeding the ends of the sequences were excluded from the analysis. However, this is not a major concern as heterozygosities at the 5' and 3' ends of the surveyed regions were not significantly different from heterozygosities at other regions (analysis not shown).
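The permutation procedure and its one-tailed P value can be sketched as below. This is an illustrative simplification under stated assumptions, not the authors' implementation: the gene is represented as a list of per-site heterozygosity values (`per_site`), the set of exchangeable sites is supplied by the caller (for example, sites at the same codon position), and the P value is the plain fraction of permuted means that are at most the observed mean. All function and parameter names are hypothetical.

```python
import random

def window_mean(per_site, center, size):
    """Mean per-site heterozygosity in a window centered on `center`.

    Returns None when the window would run past either end of the
    sequence; such windows are excluded, as described in the text.
    """
    half = size // 2
    start, end = center - half, center + half
    if start < 0 or end > len(per_site):
        return None
    return sum(per_site[start:end]) / (end - start)

def mean_over_sites(per_site, sites, size):
    """Average window heterozygosity across a set of window centers."""
    values = [window_mean(per_site, s, size) for s in sites]
    return sum(values) / len(values)

def permutation_p(per_site, test_sites, exchangeable, size=200, n_perm=500, seed=0):
    """One-tailed permutation test for reduced heterozygosity near test sites.

    Assumes at least one test site has a window that fits in the sequence.
    """
    rng = random.Random(seed)
    usable = [s for s in exchangeable if window_mean(per_site, s, size) is not None]
    tests = [s for s in test_sites if window_mean(per_site, s, size) is not None]
    observed = mean_over_sites(per_site, tests, size)
    hits = 0
    for _ in range(n_perm):
        # Re-center the same number of windows on randomly drawn exchangeable sites.
        drawn = rng.sample(usable, len(tests))
        if mean_over_sites(per_site, drawn, size) <= observed:
            hits += 1
    return observed, hits / n_perm
```

Because the locations of the windows are permuted rather than the polymorphic sites themselves, any physical clustering of polymorphism in `per_site` is carried into the null distribution. A P value of exactly zero is possible with this estimator when no permuted mean is as low as the observed mean.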
The choice of window size is a complex and important issue that may affect the picture of variation within a gene and the power of our analysis (![]()
The mean number of base pairs between polymorphic sites was approximately 75 bp, with a standard deviation of 25 bp. A 200-bp window is approximately twice the sum of the mean number of base pairs between polymorphic sites (75 bp) and one standard deviation (25 bp). This means that most randomly chosen windows of 200 bp will include at least one polymorphic site. We used the mean plus a standard deviation because using only the mean would have reduced our power in genes with less polymorphism, whereas increasing the window size was unlikely to introduce bias. Regardless, the results using a window size based on the mean were not significantly different from those using a window size based on the mean plus a standard deviation (data not shown). Software and source code implementing this method are available from the authors and at http://limulus.ucdavis.edu/~cojo/.
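The window-size arithmetic can be checked with a short sketch. It assumes the rule is twice the sum of the mean spacing between adjacent polymorphic sites and one standard deviation of that spacing; the choice of the population standard deviation (`pstdev`) and the function name `window_size` are assumptions made for illustration.

```python
from statistics import mean, pstdev

def window_size(poly_positions):
    """Window size as 2 * (mean spacing + one SD of spacing).

    `poly_positions` holds the coordinates of polymorphic sites in a gene.
    """
    positions = sorted(poly_positions)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return 2 * (mean(gaps) + pstdev(gaps))
```

With a mean spacing of 75 bp and a standard deviation of 25 bp, this gives 2 × (75 + 25) = 200 bp, matching the window size used in the text.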
Statistical analysis:
Failure to reject the null hypothesis at a locus may reflect a lack of polymorphism at that locus or other factors limiting the power of our analysis. We used Fisher's combined probability test to effectively increase our ability to detect a significant trend in the data. This test is suitable when separate statistical tests on different data sets test the same scientific hypothesis (![]()
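Fisher's combined probability test can be sketched as follows. The statistic is -2 times the sum of the natural logs of the k per-locus P values, compared to a chi-square distribution with 2k degrees of freedom (so, for example, 8 loci give d.f. = 16). The sketch uses the closed-form chi-square tail for even degrees of freedom so that it needs only the standard library; the function name `fisher_combined` is hypothetical, and every input P value is assumed to be strictly greater than zero.

```python
import math

def fisher_combined(p_values):
    """Return (chi-square statistic, degrees of freedom, combined P value)."""
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * k
    # For even df = 2k, the upper tail of the chi-square distribution is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x2, df, math.exp(-half) * total
```

A permutation P value of exactly zero cannot be log-transformed, so in practice it would need to be replaced by a small positive bound before combining; how to do so is a design choice not specified here.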
One of the limitations of Fisher's test is that it cannot distinguish between several tests with consistent weak effects vs. a mixture of tests with strong effects and tests with no effect. To address this limitation, we used a standard test statistic from meta-analysis, Glass's g:

(![]()
E is the mean level of polymorphism in windows surrounding test sites,
C is the mean level of polymorphism in windows surrounding all sites of the gene, and sc is the standard deviation of polymorphism in windows surrounding all sites of the gene. Thus, g is a unitless measure of the reduction (or inflation) of polymorphism surrounding test sites in a gene.
Using g has two main advantages. First, it allows us to estimate the relative magnitude of the reduction in heterozygosity near test sites from different categories of fixation. Second, g scores can be used to compare the relative reduction of polymorphism surrounding test sites across loci. If, on average, there is no difference in the levels of heterozygosity surrounding test sites and the levels of heterozygosity at all sites then the mean g across loci should be zero. A t-test can be used to test the null hypothesis that g is zero (i.e., there is no effect). This is an improvement over Fisher's test in that the null is less likely to be rejected if there are only a few genes of strong effect and many genes of no effect.
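The g score for a locus and the t-test on g across loci can be sketched as below. Following the definitions in the text, g is taken as the mean polymorphism in test-site windows minus the mean polymorphism in windows around all sites, divided by the standard deviation of the latter. The use of the sample standard deviation and the function names are assumptions; the sketch returns the t statistic and its degrees of freedom and leaves the P value lookup to a statistics table or library.

```python
import math
from statistics import mean, stdev

def glass_g(test_windows, all_windows):
    """Standardized difference between test-site windows and all windows.

    Negative values indicate reduced polymorphism near test sites.
    """
    return (mean(test_windows) - mean(all_windows)) / stdev(all_windows)

def one_sample_t(g_scores, mu=0.0):
    """t statistic and degrees of freedom for H0: mean g equals `mu`."""
    n = len(g_scores)
    standard_error = stdev(g_scores) / math.sqrt(n)
    return (mean(g_scores) - mu) / standard_error, n - 1
```

Because each locus contributes a single g score, one locus with a very strong effect shifts the mean g only modestly, which is the property the text contrasts with Fisher's test.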
| RESULTS |
|---|
Table 3 and Table 4 show the results of permutation analyses of polymorphism in 200-bp windows centered on replacement and silent fixations, respectively. Heterozygosity near replacement fixations (π = 0.0067, θ = 0.0073) is slightly, though not significantly, reduced compared to overall levels of heterozygosity (π = 0.0087, θ = 0.0094) in sequenced regions of individual genes (Fisher's combined probability, d.f. = 16, P = 0.15 and P = 0.08 for π and θ
, respectively). To avoid the confounding effects of pooling data across loci, we calculated Glass's g statistic for each locus [g is a dimensionless measure of the difference between heterozygosity at our test sites and that of the gene as a whole (![]()
; g = -0.379, t = -1.195, P = 0.25 for θ). Similarly, heterozygosity in 200-bp windows centered on silent fixations (π = 0.0077, θ = 0.0081) is not significantly reduced compared to overall levels of heterozygosity (π = 0.0084, θ = 0.0086) in sequenced regions of individual genes (Fisher's combined probability, d.f. = 50, P = 0.06 and P = 0.054 for π and θ, respectively). Interestingly, mean g across all loci for silent fixations is significant for π (g = -0.484, t = -2.726, P = 0.0118) and is marginally significant for θ (g = -0.385, t = -2.049, P = 0.0516). None of the individual genes shows significantly reduced heterozygosity near replacement or silent fixations when critical values are Bonferroni corrected for multiple tests.
Genomic patterns of codon usage in Drosophila suggest that silent mutations can be placed into at least two categories, preferred and unpreferred (![]()
π = 0.0079, θ = 0.0086). Furthermore, no individual genes showed a significant reduction of polymorphism in windows centered on unpreferred fixations. Windows centered on preferred fixations, however, show a highly significant reduction of polymorphism (Table 6; d.f. = 46, P = 0.0096 and P = 0.0041 for π and θ, respectively). Average polymorphism in regions near preferred codon fixations (π = 0.0065, θ = 0.0065) is approximately 25% lower than in the genes from which they were sampled (π = 0.0084, θ = 0.0086). This effect is confirmed by the mean g across all loci with preferred fixations (g = -0.612, t = -2.851, P = 0.0093 for π; g = -0.631, t = -3.159, P = 0.0045 for θ), which also indicates a significant reduction in levels of heterozygosity flanking preferred fixations. Fig 1 provides a visual comparison of the distributions of g across loci for preferred and unpreferred fixations. Although five loci (AP-50, Cen190, crq, mei-218, and Pgd) show large reductions of heterozygosity for windows centered on preferred fixations, none are individually significant when critical values are conservatively adjusted for multiple tests.
Fig 2 illustrates the effect of window size on our analysis. Fig 2A shows the results of an expanding window analysis for a strongly significant result, in this case a significant reduction around preferred sites. Clearly, the reduction in heterozygosity is statistically detectable for a variety of window sizes. Fig 2B and Fig 2C show results typical of nonsignificant genes. Both genes lack the long stretch of significant window sizes seen in Fig 2A. In Fig 2C, the P value for preferred fixations drops below 0.10 for windows of 140 bases, but only briefly, which suggests that this dip was due to chance.
One concern regarding our analysis is that we have assumed that four-codon families can be represented as having only two fitness classes, preferred and unpreferred. However, conserved patterns of rank order of codon usage within codon families across widely divergent Drosophila (![]()
π = 0.0055, θ = 0.0054; unpreferred, π = 0.0091, θ = 0.0099), although this difference is not statistically significant (perhaps as a consequence of reduced power in this restricted data set).
| DISCUSSION |
|---|
Our analysis of D. simulans polymorphism and divergence data revealed no evidence of hitchhiking effects associated with replacement fixations. One possible explanation is that the proteins in our sample evolve by genetic drift in D. simulans. Alternatively, replacement fixations may be composed of a large class of neutral mutants and a small class of strongly selected mutations. If this were the case we might not observe an overall association of replacement fixations with reduced heterozygosity. Finally, the power of our analyses of replacement fixations could be compromised if the physical scale of reduced variation near replacement fixations were greater than the size of the windows or gene regions used in our analyses. This explanation, however, seems unlikely because loci that have fixed at least one amino acid are roughly as polymorphic as those that have fixed only unpreferred mutations (Table 2).
We observed a reduction of linked polymorphism near preferred fixations, but not near unpreferred fixations (Table 7). One might expect this result under the simple premise that preferred and unpreferred mutations are slightly beneficial and slightly deleterious alleles, respectively. However, this expectation is probably incorrect because we are examining a special set of mutations, namely those that have fixed.
Previous analyses suggested that the D. simulans lineage has fixed significantly more unpreferred mutations (![]()
As is the case for silent sites, the lack of hitchhiking effects associated with amino acid fixations could be explained by invoking episodic evolution if these fixation events occurred in the more distant past compared to preferred fixations. This hypothesis may be testable if data from D. mauritiana and D. sechellia allow us to identify which mutations in the D. simulans lineage fixed in the more recent vs. more ancient past. It will also be interesting to investigate whether the spatial distribution of polymorphisms across D. melanogaster genes is similar to what we have observed in D. simulans.
Although we are not in a position to strongly favor a particular substitution model for our data, the results reported here certainly provide motivation for additional analyses of linked selection. For example, we have little understanding of how different population genetic parameters affect our permutation test or the population genetic scenarios under which we may be able to detect the local footprint of selection. Finally, our results underscore AKASHI's (1995, 1999) cautionary notes regarding the dangers of making population genetics inferences on the causes of protein evolution under the premise that silent mutations are neutral (e.g., ![]()
| FOOTNOTES |
|---|
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AF544231, AF544232, AF544233, AF544234, AF544235, AF544236, AF544237, AF544238, and AF544239.
2 These authors contributed equally to this work.
| ACKNOWLEDGMENTS |
|---|
We thank P. Awadalla, A. Betancourt, J. Gillespie, C. Langley, M. Lawniczak, M. Przeworski, S. Schaeffer, D. Weinreich, and two anonymous reviewers for comments on drafts of this manuscript. A.D.K. is a Howard Hughes Medical Institute predoctoral fellow. C.D.J. and D.J.B. were funded by the National Science Foundation.
Manuscript received April 22, 2002; Accepted for publication September 26, 2002.
| LITERATURE CITED |
|---|
AKASHI, H., 1995 Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila. Genetics 139:1067-1076.
AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238.
AQUADRO, C. F., D. J. BEGUN, and E. C. KINDAHL, 1994 Selection, recombination and DNA polymorphism in Drosophila, pp. 46-56 in Non-neutral Evolution: Theories and Molecular Data, edited by B. GOLDING. Chapman & Hall, New York.
BEGUN, D. J., 2001 The frequency distribution of nucleotide variation in Drosophila simulans. Mol. Biol. Evol. 18:1343-1352.
BEGUN, D. J., and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519-520.
BEGUN, D. J., and P. WHITLEY, 2000 Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97:5960-5965.
BERRY, A. J., J. W. AJIOKA, and M. KREITMAN, 1991 Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics 129:1111-1117.
BULMER, M. G., 1991 The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897-907.
COMERON, J. M., and M. KREITMAN, 2002 Population, evolutionary and genomic consequences of interference selection. Genetics 161:380-410.
FAY, J. C., G. J. WYCKOFF, and C.-I WU, 2002 Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-1026.
FISHER, R. A., 1935 The Design of Experiments. Hafner Press, New York.
FISHER, R. A., 1954 Statistical Methods for Research Workers, Ed. 12. Hafner Press, New York.
GLASS, G. V., 1976 Primary, secondary, and meta-analysis of research. Educ. Res. 5:3-8.
GOOD, P., 2000 Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer, New York.
HILL, W. G., and A. ROBERTSON, 1966 The effect of linkage on limits to artificial selection. Genet. Res. 8:269-294.
KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.
KIM, Y., and W. STEPHAN, 2002 Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765-777.
KREITMAN, M., and M. ANTEZANA, 2000 The population and evolutionary genetics of codon bias, pp. 82-101 in Evolutionary Genetics: From Molecules to Morphology, edited by R. S. SINGH and C. B. KRIMBAS. Cambridge University Press, Cambridge, UK.
LANGLEY, C. H., J. MACDONALD, N. MIYASHITA, and M. AGUADE, 1993 Lack of correlation between interspecific divergence and intraspecific polymorphism at the suppressor of forked region in Drosophila melanogaster and Drosophila simulans. Proc. Natl. Acad. Sci. USA 90:1800-1803.
MARUYAMA, T., 1974 The age of an allele in a finite population. Genet. Res. 23:137-143.
MAYNARD SMITH, J., and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.
MCVEAN, G. A., and B. CHARLESWORTH, 2000 The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155:929-944.
MCVEAN, G. A., and J. VIEIRA, 2001 Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics 157:245-257.
NACHMAN, M. W., 1997 Patterns of DNA variability at X-linked loci in Mus domesticus. Genetics 147:1303-1316.
NACHMAN, M. W., 2001 Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17:481-485.
NACHMAN, M. W., V. L. BAUER, S. L. CROWELL, and C. F. AQUADRO, 1998 DNA variability and recombination rates at X-linked loci in humans. Genetics 150:1133-1141.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
PRZEWORSKI, M., 2002 The signature of positive selection at randomly chosen loci. Genetics 160:1179-1189.
SHARP, P. M., and A. T. LLOYD, 1993 Codon usage, pp. 378-397 in An Atlas of Drosophila Genes: Sequences and Molecular Features, edited by G. MARONI. Oxford University Press, Oxford.
SILVERMAN, B. W., 1986 Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
SIMONSEN, K. L., G. A. CHURCHILL, and C. F. AQUADRO, 1995 Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413-429.
SMITH, N. G. C., and A. EYRE-WALKER, 2002 Adaptive protein evolution in Drosophila. Nature 415:1021-1024.
SOKAL, R. R., and F. J. ROHLF, 1995 Biometry, Ed. 3. W. H. Freeman, New York.
WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.
YANG, Z., S. KUMAR, and M. NEI, 1995 A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141:1641-1650.


), unpreferred (
), and preferred (
)] were used in this analysis. (A) Results from mei-218, which showed a significant reduction in variation around preferred fixations in our earlier analysis and is intermediate in average heterozygosity. (B) Results from Osbp, which was not significant for any class of fixation and is intermediate in average heterozygosity. (C) Results from ry, which was not significant for any class of fixation and had the highest average heterozygosity of the 26 genes studied.
