Genetics, Vol. 162, 1805-1810, December 2002, Copyright © 2002

The Evolution of Isochores: Evidence From SNP Frequency Distributions

Martin J. Lercher^a, Nick G. C. Smith^b, Adam Eyre-Walker^c, and Laurence D. Hurst^a
a Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom,
b Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
c Centre for the Study of Evolution and School of Biological Sciences, University of Sussex, Brighton BN1 9QG, United Kingdom

Corresponding author: Martin J. Lercher, University of Bath, Claverton Down, Bath, Somerset BA2 7AY, UK. E-mail: m.j.lercher@bath.ac.uk

Communicating editor: J. HEY


*  ABSTRACT

The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.


BASE composition varies along mammalian chromosomes over hundreds of kilobases (BERNARDI 2000; IHGSC 2001), with the regions of similar composition referred to as "isochores." Why this might be has been a focus of much debate (EYRE-WALKER and HURST 2001). The currently dominant model for isochore evolution suggests that they are the result of neutral evolution, with localized compositional differences being due to variation in the pattern of mutation (FILIPSKI 1987; SUEOKA 1988; WOLFE et al. 1989; FRANCINO and OCHMAN 1999).

Recent studies have tested the mutation bias hypothesis (that mammalian compositional variation is due to mutation bias variation alone) by considering single-nucleotide polymorphism (SNP) data (EYRE-WALKER 1999; SMITH and EYRE-WALKER 2001). If we consider biallelic sites, which are polymorphic for an A or a T nucleotide and a G or a C nucleotide, then we can define two types of polymorphisms, those generated by AT -> GC mutations and those generated by GC -> AT mutations (termed AT -> GC and GC -> AT polymorphisms, respectively). Under the mutation bias hypothesis it can be shown that the numbers of the two types of polymorphisms are expected to be equal irrespective of what the sequence composition is, so long as the sequences are at compositional equilibrium (EYRE-WALKER 1997). EYRE-WALKER (1999) and SMITH and EYRE-WALKER (2001) have shown that there is an excess of GC -> AT polymorphisms at synonymous sites and introns in GC-rich mammalian protein-coding genes; this suggests that either selection or biased gene conversion affects synonymous base composition. Given that the base composition of synonymous sites and introns is highly correlated with the base composition of the isochore in which the gene resides (CLAY et al. 1996), these results suggest that high-GC isochores are not generated by mutation bias.

However, these studies assume compositional equilibrium of the examined sequences, while base composition can change profoundly and systematically over evolutionary time (LOBRY 1997; POWELL and MORIYAMA 1997; BERNARDI 2000; RODRIGUEZ-TRELLES et al. 2000). In particular, recent comparisons of primate sequences suggest that human DNA is not at compositional equilibrium (DURET et al. 2002). In this situation, it is not clear how to interpret the results of the above tests. Furthermore, these previous tests were restricted to protein-coding genes, and so a direct test of whether the base composition of noncoding DNA is solely a consequence of mutational processes is still lacking. Below, we examine SNP frequency distributions in noncoding DNA. In contrast to the previous studies that counted alleles, this test does not require compositional equilibrium (SAWYER et al. 1987) and is compromised only by very recent changes in the mutation pattern.

A comparison of the frequency distributions was first used by SAWYER et al. (1987) to detect differences in the fitness effects of different classes of polymorphisms. Their work was partly based on that of EWENS (1972), who first described the sampling theory of selectively neutral alleles. Related approaches were employed in several other studies that analyzed protein-coding genes to detect selection on amino acid composition (SAWYER and HARTL 1992) or codon usage bias (AKASHI and SCHAEFFER 1997; AKASHI 1999; KLIMAN 1999).


*  MATERIALS AND METHODS

We identified 2769 biallelic SNPs that were segregating an A or a T nucleotide and a G or a C nucleotide and that had been assayed for allele frequencies, in build 95 of the refSNP database (http://www.ncbi.nlm.nih.gov/SNP). From refSNP, we also obtained the surrounding sequence of each SNP and its location in relation to known or predicted genes (from contig annotation or BLASTing against mRNA in GenBank). To avoid sites that are under selection, we excluded all SNPs within 2 kb of a gene, resulting in 1504 SNPs. This also reduces the effect of selection on linked sites (CHARLESWORTH 1994; FAY and WU 2000). The remaining polymorphisms were classified according to the frequency of the A/T allele (divided into 10 polymorphism frequency classes of width 10%) and according to the GC content of the surrounding sequence (on average 400 bp in length) into three classes of equal SNP number (Table 1). The mean local GC contents of the three compositional classes were 34, 44, and 57%.
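The classification step can be illustrated with a short sketch. This is not the authors' pipeline: the function names and the `(at_freq, flank)` record layout are assumptions introduced here only to show how SNPs might be binned by A/T allele frequency and by local GC content.

```python
def gc_content(seq):
    """Fraction of G or C among the A, C, G, T bases of a flanking sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at)

def frequency_class(at_freq, n_classes=10):
    """Index (0..n_classes-1) of the equal-width class holding the A/T allele frequency."""
    return min(int(at_freq * n_classes), n_classes - 1)

def split_into_gc_classes(snps, n_groups=3):
    """Rank (at_freq, flank) records by local GC content and cut them into equal-sized groups."""
    ranked = sorted(snps, key=lambda rec: gc_content(rec[1]))
    size = len(ranked) // n_groups
    groups = [ranked[i * size:(i + 1) * size] for i in range(n_groups - 1)]
    groups.append(ranked[(n_groups - 1) * size:])  # last group takes any remainder
    return groups

def frequency_table(group, n_classes=10):
    """Count the SNPs of one GC class in each A/T frequency class."""
    counts = [0] * n_classes
    for at_freq, _flank in group:
        counts[frequency_class(at_freq, n_classes)] += 1
    return counts
```

Applying `frequency_table` to each group returned by `split_into_gc_classes` yields one row of counts per GC class, which is the layout a table of polymorphism frequency distributions would take.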


 
Table 1. Polymorphism frequency distributions of human nongenic SNPs in different GC classes, in total numbers of SNPs observed

We test the null hypothesis of neutrality under purely mutational forces by constructing a population genetics model for the expected distribution of A/T allele frequencies. To this end we have developed a model that is expected to be less sensitive to compositional nonequilibrium. We take as our starting point the classic formula (WRIGHT 1937; SAWYER and HARTL 1992) for the stationary distribution of polymorphism frequencies, D(x), with irreversible mutations (i.e., the infinite-sites assumption) and directional selection,

\[ D(x, S) = c \, \frac{1 - e^{-S(1 - x)}}{(1 - e^{-S}) \, x \, (1 - x)} \tag{1} \]

where x is the frequency of the new allele; S = 4N_e s, where N_e is the effective population size and s is the fixation bias favoring the new allele (i.e., the selective coefficient if selection acts on GC content and a neutral bias in the case of biased gene conversion); and c is a scaling factor. Below, we test two alternative models: when considering mutation bias as the driving force of local GC composition, we set S = 0; when considering fixation bias (from selection or biased gene conversion), we allow S to vary.
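As a minimal sketch of the stationary distribution, the function below assumes the standard Wright/Sawyer-Hartl form D(x, S) = c (1 - e^(-S(1 - x))) / ((1 - e^(-S)) x (1 - x)), which reduces to c/x when S = 0. The function name and the default c = 1 are choices made here for illustration.

```python
import math

def polymorphism_density(x, S, c=1.0):
    """Stationary density of a new allele at frequency x (0 < x < 1).

    S is the scaled fixation bias (4 * N_e * s) favoring the new allele;
    c is an arbitrary scaling factor.
    """
    if abs(S) < 1e-9:
        # Neutral limit: the ratio of the two exponential terms tends to (1 - x).
        return c / x
    # -expm1(-a) equals 1 - exp(-a) and stays accurate for small a.
    numerator = -math.expm1(-S * (1.0 - x))
    denominator = -math.expm1(-S) * x * (1.0 - x)
    return c * numerator / denominator
```

Under this form, a positive S raises the density of new alleles at high frequency relative to the neutral case, and a negative S lowers it.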

The SNPs in our data set were sampled in a two-stage process: before being assayed for allele frequency in a large number of chromosomes, the SNPs were originally discovered in samples of much fewer chromosomes. For any given SNP with known allele frequency x, the probability of detecting both alleles in an original sample of n chromosomes is

\[ P(x, n) = 1 - x^{n} - (1 - x)^{n}. \]

The second stage of allele frequency determination can be modeled by a binomial sampling formula (see SAWYER and HARTL 1992). Although we know the sample sizes used to determine the SNPs in our data set, there is considerable variation in these values, and so it is preferable to model the sampling process with parameters fitted from the data (for a justification, see the end of this section). In practice, there is little need to model the second sampling stage since the final polymorphism distribution is much more sensitive to the sample size of the first stage than to the sample size of the second stage. Therefore, we can assume that the number of chromosomes used to estimate allele frequencies is effectively infinite (the qualitative results presented here were unchanged if both stages of the sampling process were modeled and treated as parameters to be fitted from the data; see also below) and model just the primary stage of the sampling process. Thus we can estimate the distribution of polymorphism frequency by integrating the product of D(x, S) and P(x, n).
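The effect of the discovery stage can be sketched in a few lines. The sketch assumes that a SNP is detected whenever the discovery sample of n chromosomes is not monomorphic, so that P(x, n) = 1 - x^n - (1 - x)^n; the function name is hypothetical.

```python
def detection_probability(x, n):
    """Probability that a sample of n chromosomes contains both alleles
    of a SNP whose true allele frequency is x."""
    return 1.0 - x ** n - (1.0 - x) ** n

# Rare alleles are easily missed in small discovery panels.
for n in (2, 20):
    print(n, round(detection_probability(0.1, n), 3))
```

Because the exponent is an ordinary float power, n can also take non-integer values when it is treated as a fitted parameter.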

Although the formula for D(x, S) assumes that the distribution of polymorphism frequencies is stationary, this is a considerably less stringent assumption than the assumption of compositional stationarity used in previous tests of compositional neutrality. The test using the numbers of AT -> GC and GC -> AT polymorphisms is sensitive to changes in the mutation bias over a timescale of roughly 1/u, where u is the mutation rate (EYRE-WALKER 1997). Taking u in humans as 10^-8 mutations per base per generation (DRAKE et al. 1998), we have a timescale of roughly 100 million generations over which the mutation pattern is required to have remained constant. Although the polymorphism frequency distribution is not completely unaffected by changes in the mutation pattern, the mutation pattern is required to have remained constant for a much shorter time than when the numbers of polymorphisms are considered. At equilibrium, the average age of neutral polymorphisms is ~4N_e generations, with a standard deviation of the same order of magnitude (KIMURA and OHTA 1973; KIMURA 1983). In humans it has been estimated that N_e is ~10,000 (A. EYRE-WALKER, P. D. KEIGHTLEY and N. G. C. SMITH, unpublished data) and the generation time is ~25 years. Thus, the polymorphism frequency test is compromised only by changes in the mutation bias over a timescale of 1–2 million years.
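The two timescales can be checked with simple arithmetic. The constants below are the values quoted in the text; the variable names are chosen here for illustration.

```python
MUTATION_RATE = 1e-8       # u, mutations per base per generation
EFFECTIVE_SIZE = 10_000    # N_e
GENERATION_YEARS = 25      # years per generation

# Timescale for the test based on polymorphism counts: roughly 1/u generations.
count_test_generations = 1.0 / MUTATION_RATE

# Timescale for the frequency distribution test: roughly 4 * N_e generations.
frequency_test_generations = 4 * EFFECTIVE_SIZE
frequency_test_years = frequency_test_generations * GENERATION_YEARS

# How much shorter the required period of constant mutation pattern is.
ratio = count_test_generations / frequency_test_generations

print(frequency_test_generations, frequency_test_years, round(ratio))
```

With these inputs the mean age term alone gives 40,000 generations, or one million years; the quoted range of 1–2 million years allows for the standard deviation being of the same order.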

To achieve this increased robustness of our model against compositional nonequilibrium, we have to allow the proportions of GC -> AT and AT -> GC mutations to vary (under a neutral mutation model the proportions are equal when the composition is at equilibrium, even if the GC -> AT and AT -> GC mutation rates are not equal). We can then calculate the expected distribution of A/T polymorphism frequencies by combining the formulas for the GC -> AT and AT -> GC mutations. Thus the expected number of polymorphisms for which the A/T allele frequency is between x1 and x2, E(x1, x2), is given by

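\[ E(x_1, x_2) = T \, \frac{\int_{x_1}^{x_2} \left[ P_{GC \to AT} \, D(x, -S) + P_{AT \to GC} \, D(1 - x, S) \right] P(x, n) \, dx}{\int_{0}^{1} \left[ P_{GC \to AT} \, D(x, -S) + P_{AT \to GC} \, D(1 - x, S) \right] P(x, n) \, dx} \tag{2} \]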

where T is the total number of polymorphisms, PGC->AT and PAT->GC are the proportions of GC -> AT and AT -> GC mutations, respectively (PGC->AT + PAT->GC = 1), and S is the fixation bias of GC over AT alleles. Note that the frequency distributions D(x, -S) and D(x, S) are not normalized in Equation 2, but rather the entire distribution of expected polymorphisms is normalized so that it sums to T.
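The expected counts per frequency class can be sketched numerically. The sketch rests on assumptions stated here rather than taken verbatim from the text: D(x, S) has the Wright/Sawyer-Hartl form with c = 1, detection follows P(x, n) = 1 - x^n - (1 - x)^n, GC -> AT polymorphisms contribute D(x, -S), AT -> GC polymorphisms contribute D(1 - x, S) (the new G/C allele has frequency 1 - x), and the integrals are approximated with a midpoint rule. All function names are hypothetical.

```python
import math

def density(x, S):
    """Unnormalized stationary density of a new allele at frequency x."""
    if abs(S) < 1e-9:
        return 1.0 / x
    return math.expm1(-S * (1.0 - x)) / (math.expm1(-S) * x * (1.0 - x))

def detect(x, n):
    """Probability that n discovery chromosomes carry both alleles."""
    return 1.0 - x ** n - (1.0 - x) ** n

def expected_counts(T, p_gc_to_at, S, n, bins=10, steps=2000):
    """Expected number of SNPs in each A/T frequency class, summing to T."""
    p_at_to_gc = 1.0 - p_gc_to_at

    def integrand(x):
        mix = p_gc_to_at * density(x, -S) + p_at_to_gc * density(1.0 - x, S)
        return mix * detect(x, n)

    width = 1.0 / bins
    h = width / steps
    masses = []
    for j in range(bins):
        lo = j * width
        # Midpoint rule; the integrand stays finite as x approaches 0 or 1.
        masses.append(h * sum(integrand(lo + (i + 0.5) * h) for i in range(steps)))
    total = sum(masses)
    return [T * m / total for m in masses]

print([round(e, 1) for e in expected_counts(500, 0.7, 2.0, 10)])
```

One property of this form is worth noting: when the two proportions are equal (0.5 each), the mixture is proportional to 1/(x(1 - x)) for any S, so the expected distribution is symmetric and S only leaves a signature when the proportions differ.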

We estimated S, n, and PGC->AT by finding those parameters that minimize the value of the G test statistic, summed over the 10 polymorphism frequency classes of 10%,

\[ G = 2 \sum_{j=1}^{10} O_j \ln\!\left( \frac{O_j}{E_j} \right) \tag{3} \]

where Oj and Ej are the observed and expected numbers of polymorphisms in the jth frequency class. The significance of the minimum G value is tested by approximation to the chi-square distribution, with the numbers of degrees of freedom given by nine minus the number of parameters estimated from the data.
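The goodness-of-fit calculation can be sketched as follows, assuming the usual definition G = 2 * sum(O_j * ln(O_j / E_j)) and a chi-square reference distribution. The upper-tail probability is computed with a standard recurrence for integer degrees of freedom so that no external library is needed; the function names are chosen here for illustration.

```python
import math

def g_statistic(observed, expected):
    """G statistic; classes with zero observations contribute nothing."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        # Recurrence: Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def fit_p_value(observed, expected, n_params):
    """P value with df = (classes - 1) - fitted parameters, i.e. 9 - n_params for 10 classes."""
    df = len(observed) - 1 - n_params
    return chi2_sf(g_statistic(observed, expected), df)
```

For the two-parameter neutral model with 10 frequency classes this gives 7 degrees of freedom, and for the three-parameter fixation bias model it gives 6.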

The explicit population genetics model described so far makes a number of evolutionary assumptions: constant mutation rates, constant population size, no population subdivision, unbiased sampling, and no linkage between polymorphic sites and sites under positive or negative selection. We can drastically simplify our assumptions by generalizing our null hypothesis, assuming only that the frequency distributions of AT -> GC and GC -> AT polymorphisms are identical. Under a null hypothesis of purely mutational bias, this will be true if the mutational pattern was constant over the last 4N_e generations. This general model can be implemented by minimizing Equation 3 over the space of all possible distributions D(x, 0). We approximate this procedure by replacing D(x, 0) with three "arbitrary" monotonically decreasing functions, e^(-zx), (e^(-z1 x) + e^(-z2 x)), and x^(-z), where z, z1, and z2 are positive numbers that describe the shape of the distribution.

We first performed a preliminary study to justify our assumption of infinite sample size n2 for the second sampling stage (the allele frequency assay, see above). We analyzed the polymorphism frequency data summed across all GC classes (see Table 1) in an explicit purely mutational model (S = 0). This model (SAWYER and HARTL 1992) uses both primary and assay sample sizes, n and n2, whereas the simpler model in Equation 2 requires only n. We found that the fit to data improves with increasing n2: G decreases from 21.5 at n2 = 100 to 16.4 at n2 = 500 (G = 15.4 for infinite n2). Accordingly, the parameter estimates approached the infinite sample size estimates (data not shown). Thus we are justified in using the simplified model in Equation 2 in the analyses below.

We then performed simulations to test if a mixture of SNPs that were detected in samples of varying sizes n can be adequately described by a single "mean" sample size n0. As an extreme case, we analyzed two samples of 251 SNPs each, with n = 2 and n = 20, respectively. We calculated the predicted allele frequency distributions from Equation 2, with S = 0 and PGC->AT = PAT->GC = 0.5, with added Poisson noise. We then used Equation 2 with a single parameter n0 to fit the summed frequency data of these two samples. When minimizing G (Equation 3) by varying PAT->GC and n0, we could reject our simplified "mean n0" model at P = 0.05 for only 5.1% of simulated data sets. This suggests that fitting a mean value for n in Equation 2 is a valid approximation.


*  RESULTS

Although our population genetics model for fitting polymorphism frequency distributions can incorporate selection or biased gene conversion (i.e., a directional fixation bias; see MATERIALS AND METHODS), our first aim is to see whether a neutral mutation model (S = 0) is capable of explaining the SNP data. We have two parameters to fit for the neutral mutation model: the primary sample size, n, and the proportion of GC -> AT mutations, PGC->AT (see MATERIALS AND METHODS). The neutral mutation model cannot be rejected for SNPs from regions with low and intermediate GC content (P = 0.09 and P = 0.78, respectively; see Fig 1 and Table 2). It is interesting to note that there is substantial evidence of compositional nonequilibrium, with an AT -> GC bias in low GC regions (PGC->AT = 0.46) and a GC -> AT bias in intermediate GC regions (PGC->AT = 0.58).



Figure 1. Explicit neutral mutation model fit between observed and expected numbers of polymorphisms in regions of low and intermediate GC. Low A/T allele frequencies correspond mostly to ancestral G/C sites, high frequencies to ancestral A/T sites.


 
Table 2. Summary of goodness-of-fit tests to the observed frequency distribution data using the two explicit models

In contrast, the neutral mutation model provides a poor fit to SNPs from regions of the genome with high GC content (P = 0.014, see Table 2). In Fig 2 we compare the observed data and the data expected on the basis of the neutral mutation model. The failure of the neutral mutation model seems to be due to the combination of the large excess of low-frequency (0–0.1) A/T alleles and the flat shape of the polymorphism frequency distribution at high A/T frequencies (0.7–1).



Figure 2. Comparison between observed and expected numbers of polymorphisms in regions of high GC, with predictions by the general mutation bias model and explicit mutation and fixation bias models. Low A/T allele frequencies correspond mostly to ancestral G/C sites, high frequencies to ancestral A/T sites.

This effect is not likely to be due to CpG hypermutability. We tested this by applying our neutral mutation model to the high GC SNP data after removal of all polymorphisms that may have been generated by CpG mutations (CpG -> TpG or CpG -> CpA). Upon removal of such mutations we find that the rejection of the explicit neutral mutation model is only marginally significant (P = 0.054, see Table 3). However, we do not consider this reduction in statistical support as strong evidence for an effect of CpG mutations, as it is most likely due to the decrease in sample size. To check this, we simulated 1000 data sets in which the high GC data were randomly discarded to generate a data set the same size as the high GC minus CpG data set. In 413 cases the resultant G value was lower than that obtained using the high GC minus CpG data set. These simulations indicate that it is worth addressing alternative explanations of the high GC SNP data.


 
Table 3. Summary of goodness-of-fit tests to the observed frequency distribution data using the general model

The differences between observed and expected high GC SNP frequency distributions appear consistent with the action of a directional fixation bias, i.e., natural selection or biased gene conversion (NAGYLAKI 1983), acting in favor of high GC. In this model, GC -> AT mutations are preferentially removed and AT -> GC polymorphisms are preferentially retained (in our notation such fixation bias is equivalent to S > 0). We fitted the high GC SNP data using a three-parameter model, varying n, PAT->GC, and S. Upon the addition of the fixation bias parameter S we find a significant improvement in fit (ΔG = 6.8 and P = 0.009; see Fig 2). The estimated value of S implies selection in favor of G/C alleles, which results in a mutation bias in favor of A/T because the GC content is elevated above its mutational equilibrium.
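The reported significance of the improvement in fit can be checked with a short calculation. The sketch assumes that the drop in G on adding one parameter (S) is referred to a chi-square distribution with one degree of freedom, the usual treatment for nested models; the function name is chosen here.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

delta_g = 6.8  # improvement in G reported in the text
print(round(chi2_sf_1df(delta_g), 3))
```

Under that assumption the value rounds to the P = 0.009 quoted for the comparison of the two models.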

Our explicit model incorporates a number of assumptions that may in fact not be met by the data. Population size changes, population subdivision, biased sampling, or selection on linked sites (CHARLESWORTH 1994; FAY and WU 2000) could all be responsible for discrepancies between observed and expected frequencies. However, we can circumvent these problems by generalizing our null hypothesis of neutral mutation bias. We drop all model assumptions and replace them by a simple requirement that AT -> GC and GC -> AT polymorphisms have the same frequency distributions. This will be satisfied under very general conditions, assuming only the stationarity of the allele frequency distributions and the absence of fixation bias. To compare the frequency distributions, we modeled D(f) using three different monotonically decreasing functions (see MATERIALS AND METHODS). Analyses using all three distributions gave very similar answers and so we present only results using D(f) = e^(-zf), where z > 0 is the fitted parameter. Table 3 presents the results for each GC content category. In agreement with our results from the explicit population genetics model, this general model of mutational bias provides an adequate fit to the data for SNPs from regions with low and intermediate GC content (P = 0.08 and P = 0.52, respectively). In contrast, the model gives a poor fit to SNPs from regions of the genome with high GC content (P = 0.008; see Fig 2). Again, the discrepancies between observed and expected distributions are consistent with the action of a fixation bias acting in favor of high GC.


*  DISCUSSION

In the above analysis, we have used two models: an explicit population genetics model and a general model avoiding any assumptions about the shape of the frequency distribution. The explicit model allows a detailed analysis of mutational and selective parameters and facilitates a direct comparison of models of mutation and/or fixation bias. However, it is built on a number of population genetical assumptions, which may not be met by our data. The general model, which directly compares the observed frequency distributions, does not give a quantitative description of the processes shaping composition. However, it is built on less stringent assumptions about the population history and is thus more robust.

Both explicit and general models of neutral mutation bias cannot be rejected for low and intermediate GC SNPs, but can be rejected on the basis of their failure to fit the high GC SNP frequency distribution (P = 0.014 and 0.008, respectively). Under the explicit model, we tested fixation bias (i.e., selection or biased gene conversion) as an explanation for the failure of the neutral mutation model. This explicit model suggests that there is selection/biased gene conversion in favor of GC, but that this is counteracted by mutation bias.

A possibility for the failure of our neutral mutation model is recent compositional nonequilibrium. Both the explicit population genetics model and the general model allow for ancient but not recent changes in base composition: the polymorphism frequency distribution needs to be at equilibrium, whereas the base composition at fixed sites need not be at equilibrium (see MATERIALS AND METHODS). Although the activity of transposable elements appears to have been low in the recent past of the human genome (IHGSC 2001), other processes of compositional change may be ongoing. As an example of changes in genomic compositional structure, consider the GC homogenization process, which has affected murid genomes (GALTIER and MOUCHIROUD 1998). If such a process started very recently in the human genome (in the last 1,000,000 years, see MATERIALS AND METHODS), then regions of high GC content will be decreasing in GC, and so there will be an excess of low-frequency GC -> AT mutations.

The data analyzed are polymorphism counts in frequency classes. Regardless of the details of population history, these numbers are approximately Poisson distributed as long as there is no linkage between polymorphisms (SAWYER et al. 1987). To see this, imagine an infinite sequence, with a fraction f(x, y) of sites showing a polymorphism with minor allele frequency between x and y. If we repeatedly draw samples of length L, the numbers of polymorphisms in these samples will be binomially distributed with expectation L × f(x, y); with small f(x, y) this approximates a Poisson distribution. Thus, our statistical approach is justified regardless of the details of population history, so long as recombination between the polymorphisms in our data is free. How sure can we be that this is indeed the case? Unfortunately, available genetic maps for humans lack the precision necessary to answer this question reliably. However, when we analyze the polymorphisms with known position (N = 823) in each GC class, we find that they are on average 3.6 Mb (median 58 kb) away from their nearest neighbor. With 1 Mb corresponding approximately to 1.3 cM (YU et al. 2001), the majority of neighboring sites will experience several recombinations between them over the course of 4N_e = 40,000 generations. This suggests that the sites comprising our data set are indeed unlinked to a reasonable approximation.
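The linkage argument can be made concrete with a rough calculation. The sketch assumes that, over the short distances involved, map distance in Morgans approximates the expected number of crossovers per generation, and it uses the 1.3 cM per Mb and 40,000 generations quoted in the text; the function and constant names are chosen here.

```python
CM_PER_MB = 1.3        # genetic map distance per megabase
GENERATIONS = 40_000   # 4 * N_e with N_e = 10,000

def expected_crossovers(distance_bp, generations=GENERATIONS, cm_per_mb=CM_PER_MB):
    """Expected number of crossovers between two sites over the given number of generations."""
    morgans = distance_bp / 1e6 * cm_per_mb / 100.0
    return morgans * generations

print(round(expected_crossovers(58_000), 1))      # median nearest-neighbor distance
print(round(expected_crossovers(3_600_000), 1))   # mean nearest-neighbor distance
```

On these assumptions, sites separated by the median distance of 58 kb are expected to experience about 30 crossovers over 40,000 generations, in line with the statement that most neighboring sites experience several recombinations.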

Our results have important consequences for previous studies of compositional neutrality (EYRE-WALKER 1997, 1999; SMITH and EYRE-WALKER 2001), which concluded that mutation bias was incapable of explaining high GC synonymous codon usage. The limited comparative data then available for synonymous sites suggested that the assumption of equilibrium was reasonable, but our results here with nongenic data indicate that changes in mutation bias need to be incorporated in tests of genomic compositional neutrality. Until such checks are performed, it is possible that previous inferences of synonymous and intronic compositional selection (or biased gene conversion) were caused by nonequilibrium composition. For example, if we assume equilibrium composition (PGC->AT = PAT->GC = 0.5), then the neutral mutation fit to the high GC SNP data would be strongly rejected even after removal of CpG mutations (P = 0.002; data not shown).


*  ACKNOWLEDGMENTS

We thank Laurent Duret for interesting discussions and two anonymous referees for helpful suggestions. We acknowledge support from The Wellcome Trust (M.J.L.), the Biotechnology and Biological Sciences Research Council (L.D.H. and A.E.-W.), and The Royal Society (A.E.-W.).

Manuscript received September 4, 2001; Accepted for publication September 3, 2002.


*  LITERATURE CITED

AKASHI, H., 1999  Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238.

AKASHI, H. and S. W. SCHAEFFER, 1997  Natural selection and the frequency distributions of "silent" DNA polymorphism in Drosophila. Genetics 146:295-307.

BERNARDI, G., 2000  Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17.

CHARLESWORTH, B., 1994  The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63:213-227.

CLAY, O., S. CACCIO, S. ZOUBAK, D. MOUCHIROUD, and G. BERNARDI, 1996  Human coding and noncoding DNA: compositional correlations. Mol. Phylogenet. Evol. 5:2-12.

DRAKE, J. W., B. CHARLESWORTH, D. CHARLESWORTH, and J. F. CROW, 1998  Rates of spontaneous mutation. Genetics 148:1667-1686.

DURET, L., M. SEMON, G. PIGANEAU, D. MOUCHIROUD, and N. GALTIER, 2002  Vanishing GC-rich isochores in mammalian genomes. Genetics 162:1837-1847.

EWENS, W. J., 1972  The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.

EYRE-WALKER, A., 1997  Differentiating between selection and mutation bias. Genetics 147:1983-1987.

EYRE-WALKER, A., 1999  Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675-683.

EYRE-WALKER, A. and L. D. HURST, 2001  The evolution of isochores. Nat. Rev. Genet. 2:549-555.

FAY, J. C. and C.-I WU, 2000  Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.

FILIPSKI, J., 1987  Correlation between molecular clock ticking, codon usage, fidelity of DNA-repair, chromosome-banding and chromatin compactness in germline cells. FEBS Lett. 217:184-186.

FRANCINO, M. P. and H. OCHMAN, 1999  Isochores result from mutation not selection. Nature 400:30-31.

GALTIER, N. and D. MOUCHIROUD, 1998  Isochore evolution in mammals: a human-like ancestral structure. Genetics 150:1577-1584.

INTERNATIONAL HUMAN GENOME SEQUENCING CONSORTIUM (IHGSC), 2001  Initial sequencing and analysis of the human genome. Nature 409:860-921.

KIMURA, M., 1983  The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

KIMURA, M. and T. OHTA, 1973  The age of a neutral mutant persisting in a finite population. Genetics 75:199-212.

KLIMAN, R. M., 1999  Recent selection on synonymous codon usage in Drosophila. J. Mol. Evol. 49:343-351.

LOBRY, J. R., 1997  Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species. Gene 205:309-316.

NAGYLAKI, T., 1983  Evolution of a finite population under gene conversion. Proc. Natl. Acad. Sci. USA 80:6278-6281.

POWELL, J. R. and E. N. MORIYAMA, 1997  Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790.

RODRIGUEZ-TRELLES, F., R. TARRIO, and F. J. AYALA, 2000  Evidence for a high ancestral GC content in Drosophila. Mol. Biol. Evol. 17:1710-1717.

SAWYER, S. A. and D. L. HARTL, 1992  Population genetics of polymorphism and divergence. Genetics 132:1161-1176.

SAWYER, S. A., D. E. DYKHUIZEN, and D. L. HARTL, 1987  Confidence interval for the number of selectively neutral amino acid polymorphisms. Proc. Natl. Acad. Sci. USA 84:6225-6228.

SMITH, N. G. C. and A. EYRE-WALKER, 2001  Synonymous codon bias is not caused by mutation bias in G+C-rich genes in humans. Mol. Biol. Evol. 18:982-986.

SUEOKA, N., 1988  Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653-2657.

WOLFE, K. H., P. M. SHARP, and W.-H. LI, 1989  Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

WRIGHT, S., 1937  The distribution of gene frequencies in populations. Proc. Natl. Acad. Sci. USA 23:307-320.

YU, A., C. ZHAO, Y. FAN, W. JANG, A. J. MUNGALL et al., 2001  Comparison of human genetic and sequence-based physical maps. Nature 409:951-953.



