Originally published as Genetics Published Articles Ahead of Print on May 27, 2008.
Genetics, Vol. 179, 907-916, June 2008, Copyright © 2008
doi:10.1534/genetics.108.087122
The Distribution of Beneficial and Fixed Mutation Fitness Effects Close to an Optimum
Guillaume Martin1 and Thomas Lenormand
Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, 34295 Montpellier, France
1 Corresponding author: Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, 1919 Rte. de Mende, 34295 Montpellier, France.
E-mail: guillaume.martin{at}cefe.cnrs.fr
ABSTRACT
The distribution of the selection coefficients of beneficial mutations is pivotal to the study of the adaptive process, both at the organismal level (theories of adaptation) and at the gene level (molecular evolution). A now famous result of extreme value theory states that this distribution is an exponential, at least when considering a well-adapted wild type. However, this prediction could be inaccurate under selection for an optimum (because fitness effect distributions have a finite right tail in this case). In this article, we derive the distribution of beneficial mutation effects under a general model of stabilizing selection, with arbitrary selective and mutational covariance between a finite set of traits. We assume a well-adapted wild type, thus taking advantage of the robustness of tail behaviors, as in extreme value theory. We show that, under these general conditions, both beneficial mutation effects and fixed effects (mutations escaping drift loss) are beta distributed. In both cases, the parameters have explicit biological meaning and are empirically measurable; their variation through time can also be predicted. We retrieve the classic exponential distribution as a subcase of the beta when there are a moderate to large number of weakly correlated traits under selection. In this case too, we provide an explicit biological interpretation of the parameters of the distribution. We show by simulations that these conclusions are fairly robust to a lower adaptation of the wild type and discuss the relevance of our findings in the context of adaptation theories and experimental evolution.
UNDERSTANDING the distribution of fitness effects of beneficial mutation [hereafter fb(sb)] is necessary to predict the rate and genetic basis of adaptation (ORR 1998). It is also important to calibrate models of molecular evolution where positive selection is involved (EYRE-WALKER 2006) or to study processes involving the segregation of several beneficial mutations, like clonal interference (GERRISH and LENSKI 1998). So far, this distribution has been studied along two directions (for a historical review see ORR 2005a). The first is based on FISHER's (1930) geometric model of adaptation, while the second uses GILLESPIE's (1984) mutational landscape model. These two models differ in their basic assumptions, and each has its own limitations (discussed in ORR 2005b). Fisher's model (FM) considers stabilizing selection around an optimum in an n-dimensional phenotypic space and focuses on the fitness effect of random phenotypic changes. The mutational landscape model (MLM) directly focuses on the effect of single-nucleotide substitutions on fitness. The strength of the FM is to predict the full distribution of fitness effects of mutations [hereafter f(s)], including both deleterious and beneficial mutations and their respective proportions, which depends on the level of adaptation of the wild type (distance to the optimum). The MLM is less general in that it considers only beneficial mutation, but its strength is to avoid explicit assumptions on the phenotype-to-fitness map inherent to the FM. This is made possible when beneficial mutations can be considered drawn from the extreme right tail of f(s). In this case indeed, extreme value theory can be used to predict the (unique) limiting distribution of extreme draws, i.e., fb(sb). Importantly, this is robust to a wide range of f(s) (GILLESPIE 1984; ORR 2002). A now famous and remarkably simple prediction of this theory is that fb(sb) should be exponential (ORR 2003). 
This finding has, since then, been widely used (GERRISH and LENSKI 1998; WILKE 2004; PARK and KRUG 2007). Note that this is not the same as the distribution of effects fixed over a bout of adaptation, which is also predicted to be exponential (ORR 1998). In this article, we determine fb(sb) under a general model of stabilizing selection, based on an extension of the FM (MARTIN and LENORMAND 2006b), but we study our model in the same biological conditions as assumed in the MLM, thus allowing the use of extreme value theory in this context. We show that under this general model, the exponential approximation for fb(sb) can be substantially inaccurate unless there are a large number of weakly correlated traits under selection. We provide an alternative, in terms of a beta distribution, that includes the exponential as a limiting case. Before presenting these results, we first discuss the limit of the MLM and classic FM approaches to predict fb(sb).
The MLM approach has provided simple, robust, and testable conclusions, but, as any model, it has limits. First, the MLM is by construction valid only when the wild type is well adapted to its environment, so that mutations with fitness above the wild type's are indeed drawn from the rightmost tail of f(s) (GILLESPIE 1984); this may not be the case in a new environment. Second, the MLM makes explicit assumptions on the genetic basis of adaptation (single-nucleotide substitutions with equal probability of occurrence). Although it is often seen as more realistic, these genetic assumptions may also limit the scope of the theory. Indeed, even in a simple situation involving only point mutations, corrections were required to compare empirical data to MLM theory because of differences in transition vs. transversion rates (ROKYTA et al. 2005). Third, even when considering well-adapted wild types, the MLM is not robust to any f(s). Only the so-called Gumbel types of distributions, characterized by an exponential-like tail (ORR 2002; BEISEL et al. 2007), are consistent with the historical models by Gillespie and Orr. There are in fact three possible "domains of attraction" determining extreme values behavior: the Gumbel type discussed above, the Fréchet type, for heavily tailed distributions, and the Weibull type, for distributions that have a rightmost endpoint. There is no obvious reason to prefer one type over the others (discussed in BEISEL et al. 2007). Fourth, the MLM does not provide any prediction on how the distribution of mutant fitnesses should change over several generations, as the population adapts to its environment. In the classic formulation, the effect of adaptation is only to shift the wild type to higher and higher fitness ranks in an otherwise constant distribution of mutant fitnesses. By contrast, the FM is free of these limits but at the cost of explicit assumptions on the genotype-to-phenotype-to-fitness map, which could be unrealistic.
Overall, from an empirical point of view, it has proved difficult to validate predictions on beneficial mutation effects so far, because beneficial mutations are often rare. The exponential distribution appears to give a reasonable but still imperfect fit to empirical distributions of beneficial effects (ROKYTA et al. 2005; KASSEN and BATAILLON 2006). From a more statistical point of view, no alternative theoretical fb(sb) had been proposed until recently, when a statistical framework was proposed to test alternative predictions, all stemming from extreme value theory (BEISEL et al. 2007). Empirical studies so far could neither clearly accept nor reject the predictions of the MLM, so that theoretical arguments may help settle the issue. In particular, it would be important that FM and MLM approaches yield consistent results. We now turn to this question.
In a recent article, ORR (2006) sought to bridge the gap between the FM and the MLM and showed that they share strong similarities regarding fb(sb). Indeed, under the classic Fisher model, the distribution of fitness effects of single mutations is close to Gaussian, which pertains to the domain of attraction of the Gumbel-type extreme value distribution assumed in the MLM. This property ensures that the derivations of the MLM are at least approximately valid under the biological assumptions of the FM, which reinforces the view that fb(sb) should indeed be exponential. However, this conclusion should be taken with caution. First, under the FM, f(s) is necessarily bounded on the right: there is no better mutation than the one bringing the phenotype at the optimum. As we have seen, distributions that are bounded on the right pertain to the Weibull domain of attraction, not to the Gumbel, and this will be true of any model of selection for an optimum. ORR (2006) mentioned this problem and showed that the exponential approximation could nevertheless be accurate under the FM, provided that there are a large number of equivalent and independent traits affected by pleiotropic mutation and selection. However, (i) selective and mutational independence of the traits is often considered biologically unrealistic (ORR 2005b), and (ii) the number of traits affected by mutation may be limited, at least when considering single genes, as is usual in molecular evolution. Overall, the assumption of a large number of independent traits leads to an approximately Gaussian f(s) (simply by the central limit theorem) whereas reviews of empirical f(s) show that they are better approximated by a skewed gamma distribution when only deleterious mutations are observed (MARTIN and LENORMAND 2006b; EYRE-WALKER and KEIGHTLEY 2007). 
Because it is this Gaussian f(s) that leads to an approximately exponential fb(sb) in the FM (ORR 2006), the exponential result may not be robust if traits are fewer and correlated. Indeed, recent models have shown that nonequivalence between traits in the FM could substantially affect the predictions of the FM (WAXMAN and WELCH 2005; MARTIN and LENORMAND 2006b).
In this article, we apply extreme value theory to a general model of selection for an arbitrary optimum, where mutations have pleiotropic effects on an arbitrary number of potentially inequivalent and correlated traits. In this case, fb(sb) belongs to the Weibull, rather than the Gumbel domain of attraction. Using a recent approximation for the tail behavior of quadratic forms in Gaussian vectors (JASCHKE et al. 2004), we show that beneficial effects are approximately beta distributed provided the wild type is relatively well adapted (as assumed in the MLM). This result is based on tail approximations (similar to the extreme value theory approach) and is robust to any continuous phenotype-to-fitness function close to an optimum (contrary to the classic FM). Our conclusions are checked using exact simulations, which show that the tail approximation yields surprisingly accurate results, even away from the tail. We discuss our results and compare them with results from the MLM.
MODEL
f(s) under arbitrary stabilizing selection:
We consider an extension of the FM that has been detailed previously (MARTIN and LENORMAND 2006b). The fitness W(z) of a phenotype z (of arbitrary dimension n, the number of phenotypic traits under selection) is a multivariate Gaussian function of z,
$$W(z) = \exp\left(-\tfrac{1}{2}\, z^{T} S\, z\right),$$
where superscript T denotes transposition, and S is an arbitrary positive semidefinite matrix of selective interactions between phenotypic traits. This assumption is justified when close to the optimum, as many continuous fitness functions around a single optimum can be approximated by a Gaussian function close to that optimum (LANDE 1979). This does not preclude the existence of other optima, but it does assume that they are too remote from the mutant "cloud" around the wild type to influence f(s). We consider an initial genotype (or wild type), with phenotype zo [and fitness W(zo) = Wo]. The distribution of mutant phenotypes (dz) around zo is assumed to be multivariate Gaussian with mean 0 and arbitrary (positive semidefinite) covariance matrix M. Again, this assumption is not as restrictive as it seems; what is required in fact is that there exists a set of trait definitions for which their mutational effect distribution is Gaussian (MARTIN and LENORMAND 2006b), which requires only that the distribution of mutant effects on the original traits be continuous, unimodal, and approximately centered on the wild type.
This model is a quite general description of stabilizing selection, when not too far from the optimum, and with universal pleiotropy of mutations (mutations affect all traits simultaneously). Contrary to the classic (isotropic) Fisher model, it allows for differences and correlations between traits for both mutation and selection. Following assumptions of the MLM, we assume that the wild type is well adapted (close to the optimum), so that beneficial mutation effects are small. Consequently, the selection coefficient of a beneficial mutant relative to the wild type (s = W/Wo – 1) is approximately equal to the log-relative fitness: s ≈ log(1 + s) = log(W/Wo). Therefore, s is approximately a quadratic function of mutational phenotypic effects dz (MARTIN and LENORMAND 2006b), i.e., a quadratic form in Gaussian vectors (MATHAI and PROVOST 1992).
Beyond their mathematical convenience or robustness, these assumptions are supported by data: the model seems to correctly account for the variation of empirical distributions of mutational fitness effects across species (MARTIN and LENORMAND 2006b), across environments (MARTIN and LENORMAND 2006a), and among mutations (fitness epistasis; MARTIN et al. 2007). Under these assumptions, the probability density function (pdf) of s, f(s), is entirely determined by the n eigenvalues of the matrix product S.M and the position zo of the initial phenotype relative to the optimum (APPENDIX A; Equation A2 in MARTIN and LENORMAND 2006b). There is no analytic expression for f(s) in the general case, but it can be approximated by a displaced gamma distribution (JASCHKE et al. 2004; MARTIN and LENORMAND 2006b), as illustrated in APPENDIX A.
Importantly, the distribution of s is bounded on its rightmost endpoint by so = log(Wmax/Wo) that is the selection coefficient of the individual with optimal phenotype [with fitness W(0) = Wmax] relative to the wild type [with fitness W(zo) = Wo]. As we saw above, this kind of right-bounded distribution is inherent to any model of selection for an optimum.
Distribution of beneficial fitness effects close to the optimum:
A tail approximation for the distribution of quadratic forms in Gaussian vectors (such as s) has been derived recently (JASCHKE et al. 2004). When the wild type is well adapted (as so → 0), a simple approximation can be deduced from this tail approximation, for the distribution fb(sb) of beneficial mutations (0 < sb < so), yielding
$$f_b(s_b) \approx \frac{m}{2 s_o}\left(1 - \frac{s_b}{s_o}\right)^{m/2 - 1}, \qquad (1)$$
where m is the rank of the matrix product S.M; that is, the scaled effect sb/so is beta distributed with shape parameters 1 and m/2. Because m ≤ min(rank(M), rank(S)), correlations among a large number of traits will result in a relatively small m, unless these correlations are weak.
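As a minimal sketch of the beta approximation, the Python snippet below evaluates the density of beneficial effects and draws from it by inverting its cumulative distribution function. It assumes the form of Equation 1 implied by the parameters quoted in this article (sb/so beta distributed with shape parameters 1 and m/2); the function names and any numerical values passed to them are illustrative and are not taken from the article.

```python
import math
import random


def beta_pdf_beneficial(sb, so, m):
    """Approximate density of beneficial effects, assumed Beta(1, m/2) for sb/so."""
    if not 0.0 < sb < so:
        return 0.0
    return (m / (2.0 * so)) * (1.0 - sb / so) ** (m / 2.0 - 1.0)


def sample_beneficial(so, m, rng):
    """Draw one beneficial effect by inverting the CDF 1 - (1 - sb/so)**(m/2)."""
    u = rng.random()  # uniform on [0, 1)
    return so * (1.0 - (1.0 - u) ** (2.0 / m))
```

With this parameterization the sample mean of many draws should approach 2so/(2 + m), which gives a quick sanity check on any chosen pair (so, m).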
Domain of attraction:
The beta distribution given in Equation 1 is an example of the so-called generalized Pareto distribution (GPD) that encompasses the three possible domains of attraction of extreme value theory (PICKANDS 1975). To use the classic formulation (used, e.g., in BEISEL et al. 2007), this beta distribution in Equation 1 is a GPD with location µ = 0, scale τ = 2/m (for effects scaled by so, i.e., for sb/so), and shape κ = –2/m. As long as m is not infinitely large, κ is negative so that f(s) falls into the Weibull domain of attraction. However, with an infinitely large m, κ would be zero, and f(s) would fall into the Gumbel domain of attraction. This explains why the classic FM, which assumes a very large number of independent traits, is consistent with the MLM (ORR 2006) and close to a Gumbel-type distribution. Consistent with this, the cumulative distribution function of the beta distribution in Equation 1 converges to that of an exponential distribution when m is large:
$$F_b(s_b) \approx 1 - \left(1 - \frac{s_b}{s_o}\right)^{m/2} \;\underset{m \to \infty}{\longrightarrow}\; 1 - \exp\left(-\frac{m\, s_b}{2 s_o}\right). \qquad (2)$$
Overall, under our general model of selection for an optimum, and with a well-adapted wild type, we obtain an approximately exponential distribution of beneficial mutations only when there are sufficiently many independent and weakly correlated traits under selection. When a limited number of traits are affected by the mutational target under consideration (e.g., a single gene), or when there are many but strongly correlated traits, there is no reason to expect an exponential fb(sb). In these cases, one should use the full model (beta approximation) given in Equation 1. Because the exponential approximation is a limiting case of the beta approximation, the two behaviors may be easily compared statistically. We now turn to the study of the fitness effect distribution of those beneficial mutations that reach fixation.
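The convergence of the beta to the exponential can be illustrated numerically. The sketch below compares the two cumulative distribution functions on a grid over [0, so], assuming the Beta(1, m/2) form for sb/so and an exponential with rate m/(2so); the function names and grid size are illustrative choices.

```python
import math


def beta_cdf(sb, so, m):
    """CDF of the beta approximation: 1 - (1 - sb/so)**(m/2) on [0, so]."""
    x = min(max(sb / so, 0.0), 1.0)
    return 1.0 - (1.0 - x) ** (m / 2.0)


def exponential_cdf(sb, so, m):
    """CDF of the limiting exponential with rate m/(2*so)."""
    return 1.0 - math.exp(-m * max(sb, 0.0) / (2.0 * so))


def max_cdf_gap(so, m, grid=1000):
    """Largest absolute difference between the two CDFs on a grid over [0, so]."""
    return max(
        abs(beta_cdf(i * so / grid, so, m) - exponential_cdf(i * so / grid, so, m))
        for i in range(grid + 1)
    )
```

Calling `max_cdf_gap` for increasing m (for example 4, 9, and 100 at a fixed so) should return a shrinking gap, in line with the limiting behavior described above.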
The distribution of fitness effects among beneficial mutations escaping drift loss:
Not all beneficial mutations will fix in a population: even in an infinitely large population, most are lost soon after their appearance, due to the stochasticity of offspring number. From fb(sb), it is possible to derive the distribution of selection coefficients among those beneficial mutations that escape drift loss when they are still rare (i.e., those that reach fixation in a sexual population). From Equation 1, assuming a well-adapted wild type and a population not too small, we can use a weak selection approximation (Pfix(s) ≈ 2s) for the fixation probability of beneficial mutations (HALDANE 1927; WHITLOCK 2000) and find another beta approximation for the distribution of fixed mutation effects sf:
$$f_f(s_f) \approx \frac{m(m+2)}{4 s_o^{2}}\, s_f \left(1 - \frac{s_f}{s_o}\right)^{m/2 - 1}. \qquad (3)$$
The corresponding mean effects among fixed and among beneficial mutations are
$$E(s_f) \approx \frac{4 s_o}{4 + m} \qquad (4)$$
$$E(s_b) \approx \frac{2 s_o}{2 + m}. \qquad (5)$$
As expected, these two values increase with the wild-type maladaptation so and decrease with increased dimensionality m, which is a part of the "cost of complexity" defined by ORR (2000). The other part of this cost, not dealt with here, is the reduction of the fraction of beneficial mutations as m increases. When m is large, E(sf) and E(sb) are ≈ 4so/m (fixed effects) and 2so/m (beneficial effects), respectively, which converges to the results derived from the exponential approximation (see APPENDIX C).
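The step from the distribution of beneficial effects to that of fixed effects can be spelled out as a short worked derivation. It is a sketch under the weak selection approximation stated above (fixation probability ≈ 2s) and assumes that sb/so follows a Beta(1, m/2) distribution; the notation f_f for the density of fixed effects and B(·,·) for the beta function is introduced here for illustration.

```latex
% Weight each beneficial effect by its fixation probability 2s, then normalize
f_f(s) = \frac{2 s\, f_b(s)}{\int_0^{s_o} 2 u\, f_b(u)\, du}
       = \frac{s \left(1 - s/s_o\right)^{m/2 - 1}}{s_o^{2}\, B(2, m/2)},
\qquad
B(2, m/2) = \frac{4}{m(m+2)} .

% Hence s_f/s_o ~ Beta(2, m/2), while s_b/s_o ~ Beta(1, m/2), with means
E(s_f) = s_o\, \frac{2}{2 + m/2} = \frac{4 s_o}{4 + m},
\qquad
E(s_b) = s_o\, \frac{1}{1 + m/2} = \frac{2 s_o}{2 + m}.
```

For large m both means reduce to the limits quoted in the text, 4so/m and 2so/m.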
Simulations:
To check the accuracy of the above results, we simulated distributions of beneficial mutations in the Fisher model with correlated traits as in MARTIN and LENORMAND (2006b). We drew the mutational and selective covariance matrices (M and S) from n x n Wishart distributions Wp(n, I) (where I is the n x n identity matrix) so that the rank of S.M is m = min(p, n). M and S were then scaled to obtain an average deleterious effect of mutations s̄ = tr(S.M)/2 = 0.05, where tr(.) denotes matrix trace. Then the phenotype of the wild-type zo was drawn as a Gaussian vector and scaled so that log(Wmax/Wo) = log(1/Wo) = so, for a given fitness distance to the optimum. Finally, for each single mutant, we drew a mutation effect vector dz from a multivariate Gaussian distribution N(0, M) and computed s as s = log(W(zo + dz)/W(zo)). The resulting distributions of s are illustrated in supplemental Figure 1.
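The simulation procedure described above can be sketched in plain Python. This is not the authors' code: it assumes the fitness function W(z) = exp(−½ zᵀSz) with Wmax = 1, builds each Wishart deviate as XᵀX from a p x n matrix X of standard normals, and rescales only M to set the mean effect. The function names, the seed, and the parameter values in the usage note are illustrative assumptions.

```python
import math
import random


def normal_matrix(p, n, rng):
    """p x n matrix X of standard normals; X^T X is then a Wishart deviate W_p(n, I)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]


def quad_form(X, v):
    """v^T (X^T X) v, computed as |X v|^2 without forming the n x n matrix."""
    return sum(sum(x * vi for x, vi in zip(row, v)) ** 2 for row in X)


def simulate_s(n, p, so, mean_effect, n_mutants, seed=1):
    """Simulate log-fitness effects s of single mutants around a wild type at distance so."""
    rng = random.Random(seed)
    XS = normal_matrix(p, n, rng)  # S = XS^T XS
    XM = normal_matrix(p, n, rng)  # M = XM^T XM
    # tr(S.M) equals the squared Frobenius norm of XS XM^T
    tr_SM = sum(sum(a * b for a, b in zip(rs, rm)) ** 2 for rs in XS for rm in XM)
    # Rescale M so that tr(S.M)/2 equals the requested mean deleterious effect
    k = math.sqrt(2.0 * mean_effect / tr_SM)
    XM = [[k * x for x in row] for row in XM]
    # Wild-type phenotype: Gaussian direction, scaled so that zo^T S zo / 2 = so
    zo = [rng.gauss(0.0, 1.0) for _ in range(n)]
    c = math.sqrt(2.0 * so / quad_form(XS, zo))
    zo = [c * z for z in zo]
    base = quad_form(XS, zo)
    effects = []
    for _ in range(n_mutants):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # dz = XM^T g has covariance XM^T XM = M
        dz = [sum(g[i] * XM[i][j] for i in range(p)) for j in range(n)]
        z = [a + b for a, b in zip(zo, dz)]
        effects.append(-0.5 * (quad_form(XS, z) - base))  # s = log(W(zo+dz)/W(zo))
    return effects
```

For instance, `simulate_s(n=4, p=13, so=0.01, mean_effect=0.05, n_mutants=20000)` returns effects that never exceed so and whose average is close to −0.05; the beneficial mutants are those with s > 0.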
We chose p to get a distribution of fitness effects (among all mutations) with a large skewness, as is typically observed in empirical studies (e.g., SANJUÁN et al. 2004). More precisely, a given distribution of deleterious s (when so = 0) corresponds to a given effective number of traits ne (depending on the magnitude of correlations in M and S) that determines the shape of f(s) (MARTIN and LENORMAND 2006b). With M and S drawn as Wishart deviates S, M ~ Wp(n, I), ne ≈ n/(1 + 2n/p) (for details, see Appendix 2 in MARTIN and LENORMAND 2006b), so we chose p as the integer part of 2n · ne/(n – ne) to obtain a given ne and n. Figures 1–3 show the same two examples corresponding to alternative levels of pleiotropy. In the low pleiotropy case, n = 4 and ne = 2.5, so that m = n = 4 (S and M are positive definite). In the high pleiotropy case (still keeping a small ne), n = 40 and ne = 4, so that m = p = 9 (S and M are positive semidefinite). Therefore, these two cases correspond to a lower (resp. higher) number of traits jointly affected by mutation and selection and to a lower (resp. higher) dimensionality m. They are denoted low (resp. high) pleiotropy in the figures.
From a set of 400,000 simulated single mutants we kept only those with fitness higher than the wild type as beneficial mutants. To compute the fitness effect distribution among mutants that escape drift loss, we computed the exact fixation probability Pfix of each of the nb beneficial mutants (with selection coefficient s) by numerically solving Pfix = 1 – exp(–(1 + s)Pfix), according to HALDANE (1927). Then we sampled nb times the beneficial mutants according to their individual fixation probability Pfix.
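The fixation step can be sketched as follows. The snippet assumes the standard Haldane (1927) branching-process equation with Poisson-distributed offspring number, Pfix = 1 − exp(−(1 + s)Pfix), solves it by bisection, and then resamples beneficial mutants in proportion to their fixation probability. The function names and the choice of bisection are illustrative; the article only states that the equation was solved numerically.

```python
import math
import random


def haldane_pfix(s, tol=1e-12):
    """Positive root of P = 1 - exp(-(1 + s) P), found by bisection (0 if s <= 0)."""
    if s <= 0.0:
        return 0.0

    def g(p):
        # g is positive below the root and negative above it when s > 0
        return -math.expm1(-(1.0 + s) * p) - p

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_fixed(beneficial_s, rng):
    """Resample len(beneficial_s) mutants with weights equal to their fixation probability."""
    weights = [haldane_pfix(s) for s in beneficial_s]
    return rng.choices(beneficial_s, weights=weights, k=len(beneficial_s))
```

For small s the solution stays close to the weak selection value 2s, which is the approximation used to obtain the analytical distribution of fixed effects.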
RESULTS
Accuracy of the beta approximation for beneficial and fixed effects sb:
The beta approximation given in Equation 1 gives a very good fit to the simulations when so is small relative to the average effect of all mutations (so << s̄), so that beneficial mutations are rare and at the rightmost tail of f(s). Figure 1, a and b, illustrates fb(sb) in this situation and supplemental Figure 1 shows the corresponding f(s). However, the prediction is still fairly accurate for larger values of so (and larger proportions of beneficial mutations), as illustrated in supplemental Figure 2, for so = s̄. As expected (Equation 2), when compared to the beta, even the best-fitting exponential distribution provides a less accurate description of fb(sb) when m is small (Figure 1a), but a similarly satisfying one, even with a moderately large m (m = 9, Figure 1b, the two approximations are almost indistinguishable). Nevertheless, even in the latter case, a closer investigation (Figure 2) shows that the exponential approximation inaccurately captures the distribution on its rightmost part (for the largest beneficial effects), while the beta approximation (Equation 1) is accurate on the whole range of beneficial effects. This has little influence on the accuracy of the exponential model for beneficial effects (with large m), but is more problematic when deriving the distribution of fixed effects. Indeed, as for beneficial effects, the beta approximation for the distribution of fixed effects (Equation 3, Figure 1, c and d) gives a good fit to individual simulations (compare solid lines and open circles in Figure 1, c and d). As a comparison, the distribution of sf under the (best-fitting) exponential approximation for sb (Equation C4, APPENDIX C) gives a poorer fit to simulations (dashed lines, Figure 1, c and d), worse when the dimensionality is low (Figure 1c, low pleiotropy case). The exponential approximation gives less accurate results for fixed effects (Figure 1, c and d) than for beneficial effects (Figure 1, a and b) because the exponential inaccurately describes the distribution of large beneficial s (Figure 2) that are overrepresented among fixed mutations.
Robustness of the results away from the optimum:
As in the MLM, the results presented here are all weak selection approximations in that they assume that the wild type is close to the optimum (small so), so that beneficial mutations are all of small effect. We checked the robustness of these results when so gets larger: supplemental Figure 2 shows that while the beta approximation for beneficial effects (sb, Equation 1) is less (but still reasonably) accurate when so = s̄, the beta approximation for fixed effects (sf in Equation 3) remains fairly accurate in this case. More surprisingly, the average value of fixed effects [E(sf), Figure 3] remains close to the tail approximation result (Equation 4), even for fairly large values of so (up to 10 times the average effect of all mutations: so = 0.5 = 10s̄). As expected again, the prediction from the exponential approximation is less accurate, at least with a small m for beneficial effects, and in both cases for fixed effects. It becomes less and less accurate as so increases (constant difference on log scale, Figure 3). The same pattern holds for E(sb) (not shown). Overall, while the shape of the distributions, away from the tail, is less accurately described by Equations 1 and 3 (see supplemental Figure 2), their means are still fairly robustly predicted for large so.
DISCUSSION
Comparing the beta and exponential distributions for beneficial effects:
Under these fairly general conditions, and although the exact system depends on many parameters, the distribution of beneficial effects, fb(sb), is accurately approximated by a simple beta distribution (Equation 1). This distribution is an example of the generalized Pareto distribution of the Weibull type, not of the Gumbel type as is classically assumed in most of the literature on adaptation theory. However, in the limit of a large number of weakly correlated traits (increased dimensionality m), our beta approximation converges to the exponential, consistent with previous results (ORR 2006). Overall, the distribution of beneficial effects (sb, Equation 1) will substantially differ from an exponential when only a limited number of traits are considered (Figure 1a, supplemental Figure 2). Nevertheless, the convergence to an exponential is quick as dimensionality increases (Figure 1b, m = 9). However, this convergence to the exponential is slower for the distribution of fixed beneficial effects sf (Figure 1, c and d). This occurs because, even for large m, the exponential distribution tends to particularly overestimate the proportion of largely beneficial effects [the really extreme right tail of f(s)] compared to the beta (Figure 2), and these effects are strongly overrepresented among fixed mutations. Therefore, when assuming selection for an optimum, one should use the beta approximations proposed here (Equations 1–4), whenever possible, as they provide better accuracy in the general case, while retaining the simplicity that made the exponential approximation theoretically attractive. However, when considering beneficial-effect distributions (and with caution for fixed effects), and provided the dimensionality m is even moderately large (an issue we discuss more fully below), the exponential can provide an even simpler and similarly accurate approximation.
An important aspect of our result is that, beyond classic results from extreme value theory, our model provides a biological interpretation of the two parameters that emerge in the tail approximation: so is the selection coefficient of the optimal genotype relative to the wild type; m measures the level of pleiotropy (i.e., the number of not fully dependent dimensions of the phenotypic space under selection). Note that when n > m, there are n – m traits that are completely determined by linear combinations of the first m traits: m is therefore akin to a "degree of freedom," the number of traits that suffice to fully describe the fitness landscape. Although included in the model for the sake of generality, the extra n – m traits are somehow meaningless in terms of pleiotropy. These two parameters (m and so) are, a priori, biologically independent. For instance, because so measures adaptation of the wild type, we may predict how this parameter changes through time as individuals adapt, while m could be expected to remain constant, at least over short evolutionary timescales. Beyond characterizing the distribution of beneficial effects, this model therefore provides a means to predict how this distribution changes through time, as in the FM, while preserving the robustness provided by the use of tail behaviors (extreme value theory) as in the MLM. This is true, including when the exponential approximation is valid (large m, Figure 1b): our model and simulations then show that beneficial effects sb are exponentially distributed, with rate m/2so.
Robustness of the results:
There are few other assumptions in the model, apart from the existence of an optimum, to which the wild type is well adapted. Indeed, the model is approximately valid near a local maximum of any continuous fitness function and for any selective or mutational covariance between traits (by construction). Another relevant issue is modularity: our model assumes total pleiotropy of all mutations on all the traits considered. If distinct mutational targets (e.g., genes) affect at least partly distinct sets of traits, then there is modularity in the effect of mutation (WELCH and WAXMAN 2003). We suspect that the total f(s) would then be a sum of each module's f(s), weighted by the probability of mutation in each module. The effect of such modularity would have to be studied in more detail, but even then there would still be a maximum value of s, so that f(s) would likely pertain to the Weibull domain of attraction, leading to a distribution of beneficial effects of the type of Equation 1. However, the biological interpretation of the two parameters is probably less straightforward in this case.
A surprising property of our model is the robustness of the predictions when away from the tail. The approximate distribution of both beneficial and fixed effects is still reasonably accurate when they are of the same order as the mean effect of all mutations (so = s̄, supplemental Figure 2), for which the proportion of beneficial mutations is >10%. Even more surprisingly, the mean of these distributions (both beneficial and fixed effects) is accurately predicted, even farther away from the tail (up to so = 10s̄, Figure 3). Overall, the average log-fitness gain per adaptive fixation [E(sf), Equation 4] is ≈ 4so/(4 + m), for fairly arbitrary levels of adaptation of the wild type. However, the prediction would probably fail, away from the tail, if the phenotype-to-fitness function W(z) was not close enough to a Gaussian, which is possible when away from the optimum. Whether or not fitness functions are Gaussian remains an open question, although a review of mutation effects in stressful environments (wild-type ill-adapted) did suggest that it may be a reasonable approximation even away from the optimum (MARTIN and LENORMAND 2006a). Overall, while the simple results derived here may apply also in new and stressful environments (away from the optimum), they are a priori more likely to be valid in benign ones (close to it).
How large is m?
An important issue is whether m is large or not, as it determines the accuracy of the exponential approximation. When m is at least moderately large (m ≥ 10), then the classic exponential (Equation 2) could be a sufficient approximation, when describing beneficial effects (although it would be less accurate for fixed effects, as we already mentioned). Although a large m seems intuitively likely because many traits are under selection, it need not be so: as we have seen above, (i) with even weak mutational and selective correlations, a large set of traits is necessarily mutually dependent, which reduces m, and (ii) traits may be organized in modules. That empirical f(s) are not Gaussian suggests that these effects are important. In fact, whether m is large enough that the exponential is a sufficient description of fb(sb) is mainly an open empirical question; we now turn to this issue.
Empirical estimation of m:
One may estimate m empirically, with an approach similar to that proposed to estimate ne (MARTIN and LENORMAND 2006b): from empirical distributions of single-mutant fitnesses [empirical f(s)]. Indeed, at any distance from the optimum, m may be estimated by fitting a generalized Pareto distribution (GPD) to the extreme right tail of mutation fitness effects. The shape parameter of the GPD, which equals –2/m, will then provide an estimate of m. Such a fit can be performed using the method of BEISEL et al. (2007) or routines proposed in the "POT" R package (http://r-forge.r-project.org/projects/pot/), for example. Alternatively, m may be measured from data from evolution experiments (TENAILLON et al. 2007). However, some simulations would be needed to check whether this latter method does estimate the required quantity (i.e., m, not n) when traits are correlated.
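The relation between the GPD shape and m can be sketched as follows. This is not the likelihood method of BEISEL et al. (2007) nor a routine of the POT package: for brevity it swaps in a simple method-of-moments estimator of the GPD shape, and the function names and simulated data are hypothetical.

```python
import random
import statistics


def gpd_shape_moments(excesses):
    """Method-of-moments estimate of the GPD shape parameter.

    For a GPD with shape k < 1/2, mean**2 / variance = 1 - 2*k.
    """
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    return 0.5 * (1.0 - mean * mean / var)


def estimate_m(excesses):
    """Invert shape = -2/m; only meaningful when the estimated shape is < 0."""
    return -2.0 / gpd_shape_moments(excesses)


if __name__ == "__main__":
    rng = random.Random(42)
    true_m, s_o = 6, 0.05  # hypothetical values
    # Beneficial effects simulated as s_o * Beta(1, m/2), i.e., excesses over 0.
    sample = [s_o * rng.betavariate(1.0, true_m / 2.0) for _ in range(20000)]
    print(estimate_m(sample))
```

With real data, the threshold above which excesses are computed must be chosen, and moment estimators are sensitive to that choice; a likelihood-based fit is generally preferable.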
Predicting the proportion of beneficial mutations?
The tail approximation used in this article (JASCHKE et al. 2004) can also be used to derive the proportion pb of beneficial mutations when so is small (Equation B3 of APPENDIX B). Unfortunately, this prediction is much less robust than that on fb(sb), giving strongly inaccurate results unless pb is of the order of 10^-3 (simulations not shown). Consistent with this, JASCHKE et al. (2004) showed that their tail approximation gave a poor fit to distributions of quadratic forms, unless considering values very close to the rightmost endpoint. Because our prediction for fb(sb) depends on the same approximation, modulo the scaling constant d (see APPENDIX B), we believe the poor robustness of the approximation for f(s) and pb comes from the expression for d being valid only very close to the rightmost endpoint of the distribution. In any case, this lack of fit means that our results do not provide a satisfactory expression for pb, and that the displaced gamma approximation should be preferred for this purpose (MARTIN and LENORMAND 2006b), although it yields a slightly more complicated expression.
Extreme value theory and clonal interference:
Clonal interference is the mechanism by which beneficial mutations occurring in different individuals compete for ultimate fixation in asexuals. GERRISH and LENSKI (1998; GERRISH 2001) showed that, in the limit of fairly low mutation rates, this process implies that the mutations that fix are the ones with the largest selection coefficient among all those that appear during a selective sweep. Such a sieving process therefore consists in drawing the maximum value among a set of draws from a distribution, which is exactly what is described by extreme value theory or tail approximations. Therefore, the MLM and the model discussed in this article may prove useful for describing the distribution of fixed mutations in asexuals, in a much more general context than for sexuals, i.e., at any distance from the optimum. As extreme s values (largely beneficial) are strongly overrepresented among mutations that escape clonal interference, it will probably be safer to use the general model (beta approximations) than the exponential limit in this context, as the latter is less accurate when it comes to describing the very right tail of f(s) (Figure 2).
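The sieving argument can be illustrated by drawing the largest effect among k competing beneficial mutations. The sketch below is deliberately simplified: it assumes sb/so follows a beta with shape parameters 1 and m/2, treats the number k of competing mutations as fixed, and ignores loss by drift; all names and values are hypothetical.

```python
import random


def largest_of_k(k, m, s_o, rng):
    """Largest selection coefficient among k beneficial mutations."""
    return s_o * max(rng.betavariate(1.0, m / 2.0) for _ in range(k))


def mean_winner(k, m, s_o, n_rep=2000, seed=7):
    """Average effect of the winning mutation over n_rep replicate sweeps."""
    rng = random.Random(seed)
    return sum(largest_of_k(k, m, s_o, rng) for _ in range(n_rep)) / n_rep


if __name__ == "__main__":
    for k in (1, 5, 50):
        print(k, mean_winner(k, m=4, s_o=0.1))
```

As k grows, the winning effect approaches the upper bound so, which is the region where the exponential limit (with its unbounded tail) departs most from the beta.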
Conclusion:
Overall, our study clarifies the conditions under which the different ways to model the distribution of fitness effects of beneficial mutations give similar or different results and why. In particular, we stress that beneficial mutations may not be exponentially distributed. Under selection for an optimum, the fitness effects of both beneficial and fixed mutations are beta distributed, which is close to exponential only when there are a large number of weakly correlated traits subject to selection and pleiotropic mutation.
APPENDIX A: EXACT DISTRIBUTION...
The selection coefficient $s$ of a mutant with phenotype $z_o + dz$, relative to a wild type with phenotype $z_o$, is

$$s = \log\frac{W(z_o + dz)}{W(z_o)} = -\tfrac{1}{2}\, dz^{T} S\, dz - z_o^{T} S\, dz, \tag{A1}$$

where $dz \sim N(0, M)$. This is a quadratic form in Gaussian vectors; it can always, without loss of generality, be expressed in diagonal form (JASCHKE et al. 2004), i.e., in a new basis where the new phenotypic vectors ($x$) are linear combinations of the original phenotypic vectors ($z$),

$$s = \tfrac{1}{2}\, dx^{T} \Lambda\, dx + \delta^{T} dx, \tag{A2}$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is an $n \times n$ diagonal matrix whose entries $\lambda_i \le 0$ are the $n$ eigenvalues of $-S.M$, $dx$ is distributed as a standard multivariate Gaussian, $dx \sim N(0, I_n)$, and $\delta = \{\delta_1, \ldots, \delta_n\}$. $x_o = \{x_1, \ldots, x_n\}$ is simply $z_o$ expressed in the new basis. It is important to note that the expression in (A2) is a particular type of quadratic form, as $\delta = \Lambda x_o$: it has a zero element wherever there is a zero eigenvalue in matrix $\Lambda$. This implies that the dimension of the whole system is not $n$, but the number $m \le n$ of nonzero eigenvalues $\lambda_i$ (i.e., the rank of $\Lambda$). As a consequence, we can always express (A2) in a positive-definite form by focusing only on these $m$ dimensions, setting $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, where all $\lambda_i < 0$, and $\delta = \{\delta_1, \ldots, \delta_m\}$, where, by identification, $\delta_i = \lambda_i x_i$. This argument guarantees that we can apply JASCHKE et al.'s (2004) proposition 3.3, which is valid for positive definite $-\Lambda$, even when $S$ and $M$ are not positive definite but only semidefinite. The distribution of $s$ defined in (A2) is bounded on its rightmost end by $s_o = \log(W(0)/W(x_o)) = -\tfrac{1}{2}\, x_o^{T} \Lambda\, x_o \ge 0$, which is the selection coefficient of the optimum phenotype ($x = 0$) relative to the wild type ($x = x_o$). For consistency with JASCHKE et al.'s (2004) notation, note that $s_o$ can also be expressed as $s_o = -\tfrac{1}{2}\, x_o^{T} \Lambda\, x_o = -\tfrac{1}{2} \sum_{i=1}^{m} \delta_i^2/\lambda_i$ or equivalently as $s_o = -\tfrac{1}{2}\, \delta^{T} \Lambda^{-1} \delta$.
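The upper bound so can be obtained by completing the square in the diagonal form. The lines below are a sketch under stated assumptions: the quadratic form is taken to be scaled as $s = \tfrac{1}{2}\sum_i \lambda_i\, dx_i^2 + \sum_i \delta_i\, dx_i$ with $\delta_i = \lambda_i x_i$ and all $\lambda_i < 0$; the factor 1/2 is an assumption of this sketch, chosen to be consistent with $\delta = \Lambda x_o$.

```latex
\begin{align*}
s &= \tfrac{1}{2}\sum_{i=1}^{m} \lambda_i\, dx_i^2 + \sum_{i=1}^{m} \lambda_i x_i\, dx_i \\
  &= \tfrac{1}{2}\sum_{i=1}^{m} \lambda_i\, (dx_i + x_i)^2
     \;-\; \tfrac{1}{2}\sum_{i=1}^{m} \lambda_i x_i^2 .
\end{align*}
% Since every \lambda_i < 0, the first sum is <= 0 and vanishes only at dx = -x_o
% (the mutant lands exactly on the optimum), so that
\[
s \;\le\; s_o = -\tfrac{1}{2}\sum_{i=1}^{m} \lambda_i x_i^2
            = -\tfrac{1}{2}\sum_{i=1}^{m} \frac{\delta_i^2}{\lambda_i} \;\ge\; 0 .
\]
```

Written this way, so – s is a positive-weighted sum of m squared Gaussian deviations, which is why only the m nonzero eigenvalues matter for the behavior of f(s) near its right endpoint.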
The distribution of $s$ on its whole range $[-\infty, s_o]$ can be approximated by a displaced gamma distribution, introduced by SHAW et al. (2002) for the analysis of distributions of mutation fitness effects that include beneficial mutations. The resulting approximate pdf of $s$ is given by

$$f(s) \approx \frac{(s_o - s)^{\beta - 1}\, e^{-(s_o - s)/\alpha}}{\alpha^{\beta}\, \Gamma(\beta)}, \quad s \le s_o, \tag{A3}$$

where $\Gamma(.)$ is the gamma function, the shape $\beta$ and scale $\alpha$ are chosen to fit the mean and variance of $f(s)$, and the displacement parameter is $s_o$, the maximum $s$ (MARTIN and LENORMAND 2006b). When the wild type is at any fitness distance $s_o$ from the optimum, these parameters are approximately

[Equation A4: approximate expressions for $\alpha$ and $\beta$ as functions of $\alpha_o$, $\beta_o$, and the scaled distance to the optimum]

where $\beta_o$ and $\alpha_o$ are the shape and scale (respectively) of the gamma distribution that fits $f(s)$ at the optimum (i.e., when there are only deleterious mutations), and where the scaled distance is the distance to the optimum of the wild type ($s_o$), measured in terms of fitness and scaled by the average fitness effect of mutation. This scaled distance describes the degree of adaptation of the wild type ($s_o$) relative to the average fitness effect of single mutations. Note that the expressions for $\alpha$ and $\beta$ in (A4) are only approximate, based on an approximation for the moments of $f(s)$ as a function of $s_o$ (Equation A4 of MARTIN and LENORMAND 2006b), but the displaced gamma remains a good approximation of $f(s)$, even when the best-fitting parameters are not exactly those given in (A4) (see supplemental Figure 1).
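Fitting the displaced gamma by its first two moments can be sketched as follows. This is a minimal illustration, not the approximate expressions of Equation A4: it assumes that so – s is gamma distributed and matches its shape and scale to a given mean and variance of s; the function names and numerical values are hypothetical.

```python
import math


def displaced_gamma_params(mean_s, var_s, s_o):
    """Shape and scale such that (s_o - s) ~ Gamma(shape, scale)
    reproduces the given mean and variance of s."""
    gap = s_o - mean_s          # mean of (s_o - s), must be > 0
    scale = var_s / gap
    shape = gap * gap / var_s
    return shape, scale


def displaced_gamma_pdf(s, shape, scale, s_o):
    """pdf of s under the displaced gamma; zero at and above the endpoint s_o."""
    if s >= s_o:
        return 0.0
    y = s_o - s
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_pdf)


if __name__ == "__main__":
    shape, scale = displaced_gamma_params(mean_s=-0.02, var_s=0.0004, s_o=0.01)
    print(shape, scale, displaced_gamma_pdf(0.0, shape, scale, 0.01))
```

Working on the log scale with `math.lgamma` avoids overflow when the shape is large.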
APPENDIX B: TAIL BEHAVIOR...
... $\lambda_i < 0$) in our context. In what follows we denote results derived from this tail approximation by an asterisk (*). Applying the tail approximation shows that, to the leading order in $(s - s_o)$, $f(s)$ approaches

$$f^*(s) = d\, (s_o - s)^{m/2 - 1}, \tag{B1}$$

[Equation B2: expression for the constant $d$; see JASCHKE et al. 2004]

where the constant $d$ defined in (B2) depends on the $\lambda_i$ and $\delta_j$. Here we have assumed that all nonzero eigenvalues have multiplicity of 1 (i.e., they are distinct for each of $m$ traits). With arbitrary multiplicity, the only change is in the expression of $d$ in (B2) (for details, see JASCHKE et al. 2004). From the above tail approximation one easily retrieves the proportion of beneficial mutations,

$$p_b^* = \int_0^{s_o} f^*(s)\, ds = \frac{2d}{m}\, s_o^{m/2}, \tag{B3}$$

and the distribution of beneficial effects,

$$f_b^*(s_b) = \frac{f^*(s_b)}{p_b^*} = \frac{m}{2 s_o} \left(1 - \frac{s_b}{s_o}\right)^{m/2 - 1}, \quad 0 \le s_b \le s_o. \tag{B4}$$

This is equivalent to stating that the approximate distribution of $s_b/s_o$, when $s_o$ is small, is a beta with shape parameters 1 and $m/2$:

$$x = \frac{s_b}{s_o}, \qquad f_\beta(x) = \frac{m}{2}\, (1 - x)^{m/2 - 1}, \quad 0 \le x \le 1. \tag{B5}$$

The cumulative distribution function (cdf) of the beta distribution above, $F_\beta(x)$, is approximately equal to that of the exponential distribution (with rate $m/2$) as $m$ gets large,

$$F_\beta(x) = 1 - (1 - x)^{m/2} \approx 1 - e^{-m x/2}. \tag{B6}$$

Finally, note that the same kind of tail behavior as in Equation B1 is obtained with the displaced gamma approximation defined in Equation A3,

$$f(s) \approx d'\, (s_o - s)^{\beta - 1}, \tag{B7}$$

where $d' = \alpha^{-\beta}/\Gamma(\beta)$, with $\alpha$ and $\beta$ the scale and shape of the displaced gamma, is a constant. However, while the exact $f(s)$ and its gamma approximation have the same tail behavior qualitatively (compare Equations B1 and B7), they differ quantitatively, as $d \ne d'$ and $m/2 \ne \beta$. Therefore, while the displaced gamma approximation is fairly accurate for the whole distribution of $s$ (supplemental Figure 1), it is less so for the subset of beneficial mutations $f_b(s_b)$ when $s_o$ is small, in which case the beta approximation in Equations B4 and B5 is the most accurate. As $s_o$ gets large, the displaced gamma approximation will become the most accurate for both $f(s)$ and $f_b(s_b)$. Notably, as $s_o$ gets close to 0, $\beta \to \beta_o = n_e/2$ (see Equation A2 and MARTIN and LENORMAND 2006b), so that the discrepancy between the two tail behaviors [(B1) vs. (B7)] depends on the difference between the "effective number of traits" $n_e$ and the "dimensionality" $m$.
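The convergence of the beta cdf toward the exponential cdf can be made explicit with a short expansion. The lines below are a sketch, assuming that x = sb/so has cdf $1 - (1 - x)^{m/2}$ (beta with shape parameters 1 and m/2); the error term is indicative only.

```latex
\begin{align*}
(1 - x)^{m/2}
  &= \exp\!\left[\tfrac{m}{2}\,\ln(1 - x)\right]
   = \exp\!\left[-\tfrac{m}{2}\left(x + \tfrac{x^2}{2} + \tfrac{x^3}{3} + \cdots\right)\right] \\
  &= e^{-m x/2}\,\exp\!\left[-\tfrac{m x^2}{4} + O(m x^3)\right].
\end{align*}
% The exponential approximation is therefore accurate where m x^2 / 4 << 1.
% As m grows, most of the probability mass lies at x = O(2/m), where
% m x^2 / 4 = O(1/m): the bulk converges, while the extreme right tail
% (x close to 1) remains poorly described.
```

This is consistent with the exponential being adequate for typical beneficial effects when m is large, yet less accurate for the largest effects.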
APPENDIX C: APPROXIMATE...
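... $\pi(s_f)$. The pdf of fixed mutation effects, $f_{fix}(s_f)$, is therefore

$$f_{fix}(s_f) = \frac{\pi(s_f)\, f_b(s_f)}{\int_0^{s_o} \pi(s)\, f_b(s)\, ds}, \tag{C1}$$

where the probability of fixation of a beneficial mutation of effect $s$ is $\pi(s) \approx 2s$ (HALDANE 1927). As all beneficial mutations are $< s_o$, which is assumed to be small, this weak selection approximation should always be fairly accurate whenever the Jaschke tail approximation is valid (wild type well adapted). With $f_b^*$ the beta approximation of Equation B4, the pdf of the distribution of fixed effects is approximately

$$f_{fix}^*(s_f) = \frac{s_f\, f_b^*(s_f)}{\int_0^{s_o} s\, f_b^*(s)\, ds} = \frac{m(m + 2)}{4 s_o^2}\, s_f \left(1 - \frac{s_f}{s_o}\right)^{m/2 - 1}, \tag{C2}$$

so that $x = s_f/s_o$ is approximately beta distributed with shape parameters 2 and $m/2$:

$$f_\beta(x) = \frac{m(m + 2)}{4}\, x\, (1 - x)^{m/2 - 1}, \quad 0 \le x \le 1. \tag{C3}$$

Note that for populations of finite (though not too small) size $N$ and effective size $N_e$ (and still with weak selection as assumed here), the probability of fixation of a beneficial allele becomes $\pi(s) \approx 2 s N_e/N$ (WHITLOCK 2000). This scaling factor does not affect Equation C2, so that the distribution of fixed mutation effects is not affected by finite population sizes (i.e., only the probability of a beneficial mutation fixing is affected, not its effect distribution). Note, however, that for smaller $N$, the above approximation is inaccurate a priori; in particular, deleterious mutations can fix, which was neglected here.

As a comparison, we compute, in the same way, the distribution of fixed effects when the distribution of beneficial mutation effects is exponential with rate $\theta$ (note that the normalizing integral of C2 is then over the range $[0, \infty]$). The distribution of fixed mutation effects is then

$$f_{fix}(s_f) = \frac{s_f\, \theta\, e^{-\theta s_f}}{\int_0^{\infty} s\, \theta\, e^{-\theta s}\, ds} \tag{C4}$$

$$\phantom{f_{fix}(s_f)} = \theta^2\, s_f\, e^{-\theta s_f}, \tag{C5}$$

where $\theta = m/(2 s_o)$ is the rate of the exponential distribution of beneficial effects $s_b$ (as $x = s_b/s_o$ is exponential with rate $m/2$, see Equation B6).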
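The weighting of beneficial effects by their probability of fixation can be mimicked by rejection sampling. The sketch below is illustrative only: it assumes sb/so follows a beta with shape parameters 1 and m/2 and accepts each mutation with probability sb/so, which is proportional to 2sb (any constant factor, such as Ne/N, cancels in the distribution of accepted effects). Function names and parameter values are hypothetical.

```python
import random


def sample_fixed_effects(m, s_o, n_fixed, seed=3):
    """Draw beneficial effects and keep each with probability proportional to s."""
    rng = random.Random(seed)
    fixed = []
    while len(fixed) < n_fixed:
        s_b = s_o * rng.betavariate(1.0, m / 2.0)
        # Acceptance probability s_b/s_o is proportional to the 2*s_b
        # weak-selection fixation probability; the constant cancels out.
        if rng.random() < s_b / s_o:
            fixed.append(s_b)
    return fixed


if __name__ == "__main__":
    m, s_o = 4, 0.05
    fixed = sample_fixed_effects(m, s_o, n_fixed=5000)
    print(sum(fixed) / len(fixed), 4.0 * s_o / (4.0 + m))
```

The simulated mean can be compared with 4so/(4 + m); under an exponential with rate m/(2so) the corresponding mean would be 4so/m, which is larger, the more so when m is small.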
LITERATURE CITED
BEISEL, C. J., D. R. ROKYTA, H. A. WICHMAN and P. JOYCE, 2007 Testing the extreme value domain of attraction for distributions of beneficial fitness effects. Genetics 176: 2441–2449.
EYRE-WALKER, A., 2006 The genomic rate of adaptive evolution. Trends Ecol. Evol. 21: 569–575.
EYRE-WALKER, A., and P. D. KEIGHTLEY, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.
FISHER, R. A., 1930 The Genetical Theory of Natural Selection. Oxford University Press, Oxford.
GERRISH, P., 2001 The rhythm of microbial adaptation. Nature 413: 299–302.
GERRISH, P. J., and R. E. LENSKI, 1998 The fate of competing beneficial mutations in an asexual population. Genetica 103: 127–144.
GILLESPIE, J. H., 1984 Molecular evolution over the mutational landscape. Evolution 38: 1116–1129.
HALDANE, J. B. S., 1927 A mathematical theory of natural and artificial selection V. Selection and mutation. Proc. Camb. Philos. Soc. 26: 220–230.
JASCHKE, S., C. KLUPPELBERG and A. LINDNER, 2004 Asymptotic behavior of tails and quantiles of quadratic forms of Gaussian vectors. J. Multivariate Anal. 88: 252–273.
KASSEN, R., and T. BATAILLON, 2006 The distribution of fitness effects among beneficial mutations prior to selection in experimental populations of bacteria. Nat. Genet. 38: 484–488.
LANDE, R., 1979 Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33: 402–416.
MARTIN, G., and T. LENORMAND, 2006a The fitness effect of mutations in stressful environments: a survey in the light of fitness landscape models. Evolution 60: 2413–2427.
MARTIN, G., and T. LENORMAND, 2006b A general multivariate extension of Fisher's geometrical model and the distribution of mutation fitness effects across species. Evolution 60: 893–907.
MARTIN, G., S. F. ELENA and T. LENORMAND, 2007 Distributions of epistasis in microbes fit predictions from a fitness landscape model. Nat. Genet. 39: 555–560.
MATHAI, A. M., and S. B. PROVOST, 1992 Quadratic Forms in Random Variables. Marcel Dekker, New York.
ORR, H. A., 1998 The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52: 935–949.
ORR, H. A., 2000 Adaptation and the cost of complexity. Evolution 54: 13–20.
ORR, H. A., 2002 The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56: 1317–1330.
ORR, H. A., 2003 The distribution of fitness effects among beneficial mutations. Genetics 163: 1519–1526.
ORR, H. A., 2005a The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 6: 119–127.
ORR, H. A., 2005b Theories of adaptation: what they do and don't say. Genetica 123: 3–13.
ORR, H. A., 2006 The distribution of fitness effects among beneficial mutations in Fisher's geometric model of adaptation. J. Theor. Biol. 238: 279–285.
PARK, S., and J. KRUG, 2007 Clonal interference in large populations. Proc. Natl. Acad. Sci. USA 104: 18135–18140.
PICKANDS, I. J., 1975 Statistical inference using extreme order statistics. Ann. Stat. 3: 119–131.
ROKYTA, D. R., P. JOYCE, S. B. CAUDLE and H. A. WICHMAN, 2005 An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nat. Genet. 37: 441–444.
SANJUÀN, R., A. MOYA and S. F. ELENA, 2004 The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc. Natl. Acad. Sci. USA 101: 8396–8401.
SHAW, F. H., C. J. GEYER and R. G. SHAW, 2002 A comprehensive model of mutations affecting fitness and inferences for Arabidopsis thaliana. Evolution 56: 453–463.
TENAILLON, O., O. K. SILANDER, J. UZAN and L. CHAO, 2007 Quantifying organismal complexity using a population genetic approach. PLoS ONE 2: e217.
WAXMAN, D., and J. J. WELCH, 2005 Fisher's microscope and Haldane's ellipse. Am. Nat. 166: 447–457.
WELCH, J. J., and D. WAXMAN, 2003 Modularity and the cost of complexity. Evolution 57: 1723–1734.
WHITLOCK, M. C., 2000 Fixation of new alleles and the extinction of small populations: drift load, beneficial alleles, and sexual selection. Evolution 54: 1855–1861.
WILKE, C. O., 2004 The speed of adaptation in large asexual populations. Genetics 167: 2045–2053.
Communicating editor: M. W. FELDMAN
Figure caption (fragment): The beta approximation leads to a more accurate prediction than the exponential in both cases: the increase in accuracy is almost undetectable for high pleiotropy (b and d), but substantial for low pleiotropy (a and c).
Figure caption (fragment): Open circles give E(sf) from simulations, while lines give the prediction from either the beta approximation (...