Overdominant Alleles in a Population of Variable Size
Montgomery Slatkin and Christina A. Muirhead
Department of Integrative Biology, University of California, Berkeley, California 94720-3140
Corresponding author: Montgomery Slatkin, Department of Integrative Biology, University of California, 3060 Valley Life Sciences Bldg., Berkeley, CA 94720-3140. E-mail: slatkin{at}socrates.berkeley.edu
Communicating editor: N. TAKAHATA
| ABSTRACT |
|---|
An approximate method is developed to predict the number of strongly overdominant alleles in a population of which the size varies with time. The approximation relies on the strong-selection weak-mutation (SSWM) method introduced by J. H. Gillespie and leads to a Markov chain model that describes the number of common alleles in the population. The parameters of the transition matrix of the Markov chain depend in a simple way on the population size. For a population of constant size, the Markov chain leads to results that are nearly the same as those of N. Takahata. The Markov chain allows the prediction of the numbers of common alleles during and after a population bottleneck and the numbers of alleles surviving from before a bottleneck. This method is also adapted to modeling the case in which there are two classes of alleles, with one class causing a reduction in fitness relative to the other class. Very slight selection against one class can strongly affect the relative frequencies of the two classes and the relative ages of alleles in each class.
STRONG balancing selection can result from overdominance in fitness or from disassortative mating of the kind created by self-incompatibility systems in plants. Models of balancing selection are difficult to analyze completely because of the large number of alleles present, so numerous approximations have been used to provide some quantitative understanding of how a balance is achieved between mutation, selection, and genetic drift. Approximations that have been used, including those of ![]()
In this article, we introduce an approximate analytic theory that allows us to obtain relatively simple results for populations of variable size and to generalize models of balancing selection to allow for slight differences in relative fitness of different classes of alleles. Our approximate method is based on the strong-selection weak-mutation (abbreviated as SSWM and pronounced "swim") approximation of GILLESPIE (1984, 1991).
The Markov chain can be further approximated by an ordinary differential equation for the expected numbers of alleles and thus can provide a rough prediction of the rate at which alleles are gained and lost because of a bottleneck. We verify that our method provides an accurate approximation by comparing our predictions with computer simulations. Our results are very similar to and consistent with those of ![]()
| ONE SELECTIVE CLASS OF ALLELES |
|---|
Analytic theory:
Throughout, we consider a single autosomal locus in a randomly mating diploid population in which the population size, N(t), may vary with time. We assume that mutations occur at rate u and that every mutation is to an allele not previously found in the population, the infinite alleles model of KIMURA and CROW (1964). We assume overdominance in fitness, in which every heterozygote has a fitness of 1 + s relative to every homozygote. This parameterization is different from that of ![]()
New mutations arise and if they become common, they persist in the population as common alleles for a long time. This situation is suitable for GILLESPIE's (1984, 1991) SSWM approximation. We treat i, the number of alleles, as a random variable for which transitions from one time to the next can be modeled by a Markov chain. In fact, the Markov chain we use when there is one class of alleles is of a particularly simple kind because it allows for an increase or decrease in i by only one. Consider a population with i common alleles and assume that they are at the deterministic equilibrium with each having a frequency 1/i. Now suppose that a new allele appears by mutation. Let the frequency of the new mutant be z. The change in z is approximately
![]() |
(1) |
With a mutation rate u per generation, there are on average 2N(t)u mutations per generation, so the probability per generation that a new mutant appears and becomes common is
![]() |
(2) |
In using λi(t) as the probability per generation that i increases by one, we are assuming that if a mutant is to become common, it does so instantaneously on the time scale of interest. If an allele does not become common, we assume that it is lost from the population so quickly that it can be ignored.
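Equations 1 and 2 did not survive in this copy, so the following is only a sketch of how the gain probability could be computed under assumptions stated here rather than taken from the original formulas. It assumes that a rare mutant, which is almost always heterozygous, has fitness 1 + s against a mean fitness of roughly 1 + s(1 - 1/i), giving a selective advantage of about s/i, and that such a mutant becomes common with probability about twice its advantage. The function name `gain_probability` is hypothetical.

```python
def gain_probability(i, N, u, s):
    """Per-generation probability that a new mutant appears and becomes common.

    Assumptions (not taken from the original equations):
      - i common alleles, each at frequency 1/i
      - a rare mutant has selective advantage of about s / i
      - a mutant with a small advantage becomes common with
        probability of about twice that advantage
    """
    mutations_per_generation = 2 * N * u   # stated in the text
    prob_becomes_common = 2 * s / i        # assumed: 2 x advantage
    return mutations_per_generation * prob_becomes_common


# Example with illustrative parameter values
rate = gain_probability(i=10, N=1000, u=1e-5, s=0.1)
```

Under these assumptions the gain probability falls as 1/i, so each additional common allele makes it harder for the next one to invade.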
We also need the probability of loss of a common allele. ![]()
![]() |
(3) |
The exact expected time to loss must be computed numerically from the equation given on p. 2420 of TAKAHATA (1990).
When i alleles are equally frequent, F = 1/i. Equation 3 differs from Takahata's equation because time is measured in generations and not in units of 2N generations. When S is large, common alleles are held relatively close to their expected frequency, x = 1/i, with only a weak force of genetic drift pulling them toward the boundary (x = 0). This stochastic process is similar to that described by ![]().
When population size varies with time, it is still reasonable to use the assumption that loss of an allele occurs instantaneously with the probability of loss per generation determined by N(t), provided that 2N(t)s is large enough that selection dominates genetic drift. In that case, the instantaneous rate of loss is still the reciprocal of the expected time to loss in Equation 3, with S replaced by 2N(t)s. Because there are i alleles that are liable to be lost by drift, the probability of loss per generation is
![]() |
(4) |
i(t).
We can compare our approximation with TAKAHATA's (1990) by assuming a constant population size N and finding the value of i for which µi = λi. The result is
![]() |
(5) |
We can use the Markov chain to predict the stationary probability distribution of i in the population, πi. Because of the simple form of the chain, we can use standard theory to find πi:

$$\pi_i = \pi_1 \prod_{k=1}^{i-1} \frac{\lambda_k}{\mu_{k+1}}, \qquad (6)$$

where π1 is chosen so that the πi sum to 1. The distribution πi is very close to the values of a normal distribution with mean equal to î and variance equal to î^3/(2S). Thus, it is not necessary to use (6) for most practical purposes.
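The "standard theory" for a chain that moves up or down by one state at a time is detailed balance between neighboring states. The sketch below computes a stationary distribution that way for any pair of gain and loss functions. The function name and the example rates are hypothetical placeholders chosen only to make the code runnable; they are not the rates of Equations 2 and 4.

```python
import numpy as np


def stationary_distribution(gain, loss, i_max):
    """Stationary distribution of a birth-death chain on i = 1..i_max.

    gain(i): probability per generation of moving from i to i + 1
    loss(i): probability per generation of moving from i to i - 1
    Detailed balance: pi[i] * gain(i) = pi[i + 1] * loss(i + 1).
    """
    log_pi = np.zeros(i_max)              # log_pi[k] belongs to state k + 1
    for k in range(1, i_max):
        log_pi[k] = log_pi[k - 1] + np.log(gain(k)) - np.log(loss(k + 1))
    pi = np.exp(log_pi - log_pi.max())    # subtract max to avoid overflow
    return pi / pi.sum()                  # normalize so probabilities sum to 1


# Hypothetical rates, for illustration only
gain = lambda i: 0.02 / i
loss = lambda i: 1e-6 * i ** 3

pi = stationary_distribution(gain, loss, i_max=40)
states = np.arange(1, 41)
mean_i = float((states * pi).sum())
var_i = float(((states - mean_i) ** 2 * pi).sum())
```

The mean and variance computed this way can be compared directly with a normal approximation when the real rates are substituted.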
Simulation results:
To test the accuracy of our analytic approximation, we developed a computer simulation of our model. We assumed a population of N diploid individuals. A generation consisted of mutation, viability selection, and random mating. Beginning with N zygotes, the program generated a Poisson distributed random number of mutations with mean 2Nu. For each mutation, one of the 2N copies was chosen at random to mutate to an allele not found in the population. It was possible for the same copy to mutate more than once, but because 2Nu was one or less in our simulations, double mutations were extremely rare. Viability selection was modeled by computing the deterministic change in allele frequencies given the allele frequencies and selection parameter. Then random mating was modeled by randomly drawing 2N new alleles from a multinomial distribution with allele frequencies given by the values after selection. We began each simulation with the population initially fixed for one allele and waited until a stochastic equilibrium was reached before recording summary statistics. The waiting time depended primarily on the mutation rate and was always <1/u generations.
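One generation of the life cycle described above can be sketched as follows. This is a minimal illustration written from the verbal description, not the authors' program; the function name, the use of allele counts as the state, and the parameter values in the usage lines are assumptions.

```python
import numpy as np


def simulate_generation(counts, N, u, s, rng):
    """Mutation, viability selection, then random mating for one generation.

    counts: integer array of copies of each allele (sums to 2N).
    Heterozygotes have fitness 1 + s relative to homozygotes.
    """
    counts = counts.copy()

    # Mutation: Poisson number of events, each converting one randomly
    # chosen gene copy into an allele not present in the population.
    for _ in range(rng.poisson(2 * N * u)):
        source = rng.choice(len(counts), p=counts / counts.sum())
        counts[source] -= 1
        counts = np.append(counts, 1)

    # Viability selection: deterministic change in allele frequencies.
    x = counts / (2 * N)
    w = 1 + s * (1 - x)              # marginal fitness of each allele
    x_sel = x * w / np.sum(x * w)

    # Random mating: multinomial draw of 2N gene copies.
    new_counts = rng.multinomial(2 * N, x_sel)
    return new_counts[new_counts > 0]   # drop alleles that were lost


# Usage: start fixed for one allele and run forward
rng = np.random.default_rng(1)
N, u, s = 200, 1e-3, 0.1
counts = np.array([2 * N])
for _ in range(500):
    counts = simulate_generation(counts, N, u, s, rng)
```

Because a newly created allele can itself be picked as the source of a later mutation in the same generation, the sketch allows the same copy to mutate more than once, as in the description.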
In the simulations, we recorded the numbers of common alleles at each time. We used TAKAHATA's (1990) threshold frequency for a common allele, 1/(4Nsm), where m = F - u/s and F is the homozygosity. We found that the exact value of this threshold made little difference. Figure 1 shows some typical results for the distribution of i, the number of common alleles across replicates, compared to a normal distribution with the mean and variance computed from the Markov chain approximation. The fit is close although not perfect. There are slightly more common alleles found than predicted. Figure 2 shows the average number of common alleles as a function of N. These results show that there is little difference between the predictions of Equation 5 and TAKAHATA's (1990) result (his Equation 5) and that both provide good approximations to the simulated results. Although in Figure 2 the results from the Markov chain are slightly closer to the simulation results, that is not always the case for other combinations of parameters.
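The threshold rule can be sketched as a small helper. It assumes that an allele counts as common when its frequency is strictly above 1/(4Nsm); the function name and the example frequencies are hypothetical.

```python
import numpy as np


def count_common_alleles(freqs, N, s, u):
    """Count alleles whose frequency exceeds the threshold 1 / (4 N s m),
    with m = F - u / s and F the homozygosity (sum of squared frequencies).
    """
    freqs = np.asarray(freqs, dtype=float)
    F = float(np.sum(freqs ** 2))
    m = F - u / s
    if m <= 0:
        raise ValueError("m = F - u/s must be positive for the threshold to be defined")
    threshold = 1.0 / (4 * N * s * m)
    return int(np.sum(freqs > threshold))


# Three frequent alleles and two rare ones
n_common = count_common_alleles([0.33, 0.33, 0.33, 0.005, 0.005],
                                N=1000, s=0.1, u=1e-5)
```

With these illustrative values the threshold is a little under 0.008, so the two alleles at frequency 0.005 are not counted.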
| POPULATION BOTTLENECK |
|---|
We can use our method to predict the response to changes in population size. These predictions can be obtained in two ways. The first is to use the solution to the equation that treats i as a deterministic variable with λi and µi as the deterministic rates of increase and decrease. That is, we use the equation

$$\frac{di}{dt} = \lambda_i(t) - \mu_i(t). \qquad (7)$$
![]() |
(8) |
where λi(i0,N1) and µi(i0,N1) are the values at i = i0 and N = N1. For small t this approximate equation has the solution
![]() |
(9) |
The second and more accurate way to predict the effect of a change in population size is to iterate the Markov chain using the time-dependent values of λi and µi. Figure 3 compares the simulation results with the results from iterating the Markov chain and with the linear approximation, Equation 9.
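Iterating the chain amounts to pushing a probability vector over i forward one generation at a time with rates that depend on the current population size. The sketch below shows one way to do this; the function name, the truncation at a maximum number of alleles, and the example rate functions are assumptions made for illustration and do not reproduce Equations 2 and 4.

```python
import numpy as np


def iterate_chain(p0, gain, loss, n_generations):
    """Iterate a birth-death chain with time-dependent rates.

    p0[k] is the probability of k + 1 common alleles at time 0.
    gain(i, t) and loss(i, t) are per-generation probabilities of
    i -> i + 1 and i -> i - 1 in generation t.
    Returns the final distribution and the mean of i in every generation.
    """
    p = np.asarray(p0, dtype=float).copy()
    states = np.arange(1, len(p) + 1)
    means = [float(states @ p)]
    for t in range(n_generations):
        up = np.array([gain(i, t) for i in states], dtype=float)
        down = np.array([loss(i, t) for i in states], dtype=float)
        up[-1] = 0.0     # truncate the chain at the largest state
        down[0] = 0.0    # at least one allele is always present
        new_p = p * (1.0 - up - down)       # stay in the same state
        new_p[1:] += p[:-1] * up[:-1]       # gain of one allele
        new_p[:-1] += p[1:] * down[1:]      # loss of one allele
        p = new_p
        means.append(float(states @ p))
    return p, means


# Hypothetical example: population held at a reduced size of 200
def N_of_t(t):
    return 200

gain = lambda i, t: 4 * N_of_t(t) * 1e-5 * 0.1 / i        # placeholder rate
loss = lambda i, t: 1e-6 * i ** 3 * (1000 / N_of_t(t))    # placeholder rate

p0 = np.zeros(30)
p0[11] = 1.0                      # start with exactly 12 common alleles
p, means = iterate_chain(p0, gain, loss, n_generations=200)
```

The list `means` gives the expected number of common alleles through time, which is the quantity a deterministic approximation would track.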
In many discussions of the effect of a population bottleneck on variability at an overdominant locus, an important question is how rapidly are alleles lost as a result of the bottleneck. Our results support the conclusions of ![]()
(x) by numerical integration with values based on assuming no selection and values based on Equation 3. We can see that the probability of loss under selection is always smaller than under neutrality, as expected. Overdominant selection retards the loss of common alleles. The analytic approximation, Equation 3, is quite accurate for a reduction in size by a factor of two but not for a larger reduction. The neutral approximation is accurate if the reduction is by a factor of five or more.
(x) over that time period. Because all common alleles are equivalent in survival probability, the probability distribution of the number of lineages surviving from before the bottleneck is a binomial distribution with the probability being the probability of survival of a single lineage and the sample size being the number of lineages immediately before the bottleneck. The distribution of the number of new lineages is then obtained by subtraction.
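The binomial argument can be written out directly. In this sketch the survival probability of a single lineage is taken as a given input, since how it is computed depends on the expected time to loss; the function name and the example numbers are hypothetical.

```python
from math import comb


def surviving_lineage_distribution(n_before, p_survive):
    """Binomial distribution of the number of lineages that survive.

    n_before:  number of common alleles immediately before the bottleneck
    p_survive: probability that a single lineage survives (assumed known)
    Returns a list whose entry k is P(k lineages survive).
    """
    return [comb(n_before, k) * p_survive ** k * (1 - p_survive) ** (n_before - k)
            for k in range(n_before + 1)]


# Example: 10 lineages before the bottleneck, each surviving with probability 0.7
dist = surviving_lineage_distribution(10, 0.7)
expected_survivors = sum(k * pk for k, pk in enumerate(dist))
```

If the total number of common alleles after the bottleneck is known, the number of new lineages is that total minus the number of survivors.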
Both TAKAHATA's (1993) and our models of a bottleneck in population size assume a diffusion limit in which at most one common allele can be lost per generation. That assumption is appropriate if a bottleneck is not too sudden or small, and it appears to be appropriate for modeling the history of human populations. If the reduction in population size is extreme and rapid as might happen in the colonization of an island or other isolated region, then many alleles could be lost in a single generation. In that case, a single generation of random sampling would have to be included in the analysis to allow for the loss of several alleles in one generation.
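To see why a single generation of sampling can remove several alleles at once, consider a rough calculation that ignores selection within that generation and assumes i equally frequent alleles from which 2N genes are drawn at random. An allele at frequency 1/i is absent from the sample with probability (1 - 1/i)^(2N). The function name and numbers below are hypothetical.

```python
def expected_alleles_lost(i, n_founders):
    """Expected number of alleles missing after one generation of sampling.

    Assumes i equally frequent alleles and a random draw of
    2 * n_founders gene copies, with no selection in that generation.
    """
    p_lost = (1 - 1 / i) ** (2 * n_founders)   # one allele absent from the sample
    return i * p_lost


# Example: 20 common alleles, 10 diploid founders
lost = expected_alleles_lost(20, 10)
```

With 20 alleles and 10 founders, about seven alleles are expected to be missing after a single generation, which is well outside the regime where at most one allele is lost per generation.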
These results support the discussions of ![]()
We have not been able to obtain a simple criterion for the ranges of parameter values for which the asymptotic approximation for strong selection, Equation 4, applies and for which the asymptotic approximation for neutral alleles applies. The exact result depends in a complex way on i. The integral is easy to evaluate using a computer algebra program, so that is the best way to determine whether either of the asymptotic approximations can be used in any case.
| TWO SELECTIVE CLASSES OF ALLELES |
|---|
Our analysis of overdominant selection follows a long tradition of assuming equal fitnesses of all homozygotes and all heterozygotes. Even that case has proved difficult enough to analyze. The method we have developed allows us to test the robustness of results based on assuming homogeneity of selection on heterozygotes and homozygotes. We explore the possibility that there is heterogeneity in selection and show that relatively small differences in fitness can lead to large differences from the results previously obtained. We use a fitness scheme that leads to relatively simple algebra, but other assumptions about heterogeneity in fitness can be analyzed in the same way. ![]()
We assume that there are two classes of alleles, with i different alleles in class A and j different alleles in class a. The relative fitnesses are
![]() |
(10) |
We first investigate the deterministic theory of this selection model. We let x be the equilibrium frequency of each allele in class A and y be the equilibrium frequency of each allele in class a (ix + jy = 1). It is relatively easy to show that at equilibrium
![]() |
(11) |
Using the same approach as in the previous section, we assume that the population contains i class A alleles, each in frequency x, and j class a alleles, each in frequency y, and then find the change in frequency of a new mutant of each type. It is straightforward to show that if zA is the frequency of a new class A mutant and za is the frequency of a new class a mutant, then
![]() |
(12a) |
![]() |
(12b) |
Here F = ix^2 + jy^2 is the homozygosity, and s and r are assumed to be small. Therefore, the probability that a new class A or class a allele becomes common is approximately 2smA and 2sma. If the mutation rate to new class A alleles is u and the rate to class a alleles is v, then the probabilities of i and j increasing by one are approximately
![]() |
(13) |
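Equation 13 is not shown in this copy. A sketch of what the text implies is to multiply the expected number of new mutants of each class per generation, 2Nu or 2Nv, by the probability that such a mutant becomes common, 2smA or 2sma. The quantities mA and ma are treated as given inputs because their definitions are part of the missing equations; the function name is hypothetical.

```python
def two_class_gain_probabilities(N, u, v, s, m_A, m_a):
    """Per-generation probabilities that i and j increase by one.

    Assumes 2N*u and 2N*v new mutants per generation in classes A and a,
    each becoming common with probability 2*s*m_A or 2*s*m_a.
    m_A and m_a are supplied by the caller.
    """
    gain_A = 2 * N * u * 2 * s * m_A
    gain_a = 2 * N * v * 2 * s * m_a
    return gain_A, gain_a


# Illustrative values only
gain_A, gain_a = two_class_gain_probabilities(N=1000, u=1e-5, v=2e-5,
                                              s=0.1, m_A=0.05, m_a=0.02)
```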
The expected time to loss of a class A allele initially at frequency x is given by Equation 3, where now x takes the value in Equation 11 and m is replaced by mA. The expected time to loss of a class a allele initially at frequency y, defined in (11), is given by (3) with y replacing x and ma replacing m. The probabilities per generation that i and j are reduced by 1 are then
![]() |
(14a) |
![]() |
(14b) |
If the population size is constant, we can estimate the equilibrium numbers of alleles in both classes by solving the pair of equations,
![]() |
(15) |
We tested the accuracy of the Markov chain approximation for the case of two classes of alleles. The simulation program was the same as described above except for the modifications necessary to account for two classes of alleles. As shown in Figure 5, the average numbers of alleles are as predicted by the analytic theory.
If the population is constant in size, the average time to loss of alleles in the two classes is approximately the average allele age. The average ages of class A and a alleles can be quite different, even for relatively small values of the ratio r/s. By substituting into (3), we find
![]() |
(16) |
Figure 6 also shows the ratio of allele ages found in the simulations. For relatively small values of r/s the Markov chain accurately predicts the ratio of average ages (and both average ages as well), but for larger r/s the ratio of ages is even larger than predicted because alleles in the a class are so rare that their loss is no longer slowed by overdominant selection. Note that even very small values of r/s can result in ratios of average ages of five or larger.
| DISCUSSION |
|---|
We have shown that the SSWM approximation of ![]()
Our results are of relevance to the problem of understanding the effects of a bottleneck in population size on the number of alleles at a locus such as those in the major histocompatibility complex (MHC) in humans and other mammals. Many loci in the MHC are highly polymorphic and the high level of polymorphism is thought to be maintained by overdominance in fitness (![]()
Our results are also of relevance to discussions of the relative ages of overdominant alleles. Under TAKAHATA's (1990) theory, common alleles follow a coalescent model with a rescaled population size. According to that theory, the relative ages of alleles follow a geometric distribution. Our results with two selective classes of alleles show that even very slight heterogeneity in selective effects of different alleles can lead to large deviations from a geometric distribution. Ratios of ages of 5 or 10 are easily obtained when there is much less than a 1% difference in relative fitnesses. Slight differences in relative fitness are amplified by the long persistence time of both classes of alleles. Our model of two selective classes is not the only one possible; a similar approach could be used for three or more selective classes. The result for two selective classes tells us that, unfortunately, pleiotropic effects far too small to measure could cause substantial deviations from the coalescent approximation of ![]()
| ACKNOWLEDGMENTS |
|---|
This research was supported in part by U.S. Public Health Service grant R01-GM40282 to M.S. C.A.M. also received support from U.S. Public Health Service grant T32-GM07127-24. We thank J. Gillespie and M. Turelli for helpful discussions of this topic, M. Uyenoyama for telling us about Sasaki's previous work on the SSWM approximation, and A. Sasaki for providing additional information about his Ph.D. dissertation.
Manuscript received November 16, 1998; Accepted for publication February 26, 1999.
| LITERATURE CITED |
|---|
AYALA, F. J., 1995 The myth of Eve: molecular biology and human origins. Science 270:1930-1936.
ELLEGREN, H., S. MIKKO, K. WALLIN, and L. ANDERSSON, 1996 Limited polymorphism at major histocompatibility complex (MHC) loci in the Swedish moose A. alces. Mol. Ecol. 5:3-9.
EWENS, W. J., 1979 Mathematical Population Genetics. Springer-Verlag, New York.
GILLESPIE, J. H., 1984 Some properties of finite populations experiencing strong selection and weak mutation. Am. Nat. 121:691-708.
GILLESPIE, J. H., 1991 The Causes of Molecular Evolution. Oxford University Press, Oxford.
HUGHES, A. L. and M. NEI, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.
KIMURA, M. and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49:725-738.
KLEIN, J., N. TAKAHATA, and F. J. AYALA, 1993 MHC polymorphism and human origins. Sci. Am. 269:78-83.
MILLER, K. M., R. E. WITHLER, and T. D. BEACHAM, 1997 Molecular evolution at Mhc genes in two populations of chinook salmon Oncorhynchus tshawytscha. Mol. Ecol. 6:937-954.
NEWMAN, C. M., J. E. COHEN, and C. KIPNIS, 1985 Neo-darwinian evolution implies punctuated equilibria. Nature 315:400-401.
SASAKI, A., 1989 Evolution of Pathogen Strategies. Ph.D. Thesis, Kyushu University, Fukuoka, Japan.
SASAKI, A., 1992 The evolution of host and pathogen genes under epidemiological interaction, pp. 247-263 in Population Paleo-Genetics, edited by N. TAKAHATA. Japan Scientific Society Press, Tokyo.
SATTA, Y., 1997 Effects of intra-locus recombination of HLA polymorphism. Hereditas 127:105-112.
TAKAHATA, N., 1990 A simple genealogical structure of strongly balanced allelic lines and trans-species evolution of polymorphism. Proc. Natl. Acad. Sci. USA 87:2419-2423.
TAKAHATA, N., 1993 Evolutionary genetics of human paleo-populations, pp. 1-21 in Mechanisms of Molecular Evolution, edited by N. TAKAHATA. Sinauer Associates, Sunderland, MA.
VEKEMANS, X. and M. SLATKIN, 1994 Gene and allelic genealogies at a gametophytic self-incompatibility locus. Genetics 137:1157-1165.
WENINK, P. W., A. F. GROEN, M. E. ROELKE-PARKER, and H. H. T. PRINS, 1998 African buffalo maintain high genetic diversity in the major histocompatibility complex in spite of historically known population bottlenecks. Mol. Ecol. 7:1315-1322.
WRIGHT, S., 1939 The distribution of self-sterility alleles in populations. Genetics 24:538-552.