Gene Genealogies in Geographically Structured Populations
Bryan K. Epperson, Michigan State University, East Lansing, Michigan 48824
Corresponding author: Bryan K. Epperson, Michigan State University, East Lansing, MI 48824., epperson{at}pilot.msu.edu (E-mail)
Communicating editor: M. W. FELDMAN
| ABSTRACT |
|---|
Population genetics theory has dealt only with the spatial or geographic pattern of degrees of relatedness or genetic similarity separately for each point in time. However, a frequent goal of experimental studies is to infer migration patterns that occurred in the past or over extended periods of time. To fully understand how a present geographic pattern of genetic variation reflects one in the past, it is necessary to build genealogy models that directly relate the two. For the first time, space-time probabilities of identity by descent and coalescence probabilities are formulated and characterized in this article. Formulations for general migration processes are developed and applied to specific types of systems. The results can be used to determine the level of certainty that genes found in present populations are descended from ancient genes in the same population or nearby populations vs. geographically distant populations. Some parameter combinations result in past populations that are quite distant geographically being essentially as likely to contain ancestors of genes at a given population as the past population located at the same place. This has implications for the geographic point of origin of ancestral, "Eve," genes. The results also form the first model for emerging "space-time" molecular genetic data.
UNDERSTANDING how genetic lineages trace through time and space has long been a central concern of population genetics theory. Stochastic models of the spatial or geographic structure among populations in terms of levels of inbreeding (…)
Various methods of estimating probabilities of IBD based on genetic data have been established. For decades various approximate estimators have been developed (e.g., …).
The significance of further understanding of space-time processes lies in part in the fact that we often want to understand historical migrational processes. There is considerable interest in the geographic origins of genetic polymorphisms and in the origins and spread of populations. Recent examples include the use of molecular data to infer origins of Alu polymorphisms in humans and the geographic origins of "modern" humans themselves (e.g., …).
Another emerging feature of modern molecular genetics is the ability to obtain ancient DNA from mummies and even fossils (e.g., …).
MALÉCOT (e.g., 1950, 1972, 1973, 1975) developed elegant mathematical tools for characterizing the purely spatial probabilities of IBD, that is, how the probability that two genes, randomly selected from each of two separate (or the same) populations at the same time period, are IBD changes as the distance between the populations increases, for a wide range of situations. He further showed exactly how these probabilities of IBD are related to coalescence, or gametic kinship chains. Although it is not pursued in great detail in the present article, an analogous but somewhat more complicated relationship exists between space-time probabilities of IBD and space-time coalescence of two genes separated in time as well as space. We indicate how coalescence analogs of the present general models may be derived from them and hence calculated for specific cases.
In this article, we first develop definitions for space-time probabilities of IBD and determine fundamental linkages among them as functions of time and space, and methods of calculating them, for completely general migration processes. We briefly consider the case of a single population and then develop further theoretical results for isotropic, but otherwise general, models. We also develop the Fourier transform of space-time probabilities of IBD in APPENDIX B. Finally, we develop more explicit results for isotropic migration in systems with only one spatial dimension as examples to illustrate some of the salient features.
| RESULTS |
|---|
General formulation of space-time probabilities of identity by descent and their relationships to spatial probabilities of identity by descent:
It is assumed that there is a well-ordered array of populations located in multidimensional space. Each population has N diploid individuals undergoing … The quantity $\phi_{n,b}(w, x)$ is the probability that two genes are IBD, where one gene, $\gamma$, is selected at random from a population at generation $n - b$ at location $w$ (where $w$ is a vector of coordinates locating the population), and the other, $\gamma'$, is selected at random from a population at the present generation $n$ and located at $x$. For future reference, the probability that the same two genes coalesced $s$ generations prior (time forward) to $n - b$ can be denoted $\psi_{n,b,s}(w, x)$, and sometimes it is convenient to consider "coalescence" events that occur between generations $n$ and $n - b$ (time backward). [Note that this means that, looking backward from generation $n$, generation $n - b - s$ ($s < 0$) saw the "first" gene that is a direct ancestor of the two genes $\gamma$ and $\gamma'$.]
Let $l(w, z)$ be the (time forward) rate of migration from $w$ to $z$ that occurs over one generation. Note that for the purely spatial probabilities of IBD ($b = 0$), $\phi_{n,0}(w, x) = \phi_{n,0}(x, w)$.
Let us first consider the space-time probabilities of IBD for a single generation time lag (i.e., $b = 1$), $\phi_{n,1}(w, x)$. We ignore mutation for the moment, which implies that descent corresponds to IBD or identity in state (…). For $\phi_{n,1}(w, x)$, we may consider three separate probabilities of the ways that $\gamma$ and $\gamma'$ may be IBD:
- Probability that $\gamma'$ is directly descended from $\gamma$, in which case it is necessarily identical by descent. This is simply the probability that the previous-generation ancestor of $\gamma'$ came from population $w$, $l(w, x)$, times the probability that the ancestral gene was in fact $\gamma$, i.e., $1/2N$. Thus this probability is $l(w, x)/2N$.
- Probability that $\gamma'$ is not directly descended from $\gamma$, but is descended from another individual gene in population $w$, and the latter gene is IBD with $\gamma$. This is the product of two probabilities. The first is the probability that $\gamma'$ is not directly descended from $\gamma$ but is descended from an individual gene in population $w$; in the specific case where all populations have the same size, this equals $l(w, x)(1 - 1/2N)$. Conditional on this, the probability that this gene is IBD with $\gamma$ is simply the within-population probability of IBD at generation $n - 1$, hence $\phi_{n-1,0}(w, w)$, which we may also call $\phi_{n-1,0}(0)$, or more simply $\phi_{n-1}(0)$, because it is the same for all populations.
- Probability that $\gamma'$ is not descended from a gene in population $w$ (i.e., it is descended from some other population, $z$) but is nonetheless IBD with $\gamma$, which for a given $z$ has probability $\phi_{n-1,0}(w, z)$. To get this, we must sum over all possible migrations from populations other than $w$; hence this probability is equal to
$$\sum_{z \neq w} l(z, x)\,\phi_{n-1,0}(w, z). \qquad (1)$$
Thus we have the equation
$$\phi_{n,1}(w, x) = \frac{l(w, x)}{2N} + l(w, x)\left(1 - \frac{1}{2N}\right)\phi_{n-1,0}(w, w) + \sum_{z \neq w} l(z, x)\,\phi_{n-1,0}(w, z). \qquad (2)$$
The space-time probabilities of IBD with time lag one are linear functions of the spatial probabilities of IBD. Hence the stationarity or equilibrium (as $n$ goes to infinity) conditions are the same as those for spatial IBDs. If we include a coefficient $k$, with $0 < k < 1$, which is most interesting when it represents mutation rates either in the infinite sites model (wherein the $\phi$'s represent probabilities that two nonrecombining haplotypes have no sites that differ) or in the infinite alleles mutation model, then the spatial probabilities of IBD reach equilibrium (…), and at equilibrium Equation 2 becomes
$$\phi_{1}(w, x) = (1 - k)\left[\frac{l(w, x)}{2N} + l(w, x)\left(1 - \frac{1}{2N}\right)\phi_{0}(w, w) + \sum_{z \neq w} l(z, x)\,\phi_{0}(w, z)\right]. \qquad (3)$$
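The three-part sum for a one-generation lag maps directly onto matrix operations. Below is a minimal Python/NumPy sketch. It assumes equal population sizes and a migration matrix `l[z, x]` whose columns sum to one, where each entry is the chance that a gene in population x had its parent gene in population z. The function name `phi_lag_one` is hypothetical, and so is the placement of a single factor (1 - k) for the one generation of mutation along the later gene's lineage. This is an illustration, not the article's own code.

```python
import numpy as np

def phi_lag_one(phi0, l, N, k=0.0):
    """One-generation space-time IBD from the three-part sum (illustrative sketch).

    phi0[w, z]: purely spatial probabilities of IBD in the earlier generation.
    l[z, x]:    chance that a gene in x had its parent gene in z (columns sum to 1).
    N:          diploid individuals per population (all populations equal).
    k:          mutation coefficient; ASSUMPTION: one factor (1 - k) for the single
                generation on the later gene's lineage (k = 0 means no mutation).
    Returns phi1[w, x] for an earlier gene sampled in w and a later gene in x.
    """
    phi0 = np.asarray(phi0, dtype=float)
    l = np.asarray(l, dtype=float)
    two_n = 2.0 * N
    within = np.diag(phi0)                                   # phi0(w, w) for each w
    direct = l / two_n                                       # parent gene is the earlier gene
    same_pop = l * (1.0 - 1.0 / two_n) * within[:, None]     # another gene in w, IBD with it
    between = (phi0 - np.diag(within)) @ l                   # sum over z != w of phi0(w, z) l(z, x)
    return (1.0 - k) * (direct + same_pop + between)

# Two populations exchanging 10% of their genes each generation.
l = np.array([[0.9, 0.1], [0.1, 0.9]])
phi0 = np.array([[0.30, 0.05], [0.05, 0.30]])
print(phi_lag_one(phi0, l, N=100, k=0.001))
```

A useful sanity check: if every pair of genes in the earlier generation is IBD (`phi0` all ones) and k = 0, each entry of the result is the column sum of `l`, which is 1.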
Development of general equations for higher temporal order time lags may be illustrated by first examining the second temporal order probabilities of IBD. In essence, the development follows that for the temporal order one case, except that we must consider all possible paths of migration in the intervening generation(s). There are three components:
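- Probability that $\gamma'$ is directly descended from $\gamma$ and hence is identical by descent:
$$\frac{1}{2N}\sum_{z} l(w, z)\,l(z, x) \qquad (4)$$
- Probability that $\gamma'$ is not directly descended from $\gamma$, but is descended from an individual gene in population $w$, and the latter gene is IBD with $\gamma$:
$$\left(1 - \frac{1}{2N}\right)\phi_{n-2,0}(w, w)\sum_{z} l(w, z)\,l(z, x) \qquad (5)$$
- Probability that $\gamma'$ is not descended from a gene in population $w$, but is nonetheless IBD with $\gamma$:
$$\sum_{z \neq w}\sum_{y} l(z, y)\,l(y, x)\,\phi_{n-2,0}(w, z) \qquad (6)$$
Thus, $\phi_{n,2}(w, x)$ equals
$$\frac{1}{2N}\sum_{z} l(w, z)\,l(z, x) + \left(1 - \frac{1}{2N}\right)\phi_{n-2,0}(w, w)\sum_{z} l(w, z)\,l(z, x) + \sum_{z \neq w}\sum_{y} l(z, y)\,l(y, x)\,\phi_{n-2,0}(w, z) \qquad (7)$$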
and
(8)
At equilibrium, when mutation is included,
(9)
For temporal lags greater than two time periods, it is possible to construct similar equations by summing over all paths of migration during the intervening time periods. However, the equations become complex, and there is a simpler way of expressing space-time probabilities of IBD in terms of those with the next shortest time lag, $b - 1$. This makes use of the fact that IBD for $\phi_{n,b}(w, x)$, with $\gamma$ in population $w$ at time $n - b$ ($b$ lags in the past), requires that $\gamma$ also be IBD with the ancestor of $\gamma'$ at time $n - 1$ (one lag in the past), wherever ($z$, for all $z$) that ancestor is located. The value of $\phi_{n,b}(w, x)$ is thus the sum over $z$ of $\phi_{n-1,b-1}(w, z)$ times the probability that the gene in $x$ at time $n$ descended from a gene in $z$ in the previous generation, i.e., the migration rate $l(z, x)$. Hence, for $b > 1$,
$$\phi_{n,b}(w, x) = \sum_{z} l(z, x)\,\phi_{n-1,b-1}(w, z). \qquad (10)$$
This is also shown by induction in APPENDIX A. For the case of equilibrium with $k$ greater than zero, this becomes
$$\phi_{b}(w, x) = (1 - k)\sum_{z} l(z, x)\,\phi_{b-1}(w, z). \qquad (11)$$
Thus all of the space-time probabilities of IBD can be obtained from the spatial probabilities of identity by using Equation 8 (or 9 in the case of equilibrium) and then iterating Equation 10 (or 11).
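Iterating the lag recursion amounts to a repeated matrix product: each step sums the previous lag's probabilities against the migration rates into the later gene's population. Below is a minimal sketch. The helper name `iterate_lags` is hypothetical, and `l[z, x]` uses the same column-stochastic convention as above. Setting k = 0 reproduces the no-mutation recursion. With k > 0, the sketch applies one factor (1 - k) per generation, which is an assumption for the equilibrium case.

```python
import numpy as np

def iterate_lags(phi1, l, b_max, k=0.0):
    """Return [phi_1, phi_2, ..., phi_{b_max}] by repeating the lag recursion (sketch).

    Each step computes phi_b[w, x] = (1 - k) * sum_z phi_{b-1}[w, z] * l[z, x]:
    the earlier gene must be IBD with the later gene's ancestor one generation
    back, which sits in some population z. ASSUMPTION: k = 0 gives the
    no-mutation form; k > 0 adds one factor (1 - k) per generation.
    """
    l = np.asarray(l, dtype=float)
    lags = [np.asarray(phi1, dtype=float)]
    for _ in range(2, b_max + 1):
        lags.append((1.0 - k) * lags[-1] @ l)
    return lags

# Example: start from a lag-one matrix and go back four generations.
l = np.array([[0.9, 0.1], [0.1, 0.9]])
phi1 = np.array([[0.32, 0.06], [0.06, 0.32]])
for b, mat in enumerate(iterate_lags(phi1, l, 4, k=0.001), start=1):
    print(b, mat.round(4))
```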
Special case of a single population:
In the case of a single population we may consider the probability, $\phi_{n,b}$, that a gene, $\gamma'$, selected at random at generation $n$, is IBD (barring mutation for the moment) with another gene, $\gamma$, which is selected at random from the population at generation $n - b$. For $b = 1$, the probability that $\gamma'$ is directly descended from $\gamma$ (and hence is also IBD) is simply $1/2N$. The probability that it is not directly descended from $\gamma$ but is nonetheless IBD is $(1 - 1/2N)\phi_{n-1,0}$. Hence $\phi_{n,1} = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right)\phi_{n-1,0}$. Rather than go through the iterative process, we make use of the fact that, because there is a single population, $\gamma'$ must be descended either from $\gamma$ (with probability $1/2N$) or from another gene elsewhere in the population. Hence we have the simple result: $\phi_{n,b} = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right)\phi_{n-b,0}$. If mutation is included, then
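$$\phi_{n,b} = (1 - k)^b\left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right)\phi_{n-b,0}\right]. \qquad (12)$$
At equilibrium,
$$\phi_{b} = (1 - k)^b\left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right)\phi_{0}\right]. \qquad (13)$$
For large $b$ and $N$ and small $k$,
(14)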
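Here is a small numerical sketch of the single-population case. It rests on two assumptions. First, for two genes sampled in the same generation it uses the standard infinite-alleles equilibrium $\phi_0 = (1-k)^2[1/2N + (1 - 1/2N)\phi_0]$, which this section does not write out. Second, it applies one factor $(1 - k)$ per generation of lag along the later gene's lineage. Under these formulas, $\phi_0 = (1 - k)\phi_1$ serves as a quick consistency check. The function name is hypothetical.

```python
def single_population(N, k, b_max):
    """Equilibrium IBD for one population of N diploids (illustrative sketch).

    ASSUMPTION: phi_0 solves phi_0 = (1-k)^2 [1/2N + (1 - 1/2N) phi_0]
    (infinite alleles, two genes sampled in the same generation).
    Lags b >= 1 use phi_b = (1-k)^b [1/2N + (1 - 1/2N) phi_0].
    """
    two_n = 2.0 * N
    phi0 = (1 - k) ** 2 / (two_n - (two_n - 1) * (1 - k) ** 2)
    base = 1 / two_n + (1 - 1 / two_n) * phi0       # ancestor is gamma, or IBD with it
    lags = [(1 - k) ** b * base for b in range(1, b_max + 1)]
    return phi0, lags

phi0, lags = single_population(N=100, k=0.001, b_max=5)
print(round(phi0, 4), [round(p, 4) for p in lags])
```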
Case of multiple populations with arbitrary dimensionality but with isotropic migrations and at equilibrium:
In the case of isotropic migration rates (where the migration rate for a given spatial lag is the same in both directions within a dimension but may differ between dimensions), it is convenient to use spatial lags rather than absolute locations of populations and to assume that the populations either extend infinitely in all dimensions or are supported by a multidimensional torus (…). Let $\phi_b(y)$ be the probability of IBD between $\gamma$ and $\gamma'$ at two different populations separated by $b$ generations in time and by spatial lags in the vector $y$ in space. Translation of Equation 3 and factoring $(1 - k)$ gives
(15)
or equivalently
(16)
or
(17)
Similarly,
(18)
or equivalently
(19)
Thus,
(20)
(21)
It is easy to see that, in general, $\phi_b(y_b)$ is given by
(22)
Thus the space-time probabilities of IBD can be determined from the spatial probabilities of IBD using Equation 22.
Case of multiple populations with one spatial dimension but with isotropic migrations and at equilibrium:
The case of populations located along a single dimension illustrates several aspects of the space-time probabilities of IBD. In this case each spatial index is an integer, not a vector. Using Fourier transforms, …
(23)
where $\sigma^2$ is the variance in the distance of migration. A continuous approximation for large distances of separation is
(24)
(25)
Development of the Fourier transforms of the space-time probabilities of IBD for the one-dimensional case is presented in APPENDIX B and can be used for obtaining analytic results on the space-time probabilities of IBD.
For the special case of the strict stepping-stone model, $\sigma^2 = 2m$, where $m$ is the migration rate between adjacent populations, so that
(26)
(27)
letting $\phi_0(0) = a$ and
(28)
we have $\phi_0(y) = a g^{|y|}$. For $y \neq -1, 0, 1$,
$$\phi_1(y) = (1 - k)\left[m\,\phi_0(y - 1) + m\,\phi_0(y + 1) + (1 - 2m)\,\phi_0(y)\right] \qquad (29)$$
$$= (1 - k)\,a\left[m g^{|y| - 1} + m g^{|y| + 1} + (1 - 2m)\,g^{|y|}\right] \qquad (30)$$
$$= (1 - k)\,a\,g^{|y| - 1}\left[m g^2 + m + (1 - 2m)\,g\right]. \qquad (31)$$
Letting $c = m g^2 + m + (1 - 2m) g$, we have $\phi_1(y) = (1 - k) a c g^{|y| - 1}$. Using this and Equation 11, for $y < -2$ or $y > 2$ we have $\phi_2(y) = (1 - k)^2 a c^2 g^{|y| - 2}$. Iterating Equation 11, we have, for $y < -b$ or $y > b$, the simple relationship $\phi_b(y) = (1 - k)^b a c^b g^{|y| - b}$. This indicates that, in the strict stepping-stone model, for pairs of populations between which direct descent is not possible (separated by $b$ in time and $y$ in space, with $|y| > b$), the space-time probabilities of IBD decrease exponentially with spatial distance.
For $y = -1$ or $1$,
(32)
For $y = 0$,
(33)
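The closed form $\phi_b(y) = (1 - k)^b a c^b g^{|y| - b}$ for $|y| > b$ can be checked by brute force. Start from any geometric $\phi_0(y) = a g^{|y|}$, apply the stepping-stone form of the lag recursion $b$ times, and compare the results. The sketch below uses arbitrary values of $a$ and $g$ and a hypothetical function name. Because it relies only on that recursion, it tests the algebra rather than the particular value of $g$ from Equation 28. Entries with $|y| \le b$ are deliberately excluded, since the plain recursion omits the direct-descent terms that matter there.

```python
import numpy as np

def check_closed_form(a, g, m, k, b_max, Y=60):
    """Largest relative gap between iterated recursion and closed form for |y| > b."""
    y = np.arange(-Y, Y + 1)
    phi = a * g ** np.abs(y)                       # phi_0(y) = a g^|y| (any a, g)
    c = m * g**2 + m + (1 - 2 * m) * g
    worst = 0.0
    for b in range(1, b_max + 1):
        nxt = np.full_like(phi, np.nan)            # window edges become undefined
        # Stepping-stone recursion: (1-k)[m phi(y-1) + m phi(y+1) + (1-2m) phi(y)]
        nxt[1:-1] = (1 - k) * (m * phi[:-2] + m * phi[2:] + (1 - 2 * m) * phi[1:-1])
        phi = nxt
        mask = (np.abs(y) > b) & (np.abs(y) <= Y - b)   # no direct descent, no edge effects
        closed = (1 - k) ** b * a * c ** b * g ** (np.abs(y[mask]) - b)
        worst = max(worst, np.max(np.abs(phi[mask] - closed) / closed))
    return worst

print(check_closed_form(a=0.28, g=0.8, m=0.1, k=0.001, b_max=10))
```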
Examples that capture many of the salient features of space-time probabilities of IBD for isotropic migration processes are shown for strict stepping-stone equilibrium models with one spatial dimension in Tables 1-4. These were calculated using Equations 31, 32, and 33 in conjunction with Equation 11. Calculations were first done assuming 40,000 populations to avoid any possible edge effects, but the same numbers occurred for calculations using 400 populations. Computations for 400 populations for 10,000 generations used ~5 sec of CPU time on a Sun Microsystems SPARCstation 20. The key features are: (1) the degree to which the probabilities do not decrease smoothly in time or space; (2) the degree to which the function over space may become flatter as the time lag increases; and (3) the general effects of the parameters. Population size, N, affects all space-time probabilities of IBD in exactly the same way. Precisely, as is clear in the Fourier transform in APPENDIX B (Equation B19), the probability of IBD within a population, $\phi_0(0)$, decreases linearly with N, and all of the space-time probabilities decrease by $(1 - \phi_0(0))/2N$, so that the relative values of the $\phi_b(y)$ (for b and y not equal to zero) are unaffected by N. For the models shown in Tables 1-4, an arbitrary but small population size (N = 100) was used to better show the effects of the other parameters, the rates of migration and mutation. Of course, for larger migration rates the purely spatial probabilities of IBD are smaller for short distances, but they also decrease more slowly as distance increases (…).
For short time lags, particularly for time lag 1, the probabilities of IBD for spatial lags 0 or ±1 may actually be greater than $\phi_0(0)$, especially if the mutation rate is not too large. The effect increases as the migration rate increases. Similarly, the probabilities of the type $\phi_b(0)$ tend to decrease rapidly as the migration rate increases. Naturally, the larger the value of k, the faster the probabilities generally decrease with time lag. However, for some combinations of migration rate and k, small increases in probabilities of IBD can occur from b to b + 1 even at large spatial and temporal lags.
For long temporal lags, there can be a remarkable "flattening" of the probability of IBD as a function of distance, especially when mutation and migration are both strong. Nonetheless, it is also remarkable that the curves are relatively flat only up to 10 to 100 distance units in most realistic scenarios. Still, this may be a substantial distance.
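To see the flattening numerically, the sketch below builds a ring of P populations under the strict stepping-stone model and reports $\phi_b(20)/\phi_b(0)$ for a few time lags. It is illustrative only, and the following points are assumptions:

- The equilibrium $\phi_0$ comes from iterating a Malécot-style recursion for two genes in the same generation, which this article does not spell out.
- The lag-one step adds the direct-descent term $l(y)(1 - \phi_0(0))/2N$ to the convolution.
- Later lags apply one factor $(1 - k)$ per generation.
- The parameter values are arbitrary choices, not those behind Tables 1-4.

All function names are hypothetical.

```python
import numpy as np

def ring_conv(phi, weights):
    """psi(y) = sum_D weights[D] * phi((y + D) mod P) on a ring of P populations."""
    return sum(w * np.roll(phi, -d) for d, w in weights.items())

def lag_array(weights, P):
    """Place a {lag: weight} dict into a length-P array indexed by lag mod P."""
    arr = np.zeros(P)
    for d, w in weights.items():
        arr[d % P] += w
    return arr

def ring_phi0(N, m, k, P, tol=1e-13, max_iter=1_000_000):
    """Equilibrium spatial IBD phi_0(y) on a stepping-stone ring.

    ASSUMPTION: Malecot-style recursion (not given in this article). Both lineages
    migrate, so the lag changes by D = d1 - d2 with distribution q = l * l; parents
    landing in one population are the same gene with probability 1/2N.
    """
    q = {0: (1 - 2*m)**2 + 2*m*m, 1: 2*m*(1 - 2*m), -1: 2*m*(1 - 2*m), 2: m*m, -2: m*m}
    q_lag = lag_array(q, P)                        # q(-y) = q(y) by symmetry
    phi = np.zeros(P)
    for _ in range(max_iter):
        new = (1 - k)**2 * (ring_conv(phi, q) + q_lag * (1 - phi[0]) / (2 * N))
        if np.max(np.abs(new - phi)) < tol:
            return new
        phi = new
    raise RuntimeError("phi_0 iteration did not converge")

def flattening(N=100, m=0.1, k=0.01, P=400, y_far=20, lags=(0, 1, 10, 1000)):
    """Ratio phi_b(y_far) / phi_b(0) for several time lags b (illustrative)."""
    l = {0: 1 - 2*m, 1: m, -1: m}                  # strict stepping stone, symmetric
    phi0 = ring_phi0(N, m, k, P)
    ratios = {0: phi0[y_far] / phi0[0]}
    # Lag one: convolution plus direct-descent correction l(y)(1 - phi_0(0))/2N.
    phi = (1 - k) * (ring_conv(phi0, l) + lag_array(l, P) * (1 - phi0[0]) / (2 * N))
    for b in range(1, max(lags) + 1):
        if b > 1:
            phi = (1 - k) * ring_conv(phi, l)      # one more generation back
        if b in lags:
            ratios[b] = phi[y_far] / phi[0]
    return ratios

for b, r in flattening().items():
    print(f"b = {b:>4}: phi_b(20) / phi_b(0) = {r:.4f}")
```

With these settings the ratio at b = 1000 is far larger than at b = 0, which is the flattening described above. The ring size only needs to be large relative to the spread of the lineages (roughly $\sigma\sqrt{b}$).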
For the space-time coalescence probabilities, the subscript n is not necessary so long as we are careful not to exceed n generations going backward in time, and equilibrium is sufficient but not necessary in this regard. We derived equations analogous to Equations 3, 9, and 11 for these coalescence probabilities; for example, the following (for $s > 1$)
(34)
and (for $s = 1$)
(35)
The exact same equations are found for the coalescence probabilities for two sampled genes separated in space but not time (…) …
(36)
Note that the summation includes $s = 0$ (the probability that $\gamma'$ is a direct descendant of $\gamma$).
| DISCUSSION |
|---|
The mathematical relationships developed in this article demonstrate that the probabilities ($\phi_b(y)$) of IBD between genes separated by time lags ($b$) as well as distance lags ($y$) in space are (usually complex) linear functions of the spatial probabilities of IBD for general migration models with arbitrary numbers of spatial dimensions, with isotropic or anisotropic migration, at equilibrium or not. Equations were generated that can be iterated so that the space-time probabilities of IBD can be calculated from the spatial probabilities of IBD, again for the same range of general models. In all of these systems the effects of the number of individuals within populations, N, are simple. The probabilities of IBD within the same population at the same time and the purely spatial probabilities of IBD decrease linearly with N (e.g., …).
Several fundamental features of space-time probabilities of IBD were illustrated using the equilibrium one-dimensional strict stepping-stone migration process, for which the purely spatial probabilities have the simple form of an exponential decrease with distance of spatial separation. First, for relatively short time lags, the probabilities of IBD for relatively small distances can exhibit complex behavior, which would not necessarily be expected from consideration of purely spatial patterns. Probabilities of IBD for two genes existing at different generations but within the same population or between two nearby populations may actually increase as the time lag increases. Such effects are greater when mutation rates are higher and are slightly increased (for very short time periods) when migration rates are higher. Naturally, mutation tends to decrease overall probabilities of IBD, and low migration rates tend to increase short-distance probabilities of IBD. The purely spatial probabilities drop off more sharply with distance when there are higher mutation rates or lower migration rates.
For longer time lags, b, even when there are sharp declines in purely spatial probabilities of IBD (as distance increases), which occur particularly when mutation rates are high and migration rates are low, there can be a remarkable degree of "flatness" when b increases. That is, looking at ancient generations, IBD decreases much more slowly with geographic or spatial distance. Such effects are greatest when migration rates are high, as would be expected. The same also occurs in space-time correlations in allele frequencies (…). … $\phi_0(100)$ (0.0114) is only 4% as large as the value at the origin, $\phi_0(0)$ (0.2833), whereas for 10,000 generations ago, $\phi_{10,000}(100)$ (0.0101) is 23% as large as $\phi_{10,000}(0)$ (0.0441). It may be expected that a system of populations existing in two dimensions would show even greater flatness, as is the case for space-time correlations of gene frequencies (…).
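The percentages quoted above follow directly from the listed values. Here is a quick check that uses only the numbers given in the text:

```python
# Values quoted in the text: b -> (phi_b(100), phi_b(0)).
quoted = {0: (0.0114, 0.2833), 10_000: (0.0101, 0.0441)}
ratios = {b: far / origin for b, (far, origin) in quoted.items()}
for b, r in ratios.items():
    print(f"b = {b}: phi_b(100) / phi_b(0) = {r:.1%}")
```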
Finally, this article developed coalescence probabilities for two genes in samples separated in time as well as in space. Thus the coalescent can be extended to ancient DNA; for example, an ancient DNA sample could be placed in a gene genealogy reconstruction using the coalescent.
| ACKNOWLEDGMENTS |
|---|
I thank two anonymous reviewers for helpful comments on an earlier version of the manuscript. This research was supported in part by grants from McIntire-Stennis and the Michigan Agricultural Experiment Station.
Manuscript received July 1, 1998; Accepted for publication March 8, 1999.
| APPENDIX A |
|---|
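To demonstrate the validity of Equation 10 in the text, consider the equation for $\phi_{n,2}(w, x)$,
$$\phi_{n,2}(w, x) = \sum_{k} l(k, x)\,\phi_{n-1,1}(w, k), \qquad (A1)$$
and text Equation 2,
$$\phi_{n-1,1}(w, k) = \frac{l(w, k)}{2N} + l(w, k)\left(1 - \frac{1}{2N}\right)\phi_{n-2,0}(w, w) + \sum_{z \neq w} l(z, k)\,\phi_{n-2,0}(w, z). \qquad (A2)$$
Substituting the right-hand side of Equation A2 for $\phi_{n-1,1}(w, k)$ in Equation A1 produces
$$\phi_{n,2}(w, x) = \sum_{k} l(k, x)\left[\frac{l(w, k)}{2N} + l(w, k)\left(1 - \frac{1}{2N}\right)\phi_{n-2,0}(w, w) + \sum_{z \neq w} l(z, k)\,\phi_{n-2,0}(w, z)\right]. \qquad (A3)$$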
The simplest way to show the correspondence of this equation to text Equation 7 or Equation 8 is to multiply through the brackets and compare term by term. The first term in Equation A3 is, substituting the dummy variable $z$ for $k$,
$$\frac{1}{2N}\sum_{z} l(z, x)\,l(w, z), \qquad (A4)$$
which is a rearrangement of the first term in text Equation 7. The same rearrangement of the second term in (A3) is
$$\left(1 - \frac{1}{2N}\right)\phi_{n-2,0}(w, w)\sum_{z} l(z, x)\,l(w, z), \qquad (A5)$$
and this clearly equals the second term of Equation 7. For the third term we first interchange the summation signs, giving
$$\sum_{z \neq w}\sum_{k} l(k, x)\,l(z, k)\,\phi_{n-2,0}(w, z). \qquad (A6)$$
Substituting $y$ for $k$ yields the third term of Equation 7. Thus Equation A3 and Equation 7 are equivalent. Because the higher-order lags (>2) involve the same types of terms, it follows by induction that Equation 10 is true. Of course, the analog of Equation 11 is also true when there is mutation and equilibrium is attained.
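The term-by-term argument can also be checked numerically. The sketch below uses hypothetical helper names, random migration and IBD matrices, and no mutation. It computes the lag-two probability in two ways: once by applying the Equation 10 step to the lag-one three-part sum, and once by writing out all two-step migration paths directly (direct descent, descent from another gene in $w$, descent from another population). The two results agree to floating-point precision, which is the content of the induction step.

```python
import numpy as np

def lag_one(phi_prev, l, N):
    """Three-part lag-one sum (no mutation); phi_prev is spatial IBD one generation back."""
    within = np.diag(phi_prev)
    return (l / (2 * N) + l * (1 - 1 / (2 * N)) * within[:, None]
            + (phi_prev - np.diag(within)) @ l)

def lag_two_direct(phi_prev2, l, N):
    """Lag-two probability written out over all two-step migration paths."""
    two_step = l @ l                                          # [w, x] = sum_z l(w, z) l(z, x)
    within = np.diag(phi_prev2)
    direct = two_step / (2 * N)                               # ancestor is the earlier gene itself
    same_pop = two_step * (1 - 1 / (2 * N)) * within[:, None] # another gene in w, IBD with it
    other = (phi_prev2 - np.diag(within)) @ two_step          # ancestor in some z != w
    return direct + same_pop + other

rng = np.random.default_rng(0)
P, N = 5, 7
l = rng.random((P, P))
l /= l.sum(axis=0, keepdims=True)      # column x: where the genes of population x came from
A = rng.random((P, P))
phi_prev2 = (A + A.T) / 4              # an arbitrary symmetric spatial IBD matrix

via_recursion = lag_one(phi_prev2, l, N) @ l   # Equation 10 with b = 2
print(np.allclose(via_recursion, lag_two_direct(phi_prev2, l, N)))
```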
| APPENDIX B |
|---|
As an example, we develop the Fourier transform for the case of isotropic migration, equilibrium, and one spatial dimension. The Fourier transform is $F = \sum_y \xi^y f(y)$, where $\xi = e^{-i\theta}$, and let us define $K(\xi)$ as
(B1)
Applying the transform to both sides of text Equation 17, we have
(B2)
Defining $L(\xi) = \sum_y \xi^y l(y)$, and recognizing both the relationship of the product of the Fourier transforms of two functions to their convolution and that we can interchange the order of the summations in the second term, leads to
(B3)
(B4)
To simplify the exposition, let $a = (1 - \phi_0(0))/2N$, and thus
(B5)
(B6)
(B7)
Thus,
(B8)
Similarly, transforming text Equation 18 for $b > 1$ gives
(B9)
Thus, for $b = 2$,
(B10)
or
(B11)
Repeating this process, we see that for $b \geq 1$,
(B12)
It is possible to obtain some analytical solutions for the value of $\phi_b(y)$ as a function of $y$ and $b$ by taking the inverse of the Fourier transform. Indeed, the inversion involves the same roots as in the purely spatial case because the denominator is the same. That is, we need only consider the singularities as the denominator goes to zero. We use an approach similar to the residue theorem. We let $H(\xi) = L(\xi)L(1/\xi)$ and recognize that we need only consider the singularity where $H(\xi) = 1/(1 - k)^2$ (because we have assumed that $k > 0$), which we set equal to $1 + k_1$. We need consider only the poles $\xi_1$ and $\xi_2 = 1/\xi_1$, which are very close to 1.0. Using the inversion formula of …,
(B13)
and
(B14)
where
(B15)
(B16)
Because both the numerator and the denominator go to zero in the limit, we can use l'Hôpital's rule,
(B17)
or
(B18)
Here $H'(\xi_1) = 2\sigma^2(\xi_1 - 1) + o(\xi_1 - 1)^2 = -2\sigma\sqrt{k_1} + o(\sqrt{k_1})$, where $\sigma^2$ is the variance in the distance of migration. When $k$ is small, $H'(\xi_1) \approx -2\sigma\sqrt{2k}$. Taking the limit, substituting back for $a$, and putting this all together, we have
(B19)
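For the strict stepping-stone model, the pole and the slope $H'(\xi_1)$ can be computed exactly and compared with the small-$k$ approximations. Because $L(\xi) = m/\xi + (1 - 2m) + m\xi$ is symmetric, $H(\xi) = L(\xi)^2$, so the condition $H(\xi) = 1/(1 - k)^2$ reduces to a quadratic in $\xi$. The sketch takes $\sigma^2 = 2m$ and $k_1 = 1/(1 - k)^2 - 1$. The function name is hypothetical, and the parameter values are arbitrary.

```python
import numpy as np

def stepping_stone_pole(m, k):
    """Pole xi_1 < 1 of H(xi) = 1/(1-k)^2 for L(xi) = m/xi + (1 - 2m) + m*xi.

    L(1/xi) = L(xi) for this symmetric L, so H = L**2 and the condition becomes
    L(xi) = 1/(1-k), i.e. xi + 1/xi = 2 + k / (m (1 - k)).
    """
    s = 2 + k / (m * (1 - k))
    xi1 = (s - np.sqrt(s * s - 4)) / 2           # root inside the unit circle
    L = m / xi1 + (1 - 2 * m) + m * xi1
    dL = m - m / xi1**2
    return xi1, 2 * L * dL                        # H'(xi_1) = 2 L(xi_1) L'(xi_1)

m, k = 0.1, 1e-6
sigma = np.sqrt(2 * m)                            # sigma^2 = 2m, strict stepping stone
k1 = 1 / (1 - k)**2 - 1
xi1, h_prime = stepping_stone_pole(m, k)
print(xi1, h_prime, -2 * sigma * np.sqrt(k1), -2 * sigma * np.sqrt(2 * k))
```

The exact slope and the two approximations differ by a relative amount of order $\sqrt{k/m}$, so they agree closely only when $k$ is small compared with the migration rate.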
It is of interest to take the Taylor series expansion of $L^b(\xi_1)$ about 1.0:
(B20)
It is easy to show that $L^b(1) = 1.0$ and that $dL^b(1)/d\xi_1 = b\bar{l}$ (for $b > 1$), where $\bar{l}$ is the average movement of gene migration. In the isotropic case, $\bar{l}$ is zero. Also, in the isotropic case the value of $d^2L^b(1)/d\xi_1^2$ is $b\sigma^2$. Thus,
(B21)
The first two terms of Equation B21 provide a good approximation if $k$ is very small compared to $\sigma^2$ and $1/b$ (the latter is required because higher terms of the Taylor series may involve higher powers of $b$). Under these conditions the following approximation is good: $L^b(\xi_1) \approx 1 + \tfrac{1}{2}b\sigma^2(\xi_1 - 1)^2 \approx 1 + bk$. Note also that when $b$ is large the probability of IBD decreases essentially exponentially with the time lag as well as with the distance of separation.
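For the strict stepping-stone $L$, the pole satisfies $L(\xi_1) = 1/(1 - k)$ exactly, so $L^b(\xi_1) = (1 - k)^{-b}$. That makes it easy to compare the two-term Taylor approximation with $1 + bk$ directly. Below is a short sketch with an arbitrary small $k$ and moderate $b$, assuming $\sigma^2 = 2m$:

```python
import numpy as np

m, k, b = 0.1, 1e-6, 100
s = 2 + k / (m * (1 - k))
xi1 = (s - np.sqrt(s * s - 4)) / 2                 # pole of H(xi) = 1/(1-k)^2 inside unit circle
L_at_pole = m / xi1 + (1 - 2 * m) + m * xi1        # equals 1/(1-k) for this model
exact = L_at_pole ** b
sigma2 = 2 * m
taylor = 1 + 0.5 * b * sigma2 * (xi1 - 1) ** 2     # two-term Taylor form, isotropic case
print(exact, taylor, 1 + b * k)
```

All three numbers agree to within about $10^{-6}$ here. The gap grows as $bk$ and $\sqrt{k/m}$ increase, which is consistent with the stated conditions on $k$ relative to $\sigma^2$ and $1/b$.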
Fourier transform methods can be developed for space-time probabilities of IBD for systems with two or more spatial dimensions, although the notation becomes more complicated.
| LITERATURE CITED |
|---|
BATZER, M. A., S. S. ARCOT, J. W. PHINNEY, M. ALEGRIA-HARTMAN, and D. H. KASS et al., 1996 Genetic variation of recent Alu insertions in human populations. J. Mol. Evol. 42:22-29.
BATZER, M. A., S. T. SHERRY, P. L. DEININGER, and M. STONEKING, 1997 Alu repeats and human evolution-response. J. Mol. Evol. 45:7-8.
BODMER, W. F., and L. L. CAVALLI-SFORZA, 1968 A migration matrix model for the study of random genetic drift. Genetics 59:565-592.
EPPERSON, B. K., 1993a Recent advances in correlation studies of spatial patterns of genetic variation. Evol. Biol. 27:95-155.
EPPERSON, B. K., 1993b Spatial and space-time correlations in systems of subpopulations with genetic drift and migration. Genetics 133:711-727.
HUDSON, R. R., 1990 Gene genealogies and the coalescent process, pp. 1-44 in Oxford Surveys in Evolutionary Biology, edited by D. J. FUTUYMA and J. ANTONOVICS. Oxford University Press, Oxford.
KINGMAN, J. F. C., 1982 The coalescent. Stochast. Proc. Appl. 13:235-248.
KRINGS, M., A. STONE, R. W. SCHMITZ, H. KRAINITZKI, and M. STONEKING et al., 1997 Neanderthal DNA-sequences and the origin of modern humans. Cell 90:19-30.
MALÉCOT, G., 1946 La consanguinité dans une population limitée. C. R. Acad. Sci. 222:841-843.
MALÉCOT, G., 1948 Les Mathématiques de l'Hérédité. Masson, Paris.
MALÉCOT, G., 1950 Quelques schémas probabilistes sur la variabilité des populations naturelles. Ann. Univ. Lyon Sci. Sect. A 13:37-60.
MALÉCOT, G., 1972 Génétique des populations naturelles dans le cas d'un seul locus. II. Étude du coefficient de parenté. Ann. Génét. Sél. Anim. 4:385-409.
MALÉCOT, G., 1973 Génétique des populations diploïdes naturelles dans le cas d'un seul locus. III. Parenté, mutations et migration. Ann. Génét. Sél. Anim. 5:333-361.
MALÉCOT, G., 1975 Heterozygosity and relationship in regularly subdivided populations. Theor. Popul. Biol. 8:212-241.
MORTON, N. E., 1969 Human population structure. Annu. Rev. Genet. 3:53-74.
RANNALA, B., 1996 The sampling theory of neutral alleles in an island population of fluctuating size. Theor. Popul. Biol. 50:91-104.
STANLEY, S. E., 1997 Alu repeats and human-evolution. J. Mol. Evol. 45:6-7.
WRIGHT, S., 1943 Isolation by distance. Genetics 28:114-138.
WRIGHT, S., 1965 The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395-420.