Genetics, Vol. 149, 1975-1985, August 1998, Copyright © 1998

Analysis of Genetic Structure and Dispersal Patterns in a Population of Sea Beet

Jarle Tuftoa, Alan F. Raybouldb, Kjetil Hindara, and Steinar Engenc
a Norwegian Institute for Nature Research, 7005 Trondheim, Norway,
b Institute of Terrestrial Ecology, Furzebrook Research Station, Wareham, Dorset UK BH20 5AS, United Kingdom
c Department of Mathematical Sciences, Lade Section, Norwegian University of Science and Technology, 7034 Trondheim, Norway

Corresponding author: Jarle Tufto, Department of Mathematical Sciences, Lade Section, Norwegian University of Science and Technology, 7034 Trondheim, Norway., jarlet{at}math.ntnu.no (E-mail).

Communicating editor: B. S. WEIR


*  ABSTRACT
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

A model of the migration pattern in a metapopulation of sea beet (Beta vulgaris L. ssp. maritima), based on the continuous distributions of seed and pollen movements, is fitted to gene frequency data at 12 isozyme and RFLP loci by maximum likelihood by using an approximation of the simultaneous equilibrium distribution of the gene frequencies generated by the underlying multivariate stochastic process of genetic drift in the population. Several alternative restrictions of the general model are fitted to the data, including the island model, a model of complete isolation, and a model in which the seed and pollen dispersal variances are equal. Several likelihood ratio tests between these alternatives are performed, and median bias in the estimated parameters is corrected by using parametric bootstrapping. To assess the fit of the selected model, the predicted covariances are compared with covariances computed from the data directly. The dependency of estimated parameters on the ratio between effective and absolute subpopulation sizes, which is treated as a known parameter in the analysis, is also examined. Finally, we note that the data also appear to contain some information about this ratio.


LEVELS of gene flow can be estimated from different kinds of data, either directly by different forms of capture-recapture methods or indirectly from geographic genetic differentiation generated by local genetic drift. Even though all information in genetic data on which indirect methods are based lies in the variances and covariances of the gene frequencies only, indirect methods are potentially very useful because data at several loci represent independent realizations of the same underlying process. If appropriate assumptions are built into the analysis, indirect methods may also potentially produce gene flow estimates reflecting average levels several years into the past (SLATKIN 1985 Down), thus being more relevant in an evolutionary context.

Although some theoretical results that allow inferences to be drawn about the pattern of migration from observations of genetic geographic differentiation exist, these theoretical models are often highly idealized. In the island model of migration, for example, it is assumed that all migrations occur via some large outside world populations (WRIGHT 1943 Down). If we believe this assumption, the effective number of migrants that enter each subpopulation each generation, Nem, can be calculated from the estimated amount of genetic differentiation between the subpopulations. In KIMURA and WEISS 1964 Down analysis of decrease in genetic correlation in infinite stepping stone models, migration also occurs between neighboring subpopulations. It is also assumed that the subpopulations constitute completely panmictic, discrete, regularly spaced units. In most natural populations, however, individuals are more or less continuously distributed across space in varying densities. In most plant species, dispersal of genes, through seed and pollen, also occur over distances of almost any length. The distributions of these seed and pollen displacements are often highly leptokurtic.

To obtain reliable and useful estimates of levels of gene flow, it seems necessary to incorporate more realistic assumptions into the analysis and to take into account the geographic structure of the population under study, variation in local subpopulation sizes, and the form of dispersal. In this article, we analyze a set of gene frequency data (RAYBOULD et al. 1996 Down) from a metapopulation of sea beet (Beta vulgaris L. ssp. maritima) by using and further developing methods from TUFTO et al. 1996 Down. In this approach, the likelihood of different migration matrices, given the data, is computed using an approximation of the simultaneous stationary distribution of the gene frequencies generated by the underlying multivariate stochastic process of genetic drift in the subpopulations under consideration. After presenting the data and a brief summary of the basic approach, we introduce a general model of migration patterns in plant populations in which the migration matrix is related to the continuous dispersal probability distributions of seed and pollen movements. Two such dispersal distributions applying specifically to the data used are then given. The parameters of these distributions are then estimated from the gene frequency data by maximum likelihood.

Because only an approximate likelihood is computed and the data are limited, we can expect estimates of the different parameters to be biased. By making extensive use of bootstrapping methods, however, these biases can be corrected to a great extent. Using parametric bootstrapping from several restrictions of the general model of the migration pattern, likelihood ratio tests between some alternative restricted models, including the island model, are also performed. The fit of the selected model is then inspected by comparing predicted and observed covariances. Finally, we examine the dependency of the estimated parameters on the assumed ratio between effective and absolute subpopulation sizes. We also note that the data, which are purely nontemporal, appear to contain some information about local effective subpopulation sizes, possibly because time in the underlying process is discrete.


*  THE DATA
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

The data consist of allele frequencies of 12 isozyme and RFLP loci from 220 sea beet plants at Furzey Island, Poole Harbour, UK, that were previously analyzed in RAYBOULD et al. 1996 Down. The coordinates of all sampled plants in space and the subdivision used to construct subpopulations are shown in Figure 1.



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 1. The geographic location of the 220 sea beet plants (represented by the dots) divided into subpopulations 1–21. The coordinate system is in units of meters. In the RESULT and DISCUSSION sections, subpopulations 1–16 and 17–21 will be referred to as transects 1 and 2, respectively.

Because almost all plants seen in the study area were sampled, the absolute subpopulation sizes, which are part of the model input, could be counted directly. This also means that the gene frequencies in subpopulations could be determined without any sampling error.

The model also involves the effective sizes of each of the subpopulations, which are generally much harder to determine. The effective size Ne of a population will generally equal the absolute population size N, multiplied by some factor that depends on demographic parameters. Sea beet is a self-incompatible, iteroparous perennial with a life span of at least 3 years in cultivation (BRUUN et al. 1995 Down; VAN DIJK et al. 1997 Down). Personal observations suggest that plants in the Dorset populations live for at least 5 years. Although sea beet is gynodioecious in France (BOUTIN et al. 1988), the Dorset plants are all hermaphrodite. According to NUNNEY 1993 Down, there is a general tendency for Ne to approach N/2 as generation length and generation overlap increases. We have therefore used this limiting value for the effective size of each subpopulation, although estimates of this ratio are known to vary widely from ~0.2 to 0.8 (FRANKHAM 1995 Down).


*  COMPUTING THE LIKELIHOOD
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

TUFTO et al. 1996 Down present a general approach for estimating the matrix M of migration rates between a set of populations, i = 1,2,...,n, from geographic neutral genetic variation, in part using ideas in COURGEAU 1974 Down and FELSENSTEIN 1982 Down. The model is based on the underlying multivariate stochastic process

(1)
of genetic drift in the vector pt = (p1,t...,pn,t) of the gene frequencies in each subpopulation. The elements of the column vector et representing genetic drift have variances equal to , where Ni is the effective size of subpopulation i. For this process, the exact covariance matrix C of the stationary distribution of the standardized gene frequencies can be found by solving the matrix equation

(2)
where

(3)
using numerical techniques (TUFTO et al. 1996 Down; Appendix 1). This can be done for migration matrices of any form, for systems with a moderate number of subpopulations.

The basic idea of the approach is to assume that the migration matrix is of a certain form, e.g., stepping stone like, then compute the covariance matrix from the migration matrix and the effective population sizes, and then use the multivariate normal distribution to repeatedly compute the probability of the observed gene frequencies for different parameter values until the likelihood is maximized. With gene frequency data on several independent loci, the total likelihood is equal to the product of the multinormal probabilities calculated for each loci.

Conditioning on:
The model initially also involves an (n + 1)th large "outside world" population with gene frequency remaining constant over time and equal to q. To eliminate this nuisance parameter from the model, the distribution of the gene frequencies, conditioned on a sufficient statistic for q, have to be considered. In the case of multivariate normality, which would hold if the fluctuations around the equilibrium gene frequency are small, the weighted mean observed gene frequency = {Sigma}iwipi is known to be sufficient for q, provided that the wi's are chosen to minimize the variance of . The sampling distribution of the model, conditioned on , is then, by definition, independent of q. The major difficulty of the approach is to find a good approximation of the covariance matrix of this conditional distribution, more generally, when multivariate normality does not hold (see Appendix 1). The approximation of the conditional covariances do not have to be completely accurate, however. As we shall see, as long as we are able to simulate bootstrap data from the correct sample distribution, biases, which in part may be caused by approximations involved in the computation of the likelihood, can be corrected to a great extent.


*  MIGRATION MATRICES IN PLANT POPULATIONS
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

General considerations:
Following BODMER and CAVALLI-SFORZA 1968 Down, each element mij of the matrix M in (1), the backward migration matrix, is defined as the conditional probability that a gene originates from population j, given that it dispersed to population i. The purpose of this section is to define a biologically realistic model for the migration pattern, which is essentially a specification of M as a function of the basic parameters involved.

The migration probabilities will generally depend on the relative number of individuals in each subpopulation because large subpopulations will contribute with a greater proportion of genes to subsequent generations. It must also be remembered that gene movements in plant populations occur in either one or two stages: through seed dispersal only or through both pollen and seed dispersal, possibly via some third population.

Let M(s) and M(p) be the backward migration matrices for seed and pollen dispersal, i.e., m(s)ij and m(p)ij denote the conditional probabilities that a seed or a pollen grain, respectively, originates from subpopulation j, given that it dispersed to subpopulation i. Consider the event that dispersal occurs through both pollen and seed via some third population k. The conditional total probability that the gene originates from j is then {Sigma}k m(p)kj m(s)ik . Given the other event that the gene dispersed through a seed only, the probability that it originates from j will be m(s)ij . Because all zygotes in any generation are formed by the union of a pollen grain and a seed, these two events both have probability 1/2, implying that mij = 1/2 {Sigma}km(p)kj m(s)ik + m(s)ij or, in matrix notation,

(4)

For each form of dispersal, {nu} {isin} {s,p}, let the forward matrix M˜({nu}) with elements ({nu})ij represent the probabilities that a seed or a pollen grain disperses to subpopulation i, given that it originates from subpopulation j. As noted by BODMER and CAVALLI-SFORZA 1968 Down, the relationship between the elements of the forward and backward migration matrices M˜({nu}) and M({nu}) is

(5)

For self-incompatible species, the Nj's in (5) for {nu} = p should be replaced by Nj - 1 for j = i.

Let f({nu})z(z), {nu} {isin} {s,p} , represent the distribution of the random displacement Z of seed or pollen occurring during dispersal in the relevant spatial dimension(s). In general, for a seed or pollen originating from some point in subpopulation j, the probability that it disperses to some point in another population i will be

(6)
where the expectation is taken over points Zj and Zi in subpopulations i and j. The constant ci represents the effects of local environmental conditions on the success of the dispersing seed or pollen grain in the receiving subpopulation i. The local environment ci may, for example, involve local population density and the spatial size of the receiving subpopulation. These effects, however, do not influence the resulting backward migration probabilities, which is seen when (6) is substituted into (5).

For neighboring subpopulations and, at least, for dispersal within subpopulation, the expectation in (6) must be computed numerically. For the sea beet data used in this article, for which the positions of all individual plants were available, this is straightforward to do. When the distance between two populations is large, the expectation can be computed using the first-order approximation f(v)z(EZj - EZj) .

Seed and pollen dispersal distribution in sea beet:
In sea beet, pollen is wind dispersed, and the resulting dispersal distributions are thus two dimensional. TUFTO et al. 1997 Down discuss several plausible forms that f(p)z(z) may take in such cases, arising from different sets of assumptions about the underlying physical movement process. If we assume that wind conditions change slowly over time, that pollen are released from the anthers when the wind velocity exceeds a certain threshold, that the movement of each individual pollen grain is a diffusion in three dimensions, and that there is no wind directionality in the long run, the various parameters collapse into a single parameter, {lambda}p, and the dispersal distribution function takes the form

(7)
where 1/2{lambda}p is the expectation of R = (for details see TUFTO et al. 1997 Down). Compared with several alternative models, an extension of (7) gave the best fit to experimental data on airborne pollen dispersal in meadow fescue (Festuca pratensis L.) using genetic markers (NURMINIEMI et al. 1997 Down).

The populations of sea beet considered in this article were located along the shoreline, and seed dispersal was known to occur mostly by the help of tidal currents moving back and forth along the shoreline in approximately one dimension, w. If we assume that seeds are equally likely to be released into the water at any point in time, so that dispersals in both directions have equal probability, and also assume that the probability of deposition in a small length interval w, w + dw is {lambda}sdw, then the random displacement W of each seed will follow the Laplace distribution

(8)
also known as the double exponential distribution. This distribution also arises by assuming that the time to deposition is exponential and that the movement path W(t) is a Brownian motion without drift (KENDALL et al. 1983 Down, p. 191).

Note that the seed and pollen dispersal distributions (7) and (8) differ greatly in their degree of kurtosis, i.e., to what extent probability is concentrated around the mean and in the tails of the distribution. This difference is in part a result of the assumptions that pollen dispersal occurs in two dimensions whereas seeds only disperse in approximately one dimension (along the shoreline).

We also assume that the backward probabilities of migration from the outside world are equal to u for all subpopulations. This may be a questionable assumption in part because the subpopulations differ in their density and in their distance from the outside world. However, if these backward migration rates are small, this assumption may not be very critical. The parameter u can also be thought of as a mutation rate, in which case, it would be the same for all subpopulations.


*  BOOTSTRAPPING METHODS
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

Simulation procedure:
Several strategies can be used to simulate the distribution of the data. One strategy would be to reconstruct the distribution from the data itself by resampling gene frequency vectors with equal probability from each of the observed loci. This, however, would not take advantage of the knowledge we have about the process generating the data. A better strategy is to stimulate the process directly by iterating (1) for a sufficient number of generations until the equilibrium distribution is attained. It must be kept in mind, however, that it is the distribution conditional on the observed gene frequency means at each locus that is relevant. This conditional distribution can be generated approximately by first simulating, say, 500 generations, then keeping track of the gene frequency mean for each iteration of the process, and stopping the iterations just after (or before) the gene frequency mean passes the target mean. This simulation technique eliminates the extra variance in the parameter estimates that would arise if the unconditional distribution, which very likely would contain nonpolymorphic data, was used.

Bias correction:
The sampling distribution of an estimator usually has a peak near or at the unknown parameter {theta} of interest. We say that an estimator is mean unbiased if E{theta}ˆ = {theta}. By simulating the distribution of the estimator, provided that the expectation of the estimator exists, one can always compute a less biased estimate by subtracting the estimated amount of bias from the original first estimate (EFRON and TIBSHIRANI 1993 Down, p. 138).

This procedure, however, is not invariant with respect to transformations of the parameter, and in cases where the distribution of the estimator is very skewed, it can often make more sense to construct a so-called median bias-corrected estimate (CABRERA and WATSON 1997 Down). This is done by estimating the median, M, of the estimates as a function g({theta}) = M of the parameter. The improved estimate is then the value of the unknown parameter for which the median of the bootstrap estimates matches the observed estimate, i.e., = g-1() (see Figure 2 and Figure 4). As shown by CABRERA and WATSON 1997 Down, provided that g({theta}) is a monotonic function, it then follows that the median bias-corrected estimate itself is median unbiased because M = M(g-1()) = g-1(M()) = g-1(g({theta})) = {theta} .



View larger version (23K):
In this window
In a new window
Download PPT slide
 
Figure 2. The sampling distribution of û for different values of the long-range migration rate u under the island model. The solid line is the median of û as a function of u, estimated using minimum absolute residual regression (function l1fit in S-Plus) with a third-order polynomial as regressor. The value of the improved, bias-corrected estimate corresponding to the observed value of û is indicated by the dashed line on the graph.



View larger version (16K):
In this window
In a new window
Download PPT slide
 
Figure 3. Histogram of 100 simulations of the log likelihood ratio, {Lambda}, between models (11) and (9). The observed value of {Lambda} is indicated by the dashed line.



View larger version (23K):
In this window
In a new window
Download PPT slide
 
Figure 4. The sampling distribution of {sigma}ˆ for different values of the seed and pollen dispersal standard deviation {sigma} under the complete isolation model (12). For further explanation, see Figure 2.

Likelihood ratio tests:
Several tests between different hypothesis about the unknown migration pattern will be of interest. In general, in tests of a model H0 against some other model H, the ratio {Lambda} of the maximum likelihood under H and H0 is usually used as a test statistic. If there is a small probability, given that H0 is true, that {Lambda} is larger than the value of {Lambda} computed from the observed data, {Lambda}*, H0 can be rejected in favor of H. This probability, P({Lambda} > {Lambda}*|H0), will also generally depend on the true value of the nuisance parameters {theta}0 under H0. A reasonable choice of {theta}0 to simulate from is the value of {theta}0 that is in best agreement with the observed data, which in our case is the median bias-corrected estimate obtained by fitting H0. This choice of {theta}0 will in many (EFRON and TIBSHIRANI 1993 Down, p. 232) but not all cases make the test conservative; i.e., the test is less likely to falsely reject the null hypothesis.


*  RESULTS
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

Model selection:
Four different restrictions of the general model of the migration pattern were considered. We first fitted the model under the constraint that the variances of the seed and pollen dispersal displacements, Var(X) = Var(Y) = , and Var(W) = , are both equal, using u as the other free parameter, i.e.,

(9)

The uncorrected point estimates of the seed and pollen dispersal standard deviation and the long-range migration rate obtained under this model were = 60.4m and û = 0.062, and the maximum likelihood was 122.57.

We then tested this model against the alternative extended model using both {lambda}s and {lambda}p as free parameters in addition to u:

(10)

This gave only a very slight increase amounting to 0.05 in the maximum likelihood, which is clearly statistically nonsignificant if relying on asymptotic theory as a rough approximation. Two hypotheses nested within (9) were then considered, the first defined by the constraint

(11)
corresponding to the classical island model (CROW and KIMURA 1970 Down). Note that we have assumed that the rate of migration, and not the effective number of migrants, Niu, is constant. The bias-corrected estimate of u obtained by fitting (11) was = 0.277 (Figure 2). Simulating the distribution of {Lambda} at this point in the parameter space, we could reject the island model in favor of model (9) (Figure 3).

The second hypothesis nested within (9) is defined by the constraints

(12)
corresponding to the hypothesis that the metapopulation under study is completely isolated from the outside world. The distribution of the data (and estimators and test statistics) under this null hypothesis can be simulated approximately by replacing u = 0 in the iterations of (1) with some small positive value, say, 10-4. This is necessary to avoid letting the iterations go into an infinite loop, but if a sufficiently small number is used, the approximation will have the necessary accuracy.

Fitting (12) to the data gave an uncorrected estimate of the seed and pollen dispersal standard deviation of {sigma}ˆ = 92.0 m and a maximum likelihood of 116.8. Simulating the distribution of {sigma}ˆ for different values of {sigma} (Figure 4), we obtained a bias-corrected estimate, = 74.56 m . On the basis of the simulation of the distribution of the likelihood ratio between models (9) and (12), using the bias-corrected estimate for {sigma} , we could not reject the hypothesis of complete isolation (12). A histogram of these log likelihood ratios are shown in Figure 5.



View larger version (56K):
In this window
In a new window
Download PPT slide
 
Figure 5. Histogram of 100 simulations of the log likelihood ratio, {Lambda}, between models (12) and (9). The observed value of {Lambda} is indicated by the dashed line.

Observed and predicted covariances:
To assess the fit of the selected model, we will compare the (conditional) standardized covariances predicted by the model with those computed from the data directly. These are shown in Figure 6. Some interesting points can be noted. First, there appears to be reasonably good agreement between the model predictions and the observations. The main level of isolation is between the two transects (see Figure 1). This can be seen in the predicted values as "steps" in the covariances between subpopulations 16 and 17. Within the less dense transect 2 (subpopulations 17–21), there is also a steeper decline in the covariances with increasing geographic distance than within transect 1, where there is only a slight effect of isolation by distance. There also appears to be reasonably good agreement between predicted and observed variances, in that relatively small and/or isolated subpopulations, e.g., subpopulations 15 and 16, have larger predicted and observed variances.



View larger version (55K):
In this window
In a new window
Download PPT slide
 
Figure 6. Observed covariances (dotted lines) and covariances predicted by the selected model (12) (solid lines). Each subplot represents single rows in the respective matrices. The first plot in the third column, e.g., is the predicted and observed covariances of subpopulation number 15 with all other subpopulations (including the variance with itself).

The estimated migration matrix:
The migration matrix computed at the bias-corrected estimate of {sigma} is shown in Figure 7. On average, there is ~70% migration of genes from neighboring populations. Large, isolated, dense subpopulations, such as subpopulation number 20, receive a smaller proportion from neighboring subpopulations, whereas very small subpopulations (e.g., number 10 consisting of 4 individuals) receive a larger proportion of genes from its neighbors than from itself. It is also apparent in several subplots that large subpopulations (e.g., subpopulation number 4 consisting of 29 individuals) also make a large contribution to distant subpopulations because of their size.



View larger version (42K):
In this window
In a new window
Download PPT slide
 
Figure 7. The estimated migration matrix under the model of complete isolation computed at the bias-corrected estimate of {sigma}. Each subplot represents a single row in the backward migration matrix.

Having estimated the variances in the seed and pollen dispersal displacements under the assumption that these are equal, the total variance of the gene displacements, measured along the shoreline, if we assume independence between X and W, becomes Var(W) + 1/2Var(X) (CRAWFORD 1984 Down), which gives a gene displacement standard deviation equal to 91.31 m for model (12).

Dependencies on Ne/N:
Because the data are nontemporal, we do not expect much information about the ratio between the effective and absolute population size of each subpopulation, Ne/N, to be available in the data, and this ratio has, therefore, so far been treated as a known parameter in the analysis described above. Because our choice of = is clearly rather arbitrary, we will, however, look at how our estimate of {sigma} under the selected model (12) depends on the choice of Ne/N. The maximum likelihood estimate of {sigma} obtained by fitting the model for each of several different Ne/N values is shown in Figure 8.



View larger version (15K):
In this window
In a new window
Download PPT slide
 
Figure 8. The (uncorrected) maximum likelihood estimate of {sigma} for different values of Ne/N. The dotted reference line has a slope equal to -1/2.

First note that on a log–log scale, for values of Ne/N larger than ~0.2, there appears to be a linear relationship with {sigma}ˆ, with a slope of ~-1/2; i.e., the estimate of the dispersal standard deviation, {sigma}, is proportional to (Ne/N)-1/2 or, equivalently, the dispersal variance is inversely proportional to the effective population sizes. This is analogous to the situation under the island model, where only the product between the effective size and the rate of migration into each subpopulation, or the "effective number of migrants," is possible to estimate.

Also note that the estimate of {sigma} appears to tend to infinity when Ne/N becomes smaller than ~0.15. The estimate of {sigma} is then about the same order of magnitude as the size of the study area. Any further increase in {sigma} then has little influence on the migration matrix, and hence the likelihood, which means that the maximum likelihood estimate of {sigma} tends to infinity.

We expected the data to contain almost no information about Ne/N, making the maximum likelihood almost independent of this parameter. To verify this, we plotted the maximum likelihood obtained by fitting the model for each of the different Ne/N values. This is shown in Figure 9. Surprisingly, the likelihood is strongly dependent on Ne/N and has its maximum at ~0.43.



View larger version (12K):
In this window
In a new window
Download PPT slide
 
Figure 9. The maximum likelihood as a function of Ne/N.


*  DISCUSSION
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

We have shown how a quite realistic model based on the underlying genetic multivariate drift process in a subdivided population can be fitted to gene frequency data by approximate maximum likelihood methods. The model uses information known about absolute and effective population sizes, as well as geographic distances within and between subpopulations. Assumptions about the underlying mechanisms of seed and pollen movement are also incorporated. A data set consisting of allele frequencies at 12 isozyme and RFLP loci in 21 subpopulations of sea beet was used, and inferences about at least one parameter, the seed and pollen displacement variance, could be made. We were also able to reject the island model in favor of our more general model. It could also be concluded that no significant evidence for migration from the outside world or for mutation is present in the data.

There appears to be quite good agreement between the observed and predicted covariances. However, the fact that the observed covariances are dependent, also given that the selected model is true, makes comparison with the covariances predicted by the fitted model dangerous. If we look at Figure 6, there appear to be some discrepancies between the observed and predicted correlations with subpopulation number 15, in that subpopulation 15 appears to be more similar to subpopulations 16–21 and less similar to subpopulations 1–14 than predicted by the fitted model. This may suggest that some isolating barrier is present between subpopulations 14 and 15. Because the correlations are dependent, however, this discrepancy may very well be an effect of chance, and it would be desirable to do some more formal test of goodness of fit. As suggested by FELSENSTEIN 1982 Down, this can be done by testing the selected model against a "full" model in which all the n(n - 1)/2 covariances of the n - 1 contrasts are used as free parameters. Such a test would require data on a large number of loci, however, partly because the number of parameters under the full model is very large. The observed covariance matrix will also be singular if the number of loci is less than n - 1, which excludes the possibility of doing this test in most cases.

If the degree of kurtosis of the gene displacements has an effect on the genetic structure, we could hope to be able to obtain separate estimates of the seed and pollen dispersal variances because these distributions, (7) and (8), differ greatly in their degree of kurtosis. However, the fact that the extended model (10) gave almost no increase in the likelihood suggests that the data contain very little information about this.

In a previous, more traditional analysis of the data than the one used here, pairwise genetic distances were regressed against geographic distances (RAYBOULD et al. 1996 Down). These authors also found a "step" in the genetic distances between the two transects by including dummy variables representing transect membership in the regression, and they concluded that this suggested continuous extinction and recolonization of the subpopulations. However, the analysis used here shows that the "step" in the covariances between subpopulations 16 and 17 can be fully explained by the geographic location of the subpopulations only (Figure 1) and the assumed pattern of migration.

We still need to be concerned about the assumption that population sizes remain constant over time, however. If there are large autocorrelated fluctuations in local population sizes, this is likely to constantly keep the gene frequencies away from their equilibrium distribution if the number of generations necessary to reach equilibrium is large. It is, however, only relevant to consider the rate of convergence of the covariances of the contrast, and these will converge much faster than the covariance matrix of the entire distribution (on the order of only several generations; HARPENDING and WARD 1982 Down, p. 245).

It is also very interesting that there appears to be information in the data about the ratio between effective and absolute subpopulation sizes. This is quite suprising when considering the fact that the data are purely nontemporal. It is well known that in continuous time diffusion approximations of, e.g., the island model, the rate of migration and the effective population size of each island are completely confounded. However, in our study population and in the model, generations are discrete. The effective subpopulation sizes are also quite small, between 1 and 14.5 individuals, and the rates of migration between neighboring populations are relatively high, which suggest that there will be quite large changes in the equilibrium covariances at different points in the life cycle. The magnitude of these changes depends on Ne/N. Even though the maximum likelihood estimate of = 0.43 obtained seems plausible, it is not clear how reliable this estimate is, however. One potential problem is that the model has nonoverlapping generations, whereas generations overlap in the study population, possibly also with age-dependent mortalities and fecundities. It is also not entirely clear if the point in time in the life cycle of the predicted covariances coincides with the point in time at which the actual sampling occurred.

There has been some concern in the literature that direct and indirect methods have produced very different estimates of gene flow. There is a general tendency for indirect estimates to be much larger than estimates obtained by direct methods (e.g., KOENIG et al. 1996 Down). According to these authors, the discrepancy results from systematic biases in direct estimates caused by finite study areas. We note here that our estimate of seed and pollen dispersal standard deviations of 74.56 m are comparable to estimates from direct studies of seed dispersal that range between 27 and 100 m (SLATKIN 1985 Down) and to estimates of pollen dispersal of 30 m (NURMINIEMI et al. 1997 Down).

Availability of software:
The S-Plus and C code used to do this analysis and accompanying documentation is available on the internet at http://www.math.ntnu.no/~jarlet/migration/


*  ACKNOWLEDGMENTS

We thank RALPH CLARKE for helpful discussions. This study was financially supported by the Norwegian Research Council and the Norwegian Institute for Nature Research A.F.R. also received financial support from the Department of Environment's Genetically Modified Organisms Research Program.

Manuscript received August 27, 1997; Accepted for publication March 31, 1998.


*  APPENDIX 1
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

APPROXIMATIONS OF THE CONDITIONAL COVARIANCE MATRIX
In TUFTO et al. 1996 Down, an approximation based on (2) directly was shown to break down as the fluctuations around q become large. On the basis of the idea that the amount of genetic drift in the neighborhood of the observed mean gene frequency is most important in determining Cy| , and in an attempt to also incorporate the fact that the genetic drift depends on the gene frequencies, TUFTO et al. 1996 Down developed a second approximation of Cy| by introducing a linear matrix transformation of the state vector in the recurrence Equation 1, changing the dynamics of the process and forcing it to move within the subspace where {sum}iwipi = only. This was done by letting

(A1)

This leads to a matrix equation in C (TUFTO et al. 1996 Down, Eq. 18) similar to (2). As long as there is a limited amount of differentiation, maximum likelihood estimates of the parameters of a finite stepping stone model based on this approximation using simulated data were shown to be reasonably unbiased.

TUFTO et al. 1996 Down, however, did not examine the behavior of the approximation based on (A1) as between-population differentiation increases in any great detail. We have now found the solution of (18) in TUFTO et al. 1996 Down, in situations with large between-population differentiation, to sometimes result in matrices C having negative determinants; i.e., matrices obtained using the proposed method are not always proper covariance matrices. Even though the method was originally intended to work only in situations with limited differentiation, this deficiency of the method is clearly unsatisfactory, as it makes the log likelihood undefined in some parts of the parameter space. We have therefore developed a third, somewhat less complicated, approximation.

As noted in TUFTO et al. 1996 Down, it is the dynamics of the process in the neighborhood of the observed mean gene frequency that is important in determining the conditional covariances. The third approximation is constructed by assuming that the stochastic elements of the column vector e in (1) have constant variances, equal to . This leads to the same matrix Equation 2 as for the original process, except that the elements of E are replaced by

(A2)

If we also assume that the elements of et are distributed normally, we have a multivariate autoregressive process that is known to have the multivariate normal as its stationary distribution. The unconditional and conditional covariance matrix of the vector y = Kp, where yi = pi - , i = 1,...,n - 1 , is then given by the simple transformation

(A3)

By computing the conditional likelihood based on an approximation that replaces the original process with a process having constant genetic drift, we hope to obtain reasonably unbiased estimates, at least in situations with a limited amount of genetic differentiation. The performance of the method in any other particular situation will have to be checked by bootstrapping techniques.

It might be mentioned that matrix Equation 2 using (A2) can be solved straightforwardly in S-plus after diagonalization of M and MT and some rearrangement, with operations involving only (n x n) matrices, not suffering from the rapid increase with n in computing time required to solve (2) and (18) in TUFTO et al. 1996 Down.


*  LITERATURE CITED
*TOP
*ABSTRACT
*THE DATA
*COMPUTING THE LIKELIHOOD
*MIGRATION MATRICES IN PLANT...
*BOOTSTRAPPING METHODS
*RESULTS
*DISCUSSION
*APPENDIX 1
*LITERATURE CITED

BODMER, W. F. and L. L. CAVALLI-SFORZA, 1968  A migration matrix model for the study of random genetic drift. Genetics 59:565-592[Free Full Text].

BOUTIN, V., R. JEAN, M. VALERO, and P. VERNET, 1989  Gynodioecy in Beta maritima.. Acta Oecologia 9:61-66.

BRUUN, L., A. HALDRUP, S. G. PETERSEN, L. FRESE, and T. DE BOCK et al., 1995  Self-incompatibility reactions in wild species of the genus beta and their relation to taxonomical classification and geographical origin. Genet. Resourc. Crop Evol. 42:293-301.

CABRERA, J. and G. S. WATSON, 1997  Simulation methods for mean and median bias reduction in parametric estimation. J. Stat. Plan. Infer. 57:143-152.

COURGEAU, D., 1974 Migration, pp. 351–387, in The Genetic Structure of Populations, edited by A. JACQUARD. Springer-Verlag, Berlin.

CRAWFORD, T. J., 1984 What is a population? pp. 135–173, in Evolutionary Ecology, edited by B. SHORROCKS. Blackwell Scientific Publications, Oxford.

CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

EFRON, B., and R. J. TIBSHIRANI, 1993 An Introduction to the Bootstrap. Chapman & Hall, London.

FELSENSTEIN, J., 1982  How can we infer geography and history from gene frequencies? J. Theor. Biol. 96:9-20[Medline].

FRANKHAM, R., 1995  Conservation genetics. Annu. Rev. Genetics 29:305-327[Medline].

HARPENDING, H. C., and R. H. WARD, 1982 Chemical systematics and human populations, pp. 213–256, in Biochemical Aspects of Evolutionary Biology, edited by M. H. NITECKI. University of Chicago Press, Chicago.

KENDALL, M., A. STUART and J. K. ORD, 1983 Kendall's Advanced Theory of Statistics, Vol. 1. Charles Griffin and Company Ltd., London.

KIMURA, M. and G. H. WEISS, 1964  The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 49:561-576[Free Full Text].

KOENIG, W. D., D. V. VUREN, and P. N. HOOGE, 1996  Detectability, philopatry, and the distribution of dispersal distances in vertebrates. Trends Ecol. Evol. 11:514-517.

NUNNEY, L., 1993  The influence of mating system and overlapping generations on effective population size. Evolution 47:1329-1341.

NURMINIEMI, M., J. TUFTO, N.-O. NILSSON, and O.-A. ROGNLI, 1997  Spatial models of pollen dispersal in the forage grass meadow fescue. Evol. Ecol. 12:487-502.

RAYBOULD, A. F., J. GOUDET, R. J. MOGG, C. J. GLIDDON, and A. J. GRAY, 1996  Genetic structure of a linear population of Beta vulgaris ssp. maritima (sea beet) revealed by isozyme and RFLP analysis. Heredity 76:111-117.

SLATKIN, M., 1985  Gene flow in natural populations. Ann. Rev. Ecol. Syst. 16:393-430.

TUFTO, J., S. ENGEN, and K. HINDAR, 1996  Inferring patterns of migration from gene frequencies under equilibrium conditions. Genetics 144:1911-1921[Abstract].

TUFTO, J., S. ENGEN, and K. HINDAR, 1997  Stochastic dispersal processes in plant populations. Theor. Pop. Biol. 52:16-26[Medline].

VAN DIJK, H., P. BOUNDRY, H. MCCOMBIE, and P. VERNET, 1997  Flowering time in wild beet (Beta vulgaris ssp. maritima) along a latitudinal cline. Acta Oecologia 18:47-60.

WRIGHT, S., 1943  Isolation by distance. Genetics 28:114-138[Free Full Text].