Microsatellite Allele Sizes: A Simple Test to Assess Their Significance on Genetic Differentiation
Olivier J. Hardyᵃ, Nathalie Charbonnelᵇ, Hélène Frévilleᶜ, and Myriam Heuertzᵃ,ᵈ
a Laboratoire de Génétique et Ecologie Végétales, Université Libre de Bruxelles, 1160 Brussels, Belgium,
b CEFE-CNRS, 34293 Montpellier Cedex 5, France,
c Department of Biology, Open University, Milton Keynes MK7 6AA, United Kingdom
d CRP-Gabriel Lippmann, CREBS Research Unit, 1511 Luxembourg, Luxembourg
Corresponding author: Olivier J. Hardy, Chaussée de Wavre 1850, B-1160 Brussels, Belgium. E-mail: ohardy@ulb.ac.be
Communicating editor: M. W. FELDMAN
| ABSTRACT |
|---|
The mutation process at microsatellite loci typically occurs at high rates and with stepwise changes in allele sizes, features that may introduce bias when using classical measures of population differentiation based on allele identity (e.g., FST, Nei's Ds genetic distance). Allele size-based measures of differentiation, assuming a stepwise mutation process [e.g., Slatkin's RST, Goldstein et al.'s (δµ)²], may better reflect differentiation at microsatellite loci, but they suffer high sampling variance. The relative efficiency of allele size- vs. allele identity-based statistics depends on the relative contributions of mutations vs. drift to population differentiation. We present a simple test based on a randomization procedure of allele sizes to determine whether stepwise-like mutations contributed to genetic differentiation. This test can be applied to any microsatellite data set designed to assess population differentiation and can be interpreted as testing whether FST = RST. Computer simulations show that the test efficiently identifies which of FST or RST estimates has the lowest mean square error. A significant test, implying that RST performs better than FST, is obtained when the mutation rate, µ, for a stepwise mutation process is (a) ≥ m in an island model (m being the migration rate among populations) or (b) ≥ 1/t in the case of isolated populations (t being the number of generations since population divergence). The test also informs on the efficiency of other statistics used in phylogenetical reconstruction [e.g., Ds and (δµ)²], a nonsignificant test meaning that allele identity-based statistics perform better than allele size-based ones. This test can also provide insights into the evolutionary history of populations, revealing, for example, phylogeographic patterns, as illustrated by applying it on three published data sets.
MICROSATELLITE genetic markers, also called short tandem repeats (STRs) or simple sequence repeats (SSRs) because their polymorphism is based on the variation in the number of repeats of a simple DNA sequence (2–6 bases long), are nowadays a tool of choice to address population genetics and demographic questions.
Microsatellite loci are typically characterized by high mutation rates and hence a high level of polymorphism as well as by a mutation process that causes preferentially stepwise changes of the number of repeats [stepwise mutation model (SMM), Table 1] and thus allele size.
Most statistics that describe genetic differentiation from genetic markers (e.g., F-statistics) rely solely on allele identity information. This information is often used to infer phylogenetic relationships or to obtain indirect estimates of gene flow. In the first case, studied populations are assumed to have diverged by drift and mutation without gene flow, so that genetic differentiation informs on the time since the beginning of divergence (e.g., ![]()
![]()
1/(1 + 4Nm) (![]()
(Qw - Qb)/(1 - Qb), where Qw (Qb) is the probability that two genes from the same population (different populations) are identical in state (![]()
![]()
1/(1 + 4N(m + µ)) (![]()
Alternative solutions to this problem have been proposed using statistics accounting for allele size information, such as R-statistics (![]()
(Sb - Sw)/Sb, where Sw (Sb) is the mean square difference in allele size for two genes from the same population (different populations; ![]()
![]()
1/(1 + 4Nm) in an island model] but without assumption on the mutation rate so that, contrary to FST, the relationship remains valid for µ
m in an island model (![]()
On the basis of simulation results, ![]()
Comparing FST and RST values computed on the same data can provide valuable insights into the main causes of population differentiation, i.e., drift vs. mutation, because these statistics share equal expectations when differentiation is caused solely by drift, whereas RST is expected to be larger than FST under a contribution of stepwise-like mutations.
This article proposes a simple testing procedure based on allele size randomizations to determine if mutations following a SMM-like process contribute to genetic differentiation. The test can reveal whether allele identity-based or allele size-based statistics should be most adequate to analyze microsatellite data sets. A nonsignificant test suggests then that FST should be preferred over RST or, more generally, that statistics based on allele identity are likely to perform better than counterparts based on allele size information. When mutations are known to follow a SMM-like process, the test can also assess the relative importance of the mutation rates vs. the migration rate or vs. the reciprocal of the divergence time in the case of isolated populations. This procedure can be interpreted as testing whether RST = FST and could therefore be used to reveal phylogeographic patterns.
In the following, we present the test, validate it by simulations, explore its power in different contexts by simulations again, and apply it on three data sets from published experimental studies. Emphasis is given to the usefulness of the test to determine the efficiency of FST vs. RST for inferential purposes. Its usefulness to assess the efficiency of other statistics based on allele identity vs. allele size is addressed in the DISCUSSION, together with other potential applications.
| A SIMPLE TEST ON ALLELE SIZE INFORMATION CONTENT |
|---|
The test indicates whether allele sizes provide information on population differentiation given a data set, that is, whether shifts in allele sizes resulting from stepwise-like mutations contribute to population differentiation. Contribution of stepwise-like mutations to genetic differentiation requires (1) that the mutation process is at least partially SMM-like and (2) that the mutation rate, µ, is large enough relative to the effect of drift and migration (e.g., µ ≥ m; otherwise new mutations are quickly spread beyond their native population by migration). Table 2 outlines the null hypotheses that can be tested, presenting a general null hypothesis as well as specific null hypotheses holding under particular prior assumptions.
The principle of the test is based on obtaining a distribution of a statistic under the null hypothesis (H0) that differences in allele sizes do not contribute to population differentiation. Therefore, we use a randomization procedure whereby the different allele sizes observed at a locus for a given data set are randomly permuted among allelic states. To better figure out the procedure, one may dissociate allelic state, identified, for example, by a letter (e.g., a, b, c, d, and e if there are five different alleles), and allele size, identified by a number (e.g., 4, 5, 7, 8, and 11, each representing the number of sequence repeats), given that there is a one-to-one correspondence between allelic state and allele size. Before randomization, the allele size attributed to each allelic state is the actual allele size (e.g., a, 4; b, 5; c, 7; d, 8; and e, 11). Throughout the randomization procedure, genotypes are defined in terms of allelic states and are not modified, but allele sizes are randomly reassigned among allelic states (e.g., a, 7; b, 4; c, 11; d, 5; and e, 8). After such a randomization, any two genes originally having the same allele size remain identical, although it can be for another allele size, whereas any two genes originally bearing different alleles of small size difference may bear alleles of large size difference, or reciprocally. Hence, the allele identity information is kept intact but not the allele size information. Under the null hypothesis (Table 2, case 1), the randomization procedure should not affect the expectation of a measure of differentiation such as RST. On the contrary, if allele sizes contribute to genetic differentiation, the RST computed after allele size permutation (hereafter called pRST) would depend solely on allele identity/nonidentity and hence have a smaller expectation than the value computed before randomization. 
The test can thus be designed by comparing the observed RST value (before randomization) to the distribution of pRST values obtained for all possible configurations of allele size permutations (or a representative subset of them, as the total number of different configurations quickly becomes enormous when the number of alleles exceeds 7 or 8). From this comparison, a probability that the null hypothesis holds can be estimated as the proportion of pRST values larger than the observed RST (one-tailed test). Note that the mean pRST should equal in expectation the FST computed on the same data (not accounting for potential statistical bias), as is confirmed later.
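The procedure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: RST is computed from its definition as (Sb - Sw)/Sb using mean squared allele-size differences between pairs of genes, not from the variance-component estimator used for the results in this article, and each population is represented simply as a list of allele sizes (one entry per gene). The function names `rst`, `permute_sizes`, and `allele_size_permutation_test` are hypothetical.

```python
import itertools
import random

def rst(pops):
    """RST = (Sb - Sw)/Sb from mean squared allele-size differences.

    Assumes at least two populations that are not all fixed for the
    same allele (otherwise Sb would be zero).
    """
    within = [(a - b) ** 2
              for pop in pops
              for a, b in itertools.combinations(pop, 2)]
    between = [(a - b) ** 2
               for p, q in itertools.combinations(pops, 2)
               for a in p for b in q]
    sw = sum(within) / len(within)
    sb = sum(between) / len(between)
    return (sb - sw) / sb

def permute_sizes(pops, rng):
    """Randomly reassign the observed allele sizes among allelic states."""
    sizes = sorted({a for pop in pops for a in pop})
    shuffled = sizes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(sizes, shuffled))
    # Genes identical before the permutation remain identical after it.
    return [[relabel[a] for a in pop] for pop in pops]

def allele_size_permutation_test(pops, n_perm=1000, seed=0):
    """Return observed RST, mean pRST, and the one-tailed P value."""
    rng = random.Random(seed)
    observed = rst(pops)
    p_rst = [rst(permute_sizes(pops, rng)) for _ in range(n_perm)]
    p_value = sum(v >= observed for v in p_rst) / n_perm
    return observed, sum(p_rst) / n_perm, p_value

# Hypothetical single-locus data: two populations with shifted allele sizes.
pops = [[4, 4, 5, 5, 5, 6, 4, 5], [9, 10, 10, 11, 9, 10, 11, 10]]
observed, mean_p_rst, p_value = allele_size_permutation_test(pops)
print(round(observed, 3), round(mean_p_rst, 3), p_value)
```

In this toy data set the two populations share no allele, so allele identity alone already indicates differentiation; the permutation test asks whether the ordering of allele sizes adds to it.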
On a single locus, such a test can be applied only if a sufficient number of different alleles (n) are in the data set, as the number of different permutation configurations is equal to n!. Hence, five alleles (120 different configurations) appear to be a minimum to carry out such a test at a type I error rate criterion of 5 or 1%. On a multilocus RST estimate, the test can be carried out by permuting allele sizes within each locus. It is noteworthy that the test makes no assumptions on the mutation model: A significant result (RST significantly > pRST) suggests that mutations contributed to genetic differentiation (e.g., because µ ≥ m in an island model) and that the mutation process follows at least partially a SMM (the test remains valid under deviations from the SMM). Neutrality with respect to natural selection is, however, assumed. When the test is significant, FST is likely to provide a biased estimate of gene flow parameters, but it cannot be concluded a priori that RST would necessarily perform better given its larger variance (which is even more pronounced when mutations of more than one step can occur).
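The requirement on the number of alleles follows from simple arithmetic, sketched here in Python. The helper names are hypothetical, and the best-case P value of 1/n! assumes that no other configuration ties with the observed one.

```python
import math

def n_configurations(n_alleles):
    """Number of ways to reassign n allele sizes among n allelic states."""
    return math.factorial(n_alleles)

def smallest_p_value(n_alleles):
    """Best-case P value when only the observed configuration is extreme."""
    return 1 / n_configurations(n_alleles)

for n in range(3, 9):
    print(n, n_configurations(n), smallest_p_value(n))
```

With four alleles there are only 24 configurations, so a P value below 1% cannot be reached; with five alleles (120 configurations) both the 5% and 1% criteria become attainable.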
Which hypotheses can be tested and with which statistics?
Simulations allow us to validate the allele size permutation test and to assess its power. It is first necessary, however, to specify what can be tested (Table 2).
Randomizing allele sizes creates replicates of a data set for a mutation process following a KAM (or IAM) because, under this model, allele size is irrelevant and interchanging them is like replicating the past mutation processes leading to the present data set but with other randomly chosen alleles after each mutational event. Hence, one possible application of the allele size randomization procedure is to test whether the mutation process follows a KAM (Table 2, case 3). For this purpose, randomizing allele sizes can be applied on any statistic based on allele size, not only R-statistics but also various genetic distances for stepwise mutation models such as (δµ)².
A second application of the allele size permutation procedure, here assuming a priori that mutations follow at least partially a SMM-like process, is to test whether mutation has contributed to population divergence (Table 2, case 2). In other words, we can test whether the migration rate (m) among populations, or the reciprocal of the number of generations (t) since population divergence, is large compared to the mutation rates (µ << m or µ << 1/t, respectively; Table 2, cases 2a and 2b). The allele size permutation test is the most interesting to address this question, because there is enough evidence that most microsatellites follow a SMM-like process (e.g., ![]()
(δµ)² statistic, which is a between-populations component of allele size variance. The reason is that random permutations of allele sizes not only remove the within-population covariance between allele sizes for different alleles, but also modify the allele size variance under SMM or GSM, because the expected frequency distribution of allele sizes is not uniform.
(δµ)² statistic, will always be affected by a change of the allele size variance, no matter whether or not mutations contributed to differentiation. On the contrary, statistics based on a ratio of variance components, such as RST, will not be affected if the within- and among-populations components of variance are multiplied by factors having the same expectations. The simulations presented hereafter show that this is what occurs when there is no within-population covariance between allele sizes for different alleles (i.e., differentiation due to drift and not stepwise mutations).
To show that the allele size permutation test is adequate for the RST statistic but not the (δµ)² statistic when testing m >> µ or 1/t >> µ (under the a priori assumption that the mutation process is stepwise-like; Table 2, cases 2), we simulated a random-mating population of diploid individuals (population size N = 1000 individuals) at mutation-drift equilibrium (µ = 0.001) under the SMM. The allele size permutation test (1000 randomizations) was then applied on RST and (δµ)² computed between two independent samples (sample size n = 100 individuals) from that population for each of 200 simulated loci (the two samples thus represent undifferentiated subpopulations). The computer programs used for simulations and computations are described below. We report the percentage of loci for which the tests were significant (%RHo) according to the type I error rate criterion (α, the probability of rejecting the null hypothesis when it is true). Because the null hypothesis to be tested (1/t >> µ) is met by simulations, a valid testing procedure must ensure that %RHo = α; otherwise it means that the procedure is not adequate to test this null hypothesis. Fig 1 shows that the allele size randomization testing procedure is indeed valid when applied on RST but not on (δµ)².
Power of the test under SMM:
To investigate the power of the test when testing if mutations contributed to population differentiation under the SMM (Table 2, cases 2), we checked the procedure on artificial data sets with realistic sample sizes derived from Monte Carlo simulations of populations made of diploid hermaphrodites. Three sets of demographic situations were simulated: (1) an island model at drift-migration-mutation equilibrium, (2) a model of two isolated populations having diverged from a common ancestral population at mutation-drift equilibrium, and (3) a linear stepping-stone model (gene flow restricted to adjacent populations) at drift-migration-mutation equilibrium. The island model was composed of 10 populations, consisting of 100 individuals each, and new generations were obtained by drawing genes at random from the population with probability 1 - m or from the other populations with probability m. The isolated population model was composed of two random-mating populations, consisting of 500 individuals each, and having diverged for t generations. The stepping-stone model was composed of 30 aligned populations, consisting of 50 individuals each, and new generations were obtained by drawing genes at random from the population with probability 1 - m or from the two adjacent populations with probability m.
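As an illustration of the island model just described, the following Python sketch draws each gene of a new generation from its own population with probability 1 - m, or from another population with probability m, and applies single-step mutations at rate µ. It is a simplified gene-based Wright-Fisher scheme written for this illustration, not the simulation software used in this study; individuals and mating are not modeled explicitly, and the function name, the parameter `start_size`, and the default values are assumptions.

```python
import random

def simulate_island_smm(n_pops=10, n_ind=100, m=0.01, mu=0.001,
                        generations=500, start_size=20, seed=0):
    """Island model with stepwise mutations; requires n_pops >= 2.

    Returns one list of allele sizes (2 * n_ind genes) per population.
    """
    rng = random.Random(seed)
    n_genes = 2 * n_ind  # diploid individuals carry two genes each
    # All populations start fixed for a single allele.
    pops = [[start_size] * n_genes for _ in range(n_pops)]
    for _generation in range(generations):
        new_pops = []
        for i in range(n_pops):
            others = [j for j in range(n_pops) if j != i]
            genes = []
            for _gene in range(n_genes):
                # Immigrant gene with probability m, local gene otherwise.
                source = rng.choice(others) if rng.random() < m else i
                allele = rng.choice(pops[source])
                # Stepwise mutation: one repeat gained or lost.
                if rng.random() < mu:
                    allele += rng.choice((-1, 1))
                genes.append(allele)
            new_pops.append(genes)
        pops = new_pops
    return pops

# Small run for illustration only; far shorter than the runs in the text.
pops = simulate_island_smm(n_pops=4, n_ind=20, generations=50)
print(sorted({allele for pop in pops for allele in pop}))
```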
The genetic parameters simulated were the following: At the initial stage all populations were fixed for one allele; 10 loci were simulated with mutations following a SMM and µ = 10⁻³ at all loci without size constraints. Simulations were run for a sufficient time to reach a steady state for total- and within-population gene diversity parameters, and then a sample of individuals representative of common experimental studies was extracted and analyzed. To obtain accurate estimates, 200 replicates were run for each set of conditions. Simulations were carried out using the software EASYPOP ver. 1.7.4.
ST by ![]()
by these authors) and for demographic parameter estimations (![]()
The variance ($\sigma^2$) of allele identity per locus and per allele (FST), or the variance of allele size per locus (RST), is partitioned into three components (random effects): among populations ($\sigma^2_a$), among individuals within population ($\sigma^2_b$), and between genes within individual within population ($\sigma^2_c$). FST and RST are then estimated as $\sigma^2_a/(\sigma^2_a + \sigma^2_b + \sigma^2_c)$ (single-locus RST) or $\sum \sigma^2_a / \sum (\sigma^2_a + \sigma^2_b + \sigma^2_c)$, where the summations apply over all loci (multilocus RST), all alleles of a locus (single-locus FST), or all alleles and loci (multilocus FST).
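The difference between single-locus and multilocus estimates can be made concrete with a short Python sketch. The variance components below are hypothetical numbers chosen for illustration, and the function names are assumptions; the estimation of the components themselves (a hierarchical analysis of variance) is not shown.

```python
def single_locus_estimate(var_a, var_b, var_c):
    """Ratio of the among-population component to the total variance."""
    return var_a / (var_a + var_b + var_c)

def multilocus_estimate(components):
    """Ratio of summed components over loci (a ratio of sums)."""
    numerator = sum(var_a for var_a, _, _ in components)
    denominator = sum(var_a + var_b + var_c
                      for var_a, var_b, var_c in components)
    return numerator / denominator

# Hypothetical (among populations, among individuals, within individuals)
# variance components for two loci.
loci = [(0.2, 0.1, 1.7), (1.5, 0.5, 8.0)]
per_locus = [single_locus_estimate(*c) for c in loci]
print(per_locus, multilocus_estimate(loci))
```

Because the multilocus value is a ratio of sums and not the mean of the single-locus ratios, loci with a larger total variance weigh more heavily in the multilocus estimate.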
For the island model, simulations were run for 5000 generations with migration rates among populations varying from 10⁻⁴ to 10⁻¹ (i.e., m = 0.1–100µ) according to the runs. Global RST, FST, and pRST (for 1000 randomizations) were computed on a total sample of 300 individuals (30 individuals from each population). For the isolated populations model, a single population of 1000 individuals was simulated for 5000 generations, and then it was divided into two isolated subpopulations of 500 individuals that were run for 30–10,000 additional generations (i.e., 1/t = 0.1–33µ). RST, FST, and pRST (for 1000 randomizations) were computed on a total sample of 100 individuals (50 individuals from each subpopulation). For the stepping-stone model, 10,000 generations were simulated with a migration rate of 0.1 (0.05 between any two adjacent populations). Analyses were carried out on a sample of 20 individuals from each of the 30 populations (total sample size of 600 individuals). Pairwise FST/(1 - FST) and RST/(1 - RST) ratios were computed for each pair of populations, and these values were averaged over all pairs separated by 1, 2, 3, ... , 20 steps (20 distance classes). Allele size permutation tests were applied on averaged pairwise RST/(1 - RST) ratios per distance class to provide pRST/(1 - pRST) values per distance class (1000 permutations). Here, pairwise FST/(1 - FST) and RST/(1 - RST) ratios were computed because theory predicts an approximate linear relationship with the linear distance between populations in one-dimensional isolation-by-distance models.
The validity of some of the simulation results could be verified by comparing them to theoretical expectations. For example, after 5000 generations of simulation of a single population of N = 1000 individuals (for the isolated population model), the average heterozygosity and average variance of allele size were equal to He = 0.68 and V = 1.96, respectively, with a mean number of alleles per locus of 5.8 (range, 3–11 alleles). These values are close to their expectations at mutation-drift equilibrium.
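For reference, the comparison with mutation-drift equilibrium can be reproduced with the standard results for a strict single-step SMM in a diploid population of size N: expected heterozygosity He = 1 - 1/sqrt(1 + 8Nµ) and expected variance of allele size V = 2Nµ. That these are the expectations referred to above is an assumption of this sketch, as are the function names.

```python
import math

def expected_heterozygosity_smm(N, mu):
    """Equilibrium heterozygosity under a strict stepwise mutation model."""
    return 1 - 1 / math.sqrt(1 + 8 * N * mu)

def expected_size_variance_smm(N, mu):
    """Equilibrium variance of allele size (in repeats) under the SMM."""
    return 2 * N * mu

N, mu = 1000, 0.001
print(expected_heterozygosity_smm(N, mu), expected_size_variance_smm(N, mu))
```

With N = 1000 and µ = 0.001, these formulas give He close to 0.67 and V = 2, in line with the simulated values of 0.68 and 1.96.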
Results from all simulations confirm that mean pRST values (i.e., mean value computed after random permutations of allele size) are very close, though not exactly equal, to the FST values (Fig 2). For example, in the island model, the mean and standard deviation of the difference between FST and mean pRST values per locus were equal to 0.003 ± 0.007, 0.008 ± 0.012, and 0.010 ± 0.110 for m = 10⁻², 10⁻³, and 10⁻⁴, respectively. Hence, mean pRST values were on average slightly lower than FST values although, for a given locus, the difference between the two could be quite substantial, especially under very low migration rates. For the other simulations, mean pRST values were generally slightly higher than FST (Fig 2B and C). We also observed that the discrepancy between FST and mean pRST was much lower for multilocus than for single-locus estimates.
As expected, RST values are similar to FST values whenever m >> µ = 0.001 (island model), 1/t >> µ (diverging populations model), or populations are close (stepping-stone model with m >> µ). On the contrary, RST becomes considerably larger than FST when m ≤ µ (island model), 1/t ≤ µ (diverging populations model), or when populations are separated by more than five steps (stepping-stone model; Fig 2).
To assess the power of the allele size permutation test, we present in Fig 2 (graphs on the right) the percentage of statistically significant tests (%RHo) among 200 simulation replicates (using α = 5%) according to (1) the migration rate m (island model), (2) the divergence time t in number of generations since isolation (isolated two-population model), and (3) the distance d in number of steps between populations (stepping-stone model). This is done for tests applied to each locus as well as to a multilocus estimate based on 10 loci.
In the island model, %RHo approaches α for relatively high migration rates (i.e., m = 10⁻² to 10⁻¹, or 10µ to 100µ), in accordance with our a priori expectation that we should not detect a significant effect when m >> µ (Fig 2A). On the contrary, for lower migration rates, mutation is no longer negligible compared to migration and the proportion of significant tests increases above α, reaching 88 and 100% when m = 10⁻⁴ (m = 0.1µ) for tests on a single locus or 10 loci, respectively (Fig 2A). Tests based on 10 loci seem actually quite powerful for typical sample sizes encountered in experimental studies (300 individuals here), as 100% of the tests were significant when m = µ and already 24% when m = 10µ. Results of the two isolated population models are very similar to those of the island model if m is replaced by 1/t (Fig 2B). Here, however, tests seem less powerful than in the simulated island model (e.g., for 10 loci, %RHo > 50% when 1/t
µ in the isolated population model, and m
0.3µ in the island model), which is likely due to the smaller sample size (100 vs. 300 individuals) and the lower number of populations sampled (2 vs. 10). ![]()
60% for estimates based on 10 loci and only 20% for single-locus estimates (Fig 2C). Surprisingly, %RHo is already significantly larger than α for populations separated by just one step and exchanging migrants at a high rate (m/2 = 0.05) relative to the mutation rate (µ = 0.001).
Usefulness of the test to determine the most appropriate statistics:
To verify whether the test provides an adequate guideline to choose between RST and FST when assessing population differentiation, mean square errors (MSEs) of FST and RST were computed. The MSE is a synthetic measure of the efficiency of an estimator combining bias and variance (MSE = bias² + variance). It has already been used to compare the efficiency of FST and RST estimators. The MSE was computed as $\sum_i (\hat{e}_i - e)^2/n$, where $\hat{e}_i$ is the FST or RST estimate of the ith replicate, n is the number of replicates (n = 200), and e is the expected value given the demographic parameters. The expected value is e = 1/(1 + 4Nmd/(d - 1)) in the case of the island model (with N = 100 and d = 10), and e = t/(2N + t) in the case of the isolated population model (with N = 500). These are the values expected for RST under SMM and for FST under IAM (or KAM) and a low mutation rate.
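The MSE computation and the two expected values given above can be sketched as follows in Python. The replicate estimates in the example are hypothetical numbers, and the function names are assumptions introduced for this illustration.

```python
def expected_island(N, m, d):
    """Expected differentiation in an island model of d populations."""
    return 1 / (1 + 4 * N * m * d / (d - 1))

def expected_isolated(N, t):
    """Expected differentiation t generations after isolation."""
    return t / (2 * N + t)

def mse(estimates, expected):
    """Mean square error of replicate estimates around the expected value."""
    return sum((x - expected) ** 2 for x in estimates) / len(estimates)

def bias_and_variance(estimates, expected):
    """Bias and (population) variance, so that MSE = bias**2 + variance."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((x - mean) ** 2 for x in estimates) / n
    return mean - expected, variance

# Hypothetical replicate estimates for an island model with m = 0.001.
estimates = [0.60, 0.72, 0.65, 0.80]
e = expected_island(N=100, m=0.001, d=10)
bias, variance = bias_and_variance(estimates, e)
print(e, mse(estimates, e), bias ** 2 + variance)
```

The last two printed values agree, which is the decomposition MSE = bias² + variance stated in the text.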
For the island model and µ = 0.001 (SMM), with migration rate varying from 0.0001 to 0.1, the ratio MSE(RST)/MSE(FST) varied, respectively, from 0.06 to 2.1 for single-locus estimates and from 0.02 to 2.3 for multilocus estimates based on 10 loci. The migration rate at which MSE(RST) = MSE(FST) was between m = 0.001 and 0.002 for single-locus estimates and between m = 0.003 and 0.005 for multilocus estimates. As can be observed in Fig 2A, these migration rate limits under which RST performs better than FST, and above which the reverse occurs, closely match the migration rate under which the allele size permutation test becomes often significant (i.e., %RHo
30%). The same pattern is observed for the isolated populations model: For t varying from 30 to 10,000 generations, MSE(RST)/MSE(FST) varied from 2.37 to 0.41 and from 4.00 to 0.01 for single-locus and multilocus estimates, respectively, and MSE(RST) = MSE(FST) for t = 2000 (i.e., 2/µ) and t = 500 (i.e., 0.5/µ) for single-locus and multilocus estimates, respectively. Hence, the test becomes frequently significant when MSE(RST) is close to MSE(FST) (Fig 2B).
These results strongly suggest that the allele size permutation test is well suited to determine which of FST or RST is the most adequate for demographic parameters inferences, at least on the basis of the lowest MSE criterion. However, it must be pointed out that the statistic with lowest MSE is not necessarily the statistic that will provide the lowest MSE in the demographic estimate, because demographic estimates are usually not linear functions of FST or RST. For example, in the isolated population model, the τ = t/N estimates that can be derived using τF = 2FST/(1 - FST) and τR = 2RST/(1 - RST) give MSE(τR) > MSE(τF) for all simulated divergence times with single-locus estimates [τF can also be estimated as -ln(1 - FST)].
quickly takes enormous values, so that the impact of the larger variance of RST relative to FST is greatly amplified in the inferred τ, although τR is much less biased than τF for τ ≥ 1. The good news is that for multilocus estimates we obtained MSE(τR) = MSE(τF) for t = 500 and MSE(τR) < MSE(τF) for t > 500, as previously found for MSE(RST) = MSE(FST). Similarly, for the island model, where Nm can be estimated as NmF = (1/FST - 1)/4 and NmR = (1/RST - 1)/4, the m values corresponding to MSE(NmF) = MSE(NmR) were exactly equal to those obtained for MSE(RST) = MSE(FST) for both single- and multilocus estimates. Thus, the usefulness of the allele size permutation test to determine which of FST or RST is the most adequate for inferential purposes seems to be quite general, except probably with low sample size and/or low number of loci, when inferences are in any case doubtful because associated variances are too large.
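The nonlinearity of the demographic estimators discussed above is easy to see numerically. The Python sketch below implements the two transformations given in the text, 2x/(1 - x) for the divergence time in units of N generations and (1/x - 1)/4 for Nm, where x stands for an FST or RST value; the function names are hypothetical.

```python
def divergence_ratio(stat):
    """t/N implied by an FST or RST value x: 2x / (1 - x)."""
    return 2 * stat / (1 - stat)

def nm_estimate(stat):
    """Nm implied by an FST or RST value x in an island model: (1/x - 1)/4."""
    return (1 / stat - 1) / 4

# Small changes in the statistic near 1 translate into very large
# changes in the inferred divergence time.
for x in (0.50, 0.90, 0.95, 0.99):
    print(x, divergence_ratio(x))
```

Because the transformation diverges as the statistic approaches 1, the sampling variance of an estimate near that bound is strongly amplified in the inferred divergence time.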
Application examples:
To illustrate the utility and power of the allele size permutation test with real data we present three examples of published data sets that we reanalyzed. These data were collected to assess population differentiation and check for isolation by distance in three different organisms. We computed global or pairwise FST and RST statistics as described above and applied the allele size permutation tests to obtain pRST values. These analyses were performed with SPAGeDi.
Biomphalaria pfeifferi, a selfing snail recently introduced in Madagascar:
Biomphalaria pfeifferi, an intermediate host of a parasitic trematode causing intestinal bilharziasis, is a hermaphroditic freshwater snail distributed over most of Africa, the Middle East, and Madagascar. Madagascar was relatively recently invaded by this snail, probably as a result of human occupation a few hundred years ago.
In this particular context, we can formulate a hypothesis regarding the information content that microsatellite allele sizes could bear. Given the postulated recent introductions of this snail in Madagascar, we expect that mutation has not contributed to differentiation among populations originating from the same introduction but has contributed to differentiation among populations originating from different introductions (at least if the source populations had diverged over enough time). The places and timing of the introductions are not known, but populations from a single watershed are likely to originate from a single introduction or, if genotypes from different introductions mixed in a watershed, migration within the watershed is likely to have prevented the buildup of a phylogeographical pattern at this scale. Therefore, we can expect RST to be close to FST for populations belonging to the same watershed and significantly larger than FST for populations from different watersheds when the latter were originally colonized by individuals from independent introductions.
To test this hypothesis, we reanalyzed data from small-scale and large-scale studies by ![]()
![]()
33 pairs of populations). One thousand random permutations of the allele sizes provided a distribution of pRST values, 95% confidence intervals covering the 25th to the 975th ordered values, and P values testing if RST > pRST.
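The interval and P value described above can be obtained from the list of permuted values as in the following Python sketch. The function names are hypothetical, and the example input is an artificial list of 1000 values used only to show the ranks involved.

```python
def central_range(p_rst_values, lower_rank=25, upper_rank=975):
    """Values at the given 1-based ranks of the ordered pRST values."""
    ordered = sorted(p_rst_values)
    return ordered[lower_rank - 1], ordered[upper_rank - 1]

def one_tailed_p(observed_rst, p_rst_values):
    """Proportion of pRST values at least as large as the observed RST."""
    hits = sum(v >= observed_rst for v in p_rst_values)
    return hits / len(p_rst_values)

# Artificial stand-in for 1000 pRST values from 1000 permutations.
values = [i / 1000 for i in range(1000, 0, -1)]
print(central_range(values), one_tailed_p(0.9905, values))
```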
Multilocus RST values are significantly higher than mean pRST at a broad scale but not at a local scale (Table 3). Applied to each locus, these tests were also significant for four out of eight loci at the broad scale but for none at the local scale.
The analysis of average pairwise multilocus FST and RST values per distance class at the broad scale shows the following (Fig 3):
- Differentiation between populations occupying the same watershed is much lower than that between populations from different watersheds, even for populations separated by the same spatial distance. This is in line with the higher migration rate detected within watersheds than among them (CHARBONNEL 2002b).
- A pattern of isolation by distance is detected within watersheds for both FST and RST (Mantel tests: P = 0.007 and 0.021, respectively). Among watersheds, such a pattern is not detected for FST but is for RST (Mantel tests: P = 0.18 and 0.002, respectively).
- Within watersheds, RST values are not significantly higher than pRST values, whereas among watersheds, RST values are significantly higher than pRST values for all distance classes but the first one.
- Average pairwise pRST values are always somewhat lower than pairwise FST values but they follow closely their pattern of variation with spatial distance.

Figure 3. Average pairwise FST, RST, and mean pRST values among populations of Biomphalaria pfeifferi throughout Madagascar for a set of distance classes, distinguishing comparisons between populations within watersheds and among watersheds. The dotted lines represent the range of the 95% central ordered pRST values (i.e., after allele size randomization). Each distance class contains 32–35 pairs of populations.
In conclusion, at a local scale, RST values are close to FST values, and allele size permutation tests do not reveal any significant contribution of stepwise mutations to population differentiation. On the contrary, at a large scale, RST values are substantially higher than FST values and allele size permutation tests demonstrate that shifts in average allele sizes contribute significantly to population differentiation. Significant tests on RST values are expected if populations had diverged for a sufficiently long time and/or if populations exchanged migrants at a rate similar or inferior to the mutation rate. The results are thus very consistent with a priori expectations given that (1) at a large scale, both these conditions are probably met because populations far apart in Madagascar probably originated from relatively recent and independent introductions from source continental populations isolated for a long time, and migration rate is low among watersheds, and (2) at a local scale, particularly within watersheds, none of these conditions are likely to be met.
Fraxinus excelsior, a widespread European tree:
Fraxinus excelsior (Oleaceae, common ash) is a widespread European wind-pollinated tree species found mostly in floodplain locations and with a scattered distribution within natural forests. The distribution of chloroplastic DNA (cpDNA) haplotypes throughout Europe suggests that F. excelsior was located in at least three different refuges during the last ice age, one putative refuge being the Balkan area (G. G. VENDRAMIN, unpublished data).
In the absence of evidence of long-term divergence between Bulgarian populations (no evidence of different refuges), and given that gene flow should be relatively extended in a wind-pollinated species, we may expect that stepwise-like mutations have not contributed significantly to population differentiation in Bulgaria. The data set of ![]()
Mean pairwise multilocus estimates were FST = 0.074 and RST = 0.091 within regions and FST = 0.097 and RST = 0.180 among regions (Fig 4). Hence, whereas differentiation increases slightly from small to large geographical scales according to FST, it nearly doubles according to RST. Moreover, average pairwise RST is much larger than FST among regions, but only slightly larger than FST within regions. Within regions, observed RST's are always within the central 95% range of pRST values, but among regions, the multilocus RST estimate and the estimate for locus FEM19 both exceed the central 95% range of pRST values (Fig 4), demonstrating that stepwise-like mutations contributed to population differentiation at the large geographical scale for at least one locus.
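The comparison of an observed RST with the distribution of permuted pRST values can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single locus, uses a simple variance-ratio form of RST rather than an ANOVA-based estimator, and the function names (`rst`, `permute_sizes`, `allele_size_test`) are hypothetical. The key step is that allele sizes are reshuffled among allele states, so allele identity information is preserved while allele size information is randomized.

```python
import random
from statistics import pvariance


def rst(pops):
    """Simplified single-locus RST: share of allele size variance among populations.

    Assumption: plain variance ratio, not an unbiased ANOVA-based estimator.
    The locus must be polymorphic (total variance > 0).
    """
    pooled = [size for pop in pops for size in pop]
    v_total = pvariance(pooled)
    # Within-population variance, weighted by sample size.
    v_within = sum(len(pop) * pvariance(pop) for pop in pops) / len(pooled)
    return (v_total - v_within) / v_total


def permute_sizes(pops, rng):
    """Randomly reassign the observed allele sizes among allele states."""
    sizes = sorted({size for pop in pops for size in pop})
    shuffled = sizes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(sizes, shuffled))
    return [[mapping[size] for size in pop] for pop in pops]


def allele_size_test(pops, n_perm=1000, seed=0):
    """Return observed RST, the permuted pRST values, and a one-sided P value."""
    rng = random.Random(seed)
    observed = rst(pops)
    permuted = [rst(permute_sizes(pops, rng)) for _ in range(n_perm)]
    # Proportion of permuted values at least as large as the observed one,
    # counting the observed configuration itself.
    n_ge = sum(value >= observed for value in permuted)
    p_value = (n_ge + 1) / (n_perm + 1)
    return observed, permuted, p_value
```

Each population is given as a list of allele sizes (one entry per gene copy). A small P value indicates that the observed RST is larger than expected when allele sizes carry no information, which is the situation described above for the among-region comparisons.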
Several causes may account for the significant allele size effect on population differentiation among regions in Bulgaria, for example:
- The pattern may reflect isolation by distance. However, it seems unlikely that migration rate among regions is weak compared to the mutation rate given that pollen is wind dispersed.
- The pattern may be due to postglacial recolonization from different refuges. There is, however, no evidence of different refuges from the maternally inherited cytoplasmic DNA as the same unique haplotype occurs in all three regions (M. HEUERTZ, unpublished data).
- The pattern may reflect human-mediated introduction of Fraxinus from remote regions.
- The pattern may reflect locally occurring hybridization between F. excelsior and a related species such as F. angustifolia or F. pallisiae. Given that a total of four ash species (the former three and F. ornus) are found in Bulgaria and that different species occur in the same forests (M. HEUERTZ, personal observation), this latter hypothesis merits further investigation.
In any case, the observation that a significant effect of stepwise-like mutations is observed on a large scale but not on a small one remains very consistent with a priori expectations, as nearby populations should exchange genes at a relatively high rate.
Centaurea corymbosa, a rare and narrow-ranged cliff-dwelling herb:
Centaurea corymbosa (Asteraceae) is a short-lived perennial herb species distributed over a very narrow range (within a 3-km² area of a calcareous massif along the French Mediterranean coast), where it occurs in only six small populations.
In this context it is interesting to question whether gene flow among populations is sufficiently low to permit divergence by mutations. The higher observed FST value at allozyme loci than at microsatellite loci could indeed be caused by high mutation rates of microsatellites, provided that µ ≥ m.
The allele size randomization procedure is well suited to address this question. Global RST, pRST, and FST were therefore computed for microsatellite loci as described above, and RST was compared against the distribution of 1000 pRST values. Permutation tests did not detect any RST value significantly greater than pRST (Table 4). This suggests that differentiation is caused mainly by drift and that gene flow, m, and/or the reciprocal of divergence time, 1/t, are large compared to the mutation rate, µ. This result also implies that FST should be a better estimator than RST of population differentiation for this species. Actually, given the small population sizes (![]()
![]()
100 individuals (there is actually much variance among populations) and conformed to an island model (there are actually some isolation-by-distance effects), a value of m = 0.006 would account for the observed FST, a value larger than typical microsatellite mutation rates (10⁻³ to 10⁻⁴). Assuming that these populations have been in place for a sufficiently long time to potentially permit differentiation by mutations (shifting allele sizes), the absence of such mutation-driven differentiation also suggests that the migration rate is larger than the mutation rate, so that new mutation variants spread over all populations.
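The link between FST, population size, and migration rate under an island model can be sketched as follows. This is only an illustration: the text above does not state which formula was used, so the sketch assumes the classical equilibrium approximation FST ≈ 1/(1 + 4Nm), with an optional finite-island correction factor (n/(n − 1))² for n populations, and it ignores mutation. The function names `fst_island` and `m_from_fst` are hypothetical.

```python
def fst_island(n_ind, m, n_pops=None):
    """Approximate equilibrium FST under an island model, ignoring mutation.

    Assumption: FST ~ 1 / (1 + 4*N*m) for infinitely many populations; when
    n_pops is given, 4*N*m is multiplied by (n / (n - 1))**2 as a
    finite-island correction.
    """
    scaled = 4.0 * n_ind * m
    if n_pops is not None:
        scaled *= (n_pops / (n_pops - 1)) ** 2
    return 1.0 / (1.0 + scaled)


def m_from_fst(fst, n_ind, n_pops=None):
    """Invert fst_island: migration rate implied by an FST value."""
    scaled = 1.0 / fst - 1.0
    if n_pops is not None:
        scaled /= (n_pops / (n_pops - 1)) ** 2
    return scaled / (4.0 * n_ind)


# Example call with the values quoted in the text (100 individuals, m = 0.006,
# six populations); the result depends on which approximation is assumed.
fst_six_pops = fst_island(100, 0.006, n_pops=6)
```

Under either form, the implied migration rate can then be compared with the range of typical microsatellite mutation rates quoted above, which is the comparison the argument relies on.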
Nonsignificant tests could also be due to a lack of power, so the test should be applied to additional microsatellite loci to confirm these results (presently, only four out of six loci had a sufficient number of alleles to carry out permutation tests). Deviation from a SMM at some loci could also reduce the power of the test. For example, the dinucleotide locus 28A7 has six alleles with sizes following a sequence of one repeat step plus one allele at least six repeats smaller than the other ones. Although this pattern is not necessarily incompatible with a pure SMM (e.g., ![]()
| DISCUSSION |
|---|
Comparison between measures of differentiation:
Comparisons of FST with RST values on microsatellite data have already been suggested for checking the importance of mutation vs. migration rates.
To compare multilocus FST and RST estimates, ![]()
![]()
Comparison between FST and RST is similar to comparing GST with NST on haplotypes (i.e., DNA sequences or other nonrecombinant DNA variants, such as mitochondrial or chloroplastic DNA), with $G_{ST} = (h_T - h_w)/h_T$ and $N_{ST} = (\nu_T - \nu_w)/\nu_T$, where $h$ and $\nu$ are measures of genetic diversity and subscripts $T$ and $w$ refer to diversity measured over the total set of populations and within population, respectively. Here $h = 1 - \sum_i p_i^2$, where $p_i$ is the $i$th allele frequency, which is equivalent to $h = \sum_i \sum_j \delta_{ij} p_i p_j$, where $\delta_{ij} = 0$ if $i = j$ and $\delta_{ij} = 1$ otherwise. The diversity measures $\nu$ depend also on haplotype divergence and are of the form $\nu = \sum_i \sum_j \pi_{ij} p_i p_j$, where $\pi_{ij}$ now represents a degree of divergence between haplotypes $i$ and $j$ ($\pi_{ij} = 0$ if $i = j$ but otherwise $\pi_{ij}$ varies, being, for example, proportional to the number of site differences between $i$ and $j$). NST is expected to be >GST when similar haplotypes (i.e., haplotype pairs with low $\pi_{ij}$) are associated geographically; otherwise they should have identical expectations. Thus, when comparing RST with FST or NST with GST, measures of differentiation based on ordered vs. unordered alleles are compared, and the importance of mutation relative to other causes of genetic differentiation (i.e., gene flow and divergence time) can be assessed.
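The relation between the two families of diversity measures can be made concrete with a short sketch. It assumes known haplotype frequencies (plug-in values, with no sampling correction), equal weighting of populations, and a user-supplied divergence matrix; the function names are hypothetical. Diversity is computed as a double sum over haplotype pairs weighted by a divergence term, and differentiation as (total − within)/total. With a 0/1 divergence matrix the function returns the unordered measure (GST); with graded divergences it returns the ordered one (NST).

```python
def diversity(freqs, div):
    """Sum over i, j of div[i][j] * p_i * p_j for one frequency vector."""
    k = len(freqs)
    return sum(div[i][j] * freqs[i] * freqs[j] for i in range(k) for j in range(k))


def differentiation(pop_freqs, div):
    """(total - within) / total diversity, populations weighted equally.

    Assumption: plug-in calculation from known frequencies, not a
    sample-based estimator.
    """
    n_pops = len(pop_freqs)
    k = len(pop_freqs[0])
    within = sum(diversity(f, div) for f in pop_freqs) / n_pops
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(k)]
    total = diversity(mean_freqs, div)
    return (total - within) / total


def identity_matrix_complement(k):
    """Divergence of 0 for identical haplotypes and 1 otherwise."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


# Two populations, four haplotypes; haplotypes 0-1 and 2-3 are similar pairs.
pop_freqs = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
graded = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]

g_st = differentiation(pop_freqs, identity_matrix_complement(4))
n_st = differentiation(pop_freqs, graded)
```

In this toy example similar haplotypes occur in the same population, so the ordered measure exceeds the unordered one, which is the situation in which NST is expected to be greater than GST.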
Impact of deviations from a pure SMM on the power of the test:
In all the simulations performed to assess the power of the test, a strict SMM was considered. However, the microsatellite mutation process is known to deviate from a strict SMM.

) statistics computed between two samples from a population at mutation-drift equilibrium under the SMM. The percentage of loci with the null hypothesis rejected (%RHo) is shown as a function of the type I error rate criterion (
