Among-Locus Variation in Fst: Fish, Allozymes and the Lewontin-Krakauer Test Revisited
Charles F. Baer
Department of Biological Science, Florida State University, Tallahassee, Florida 32306
Corresponding author: Charles F. Baer, Colorado State University, Ft. Collins, CO 80523-1878. E-mail: cbaer{at}lamar.colostate.edu
Communicating editor: M. SLATKIN
| ABSTRACT |
|---|
Variation among loci in the distribution of allele frequencies among subpopulations is well known; how to tell when the variation exceeds that expected when all loci are subject to uniform evolutionary processes is not well known. If locus-specific effects are important, the ability to detect those effects should vary with the level of gene flow. Populations with low gene flow should exhibit greater variation among loci in Fst than populations with high gene flow, because gene flow acts to homogenize allele frequencies among subpopulations. Here I use Lewontin and Krakauer's k statistic to describe the variance among allozyme loci in 102 published data sets from fishes. As originally proposed, k >> 2 was considered evidence that the variation in Fst among loci is greater than expected from neutral evolution. Although that interpretation is invalid, large differences in k in different populations suggest that locus-specific forces may be important in shaping genetic diversity. In these data, k is not greater for populations with expected low levels of gene flow than for populations with expected high levels of gene flow. There is thus no evidence that locus-specific forces are of general importance in shaping the distribution of allele frequencies at enzyme loci among populations of fishes.
BIOLOGISTS using molecular or biochemical markers to infer patterns of gene flow from population genetic structure commonly observe substantial variation among loci in the distribution of allele frequencies within and among populations, even among loci with similar levels of overall variation.
The key fact is that different evolutionary forces act in characteristic ways with respect to the genome. Genetic drift, migration, and inbreeding are statistical sampling processes that, on average, affect all loci equally, whereas mutation, natural selection, meiotic drive, and assortative mating differ among loci. This fact provides a useful null hypothesis against which various evolutionary hypotheses may in principle be tested. If a set of loci is evolving neutrally and is subject to the same mutational processes, then the distribution of allele frequencies or trait values among subpopulations should have the same average value across loci or across traits. In particular, a set of loci subject only to drift and migration is expected to have the same average inbreeding coefficient, F, and by extension, the same average partitioning of the inbreeding coefficient into within- and among-population components, Fis and Fst.
Under the Lewontin-Krakauer model, the expected variance in F among loci is

$$\sigma_F^2 = \frac{k\,\bar{F}^2}{n - 1} \qquad (1)$$

where $\bar{F}$ is the mean inbreeding coefficient, averaged across loci, n is the number of subpopulations sampled, and k is a constant specific to the underlying distribution of allele frequencies among subpopulations. For example, for immutable loci the expected variance in F is 0 and therefore k = 0.
To determine the value of k, Lewontin and Krakauer simulated several distributions of allele frequencies among subpopulations and reached the conclusion that for neutral loci governed only by drift, k is at most 2 and is a decreasing function of F. Using the value k = 2 to establish the expected variance in F among loci, they demonstrated that the expected variance is distributed as a mean chi-square (chi-square/degrees of freedom), where the degrees of freedom are equal to the number of loci. They proposed that the ratio of the observed variance to the expected variance be compared to the critical chi-square value with the appropriate degrees of freedom; if the ratio of observed to expected variance is significantly large, the hypothesis of neutral evolution at all loci in the sample must be rejected. That is, by their argument, at least one locus in the sample must be under natural selection.
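The test described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the analyses in this article: the function names are hypothetical, the expected variance is taken as k times the squared mean Fst divided by (n - 1), the degrees of freedom are set to the number of loci as stated in the text, and the chi-square critical value comes from the Wilson-Hilferty approximation rather than from exact tables.

```python
from statistics import NormalDist, mean, variance


def chi2_critical(df, alpha=0.05):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def lewontin_krakauer_test(fst_by_locus, n_subpops, k=2.0, alpha=0.05):
    """Compare the observed among-locus variance in Fst with its expectation.

    Returns (ratio, critical, significant), where ratio is the observed
    variance divided by k * Fbar^2 / (n - 1).
    """
    f_bar = mean(fst_by_locus)                   # unweighted mean across loci
    observed = variance(fst_by_locus)            # sample variance across loci
    expected = k * f_bar ** 2 / (n_subpops - 1)  # expected variance for this k
    df = len(fst_by_locus)                       # assumption: df = number of loci
    ratio = observed / expected
    critical = chi2_critical(df, alpha) / df     # mean chi-square threshold
    return ratio, critical, ratio > critical


# Hypothetical data: Fst at five loci sampled from six subpopulations.
ratio, critical, significant = lewontin_krakauer_test(
    [0.05, 0.10, 0.15, 0.20, 0.10], n_subpops=6
)
```

Passing a different `k` changes only the expected variance, so the same function can be reused for any alternative criterion.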
Unfortunately, the Lewontin-Krakauer test is not generally valid as a test of natural selection for several reasons.
However, the problem attacked by Lewontin and Krakauer remains important, because it has become standard practice to use the variation in allele frequencies among populations (i.e., Fst) as an indirect estimator of gene flow.
Even in the absence of natural selection, it is possible that locus-specific processes are important in cases of large variation among loci. Specifically, it may be that some loci mutate more or less according to an infinite alleles model (i.e., all alleles of a given type are identical by descent), whereas recurrent mutation to identical allelic states is the rule at other loci (i.e., some alleles are identical in state but not identical by descent). If such variation in mutational properties among loci is present, it should be more apparent in populations with low levels of migration (large Fst) than in populations with high levels of migration (low Fst). Locus-specific effects can artificially hide real population subdivision, but they cannot artificially create the appearance of subdivision in a panmictic population except in the case of strong directional selection. That is, in a population with low gene flow, one set of loci (the "infinite alleles" loci) appears highly differentiated among subpopulations whereas another set of loci (the "recurrent mutation" loci) may not. In a population with high gene flow, however, all loci appear relatively homogeneous among subpopulations. Another way in which mutation can artificially hide the presence of real population structure is if there are undetected null alleles segregating; again, populations that are actually differentiated can appear homogeneous (although a high frequency of nulls may be indicated by a deviation from Hardy-Weinberg equilibrium). Finally, in populations with low levels of gene flow, the assumption of µ << m may be violated at some but not all loci, which leads to large variance among loci.
Herein, I use the Lewontin-Krakauer test to describe the variance among allozyme loci in the literature on fishes and then draw conclusions from the pattern of results. My approach is to initially assign an expected level of gene flow to a population (data set) from a priori biological and geographical considerations; if locus-specific effects are of general importance, populations with expected low levels of gene flow should generally exhibit greater variation in Fst among loci for the reasons noted above. Note that it is important to assign expected levels of gene flow a priori, because in any given case a small value of Fst may be due to locus-specific effects rather than high gene flow. It is also important to realize what the Lewontin-Krakauer test can and cannot do. Although it is not a valid test of selection per se for the reasons noted above, it can provide a one-sided test for locus-specific effects that is of some value to the biologist interested in inferring gene flow from Fst. Specifically, if the variance among loci is greater than expected by the Lewontin-Krakauer criterion, it cannot confidently be attributed to selection; any of the other explanations may hold. However, if a Lewontin-Krakauer test is not significant, then there is some theoretical justification for making the assumptions implicit in relating Fst to Nem, the effective number of migrants, i.e., effective neutrality, weak mutation, and approximate migration-drift equilibrium.
| MATERIALS AND METHODS |
|---|
The data analyzed here are taken from a subset of the literature on studies of gene flow and population structure in fishes (102 data sets from 77 publications; raw data are presented in an appendix available online at http://www.colostate.edu/Depts/Biology/Research/baer-1999-genetics.htm/). Fishes are particularly suitable for this study because they encompass almost the entire range of possible degrees of population structuring, from the essentially panmictic (large, pelagic marine species) to essentially isolated populations (lacustrine species or species endemic to springs). To be included, a data set had to have at least three natural subpopulations (e.g., hatchery populations or those known to be stocked were omitted) and at least three loci. If a study reported values of Fst and average heterozygosity for each locus, those values were taken directly as published. If a study did not report either of those two quantities, the raw allele frequency data were entered into BIOSYS 1 (SWOFFORD and SELANDER 1989) ... $\bar{p}_i(1 - \bar{p}_i)$, where $\bar{p}_i$ is the mean frequency of the ith allele; this is equivalent to NEI's (1973) Gst. In my analysis I use the weighted average Fst (of alleles within a locus) because that is the value usually reported in published studies. An alternative possibility would have been to use the original procedure of Lewontin and Krakauer, which is to calculate Fst for each allele at a locus and subtract a degree of freedom for each multiallelic locus. The Lewontin-Krakauer procedure provides greater statistical power, but because most authors report weighted average values of Fst, I chose to do so as well; the result is a conservative test.
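The stated equivalence between a weighted average Fst and Gst can be checked numerically. The sketch below is illustrative only: the function name and the input layout are hypothetical, the weights are assumed to be the mean allele frequency times one minus the mean allele frequency, subpopulations are given equal weight, and variances are divide-by-n (population) variances. It does not reproduce the BIOSYS computation.

```python
def weighted_fst_and_gst(freqs):
    """Return (weighted-average Fst, Gst) for one locus.

    freqs[j][i] is the frequency of allele i in subpopulation j; each row
    is assumed to sum to 1 and the locus is assumed to be polymorphic.
    """
    n_pops = len(freqs)
    n_alleles = len(freqs[0])
    # Mean frequency of each allele across subpopulations.
    p_bar = [sum(row[i] for row in freqs) / n_pops for i in range(n_alleles)]
    # Among-subpopulation variance of each allele frequency (divide by n).
    var_p = [
        sum((row[i] - p_bar[i]) ** 2 for row in freqs) / n_pops
        for i in range(n_alleles)
    ]
    # Weights p_bar * (1 - p_bar); weighting the per-allele Fst values
    # var_p / weight by these weights reduces to sum(var_p) / sum(weights).
    weights = [p * (1.0 - p) for p in p_bar]
    weighted_fst = sum(var_p) / sum(weights)
    # Gst = (HT - HS) / HT from expected heterozygosities.
    h_t = 1.0 - sum(p ** 2 for p in p_bar)
    h_s = sum(1.0 - sum(p ** 2 for p in row) for row in freqs) / n_pops
    gst = (h_t - h_s) / h_t
    return weighted_fst, gst


# Hypothetical two-allele locus sampled in three subpopulations.
fst, gst = weighted_fst_and_gst([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
```

Under these assumptions the two quantities agree exactly, because the weights sum to the total expected heterozygosity and the allele-frequency variances sum to the difference between total and within-subpopulation heterozygosity.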
The first step of the analysis was to assign to each population an expected level of gene flow, from low (i.e., high expected Fst) to high (low expected Fst), from a priori biological considerations such as habitat, behavior, and geographic distribution. For example, my expectation is that yellowfin tuna will exhibit high levels of gene flow and the Leon Springs pupfish will exhibit low levels of gene flow. It is very important to realize that the expected level of gene flow of a population is often determined more by the geographical milieu in which a data set was collected than by the biological properties (e.g., swimming ability) of the species. For example, consider the large, vagile largemouth bass and the small, sedentary madtom. Within a river drainage, I expect bass to exhibit greater gene flow than madtoms. However, I expect there to be less gene flow between populations of bass in different drainages than between subpopulations of madtoms in the same drainage. Some species are included more than once, and of those, some are assigned different levels of expected gene flow from geographic considerations (see appendix at website). For example, the Atlantic salmon is included four times and appears in all three categories of expected gene flow due to the geographic properties of the samples. The unweighted mean value of Fst (among loci) was then used to calculate from Equation 1 the expected variance in a given study; the observed variance was calculated as usual.
These data were then used in two ways. First, I assumed that the theoretically expected variance in Fst in a study was in fact equal to the observed variance and calculated a value of k, substituting the observed variance for the expected in Equation 1 (... Fst, number of subpopulations, number of loci, taxon, ecological niche, etc.).
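Solving for k when the observed variance is substituted for the expected variance is a one-line rearrangement: k equals (n - 1) times the observed variance divided by the squared mean Fst. A minimal sketch follows, with a hypothetical function name and with the sample (n - 1 denominator) variance across loci assumed for "the observed variance".

```python
from statistics import mean, variance


def observed_k(fst_by_locus, n_subpops):
    """Value of k implied by the observed among-locus variance in Fst."""
    f_bar = mean(fst_by_locus)      # unweighted mean Fst across loci
    s2 = variance(fst_by_locus)     # observed (sample) variance across loci
    return (n_subpops - 1) * s2 / f_bar ** 2


# Hypothetical data: Fst at five loci sampled from six subpopulations.
k_hat = observed_k([0.05, 0.10, 0.15, 0.20, 0.10], n_subpops=6)
```

Because k is scaled by the squared mean Fst, data sets with very different overall levels of differentiation can be compared on a common scale.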
Second, I did two Lewontin-Krakauer tests on each data set, using the original criterion of k = 2 and EWENS' (1977, p. 120) skewed (median allele frequency = 0.9) β-distribution criterion of k = 7.6. These tests yield categorical data; either the hypothesis of "neutral evolution" was not rejected (i.e., the variance was not larger than expected given the particular criterion of a test) or it was. Because the focus of this study concerns the strength of inferences about gene flow drawn from allele frequency data and not the inference of natural selection from those data, a conservative test in this case necessitates minimizing the possibility of type II error; accordingly, the level of significance was not corrected for multiple tests.
These analyses were done first for all loci at which the frequency of the common allele in at least one subpopulation was <0.95. I then repeated the analyses and included only loci with an expected heterozygosity [HT in NEI's (1973) terminology, or the "total limiting variance" of ...] ...
The distributions of k and k20 were approximately lognormal; means and 95% confidence limits (CL) were calculated from back-transformation of natural-log-transformed data. The distributions of Fst and Fst,20 could not be satisfactorily normalized by transformation; median values and ranges are thus presented.
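The back-transformation described above can be sketched as follows. The function name is hypothetical, and a normal quantile is used for the confidence limits as a stand-in (an assumption; a t quantile would be slightly wider for small samples). The back-transformed mean is the geometric mean, so the limits are symmetric around it on the log scale; the limits reported in this article for all 102 data sets behave that way, since the square root of 4.81 × 7.29 is about 5.92.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev


def back_transformed_mean_ci(values, level=0.95):
    """Mean and confidence limits computed on the natural-log scale,
    then back-transformed to the original scale."""
    logs = [log(v) for v in values]
    m = mean(logs)
    se = stdev(logs) / sqrt(len(logs))           # standard error on log scale
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # assumption: normal quantile
    return exp(m), exp(m - z * se), exp(m + z * se)


# Hypothetical k values from three data sets.
center, lower, upper = back_transformed_mean_ci([2.0, 4.0, 8.0])
```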
In any comparative study, the potential effects of phylogenetic nonindependence need to be considered. A formal comparative treatment of the data in this study would be problematic for two reasons. First, because of the sampling-dependent nature of the character "expected level of gene flow," mapping character-state changes onto a tree would be meaningless for most clades. Second, although one could in principle use independent contrasts of the relationship of k with Fst ...
| RESULTS |
|---|
The prediction that locus-specific effects should lead to greater variance in Fst among loci in populations with low expected levels of migration (e.g., high Fst) compared to those with high expected levels of migration (low Fst) was not borne out. Lewontin and Krakauer's k statistic did not differ among the three classes of expected levels of gene flow (one-way ANOVA, F2,99 = 0.822, P = 0.442; Table 1). The criteria by which populations were assigned an expected level of gene flow proved reliable; the average Fst was highest for "low" gene flow populations and lowest for "high" gene flow populations, with "medium" populations intermediate (Table 1). Regression of log(k) against Fst revealed no relationship between k and Fst (F1,100 = 1.443, P = 0.232, R2 = 0.014; Figure 1a). When only highly polymorphic loci were considered, there was again no difference in variance in Fst among loci among the different expected levels of gene flow (one-way ANOVA, F2,87 = 2.111, P = 0.127; Table 1). Regression of log(k20) against Fst,20 revealed a significant negative relationship between the variance among loci and Fst; populations with low expected levels of gene flow (high Fst) had smaller values of k20 than did those with high expected levels of gene flow (low Fst; log[k20] = -2.101[Fst,20] + 1.500; F1,88 = 12.027, P = 0.001, R2 = 0.094; Figure 1b). This result is consistent with the expectation that k is a decreasing function of Fst under the Lewontin-Krakauer model.
The mean value of k averaged over all 102 data sets is 5.92 (95% CL = 4.81, 7.29), which is >2 but <7.6, the value predicted under the β-distribution of allele frequencies with a median allele frequency of 0.9 (Table 1). When only highly variable loci are considered, the mean value of k20 averaged over all data sets is 2.82 (95% CL = 2.08, 3.81), which is close to the value of 2.57 predicted from the β-distribution with a median allele frequency of 0.73. In both cases, k is smaller in populations with low expected levels of gene flow than in populations with medium or high expected levels of gene flow (Table 1).
When the Lewontin-Krakauer test is used to assess the pattern of variation, the general pattern of a decline in variation among loci with decreasing level of gene flow remains. When the original value of k = 2 is used and all loci are included, the test is significant in a substantial majority of cases and the pattern is consistent across classes of expected levels of gene flow (Table 2). When the Ewens (median allele frequency = 0.9) criterion of k = 7.6 is used, the pattern is reversed; the observed value of k is no greater than expected in a large majority of cases, again consistent across classes. When only highly polymorphic loci are considered, the pattern is more complicated. For populations in which high gene flow is expected, the Lewontin-Krakauer test with k = 2 is significant a majority of the time (Table 2) but it is not significant in populations with low expected levels of gene flow; populations with intermediate levels of expected gene flow are intermediate. When the Ewens criterion of k = 2.57 is used, the Lewontin-Krakauer test is significant in slightly fewer cases in all three gene flow categories, as expected (Table 2).
There are weak but highly significant relationships between k and both the number of subpopulations (log[k] = 0.032[n pops] + 1.421; F1,100 = 14.419, P < 0.001) and the number of loci included in a data set (log[k] = 0.074[n loci] + 1.172; F1,100 = 10.491, P = 0.002). However, these relationships disappear when only highly polymorphic loci are considered (n pops, F1,88 = 0.799, P = 0.374; n loci, F1,88 = 1.167, P = 0.283). There is no relationship between Fst and either number of subpopulations (F1,100 = 1.607, P = 0.208) or number of loci (F1,100 = 1.571, P = 0.213) included in a data set; the results for highly polymorphic loci are essentially identical.
When k was averaged within families without regard to expected level of gene flow, the mean k was 5.45 (95% CL = 4.12, 7.16; n = 39), which is very close to the uncorrected mean of 5.92. When averaged over families within individual categories of expected levels of gene flow, the results were again very similar to the uncorrected results (mean k, high E[gf] = 6.68, medium E[gf] = 4.68, low E[gf] = 4.71; see Table 1 for comparison). For highly variable loci, k20 averaged over families without regard to expected level of gene flow was 2.82 (95% CL = 1.45, 4.27; n = 38), exactly the same as the uncorrected value. The family averages of k20 within categories of expected gene flow were again very similar to the uncorrected results (mean k20, high E[gf] = 3.35, medium E[gf] = 2.45, low E[gf] = 1.71; see Table 1 for comparison). The family means were distributed approximately lognormally, the same as the full data set. These results suggest that there is no important confounding effect of phylogenetic nonindependence on the initial results.
| DISCUSSION |
|---|
Most importantly, the results of this study lead to the conclusion that there is no general tendency for locus-specific effects to artificially mask real population structure. This is illustrated by the random (when all loci are considered) or negative (when only highly polymorphic loci are considered) relationship between k and Fst. This is good news for biologists interested in inferring patterns of gene flow from allozyme allele frequency data; it means that the assumptions necessary for that inference (effective neutrality, weak mutation, and approximate migration-drift equilibrium) seem in general to be valid, especially when only highly polymorphic loci are considered. Obviously, there are individual cases when those assumptions apparently are violated, as evidenced by the large values of k seen in some data sets (e.g., approximately an order of magnitude greater than even a liberally calculated expected value).
The fact that k calculated over all loci is greater than k calculated over only highly polymorphic loci is almost certainly due to the effect of differences in allele frequency per se, with skewed allele frequencies tending to inflate the value of k ... Fst averaged over highly polymorphic loci is ~40% greater than Fst averaged over all loci (Table 1) also argues strongly against the possibility of pervasive balancing selection at highly polymorphic loci, at least within subpopulations.
The random/negative relationship of k with Fst is perhaps surprising for another reason. As first pointed out by ... Fst is that the distribution of allele frequencies among subpopulations in populations with low gene flow is governed almost solely by drift and that there is little historical information left in the data, a possibility that seems unlikely given what is known about the general utility of allozyme frequencies for phylogenetic reconstruction. Any correlations that are present would then occur primarily at short geographic distances due to stepping-stone migration ... Fst. There is at least anecdotal evidence for just such an effect of Ne. In a study of three species of Cyprinids ... 1; for the more abundant species k ... 7. Likewise, in a study of three endangered Cyprinodontids ... 1 in all cases, whereas for two more abundant Cyprinodontids ...
A potential criticism of these conclusions is the fact that in some cases a high frequency of Lewontin-Krakauer tests are statistically significant, particularly when all loci are considered in populations with high expected gene flow. However, given what is known about the behavior of k under a variety of drift-only ...
Finally, there is a pattern that emerges from the data that is worthy of comment, which is that Fst averaged only over highly polymorphic loci is sometimes substantially greater than when all loci are included in the analysis (averaged over all 102 data sets, the median Fst,20 is ~40% greater than Fst calculated over all loci). This is not a novel observation ...
| ACKNOWLEDGMENTS |
|---|
I thank Mike Antolin, Bill Black IV, Mike Hellberg, Tom Turner, Mike Whitlock, and two anonymous reviewers for discussions and/or comments on the manuscript. I am especially indebted to Steve Karl for a conversation in which my thinking on the subject crystallized and for sharing an unpublished manuscript and to Joe Travis for particularly insightful comments. Support was provided by a Florida State University Dissertation Fellowship and Colorado Agricultural Experiment Station Hatch Project no. 697 to M. Antolin.
Manuscript received November 16, 1998; Accepted for publication February 22, 1999.
| LITERATURE CITED |
|---|
BAER, C. F., 1998a Species-wide population structure in a southeastern US freshwater fish, Heterandria formosa: gene flow and biogeography. Evolution 52:183-193.
BAER, C. F., 1998b Population structure in a south-eastern US freshwater fish, Heterandria formosa. II. Gene flow and biogeography within the St. Johns River drainage. Heredity 81:404-411.
BOSSART, J. L. and D. P. PROWELL, 1998 Genetic estimates of population structure and gene flow: limitations, lessons, and new directions. Trends Ecol. Evol. 13:202-206.
CAVALLI-SFORZA, L., 1966 Population structure and human evolution. Proc. R. Soc. Lond. Ser. B 164:362-379.
CROW, J. F. and K. AOKI, 1984 Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proc. Natl. Acad. Sci. USA 81:6073-6077.
DUGGINS, C. F., JR., A. A. KARLIN, K. G. RELYEA, and R. W. YERGER, 1983 Systematics of the genus Floridichthys. Biochem. Syst. Ecol. 11:283-294.
ECHELLE, A. A., A. F. ECHELLE and D. R. EDDS, 1987 Population structure of four pupfish species (Cyprinodontidae: Cyprinodon) from the Chihuahuan desert region of New Mexico and Texas: allozymic variation. Copeia 1987:668-681.
EWENS, W. J., 1977 Population genetics theory in relation to the neutralist-selectionist controversy, pp. 67-134 in Advances in Human Genetics, Vol. 8, edited by H. HARRIS and K. HIRSCHHORN. Plenum Press, New York.
EWENS, W. J., and M. W. FELDMAN, 1976 The theoretical assessment of selective neutrality, pp. 303-337 in Population Genetics and Ecology, edited by S. KARLIN and E. NEVO. Academic Press, New York.
FELSENSTEIN, J., 1985 Phylogenies and the comparative method. Am. Nat. 125:1-15.
KARL, S. A. and J. C. AVISE, 1992 Balancing selection at allozyme loci in oysters: implications from nuclear RFLPs. Science 256:100-102.
LEWONTIN, R. C. and J. KRAKAUER, 1973 Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74:175-195.
MORITZ, C., 1994 Defining 'Evolutionarily Significant Units' for conservation. Trends Ecol. Evol. 9:15-20.
NEI, M., 1973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70:3321-3323.
NEI, M. and A. CHAKRAVARTI, 1977 Drift variances of Fst and Gst statistics obtained from a finite number of isolated populations. Theor. Popul. Biol. 11:307-325.
NEI, M. and T. MARUYAMA, 1975 Lewontin-Krakauer test for neutral genes. Genetics 80:395.
NEI, M., A. CHAKRAVARTI, and Y. TATENO, 1977 Mean and variance of Fst in a finite number of incompletely isolated populations. Theor. Popul. Biol. 11:291-306.
POGSON, G. H., K. A. MESA, and R. G. BOUTILIER, 1995 Genetic population structure and gene flow in the Atlantic cod Gadus morhua: a comparison of allozyme and nuclear RFLP loci. Genetics 139:375-385.
ROBERTSON, A., 1975a Remarks on the Lewontin-Krakauer test. Genetics 80:396.
ROBERTSON, A., 1975b Gene frequency distributions as a test of selective neutrality. Genetics 81:775-785.
SLATKIN, M., 1985 Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16:393-430.
SLATKIN, M., 1991 Inbreeding coefficients and coalescence times. Genet. Res. 58:167-175.
SLATKIN, M., 1993 Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47:264-279.
SLATKIN, M. and N. H. BARTON, 1989 A comparison of three indirect methods for estimating average levels of gene flow. Evolution 43:1349-1368.
SWOFFORD, D. L., and R. B. SELANDER, 1989 BIOSYS 1, release 1.7. University of Illinois Press, Urbana, IL.
TIBBETS, C. A. and T. E. DOWLING, 1996 Effects of intrinsic and extrinsic factors on population fragmentation in three species of North American minnows (Teleostei: Cyprinidae). Evolution 50:1280-1292.
TREXLER, J. C., 1988 Hierarchical organization of genetic variation in the sailfin molly, Poecilia latipinna (Pisces: Poeciliidae). Evolution 42:1006-1017.
WEIR, B. S., 1990 Genetic Data Analysis. Sinauer, Sunderland, MA.
WEIR, B. S. and C. C. COCKERHAM, 1984 Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
WRIGHT, S., 1978 Evolution and the Genetics of Populations, Vol. 4. University of Chicago Press, Chicago.
This article has been cited by other articles:
W. S. Grant, I. B. Spies, and M. F. Canino. Biogeographic Evidence for Selection on Mitochondrial DNA in North Pacific Walleye Pollock Theragra chalcogramma. J. Hered., November 1, 2006; 97(6): 571-580.
V. Le Corre and A. Kremer. Genetic Variability at Neutral Markers, Quantitative Trait Loci and Trait in a Subdivided Population Under Selection. Genetics, July 1, 2003; 164(3): 1205-1219.
M. Kayser, S. Brauer, and M. Stoneking. A Genome Scan to Detect Candidate Regions Influenced by Local Natural Selection in Human Populations. Mol. Biol. Evol., June 1, 2003; 20(6): 893-900.
S. Launey, C. Ledu, P. Boudry, F. Bonhomme, and Y. Naciri-Graven. Geographic Structure in the European Flat Oyster (Ostrea edulis L.) as Revealed by Microsatellite Polymorphism. J. Hered., September 1, 2002; 93(5): 331-351.