Genetics, Vol. 151, 1211-1216, March 1999, Copyright © 1999

Estimating the Effective Number of Breeders From Heterozygote Excess in Progeny

Gordon Luikart (a) and Jean-Marie Cornuet (b)
(a) Laboratoire de Biologie des Populations d'Altitude, Université Joseph Fourier, CNRS, F-38041 Grenoble, Cedex 9, France
(b) Laboratoire de Modélisation et de Biologie Evolutive, INRA/URLB, 34090 Montpellier Cedex, France

Corresponding author: Gordon Luikart, Laboratoire de Biologie des Populations d'Altitude, CNRS, UMR 5553, Université Joseph Fourier, F-38041 Grenoble, Cedex 9, France. E-mail: gluikart@ujf-grenoble.fr

Communicating editor: G. B. GOLDING


*  ABSTRACT

The heterozygote-excess method is a recently published method for estimating the effective population size (Ne). It is based on the following principle: When the effective number of breeders (Neb) in a population is small, the allele frequencies will (by chance) be different in males and females, which causes an excess of heterozygotes in the progeny with respect to Hardy-Weinberg equilibrium expectations. We evaluate the accuracy and precision of the heterozygote-excess method using empirical and simulated data sets from polygamous, polygynous, and monogamous mating systems and by using realistic sample sizes of individuals (15–120) and loci (5–30) with varying levels of polymorphism. The method gave nearly unbiased estimates of Neb under all three mating systems. However, the confidence intervals on the point estimates of Neb were sufficiently small (and hence the heterozygote-excess method useful) only in polygamous and polygynous populations that were produced by <10 effective breeders, unless samples included > ~60 individuals and 20 multiallelic loci.


THE effective population size (Ne) is an important parameter in evolutionary genetics and conservation biology because it influences the rate of inbreeding and loss of genetic variation. It also influences the efficiency of natural selection in maintaining beneficial alleles and purging deleterious ones. For example, when Ne is very small, genetic drift will often be too strong for natural selection to efficiently maintain or purge alleles. Unfortunately, Ne has proven very difficult to estimate in natural populations (WAPLES 1989). Thus, any method with potential for improving our ability to estimate Ne deserves thorough evaluation. Ne can be estimated using demographic or genetic data. However, demographic methods require information such as variance in reproductive success, which is difficult to obtain for many species. Further, demographic estimates of Ne are often higher than the true Ne because demographic methods seldom incorporate all of the factors (e.g., skewed sex ratios and changes in population size) that can make Ne smaller than the population census size (Nc) (FRANKHAM 1995).

The (current) effective population size can be estimated using genetic data and the following four statistical methods: the loss of heterozygosity method (e.g., HARRIS and ALLENDORF 1989); the temporal method (KRIMBAS and TSASKAS 1971; WAPLES 1989); the linkage disequilibrium method (HILL 1981; WAPLES 1991); and, most recently, the heterozygote-excess method (PUDOVKIN et al. 1996). PUDOVKIN et al. (1996) demonstrated that the heterozygote-excess method estimates the effective number of breeding parents (Neb) with no bias and fair precision when the sample size of progeny is infinite and when gametes combine completely at random, i.e., when all male gametes have an equal probability of combining with all female gametes, as in some polygamous, random-mating species (e.g., marine invertebrates such as shellfish).

Here, we extend the evaluation of PUDOVKIN et al. (1996) to include finite samples of individuals (n = 15–120), reasonable numbers of polymorphic loci (5–30) with a wide range of allele frequencies, monogamous mating systems where only pair-mating occurs, and polygynous mating systems where only a few males mate with many females (i.e., skewed sex ratios). Our goal is to delineate the conditions under which the heterozygote-excess method will be useful for estimating Neb in natural populations (or in captive populations such as fish hatcheries). In this article, we (i) explain the importance of using an unbiased estimator of the expected heterozygosity (Hexp) for calculating the estimate N̂eb from finite samples of progeny, (ii) quantify the bias of N̂eb, (iii) determine the number of loci and individuals that must be sampled to achieve precise estimates of Neb, and (iv) test if monogamy generates an interfamily Wahlund effect (i.e., heterozygote deficiency) that counteracts the heterozygote excess generated by small Neb. To conduct these evaluations, we use data from simulations and natural populations. We focus on populations with a small Neb (4–100) because a heterozygote excess is generated only when Neb is small.


*  PRINCIPLE AND METHODS

The principle of the heterozygote-excess method is as follows: When the number of breeders is small, the allele frequencies in males and females will be different due to binomial sampling error. This difference generates an excess of heterozygotes in the progeny relative to the proportion of heterozygotes expected under Hardy-Weinberg equilibrium (ROBERTSON 1965; RASMUSSEN 1979). The proportion of heterozygotes expected in progeny produced by a small and equal number of females and males can be calculated as (FALCONER 1989, p. 67)

$$H' = 2pq\left(1 + \frac{1}{2n}\right),$$

where n is the number of haploid genomes in the mothers or fathers and p and q are the frequencies of alleles at a locus, in the population from which the parents were drawn. Because the excess of heterozygotes expected in progeny increases as the number of parents decreases, PUDOVKIN et al. (1996) suggested using the magnitude of heterozygote excess to estimate the effective number of breeding parents.

PUDOVKIN et al. (1996) called H' the proportion of heterozygotes expected to be observed (Hobs) in the progeny (given a limited number of parents), whereas the expected proportion of heterozygotes in the base population under Hardy-Weinberg proportions, 2pq, was designated as Hexp.

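Now N̂eb can be estimated as follows:

$$H_{obs} = H_{exp}\left(1 + \frac{1}{2N_{eb}}\right), \qquad \hat{N}_{eb} = \frac{H_{exp}}{2\,(H_{obs} - H_{exp})}.$$

The ratio Hexp/(Hobs - Hexp) is the reciprocal of the index D of SELANDER (1970) for excess or deficiency of heterozygotes; thus, an estimate of Neb is (PUDOVKIN et al. 1996)

$$\hat{N}_{eb} = \frac{1}{2D}.$$

The following more exact equation was derived by PUDOVKIN et al. (1996):

$$\hat{N}_{eb} = \frac{1}{2D} + \frac{1}{2(D + 1)}. \qquad (1)$$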
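The two estimators can be compared numerically. The sketch below is illustrative only: it assumes the commonly cited forms N̂eb = 1/(2D) and N̂eb = 1/(2D) + 1/(2(D + 1)), with D = (Hobs - Hexp)/Hexp, and the function names are hypothetical rather than taken from any published program.

```python
def selander_d(h_obs, h_exp):
    """Relative excess (D > 0) or deficiency (D < 0) of heterozygotes."""
    return (h_obs - h_exp) / h_exp


def neb_approx(d):
    """First-order estimate, assumed form 1 / (2D)."""
    return 1.0 / (2.0 * d)


def neb_exact(d):
    """More exact estimate, assumed form 1/(2D) + 1/(2(D + 1))."""
    return 1.0 / (2.0 * d) + 1.0 / (2.0 * (d + 1.0))


if __name__ == "__main__":
    # Hypothetical values: 55% heterozygotes observed, 50% expected.
    d = selander_d(0.55, 0.50)
    print(d, neb_approx(d), neb_exact(d))
```

With these hypothetical inputs D is 0.1, so the first-order estimate is about 5 breeders and the more exact form adds 1/(2 × 1.1), or roughly 0.45. The correction term is always less than 0.5 for positive D, so the two forms differ noticeably only when the estimate itself is small.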

The above expression is for two alleles. For a multiallelic locus, one should average D over all k alleles as

$$D = \frac{1}{k}\sum_{i=1}^{k} D_i, \qquad (2)$$

where Di is the excess of heterozygotes for the ith allele,

$$D_i = \frac{H_{obs[i]} - H_{exp[i]}}{H_{exp[i]}},$$

Hobs[i] is the total proportion of heterozygotes having allele i, and Hexp[i] is simply 2pi(1 - pi) · 2N/(2N - 1), where pi is the frequency of the ith allele and N is the number of progeny sampled.
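As an illustration of the per-allele calculation, the following sketch computes D for one locus from a list of progeny genotypes. It assumes that the locus-level D is the simple mean of Di over the alleles observed in the sample and that Hexp[i] carries the 2N/(2N - 1) factor given in the text; the function name and the genotype representation are hypothetical.

```python
from collections import Counter


def locus_d(genotypes):
    """Mean heterozygote excess over the alleles seen at one locus.

    genotypes: list of (allele, allele) tuples, one per sampled progeny.
    """
    n = len(genotypes)                        # N, progeny sampled
    counts = Counter(a for g in genotypes for a in g)
    correction = 2.0 * n / (2.0 * n - 1.0)    # small-sample factor 2N/(2N - 1)
    d_values = []
    for allele, count in counts.items():
        p = count / (2.0 * n)                 # sample frequency of this allele
        h_exp = 2.0 * p * (1.0 - p) * correction
        if h_exp == 0.0:                      # monomorphic: D_i undefined
            continue
        # Proportion of progeny that are heterozygous and carry this allele.
        h_obs = sum(1 for a, b in genotypes if a != b and allele in (a, b)) / n
        d_values.append((h_obs - h_exp) / h_exp)
    if not d_values:
        raise ValueError("locus is monomorphic in this sample")
    return sum(d_values) / len(d_values)


if __name__ == "__main__":
    # Hypothetical sample of 10 progeny at a two-allele locus.
    sample = [("A", "B")] * 6 + [("A", "A")] * 2 + [("B", "B")] * 2
    print(locus_d(sample))
```

In this hypothetical sample both alleles have frequency 0.5, Hobs[i] is 0.6, and Hexp[i] is 0.5 × 20/19, so each Di (and therefore D) is 0.14.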

We note that when the sample size of individuals is finite, Hexp must be estimated using the following unbiased estimator of 2pq (NEI 1987): 2N(2pq)/(2N - 1), where N is the number of progeny sampled. If 2pq is used instead of the unbiased estimator, N̂eb will be a biased estimate of Neb (especially when N is small). In nature, Neb can range from only two to near infinity and N̂eb can be negative (e.g., in the case of an overall deficit of heterozygotes). In our analyses, we considered Neb values as infinite if N̂eb was negative or >10,000 (an arbitrary but reasonably large value). If N̂eb is infinite, it simply means that the heterozygote-excess signal is obscured by the noise (i.e., sampling error) associated with small samples of loci or individuals.
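A small numerical sketch may help show why the sample-size correction matters and how the "infinite" rule can be applied. The helper names are hypothetical, and the example values (15 progeny, p = 0.5, 55% heterozygotes observed) are invented for illustration.

```python
import math


def hexp_unbiased(p, n_progeny):
    """Unbiased estimator of 2pq for a two-allele locus: 2N(2pq)/(2N - 1)."""
    q = 1.0 - p
    return 2.0 * n_progeny * (2.0 * p * q) / (2.0 * n_progeny - 1.0)


def reported_neb(neb_hat, cap=10_000.0):
    """Treat negative or very large estimates as infinite, as in the text."""
    if neb_hat < 0.0 or neb_hat > cap:
        return math.inf
    return neb_hat


if __name__ == "__main__":
    h_obs, p, n = 0.55, 0.5, 15
    naive = 2.0 * p * (1.0 - p)               # plain 2pq
    corrected = hexp_unbiased(p, n)
    d_naive = (h_obs - naive) / naive
    d_corrected = (h_obs - corrected) / corrected
    # First-order estimate 1/(2D) under each choice of Hexp.
    print(reported_neb(1.0 / (2.0 * d_naive)))
    print(reported_neb(1.0 / (2.0 * d_corrected)))
```

Because plain 2pq understates the expected heterozygosity in a finite sample, it inflates D and pulls the estimate downward; in this invented case the uncorrected estimate is about 5 and the corrected one is closer to 8.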

The simulation model that we used to evaluate the heterozygote-excess method has three main steps. First, it assigns genotypes to the parental generation using random numbers and a predefined allele frequency distribution. We modeled loci with 2, 3, and 5 alleles and the following three allele frequency distributions: equal (e.g., H = 0.8, for 5 alleles), triangular (e.g., 0.33, 0.30, 0.20, 0.13, 0.07, and H = 0.74; see PUDOVKIN et al. 1996), and rare (e.g., 0.52, 0.2, 0.10, 0.07, 0.04, with H = 0.67). Second, the simulation model randomly picks a male and female parent and one gamete from each. Under the polygamy model, the gametes from males and females are randomly combined such that one male can potentially mate with several females and vice versa. Under monogamy, the same random male is always paired with the same female. For example, when Neb = 4, one random male is paired with one random female, and then a second random male is paired with a second random female. Then one of the two pairs is randomly chosen and gametes are drawn to make an offspring. This is repeated until 15–120 offspring have been generated. Here, the variance in reproductive success among the four parents follows a Poisson distribution, thus Neb = 4. Under polygyny, only one or two males mate with all the females. For example, if one male mates with 99 females, the number of breeders is 100 but the effective number is only ~4 [Ne = 4NmNf/(Nm + Nf), where Nm and Nf are the number of breeding males and females, respectively; CROW and KIMURA 1970]. Third, the program computes N̂eb from a random sample of 15–120 progeny using Equation 1 and Equation 2. These three steps were repeated 500–2000 times per combination of the following parameters: Neb, sample size of individuals and loci, allele number, allele frequency distribution, and mating system.
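The first two simulation steps can be sketched for a single locus as follows. This is a minimal illustration written for this text, not the program used in the study: the function names are hypothetical, parents are drawn independently from the base frequencies, and the sex-ratio formula Ne = 4NmNf/(Nm + Nf) is the standard one attributed above to CROW and KIMURA (1970).

```python
import random


def effective_size(n_males, n_females):
    """Sex-ratio formula Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def draw_parents(count, freqs, rng):
    """Step 1: assign diploid genotypes from the base allele frequencies."""
    alleles = range(len(freqs))
    # random.choices rescales the weights, so they need not sum exactly to 1.
    return [tuple(rng.choices(alleles, weights=freqs, k=2)) for _ in range(count)]


def simulate_progeny(n_males, n_females, freqs, n_offspring, rng, monogamy=False):
    """Step 2: pick a father and a mother and one random gamete from each."""
    if monogamy and n_males != n_females:
        raise ValueError("monogamy requires equal numbers of males and females")
    males = draw_parents(n_males, freqs, rng)
    females = draw_parents(n_females, freqs, rng)
    progeny = []
    for _ in range(n_offspring):
        m = rng.randrange(n_males)
        # Under monogamy male i always mates with female i; otherwise the
        # mother is drawn independently (random union of gametes).
        f = m if monogamy else rng.randrange(n_females)
        progeny.append((rng.choice(males[m]), rng.choice(females[f])))
    return progeny


if __name__ == "__main__":
    rng = random.Random(1)
    triangular = [0.33, 0.30, 0.20, 0.13, 0.07]    # frequencies quoted in the text
    kids = simulate_progeny(2, 2, triangular, 30, rng, monogamy=True)
    print(len(kids), effective_size(1, 99))
```

Calling `simulate_progeny(1, 99, ...)` gives the extreme polygyny case, for which `effective_size(1, 99)` evaluates to 3.96. The third step would pass the resulting genotypes, locus by locus, to a heterozygote-excess calculation.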


*  RESULTS AND DISCUSSION

Bias:
Our simulations suggest that the heterozygote-excess method slightly overestimates the Neb when using finite samples and Nei's unbiased estimator of Hexp. When Neb was 4, 20, and 100, the harmonic mean estimates of Neb (from 500 simulations) were 4.1, 22.2, and 103.4, respectively, when sampling 30 individuals and 10 loci (five alleles/locus) with triangular allele frequency distributions and a polygamous breeding system (Table 1). When more individuals were sampled, the bias was slightly lower. Harmonic mean estimates of Neb were nearly identical for loci with two, three, and five alleles and with widely different allele frequency distributions (Table 1; Figure 1, horizontal bars inside box plots). The largest bias occurred under the polygynous mating system (e.g., N̂eb = 5.0 when true Neb ≅ 4; Figure 1). This bias was not surprising given the assumption of the heterozygote-excess method that there are equal numbers of male and female parents. Still, the bias was small enough not to substantially diminish the usefulness of the heterozygote-excess method.



Figure 1. Harmonic mean N̂eb and the distribution of upper and lower 95% confidence limits on N̂eb computed from 500 independent simulation estimates using 10 loci and samples of 30 individuals. The dotted horizontal line (between boxes) shows the true Neb. The solid horizontal line inside each box shows the harmonic mean N̂eb. The top of each box is the 80th percentile of the distribution of 500 estimates of the upper 95% confidence limit on N̂eb. Each line extending upward from a box is the 95th percentile of the distribution of the upper 95% confidence limit. Bottoms of boxes and lines extending downward from boxes represent the 20th and 5th percentiles, respectively, of the distribution of the lower 95% confidence limit. (*) Infinity is the 95th percentile of the distribution of the upper confidence limit. (**) Both the 95th and 80th percentiles of the distribution of upper confidence limits are infinity. Distributions are compared for a polygamous breeding system (i.e., complete random union of gametes between males and females) and for loci with (i) five alleles at equal frequencies (Even-5), (ii) five alleles at triangular frequencies (Tri-5, as in Table 1), (iii) five alleles including rare alleles (Rare-5, see PRINCIPLE AND METHODS), (iv) three alleles with triangular frequencies (Tri-3), (v) two alleles with triangular frequencies (Tri-2), (vi) Tri-5 allele frequencies for a monogamous breeding system in which males and females are pair-mated, and, finally, (vii) Tri-5 for extreme polygyny where one male breeds all 99 females (and thus Neb ≈ 4).


 
Table 1. Harmonic mean N̂eb and distribution of 95% confidence limits for N̂eb from 500 simulations

The harmonic mean Neb estimates were similar for the monogamous and polygamous mating systems (N̂eb was 4.5 and 4.1, respectively, when Neb was 4.0; Figure 1). This suggests that there is little impact of an interfamily Wahlund effect on the accuracy of Neb estimates. When only approximately two to three large families exist (e.g., under monogamy with Neb equaling 4–6), sampling across families does not generate a large heterozygote deficiency (i.e., Wahlund effect). However, a Wahlund heterozygote deficiency is expected when many families exist (A. PUDOVKIN, personal communication). Such a deficiency would at least partially cancel the heterozygote excess caused by small Neb, and thereby cause biased (high) estimates of Neb. Although monogamy did not cause a large bias, it did substantially increase the variance among Neb estimates (see below and Figure 1).

Precision:
To quantify the precision of the Neb estimates, we used the Student's t-distribution to compute a 95% confidence interval for each N̂eb (as in PUDOVKIN et al. 1996). The confidence interval on N̂eb contained the true Neb in 92–96% of the simulation estimates of Neb when using loci with five alleles (Table 1). As expected, approximately half of the confidence intervals that did not contain the true Neb were too low (L) and half were too high (H). This suggests that the Student's t-distribution is useful for computing confidence intervals, even though Neb is not exactly normally distributed. For loci with three alleles or for monogamous mating systems, the confidence intervals also contained the true Neb ~92–96% of the time. However, when using loci with only two alleles, the confidence intervals were generally too narrow and contained the true Neb in only 83–89% of the simulation estimates of Neb (Table 1). Thus, confidence intervals must be interpreted with caution or computed by alternative methods (e.g., bootstrap resampling) when using loci with only two alleles (e.g., many allozyme loci).
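One plausible way to build such an interval is sketched below. It is an assumption of this sketch, not a detail given in the text, that the interval is formed on the mean of the per-locus D values and then transformed to the Neb scale; the critical values are the standard two-sided 95% points of Student's t for 5, 10, 20, and 30 loci, and a nonpositive D limit is reported as an infinite Neb limit.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by df = loci - 1.
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def neb_from_d(d):
    """Assumed estimator 1/(2D) + 1/(2(D + 1)); no excess means infinity."""
    if d <= 0.0:
        return math.inf
    return 1.0 / (2.0 * d) + 1.0 / (2.0 * (d + 1.0))


def neb_confidence_interval(d_per_locus):
    """Return (lower, point, upper) limits on Neb from per-locus D values."""
    n_loci = len(d_per_locus)
    t_crit = T_CRIT_95[n_loci - 1]            # only 5, 10, 20, 30 loci supported
    mean_d = statistics.mean(d_per_locus)
    std_err = statistics.stdev(d_per_locus) / math.sqrt(n_loci)
    d_low = mean_d - t_crit * std_err
    d_high = mean_d + t_crit * std_err
    # A larger excess implies fewer breeders, so the limits swap places.
    return neb_from_d(d_high), neb_from_d(mean_d), neb_from_d(d_low)


if __name__ == "__main__":
    # Hypothetical D values for five loci.
    print(neb_confidence_interval([0.10, 0.14, 0.12, 0.16, 0.08]))
    print(neb_confidence_interval([0.30, -0.20, 0.05, -0.10, 0.10]))
```

In the second hypothetical data set the lower limit on D falls below zero, so the upper limit on Neb is infinite, which is the uninformative outcome discussed in the following paragraphs.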

Under extreme polygyny (e.g., one male mating with 99 females), the confidence intervals were often too high. For example, when Neb was four, ~25% of the 500 simulated confidence intervals were slightly higher than the true Neb, and none were lower than Neb. Although the confidence intervals were often too high, they were also much narrower under polygyny than under monogamy or polygamy (Figure 1). This narrowness substantially increases the usefulness of the heterozygote-excess method under polygyny. Thus, under extreme polygyny, the heterozygote-excess method will be useful for detecting a small Neb but will be less useful for quantifying the exact size of Neb.

To determine the minimum number of loci and individuals that must be sampled to achieve a high probability of obtaining narrow confidence intervals, we plotted the distribution of the (upper and lower) 95% confidence limits obtained from 500 simulation estimates of Neb. When the true Neb is only 4, at least 10 loci (with five alleles) and 30 individuals must be sampled to achieve an 80% probability of obtaining an upper 95% confidence limit < ~20 (Figure 2B). In other words, the statistical power is 0.80 when testing the null hypothesis that the true Neb ≠ 20 (and when the true Neb is actually only 4). The power will be slightly higher when using a one-tailed test and the null hypothesis that true Neb ≥ 20 (the alternative hypothesis is Neb < 20).



Figure 2. Distributions of 500 (95%) confidence limits on N̂eb computed from 500 independent simulation estimates, as in Figure 1. Dotted horizontal lines represent the true Neb being estimated (4 and 10 for a–c and d–f, respectively). (b and e) "80%" shows the 80th percentile of the distribution of the upper 95% confidence limits. This distribution was generated from 500 independent simulation estimates of Neb. In e, 460 is the 95th percentile of the distribution of the upper 95% confidence limits.

These results show that the heterozygote-excess method is sufficiently powerful for detecting a small Neb when sampling reasonable numbers of individuals and loci with five alleles. Such results are important for conservation biology and the management of captive and natural populations. The precision of N̂eb is often increased more by analyzing a larger number of loci than by sampling more individuals. Doubling the number of loci from 10 to 20 (compare the first box plot in Figure 2B and Figure 2C) generally reduces confidence intervals more than doubling the number of individuals sampled from 15 to 30 (compare the first two box plots in Figure 2B). However, the benefit from doubling the number of loci depends on the number and frequency of alleles (Figure 1).

When true Neb is 10, we must sample >20 polymorphic loci and 60 individuals to have an 80% probability of obtaining confidence intervals that are <50 (and to have a 95% probability of obtaining confidence intervals <100; Figure 2E). When the true Neb is 100, >80% of the confidence intervals include infinity, even when sampling 120 individuals and 20 loci with five alleles (data not shown). Clearly, when Neb is >10, very large samples of loci and individuals are required to achieve a high probability of obtaining reasonably small confidence intervals. Thus, the main limitation of the usefulness of the heterozygote-excess method is its poor precision, i.e., its wide confidence intervals. The confidence intervals are generally too wide for the method to be useful when using diallelic loci, loci with mostly rare alleles, or when studying strictly monogamous species (Figure 1).

When applied to data from natural populations, the heterozygote-excess method often gave estimates of Neb equal to infinity. For example, N̂eb was infinity in 5 of 10 cohorts for which the total number of parents was known (or estimated) to be small (i.e., three to a few dozen). Further, only 2 of the 10 estimates gave 95% confidence intervals that did not include infinity as an upper limit (Table 2). This poor precision is not surprising in that only 5–9 polymorphic loci were analyzed, and only 11–25 progeny were sampled. Additional empirical evaluations are needed, but it is extremely difficult to find large data sets containing individuals produced from a known number of parents.


 
Table 2. Estimates of Neb from empirical data sets containing progeny from a known number of parents

One potential limitation of the method is the requirement for random, representative sampling. For example, if a sample contains only one or few families (due to sampling error) then we could obtain a very low Neb estimate, even though many families actually exist and Neb is large. Another obvious limitation is that the method will work only in species with separate sexes. The method will work for haplo-diploid species (e.g., Hymenopterans), but will require the derivation of equations different from those presented here.

Four approaches may help circumvent the problem of poor precision. First, one can compute 80% confidence intervals (in addition to 95% confidence intervals). This will reduce the likelihood that the upper confidence limit will include infinity and be uninformative. Second, one could explore alternative methods for computing confidence intervals (e.g., nonparametric methods such as bootstrap resampling of loci). Third, one could combine estimates of Neb from several generations or cohorts by computing the harmonic mean of N̂eb over the multiple generations or cohorts. This can reduce the probability of obtaining infinity for N̂eb because, when computing a harmonic mean, the low estimates carry far more weight than high ones (e.g., infinity). Finally, one can potentially combine estimates of Ne obtained from several independent Ne estimators by computing the harmonic mean of the Ne estimates. Other promising Ne estimators include those based on gametic disequilibrium (HILL 1981) and on temporal variance in allele frequencies (KRIMBAS and TSASKAS 1971; WAPLES 1991). These two estimators also suffer from low precision (WAPLES 1989, 1991; LUIKART 1997; LUIKART et al. 1998). However, two or more of the estimators may be independent (WAPLES 1991; PUDOVKIN et al. 1996) and thus could potentially be used simultaneously to achieve a more precise estimate of Ne. More research is needed to evaluate the precision and accuracy that can be achieved by using several Ne estimators simultaneously. Any improvement in our ability to estimate Ne would be significant in light of both the difficulties in assessing Ne and the importance of Ne in population genetics and in conservation biology.
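The effect of the harmonic mean on infinite estimates can be illustrated with a short sketch. The function name and the example cohort estimates are hypothetical; the only assumption is that an infinite estimate contributes a reciprocal of zero.

```python
import math


def harmonic_mean_neb(estimates):
    """Harmonic mean of positive Neb estimates; infinite values count as 1/x = 0."""
    if not estimates or any(x <= 0.0 for x in estimates):
        raise ValueError("estimates must be a non-empty list of positive values")
    reciprocal_sum = sum(0.0 if math.isinf(x) else 1.0 / x for x in estimates)
    if reciprocal_sum == 0.0:                 # every estimate was infinite
        return math.inf
    return len(estimates) / reciprocal_sum


if __name__ == "__main__":
    # Hypothetical estimates from three cohorts, one of them infinite.
    print(harmonic_mean_neb([4.0, math.inf, 10.0]))
```

For these invented values the combined estimate is 3/(1/4 + 0 + 1/10), or about 8.6, so a single infinite cohort estimate no longer makes the overall result infinite. The combined value is infinite only if every cohort estimate is.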


*  ACKNOWLEDGMENTS

We thank I. Till-Bottraud and two anonymous reviewers for helpful comments on earlier versions of this manuscript, M. Schwartz for sharing unpublished simulation data, and especially P. Spruell, F. W. Allendorf, A. Estoup, and M. Brown for providing data sets. Support was provided by the French "Bureau Ressources Génétiques," a postdoctoral fellowship (for G.L.) from National Science Foundation/North Atlantic Treaty Organization, and the Laboratoire de Biologie des Populations d'Altitude.

Manuscript received June 17, 1998; Accepted for publication November 20, 1998.


*  LITERATURE CITED

CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Burgess Publishing, Minneapolis.

ESTOUP, A., F. ROUSSET, Y. MICHALAKIS, J.-M. CORNUET, and M. ADRIAMANGA et al., 1998  Comparative analysis of microsatellite and allozyme markers: a case study investigating microgeographic differentiation in brown trout (Salmo trutta). Mol. Ecol. 7:339-353.

FALCONER, D. S., 1989 Introduction to Quantitative Genetics, Ed. 3. Longman Scientific & Technical with John Wiley & Sons, New York.

FRANKHAM, R., 1995  Effective population size/adult population size ratios in wildlife: a review. Genet. Res. 66:95-106.

HARRIS, R. B. and F. W. ALLENDORF, 1989  Genetically effective population size of large mammals: an assessment of estimators. Conserv. Biol. 3:181-191.

HILL, W. G., 1981  Estimation of linkage disequilibrium in randomly mating populations. Heredity 33:229-239.

KRIMBAS, C. B. and S. TSASKAS, 1971  The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control—Selection or drift? Evolution 25:454-460.

LUIKART, G., 1997 Usefulness of molecular markers for detecting population bottlenecks and monitoring genetic change. Ph.D. Thesis, University of Montana, Missoula, MT.

LUIKART, G., J.-M. CORNUET, and F. W. ALLENDORF, 1998  Temporal changes in allele frequencies provide estimates of population bottleneck size. Conserv. Biol. 89:238-247.

NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.

PUDOVKIN, A. I., D. V. ZAYKIN, and D. HEDGECOCK, 1996  On the potential for estimating the effective number of breeders from heterozygote-excess in progeny. Genetics 144:383-387.

RASMUSSEN, D. I., 1979  Sibling clusters and gene frequencies. Am. Nat. 113:948-951.

ROBERTSON, A., 1965  The interpretation of genotypic ratios in domestic animal populations. Anim. Prod. 7:319-324.

SELANDER, R. K., 1970  Behavior and genetic variation in natural populations. Am. Zool. 10:53-66.

WAPLES, R. S., 1989  A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121:379-391.

WAPLES, R. S., 1991 Genetic methods for estimating the effective size of Cetacean populations, pp. 279–300 in Genetic Ecology of Whales and Dolphins, Special Issue 13, edited by A. R. HOELZEL. International Whale Commission, London.



