Genetics, Vol. 164, 1071-1085, July 2003, Copyright © 2003

Deleterious Mutations and the Genetic Variance of Male Fitness Components in Mimulus guttatus

John K. Kellya
a Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045

Corresponding author: John K. Kelly, University of Kansas, 1200 Sunnyside Ave., Lawrence, KS 66045., jkk{at}ku.edu (E-mail)

Communicating editor: J. B. WALSH


*  ABSTRACT
*TOP
*ABSTRACT
*BIOMETRIC QUANTITIES AND THEIR...
*STUDY SPECIES AND METHODS
*HYPOTHESIS TESTING AND PARAMETER...
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Deleterious mutations are relevant to a broad range of questions in genetics and evolutionary biology. I present an application of the "biometric method" for estimating mutational parameters for male fitness characters of the yellow monkeyflower, Mimulus guttatus. The biometric method rests on two critical assumptions. The first is that experimental inbreeding changes genotype frequencies without changing allele frequencies; i.e., there is no genetic purging during the experiment. I satisfy this condition by employing a breeding design in which the parents are randomly extracted, fully homozygous inbred lines. The second is that all genetic variation is attributable to deleterious mutations maintained in mutation-selection balance. I explicitly test this hypothesis using likelihood ratios. Of the three deleterious mutation models tested, the first two are rejected for all characters. The failure of these models is due to an excess of additive genetic variation relative to the expectation under mutation-selection balance. The third model is not rejected for either of two log-transformed male fitness traits. However, this model imposes only "weak conditions" and is not sufficiently detailed to provide estimates for mutational parameters. The implication is that, if biometric methods are going to yield useful parameter estimates, they will need to consider mutational models more complicated than those typically employed in experimental studies.

The long-debated question of whether or not genetic variation in fitness primarily reflects contributions of low-frequency deleterious alleles maintained by the balance between selection and mutation, or has a substantial contribution from variants maintained at intermediate frequencies by selection, is still unanswered.
CHARLESWORTH and HUGHES 2000 Down


DELETERIOUS mutations have been a focus of study in both genetics and evolutionary biology for most of the history of each field (CROW 1993 Down). MULLER 1950 Down argued that modern medicine would allow deleterious mutations to eventually accumulate to intolerable levels within the human species unless countered by "a rationally directed guidance of reproduction." Muller's claims initiated a prolonged debate with other geneticists, most notably Theodosius Dobzhansky (see CROW 1987 Down). Since the Muller-Dobzhansky dispute, biologists have hypothesized that harmful mutations may be critical to a wide range of ecological and evolutionary phenomena. These include the evolution of sex, mating systems, and mate choice (PAMILO et al. 1987 Down; KONDRASHOV 1988 Down; CHARLESWORTH et al. 1990B Down; UYENOYAMA et al. 1992 Down); the evolution of recombination rates (KONDRASHOV 1984 Down; PALSSON 2002 Down); the persistence of small populations (LANDE 1994 Down; LYNCH et al. 1995 Down; KONDRASHOV 1995 Down); and the maintenance of both molecular and quantitative trait variation (LEWONTIN 1974 Down; TURELLI 1984 Down; CHARLESWORTH et al. 1993 Down). The validity of these hypotheses depends not only on the rate that deleterious mutations occur, but also on the magnitude of their effects, on dominance relations, and on how deleterious alleles at different loci combine to influence the overall fitness of an individual.

The primary mutational parameters are U, h, and s. The genomic deleterious mutation rate, U, equals 2{sum}µi, where µi is the deleterious mutation rate at the ith locus, and the summation is taken over all loci affecting fitness. The selection coefficient, s, is the proportional fitness reduction caused by the mutation when in homozygous form. The dominance coefficient, h, characterizes mutational effects in heterozygotes (see Table 1). A mutation with s = 0.05 and h = 0.2 reduces fitness by 1% in heterozygotes and 5% in homozygotes. As mutational effects are likely to vary among loci (and among different mutations at the same locus), estimates for s and h may actually represent average values for mutational effects. However, because the values of h and s for a given allele are assumed to be constant, this is essentially a model of unconditionally deleterious mutations. Given the range of applications outlined above, accurate estimates for U, h, and s will shed light on a number of unsolved problems in evolutionary biology.


 
View this table:
In this window
In a new window

 
Table 1. The basic parameterization of a single locus in which the deleterious allele (a) has population frequency q

At least three different approaches have been developed to estimate the rate and effects of deleterious mutations: mutation-accumulation experiments, molecular evolution surveys, and biometric analyses. These methods, briefly summarized below, each have their respective strengths and weaknesses. They are applicable to different sorts of data. The procedures also differ in the specific parameters that are estimated and in their underlying assumptions regarding genetics and evolution.

Mutation-accumulation experiments estimate mutational parameters from the observed genetic divergence of initially identical lines (MULLER 1928 Down; BATEMAN 1959 Down; MUKAI et al. 1972 Down). These lines are maintained at small population sizes and diverge due to the random accumulation of new mutations. Estimates for mutational parameters can then be extracted from the rate and overall pattern of divergence through a variety of statistical procedures (KEIGHTLEY 1994 Down; GARCIA-DORADO 1997 Down; LYNCH and WALSH 1998 Down; SHAW et al. 2002 Down, and references therein). Mutation accumulation is the most direct of the estimation methods and these experiments have provided most of our information regarding mutational parameters (reviewed by SIMMONS and CROW 1977 Down; LYNCH and WALSH 1998 Down, pp. 343–348). Unfortunately, mutation accumulations are highly labor intensive. The parameter estimates are subject to large statistical uncertainty. Perhaps most importantly, mutation-accumulation experiments may fail to detect mutations with small effects on fitness (KEIGHTLEY and EYRE-WALKER 1999 Down; LYNCH et al. 1999 Down). As a consequence, U may be substantially underestimated and s substantially overestimated.

The molecular survey method is based on the rates of synonymous and nonsynonymous gene sequence evolution within an evolutionary lineage (KONDRASHOV and CROW 1993 Down; EYRE-WALKER and KEIGHTLEY 1999 Down). If synonymous mutations can be treated as selectively neutral, an estimate for U can be obtained from the deficiency of nonsynonymous substitutions (relative to synonymous substitutions) within the genome as a whole. Molecular surveys measure the time-integrated effects of selection (as opposed to the direct effects of mutations on viability and fertility) and can thus potentially detect very weakly selected mutations. A second advantage is that inferences are based on the actual historical results of natural selection (genetic substitutions). In contrast, the other methods are based on measurements of fitness surrogates (e.g., viability, female fertility, male mating success, etc.) that are usually measured under controlled laboratory conditions. On the negative side, the molecular survey analysis depends on several unrealistic assumptions. In particular, the method assumes that adaptive amino acid substitutions do not occur and that all synonymous changes are strictly neutral; i.e., there is no selection on codon usage (AKASHI 1997 Down). EYRE-WALKER and KEIGHTLEY 1999 Down argue that deviations from these assumptions will lead to underestimates for U. As a consequence, molecular surveys may provide a robust lower bound for this quantity.

Biometric methods use patterns of phenotypic variation in fitness components to infer mutational parameters (MORTON et al. 1956 Down; CHARLESWORTH et al. 1990A Down; DENG and LYNCH 1996 Down). This approach is grounded in population genetic models of mutation-selection balance (e.g., HALDANE 1927 Down). Biometric quantities, such as the amount of genetic variation in fitness and the magnitude of inbreeding depression, can be expressed as functions of U, h, and s. Estimates of inbreeding depression or the genetic variance can thus be used to estimate mutational parameters by inverting the population genetic equations. For example, DENG and LYNCH 1996 Down derive simple estimators for U, h, and s from the means and genetic variances of inbred and outbred individuals within an experimental population.

Biometric methods are generally less labor intensive than mutation-accumulation experiments. More importantly, these experiments can potentially yield estimators that are relatively low in bias for a wide range of mutational parameters (DENG 1998 Down; DENG et al. 2002 Down). However, like the two previous methods, biometric methods are also encumbered with potentially limiting assumptions. Two particularly important assumptions are (1) that all genetic variation in fitness is maintained through mutation-selection balance and (2) that experimental inbreeding changes genotype frequencies without changing allele frequencies. Assumption 1 will break down if balancing selection of some form (e.g., heterozygote advantage, frequency-dependent selection, spatial and/or temporal variation in selection, genotype-by-environment variation, etc.) maintains a substantial fraction of the genetic variance in fitness. If this is so, biometric quantities may deviate substantially from their expected values given U, h, and s. Assumption 2 breaks down if deleterious alleles are eliminated or "purged" from the experimental population so that the frequencies of deleterious mutations differ between inbred and outbred individuals. This will bias parameter estimates.

This article describes a biometric analysis of male fitness measures in Mimulus guttatus, a wildflower commonly known as yellow monkeyflower. My purpose is to address both of the potential pitfalls associated with the biometric method. Mutational parameters are estimated from a breeding experiment that uses randomly extracted, highly inbred lines as parents. This effectively eliminates the problem of purging within the experiment. Allele frequencies should not differ (consistently) between outbred and inbred plants. Second, the experiment is designed so that the number of "empirical comparisons" is greater than the number of unknown parameters. This provides the degrees of freedom necessary to test the adequacy of a model prior to parameter estimation. Following the suggestion of DENG and LYNCH 1996 Down(p. 358), I use likelihood ratios to test the assumption that all genetic variation is maintained through mutation-selection balance.

Biometric methods depend on specific features of the experimental design such as the type and levels of inbreeding, the sorts of experimental crosses, etc. In the following sections, I first describe the experimental design and define the biometric quantities that can be estimated from this design. I then derive the expected values for these quantities under various forms of the deleterious mutation model (DMM). Next, I describe the likelihood-based methods for parameter estimation. A parametric bootstrapping procedure is developed, both to assess the adequacy of the experimental design and to perform hypothesis tests (obtain an appropriate P value for the various likelihood ratios). Finally, the statistical model is applied to phenotypic data from M. guttatus. The characters are flower size and two male fitness measures, pollen production and the pollen size index (PSI). The PSI is strongly correlated with pollen viability in M. guttatus (KELLY et al. 2002 Down).


*  BIOMETRIC QUANTITIES AND THEIR EXPECTED VALUES
*TOP
*ABSTRACT
*BIOMETRIC QUANTITIES AND THEIR...
*STUDY SPECIES AND METHODS
*HYPOTHESIS TESTING AND PARAMETER...
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Consider a breeding design in which the parents are randomly extracted, fully homozygous inbred lines. Such lines can be formed in several different ways. For select species, completely homozygous lines can be formed in a single generation via chromosomal manipulations, gametophytic self-fertilization, or double-haploid formation. Repeated rounds of inbreeding provide a more generally applicable, albeit slower, method of line formation. Successive generations of self-fertilization can yield lines that are almost fully inbred in only five to six generations. The parental lines used in this study were developed by John Willis through six successive generations of self-fertilization (WILLIS 1999A Down, WILLIS 1999B Down).

Regardless of the method, genetic purging is likely to accompany the formation of the inbred lines. Lines that become homozygous for mutations that are lethal or cause sterility in homozygous form will go extinct and eliminate these alleles from the experimental population. In contrast, deleterious mutations with smaller effects should fix randomly within lines (WILLIS 1999A Down, WILLIS 1999B Down). As a consequence, a genetic analysis using these lines as parents is informative only about the nonlethal and nonsterile fraction of mutations. Parameter estimates are thus specific to this class of mutations, termed "detrimentals" by SIMMONS and CROW 1977 Down.

A major advantage of using fully inbred lines as parents in a biometric breeding design is that they are already fully purged. Subsequent inbreeding or outcrossing should result in predictable changes in homozygosity without changes in allele frequency (apart from the random changes due to finite sample sizes). Consider the breeding design described by Fig 1. Each "extended family" is constituted from the progeny of three randomly selected inbred lines (labeled L1, L2, and L3). One line (L1) is assigned as the "sire" and is mated to the other two lines, the "dams." The progeny of these crosses comprise two distinct but related outbred families (the families in Fig 1, bottom). Each line is also self-fertilized to produce three inbred families. The inbred families are unrelated to each other, but each is related to one or both of the outbred families. This same mating scheme is applied to other sets of lines generating a large number of extended families.



View larger version (17K):
In this window
In a new window
Download PPT slide
 
Figure 1. The relationships among individuals within an extended family derived from a particular set of lines (L1, L2, and L3). Individual progeny, denoted by solid circles, are contained within families. Dashed lines denote lines of descent (a double line for selfed progeny). Of the three parental lines, L1 is the sire for the outbred families, while L2 and L3 are the dams.

The measurement of progeny from this design provides data sufficient to estimate the mean phenotype of both inbred and outbred plants as well as a number of (co)variance components. Let ß denote the difference in mean phenotype between inbred and outbred plants. Four different comparisons among relatives can be quantified as covariances (Fig 1). CSS is the covariance among plants within selfed families (selfed siblings). CFS is the covariance among plants within outbred families (full siblings). COS is the covariance between outbred plants and their inbred siblings (related through a single parent). Finally, CHS is the covariance of plants from different outbred families (half-siblings related through the sire line). These C terms are "observational variance components" in the terminology of FALCONER 1989 Down. The observational components can be expressed as a function of the "causal components" VA, CAD, VD, and VDI (see COCKERHAM and WEIR 1984 Down; SHAW et al. 1998 Down; KELLY and ARATHI 2003 Down). I do not use these relationships here. Instead, the observational components are expressed directly as functions of the mutational parameters (seeEquation 1aEquation 1b HREF="#FD1c">Equation 1cEquation 1dEquation 1eEquation 3Equation 4Equation 5 HREF="#FD7">Equation 7 below).

Estimates for ß and the observational components are used to assess the sufficiency of a particular deleterious mutation model to explain genetic variability. If demonstrated as sufficient, the same data can then be used to estimate the mutational parameters of that model. For both purposes, it is necessary to express ß, CSS, CFS, COS, and CHS as functions of the mutational parameters. This requires a model for genotypic effects on both phenotype and fitness. Table 1 summarizes the standard fitness model for an individual locus: the deleterious allele (a) exists at population frequency q. Following CHARLESWORTH and HUGHES 2000 Down, I distinguish the evolutionary fitness of a genotype (determined by h and s) from its contribution to a measurable fitness component. The quantity {alpha} "translates" between phenotype and fitness.

Predicted values for ß, CSS, CFS, COS, and CHS can be obtained by applying quantitative genetic models developed for inbreeding populations (COCKERHAM and WEIR 1984 Down; see also HARRIS 1964 Down; JACQUARD 1974 Down). As these quantities depend on the contributions of many loci, it is necessary to subscript the various quantities in Table 1. As shown in Appendix A,

(1a)


(1b)


(1c)


(1d)


(1e)

where qi, {alpha}i, hi, and si are the allele frequency and effects associated with the ith locus (see also CHARLESWORTH and HUGHES 2000 Down). The summations are taken over all loci harboring deleterious alleles. These equations define the genetic covariances among relatives. Environmental factors may also affect the phenotypic resemblance of relatives. In this treatment, these effects are characterized by other variables (see HYPOTHESIS TESTING AND PARAMETER ESTIMATION).

Equation 1aEquation 1bEquation 1cEquation 1dEquation 1e depend on two important assumptions. The first is that the loci contribute additively to the phenotype. Oftentimes, it is assumed that the effects of deleterious mutations at different loci combine multiplicatively. In this situation, trait values should be log-transformed prior to analysis. As the appropriate scale is not obvious a priori, here I apply the methods to both log-transformed and untransformed data (see RESULTS and DISCUSSION). Second, each component is linear in qi because I have neglected terms of order qi2 (see Appendix A). This approximation should be quite accurate because deleterious alleles will not usually obtain high frequencies in a large population.

The frequency of deleterious alleles at a given locus (q) depends on the mutational parameters. I now assume that the natural population is large, randomly mating, and in mutation-selection balance equilibrium. I further assume that deleterious mutations have some phenotypic effect in heterozygotes (h > 0), an assumption supported by genetic data (SIMMONS and CROW 1977 Down). As shown by HALDANE 1927 Down, the expected frequency qi of a partially recessive mutation is equal to µi/(hisi). Substitution of this expression intoEquation 1aEquation 1bEquation 1cEquation 1dEquation 1e yields predictions for the biometric quantities in terms of mutational parameters.

The final step is to characterize the mutational spectrum, the joint density function of {alpha}i, hi, and si across loci. I consider three alternative models. In model I, mutational effects are fixed and do not vary across loci. In model II, mutational effects vary according to an exponential distribution. Model III considers variable mutational effects without imposing distributional assumptions.

Model I:
All new mutations have fixed effects {alpha}, h, and s. With constant mutational effects,

(2a)


(2b)


(2c)


(2d)


(2e)

where U = 2{sum}µi as defined previously.

The five estimable biometric quantities are functions of only three unknown quantities: U{alpha}, h, and s{alpha}. Because both U and s are always included in a product with {alpha}, neither parameter can be estimated individually (unless a particular value for {alpha} is assumed). Only the composite parameters, U{alpha} and s{alpha}, can be estimated. The fact that the five measurable statistics are a function of only three unknowns allows us to test the sufficiency of model I. The model imposes the constraints that CFS = 2COS2/CSS and CHS = 1/2CFS.

Model II:
New mutations vary in their effects according to a simple probability model. Following DENG and LYNCH 1996 Down, model II posits an exponential density for s,

(3)

where is the average homozygous effect. Since mutations with larger effects tend to be more recessive (MACKAY et al. 1992 Down; CABALLERO and KEIGHTLEY 1994 Down), the dominance coefficient is a function of s,

(4)

where (-1/) < K < 0. DENG and LYNCH 1996 Down suggest a value of K = -13 from a consideration of genetic data. Here, it is a parameter to be estimated from the data.

The parameters of model II are U, K, , and {alpha}. I assume that {alpha}i is the same across loci. The average dominance of new mutations, , is

(5)

Integrating over the distributions of mutational effects (see Appendix B), we obtain predicted values for the biometric quantities:

(6a)


(6b)


(6c)


(6d)


(6e)

Again, our five observable quantities are functions of three unknowns. These three estimable quantities are composite functions of the parameters U{alpha}, K, and {alpha}. Model II constrains the value of CFS,

(7)

where h* = COS/CSS.

Model III (weak conditions):
Model II represents only one way in which mutational effects may vary across loci (or between different mutations at the same locus). A range of alternative models could be considered. For example, CABALLERO and KEIGHTLEY 1994 Down consider a model that allows variable dominance of mutations, even if they have the same value for s. Other generalizations are also possible (ZHANG et al. 2002 Down and references therein). Given the range of possibilities, it is fortunate that we can make a few statements about the relative magnitudes of the biometric quantities without an explicit model of mutational effects. As long as deleterious alleles are rare and hi < 0.5 at all loci, thenEquation 1aEquation 1b HREF="#FD1c">Equation 1cEquation 1dEquation 1e imply that COS < 1/2CSS, CFS < COS, and CHS = 1/2CFS. Model III is not sufficiently detailed to allow parameter estimation. However, a model with these constraints can be tested against the unconstrained model. The conditions associated with model III are "weak" because they may often hold even if a substantial portion of the variance in fitness is maintained by balancing selection.


*  STUDY SPECIES AND METHODS
*TOP
*ABSTRACT
*BIOMETRIC QUANTITIES AND THEIR...
*STUDY SPECIES AND METHODS
*HYPOTHESIS TESTING AND PARAMETER...
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

M. guttatus (2n = 28; Scrophulariaceae) is a small wildflower, increasingly used as a model organism for genetic studies of floral variation and the evolution of plant mating systems (MACNAIR and CUMBES 1989 Down; RITLAND 1989 Down; CARR and FENSTER 1994 Down; FENSTER and RITLAND 1994 Down; ROBERTSON et al. 1994 Down). It occurs throughout western North America and local populations may be either annual or perennial. The plants of this study are derived from Iron Mountain, which is a large, annual (or winter annual) population located in the Cascade Mountains of central Oregon (see WILLIS 1996 Down, WILLIS 1999A Down, WILLIS 1999B Down). M. guttatus is a self-compatible, but the Iron Mountain population is predominantly outcrossing (WILLIS 1993B Down).

John H. Willis initiated ~1200 independent lines of M. guttatus in August of 1995. Each line was founded from the seed set of a separate field-collected plant from Iron Mountain. Each line was subsequently maintained in the greenhouse by single-seed descent (self-fertilization) for six generations. Six successive generations of self-fertilization yield an inbreeding coefficient of >0.98 and I treat the parents as fully inbred (f = 1) for the purpose of calculating covariances among relatives. As expected, these lines are almost completely homozygous at highly polymorphic microsatellite loci with different lines fixed for different alleles (WILLIS 1999A Down).

Three hundred of these sixth-generation lines were used to found 100 extended families following the experimental design of Fig 1. A single plant from the sire line was grown to maturity, self-fertilized, and used as a pollen source for a single plant from each of the dam lines. Each of the dams was also self-fertilized. A maximum of eight pots were seeded per family (40 per extended family) on March 9, 2001 in the University of Kansas greenhouses. At 2 weeks after seeding, all pots were thinned to a single plant (randomly selected) and were subsequently fertilized once per week. A series of morphological measurements were taken on the first flower produced by each plant (see KELLY and ARATHI 2003 Down) and the anthers were collected into a microcentrifuge tube. Flowering was monitored daily until 60 days postseeding, by which time the great majority of plants had flowered. In total, 2345 plants were measured (1113 outbred and 1232 inbred). The distribution of plants across families was not balanced and many extended families were completely missing one or more of their component families. This is relevant to the analysis. Maximum likelihood accommodates unbalanced designs better than least-squares or "ANOVA-style" methods do (SEARLE et al. 1992 Down).

Pollen number and the PSI were estimated for each sample using a Coulter counter model Z1 dual (Coulter, Miami). The microcentrifuge tubes were left open for 1 week after pollen collection to allow the anthers to dry and release their pollen. Each sample was then diluted into 20 ml of electrolyte solution and run through the Coulter counter. The machine counts the number of particles between two size thresholds (10 and 25 µM) and the number of particles that are larger than the upper threshold (>25 µM). The PSI is the proportion of grains in the upper size category. The size thresholds were set on the basis of results from a previous study showing a strong positive correlation between PSI and pollen viability as measured through standard staining techniques (KELLY et al. 2002 Down).

The distributions of pollen number and PSI within both inbred and outbred plants are given in Fig 2. A genetic analysis of the floral morphology is presented elsewhere (KELLY and ARATHI 2003 Down). One floral trait, corolla width, is included here as a comparison to the male fitness measures (outbred plants, mean = 19.13 mm, SD = 2.65; inbred plants, mean = 17.32, SD = 2.95). For analysis, the original measurements were divided by the outbred mean value for that trait. This linear transformation is commonly used in biometric studies as it allows comparison of parameter estimates for traits that differ in the magnitudes of their measurement values (HOULE 1992 Down; CHARLESWORTH and HUGHES 2000 Down). We also considered log-transformed values for pollen number and PSI, as this may be more appropriate if loci combine multiplicatively.






View larger version (123K):
In this window
In a new window
Download PPT slide
 
Figure 2. The distributions of male fitness components (untransformed) among outbred and inbred plants, respectively.


*  HYPOTHESIS TESTING AND PARAMETER ESTIMATION
*TOP
*ABSTRACT
*BIOMETRIC QUANTITIES AND THEIR...
*STUDY SPECIES AND METHODS
*HYPOTHESIS TESTING AND PARAMETER...
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Maximum likelihood is used for both testing the various models and estimating parameters. By methods described below, I find the parameter values for each of the mutational models (I, II, and III) that produce the highest likelihood for the data. These values are the maximum-likelihood estimates (MLEs) for the parameters of that model. The sufficiency of that model is tested by comparing the maximum-likelihood value for the model with the maximum likelihood of the "unconstrained model." The latter is based on direct estimates for the biometric quantities without the particular restrictions imposed by the various mutational models (see section on constraints below).

The resemblance of relatives may depend on environmental factors as well as genetic factors. These factors must be accounted for in order to infer genetic parameters from phenotypic covariances. Maternal effects are a potentially important source of resemblance among relatives. The variance of maternal effects, VM, can be estimated from this design because the comparison between inbred and outbred sibs (estimating COS) involves both male and female parents (Fig 1). Also, the magnitude of environmental deviations may be greater for inbred individuals than for outbred individuals (LERNER 1954 Down; KELLY and ARATHI 2003 Down). I thus estimate different environmental variances for outbred and inbred plants, VE(o) and VE(i).

With regard to hypothesis testing, it is important to distinguish universal constraints from model-specific constraints. The universal constraints apply to all of the models including the unconstrained model. These follow from basic features of inheritance, independent of allele frequencies or the nature of genetic effects. The universal constraints are that VE(o) and VE(i) must be positive; VM, CSS, CFS, COS, and CHS must be greater than or equal to zero; CHS <= 1/2CFS; and the magnitude of COS is limited by CSS and CFS (KELLY and ARATHI 2003 Down).

Subject to these universal constraints, MLEs were obtained for the unconstrained model by finding the set of values for MO, ß, CSS, CFS, COS, CHS, VM, VE(o), and VE(i) that maximize l, the log-likelihood function. Here, MO is the mean phenotype of outbred plants. The predicted mean phenotype of inbred plants is MO + ß. I use the standard multivariate normal model for the log-likelihood function,

(8)

where V is the variance-covariance matrix of individual measurements, z is the vector of trait values, X is an "incidence matrix" for fixed effects, {eta} is the vector of fixed effects, and C is a constant determined by the total sample size (SHAW 1987 Down; SEARLE et al. 1992 Down). The (co)variance parameters determine the numerical values of elements in V and {eta} = [MO, ß]T.Equation 8 assumes a multivariate normal distribution of trait values, but is quite robust to deviations from this distribution (SEARLE et al. 1992 Down; J. K. KELLY, unpublished simulation results).

The maximum of l was determined iteratively by using the first derivatives of l with regard to each parameter and the asymptotic dispersion matrix. Following SEARLE et al. 1992 Down, the vector of derivatives of l with regard to each of the fixed effects is

(9)

The derivative of l with regard to a particular variance component {theta} is

(10)

where ({partial}/{partial}{theta})(V) is the derivative of the variance-covariance matrix with regard to {theta} and tr[*] is the trace of a matrix.

Optimization was performed using two methods simultaneously, steepest ascent and scoring (SEARLE et al. 1992 Down, Chap. 6; ELIASON 1993 Down). By steepest ascent, the direction of movement within the parameter space is simply the vector of first derivatives. The direction for scoring is the product of the dispersion matrix and the derivatives vector. A log-likelihood value was calculated at 10 points along each directional vector usingEquation 8. The highest of these 20 likelihood values was chosen to determine the set of parameters for the next iteration. In most cases, the optimization routine used the scoring vector in the first few iterations and the steepest ascent vector close to the optimum. These calculations, as well as the simulations described below, were performed by a series of computer programs similar to those described in KELLY and ARATHI 2003 Down. These programs, written in the C programming language, are available from the author upon request.

The maximum log-likelihoods for each mutation model were also obtained usingEquation 8, although with different values in the variance-covariance matrix, V. The V matrix is a function of four genetic parameters in the unconstrained model (CSS, CFS, COS, and CHS), in addition to the environmental components. For model I, V contains only two genetic parameters, CSS and h, in addition to the same set of environmental components. The various comparisons between relatives within V are functions of these two parameters: COS = hCSS, CFS = 2h2CSS, and CHS = h2CSS. For model II, V also depends on only two genetic parameters: COS = h*CSS, CFS is given byEquation 7, and CHS = CFS/2. For model III, the elements of V are functions of three genetic parameters (CSS, CFS, and COS) with the genetic covariance of half-sibs equated to CFS/2. However, two of these parameters, CFS and COS, are constrained in value: COS < CSS/2 and CFS < COS. These model-specific constraints affect not only the numerical values within V, but also the derivatives of V with regard to each parameter [the ({partial}/{partial}{theta})(V) matrices ofEquation 10].

Models I, II, and III were tested by comparing the maximum likelihood obtained under that hypothesis (l0) to the maximum likelihood obtained from the unconstrained model (l1). The model was rejected if the likelihood-ratio statistic 2(l1 - l0) was greater than the appropriate critical value. It is generally assumed that, under the null hypothesis, 2(l1 - l0) follows a chi-square distribution with the number of degrees of freedom equal to the number of parameters distinguishing the two models. This would suggest a critical value 5.99 (for P < 0.05) for models I and II because the unconstrained model has two more parameters than either deleterious mutation model. There is no obvious choice for model III because only one parameter is eliminated (CHS = 1/2CFS) while other parameters are constrained in value (COS < 1/2CSS, CFS < COS). These inequalities do not translate in a simple way into degrees of freedom. Even the critical values for models I and II may be suspect because the universal constraints apply to the unconstrained model (see SHAW et al. 1998 Down). For these reasons, I employ a parametric bootstrapping routine to evaluate the significance of likelihood-ratio tests.

The parametric bootstrap involves repeated simulation of the data using the parameter estimates from the relevant mutational model (I, II, or III). Genotypic values for the parental lines are first sampled (from a normal distribution with variance CSS) and the progeny groups are then formed conditional on these parental values (and the values of the various parameter estimates). The numbers of individuals in each family are equivalent to the actual sample sizes from the experiment. Once the simulated data set is created, l1 and l0 are determined by applying the maximum-likelihood programs for the unconstrained and deleterious mutation models, respectively. I simulated 1000 data sets per test and the various models were fitted to each data set. The distribution of values for 2(l1 - l0), across simulated data sets, estimates the sampling distribution of the likelihood-ratio statistic. The P value associated with a particular test is (M + 1)/(R + 1), where M is the number of simulated values that exceed the actual likelihood-ratio value (calculated from the original data) and R is the number of simulations (DAVISON and HINKLEY 1997 Down, p. 148).

In addition to their use in hypothesis testing, the parametric bootstrapping programs can be used to estimate the joint sampling distribution of our ML estimators for any mutational model in this experimental design. These simulations thus provide a means to evaluate properties such as bias and sampling variance. The distribution of simulated data sets demonstrates the range of possible experimental outcomes, including values for the likelihood-ratio statistic, when the null hypothesis is correct (when genetic variation conforms to the mutational model under consideration). I conducted simulation studies of both models I and II for a range of parameter sets to investigate (1) the null distribution of the likelihood-ratio statistic and (2) the statistical properties of the estimators. Each simulated data set was equivalent in terms of sample sizes to the experimental study.


*  RESULTS
*TOP
*ABSTRACT
*BIOMETRIC QUANTITIES AND THEIR...
*STUDY SPECIES AND METHODS
*HYPOTHESIS TESTING AND PARAMETER...
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Simulation study:
Table 2 summarizes a representative set of results from the simulation study. Cases 1–5 are simulations from models I, while cases 6–10 are from model II. Each simulated data set was first fit to the appropriate mutational model and then to the unconstrained model. The second column of Table 2 gives an empirically determined "critical value" for that set of parameters. Ninety-five percent (950 of 1000) of the simulations yielded likelihood-ratio statistics that were less than this value. For model I, the empirical critical values differed among parameter sets but were always less than the chi-square value of 5.99. For model II, there was also variation among parameter sets, but empirical critical values were somewhat higher than those for model I.


 
View this table:
In this window
In a new window

 
Table 2. Simulation results for selected parameter values using the parametric bootstrapping programs

The maximum-likelihood parameter estimates for both models are approximately median unbiased (Table 2). They exhibit relatively low sampling variances. The procedures do occasionally fail to yield estimates when certain variance components approach boundary values. However, this occurred in only a small fraction of simulations (not at all for most parameter combinations) and these cases are excluded from Table 2.

Mimulus study:
Parameter estimates with standard errors (SEs) are given in Table 3 for each trait under the unconstrained model. The SEs are derived from the asymptotic dispersion matrix. These are only approximate and significantly nonzero estimates are routinely within 2 SE of zero (SHAW et al. 1998 Down; KELLY and ARATHI 2003 Down). The inbred genetic variance (CSS) is greater than the outbred variance (CFS) for each trait, although the magnitude of the difference varies greatly. For each trait, CFS > COS. This implies that the point estimates for these components violate even the weak conditions (model III). Finally, maternal effects are evident for flower size but have no apparent effect on the male fitness traits. The VM parameter is thus eliminated from the model for subsequent analyses of male fitness traits.


 
View this table:
In this window
In a new window

 
Table 3. Maximum-likelihood estimates for each trait for the unconstrained model

Maximum-likelihood values for each trait under model I are far lower than their associated values under the unconstrained model (Table 4). Each likelihood-ratio test is highly significant by both the parametric bootstrap and {chi}2. A P value <0.001 for the parametric bootstrap implies that the observed likelihood-ratio (LR) statistic was greater than any of the 1000 simulated values. The MLEs for each trait are also given. Confidence intervals could be established for these estimates by resampling (bootstrapping) over extended families. I do not bother to justify the parameter estimates here because the model on which they are based is rejected.


 
View this table:
In this window
In a new window

 
Table 4. Likelihood-ratio tests and parameter estimates for model I (fixed mutational effects) applied to each trait

There is generally better fit of the data to model II, although the model is still rejected for all traits (Table 5). As expected, the likelihood ratios are much lower for model III (Table 6). In fact, genetic variation in both Ln(pollen number) and Ln(PSI) are statistically consistent with the weak conditions. For all of the models, the likelihood ratios are lower for Ln-transformed male fitness values than for untransformed trait values.


 
View this table:
In this window
In a new window

 
Table 5. Likelihood-ratio tests and parameter estimates for model II applied to each trait


 
View this table:
In this window
In a new window

 
Table 6. Likelihood-ratio values and their associated P values for model III applied to each trait


*  DISCUSSION
*TOP
*ABSTRACT
*BIOMETRIC QUANTITIES AND THEIR...
*STUDY SPECIES AND METHODS
*HYPOTHESIS TESTING AND PARAMETER...
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

This study clearly illustrates both the strengths and the faults of biometric estimates for mutational parameters. On the positive side, I was able to calibrate several different mutational models with an experiment of only moderate size (2345 plants). These models are surprisingly general, allowing variable mutational effects, maternal effects, and distinct environmental deviations for inbred and outbred plants. The parametric bootstrap simulations demonstrate that, if genetic variation is maintained according to one of the mutational models, the MLEs are surprisingly accurate (Table 2). Taken together, these observations suggest that biometric methods have the potential to yield accurate estimates for mutational parameters with much lower effort than is typically necessary for mutation-accumulation experiments.

The primary weakness associated with our parameter estimates is that they are based on models insufficient to explain the data. Likelihood-ratio tests reject models I and II for each of the traits (Table 4 and Table 5). This is not surprising for flower size, as a previous study demonstrated that this trait is inconsistent with a deleterious mutation model of genetic variation (KELLY and WILLIS 2001 Down). However, it is unexpected that models I and II were rejected for the male fitness traits. Evolutionary analyses of life-history (fitness) traits frequently assume that genetic variation is maintained by mutation-selection balance (CROW 1993 Down; HOULE et al. 1996 Down; DENG and LYNCH 1997 Down). Our results call this assumption into question.

The rejection of mutation-selection balance for male fitness traits is clearly tentative. Model III was not rejected for either Ln(pollen number) or Ln(PSI). This may reflect the fact that these "weak conditions" are consistent with a range of genetic architectures, at least within the scope of estimation error. However, it may also mean that variation is governed by a deleterious mutation model that is simply more complicated than models I or II (e.g., CABALLERO and KEIGHTLEY 1994 Down; ZHANG et al. 2002 Down). While our experiment is not sufficiently complex to both calibrate and test a more complicated mutational model, it is straightforward to expand the design to include more comparisons.

Models I and II fail because there is too much additive genetic variation in both flower size and male fitness traits. The genetic covariance of full siblings (CFS) is equal to the additive variance when the parents are fully inbred. The genetic covariance of half siblings (CHS) is equal to one-half the additive variance (Appendix A). Under the deleterious mutation models, the amount of additive variation relative to other components depends on the average dominance of deleterious mutations. For example, with h = 0.2 in model I, the variance among fully homozygous genotypes (CSS) should be 12.5 times greater than the variance among outbred genotypes (CFS) and COS should be 2.5 times greater than CFS.

The CSS estimates are substantially greater than CFS estimates, particularly for log-transformed male fitness traits. However, COS is invariably less than CFS when the unconstrained model is fit to the data (Table 3). The low estimates of COS relative to CSS indicate that, if variation is attributable to rare alleles, these alleles must be fairly recessive on average. However, the more recessive they are, the less additive variation there should be. Estimating h just from COS and CSS in Table 3, we would expect the additive variances of the male fitness traits (as measured by CFS and CHS) to be only ~20% of their actual estimated values. An excess of additive genetic variation has also been documented for several life-history traits of Drosophila melanogaster and used to reject mutation-selection balance as a sufficient explanation of variation (MUKAI and NAGANO 1983 Down; TAKANO et al. 1987 Down; CHARLESWORTH and HUGHES 2000 Down).

The higher-than-expected estimates for CFS and CHS also explain why the estimated dominance coefficients for male fitness traits under models I and II (Table 4 and Table 5) are greater than those obtained previously. WILLIS 1999A Down used a subset of these same inbred lines to estimate the average dominance of deleterious mutations by the "regression method" (MUKAI et al. 1972 Down; MUKAI and YAMAGUCHI 1974 Down; CABALLERO et al. 1997 Down). He randomly paired lines and mated them to produce F1 families. The regression of F1 family mean trait values onto the corresponding sum of mean values for their parental lines provides an estimate for the average dominance of deleterious alleles. By this method, WILLIS 1999A Down estimated the average dominance to be ~0.1 for both pollen number and pollen viability (both measurements log-transformed). In contrast, our estimates of h (from model I) are about twice are large (Table 4) and the estimates of (from model II) are 3.5 times greater (Table 5).

The comparison of F1 families with parental lines is actually contained within this design (Fig 1) and it is straightforward to show that the regression method is actually estimating COS/CSS. If we calculate COS/CSS from the estimates in Table 3, the values are quite close to those obtained by WILLIS 1999A Down: 0.13 for Ln(pollen) and 0.14 for Ln(PSI). In contrast, the maximum-likelihood estimates for h and (Table 4 and Table 5) depend not only on COS and CSS but also on CFS and CHS. Maximum likelihood finds the best "compromise" given the relative magnitudes of all four variance components. This requires dominance coefficients to be adjusted upward to account for the higher-than-expected values for CFS and CHS. If intermediate frequency alleles make a substantial contribution to variation in fitness, the regression method (COS/CSS) may actually provide a more accurate estimate for the dominance of rare alleles than the procedure used here (J. K. KELLY, unpublished results).

Assumptions:
Models I and II employ an equilibrium model to predict the frequencies of deleterious alleles as a function of mutational parameters. These frequencies are determined by the action of selection in nature (if the model is accurate), but the effects are estimated from variation measured in the greenhouse. This is noteworthy given that both the relationship between trait values and evolutionary fitness and the relationship between genotype and phenotype are likely to be different in the greenhouse than in the wild. Character heritabilities and genetic correlations may change when organisms are assayed in a novel environment (SERVICE and ROSE 1985 Down). In addition, the magnitude of inbreeding depression may be substantially greater under natural conditions than in the relatively benevolent greenhouse environment (e.g., DUDASH 1990 Down; CARR and EUBANKS 2002 Down). This may lead to biased estimates of mutational parameters. This difficulty is abrogated, to some extent, by the inclusion of {alpha} in the model (Table 1). This parameter effectively translates between the measured phenotype and evolutionary fitness.

The important question for this study is whether the change in growth environment is responsible for the poor fit of models I and II to the data. Could the transfer cause rejection of the mutational models even if the natural population is in mutation-selection equilibrium? It is difficult to provide a definitive answer to this question, but it seems unlikely given that the hypothesis tests do not impose stringent constraints on the values of mutational parameters. Instead, parameters such as h and s of model I are estimated from the data. Thus, we should not falsely reject the null model, even if mutational effects are different in the lab.

The allele frequencies at QTL are the key to the constraints. A false rejection of the mutational models might occur if alleles that are deleterious in nature become advantageous in the lab environment. Such alleles might increase in frequency and the resulting lab-adapted population would exhibit an excess of additive genetic variation relative to the field population. However, the opportunity for this kind of selection was greatly limited in this study. The inbred lines that constitute the parents of the breeding design were generated rapidly by self-fertilization, with random selection of progeny within families (WILLIS 1999A Down, WILLIS 1999B Down). Each line was initiated from a single family (see STUDY SPECIES AND METHODS). A rare allele, present within a single family, might fix within the line founded by that family. However, it could not spread within the laboratory population as a whole because each family contributed to one and only one line. Adaptation to lab conditions might be a more serious concern for randomly mating populations that propagate themselves for a number of generations prior to estimation of genetic parameters (see MATOS et al. 2000 Down; HOFFMANN et al. 2001 Down; LINNEN et al. 2001 Down; SGRO and PARTRIDGE 2001 Down).

An alternative approach would be to let the natural population adapt to the lab environment prior to biometric analyses. This has the advantage that population allele frequencies will settle to values dictated by the selective conditions of the lab environment (the relevant values of U, h, and s if variation in fitness is maintained in mutation-selection balance). Of course, any parameter estimates would be specific to the lab environment. Moreover, a satisfactory fit of the data to a mutational model would not conclusively indicate that natural variation is maintained by mutation-selection balance. Variation maintained by variable selection, in either space or time, or genotype-by-environment interaction, would likely be lost in the process of lab adaptation.

A second potential concern with this study is that the deleterious mutation models assume random mating in the ancestral population. While the Iron Mountain population of M. guttatus is primarily outcrossing (WILLIS 1993B Down), some self-fertilization does occur. Inbreeding should reduce the frequency of deleterious mutations relative to the random-mating expectation, µ/(hs). This would reduce the absolute value of the variance components but not their relative values (to a first approximation when deleterious alleles are rare). Because the relative values of variance components determine the sufficiency of the models in the likelihood-ratio tests, inbreeding is not the cause of failures for models I and II. In this vein, it is noteworthy that M. guttatus does exhibit large amounts of inbreeding depression in fitness-related traits despite its mating system (WILLIS 1993A Down, WILLIS 1993B Down; DUDASH and CARR 1998 Down). Moreover, most of this inbreeding depression was not purged during the formation of the lines used for this study (WILLIS 1999B Down).

Scale of measurement is another important concern for analyses based on patterns of variation. Scale transformations, such as the logarithm or square root of measurements, are routinely used in quantitative genetics to "stabilize" the variance (WRIGHT 1952 Down; LYNCH and WALSH 1998 Down, Chap. 11). In the present study, log-transformation substantially improved the fit of male fitness traits to each of the deleterious mutation models (Table 4 Table 5 Table 6). This may mean that loci affecting these traits combine in a way that is more nearly multiplicative than additive. However, it could also mean that log-transformation simply obscures deviations from the models.

The most appropriate scale for the application of the present methods is the one in which loci contribute in the most nearly additive way to trait variation. On this scale, the mean phenotype should decline linearly with the inbreeding coefficient (WRIGHT 1951 Down; KEMPTHORNE 1957 Down). Several studies have attempted to determine the functional relationship between mean trait values and the level of inbreeding in M. guttatus (WILLIS 1993A Down; CARR and DUDASH 1997 Down). However, in each of these studies, the effects of changing homozygosity are confounded with the effects of genetic purging. As a consequence, it is difficult to determine the best scale of measurement for estimating mutational parameters. More data in this area are clearly necessary.

Concluding comments:
More than anything, this study illustrates the need to combine model evaluation (hypothesis testing) with parameter estimation in biometric studies. Three distinct deleterious mutation models were tested. The first two models were rejected by the data for all characters, primarily because of an excess of additive genetic variation relative to other variance components. Thus, while most of the estimates in Table 4 and Table 5 are reasonable, they cannot be treated as unbiased. However, model III was not rejected for either of the log-transformed male fitness components (Table 6). This suggests that a more elaborate deleterious mutation model may prove sufficient.

How generally will mutation-selection balance prove sufficient as an explanation for variation in fitness components? This is clearly the issue that will confront future applications of the biometric method for estimating mutational parameters. The ubiquity of genetic variation is one of our most general observations of natural populations. The question of how evolutionary processes maintain this variation is one of the oldest in evolutionary biology. It remains one of the most important.


*  ACKNOWLEDGMENTS

This article benefited from comments by Norm Slade, Scott Williamson, Miles Stearns, Liza Holeski, Tara Marriage, Bruce Walsh, and two anonymous reviewers. Tyler Ternes, Scott Jablonski, and H. S. Arathi helped to collect many of the measurements. This research was supported by funds from National Science Foundation grant DEB-9903758 and the Murphy Scholarship fund.

Manuscript received November 11, 2002; Accepted for publication March 10, 2003.


*  APPENDIX A
*TOP