Genetics, Vol. 164, 1129-1138, July 2003, Copyright © 2003

Stochastic Search Variable Selection for Identifying Multiple Quantitative Trait Loci

Nengjun Yi (a, b), Varghese George (a, b), and David B. Allison (a, b, c)
a Department of Biostatistics, University of Alabama, Birmingham, Alabama 35294-0022
b Section on Statistical Genetics, University of Alabama, Birmingham, Alabama 35294-0022
c Clinical Nutrition Research Center, University of Alabama, Birmingham, Alabama 35294-0022

Corresponding author: Nengjun Yi, Ryals Public Health Bldg., 1665 University Blvd., University of Alabama, Birmingham, AL 35294-0022. E-mail: nyi@ms.soph.uab.edu

Communicating editor: J. B. WALSH


*  ABSTRACT

In this article, we use stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure embeds multiple regression in a hierarchical normal mixture model, in which latent indicators for all markers are used to identify the multiple markers. The markers with significant effects can be identified as those with a higher posterior probability of being included in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns, including the latent indicators, the genetic effects for all markers, and other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under typical situations of most QTL studies in terms of number of markers and marker density.


MOST complex traits important to evolution, animal and plant breeding, and medical genetics are influenced by the segregation of multiple genes [quantitative trait loci (QTL)] and environmental factors. There is strong interest in inferring the number, genomic locations, and genetic effects of QTL. Currently, the most widely used methods are interval-mapping methods (LANDER and BOTSTEIN 1989; HALEY and KNOTT 1992). These methods are developed on the basis of a single-QTL model and detect QTL effects at different genomic locations separately. Although these methods have been successfully applied to detect QTL for a number of traits in a number of organisms, they may result in biased estimates for QTL locations and effects when the traits are actually controlled by multiple, especially linked, QTL (e.g., HALEY and KNOTT 1992).

For complex traits governed by multiple QTL, it is necessary to take the whole genome into account for estimating the number, locations, and genetic effects of QTL. It has been shown both theoretically and empirically that multiple-QTL methods can improve power in detecting QTL and eliminate biases in estimates of QTL locations and genetic effects that can be introduced by using a single-QTL model (e.g., HALEY and KNOTT 1992). Composite interval mapping provides a relatively simple and systematic procedure to map multiple QTL (JANSEN and STAM 1994; ZENG 1994). This method detects and estimates each individual QTL by conditioning the test on other selected markers to absorb effects of other QTL. In the last decade, several statistical methods have been developed to detect multiple QTL and estimate their locations and effects simultaneously, including the multiple-interval mapping approach (KAO et al. 1999; ZENG et al. 2000), variable selection methods (BALL 2001; PIEPHO and GAUCH 2001; BROMAN and SPEED 2002), and Bayesian methodology with the reversible-jump Markov chain Monte Carlo algorithm (SATAGOPAN and YANDELL 1996; SATAGOPAN et al. 1996; HEATH 1997; SILLANPAA and ARJAS 1998; STEPHENS and FISCH 1998; XU and YI 2000; YI and XU 2000, 2001; HOESCHELE 2001). These methods treat mapping multiple QTL as a problem of model determination and variable selection (SILLANPAA and CORANDER 2002).

In this study, we propose an alternative Bayesian method for identifying multiple quantitative trait loci in experimental designs. Our method is based on a variable selection method, called stochastic search variable selection (SSVS), developed by GEORGE and MCCULLOCH (1993). SSVS was originally introduced for linear regression models and has been adapted for more complex models such as generalized linear models (GEORGE and MCCULLOCH 1997), log-linear models (NTZOUFRAS et al. 1997), and multivariate regression models (BROWN et al. 1998). The difference between SSVS and other variable selection approaches is that the dimensionality is kept constant across all possible models by confining the posterior distribution of nonsignificant terms to a small neighborhood of zero instead of removing them from the model, as is usually done. Due to this property, SSVS (1) can be easily implemented via the Gibbs sampler, (2) evaluates the effect of each variable on the dependent response, and (3) provides the posterior probability that each variable should be included in the model.

In most QTL studies, a large number of markers are available across the genome, and these markers are usually closely related. It has been hypothesized that the genetic variation of most quantitative traits is controlled by a few loci with large effects and a large number of loci with small effects (e.g., LYNCH and WALSH 1998). Therefore, only a small number of markers are expected to have large effects on the trait because they are linked to large-effect loci, and most of the markers have nonsignificant effects. Our method considers all markers simultaneously and evaluates not only the marker effects across the entire genome, but also the posterior probability that each marker has a significant effect.


*  METHODS

Linear model:
We describe the method primarily for a mapping population with only two segregating genotypes, e.g., a backcross, double-haploid lines (DHLs), or recombinant inbred lines (RILs). Assume that we observe K markers along the genome. Among the K markers, some may be tightly linked to genes with large effects and therefore have large effects, and others may have only weak effects. Our aim here is to identify which markers are tightly linked to genes with large effects and to estimate the magnitude of their effects. For a continuously distributed trait, the observed phenotypic value of individual i, $y_i$, can be described by the linear model,

$$y_i = \mu + \sum_{j=1}^{K} x_{ij}\alpha_j + e_i, \tag{1}$$

where $\mu$ is the population mean, $x_{ij}$ denotes the genotype of marker j for individual i and is coded as 0.5 or -0.5 for the two genotypes in the mapping population, $\alpha_j$ is the effect size associated with marker j, and $e_i$ is the residual error, assumed to follow $N(0, \sigma_e^2)$.

In practice, some marker data may be missing. Two methods can be used to deal with missing marker data. The first method is to replace the missing genotype $x_{ij}$ by its conditional expectation $E(x_{ij} \mid M_i) = 0.5\,p(x_{ij} = 0.5 \mid M_i) - 0.5\,p(x_{ij} = -0.5 \mid M_i)$, where $M_i$ is the observed marker data for individual i, and $p(x_{ij} = 0.5 \mid M_i)$ and $p(x_{ij} = -0.5 \mid M_i)$ are the conditional probabilities that marker j for individual i takes the two genotypes, respectively, and can be calculated using the multipoint method (JIANG and ZENG 1997). The second method is to impute the missing marker genotypes by sampling from the corresponding full conditional probability distribution. If $x_{ij}$ is missing, the full conditional distribution can be derived as

$$p(x_{ij} = 0.5 \mid y_i, x_{i(-j)}, \mu, \alpha, \sigma_e^2) = \frac{p(x_{ij} = 0.5 \mid x_{i(j-1)}, x_{i(j+1)})\; p(y_i \mid x_{ij} = 0.5, x_{i(-j)}, \mu, \alpha, \sigma_e^2)}{\sum_{g \in \{0.5,\,-0.5\}} p(x_{ij} = g \mid x_{i(j-1)}, x_{i(j+1)})\; p(y_i \mid x_{ij} = g, x_{i(-j)}, \mu, \alpha, \sigma_e^2)}$$

and

$$p(x_{ij} = -0.5 \mid y_i, x_{i(-j)}, \mu, \alpha, \sigma_e^2) = 1 - p(x_{ij} = 0.5 \mid y_i, x_{i(-j)}, \mu, \alpha, \sigma_e^2), \tag{2}$$

where $\alpha = (\alpha_1, \ldots, \alpha_K)$, $x_{i(-j)} = (x_{i1}, \ldots, x_{i(j-1)}, x_{i(j+1)}, \ldots, x_{iK})$, the conditional probability $p(x_{ij} = 0.5 \mid x_{i(j-1)}, x_{i(j+1)})$ depends on the recombination rates between marker j and its flanking markers (j - 1) and (j + 1), and $p(y_i \mid x_{ij}, x_{i(-j)}, \mu, \alpha, \sigma_e^2)$ is a normal density function with mean $\mu + \sum_{k=1}^{K} x_{ik}\alpha_k$ and variance $\sigma_e^2$. The first method ignores the probability distributions of the missing genotypes and provides only approximate estimates of the missing genotypes. In contrast, the second method takes the probability distributions into account. In this study, we use the second method to describe our Bayesian approach.
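The second imputation method can be sketched in a few lines. The following is a minimal illustration, not the authors' program: it assumes a backcross with no crossover interference, that both flanking genotypes are observed, and Haldane's mapping function for converting map distance to a recombination fraction (the mapping function is not named in the text). All function and argument names are hypothetical.

```python
import math
import random


def haldane_r(d_cm):
    """Recombination fraction for a distance in cM (ASSUMPTION: Haldane's function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def flanking_weight(g, left, right, r_left, r_right):
    """Unnormalized p(x_ij = g | flanking genotypes) in a backcross, no interference."""
    w_left = (1.0 - r_left) if g == left else r_left
    w_right = (1.0 - r_right) if g == right else r_right
    return w_left * w_right


def prob_half(y_i, other_sum, mu, alpha_j, sigma2_e, left, right, r_left, r_right):
    """Full conditional probability that the missing genotype x_ij equals 0.5.

    other_sum is sum_{k != j} x_ik * alpha_k for individual i.
    """
    weight = {}
    for g in (0.5, -0.5):
        prior = flanking_weight(g, left, right, r_left, r_right)
        like = normal_pdf(y_i, mu + other_sum + g * alpha_j, sigma2_e)
        weight[g] = prior * like
    return weight[0.5] / (weight[0.5] + weight[-0.5])


def sample_genotype(p_half, rng):
    """Draw the missing genotype from its Bernoulli full conditional."""
    return 0.5 if rng.random() < p_half else -0.5
```

When the marker has no effect ($\alpha_j = 0$), the phenotype carries no information and the probability reduces to the one implied by the flanking markers alone; a large effect lets the phenotype pull the imputation toward the genotype that better explains $y_i$.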

Stochastic search marker selection:
In variable selection problems, statistical models can be naturally represented by a set of binary indicator variables $\gamma = (\gamma_1, \ldots, \gamma_K)$, where $\gamma_j = 1$ or 0 represents the presence or absence of covariate j in the model, respectively. The difference between SSVS and other variable selection approaches is that the dimension of the parameter space remains unchanged, so that the Gibbs sampler can be easily used to explore both the model space and the parameter space (GEORGE and MCCULLOCH 1993).

The SSVS constructs the prior distribution for $(\gamma, \alpha)$ in two stages. The prior distribution of the model indicator variables, $p(\gamma)$, is chosen to reflect prior belief in whether particular markers are linked to QTL. A simple choice might have the $\gamma_j$'s independent, so that

$$p(\gamma) = \prod_{j=1}^{K} p_j^{\gamma_j}(1 - p_j)^{1 - \gamma_j}, \qquad p_j = p(\gamma_j = 1). \tag{3}$$

When no information is available, a uniform prior is chosen for each $\gamma_j$, i.e., $p(\gamma_j = 0) = p(\gamma_j = 1) = 0.5$.

The marker effects $\alpha_j$ (j = 1, ... , K) are given normal prior distributions conditional on the corresponding indicators $\gamma_j$:

$$\alpha_j \mid \gamma_j \sim (1 - \gamma_j)\,N(0, \tau_j^2) + \gamma_j\,N(0, c_j^2\tau_j^2). \tag{4}$$

The prior parameters $\tau_j^2$ and $c_j^2$ are chosen so that $\tau_j^2$ is "small" and $c_j^2\tau_j^2$ is "large." Hence, if $\gamma_j = 0$, the prior distribution for $\alpha_j$ forces this parameter to be close to zero. If $\gamma_j = 1$, the prior is diffuse, so a nonzero estimate of $\alpha_j$ should be included in the model and its posterior distribution will largely be determined by the data. On the basis of the above prior specification, a multivariate normal distribution can be used as the joint prior distribution for $\alpha$ conditional on $\gamma$, given by

$$\alpha \mid \gamma \sim N_K(0,\, D_\gamma R D_\gamma), \tag{5}$$

where R is the prior correlation matrix that is usually assigned to be $R = I$ or $R \propto (xx^{T})^{-1}$, and $D_\gamma = \mathrm{diag}[a_1\tau_1, \ldots, a_K\tau_K]$ with $a_i = 1$ if $\gamma_i = 0$ and $a_i = c_i$ if $\gamma_i = 1$. The prior distribution for $\mu$ is assumed to be normal, $N(\eta, \tau^2)$, with prespecified prior mean $\eta$ and prior variance $\tau^2$. The prior for $\sigma_e^2$ is chosen to be a scaled inverse-$\chi^2$ distribution, $\text{Inv-}\chi^2(\nu_0, \sigma_0^2)$, with known hyperparameters $\nu_0$ and $\sigma_0^2$.
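To make the two-component prior concrete, the sketch below builds the prior covariance $D_\gamma R D_\gamma$ for the simplest case $R = I$, where it is a diagonal matrix whose entries switch between $\tau_j^2$ and $c_j^2\tau_j^2$ according to $\gamma_j$. The function names are hypothetical, and the numeric values are only an example taken from one of the prior settings used later in the article, $(\tau_j^2, c_j^2\tau_j^2) = (0.01, 10)$.

```python
import math


def prior_sd(gamma_j, tau_j, c_j):
    """Diagonal entry a_j * tau_j of D_gamma: tau_j if excluded, c_j * tau_j if included."""
    return (c_j if gamma_j == 1 else 1.0) * tau_j


def prior_covariance(gamma, tau, c):
    """D_gamma R D_gamma for R = I, returned as a list of rows."""
    K = len(gamma)
    d = [prior_sd(g, t, cj) for g, t, cj in zip(gamma, tau, c)]
    return [[d[i] * d[j] if i == j else 0.0 for j in range(K)] for i in range(K)]


# Example: three markers, only the second one currently "in the model".
tau = [math.sqrt(0.01)] * 3       # tau_j^2 = 0.01
c = [math.sqrt(1000.0)] * 3       # c_j^2 = 1000, so c_j^2 * tau_j^2 = 10
cov = prior_covariance([0, 1, 0], tau, c)
```

Here the included marker gets a prior variance of about 10 while the excluded markers get 0.01, which is what keeps excluded effects near zero without removing them from the model.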

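On the basis of the prior specifications described above, we can use the Gibbs sampler to generate samples from the posterior distribution $p(\mu, \alpha, \sigma_e^2, \gamma \mid y, M)$. Starting with an initial value $(\mu^{(0)}, \alpha^{(0)}, \sigma_e^{2(0)}, \gamma^{(0)})$, the Gibbs sampler proceeds as follows:

  1. Sample the missing marker genotypes from the full conditional posterior distributions described in Equation 2.

  2. Sample $\mu$ from the full conditional posterior distribution, which under the $N(\eta, \tau^2)$ prior is

     $$\mu \mid y, x, \alpha, \sigma_e^2 \sim N\!\left(\frac{\tau^2\sum_{i=1}^{n}\bigl(y_i - \sum_{j=1}^{K} x_{ij}\alpha_j\bigr) + \sigma_e^2\eta}{n\tau^2 + \sigma_e^2},\; \frac{\sigma_e^2\tau^2}{n\tau^2 + \sigma_e^2}\right),$$

     where n is the number of individuals.

  3. The full conditional posterior distribution of $\alpha$ is multivariate normal, $N_K\bigl((xx^{T} + \sigma_e^2(D_\gamma R D_\gamma)^{-1})^{-1}x^{T}(y - \mu),\; \sigma_e^2(xx^{T} + \sigma_e^2(D_\gamma R D_\gamma)^{-1})^{-1}\bigr)$. Sampling from this distribution requires recomputing $(xx^{T} + \sigma_e^2(D_\gamma R D_\gamma)^{-1})^{-1}$ on the basis of new values of $\sigma_e^2$ and $\gamma$ and thus may be costly. To avoid computing $(xx^{T} + \sigma_e^2(D_\gamma R D_\gamma)^{-1})^{-1}$, we sample $\alpha_j$ (j = 1, ... , K) from the full conditional posterior distribution $p(\alpha_j \mid y, x, \mu, \alpha_{(-j)}, \sigma_e^2, \gamma)$, which is a normal distribution (WANG et al. 1994), where $\alpha_{(-j)}$ denotes all terms of $\alpha$ except $\alpha_j$.

  4. Sample $\sigma_e^2$ from the full conditional posterior distribution, which under the $\text{Inv-}\chi^2(\nu_0, \sigma_0^2)$ prior is

     $$\sigma_e^2 \mid y, x, \mu, \alpha \sim \text{Inv-}\chi^2\!\left(\nu_0 + n,\; \frac{\nu_0\sigma_0^2 + \sum_{i=1}^{n}\bigl(y_i - \mu - \sum_{j=1}^{K} x_{ij}\alpha_j\bigr)^2}{\nu_0 + n}\right).$$

  5. Sample $\gamma_j$ from $p(\gamma_j \mid \alpha, \gamma_{(-j)})$, which is Bernoulli with probability

     $$p(\gamma_j = 1 \mid \alpha, \gamma_{(-j)}) = \frac{a}{a + b}, \quad a = p(\alpha \mid \gamma_{(-j)}, \gamma_j = 1)\,p(\gamma_{(-j)}, \gamma_j = 1), \quad b = p(\alpha \mid \gamma_{(-j)}, \gamma_j = 0)\,p(\gamma_{(-j)}, \gamma_j = 0),$$

     where $\gamma_{(-j)}$ denotes all terms of $\gamma$ except $\gamma_j$.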

The above steps are repeated until a certain criterion for convergence is reached. The posterior sample converges in distribution to the joint posterior distribution, $p(\mu, \alpha, \sigma_e^2, \gamma \mid y, M)$. The embedded subsequence $\gamma^{(1)}, \gamma^{(2)}, \ldots$ thus converges to $p(\gamma \mid y, M)$. Generally, the markers with large effects will appear most frequently and quickly, making them easier to identify. Therefore, markers with a high posterior probability of being included in the model will most probably be linked to large-effect QTL.
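The sampling scheme can be sketched compactly for a simplified case. The code below is an illustration under stated assumptions, not the authors' C++ program: it takes $R = I$, a common prior inclusion probability, complete marker data (so the imputation step is skipped), and one-at-a-time updates of the $\alpha_j$. The function name, argument names, and toy data are all hypothetical.

```python
import math
import random


def gibbs_ssvs(y, x, tau2, c2, n_iter, burn_in, eta=0.0, tau2_mu=2.0,
               nu0=0.0, s20=0.0, p_incl=0.5, seed=1):
    """Gibbs sampler sketch for SSVS with R = I and complete marker data.

    y: phenotypes (length n); x: n rows of K genotypes coded +0.5 / -0.5.
    tau2[j], c2[j]: prior variance tau_j^2 and inflation factor c_j^2.
    Returns posterior inclusion frequencies and posterior mean effects.
    """
    rng = random.Random(seed)
    n, K = len(y), len(x[0])
    mu, sigma2 = sum(y) / n, 1.0
    alpha, gamma = [0.0] * K, [0] * K
    resid = [yi - mu for yi in y]            # y_i - mu - sum_j x_ij * alpha_j
    xx = [sum(row[j] ** 2 for row in x) for j in range(K)]
    incl, alpha_sum = [0] * K, [0.0] * K

    for it in range(n_iter):
        # mu | rest is normal under the N(eta, tau2_mu) prior.
        partial = [e + mu for e in resid]
        denom = n * tau2_mu + sigma2
        mu = rng.gauss((tau2_mu * sum(partial) + sigma2 * eta) / denom,
                       math.sqrt(sigma2 * tau2_mu / denom))
        resid = [p - mu for p in partial]

        # alpha_j | rest, one marker at a time (no matrix inversion).
        for j in range(K):
            prior_var = tau2[j] * (c2[j] if gamma[j] else 1.0)
            xr = sum(x[i][j] * (resid[i] + x[i][j] * alpha[j]) for i in range(n))
            prec = xx[j] / sigma2 + 1.0 / prior_var
            new = rng.gauss(xr / sigma2 / prec, math.sqrt(1.0 / prec))
            for i in range(n):
                resid[i] += x[i][j] * (alpha[j] - new)
            alpha[j] = new

        # sigma2 | rest is scaled inverse chi-square (nu0 = 0 gives p ~ 1/sigma2).
        sse = sum(e * e for e in resid)
        sigma2 = (nu0 * s20 + sse) / rng.gammavariate((nu0 + n) / 2.0, 2.0)

        # gamma_j | alpha_j is Bernoulli; use the log odds of the two normals.
        for j in range(K):
            log_odds = (math.log(p_incl / (1.0 - p_incl)) - 0.5 * math.log(c2[j])
                        + 0.5 * alpha[j] ** 2 / tau2[j] * (1.0 - 1.0 / c2[j]))
            gamma[j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-log_odds)) else 0

        if it >= burn_in:
            for j in range(K):
                incl[j] += gamma[j]
                alpha_sum[j] += alpha[j]

    kept = n_iter - burn_in
    return [v / kept for v in incl], [v / kept for v in alpha_sum]


# Toy data (hypothetical): 200 backcross individuals, 5 unlinked markers,
# only the marker at index 2 has a non-zero effect.
_rng = random.Random(7)
x_demo = [[_rng.choice((0.5, -0.5)) for _ in range(5)] for _ in range(200)]
y_demo = [1.0 + 1.0 * row[2] + _rng.gauss(0.0, 1.0) for row in x_demo]
pip, effects = gibbs_ssvs(y_demo, x_demo, [0.01] * 5, [1000.0] * 5,
                          n_iter=400, burn_in=100)
```

Keeping a running residual vector means each $\alpha_j$ update costs one pass over the individuals, which is the practical payoff of sampling the effects one at a time rather than from the joint multivariate normal.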


*  SIMULATION STUDIES AND REAL DATA ANALYSIS

Simulation studies:
The applicability of the proposed method was demonstrated by analyzing simulated data. The experimental sample was from a backcross and contained 300 segregating individuals. Four chromosomes, each 100 cM long, were simulated. Twenty-one codominant markers were evenly placed on each chromosome at intervals of 5 cM. We simulated 8 large-effect QTL and 16 small-effect QTL controlling the expression of a quantitative trait. The locations of the simulated QTL and their genetic effects are shown in Table 1. The overall mean and the residual variance $\sigma_e^2$ were set to fixed values, with $\mu = 1$. The genetic variance of QTL j is calculated by $a_j^2/4$, where $a_j$ is the true genetic effect. Ignoring the covariance due to linkage, the total genetic variances for the 8 large-effect QTL and 16 small-effect QTL are 2 and 0.04, respectively. Therefore, the proportions of phenotypic variance explained by each large-effect QTL and each small-effect QTL are 8% and 0.08%, respectively. We randomly set 10% of the marker genotypes to missing. The design was replicated five times and analyzed using the proposed method. The results averaged over the five replicates are reported.
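The variance bookkeeping in this design can be checked with a short calculation. The sketch below uses the totals stated above (2 for the 8 large-effect QTL, 0.04 for the 16 small-effect QTL) and the backcross coding of $\pm 0.5$, which gives a per-QTL variance of $a_j^2/4$. The residual variance value used here, 1.0, is an assumption made only for illustration; it is chosen because it reproduces the stated 8% and 0.08% after rounding.

```python
def qtl_variance(a):
    """Genetic variance of one QTL in a backcross with genotypes coded +-0.5."""
    return a * a / 4.0


large_total, small_total = 2.0, 0.04
per_large = large_total / 8            # variance contributed by each large-effect QTL
per_small = small_total / 16           # variance contributed by each small-effect QTL

# Effects implied by inverting a^2 / 4.
effect_large = (4.0 * per_large) ** 0.5
effect_small = (4.0 * per_small) ** 0.5

sigma2_e = 1.0                         # ASSUMPTION: not taken from the text
v_p = large_total + small_total + sigma2_e   # linkage covariance ignored
share_large = per_large / v_p
share_small = per_small / v_p
```

Under these assumptions each large-effect QTL corresponds to an effect of about 1 and each small-effect QTL to an effect of about 0.1.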


 

 
Table 1. Locations and effects of simulated QTL

For each analysis, the initial values $(\mu^{(0)}, \alpha^{(0)}, \sigma_e^{2(0)}, \gamma^{(0)})$ were randomly generated from their priors. We used the uniform distribution as the prior for $\gamma$, as described earlier. Following the principles developed in GEORGE and MCCULLOCH (1993, 1997) for choosing $\tau_j$ and $c_j$, three different pairs of prior variances, i.e., $(\tau_j^2, c_j^2\tau_j^2)$ = (0.001, 10), (0.01, 10), and (0.01, 100), were used for the conditional prior distribution of the genetic effect $\alpha_j$. The prior correlation matrix was assigned to be the identity matrix, i.e., $R = I$. The prior distribution for $\mu$ was $N(0, 2)$. The hyperparameter $\nu_0$ for $\sigma_e^2$ was set to zero, which yields the noninformative prior distribution $p(\sigma_e^2) \propto \sigma_e^{-2}$ (GELMAN et al. 1995).

The Gibbs sampler was run for 50,000 cycles after discarding the first 2000 cycles as the burn-in period. It took ~1 hr to generate each posterior sample with a C++ program on a Pentium 4 PC. The chain was thinned (one iteration saved in every 5 cycles) to reduce serial correlation in the stored samples, so that the total number of samples kept for the posterior analysis was 10,000 (GELMAN et al. 1995). The stored sample was used to infer the parameters of interest.

The estimated posterior probabilities for the marker indicators $\gamma_j$ (j = 1, ... , K) are given in Figure 1 (left). The posterior probability $p(\gamma_j = 1 \mid y, M)$ was obtained by counting the number of samples in which the marker indicator $\gamma_j$ is 1 and dividing by the total number of samples. As shown in Figure 1, for almost all markers, the posterior probabilities for the first prior setting were larger than those for the other two settings, and the posterior probabilities for the second prior setting were larger than those for the third. For the three sets of prior variances, however, the profiles of the posterior probability distributions were similar. These profiles are very peaked, suggesting that the markers corresponding to the peaks have much larger effects than the rest. From these profiles, we also found that in most situations the two markers flanking a large-effect QTL have very different posterior probabilities, one being large and the other being close to zero. Therefore, our method may be powerful in distinguishing closely linked markers. There are a total of eight main peaks along the four simulated chromosomes on the profiles, two on each chromosome. The markers corresponding to the peaks are those closest to the eight simulated large-effect QTL. Therefore, our Bayesian method was shown to be powerful for identifying multiple QTL. None of the simulated small-effect QTL were identified in our analyses. These QTL had effects close to zero and thus were picked up only occasionally.
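The burn-in, thinning, and counting steps described above amount to simple operations on the stored chain. The following sketch uses hypothetical helper names; it assumes the chain is held as a list of per-cycle indicator vectors.

```python
def thin(chain, burn_in, step):
    """Drop the first burn_in cycles, then keep one draw in every `step` cycles."""
    return chain[burn_in::step]


def inclusion_probability(gamma_samples):
    """Fraction of stored samples in which each indicator gamma_j equals 1."""
    n = len(gamma_samples)
    K = len(gamma_samples[0])
    return [sum(s[j] for s in gamma_samples) / n for j in range(K)]


# 2000 burn-in cycles followed by 50,000 cycles, thinned by 5.
n_kept = len(thin(list(range(52000)), burn_in=2000, step=5))

# Four stored draws of (gamma_1, gamma_2).
example = inclusion_probability([[1, 0], [1, 1], [0, 0], [1, 0]])
```

With the run lengths reported in the text, `n_kept` comes to 10,000 stored samples.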



 
Figure 1. Simulation study. Posterior probabilities of marker indicators (left) and marker effects (right) are plotted against marker locations along the genome. Red, green, and blue curves correspond to the three prior settings.

The profiles of marker effects are displayed in Figure 1 (right). Although the empirical posterior distribution for each marker effect can be depicted, for simplicity we report only the posterior mean over the samples. The marker effects were estimated to be essentially identical for the three sets of prior variances $(\tau_j^2, c_j^2\tau_j^2)$. Therefore, these prior variances had a negligible influence on the posterior inference about the marker effects. As in the case of the posterior probability distribution of marker indicators, the profiles of marker effects also have eight obvious peaks, each corresponding to the marker closest to a large-effect QTL. For markers far from the large-effect QTL, the effects were estimated to be close to zero. From Figure 1, it can be seen that the accuracy of the estimate of a marker effect depends on the estimated posterior probability of the marker indicator. When the posterior probability was estimated to be close to one, the estimate of the marker effect was close to the true value. Otherwise, the marker effects were slightly underestimated. These results are expected because the marker effects with the corresponding indicators being zero are forced to be close to zero by the priors. However, we observed that the conditional estimates of marker effects were close to the true value if we used only the posterior samples with the corresponding indicators equal to one.
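The difference between the overall posterior mean and the conditional estimate mentioned above can be shown with a small helper. This is an illustrative sketch with hypothetical names and made-up draws, not output from the study.

```python
def posterior_means(alpha_samples, gamma_samples, j):
    """Overall posterior mean of alpha_j, and the mean over draws with gamma_j = 1."""
    all_draws = [a[j] for a in alpha_samples]
    included = [a[j] for a, g in zip(alpha_samples, gamma_samples) if g[j] == 1]
    overall = sum(all_draws) / len(all_draws)
    conditional = sum(included) / len(included) if included else None
    return overall, conditional


# Made-up draws for one marker: included in half of the stored samples.
alpha_draws = [[1.0], [0.0], [1.2], [0.0]]
gamma_draws = [[1], [0], [1], [0]]
overall, conditional = posterior_means(alpha_draws, gamma_draws, 0)
```

In this toy case the overall mean is pulled toward zero by the draws in which the marker is excluded, while the conditional mean uses only the draws in which it is included.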

The empirical posterior distributions for the overall mean and the residual variance are depicted in Figure 2, a-f. The estimated means for these two parameters were very close to the simulated values and the standard deviations were small, showing that the overall mean and the residual variance were estimated precisely.



 
Figure 2. Simulation study. Posterior distributions of the overall mean (a, c, e) and the residual variance (b, d, f) are shown, one pair of panels for each of the three prior settings.

For comparison, we also performed single-marker analyses using simple regression for each marker and the usual multiple regression analysis with all markers as predictors. For the single-marker analyses, the profiles of the t-test statistics and the marker effects on chromosome 1 are shown in Figure 3. Both profiles have only one peak covering a wide range, which shows that the single-marker analyses fail to separate the two linked QTL. The marker effects were also seriously overestimated. Since the marker density is quite high, the results of single-marker analyses should be close to those of interval mapping. Therefore, the proposed method was shown to be more powerful than the widely used interval-mapping method for detecting multiple QTL. Figure 4 shows the marker effects from multiple regression analysis plotted against the genome location (centimorgans) of the markers. The effects of most markers far from the large-effect QTL are not shrunk toward zero in the usual multiple regression analysis, indicating that multiple regression failed to detect clear signals of QTL. A common feature of the proposed Bayesian method and the usual multiple regression method is that all markers are included in the model. The clear advantage of the proposed method is that it uses two different prior distributions for the markers, which force the posterior means of insignificant markers to be close to zero and allow the posterior distributions of significant markers to be determined by the data.



 
Figure 3. Single-marker regression analysis for chromosome 1. (a) Values of t-test statistic; (b) marker effects.



 
Figure 4. Marker effects plotted against marker locations along the genome from multiple-marker regression analysis.

Real data analysis:
Data from the North American Barley Genome Mapping Project (TINKER et al. 1996) were analyzed using the proposed Bayesian method. Seven traits were investigated in the project: heading, yield, maturity, height, lodging, kernel weight, and test weight. We present only the results for "heading" here. The DH (double-haploid) population contained 145 lines (n = 145), each grown in a range of environments. A total of 127 mapped markers (K = 127) covering a 1500-cM genome along seven linkage groups were used in the analysis. The average phenotypic values across the environments were calculated for each line, and these average values were treated as the original phenotypic values ($y_i$) for the analysis. These phenotypic values were then standardized, and the standardized records were used in the analysis.

The prior distributions for $(\mu, \alpha, \sigma_e^2, \gamma)$, the prior variances $(\tau_j^2, c_j^2\tau_j^2)$, the length of the Gibbs sampler, and the thinning scheme of the posterior sample were set to be the same as those in the analysis of our simulated data described above. The initial values $(\mu^{(0)}, \alpha^{(0)}, \sigma_e^{2(0)}, \gamma^{(0)})$ were randomly generated from their priors.

For the three different prior specifications, the plots of the posterior probabilities of the marker indicators are shown in Figure 5, and the marker effects are depicted in Figure 6. As observed in our simulation studies, for almost all markers, the posterior probabilities for the first prior setting were larger than those for the other two settings, and the posterior probabilities for the second prior setting were larger than those for the third. However, the profiles of the posterior probability distributions were similar. As shown in Figure 6, the marker effects were estimated to be essentially identical for the three sets of prior variances $(\tau_j^2, c_j^2\tau_j^2)$. For the first prior setting, five markers with posterior probabilities from 0.75 to 0.95 and marker effects of ~ ±0.4 were found on chromosomes 1, 3, 4, and 6. We also found five markers on chromosomes 3, 4, 5, and 6 with posterior probabilities from 0.4 to 0.6. Using the interval mapping of LANDER and BOTSTEIN (1989) and the composite interval mapping of ZENG (1994), however, TINKER et al. (1996) declared only three QTL, on chromosomes 1, 4, and 7, as significant (data not shown here). Two markers on chromosome 1 were found to have a posterior probability of ~0.76 and an effect of ~ -0.4. However, TINKER et al. (1996) declared only one QTL on chromosome 1 as significant.



 
Figure 5. Posterior probabilities of marker indicators for heading in barley. Ticks on the horizontal axes represent markers. Red, green, and blue curves correspond to the three prior settings.



 
Figure 6. Marker effects plotted against marker locations along the genome for heading in barley. Ticks on the horizontal axes represent markers. Red, green, and blue curves correspond to the three prior settings.

For the first prior setting, the posterior mean of the overall mean was estimated to be $\hat{\mu} = 0.0882$; the posterior mean of the residual variance, $\hat{\sigma}_e^2$, was estimated from the same posterior sample. The proportion of phenotypic variance explained by the markers is calculated relative to the phenotypic variance of the standardized phenotype and was estimated to be ~69%. From the estimates of the marker effects $\hat{\alpha}_j$ (j = 1, ... , K), we also calculated the proportion of phenotypic variance explained by marker j and found that the proportion explained by each of the five strongest markers was ~4%.
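The ~4% figure is consistent with a simple calculation. The sketch below assumes that the variance contributed by marker j is $\hat{\alpha}_j^2/4$ (the variance of a genotype coded $\pm 0.5$ with equal frequencies, times the squared effect) and that the standardized phenotype has variance 1; whether this is exactly the formula the authors used is an assumption. The function name is hypothetical.

```python
def marker_variance_share(alpha_hat, phenotypic_variance=1.0):
    """Share of phenotypic variance from one marker coded +-0.5 (ASSUMED formula)."""
    return (alpha_hat ** 2 / 4.0) / phenotypic_variance


# Effects of about +-0.4 were reported for the strongest markers.
share_pos = marker_variance_share(0.4)
share_neg = marker_variance_share(-0.4)
```

Both signs give the same share, roughly 0.04 of the phenotypic variance, which matches the value reported for each of the five strongest markers.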


*  DISCUSSION

Mapping multiple QTL can be viewed essentially as a problem of model selection (e.g., BROMAN and SPEED 2002; SILLANPAA and CORANDER 2002). A variety of statistical selection procedures, including both non-Bayesian and Bayesian methods, have been developed for conventional statistical models. Some of these procedures have been modified to map multiple QTL. In this study, we developed a Markov chain Monte Carlo (MCMC) algorithm on the basis of the SSVS approach of GEORGE and MCCULLOCH (1993) for identifying multiple markers. The proposed method was shown to perform well under typical situations of most QTL studies in terms of the number of markers and the marker density. Compared with the existing Bayesian methods, such as the reversible-jump MCMC, the SSVS approach has advantages in simplicity of computation and diagnosis of convergence (GEORGE and MCCULLOCH 1997). The SSVS procedure can even be implemented using the publicly available software BUGS (CONGDON 2002) and thus can be widely used in QTL studies.

An essential element of the performance of our Gibbs sampler is its ability to move between the two values of the indicator variable $\gamma_j$. In the analyses of the simulated data and real data, the value of $\gamma_j$ changed frequently, suggesting that the proposed algorithm mixes well and the chain converges quickly. However, as described in GEORGE and MCCULLOCH (1993, 1997), convergence can be very slow and thus computational problems can arise when $c_j^2$ is set too large. This setup can lead to very small transition probabilities for $\gamma_j$ to go from 0 to 1 or from 1 to 0. After extensive testing, GEORGE and MCCULLOCH (1997) indicated that these problems can be avoided whenever $c_j^2 \le 10{,}000$. Also, the mixing behavior and convergence of the chain are expected to be affected by marker density. In the case of high-density maps with hundreds of markers, one might consider a second iteration of SSVS with a reduced set of markers based on the first run (GEORGE and MCCULLOCH 1993). This two-stage strategy may improve the accuracy of estimating the marker effects and the posterior probabilities.
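The effect of $c_j^2$ on the transition probabilities can be illustrated numerically. The sketch below assumes $R = I$ and a uniform prior on $\gamma_j$, in which case the probability of drawing $\gamma_j = 1$ depends only on the current $\alpha_j$ through the ratio of the two normal prior densities. It evaluates that probability at a value of $\alpha_j$ typical of the excluded state (one prior standard deviation, $\tau_j$). The function name and the chosen values are hypothetical.

```python
import math


def prob_include(alpha_j, tau2, c2, p_incl=0.5):
    """P(gamma_j = 1 | alpha_j) for R = I, from the log odds of N(0, c2*tau2) vs N(0, tau2)."""
    log_odds = (math.log(p_incl / (1.0 - p_incl)) - 0.5 * math.log(c2)
                + 0.5 * alpha_j ** 2 / tau2 * (1.0 - 1.0 / c2))
    return 1.0 / (1.0 + math.exp(-log_odds))


tau2 = 0.01
alpha_typical = math.sqrt(tau2)     # one prior s.d. under gamma_j = 0
probs = {c2: prob_include(alpha_typical, tau2, c2) for c2 in (1e2, 1e4, 1e6)}
```

The probability of moving from 0 to 1 at such a value shrinks steadily as $c_j^2$ grows, which is one way to see why very large $c_j^2$ can make the chain mix slowly.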

Recently, XU (2003) proposed a Bayesian method under the random regression model to simultaneously estimate genetic effects associated with markers of the entire genome in inbred line crosses. In his Bayesian framework, each genetic effect was assigned a normal prior distribution with mean zero and a unique variance. The effect-specific prior variance was further assigned a vague prior so that the variance was estimated from the data. This approach is analogous to the Bayesian method of MEUWISSEN et al. (2001) for BLUP prediction of gene effects in outbred populations. For a backcross population with K markers, Xu's method needs to estimate K different marker effects $\alpha_j$ (j = 1, ... , K) and K different variances $\sigma_j^2$, where $\alpha_j \sim N(0, \sigma_j^2)$. Although this approach can evaluate each marker effect, it does not provide a probability statement about the statistical significance of marker effects. In our approach, indicator variables are introduced, but all effects included in the model have the same prior variance and all effects excluded from the model have another common prior variance. Therefore, our method not only estimates each marker effect, but also provides the posterior probability that each marker has a significant effect on the trait. The introduction of indicator variables may allow a large number of markers to be included in the model. It is worth noting that both the random-model approach and our method include all markers in the analysis and thus may have the ability to control the genetic variances of a large number of small-effect QTL. Whether this property can improve power in detecting multiple QTL and estimating the genetic effects deserves further investigation.

We have applied the SSVS approach to identify multiple QTL by analyzing all markers of the whole genome. When the markers are densely and regularly spaced, the marker analysis would provide reasonable estimates of marker effects and marker posterior probabilities even when QTL are located in the marker intervals. If the markers are sparse and irregularly spaced, however, the marker analysis will be biased. In these situations, we can extend the proposed method to allow for finer structure mapping in two ways. The first approach is to use the multiple imputation method to generate the missing genotypes at grids of points between markers (SEN and CHURCHILL 2001). The imputed genotypes are then incorporated into our Bayesian procedure. The second approach is to substitute positions within the marker intervals for the markers, assuming that at most one QTL lies in any marker interval. This approach requires searching for the optimal positions within the marker intervals. Algorithms for updating QTL positions have been developed (e.g., XU and YI 2000; YI and XU 2000, 2001) and can be easily incorporated into our procedure.

The SSVS approach has been extended to the multivariate regression model (BROWN et al. 1998). In QTL-mapping studies, the joint analysis of multiple traits can provide formal procedures to test a number of biologically interesting hypotheses concerning the nature of genetic correlations between different traits. Under certain situations, the joint analysis can improve statistical power in detecting QTL and estimating the genetic parameters. We can extend the proposed method by applying the SSVS approach to jointly identify QTL for correlated multiple traits. In this study, we considered mapping multiple QTL under the nonepistatic model. A growing number of experiments provide strong evidence of the presence of interactions between genes for many complex traits. Under the epistatic model, the number of genetic effects increases exponentially as the number of markers increases. The multiple-stage SSVS approach can be employed to identify interacting QTL.


*  ACKNOWLEDGMENTS

We are grateful to two anonymous reviewers for their helpful comments. This work was supported by National Institutes of Health grants R01ES009912, P41RR006009, R01DK054298, and P30DK56336 to D.B.A.

Manuscript received December 5, 2002; Accepted for publication March 13, 2003.


*  LITERATURE CITED

BALL, R. D., 2001  Bayesian methods for quantitative trait loci mapping based on model selection: approximate analysis using the Bayesian information criterion. Genetics 159:1351-1364.[Abstract/Free Full Text]

BROMAN, K. W. and T. P. SPEED, 2002  A model selection approach for the identification of quantitative trait loci in experimental crosses. J. R. Stat. Soc. B 64:641-656.

BROWN, P. J., M. VANNUCCI, and T. FEARN, 1998  Multivariate Bayesian variable selection and prediction. J. R. Stat. Soc. B 60:627-641.

CONGDON, P., 2002 Bayesian Statistical Modelling. John Wiley & Sons, New York.

GELMAN, A., J. CARLIN, H. STERN and D. RUBIN, 1995 Bayesian Data Analysis. Chapman & Hall, London.

GEORGE, E. I. and R. E. MCCULLOCH, 1993  Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88:881-889.

GEORGE, E. I. and R. E. MCCULLOCH, 1997  Approaches for Bayesian variable selection. Stat. Sin. 7:339-373.

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.

HEATH, S. C., 1997  Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.

HOESCHELE, I., 2001 Mapping quantitative trait loci in outbred pedigrees, pp. 599–644 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. Wiley, New York.

JANSEN, R. C. and P. STAM, 1994  High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.

JIANG, C. and Z-B. ZENG, 1997  Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101:47-58.

KAO, C. H., Z-B. ZENG, and R. D. TEASDALE, 1999  Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199.

LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.

MEUWISSEN, T. H. E., B. J. HAYES, and M. E. GODDARD, 2001  Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819-1829.

NTZOUFRAS, I., J. J. FORSTER and P. DELLAPORTAS, 1997 Stochastic search variable selection for log-linear models. Technical Report. Faculty of Mathematics, Southampton University, Southampton, UK.

PIEPHO, H.-P. and H. G. GAUCH, JR., 2001  Marker pair selection for mapping quantitative trait loci. Genetics 157:433-444.

SATAGOPAN, J. M., and B. S. YANDELL, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Abstracts of the Joint Statistical Meeting, October 1996, Chicago.

SATAGOPAN, J. M., B. S. YANDELL, M. A. NEWTON, and T. C. OSBORN, 1996  A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144:805-816.

SEN, S. and G. CHURCHILL, 2001  A statistical framework for quantitative trait mapping. Genetics 159:371-387.

SILLANPÄÄ, M. J. and E. ARJAS, 1998  Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373-1388.

SILLANPÄÄ, M. J. and J. CORANDER, 2002  Model choice in gene mapping: what and why. Trends Genet. 18:301-307.

STEPHENS, D. A. and R. D. FISCH, 1998  Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54:1334-1347.

TINKER, N. A., D. E. MATHER, B. G. ROSSNAGEL, K. J. KASHA, and A. KLEINHOFS, 1996  Regions of the genome that affect agronomic performance in two-row barley. Crop Sci. 36:1053-1062.

WANG, C. S., J. J. RUTLEDGE, and D. GIANOLA, 1994  Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26:91-115.

XU, S., 2003  Estimating polygenic effects using markers of the entire genome. Genetics 163:789-801.

XU, S. and N. YI, 2000  Mixed model analysis of quantitative trait loci. Proc. Natl. Acad. Sci. USA 97:14542-14547.

YI, N. and S. XU, 2000  Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155:1391-1403.

YI, N. and S. XU, 2001  Bayesian mapping of quantitative trait loci under complicated mating designs. Genetics 157:1759-1771.

ZENG, Z-B., 1994  Precision mapping of quantitative trait loci. Genetics 136:1457-1468.

ZENG, Z-B., C.-H. KAO, and C. J. BASTEN, 2000  Estimating the genetic architecture of quantitative traits. Genet. Res. 74:279-289.



