- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Sorensen, D.
- Articles by Andersen, S.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Sorensen, D.
- Articles by Andersen, S.
Bayesian Analysis of Response to Selection: A Case Study Using Litter Size in Danish Yorkshire Pigs
D. Sorensena, A. Vernersenb, and S. Andersenba Danish Institute of Agricultural Sciences, DK-8830 Tjele, Denmark
b National Committee for Pig Production, Axelborg, DK-1609 Copenhagen, Denmark
Corresponding author: D. Sorensen, Danish Institute of Agricultural Sciences, Department of Animal Breeding and Genetics, P.O. Box 50, DK-8830 Tjele, Denmark., snfds{at}genetics.agrsci.dk (E-mail)
Communicating editor: C. HALEY
| ABSTRACT |
|---|
Implementation of a Bayesian analysis of a selection experiment is illustrated using litter size [total number of piglets born (TNB)] in Danish Yorkshire pigs. Other traits studied include average litter weight at birth (WTAB) and proportion of piglets born dead (PRBD). Response to selection for TNB was analyzed with a number of models, which differed in their level of hierarchy, in their prior distributions, and in the parametric form of the likelihoods. A model assessment study favored a particular form of an additive genetic model. With this model, the Monte Carlo estimate of the 95% probability interval of response to selection was (0.23; 0.60), with a posterior mean of 0.43 piglets. WTAB showed a correlated response of -7.2 g, with a 95% probability interval equal to (-33.1; 18.9). The posterior mean of the genetic correlation between TNB and WTAB was -0.23 with a 95% probability interval equal to (-0.46; -0.01). PRBD was studied informally; it increases with larger litters, when litter size is >7 piglets born. A number of methodological issues related to the Bayesian model assessment study are discussed, as well as the genetic consequences of inferring response to selection using additive genetic models.
OVER the last years, pig breeding programs have emphasized selection pressure on fertility traits. While it is recognized that fertility is a composite trait, most of the efforts to improve sow reproductive performance have been placed on selection for litter size. This is partly because litter size is easily recorded and partly because several studies have indicated that it is the most important economic component of sow reproductive performance (![]()
![]()
![]()
There is a substantial amount of information in the literature about genetic parameters for litter size in pigs; much of this has been reviewed by ![]()
![]()
![]()
0.3 to 0.5 pigs per litter and generation, depending on the amount of information included to predict breeding values (![]()
![]()
![]()
![]()
![]()
![]()
![]()
Against such a background, it was decided in the late 1980s to generate convincing experimental evidence that litter size could be improved by selection. The criterion of selection should be inexpensive and simple to measure and the choice fell on total number of piglets born per litter. A large-scale selection experiment was conducted from 1989 to 1992. The experiment's results have been reported on occasion (![]()
This article is an attempt to fill in this lack. The emphasis of this work is on the methodological aspects of the analysis. Over the last years, with the introduction of Markov chain Monte Carlo (MCMC) techniques, there has been a revival of Bayesian methods in the statistical literature, and geneticists have been receptive to the inferential possibilities that the Bayesian paradigm offers. This article illustrates the type of information that can be extracted from a selection experiment when MCMC is used in combination with the Bayesian approach.
| EXPERIMENTAL DESIGN AND DATA |
|---|
The data originate from the database of the national breeding program. Litter size (total number of piglets born per litter) had been recorded in the breeding herds, even though it had not been part of the breeding goal. In 1989, the large-scale selection experiment was started using data from the Yorkshire breed only. Information on litter size was gathered from all matings in the breeding herds, and predicted breeding values for each mating were computed using a model that is described below.
The data set used to compute predicted breeding values was produced as follows. On the basis of the matings available during an 8-month period, records on ancestors and collateral relatives were generated and updated each month. This data file consisted of records from 8988 litters and a total of 5796 animals, of which 3534 were females with records. These animals are referred to simply as the base. The pedigree of this file included animals born from 1981 until 1988.
During the period of 8 months, from the highest scoring 131 predicted breeding values of the registered matings, 1 gilt was chosen from each resulting litter, and from the highest scoring 36 predicted breeding values, 1 male piglet was chosen from each resulting litter. These 131 gilts and 36 male piglets constituted the selected line. A contemporaneous control line was created, choosing 130 gilts and 38 male piglets at random from the litters available. Selection took place on a monthly basis, during 8 months, and not just once at the end of the 8-month period. Table 1 shows the contribution of the different parities to the 261 sows from the selected and control lines, as well as the raw means of total number born per litter of the mothers of these 261 sows, within parity and line, and among the 3534 females that belong to the base. The weighted average of the difference across parities of records on females that contributed piglets to the contemporaneous selected and control lines is 1.6 piglets per litter.
|
Animals from both lines were purchased from the breeding herds and were transferred to a common farm immediately after weaning at 1723 days of age. On arrival at the common farm, the piglets underwent a period of quarantine in a nursery.
At the common farm, females of each line were randomly mated (mating took place after the second oestrus) with males of the corresponding line and produced two parities. These are records from generation 1. From the first of these two parities, one gilt was chosen from each litter. At sexual maturity, these females were mated twice to randomly chosen males from the appropriate line and produced, therefore, two litters. We refer to these as the records from generation 2. There was no directional selection to produce litters in generation 2, but the genotypic value of these females is expected to be a little higher than those from generation 1, because their 36 fathers represent a smaller proportion of the tail of the distribution than the fathers to females from generation 1.
In total, there were 1072 litters in the common farm, approximately equally distributed between the selected and the control lines. The total number of litters in the complete data set was 10,060 (8988 from the base plus 1072 from the selected and the control lines) and the total pedigree file consisted of 6437 individuals (5796 from the base and 641 from the selected and the control lines).
Animals from both lines were kept under identical conditions and it was not possible for the staff in the farm to identify to which line animals belonged. As customarily practiced in the breeding herds, litters in the common farm were also standardized at birth according to total number of piglets born. The traits recorded on the farm were total number of piglets born per litter (TNB), number of piglets born alive per litter (NBA), and average weight of liveborn piglets (WTAB). The latter was recorded within 24 hr after birth; the whole litter was weighed, rather than individual piglets.
Among the 6437 individuals in the pedigree file, 3537 had an inbreeding coefficient >0.0. The average inbreeding coefficient among inbred individuals was 2.4%; the modal value was a little >1%, and the maximum value was 32%. Only 88 inbreeding coefficients were >10%.
Selection criterionmodel for prediction of breeding values:
The data available on litter size before the selection experiment was initiated consisted of records on TNB only. The criterion of selection was the average predicted breeding value of the sire and dam of the litter that contributed piglets to the selected and control lines. The model was a repeatability additive genetic model and the method of inference was best linear unbiased prediction (BLUP; i.e., ![]()
![]()
2a, where A is the additive genetic relationship matrix among all breeding values in the data, and
2a is the additive genetic variance. The vectors of permanent environmental effects and residual effects were assumed to follow normal distributions with zero mean, and with variances I
2p and I
2e, respectively, where
2p denotes the variance due to permanent environmental effects and
2e is the residual variance. These three random vectors were assumed to be mutually independent.
The design used in the experiment was chosen on the basis of previous calculations using the above model. These calculations indicated a predicted deviation between selected and control line of 0.49 piglets per litter. With the experimental design implemented, the approximate probability of detecting such a deviation was estimated as 71%.
| STATISTICAL METHODS |
|---|
The main objective of this experiment has been to confirm that litter size (TNB) responds to selection in the Danish Yorkshire breed. To address this concern, a number of single-trait analyses of TNB were performed using a variety of models. The experiment also generates data to study correlated responses in proportion of piglets born dead (PRBD) and average piglet weight at birth, although it was not specifically designed to address these questions. Correlated responses in WTAB were analyzed using two-trait models only; PRBD was analyzed only informally. Most of the analyses were carried out within the Bayesian framework of inference.
In the analyses presented below, TNB and WTAB are both regarded as traits of the mother of the litter. In the case of WTAB, this assumption was dictated partly by the availability of litter weights, rather than of individual piglet weights.
The analyses of TNB were done by fitting two types of models that differ fundamentally in the way data contribute to infer response to selection. The first type is a nonhierarchical model, where only data that belong to a generation in question contribute to infer the genetic mean of that generation. The second type is represented by hierarchical additive genetic models, where the complete data contribute information to the genetic mean of a particular generation, leading to sharper inferences. This is at the expense of a higher level of parameterization and stronger assumptions about the form of gene action.
Single-trait analyses of TNB:
Nonhierarchical model:
The simplest model entertained is one that ignores the family structure in the data. This is labeled the nonhierarchical (NH) model, in contrast with the additive genetic models described below. The NH model considered here assumes the following relationship between the dependent variable and the parameters,
![]() |
(1) |
where yijklm is a record (TNB) from line i (li), parity j (pj), generation k (gk), and season l (sl). Data from the common farm only (selected and control lines) are included in the analysis. Response to selection is defined as
![]() |
(2) |
which is the difference between the effects of the selected and control lines and which is inferred from its marginal posterior distribution. In (2), no distinction is made between responses at generations 1 and 2. A model similar to (1), which included instead line-by-generation interaction effects, did not reveal a difference between the responses at generations 1 and 2, and the results of this analysis are therefore omitted.
Model (1) is written in matrix form as
![]() |
(3) |
where b is the vector that has parameters associated with line, parity, generation, and season, and X is the known incidence matrix. It is assumed that y|b is normally distributed, with mean and variance equal to Xb and I
2e, respectively. A priori, the vector b = {bi} is assigned a 10-dimensional uniform distribution, independent of the residual variance; the latter is assumed to be proportional to
. Therefore,

With the present choice of prior distributions, the marginal posterior distribution of selection response (define selection response as R = k'b) follows a univariate t-distribution with mean equal to E(R|y) = k'(X'X)-1X'y, scale parameter equal to
2ek'(X'X)-1k, and with degrees of freedom equal to n - rank(X), where

and n is the number of elements in y (![]()
2e is a scaled inverted chi-square distribution with scale parameter (n - rank(X))
2e and (n - rank(X)) d.f. Note that the mean of [R|y] is numerically identical to the least-squares estimator.
Hierarchical models via Markov chain Monte Carlo: Hierarchical models include a number of additive genetic models that were fitted using the Gibbs sampler. With the additive genetic models, response to selection at generation 1 is defined as the difference in the average additive genetic values between individuals from the selected line, generation 1, and those from the base. This is labeled R1. Response to selection at generation 2 is correspondingly defined as the difference in the average additive genetic values between individuals from the selected line, generation 2, and those from the base. This is labeled R2.
An alternative expression of response to selection involves deviating the mean additive genetic value of the offspring of selected parents from the average additive genetic values of all matings available during the 8-month period (the group from which parents were chosen). However, since TNB shows no genetic trend over the period studied (not shown) and the mean of the posterior distribution of the average additive genetic values of the base is smaller than 10-4, the above definition was chosen. Note that with the additive genetic model, response to selection is not defined as a deviation involving the control line.
The initial model choice was based partly on the work of ![]()
![]() |
(4) |
where b is a vector that contains location parameters whose a priori distributions are assumed to be uniform, with lower and upper bounds equal to bmin and bmax, respectively, and a and p are vectors that contain additive genetic values and permanent environmental effects, respectively. Matrices X, Z, and W are known incidence matrices, and
2e is the residual variance. A priori, it is assumed that additive genetic values and permanent environmental effects are multivariate normally distributed:
![]() |
(5) |
![]() |
(6) |
In these expressions,
2a is the additive genetic variance in the population from which individuals with unknown parents were conceptually sampled, and
2p is the permanent environmental variance. The matrix A has, as elements, additive genetic relationships among the genetic values.
The models differed in their prior distributions and in the specification of the herd-year-type of insemination effects. In model 1, vector b has effects of herd-year-type of insemination, season, and parity. A priori, variance components are assumed to be independently and uniformly distributed, taking values in the following ranges: 0 <
2a
4, 0 <
2p
2, and 0 <
2e
20. These distributions for the variance components imply an a priori distribution of heritability and repeatability with respective modal values of 0.15 and 0.23 (![]()
Model 2 differs from the above in that the vector of herd-year-type of insemination effects, h, is assumed a priori, to follow a multivariate normal distribution:
![]() |
(7) |
The vector b now contains effects of parity, season, and type of insemination (artificial vs. natural service). All variance components (
2a,
2h,
2p,
2e) are assumed to be a priori, independently and uniformly distributed, taking values in the ranges described above. For
2h, the range was 0 <
2h
2.
Finally, response was also evaluated using a model similar to model 1, except that heritability and repeatability were assumed to take values 0.10 and 0.15, respectively, and the posterior distribution of response was obtained, conditional on these values. This is denoted model 3 and is the same model used to carry out selection decisions and is numerically equivalent to a BLUP approach if heritability and repeatability are the true values. In a frequentist setting, the sampling variance of the "BLUP estimate of selection response" must be computed over the distribution of the selected data; this is not an analytically tractable task. The Bayesian analysis implemented here via the Gibbs sampler yields a measure of uncertainty associated with selection response in the form of a Monte Carlo estimate of the posterior variance.
Two other models were also fitted: one of these differed from model 1 in the a priori specification of the variance components. These were assumed to follow scaled inverted chi-square distributions, reflecting little information about the variance components. The other model differed from model 2 in the same way. The first of these two behaved almost identically to model 1, and the second one behaved almost identically to model 2. Therefore, results obtained from these two models are omitted.
The bounds bmin and bmax were set equal to 0 and 20, respectively, for all elements in either bmin or bmax, for all models. Runs with different bounds yielded almost identical results and these are therefore not presented.
Two-stage regression of TNB on parental predicted additive genetic values:
The data were analyzed also using a two-stage approach, designed to study whether TNB achieved after selection was consistent with predictions based on the additive genetic model 1 (![]()
![]() |
(8) |
where ti is the effect of the ith parity, sj is the effect of the jth season, pedk =
, the average predicted additive genetic values of the father and mother of sow k based on the first stage analysis,
f,k =
and
m,f,k =
are random errors of prediction associated with the father (f) and mother (m) of k, pk is the sum of the effects of Mendelian sampling and permanent environmental effects, and eijkl is a residual effect peculiar to the lth record. Note that no distinction is made between records from selection or control lines. If TNB does not respond to selection, it is expected that ß = 0; under the assumption that the model holds, ß = 1. This analysis was carried out using a frequentist approach.
Single-trait analysis of PRBD:
Data on PRBD were markedly skewed and it was not possible to find a transformation to remove this skewness. A formal analysis of the correlated response in PRBD, which requires inclusion of the trait selected for TNB, will be presented in the future. At this stage, we report results from a univariate analysis based on the approach in ![]()
It was cursorily observed that PRBD increases when TNB is >7. Therefore, the following phenotypic relationship was studied between PRBD and TNB,
![]() |
(9) |
where li is the effect of line i (i = 1, 2); pgj is the jth interaction effect of parity by generation; sk is the effect of season k; tl is a classification variable that takes the value TNB if TNB
7 and zero otherwise; I(TNBp > 7) is the indicator function that takes the value 1 if the argument is satisfied and zero otherwise; ß is a regression coefficient; Sn is the nth sire effect assumed normally distributed, with mean zero and variance
2s; Do is the oth dam effect, assumed normally distributed, with mean zero and variance
2d; and Ap is the pth sow effect, assumed normally distributed, with mean zero and variance
2a. A similar model with a regression coefficient for each of the two lines was also fitted to the data, but no difference could be detected between the two regression coefficients.
Two-trait analysis of TNB and WTAB:
As with the single-trait analyses, a number of two-trait models were investigated. The models differed in the form of the prior distributions and in the covariance structure of location parameters. Inferences about response to selection for TNB and correlated responses in WTAB were almost identical in all cases; we present, therefore, results from the most parsimonious among the models studied.
Let (y, z) denote the vector of length 10,060 x 2 representing the n = 10,060 records on TNB (vector y), and the augmented WTAB data (z). The latter was augumented with 10,060 - 1072 = 8988 missing residuals. It is assumed that the ith record (yi, zi) is conditionally, bivariate normally distributed, given the parameters
![]() |
(10) |
where Rei, is the ith residual covariance matrix, which has the structure
![]() |
(11) |
In (11), ni is the known number of records contributing to WTAB. Elements of the row vectors x'i, w'i, and z'i (subscripts y and z refer to TNB and to WTAB, respectively), associated with a missing residual, contain only zeroes. The vector b has location parameters that are assumed to be uniformly distributed, a priori. For both TNB and for WTAB, these are herd-year-type of insemination effects, season, and parity. The vectors py and pz represent permanent environmental effects and the vectors ay and az represent additive genetic values. Additive genetic values and permanent environmental effects are assumed to be, a priori, multivariate normally distributed:

In these expressions, Go is the 2 by 2 matrix of additive genetic variances and covariances, and
2py and
2pz are the variances due to permanent environmental effects associated with TNB and WTAB, respectively.
All variances and covariance matrices were assigned improper uniform prior distributions, a priori. This is denoted model 4. In model 5, for TNB, herd-year-type of insemination effects were assumed to have, a priori, independent normal distributions, with zero means. Independent scaled inverted chi-square distributions were assigned to
2py and to
2pz (and to
2hy), and scaled inverse Wishart distributions to Go and Re. The parameter "degrees of freedom" associated with all these distributions was set equal to 5. This is intended to reflect weak a priori information. Variants of the above models with respect to prior specifications were also fitted, with hardly any consequences on inferences about response and correlated response to selection. Therefore, results based on only models 4 and 5 are shown.
The first step in the implementation of the Gibbs sampler was the generation of the WTAB missing residuals from the fully conditional posterior distribution. From (10), this has the form

where

and

In these expressions, zm,i represents the ith missing residual, yo,i is the ith TNB observation, x'yo,i, w'yo,i, and z'yo,i are the rows of Xy, Wy, and Zy associated with yo,i,
e,yz is the residual covariance between TNB and WTAB,
2e,z(
2e,y) is the residual variance associated with WTAB (TNB), and re,yz =
is the residual correlation between traits. After generating the augmented data, location parameters were sampled from the appropriate multivariate normal distribution, variances were sampled from the appropriate fully conditional inverse chi-square posterior distributions, and covariance matrices from the appropriate scaled inverse Wishart distributions. Each iteration required generation of a new set of missing residuals and the construction of the mixed-model equations. Further details of the algorithm can be found, for example, in ![]()
Gibbs sampling implementation:
The results presented below are based on Monte Carlo estimates of means of posterior distributions. A large number of experiments for each model were run with the Gibbs sampler. The burn-in periods and number of iterations of the various Gibbs chains were chosen partly pragmatically, looking at the results of various independent chains as time-series plots overlaid and seeing if these could be distinguished, and partly using more formal methods. These include the approaches of ![]()
![]()
![]()
![]()
![]()
![]()
It was estimated that the large proportion of missing data could cause very slow mixing. Therefore, after augmenting with the missing residuals, the sampler was implemented using a multiple-trait extension of the approach suggested by ![]()
Evaluating contributions to statistical information about response to selection:
It is of interest to quantify the relative amount of information contributed by subsets of the data to the posterior distribution of selection response. To achieve this, response to selection is studied using two types of analyses. In the first one, phenotypic records from the selected and control lines (labeled vector y2) are excluded; only data from the base (labeled y1) contribute to inferences about response to selection. This is denoted the predictive analysis. Since the additive genetic models assume that parental additive genetic values combine additively to generate the additive genetic values in their offspring, and since the pedigree is known, selection response can be inferred using y1 only. In a second analysis, the complete data, y' = (y'1, y'2), are included to draw inferences; this is the complete data analysis.
Let
represent the vector of parameters for a given model and let p(
) denote the prior distribution of
. Then the posterior distribution of
, conditional on the complete data, is given by
![]() |
(12) |
where the equality arises because in the cases of the models considered here, y1 and y2 are conditionally independent, given
. The predictive analysis is associated with the first two terms in the right-hand side of (12). Taking logarithms, and differentiating twice with respect to
, yields
![]() |
(13) |
Equation 13 shows that the amount of information in the posterior distribution is equal to the sum of information contributed by the prior distribution, I(
), by the sampling model of the base, I(y1|
), and by the sampling model of data from selected and control lines only, I(y2|
). Information measures are approximated by the inverse of the respective variances, and in this way it can be quantified how different sources have contributed information to posterior inferences about selection response. This approximation to the concept of information is based on asymptotic results. These analyses are implemented with the hierarchical models only, and are in similar spirit to the two-stage regression study described above.
Model assessment:
Inferences about response to selection vary among the different models fitted. The problem of model comparison is addressed first, by comparing marginal likelihoods from different models (![]()
![]()
There is a voluminous literature on model determination, both from frequentist and Bayesian perspectives. A useful account of both schools of inference can be found in ![]()
![]()
![]()
The first of these quantities, p(y|Mi), is the marginal density of the data, or marginal likelihood, and is the basis for the computation of Bayes factors (i.e., ![]()
i and
j):
![]() |
(14) |
Here an MCMC estimator of (14) is used, proposed by ![]()
![]() |
(15) |
In (15),
(j)i represents the jth Gibbs sample from p(
i|y, Mi) and m is the total number of samples drawn. The marginal likelihood can be interpreted as the probability of obtaining the data actually observed under model i, calculated before any data became available (![]()
We are also interested in studying how the models predict specific features of the data. Since the scientific purpose of the present experiment was to generate a difference between the selected and the control lines, using the data from the base, the models were tested by comparing discrepancy measures involving observed data and simulated data z2 from the predictive density p(z2|z1, Mi). The vector z1 contains data from the base plus parity 1 records from the selected and the control lines. The latter were added to the base data to guarantee that all the parameters needed to sample from p(z2|z1, Mi) had been inferred from p(
i|z1, Mi). The vector z2 is a simulated value of all the parity 2 records of individuals in the selected and the control lines.
Simulation of data from p(z2|z1, Mi) can be easily accomplished and is based on the identity
![]() |
(16) |
Using the Gibbs sampler, draws
(j)i (j = 1, ... , m) were obtained from p(
i|z1, Mi) and draws z(j)2 were obtained from p(z2|
(j)i, Mi). The z(j)2's are approximate draws from p(z2|z1, Mi).
Having obtained draws from p(z2|z1, Mi), the posterior distributions of a number of discrepancy measures were chosen. The first one was the average difference between the simulated data z(j)2 and the observed data z2. This discrepancy measure is DT; the Monte Carlo estimator of DT is
![]() |
(17) |
where m is the length of the Gibbs chain, 1 is a vector of ones (of length n2), n2 is the number of elements in z2, and z(j)2 is the jth draw from the predictive density p(z2|z1, Mi), (j = 1, ... m).
The second discrepancy measure chosen was similar to (17), except that only the most extreme observations from z2 were kept (4 or less piglets born per litter and 16 or more piglets born per litter). We are interested in comparing the ability of the different models to predict extreme records. The Monte Carlo estimate of this discrepancy is labeled
EXTH for the extreme high and
EXTL for the extreme low observations.
The third and final discrepancy measure chosen involves the difference between the raw averages of parity 2 records from the selected and control lines. The observed average difference is DSC =
s,2 -
c,2 = 0.40 piglets per litter, where
s,2 is the observed raw average of the observed parity 2 records from the selected line, and
c,2 is the observed raw average of the parity 2 records from the control line. From the simulated data z(j)2 generated from the predictive density (16), values for
s,2 and
c,2 were computed and the Monte Carlo estimator of the difference between the averages of parity 2 records from the selected and control lines is labeled
SC.
Monte Carlo estimates of the average squared differences of the above discrepancies were also computed. The results do not contribute further insight and are therefore not presented.
| RESULTS |
|---|
Single-trait analysis based on the nonhierarchical model:
The posterior mean of response to selection for TNB computed using the NH model is 0.40 piglets per litter. The scale parameter
2ek' (X'X)-1 k of the t-distribution is equal to 0.0609 and the 95% probability interval of the response to selection is (-0.07, 0.89). Thus, 91% of the posterior distribution includes positive values of response to selection. This model understates uncertainty because no account is taken of the correlated structure in the data.
Single-trait analyses based on hierarchical models (additive genetic models):
Results from the predictive analyses are shown first. Estimates of posterior means of response to selection, heritability, repeatability, and proportion of variance due to herd-year-type of insemination effects, using models 1, 2, and 3, are shown in Table 2.
|
Differences among the predictions from model 1 and the other two models vary substantially. All models predict a higher response at generation 2, due to the fact that generation 2 females are offspring of the 36 highly selected sires.
Estimates of posterior means and standard deviations of response to selection for the three models, based on the complete data analyses, are shown in Table 3. Comparison of results in Table 2 and Table 3 shows that the complete data analyses produce less differences among models than the predictive analyses in the pattern of selection response. The posterior mean based on the complete data analyses for models 2 and 3 is smaller than for model 1 and is very similar to the one from the NH model. Inferences about response to selection on the basis of the hierarchical models are sharper than those on the basis of the NH model: the posterior variance is approximately six times smaller.
|
The estimates of posterior means in Table 3 produce a consistent picture of the pattern of response to selection across the three models. A comparison between model 1 and model 2 reveals that a smaller estimate is obtained when it is assumed that herd-year-type of insemination effects are a priori normally distributed (model 2). It is interesting to note this difference, especially since herd-year-type of insemination effects account for only 6% of the total variance in TNB, as we show below. Inferences under models 2 and 3 are similar, despite the fact that the posterior mean of heritability retrieved from model 2 (0.15) is 50% larger than the value assumed in the conditional model 3 (0.10).
All posterior distributions of response are symmetric (not shown). Estimates of the median and of the mode of the distributions agree (up to two decimal places) with estimates of the mean.
Shown in the last three columns of Table 2 and Table 3 are estimates of posterior means of heritability, repeatability, and of proportion of the variance due to herd-year-type of insemination effects. The numbers are very similar in both tables because a vast proportion of the information about these genetic parameters is contributed by the base. The values obtained for heritability and repeatability are a little higher for the model where it is assumed that herd-year-type of insemination effects are, a priori, uniformly distributed (model 1). The difference is due to a larger estimate of the posterior mean of the additive genetic variance (1.46 vs. 1.34) and to a smaller estimate of the posterior mean of the phenotypic variance (8.81 vs. 9.28). The figures of 0.17 for heritability and 0.24 for repeatability retrieved from model 1 are a little larger than the usual average value of 0.10 reported in the literature. Posterior distributions were highly symmetric (not shown; posterior means, modes, and medians agreed to two decimal places).
Quantifying statistical information about response to selection from different sources:
The breakdown of the relative contributions to information about selection response based on Equation 13, expressed as the percentage contributions of the base plus prior, and the data from selected and control lines, relative to the information in the posterior distribution, yield 69.4 and 30.6%, respectively. These numbers are very similar for all models, except for model 3, which assumes variances known. Here, the relative contributions are 81.0 and 19.0%. Clearly, for the hierarchical models, there is a very substantial part of the statistical information arising from the base and the prior distribution. However, within the set of prior distributions studied, the amount of statistical information contributed by these two sources is very similar. Notwithstanding, because this amount is very substantial, it is important to compare inferences drawn from the hierarchical models and from the less-parameterized NH model.
For heritability, the prior distribution contributes with 2.7% of the total information contained in the posterior distribution. The (complete) data are overwhelmingly informative relative to the prior.
Two-stage regression analysis of TNB on parental predicted additive genetic values:
The estimate of the regression coefficient in model 8 was 0.76, with a standard error of 0.26. This leads to a rejection of the hypothesis of no effect of pedigree index on TNB, with a P value of 0.2%. On the other hand, the analysis fails to reject the hypothesis that ß = 1. In other words, even though the estimated response is 76% of the response predicted by the model, the analysis does not generate statistical evidence to question the predictive ability of the model.
This is based on the analysis fitting model 1 to data set y1. Fitting models that treat herd-year-type of insemination as normally distributed yields estimates of the regression coefficient closer to 1. However, the conclusion is the same as above.
Analysis of PRBD:
On the probit scale, the estimated line difference (selected - control) was 0.132 with a standard error of 0.049, and the estimate of ß was 0.017 with a standard error of 0.007. On the probability scale, the line difference is 2.1% if the level of PRBD in the control is 8% and 2.5% if the level of PRBD in the control is 10%. If the level of PRBD is 8%, an increase of one piglet born increases PRBD to 8.25%, and if the level of PRBD is 10%, an increase of one piglet born increases PRBD to 10.3%.
Two-trait analysis of TNB and WTAB:
Only complete data analyses are discussed. Shown in Table 4 are estimates of means of posterior distributions of heritabilities, repeatabilities, and genetic correlations from the bivariate analyses. The parameters associated with TNB do not differ from the single-trait estimates. Both models produce almost identical results. The posterior standard deviation of the genetic correlation reflects a high degree of uncertainty. However, the posterior distribution assigns most of the probability mass to negative values of the genetic correlation; the 95% probability interval is (-0.46, -0.01).
|
As was the case with the single-trait analysis, the posterior mean of heritability is a little smaller for the model that assumes that herd-year-type of insemination effects for TNB are, a priori, normally distributed, and the posterior mean of the proportion of variance due to herd-year-type of insemination effects is
6%. A comparison of the posterior standard deviations in Table 5 with those of Table 4 reveals that the amount of information contributed by WTAB to inferences about TNB is undetectable.
|
Estimates of posterior means of selection response, using models 4 and 5, are shown in Table 5.
Estimates of response for TNB are almost identical to the results obtained from the analogous models in the single-trait analysis. As before, model 5, which assumes that for TNB, herd-year-type of insemination effects are, a priori, normally distributed, leads to slightly smaller estimates of posterior means.
Estimates of posterior means of correlated response in WTAB are negative. Posterior standard deviations are relatively large, indicating a considerable degree of uncertainty. Thus, for model 4, the Monte Carlo estimate of the 95% probability intervals for WTAB, associated with
1 and with
2, are (-33.3, 18.1) and (-44.8, 14.7), respectively. The probability that the correlated response (based on
1) is negative is 76%. The posterior distributions associated with TNB are sharper; the corresponding intervals for TNB are (0.24, 0.63) and (0.30, 0.77).
Model assessment:
Results from the model assessment study are shown in Table 6. Models 1 and 4 [those that considered that herd-year-type of insemination effects were, a priori, uniformly distributed (HYT-Un) in Table 6] produced almost indistinguishable results. So did models 2 and 5 [those that considered that herd-year-type of insemination effects were, a priori, normally distributed (HYT-Nor)]. Therefore, to economize on the presentation of results, these two sets of models were grouped. For TNB, the fact that there is no difference in the behavior of single-trait and two-trait models confirms that WTAB does not contribute detectable information to inferences about TNB.
|
The first two rows of Table 6 compare the two groups of hierarchical models. The difference between the natural logarithms of marginal likelihood (the logarithm of the Bayes factor) provides very strong evidence (using the standard in ![]()
![]()
Despite the rather marked difference in the marginal likelihoods, the difference between the two groups of models to predict specific features of the data via the discrepancy measures chosen is small. However, over all the criteria explored, the HYT-Nor group of models performed better than the HYT-Un group of models.
Conditionally on z1, both groups of hierarchical models tend to overpredict the difference between the means of parity 2 records from selected and control lines (the observed difference is 0.40 piglets per litter), with HYT-Nor behaving slightly better than HYT-Un. However, this observed difference is well within the range of posterior uncertainty: the Monte Carlo estimate of the 95% posterior interval of the marginal posterior distribution of DSC under HYT-Nor is (0.14, 1.12) and under HYT-Un is (0.20, 1.15).
The estimates of marginal likelihoods provide very strong evidence against the NH model. This is consistent with the observation that the posterior distribution of heritability assigns most of the probability mass away from zero. The NH model predicts the average observed difference of parity 2 records between selected and control lines very accurately. However, the posterior uncertainty of this predictive distribution is much larger than that obtained fitting the hierarchical models.
All the models implemented in this study underpredict extreme observations (last two columns of Table 6). A closer look at the extreme observations did not produce insight as the underlying cause for this result; errors in recording cannot be dismissed. The NH model performs worse than the hierarchical models; it performs poorly also, relative to the hierarchical models, in predicting individual records, on average.
A little digression is in place to describe the computation of the marginal likelihood associated with the NH model. Expression (14) shows that the Bayes factor is undetermined when p(
i|Mi) is improper, as defined in the NH model. To circumvent this problem,
(y|M = NH), where M = NH refers to the NH model, was computed with a model similar to the NH model, except that [b] was assigned a uniform distribution with support defined by upper and lower limits bmin = 0 and bmax = 20 (0 and 20 are vectors of order equal to the number of elements in b). Further,
2e was assigned a uniform prior with lower limit equal to 0 and upper limit equal to 20.0. Several other runs with varying limits for [b] and [
2e] gave virtually identical results.
Discrepancy measures derived from the predictive density p(z2|z1, Mi) were computed using the unmodified NH model, because p(z2|z1, Mi) is not affected by the impropriety of prior distributions. In fact, it can be shown that p(z2|z1, Mi) has the form of a multivariate t-distribution.
| DISCUSSION |
|---|
The majority of the analyses presented in this work were based on the Bayesian approach to inference. An important objective of this work is to illustrate how this approach allows the analyst to quantify probabilistically the uncertainty involved in the description of direct and correlated selection responses. This is done conditionally on the data at hand, without invoking the use of problematic mental constructs involving the distribution of an estimator in conceptual repetitions of the experiment under identical conditions. Bayesian methods used in conjunction with MCMC provide considerable freedom in building hierarchical models that describe the perceived underlying structures in the data. We have also illustrated how a Bayesian analysis used in conjunction with MCMC allows models to be compared using a variety of criteria and how a sensitivity analysisto the prior distribution or to the likelihoodcan be implemented.
Concerning TNB, the most parsimonious model, the NH model yielded an estimate of selection response of 0.40 piglets per litter. Point inferences using this simple model are appealing, because no assumptions are made about the underlying genetic model (![]()
1... -
2..., the difference between phenotypic means of the selected and control lines. Thus, using the NH model response is evaluated using data from these lines only.
This is in contrast to the more highly parameterized additive genetic models, which invoke an infinitesimal additive model as the genetic mechanism through which response to selection is generated. These models were fitted here using the Gibbs sampler. The properties of this approach to the analysis of selection experiments are described in ![]()
Posterior variances account for genetic drift; in a Bayesian setting, genetic drift can be interpreted as the relative loss of information about the genetic mean that evolves with time, due to the increased correlated structure in the data that builds as the experiment progresses. As a result of this, say, when the number of additive genetic values contributing to the (additive) genetic means of the different generations is the same, the posterior variance of the genetic mean increases with generations.
In the analyses based on hierarchical models, the whole data on which selection decisions were taken are included in the analysis. This holds for the analysis of direct response to selection for TNB and for the analysis of correlated response for WTAB. Therefore, joint and marginal posterior distributions are not affected by selection (![]()
Expressions of response to selection obtained via the additive genetic model contain information arising from the prior distribution, from what we have called the base and from animals that belonged to the selected and control lines in the common farm. The latter group comprised 1072 litter records, whereas the former contributed 8988 litter records. An attempt was made to quantify the information coming from these sources using the concept of observed statistical information and appealing to asymptotic results to estimate it. The amount of information [as defined in (13)] contributed by the prior distribution and by the 8988 litter records, relative to that from the selected and control lines, was almost 70%, and this number was consistent for all the prior distributions studied. In other words, had the lines not differed phenotypically in TNB, the additive genetic model-based inferences would have yielded a positive response. This is why it is important, especially from a genetic point of view, to contrast results obtained by fitting the additive genetic models and the NH model, which infers response to selection using the difference between the means of selected and control lines only. In our experiment, both types of models yielded similar inferences about selection response. This agreement is indicative that an additive genetic model is operationally adequate to describe the results of this experiment. Once this point is confirmed, there is little doubt that the hierarchical model yields a much sharper picture of the outcome of the selection experiment. The Monte Carlo estimate of the 95% probability interval of selection response for TNB based on the hierarchical model HYT-Nor is (0.23, 0.60); the corresponding interval from the NH model is (-0.07, 0.89).
The model assessment study shows that the most appropriate model among the ones fitted was the hierarchical model, which assumes that herd-year-type of insemination effects are, a priori, normally distributed. This conclusion is based on the marginal likelihood, which is a global test of the model, and on the ability of the models to predict specific features of the data. Due to the present experimental protocol, it was judged meaningful to study the latter, generating simulated data from posterior predictive distributions under a variety of models, conditional on data from base individuals. Replicated data could have easily been simulated from posterior predictive distributions under the various models, rather than conditioning on the base records. The general idea here is to study the ability of the models to generate replicated data that look similar to the data actually observed (![]()
Among the criteria used to compare models, the NH model performed worse than the hierarchical models in all cases, except in the ability to predict the difference between the raw means of selected and control lines among parity 2 records. This statement is based on the mean of the relevant posterior distribution as a criterion to decide what is "best." However, despite the fact that the hierarchical models yield a mean of the posterior predictive difference larger than the observed difference (0.65 vs. 0.40), the observed value of 0.40 piglets per litter is well within the range of posterior uncertainty. We note also that the numbers in Table 6 show that the variance of the posterior predictive difference is 1.75 times larger in the case of the NH model.
Litter size is known to be affected by inbreeding depressiona phenomenon that the additive genetic models implemented in this study are unable to explain. To account for inbreeding depression, some form of interaction effects within or between loci, or both, is required as part of the mechanism responsible for genetic variation. Another way of testing the operational adequacy of the additive genetic model is to compare predictions of crossbred performance under such a model, using data from purebred performance. ![]()
The analysis of TNB and WTAB with model 5 indicates that the posterior probability that the correlated response in WTAB is negative is
76%. The 95% posterior probability interval for the genetic correlation is (-0.46; -0.01), and the Monte Carlo estimate of the mean of the posterior distribution is -0.21. The conclusions based on these numbers are qualitatively in agreement with those of ![]()
![]()
![]()
![]()
Litter size has been part of the Danish breeding objective since 1992. The decision is fully justified on the grounds of the results of this experiment. However, it seems prudent to monitor birth weight and piglet viability, together with other components of reproduction, to avoid undesirable genetic changes.
| LITERATURE CITED |
|---|
ANDERSEN, S., B. PEDERSEN and A. VERNERSEN, 1998 Impact of nucleus selection at production level. Proceedings of the 6th World Congress on Genetics Applied to Livestock Production, Armidale, Australia. Vol. XXIII, pp. 515518.
ÁVALOS, E. and C. SMITH, 1987 Genetic improvement of litter size in pigs. Anim. Prod. 44:153-164.
BICHARD, M., and C. M. SEIDEL, 1983 Selection for reproductive performance in maternal lines of pigs. Proceedings of the 2nd World Congress on Genetics Applied to Livestock Production, Madrid, Spain. Vol. VIII, pp. 565569.
BICHARD, M., M. BOVEY, L. SEIDEL, P. DAVID and C. TOMKINS, 1983 New Developments in Scientific Pig Breeding, No. 3. Pig Improvement Company, Fyfield Wick, Abingdon, Oxfordshire, UK.
BLASCO, A., J. P. BIDANEL and C. HALEY, 1995 Genetics and neonatal survival, pp. 1738 The Neonatal Pig. Development and Survival, edited by M. A. VARLEY. CAB International, Wallingford, Oxon, UK.
BOX, G. E. P., and G. C. TIAO, 1973 Bayesian Inference in Statistical Analysis. John Wiley & Sons, New York.
CARLIN, B. P., and T. A. LOUIS, 1996 Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall, London.
CASEY, D., T. A. RATHJE and R. K. JOHNSON, 1994 Second thoughts on selection for components of reproduction in swine. Proceedings of the 5th World Congress on Genetics Applied to Livestock Production, Guelph, ON, Canada. Vol. XVII, pp. 315318.
ESTANY, J. and D. SORENSEN, 1995 Estimation of genetic parameters for litter size in Danish Landrace and Yorkshire pigs. Anim. Sci. 60:315-324.
GARCIA-CORTES, L. A. and D. SORENSEN, 1996 On a multivariate implementation of the Gibbs sampler. Genet. Sel. Evol. 28:121-126.
GEISSER, S., 1993 Predictive Inference: An Introduction. Chapman & Hall, London/New York.
GELFAND, A. E., 1996 Model determination using sampling-based methods, pp. 145161 in Markov Chain Monte Carlo in Practice, edited by W. R. WILKS, S. RICHARDSON and D. J. SPIEGELHALTER. Chapman & Hall, London/New York.
GELMAN, A. and D. B. RUBIN, 1992 Inference from iterative simulation using multiple sequences (with discussion). Stat. Sci. 7:467-511.
GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 1995 Bayesian Data Analysis. Chapman & Hall, London/New York.
GEYER, C. J., 1992 Practical Markov chain Monte-Carlo (with discussion). Stat. Sci. 7:473-511.
GIANOLA, D. and R. FERNANDO, 1986 Bayesian methods in animal breeding. J. Anim. Sci. 63:217-244
HALEY, C. S., E. ÁVALOS, and C. SMITH, 1988 Selection for litter size in pigs. Anim. Breed. Abstr. 56:317-332.
HENDERSON, C. R., 1973 Sire evaluation and genetic trends. Proceedings of the Animal Breeding and Genetics Symposium in Honor of Dr. Jay L. Lush, Champaign, IL, pp. 1041.
KASS, R. E. and A. E. RAFTERY, 1995 Bayes factors. J. Am. Stat. Assoc. 90:773-795.
KERR, J. C. and N. D. CAMERON, 1995 Reproductive performance of pigs selected for components of efficient lean growth. Anim. Sci. 60:281-290.
LEGAULT, C., 1985 Selection of breeds, strains and individual pigs for prolificacy. J. Reprod. Fertil. 33(Suppl.):151-166.
NEWTON, M. A. and A. E. RAFTERY, 1994 Approximate Bayesian inference by the weighted likelihood bootstrap. J. R. Stat. Soc. Ser. B 56:3-48.
O'HAGAN, A., 1994 Kendall's Advanced Theory of Statistics, Vol. 2B: Bayesian Inference. Edward Arnold, London.
RAFTERY, A. E., and S. M. LEWIS, 1993 How many iterations in the Gibbs sampler? pp. 763773 in Bayesian Statistics, Vol. 4, edited by J. M. BERNARDO, J. O. BERGER, A. P. DAWID and A. F. M. SMITH. Oxford University Press, Oxford.

















1) and 2 (