Preservation of Duplicate Genes by Complementary, Degenerative Mutations
Allan Force<sup>a</sup>, Michael Lynch<sup>a</sup>, F. Bryan Pickett<sup>b</sup>, Angel Amores<sup>a</sup>, Yi-lin Yan<sup>a</sup>, and John Postlethwait<sup>a</sup>
a Department of Biology, University of Oregon, Eugene, Oregon 97403
b Department of Biology, Loyola University of Chicago, Chicago, Illinois 60626
Corresponding author: Allan Force, Department of Biology, University of Oregon, Eugene, OR 97403. E-mail: force{at}oregon.uoregon.edu
Communicating editor: A. G. CLARK
| ABSTRACT |
|---|
The origin of organismal complexity is generally thought to be tightly coupled to the evolution of new gene functions arising subsequent to gene duplication. Under the classical model for the evolution of duplicate genes, one member of the duplicated pair usually degenerates within a few million years by accumulating deleterious mutations, while the other duplicate retains the original function. This model further predicts that on rare occasions, one duplicate may acquire a new adaptive function, resulting in the preservation of both members of the pair, one with the new function and the other retaining the old. However, empirical data suggest that a much greater proportion of gene duplicates is preserved than predicted by the classical model. Here we present a new conceptual framework for understanding the evolution of duplicate genes that may help explain this conundrum. Focusing on the regulatory complexity of eukaryotic genes, we show how complementary degenerative mutations in different regulatory elements of duplicated genes can facilitate the preservation of both duplicates, thereby increasing long-term opportunities for the evolution of new gene functions. The duplication-degeneration-complementation (DDC) model predicts that (1) degenerative mutations in regulatory elements can increase rather than reduce the probability of duplicate gene preservation and (2) the usual mechanism of duplicate gene preservation is the partitioning of ancestral functions rather than the evolution of new functions. We present several examples (including analysis of a new engrailed gene in zebrafish) that appear to be consistent with the DDC model, and we suggest several analytical and experimental approaches for determining whether the complementary loss of gene subfunctions or the acquisition of novel functions are likely to be the primary mechanisms for the preservation of gene duplicates. 
For a newly duplicated paralog, survival depends on the outcome of the race between entropic decay and chance acquisition of an advantageous regulatory mutation.
The genomes of most organisms contain multiple copies of genes that are closely related in structure and function. Such gene families can arise from tandem duplications, as in the case of the HOX, hemoglobin, and keratin clusters in animals, or from polyploidization events such as those presumed to have preceded the origin of vertebrates.
Here we discuss difficulties in the ability of the classical model to explain the preservation of gene duplicates in evolution and then propose a new model that can explain duplicate gene preservation by the fixation of degenerative mutations rather than by the fixation of new beneficial mutations. Next, we present several examples, including original data from the zebrafish engrailed genes, consistent with the new model. Finally, we suggest a series of experimental approaches for testing the new model.
| Problems with the classical model for the preservation of gene duplicates |
|---|
Under the simplest model for the fate of duplicate genes (the double-recessive model), the rate at which nonfunctional genes (genes that do not make a functional protein product) become fixed in populations is largely determined by random genetic drift and the null mutation rate ($u$), provided the product of the effective population size and $u$ is <0.01. Under these conditions, the frequency of individuals homozygous null at both duplicate loci is negligible, and null mutations behave essentially as neutral alleles. The probability that one copy will become nonfunctional is then approximately $1 - e^{-2ut}$, where $t$ is the number of generations since the two loci have been functionally diploid with respect to meiosis.
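As a quick numerical illustration of the double-recessive expectation, the following minimal Python sketch evaluates the approximation $1 - e^{-2ut}$. The function name and the parameter values in the usage note are illustrative assumptions, not values taken from the text.

```python
import math


def prob_nonfunctionalization(u: float, t: float) -> float:
    """Approximate probability that one duplicate copy has been silenced.

    u: null mutation rate per gene copy per generation.
    t: generations since the two loci became functionally diploid.
    Uses the double-recessive approximation 1 - exp(-2*u*t).
    """
    if u < 0 or t < 0:
        raise ValueError("u and t must be non-negative")
    return 1.0 - math.exp(-2.0 * u * t)
```

With a hypothetical rate of u = 1e-6 per generation, `prob_nonfunctionalization(1e-6, 1e6)` evaluates $1 - e^{-2}$, about 0.86, which illustrates how quickly the classical model expects one copy to be lost.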
Three general observations involving species derived from polyploidization events appear to contradict the rapid demise of gene duplicates predicted by the classical model. First, in numerous cases, the fraction of genes preserved is higher than predicted by the classical model. For example, in tetraploid fish lineages, 30-75% of the duplicate protein-coding genes have avoided nonfunctionalization for time spans on the order of 50 to 100 million yr.
Several attempts have been made to explain the high rate of duplicate gene preservation found by empirical observation. First, surviving duplicate loci in these taxa may have been preserved because new gene functions evolve at a much higher rate than predicted. We are not aware, however, of any convincing evidence that the majority of duplicate copies have acquired new functions that did not already exist in the ancestral genes.
| Gene structure and duplicate gene preservation |
|---|
An alternative reason for the failure of the classical model to explain the fates of most duplicate loci may be an overly simplistic view of gene structure. Although a general assumption of the classical model is that the properties of a gene may be adequately subsumed under a single function, genes often have several functions, each of which may be controlled by different DNA regulatory elements (for a number of examples, see reviews such as PIATIGORSKY and WISTOW 1991).
The widespread existence of complementation classes within eukaryotic gene loci indicates that gene expression patterns are typically controlled by multiple (and often modular and independent) regulatory regions associated with distinct protein-coding domains.
The model presented below outlines how degenerative mutations in regulatory subfunctions can facilitate the preservation of duplicate genes, in the absence of any positive selection for beneficial mutations, by partitioning the repertoire of gene expression patterns of ancestral alleles. This model is quite distinct from the classical model, under which degenerative mutations can only lead to gene loss and beneficial mutations are the only route to gene preservation.
| GENE PRESERVATION BY COMPLEMENTARY DEGENERATIVE MUTATIONS (SUBFUNCTIONALIZATION) |
|---|
Following a polyploidization event, genomic redundancies exist at several levels: duplicate chromosomes, duplicate genes, and duplicate regulatory regions driving gene expression. Each level of redundancy is subject to processes of mutation and random genetic drift, which can lead to loss of function by chromosome loss, gene inactivation, or loss of individual regulatory elements. If duplicate chromosomes lose different genes, then for the organism to remain viable, the two chromosomes must complement each other by jointly retaining functional copies of all genes present on the original ancestral chromosome. Likewise, if duplicate genes lose different regulatory subfunctions, then they must complement each other by jointly retaining the full set of subfunctions present in the original ancestral gene. We refer to the complementary loss of duplicate genetic elements by degenerative mutation as the duplication-degeneration-complementation (DDC) process. The unique feature that distinguishes the DDC process from the classical model is that degenerative mutations facilitate rather than hinder the preservation of duplicate functional genes. In the following discussion, we focus on duplications of entire chromosomes or genomes rather than tandem gene duplications because we wish to exclude for now complications caused by uncertainty about the extent of the original duplication and local homogenization events caused by unequal crossing over or gene conversions.
Under the general DDC model, the process of duplicate gene evolution occurs in two phases (Figure 1). During phase I, genes may experience one of three alternative fates, the first two of which correspond to outcomes under the classical model. First, one copy may incur a null mutation in the coding region, which subsequently drifts to fixation, leading to gene loss (nonfunctionalization). Nonfunctionalization can also occur if all of the regulatory regions of one duplicate are destroyed. Second, one copy may acquire a mutation conferring a new function, which becomes fixed through positive Darwinian selection (neofunctionalization). It is now thought that such mutations may often involve changes in regulatory regions. Third, the two copies may lose different regulatory subfunctions by degenerative mutation, so that both are required to retain the full set of ancestral subfunctions (subfunctionalization).
Subfunctionalization can occur by two different routes: qualitative or quantitative. Under qualitative subfunctionalization, which we model below and illustrate in Figure 1, one duplicate copy goes to fixation for a complete loss-of-subfunction mutation, and the second locus subsequently acquires a null mutation for a different subfunction. In contrast, quantitative subfunctionalization results from the fixation of reduction-of-expression mutations in both duplicates. In this case, once the total regulatory efficiency of a subfunction in both copies has been reduced to a threshold level determined by organismal requirements, any further degradation of the subfunction from either copy may be opposed by purifying selection.
Mutations that cause subfunctions to degrade may occur by several mechanisms, including nucleotide substitutions, deletions, inversions, insertions of transposable elements, slippage/replication errors, and unequal crossing over between repeated transcription-factor binding sites. Transposable elements may generate many subfunctional alleles. For example, P, copia, and gypsy elements are known to be mutagenic when they insert into 5' regions of Drosophila genes.
The probability of subfunctionalization:
The arguments presented above suggest that the DDC process could make both gene duplicates essential, but can it account for the high levels of duplicate gene preservation observed in polyploid lineages? Here we consider a simple model that suggests that, with reasonable parameter values, the DDC process can account for a significant fraction of preserved duplicate genes.
Consider the situation in which both members of a recently duplicated gene have $z$ independently mutable subfunctions, all of which are essential, at least in single copy, and all of which mutate at identical rates ($u_r$) to alleles lacking the relevant subfunction. Letting $u_c$ be the rate at which null mutations arise in the coding region, the null mutation rate for the locus is then $u_c + zu_r$ per gene copy. We assume that conditions are such that one functional allele (of four possible allele copies) of a given duplicated gene pair is sufficient for wild-type function (the double-recessive model), and that beneficial mutations are rare relative to degenerative mutations. Provided the product of population size and genic mutation rate is <0.01, degenerative mutations behave essentially as neutral alleles, as under the double-recessive model described earlier.
Now imagine that one of the duplicate gene copies experiences a fixation event. Assuming there is more than one subfunction, the probability that the gene survives this event (and does not become a pseudogene) is the total regulatory-region mutation rate divided by the total mutation rate for the two copies
$$\frac{2zu_r}{2(u_c + zu_r)} = \frac{zu_r}{u_c + zu_r}. \tag{1}$$
Following the elimination of one of the $z$ subfunctions from the first gene copy, the second copy must maintain this subfunction, because complete loss of an essential subfunction from both duplicates would be lethal. Thus, the permissible mutation rate for the second copy becomes $(z - 1)u_r$. Additional null mutations can occur in the remaining $(z - 1)$ regulatory subfunctions or in the coding region in the partially degraded first copy. Therefore, the total rate (summed over both copies) for the second mutational event is $u_c + 2(z - 1)u_r$. The probability of subfunctionalization upon this second event, $P_{S,2}$, is equal to the probability that the coding regions have survived the first hit multiplied by the probability that the second mutation occurs in a complementary subfunction in the second copy,
$$P_{S,2} = \left(\frac{zu_r}{u_c + zu_r}\right)\left(\frac{(z - 1)u_r}{u_c + 2(z - 1)u_r}\right). \tag{2}$$
Following this logic, it can be seen that $(z - 1)$ distinct series of mutational events can lead to duplicate-gene preservation by subfunctionalization: the first two null mutations in regulatory regions may occur on different gene copies, two may initially occur on the same copy followed by a third on the second copy, three may initially occur on the same copy followed by a fourth on the second copy, and so on. The probability of each of these additional pathways to subfunctionalization, i.e., $(i - 1)$ consecutive regulatory-region null mutations on one copy followed by one on the other, is given by the generalization of Equation 2,
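$$P_{S,i} = \left(\frac{zu_r}{u_c + zu_r}\right)\prod_{j=1}^{i-1}\left(\frac{(z - j)u_r}{u_c + 2(z - j)u_r}\right). \tag{3}$$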
The total probability of gene preservation by subfunctionalization, $P_S$, is obtained by summing this quantity over $i = 2$ to $z$,
$$P_S = \sum_{i=2}^{z} P_{S,i}. \tag{4}$$
The DDC process leads to subfunctionalization with high probability given reasonable parameter values. For example, if there are five subfunctions and the mutation rate per subfunction is 10% of the coding-region null rate, then the probability of subfunctionalization is ~10%, and if the mutation rate per subfunction is 30% of the null rate, then the probability of subfunctionalization is ~30% (Figure 2). Generally, if the total rate of subfunctional mutations ($zu_r$) exceeds the null rate in the coding region by more than approximately fourfold, then the probability of gene preservation by subfunctionalization exceeds 50%. The complexity and size of regulatory regions of eukaryotic genes suggest that this condition may often be met.
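The calculation described in this section can be sketched in a few lines of Python. The sketch assumes the verbal description above: the first fixed mutation hits a regulatory region with probability $zu_r/(u_c + zu_r)$, and after $j$ subfunctions have been lost from one copy, each further permissible step has probability $(z - j)u_r/[u_c + 2(z - j)u_r]$. The function names are hypothetical, and the rates are expressed in arbitrary units relative to $u_c$.

```python
def path_probabilities(z: int, u_r: float, u_c: float) -> list[float]:
    """Return [P_S,2, ..., P_S,z]: the probability of each pathway in which
    (i - 1) regulatory hits on one copy are followed by one on the other."""
    if z < 2:
        raise ValueError("subfunctionalization needs at least two subfunctions")
    # Probability that the first fixed mutation is regulatory, not a coding null.
    p = z * u_r / (u_c + z * u_r)
    probs = []
    for j in range(1, z):
        remaining = (z - j) * u_r  # permissible regulatory rate per copy
        p *= remaining / (u_c + 2 * remaining)
        probs.append(p)
    return probs


def prob_subfunctionalization(z: int, u_r: float, u_c: float) -> float:
    """Total probability of preservation, summed over all pathways."""
    return sum(path_probabilities(z, u_r, u_c))
```

Under these assumptions, `prob_subfunctionalization(5, 0.1, 1.0)` comes out near 0.09 and `prob_subfunctionalization(5, 0.3, 1.0)` near 0.30, in line with the five-subfunction examples quoted above.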
Time scales for subfunctionalization and resolution:
Using the model presented above, the mean time to gene preservation conditional on its actual occurrence can be obtained by treating the times to mutational events as geometrically distributed variables. The rate of occurrence of an initial regulatory-region null mutation is $2zu_r$, because each of the two copies contains $z$ mutational targets. As noted above, subsequent to this initial event, zero to $(z - 2)$ additional degenerative mutations may be incurred by the first-hit copy before the first mutation on the opposite copy. The mean time to subfunctionalization conditional on the occurrence of $(i - 1)$ consecutive regulatory-region null mutations on one copy followed by one on the other is then
$$\bar{t}_{S,i} = \frac{1}{2zu_r} + \sum_{j=1}^{i-1}\frac{1}{(z - j)u_r}. \tag{5a}$$
The mean time to subfunctionalization is then
$$\bar{t}_S = \frac{1}{P_S}\sum_{i=2}^{z} P_{S,i}\,\bar{t}_{S,i}. \tag{5b}$$
As in the classical model, these expressions indicate that the fates of duplicate genes are generally determined in a relatively short period (on an evolutionary time scale; Figure 3A). For example, if $u_r = 10^{-7}$/yr, $\bar{t}_S$ is on the order of 4 million yr or less provided the number of regulatory regions is greater than five, and even with $z < 5$ it does not exceed 12.5 million yr. Thus, under the DDC model, most duplicate genes that are destined to be preserved by subfunctionalization are expected to become so within a few million years. With a regulatory-region mutation rate $x$ times that in the figure, the mean time to subfunctionalization would be divided by $x$.
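The time scale quoted above can be explored with a short Python sketch. It rests on two assumptions that should be read as such: the first regulatory hit is taken to arrive at rate $2zu_r$, as stated in the text, and each later step, after $j$ subfunctions have been lost from one copy, is taken to have a mean waiting time of $1/[(z - j)u_r]$, a form inferred here because it reproduces the 12.5 million yr figure for $z = 2$. Pathways are weighted by the pathway probabilities described in the preceding section; the function name is hypothetical.

```python
def mean_time_to_subfunctionalization(z: int, u_r: float, u_c: float) -> float:
    """Mean time to preservation, conditional on preservation occurring.

    Each pathway (i - 1 regulatory hits on one copy, then one on the other)
    is weighted by its probability relative to the total.
    """
    if z < 2:
        raise ValueError("subfunctionalization needs at least two subfunctions")
    path_prob = z * u_r / (u_c + z * u_r)  # first hit is regulatory
    wait = 1.0 / (2 * z * u_r)             # assumed mean wait for the first hit
    weighted_sum = 0.0
    total_prob = 0.0
    for j in range(1, z):
        remaining = (z - j) * u_r
        path_prob *= remaining / (u_c + 2 * remaining)
        wait += 1.0 / remaining            # assumed mean wait for this step
        weighted_sum += path_prob * wait
        total_prob += path_prob
    return weighted_sum / total_prob
```

With $u_r = 10^{-7}$/yr and $z = 2$, this sketch returns 12.5 million yr regardless of $u_c$, because only one pathway exists. With $z = 6$ and a hypothetical $u_c = 10^{-6}$/yr, it returns a value below 4 million yr.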
Unless there are only two initial regulatory regions, some regulatory regions (as many as z - 2) will likely remain to be resolved over evolutionary time after the initial subfunctionalization event. The fraction of regulatory regions that is expected to be resolved at the time of gene preservation by subfunctionalization is
$$\frac{1}{zP_S}\sum_{i=2}^{z} i\,P_{S,i}. \tag{6}$$
This fraction depends only weakly on the ratio of coding-region to regulatory-region mutation rates, and is <0.5 if the number of regulatory regions exceeds five (Figure 3B). Thus, we anticipate that after the preservation of duplicate genes by the DDC process, a substantial fraction of regulatory subfunctions will typically remain to be resolved in phase II. Assuming that the occurrence of mutations that destroy regulatory regions is a Poisson process, for any site that is unresolved at the time of gene preservation, the probability that it is still unresolved after $t$ further time units is simply $P_0(t) = e^{-2tu_r}$. The number of unresolved sites at time $t$ then follows a binomial distribution with parameter $P_0(t)$.
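A minimal Python sketch of these two quantities follows. It assumes that a pathway with $(i - 1)$ regulatory hits on one copy followed by one on the other resolves $i$ of the $z$ subfunctions, and it weights pathways by the pathway probabilities described earlier. The function names are hypothetical.

```python
import math


def expected_fraction_resolved(z: int, u_r: float, u_c: float) -> float:
    """Expected fraction of the z subfunctions already resolved at the
    moment of preservation, averaged over pathways i = 2..z."""
    if z < 2:
        raise ValueError("subfunctionalization needs at least two subfunctions")
    path_prob = z * u_r / (u_c + z * u_r)
    weighted = 0.0
    total = 0.0
    for i in range(2, z + 1):
        remaining = (z - (i - 1)) * u_r
        path_prob *= remaining / (u_c + 2 * remaining)
        weighted += i * path_prob  # pathway i resolves i subfunctions
        total += path_prob
    return weighted / (z * total)


def prob_still_unresolved(t: float, u_r: float) -> float:
    """P0(t) = exp(-2*t*u_r): chance a site unresolved at preservation
    is still present in both copies after t further time units."""
    return math.exp(-2.0 * t * u_r)
```

For $z = 2$ the fraction is exactly 1, since preservation itself resolves both subfunctions. For $z = 6$ the sketch gives a value below 0.5 even when $u_c$ is set to zero, which is the case most favorable to long pathways.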
The molecular nature of subfunctions and the preservation of genetic redundancy:
The preceding theory assumes that individual regulatory subfunctions are independently mutable, with single mutations being sufficient to eliminate a subfunction. Under this simple scenario, the various subfunctions within duplicate genes preserved by the DDC process are expected to be resolved randomly, with each copy retaining about half of its subfunctions within the limits of binomial sampling. However, while we define subfunctions by their mutational properties such that they are members of distinct complementation classes, this definition does not describe how such subfunctions are arranged on the DNA molecule. Regulatory regions for different subfunctions are often partially overlapping or embedded, leading to the situation where the number of expression domains exceeds the number of complementation groups.
Complexities involving the physical arrangement of regulatory regions on the DNA may help explain, without invoking positive selection, how the same expression domains may be preserved by both gene duplicates.
The topology of regulatory regions may also help explain the unidirectional and bidirectional divergence that has been observed among gene duplicates.
DDC and dosage effects:
In some situations, gene dosage requirements might increase the probability that both gene duplicates are preserved. The theoretical model developed above assumes that for each subfunction, activity of only one of the four alleles of the two gene duplicates is sufficient for survival. It is possible, however, that after gene duplication some subfunctions must remain intact in more than one of the four alleles to ensure optimal fitness. For instance, consider a gene with three separate subfunctions. After duplication, the first subfunction may be sufficient for survival if intact in a single allele, the second subfunction may be sufficient in two alleles, but the third subfunction might be required in three of the four alleles. In such a case, the first and second subfunctions could be resolved to either duplicate gene by the principles of DDC. The third subfunction, however, would have to be maintained by both gene duplicates to have at least three active alleles. In such cases, dosage requirements would provide the initial gene preservation mechanism, but complementary loss of other subfunctions or acquisition of a new function could reinforce the initial preservation event. Note that this type of dosage effect provides an alternative mechanism to shared embedded elements (Figure 4) for retaining a specific expression domain by both gene duplicates.
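The three-subfunction example can be expressed as a small Python sketch. It assumes a diploid organism, so that each duplicate locus contributes two alleles and the pair contributes four. The constant and function names are hypothetical.

```python
ALLELES_PER_LOCUS = 2  # diploid: two duplicate loci give four alleles in total


def must_be_kept_by_both(required_active_alleles: int) -> bool:
    """True if a subfunction needs more active alleles than one locus can
    supply, so it cannot be resolved to a single duplicate."""
    if not 1 <= required_active_alleles <= 2 * ALLELES_PER_LOCUS:
        raise ValueError("required alleles must be between 1 and 4")
    return required_active_alleles > ALLELES_PER_LOCUS


# The example from the text: subfunctions needing 1, 2, and 3 active alleles.
requirements = {"subfunction 1": 1, "subfunction 2": 2, "subfunction 3": 3}
outcome = {name: must_be_kept_by_both(n) for name, n in requirements.items()}
```

Here `outcome` marks only the third subfunction as one that both duplicates must retain, matching the verbal argument above. The first two remain free to be partitioned by the DDC process.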
In some cases, gene dosage requirements might cause the partitioning of subfunctions to be favored by positive selection. For example, consider a situation in which activity of all four alleles of a duplicated gene pair in a certain tissue or at a certain time is deleterious. In such a case, the fixation of a nonfunctional or subfunctional allele might be accelerated by positive selection. Note that a case like this differs from the formal model proposed above, which assumes that drift and purifying selection are usually sufficient for the fixation of subfunctional alleles. In these cases, mutations of subfunctions that would be deleterious in the single-copy genes before duplication would become beneficial after duplication. Because this might increase the rate of fixation of subfunctional alleles while simultaneously increasing the rate of fixation of nonfunctional alleles, the overall effect on the probability of duplicate gene preservation is not clear. Future experimental and modeling work may help to define these more complex interactions between gene dosage, population size, the mutation rates to subfunctional, coding null, and neofunctional alleles, and the roles of purifying and positive selection in duplicate gene preservation. It is hoped that the near-neutral DDC model provided here can act as a null hypothesis for testing these and other more complex possibilities.
Possible examples of the DDC process:
Here we present several possible examples of the general DDC process and the way in which it can account for observed patterns of duplicate gene expression. Additionally, we suggest experiments that could falsify the DDC model as an explanation for these specific cases. We consider here a pair of duplicate engrailed genes in zebrafish, the ZAG1 and ZMM2 gene pair in maize, and the Hoxa1 and Hoxb1 genes in mouse. Analysis of such cases must identify gene duplicates, determine whether they arose by tandem duplication or by duplication of large chromosome regions, infer ancestral functions of the unduplicated parent gene, and finally determine whether the distribution of gene functions between duplicated genes can be explained by the complementary sharing of ancestral functions or only by the acquisition of novel functions.
Engrailed genes in zebrafish:
Tetrapods have two members of the engrailed gene family, called En1 and En2.
To determine whether the zebrafish eng gene pairs originated in chromosome-scale duplications or local tandem duplications, we mapped the eng1b locus and compared it to the genome locations of other engrailed genes in mammals and zebrafish.
Note that two independent data sets, gene phylogenies based on sequence information and chromosomal locations based on genetic mapping data, concur that the tetrapod En1 gene is an outgroup to the two zebrafish duplicates eng1/eng1b. Therefore, En1 can be used as an outgroup to infer the ancestral shared expression domains of eng1 and eng1b.
Although the expression patterns of engrailed genes are complex, here we focus on expression patterns of the engrailed-1 gene family in two groups of cells. Zebrafish eng1 is expressed in the pectoral appendage bud, while eng1b is expressed in a specific set of neurons in the hindbrain/spinal cord (Figure 7). Is either of these expression domains due to neofunctionalization? Or were both present in the progenitor gene before duplication and one domain lost by each duplicate? Examining the most recent unduplicated outgroup would allow one to infer the state of the ancestral gene. In the absence of information from the most recent outgroup, tetrapods can provide appropriate data. In mouse and chicken, En1 is expressed in both expression domains: the developing pectoral appendage bud and specific neurons of the hindbrain and spinal cord.
Is this a case of gene preservation by subfunctionalization? These data suggest complementary loss of expression, which is consistent with the DDC model. A definitive test of this hypothesis will require identification of the regulatory elements responsible for these expression domains in zebrafish, fish that share the eng1/eng1b duplication, fish that diverged from the lineage giving rise to zebrafish before the duplication event, and tetrapods, including mouse and chicken. In zebrafish, there appear to be many examples similar to engrailed, including duplicates of msx genes.
ZAG1 and ZMM2 in maize:
As a second possible example of the preservation of gene duplicates by subfunctionalization, consider the duplicate genes known as ZAG1 and ZMM2 in the maize genome, which originated via an allotetraploidization event between two closely related grasses about 11 mya.
The DDC model can explain these data by suggesting that the ancestral genes to ZAG1 and ZMM2 were both expressed strongly in the developing stamens and carpels in the allotetraploid ancestor of maize shortly after the polyploidization event, as AGAMOUS and PLENA are today in Arabidopsis and Antirrhinum. We hypothesize that reciprocal regulatory mutations in the ZAG1/ZMM2 duplicates complemented each other, thereby preserving both genes that exist in today's maize. After the allotetraploidization event, degenerative regulatory mutations decreased the expression of ZAG1 in stamens but not in carpels, while other regulatory mutations eliminated the expression of ZMM2 in the early carpel but not in the stamens. If this hypothesis is correct, then maize plants doubly homozygous for ZMM2 and ZAG1 null mutations should phenocopy AGAMOUS and PLENA mutants in Arabidopsis and Antirrhinum. In addition, molecular analysis of the promoters of this gene family in maize, its close relative sorghum, Arabidopsis, and Antirrhinum should identify conserved regulatory elements that became partitioned after gene duplication.
Hoxa1 and Hoxb1 in mouse:
A third possible example of DDC in duplicate genes involves the Hoxa1 and Hoxb1 genes in mouse (Figure 9). These genes reside in duplicate Hox clusters, groups of closely linked genes that encode a family of DNA-binding proteins that specifies fate along the anterior-posterior axis of bilaterian animals.
Hoxb1 and Hoxa1 cooperate to pattern anterior ectodermal and mesodermal derivatives of vertebrate embryos (Figure 9). Hoxa1 is important for segment identity in rhombomere 5 (r5) of the hindbrain and for the development of the glossopharyngeal nerve as well as more caudal rhombomeres.
In addition to the independent roles of Hoxa1 and Hoxb1 just discussed, these two genes have early redundant roles, including expression in broadly overlapping territories and activation of some of the same downstream targets.
What experiments can distinguish whether the current subfunctions of murine Hoxa1 and Hoxb1 duplicates arose by subfunctionalization, neofunctionalization, or some other model? A critical issue is whether the r4 and r5 enhancers were present in the ancestral gene before duplication. One can infer the state of the ancestral gene by examining an outgroup that diverged from the lineage of tetrapods just before the duplication event that produced the Hoxa1 and Hoxb1 genes. The lamprey might be such an outgroup.
In summary, the examples discussed above provide data that are consistent with the DDC model, and in some cases are more readily explained by the DDC model than the classical model. Further experiments need to be done to firmly establish which route of duplicate gene preservation was employed in each case.
Testing the DDC and classical models:
As we noted earlier, even the most basic premise of the classical model of duplicate gene evolution (that gene duplicates are preserved only by the evolution of new functions) has never been tested. Because deleterious mutations are much more common than beneficial mutations, we believe that the DDC process provides a reasonable (and parsimonious) alternative explanation for at least some cases of long-term preservation of gene duplicates. Unlike the classical model, the mutational mechanisms that lead to gene preservation by DDC are distinct from those responsible for the origin of new gene functions. On the other hand, by expanding the time period for which genes are exposed to selection, the preservation of duplicates by the DDC process facilitates subsequent opportunity for the evolution of new functions. If the evolution of new gene functions is the only mechanism of duplicate gene preservation, then it should be possible to empirically reject our alternative subfunctionalization hypothesis. We now consider some potentially fruitful avenues for future research.
- Phylogenetic analysis: The subfunctionalization model predicts that the sum of subfunctions in preserved gene duplicates will be equal to the total subfunctions in the ancestral gene. This prediction is clearly distinct from the position of the classical model, which suggests that gene preservation is dependent upon the acquisition of new cis-regulatory regions driving novel expression patterns during development (SIDOW 1996; COOKE et al. 1997). To test these alternative hypotheses, the evolutionary time frame must be short enough to preclude the possibility that genes initially preserved by subfunctionalization will have also subsequently acquired new functions. This then requires the analysis of recently preserved duplicates on a cladogram that also allows the inference of ancestral expression patterns from appropriate outgroups. For example, to explain the derivation of the triplicate Drosophila genes paired, gooseberry, and gooseberry-neuro, which have conserved protein function but distinct developmental functions, LI and NOLL (1994) suggested that following duplication "genes may acquire new functions by changes in their regulatory regions generating an altered expression" without considering the possibility that these three genes simply result from the differential loss of subsets of the expression domains of the ancestral gene. A phylogenetic analysis of closely related outgroup species with single gene copies would distinguish between the classical and DDC models.
- Mutation rate to subfunctional alleles: The simple subfunctionalization model discussed here requires that the total subfunction mutation rate relative to the total null mutation rate be on the order of 0.3 or larger to achieve at least a 10% probability of duplicate gene preservation, and on the order of 0.7 or larger to achieve at least a 50% probability of gene preservation (Table 1). If the relative rate of formation of subfunctional alleles is not within this range, then subfunctionalization as modeled is unlikely to be a major mechanism of duplicate gene preservation. Experiments must be designed to measure the mutation rate to subfunctional, neofunctional, and nonfunctional alleles to test critically the various models. If empirical studies demonstrate that the rate of mutation to subfunctional alleles is too low relative to the rate of coding null mutations, then this particular subfunctionalization model is falsified.
Table 1. Critical values of $zu_r/(u_c + zu_r)$ required for specific probabilities of subfunctionalization ($P_S$), given for different numbers of subfunctions ($z$), obtained from Equations 3 and 4.
- Regulatory region complexity: The subfunctionalization model predicts that the probability of gene preservation should be higher for more complex genes (with larger numbers of subfunctions), particularly for genes in which the regulatory regions for subfunctions are spatially independent on the DNA, because more complex genes provide more targets for subfunction mutations. Testing this prediction requires the molecular characterization of regulatory regions for various genes in species with duplicated genes and comparison with closely related species without the duplications.
- Multiple polyploidization events: The subfunctionalization model makes specific predictions about the probability of duplicate gene preservation after closely spaced polyploidization events, such as those thought to have occurred early in vertebrate phylogeny (HOLLAND et al. 1994; HOLLAND and GARCIA-FERNANDEZ 1996; NADEAU and SANKOFF 1997; AMORES et al. 1998). The DDC model suggests that duplicate loci preserved by subfunctionalization after the first polyploidization event (Figure 1, right) will have fewer subfunctions than the parent locus before duplication (Figure 1, top). Therefore, because theory predicts that the likelihood of preservation depends on the number of subfunctions (Figure 2), the probability that both duplicate loci will be preserved after the second round of duplication is diminished relative to the first polyploidization event. If, on the other hand, a single duplicate survives the first round with all of the original subfunctions intact (Figure 1, left), then after the second round of duplication, the probability of duplicate preservation will be approximately the same as in the first event. If the level of gene preservation does not change between polyploidization events, then subfunctionalization is an unlikely explanation for the preservation of duplicate genes. Data from the HOX complexes of vertebrates suggest that the level of duplicate preservation does indeed decline with subsequent duplication events. Assuming the (AB)(CD) model of HOX cluster duplication (KAPPEN and RUDDLE 1993; ZHANG and NEI 1996; AMORES et al. 1998















