Genetics, Vol. 162, 533-542, October 2002, Copyright © 2002

Molecular Basis of Adaptive Convergence in Experimental Populations of RNA Viruses

José M. Cuevasa, Santiago F. Elenaa, and Andrés Moyaa
a Institut Cavanilles de Biodiversitat i Biologia Evolutiva and Departament de Genètica, Universitat de València, 46071 València, Spain

Corresponding author: Andrés Moya, Edifici d'Instituts d'Investigació, Campus de Burjassot-Paterna, Universitat de València, Apartat 22085, 46071 València, Spain., andres.moya{at}uv.es (E-mail)

Communicating editor: H. OCHMAN


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Characterizing the molecular basis of adaptation is one of the most important goals in modern evolutionary genetics. Here, we report a full-genome sequence analysis of 21 independent populations of vesicular stomatitis ribovirus evolved on the same cell type but under different demographic regimes. Each demographic regime differed in the effective viral population size. Evolutionary convergences are widespread both at synonymous and nonsynonymous replacements as well as in an intergenic region. We also found evidence for epistasis among sites of the same and different loci. We explain convergences as the consequence of four factors: (1) environmental homogeneity that supposes an identical challenge for each population, (2) structural constraints within the genome, (3) epistatic interactions among sites that create the observed pattern of covariation, and (4) the phenomenon of clonal interference among competing genotypes carrying different beneficial mutations. Using these convergences, we have been able to estimate the fitness contribution of the identified mutations and epistatic groups. Keeping in mind statistical uncertainties, these estimates suggest that along with several beneficial mutations of major effect, many other mutations got fixed as part of a group of epistatic mutations.


ONE of the main tasks in evolutionary biology is to measure the forces operating in populations, not only by statistical inference, but by bringing together evidences from population genetics and functional biology (LEWONTIN 2000 Down). To show which molecular changes are responsible for differences in fitness, we need an appropriate experimental setup, i.e., an experimental population of sufficient size and appropriate time periods where small selection intensities can be detected. RNA viruses meet both conditions. They reach population sizes as high as 1010 after just a few hours of replication in fully susceptible cell cultures. Their evolution can be followed by sampling evolving populations every day or even more often if desired.

Here, we explore the molecular basis of viral adaptation to cell culture by characterizing the molecular changes that occurred during the experimental evolution of two competing populations of the ribovirus vesicular stomatits virus (VSV; MIRALLES et al. 1999 Down). In previous work, we confirmed the role played by clonal interference in the evolution of VSV (MIRALLES et al. 1999 Down, MIRALLES et al. 2000 Down). Clonal interference is a phenomenon characteristic of rapidly evolving asexual populations (GERRISH and LENSKI 1998 Down). As a consequence of high mutation rates and large population sizes, beneficial mutations arise in different coexisting lineages. The lack of recombination precludes the possibility that two beneficial mutations combine in the same genotype and, therefore, coexisting beneficial mutations compete with each other to become fixed in the population. Consequently, only the best possible candidate will become fixed and all alternative beneficial mutations of smaller effect will be outcompeted. The larger the population size or the mutation rate, the more beneficial mutations may coexist and, thus, the more intense clonal interference will be. In our previous work, we tested two specific predictions of this model: (1) A positive correlation exists between the magnitude of the fixed beneficial effect and the population size (MIRALLES et al. 1999 Down) and (2) the rate of adaptation is not linearly proportional to the availability of beneficial mutations but a diminishing-returns effect exists (MIRALLES et al. 1999 Down, MIRALLES et al. 2000 Down).

Our results are interesting in two ways. First, they extend previous observations, on other viral systems, about the extent of convergent evolution under constant environmental conditions. Convergent evolution indicates the existence of constraints in RNA virus evolution. Second, we show how epistasis among nucleotide sites modulates genomic divergence of experimental viral populations. We made an attempt to assign fitness effects to each observed substitution. As a result, we have been able to identify five statistically significant beneficial changes. Four of these changes arose at the protein involved in recognition of the cellular receptor. The fifth positively selected amino acid belongs to the major component of the RNA polymerase. Despite the evidence for multiple mutations occurring on each lineage, our results are still consistent with one of the major predictions of the clonal interference model: the positive correlation between population size and the magnitude of fixed beneficial effects.

Epistasis is a key factor in evolutionary genetics. The existence and abundance of epistasis are important for many different evolutionary theories, including those seeking an explanation for the origin and maintenance of sexual reproduction (KONDRASHOV 1985 Down), diploidy (KONDRASHOV and CROW 1991 Down), dominance (PECK and WAXMAN 2000 Down), reproductive isolation (COYNE 1992 Down), segmentation of viral genomes (CHAO 1988 Down), and even the shifting-balance process (WRIGHT 1931 Down). Therefore, one of the most interesting questions for evolutionary biologists is how common is epistasis. However, the role of epistasis on the evolutionary dynamics of RNA viruses has been poorly explored from an experimental standpoint, despite the fact that their compacted genome seems prone to strong interactions among gene products as well as at the level of the RNA secondary structure. So far, only the extent of synergistic epistasis among deleterious mutations accumulated after consecutive bottlenecks of foot-and-mouth disease virus has been explored (ELENA 1999 Down).


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Overview of the evolution experiments and fitness determinations:
A detailed description of the experimental methods is provided in MIRALLES et al. 1999 Down. In brief, we mixed, in equal proportions, two variants of VSV that differed only in the presence of a neutral marker, implying that they should coexist in a stable state until a successful beneficial mutation appeared in one of them (MIRALLES et al. 1999 Down). These two clones were a surrogate wild type and an I1 mAb-resistant mutant (MARM C). Seven different evolutionary regimes were designed, each one differing from the others in effective population size, Ne, ranging between ~102 and ~108 viral particles (see MIRALLES et al. 1999 Down for an operative definition of Ne). Each regime was independently replicated five times, for a total of 35 experimental lines. Each mixture was kept under the appropriate batch transfer conditions until one of the two variants became fixed. Our criterion for considering that one allele had fixed was to obtain, during three consecutive days, no evidence for the presence of the opposite genetic marker.

Then, the winner variant was placed in a head-to-head competition with its nonevolved counterpart to estimate the fitness effect (W) associated with the beneficial mutation that became fixed. Competition experiments are described elsewhere (HOLLAND et al. 1991 Down; MIRALLES et al. 1999 Down, MIRALLES et al. 2000 Down). In brief, each evolved population was mixed, in three independent test tubes, with a known amount of the ancestral strain bearing the opposite phenotypic neutral marker, and the initial ratio for each replicate mixture, R0, was determined by performing plaque assays with and without I1 mAb in the agarose overlay medium. Each competition mixture was transferred serially during a sufficient number of passages to obtain good estimates of relative fitness as follows. At each transfer, the resulting virus was diluted and used to initiate the next competition passage by infection of a fresh host cell monolayer. The ratio of each genotype was determined by plating with and without I1 mAb in the overlay agarose medium at different transfers. These determinations gave the proportion of each competitor at time t, Rt. Selection rate constants (S, with per-day units) were estimated from the linear regression ln Rt = ln R0 + St. MIRALLES et al. 1999 Down reported fitness as selection rate constants, whereas the values shown here (fourth column in Fig 1) are dimensionless fitness. To transform a selection rate constant into a true fitness, it is necessary to take into account the dilution factor employed at each competition day (D = 104) and how many days of competition occurred (t), and then apply the expression (LENSKI et al. 1994 Down).



View larger version (36K):
In this window
In a new window
Download PPT slide
 
Figure 1. Molecular changes characterized for independent lineages evolved under different demographic conditions (Ne). The length of each gene (bp) is indicated in parentheses, as well as the type and position (with respect to the total genome length) of the nucleotide change detected. Wherein nonsynonymous changes were detected, the amino acid replacements are also indicated (numbered within the protein). Red boxes represent nonsynonymous changes; green boxes, synonymous changes; and yellow boxes, changes in noncoding regions. Evol stands for the experimental block at which the corresponding population was evolved; Extr represents the experimental block at which RNA was purified from each population.

Twenty-one of the evolved mixed populations were chosen for the whole-genome consensus sequencing analysis reported in this study.

RNA sequences determination:
Three end-point-evolved populations from each of the seven Ne were randomly chosen for sequencing, as well as the two ancestral clones. Full-genome (11,161 bp) sequencing was done according to standard techniques (BRACHO et al. 1998 Down). For each evolved population, we determined the consensus sequence of the whole population. RNA was extracted by a guanidinium thiocyanate-phenol-chloroform method, and it was reverse-transcribed and amplified by reverse transcription (RT)-PCR with specific primers. PCR products were directly sequenced in an ABI 310 automated sequencer following manufacturer's indications. Sequences were processed and analyzed with the STADEN software package. We used 14 pairs of primers to amplify the whole genome (available upon request). The resulting fragments were overlapping, facilitating the task of assembling fragments.

The sequencing of both ancestral clones (MARM C and wild type) confirmed that they differed only in the Asp259 -> Ala substitution in the G surface protein that confers the neutral MARM phenotype.

Minimizing the risk of contaminations:
To minimize the possibility of cross-contamination among evolving populations, we took four precautions:

  1. The five replicates of the evolution experiments were done in four independent, sequential in time, blocks (1, 2, 3, and 4 + 5; second column in Fig 1) by MIRALLES et al. 1999 Down. Each block contained one replicate per each Ne. By doing this, we protected the experiment against among-replicates cross-contamination, since blocks were separated in time (with the exception of only replicates 4 and 5). Still, common changes are present among replicates.

  2. 2. RNA extractions were done in six independent blocks (third column in Fig 1), and each block was separated in time as well. During RNA extraction, control tubes were added. Each mock tube was subjected to exactly the same steps as the true samples. We never obtained RNA from these tubes.

  3. The RT-PCR and sequencing reactions were done in random blocks, protecting the results from undesirable effects associated with specific blocks.

  4. Regarding the probability of cross-contamination during the RT-PCR procedure, each time we ran a RT-PCR, we also ran control tubes containing all the reaction mixture but without the RNA template. DNA amplification was never obtained within these control tubes.


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Ruling out the possibility of cross-contamination:
Our conclusion of evolutionary convergence is extremely sensitive to contaminations. Contaminations can create identical changes in otherwise unrelated lineages and thus give the false impression of convergence. Therefore, previous to making any inference from our data regarding evolutionary convergences, it is compulsory to test whether our results can be explained simply by cross-contamination among populations. A possible way to test whether the observed sequences are the result of cross-contamination among populations evolved at the same time or among RNA extractions done simultaneously is to compute a phylogenetic tree from the sequences obtained for each population and compare the observed clades with those expected if contaminations occurred at the different critical stages of our experimental protocol. Fig 2 shows the estimated maximum- likelihood unrooted tree topology. This tree was obtained using DNAML program from PHYLIP v3.6{alpha} package (http://evolution.genetics.washington.edu/phylip) and assuming Kimura's two-parameter nucleotide substitution model. Bootstrap values were obtained from a set of 1000 randomized sequences. The Akaike information criterion (AIC; AKAIKE 1974 Down) for this tree topology is 423.04. As the tree shows, sequences do not group according to RNA extraction blocks (third column in Fig 1). Under the hypothesis of cross-contamination, during the RNA purification, among lineages treated simultaneously, we should expect a tree with six clades, each containing sequences of populations treated at the same time. The AIC for such a tree (actually, the one showing the smallest value) was 587.43, larger than the value obtained for the inferred tree. This result rules out the possibility of cross-contamination during the RNA purification. [Note that the model with minimum AIC is considered to be the most appropriate one (AKAIKE 1974 Down).] In general, with the exception of three pairs of identical sequences (sequences in lines 4 and 7, 16 and 20, and 17 and 21 in Fig 1), sequences do not group according to blocks of evolution (second column in Fig 1). According to the hypothesis of cross-contamination among populations evolved at the same time, we should expect a tree showing four different clades, one for each evolution block. The AIC for such a tree topology (again, the one showing the smallest value) was 571.46, larger than that observed for the inferred tree topology. [The SHIMODAIRA and HASEWAGA (1999) test generated exactly the same conclusion: None of the trees expected under both cross-contamination hypotheses was significantly better than the inferred tree.]



View larger version (16K):
In this window
In a new window
Download PPT slide
 
Figure 2. A maximum-likelihood phylogenetic analysis of the sequences rules out the possibility of cross-contamination among populations at different stages of the experimental protocol. Numbers in nodes correspond with bootstrap P values. Populations are labeled as follows: The first number corresponds with the effective population size at which it was evolved, the second number with the evolution experimental block, and the third number represents the RNA purification block.

Most of the bootstrap P values shown in Fig 2 are too low for identifying significant clusters. This finding gives further support to the view of an independent evolutionary history for each population; in other words, populations radiated from a common ancestor. In conclusion, results are better explained by evolutionary convergence on independently evolved lineages than by cross-contamination among lineages at the two more critical steps of our experimental protocol.

Nevertheless, the existence of three pairs of identical sequences that were evolved in the same blocks calls for extra caution when interpreting the results. These three pairs could be the result of either evolutionary convergences, as we believe, or rare cross-contaminations during the evolutionary part of our experiment.

Identification of nucleotide substitutions:
Fig 1 shows the fitness values estimated for each of the 21 chosen populations (MIRALLES et al. 1999 Down) as well as the 25 molecular changes observed. Red blocks indicate nonsynonymous changes, green blocks synonymous changes, and yellow blocks changes in intergenic regions. Twenty-five different positions showed differences among evolved populations and the ancestral genotypes. The substitutions observed in each population were always fixed (at the resolution level of the sequencing chromatograms). Of those, 15 were nonsynonymous changes, 6 were synonymous changes, and only 3 were changes at intergenic regions. Regarding specific genes, the nucleocapsid protein (N) does not show any difference, a fact that could be telling us about the importance of this gene. The phosphoprotein (P), which is a minor component of the RNA polymerase, shows four variable sites among populations; two were synonymous and two were nonsynonymous. The matrix protein (M), which covers the inner side of the virion, also shows four variable sites among populations; three of them are nonsynonymous changes. The coat protein (G) is the major antigenic component of the capsid and responsible for interactions with the cell receptors. A priori, we should expect this gene to be one of the main targets of selection during the adaptation to cell culture conditions. Naked eye inspection of the data supports this expectation, since 10 nucleotide positions were variable among populations, being 7 nonsynonymous changes. Finally, the major component of the RNA polymerase protein (L) shows four polymorphic sites among populations, three of them nonsynonymous.

Heterogeneity among sites and among genes in the number of mutations:
Of course, the observed pattern of polymorphic sites could, in principle, be produced by chance. To rule out this possibility, we computed the dispersion coefficient, {delta}, of the number of times that a site changed across lineages (SOKAL and ROHLF 1995 Down). Under the null hypothesis of changes being randomly distributed across the entire 11,161 bp of the genome, the number of times that a site changes follows a Poisson distribution and hence {delta} = 1. Alternatively, if certain sites changed by chance more often than expected, then {delta} > 1. Our estimate of {delta} = 5.16 rejects the null hypothesis of observing this pattern by chance (, one-tailed P < 0.0001). (This conclusion holds if instead of the full genome we compute {delta} for only the 25 variable sites: , 24 d.f., one-tailed P = 0.0001.) Hence, we conclude that some sites mutated more often than others along the entire genome.

Now, we were interested in testing whether different genes accumulated different numbers of mutations or, alternatively, whether mutations were equally scattered among genes. A Scheirer-Ray-Hare nonparametric two-way ANOVA (SOKAL and ROHLF 1995 Down) showed a significant among-genes heterogeneity in the number of mutations accumulated per nucleotide site (Table 1). A post hoc Tukey's test (SOKAL and ROHLF 1995 Down) indicates that this difference was due to the ~10-fold lower number of mutations per nucleotide site observed for the L gene, the major RNA polymerase component. This observation suggests that the L gene is subjected to more evolutionary constraints than any other component in the genome (with the exception of the N gene, at which we did not find any change at all). On the other side, on average, M, P, and G genes had the same number of mutations per nucleotide site. This tells us about the naïveté of our above expectation of the G gene fixing more changes than any other component of the genome as a result of being an important target of selection. However, as we see in the next two sections, changes in the G protein were selectively more important than changes in other parts of the genome. There is no effect of the Ne on the number of mutations fixed per gene. The interaction of gene by Ne term was not significant either.


 
View this table:
In this window
In a new window

 
Table 1. Nonparametric analysis of variance for the number of nucleotide substitutions among genes and for each experimental population size

Covariation among sites:
Another interesting feature of Fig 1 is the existence of an apparent covariation among sites. For example, the substitution of Ser195 -> Pro at the P gene in almost every case goes together with the substitution Gly165 -> Asp at the M gene. Similarly, substitutions Pro120 -> Gln and C2750 -> U at the M gene and A10162 -> G at the L gene always occur together. Covariation among sites within and among genes arises as a result of epistatic gene interactions and selection acting on the groups of sites rather than on individual sites. To assess the extent of this covariation among sites, as well as to identify pairs of significantly interacting sites, we computed the matrix of covariances among sites. Fig 3 shows, in a schematic way, the 16 significant cases (after sequential Bonferroni's correction for multiple tests of the same hypothesis). Significant covariance values ranged between 0.0476 (the smallest bubbles in Fig 3) and 0.1857 (the largest bubble in Fig 3). These 16 significant cases can be grouped into the following six different covariation groups.

  • Group I: U1820 (Ile142 -> Thr) in the P gene covaried with site A3675 (Met200 -> Val) in the G gene. These two sites are present in only one population (labeled 106 3 4 in Fig 1).



    View larger version (58K):
    In this window
    In a new window
    Download PPT slide
     
    Figure 3. Covariation among nucleotide sites. Each significant covariance among sites (after sequential Bonferroni's correction) is represented in the plot by a bubble. The size of each bubble is proportional to the magnitude of the covariance. The rectangles represent regions of possible covariation within genes. Every significant covariance outside these regions corresponds with interactions among genes. It is noteworthy that the number of significant epistatic interactions is significantly greater among than within genes (binomial test, P = 0.0213). Only the upper one-half of the matrix is shown.

  • Group II: U1978 (Ser195 -> Pro in the P gene) and G2743 (Gly165 -> Asp in the M gene) were present together in several lineages (Fig 1). It is worth noting that all synonymous changes always covaried with nonsynonymous changes (with the only exception of unique change U3524 -> C in the G gene, which, indeed, was not neutral; see below).

  • Group III: For example, synonymous sites G1713 and A4394 covaried with nonsynonymous sites A3802 (Asp242 -> Arg in the G protein) and U5202 (Leu157 -> Trp in the L protein) in the same population (labeled 106 3 4 in Fig 1.

  • Group IV: A synonymous change at site C2151 was present in several populations significantly associated with a nonsynonymous change at site G3846 (Asp257 -> Asn of the G protein).

  • Group V: Similarly, nonsynonymous changes at sites C2750 and A10162 significantly covaried with a nonsynonymous change at position C2608 (Pro120 -> Gln) in several populations.

  • Group VI: Finally, a synonymous substitution at site U4295 was associated with a nonsynonymous substitution in site U3848 (Asp257 -> Asn of the G protein) in a single population (labeled 102 2 1 in Fig 1).

Could we assign fitness values to specific mutations or to covariation groups? We took advantage of these convergences to estimate specific fitness effects for each site or covariation group (i.e., epistatic sites). The general way to model epistatic fitness effects is to assume that the fitness of a genotype that carries k epistatic mutations that interact together as a group is , where the si are the multiplicative effects of each mutation and e1,2,...,k is the increase in fitness gained by the overall interaction of all mutations in the group (PHILLIPS et al. 2000 Down). Hence, to describe the fitness effect of each epistatic group of mutations we need to estimate as many parameters from the dataset as mutations involved in the group [si (i = 1, 2, ... , k)] plus an overall interaction term (e1,2,...,k). A way of reducing the number of parameters to be estimated is to assign a single fitness value to each epistatic group as a whole, regardless of the specific contribution of each mutation. This value will, obviously, be composed of the individual contribution of each mutation plus the added value.

By assuming an infinite-sites model, the expected fitness of a given clone, E(Wi), will be determined by multiplying the fitness effects, 1 + sj, associated with each observed covariation group (six in our dataset, involving 15 nucleotide sites) and with each nonepistatic, purely multiplicative, site (25 - 15 = 10). Hence, . These 16 sj parameters can then be estimated, from the 21 populations, by means of a quadratic sequential programming procedure that minimizes the differences between the observed and expected fitnesses (CNLR procedure in the SPSS package). The advantage of quadratic sequential programming compared with linear regression is that the former allows us to impose biologically meaningful restrictions to the parameters. We used the following two restrictions: (1) When a mutation appears alone, it necessarily has to be beneficial (i.e., sj > 0); (2) if a mutation belonging to an epistatic group is found alone in any population, then it must be beneficial by itself (sj > 0) and its numerical value must be necessarily smaller than the value for the whole epistatic group. Furthermore, the MIRALLES et al. 1999 Down study dealt with beneficial mutations and fitness improvements. Deleterious mutations would not have an effect under their experimental setup since they were washed out by strong purifying selection. Hence, we implicitly assumed that all the nonvariable sites detected in our study (11,161 - 25 = 11,136) were deleterious or (quasi-neutral) under our experimental conditions.

Table 2 shows the estimated fitness contribution of each observed change (either point mutations or covariation groups). We report in Table 2 only those sites whose fitness contributions converged to nonzero estimates (not necessarily significant) in the quadratic sequential programming procedure. The correlation between predicted and observed fitness values can be used as a way to assess the goodness of fit of the model. Here, the match between observed and predicted fitnesses was highly significant (r = 0.8966, 19 d.f., P < 0.0001). All nonsignificant cases must be considered either as neutral or, in the worse case, as small-effect mutations that are indistinguishable from zero with the statistical power associated with our experimental sample size.


 
View this table:
In this window
In a new window

 
Table 2. Fitness values estimated for different mutations and covariation groups

None of the four changes found at the P protein were beneficial by themselves. The synonymous substitution C2151 -> U, along with the replacement Asp257 -> Asn at the G protein, had an estimated fitness effect of 4.9%, although nonsignificant. The only substitution at intergenic regions with a fitness effect was C2990 -> U at the M-G segment. The small effect associated with this substitution, 0.3%, was not significant either. Whether or not this replacement has a real fitness effect, the fact is that nucleotide substitutions in intergenic regions had been previously reported to be of special significance for the transcription levels of downstream genes in VSV (SMALLWOOD and MOYER 1993 Down). One of the four changes found in the M protein was beneficial, with a fitness effect of ~2.3%. This was a nonsynonymous change, which implies replacing a positive charge (Lys) by a polar radical (Thr). However, it is worthwhile to note that this change was not significant either. The small effect of this change is probably associated with the fact that it was present in only one clone. Five significant changes were detected at the G protein, one of them linked with a change in the P protein (see above). Surprisingly, one of these beneficial substitutions was the synonymous change at position 3524. Furthermore, this change had the largest observed effect on fitness among all the beneficial mutations (~6.7%). The change Leu211 -> Pro (~4.8% effect) implies a substantial structural change in the protein, since Pro is associated with the induction of ß-turns. The change Arg345 -> Lys a priori seems conservative, since it maintains the positive charge at the radical. However, it has a significant effect of ~5.6%. Finally, the change Thr368 -> Ala (~4.3% effect) replaces a polar radical by a short nonpolar one. As we speculated above, the G protein is important for adaptation to a cellular host, since it is responsible for interacting with the cellular receptor (phosphatidylserine). In the other side, only one nonsynonymous change was identified at the L protein. The change Ile1516 -> Val at this important gene had the second largest effect detected (~5.8%), although it is difficult to find a biochemical explanation of this effect, since it implies a reduction of only one carbon in the length of the radical. Nonetheless, changes in the major component of the RNA polymerase should be of importance in our experimental system, since our daily transfer protocol selects for rapidly replicating viruses and improvement in replication efficiency must arise by changes in the L protein.

Conservatively, we can say that changes in 5 of the 11,161 nucleotide sites of the VSV genome were selectively important in our experimental conditions.


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Our experiment dealt with the existence of evolutionary convergences. Evolutionary convergences constitute a very slippery topic, since a result of convergent evolution can always be seen as a cross-contamination by those critical of the existence of convergence. The only serious way to address evolutionary convergences is to (1) design and run experiments in such a way that physical or temporal coexistence of evolving lineages is minimized and (2) test whether the results can be explained by potential contaminations at different experimental steps. With our experiments, we took all possible precautions to minimize the risk of cross-contaminations and, in fact, a detailed phylogenetic analysis of our results supports the view that our results are better explained by evolutionary convergences than by a general contamination at different steps. However, we found three pairs of identical sequences that were, unfortunately, obtained from populations evolved in the same blocks. For these three particular cases, our phylogenetic analysis could not fully rule out the possibility of a cross-contamination. The problem in these cases was that we did not have any basis to decide which member of each pair must be eliminated (the contaminated) and which one retained (the contaminator) in the analysis. Conservatively, we removed all six sequences and redid all the above analysis. By doing so, a few covariances in Fig 3 became not significant (but none that were nonsignificant changed) and the CNLR method provided slightly different estimates for fitness effects (interestingly, the four significant cases in Table 2 retained their significance). Hence, despite absolute numerical values, our major conclusions of molecular convergences and epistasis are solidly supported by our dataset.

One of the most amazing features illustrated in Fig 1 is the large amount of evolutionary convergences observed among independent lineages. Twelve of the variable sites were shared by different lineages. More surprisingly, convergences also occurred within synonymous sites and intergenic regions. Evolutionary convergences during the adaptation of viral lineages under identical artificial environmental conditions have been described previously (BULL et al. 1997 Down; WICHMAN et al. 1999 Down; FARES et al. 2001 Down). However, this phenomenon is observed not only in the laboratory. It is also a relatively widespread observation among human immunodeficiency virus (HIV)-1 clones isolated from patients treated with different antiviral drugs; parallel changes are frequent, often following a common order of appearance (LARDER et al. 1991 Down; BOUCHER et al. 1992 Down; KELLAM et al. 1994 Down; CONDRA et al. 1996 Down; MARTINEZ-PICADO et al. 2000 Down). Subsequent substitutions may confer increasing levels of drug resistance or, alternatively, may compensate for deleterious pleiotropic effects of earlier mutations (MOLLA et al. 1996 Down; MARTINEZ-PICADO et al. 1999 Down; NIJHUIS et al. 1999 Down). Also, molecular convergences have been observed between chimeric simian-human immunodeficiency viruses (strain SHIV-vpu+) isolated from pig-tailed macaques, rhesus monkeys, and humans after either chronic infections or rapid virus passage (HOFMANN-LEHMANN et al. 2002 Down). Convergent evolution at the molecular level is not controversial as long as it can be reconciled with the neutralist and the selectionist theories. The neutral theory suggests that convergences are simply accidents, whereas within the framework of selectionism, there are two qualifications for convergences. The first explanation considers convergences as being adaptive and the result of organisms facing the same environment (as in the case of our experiments) with a few alternative pathways of adaptation (as expected for compacted genomes). Second, keeping in mind the model of clonal interference, beneficial mutations have to become fixed in an ordered way (GERRISH and LENSKI 1998 Down), with the best possible candidate fixing first, and then the second best candidate, and so on. This implies that, given a large enough population size to make clonal interference an important evolutionary factor, we should always expect the same mutations to be fixed.

The above argument is valid for nonsynonymous changes but an alternative explanation must be found for synonymous changes and for changes in the intergenic regions. Genomic RNA is involved in many RNA-RNA and RNA-protein interactions that affect viral replication. This is obvious for noncoding, regulatory regions (STILLMAN and WHITT 1997 Down, STILLMAN and WHITT 1998 Down), but there is increasing evidence that capsid-coding regions in picornaviruses may also have an effect on viral replication (MCKNIGHT and LEMON 1998 Down; FARES et al. 2001 Down). Therefore, the RNA itself (apart from its protein-coding capacity) may contribute to the viral phenotype, and fitness may also be affected by synonymous replacements. Evidence for selection on synonymous sites has been inferred also in mammals (EYRE-WALKER 1999 Down), as a consequence of selection acting upon the base composition of isochores and large sections of junk DNA.

For the sake of illustration, it would be interesting to compare the number of selectively important sites in the VSV genome with those estimated for other genomes. For example, FAY et al. 2001 Down reported that, in humans, the vast majority (80%) of amino acidic changes are deleterious to some extent and only a minor fraction are neutral. Among these deleterious amino acidic mutations, at least 20% are slightly deleterious. Here, we found that 15 amino acid sites changed, with only 5 being significantly advantageous. At this point, we can only speculate about the selective role of all the amino acid sites shown to be invariable in our study. The total number of amino acids in the five genes of VSV is 3536. Assuming that changes in any of the 3536 - 15 = 3521 invariable amino acids would be deleterious (and thus washed out by purifying selection during our evolution experiment), then the fraction of amino acid replacements that are potentially harmful would be 3521/3536 {equiv} 99.58%; the fraction of neutral sites would be 10/3536 {equiv} 0.28%, whereas only 5/3536 {equiv} 0.14% would be beneficial. Despite the differences between humans and VSV in genome size and organization and in the nature of the nucleic acid used, in both cases the fraction of potentially deleterious amino acid substitutions is overwhelmingly larger than that of neutral or beneficial ones.

At the other extreme, we can make the otherwise unrealistic assumption that all the 3521 invariable amino acids are neutral or quasi-neutral (sensu OHTA 1973 Down; depending on population size); then the fraction of neutral sites in the VSV genome would be as high as 3531/3536 {equiv} 99.86%, 0.14% would be advantageous, and none would be deleterious. However, recent computations showed that the deleterious genomic mutation rate for VSV might be as high as ~1.2 per genome replication, with a majority of mutations of small effect but with a significant fraction being of large effect (ELENA and MOYA 1999 Down).

Quantifying the proportion of amino acid substitutions in the proteome of RNA viruses that are effectively neutral (or quasi-neutral) is of extreme importance for testing the validity of the quasi-species model of viral evolution. The quasi-species model differs from the classical population genetics models in that neutral mutations do not lead to genetic drift of the population, and natural selection acts on the mutant distribution as a whole rather than on individual variants. The reason for challenging the quasi-species model resides on whether or not neutral sites are abundant in the genome (JENKINS et al. 2001 Down; HOLMES and MOYA 2002 Down). If neutral mutations are abundant, the sequence space may be too large to allow the formation of a stable quasi-species distribution. If an RNA virus genome contains neutral sites, then the number of sequences close to maximum fitness may exceed the population size of the virus itself (JENKINS et al. 2001 Down). This would prevent the formation of a quasi-species because the viral population could not occupy all the sequence space around the master sequence and so would be subjected to genetic drift, which in turn would prevent the mutational coupling necessary for natural selection to act on the entire mutant distribution.

The presence of multiple beneficial mutations in the same genome allows some important conclusions to be drawn. Fitness effects previously measured (MIRALLES et al. 1999 Down) cannot be assigned to single changes. Instead, the picture is a little bit more complex, involving multiple beneficial mutations as well as linked neutral changes. Is this jeopardizing our previous conclusion of clonal interference as a key factor during the evolution of RNA viruses? To test whether the first prediction (see above) of the clonal interference model still holds when multiple mutations got fixed on each lineage, we studied the correlation between the best beneficial effect fixed within each clone (from Table 2) and Ne. If the prediction still holds, a positive correlation between these two parameters would be expected. Fig 4 illustrates the existence of a significant positive trend (r = 0.6128, 19 d.f., one-tailed P = 0.0016), confirming the prediction made by the clonal interference model.



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 4. Correlation between the best beneficial mutation fixed in a given lineage and the effective population size at which the lineage evolved. Error bars represent SEM (n = 3). The dashed line is shown for illustrative purposes only.

A second important conclusion of our study is the existence of extensive epistatic interactions among and within viral genes (Fig 3). However, we have shown that the net effect of epistasis on fitness is likely to be very low in VSV, since only 1 out of 16 detected epistatic interactions had significant fitness effects (4.9%, Table 2). This result adds to the emerging picture that epistasis between mutations is very weak across a broad phylogenetic range (DE VISSER et al. 1997 Down; ELENA and LENSKI 1997 Down; CHARLESWORTH 1998 Down; ELENA 1999 Down; PETERS and KEIGHTLEY 2000 Down).

The results reported in this article open the possibility for new and exciting research. The most straightforward step to be taken is the recreation, by site-directed mutagenesis on a full-genome cDNA, of each observed mutation alone and measuring the fitness of individual mutations to confirm the validity of the above inferences and the role of epistasis among pairs of mutations. Other interesting research will be to quantify the magnitude of nucleotide diversity within our evolving populations. The results of such an analysis would, in principle, be helpful for understanding the role of processes such as hitchhiking, background selection, or selective sweeps in the evolution of RNA viruses.


*  ACKNOWLEDGMENTS

We thank Rosario Miralles and Rafael Sanjuán for stimulating and fruitful discussions, Olga Cuesta for technical assistance, and A. M. Powell for critical reading of the manuscript. The comments of two reviewers were invaluable for improving the manuscript. This work was supported by a predoctoral fellowship to J.M.C. and a grant to A.M., both from the Spanish Ministerio de Educación y Cultura.

Manuscript received March 27, 2002; Accepted for publication June 24, 2002.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

AKAIKE, H., 1974  A new look at the statistical model identification. IEEE Trans. Autom. Contr. 19:716-723.

BOUCHER, C. A. B., E. O'SULLIVAN, J. W. MULDER, C. RAMAUTARSING, and P. KELLAM et al., 1992  Ordered appearance of zidovudine resistance mutations during treatment of 18 human immunodeficiency virus-positive subjects. J. Infect. Dis. 165:105-110.[Medline]

BRACHO, M. A., A. MOYA, and E. BARRIO, 1998  Contribution of Taq polymerase-induced errors to the estimation of RNA virus diversity. J. Gen. Virol. 79:2921-2928.[Abstract]

BULL, J. J., M. R. BADGETT, H. A. WICHMAN, J. P. HUELSENBECK, and D. M. HILLIS et al., 1997  Exceptional convergent evolution in a virus. Genetics 147:1497-1507.[Abstract]

CHAO, L., 1988  Evolution of sex in RNA viruses. J. Theor. Biol. 133:99-112.[Medline]

CHARLESWORTH, B., 1998  The effect of synergistic epistasis on the inbreeding load. Genet. Res. 71:85-89.[Medline]

CONDRA, J. H., D. J. HOLDER, W. A. SCHLEIF, O. M. BLAHY, and R. M. DANOVICH et al., 1996  Genetic correlates of in vivo viral resistance to indinavir, a human immunodeficiency virus type 1 protease inhibitor. J. Virol. 70:8270-8276.[Abstract]

COYNE, J. A., 1992  The genetics of speciation. Nature 355:511-515.

DE VISSER, J. A. G. M., R. F. HOEKSTRA, and H. VAN DEN ENDE, 1997  Test of interaction between genetic markers that affect fitness in Aspergillus niger.. Evolution 51:1499-1505.

ELENA, S. F., 1999  Little evidence for synergism among deleterious mutations in a nonsegmented RNA virus. J. Mol. Evol. 49:703-707.[Medline]

ELENA, S. F. and R. E. LENSKI, 1997  Test of synergistic interactions among deleterious mutations in bacteria. Nature 390:395-398.[Medline]

ELENA, S. F. and A. MOYA, 1999  Rate of deleterious mutation and the distribution of its effects on fitness in vesicular stomatitis virus. J. Evol. Biol. 12:1078-1088.

EYRE-WALKER, A., 1999  Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675-683.[Abstract/Free Full Text]

FARES, M. A., A. MOYA, C. ESCARMIS, E. BARANOWSKI, and E. DOMINGO et al., 2001  Evidence for positive selection in the capsid protein-coding region of the foot-and-mouth disease virus (FMDV) subjected to experimental passage regimens. Mol. Biol. Evol. 18:10-21.[Abstract/Free Full Text]

FAY, J. C., G. J. WYCHKOFF, and C.-I WU, 2001  Positive and negative selection on the human genome. Genetics 158:1227-1234.[Abstract/Free Full Text]

GERRISH, P. J. and R. E. LENSKI, 1998  The fate of competing beneficial mutations in an asexual population. Genetica 102(103):127-144.

HOFMANN-LEHMANN, R., J. VLASAK, A. L. CHENINE, P. L. LI, and T. W. BABA et al., 2002  Molecular evolution of human immunodeficiency virus env in humans and monkeys: similar patterns occur during natural disease progression or rapid virus passage. J. Virol. 76:5278-5284.[Abstract/Free Full Text]

HOLLAND, J. J., J. C. DE LA TORRE, D. K. CLARKE, and E. A. DUARTE, 1991  Quantitation of relative fitness and great adaptability of clonal populations of RNA viruses. J. Virol. 65:2960-2967.[Abstract/Free Full Text]

HOLMES, E. C. and A. MOYA, 2002  Is the quasispecies concept relevant to RNA viruses? J. Virol. 76:460-462.[Free Full Text]

JENKINS, G. M., M. WOROBEY, C. H. WOELK, and E. C. HOLMES, 2001  Nonquasispecies evidence for the evolution of RNA viruses. Mol. Biol. Evol. 18:987-994.[Abstract/Free Full Text]

KELLAM, P., C. A. B. BOUCHER, J. M. G. H. TIJNAGEL, and B. A. LARDER, 1994  Zidovudine treatment results in the selection of human immunodeficiency virus type 1 variants whose genotypes confer increasing levels of drug resistance. J. Gen. Virol. 75:341-351.[Abstract/Free Full Text]

KONDRASHOV, A. S., 1985  Deleterious mutations as an evolutionary factor. II. Facultative apomixes and selfing. Genetics 111:635-653.[Abstract/Free Full Text]

KONDRASHOV, A. S. and J. F. CROW, 1991  Haploidy or diploidy: Which is better? Nature 351:314-315.[Medline]

LARDER, B. A., K. E. COATES, and S. D. KEMP, 1991  Zidovudine-resistant human immunodeficiency virus selected by passage in cell culture. J. Virol. 65:5232-5236.[Abstract/Free Full Text]

LENSKI, R. E., V. SOUZA, L. P. DUONG, Q. G. PHAN, and T. N. NGUYEN et al., 1994  Epistatic effects of promoter and repressor functions of the Tn10 tetracycline-resistance operon of the fitness of Escherichia coli.. Mol. Ecol. 3:127-135.[Medline]

LEWONTIN, R., 2000 Population genetics: problems, foundations, and historical perspectives, pp. 5–23 in Evolutionary Genetics: From Molecules to Morphology, edited by C. B. KRIMBAS and R. S. SINGH. Cambridge University Press, Cambridge, UK.

MARTÍNEZ-PICADO, J., A. V. SAVARA, L. SUTTON, and R. T. D'AQUILA, 1999  Replicative fitness of protease inhibitor-resistant mutants of human immunodeficiency virus type 1. J. Virol. 73:3744-3752.[Abstract/Free Full Text]

MARTÍNEZ-PICADO, J., M. P. DEPASQUALE, N. KARTSONIS, G. J. HANNA, and J. WONG et al., 2000  Antiretroviral resistance during successful therapy of HIV type 1 infection. Proc. Natl. Acad. Sci. USA 97:10948-10953.[Abstract/Free Full Text]

MCKNIGHT, K. L. and S. M. LEMON, 1998  The rhinovirus type 14 genome contains an internally located RNA structure that is required for viral replication. RNA 4:1569-1584.[Abstract]

MIRALLES, R., P. J. GERRISH, A. MOYA, and S. F. ELENA, 1999  Clonal interference and the evolution of RNA viruses. Science 285:1745-1747.[Abstract/Free Full Text]

MIRALLES, R., A. MOYA, and S. F. ELENA, 2000  Diminishing returns of population size in the rate of RNA virus adaptation. J. Virol. 74:3566-3571.[Abstract/Free Full Text]

MOLLA, A., M. KORNEYEVA, Q. GAO, S. VASAVANONDA, and P. J. SCHIPPER et al., 1996  Ordered accumulation of mutations in HIV protease confers resistance to ritonavir. Nat. Med. 2:760-766.[Medline]

NIJHUIS, M., R. SCHUURMAN, D. DE JONG, E. J. RICKSON, and E. GUSTCHINA et al., 1999  Increased fitness of drug resistant HIV-1 protease as a result of acquisition of compensatory mutations during suboptimal therapy. AIDS 13:2349-2359.[Medline]

OHTA, T., 1973  Slightly deleterious mutant substitutions in evolution. Nature 246:74-76.

PECK, J. R. and D. WAXMAN, 2000  Mutation and sex in a competitive world. Nature 406:399-404.[Medline]

PETERS, A. D. and P. D. KEIGHTLEY, 2000  A test for epistasis among induced mutations in Caenorhabditis elegans.. Genetics 156:1635-1647.[Abstract/Free Full Text]

PHILLIPS, P. C., S. P. OTTO and M. C. WHITLOCK, 2000 Beyond the average: the evolutionary importance of gene interactions and variability of epistatic effects, pp. 20–38 in Epistasis and the Evolutionary Process, edited by J. B. WOLF, E. D. BRODIE III and M. J. WADE. Oxford University Press, Oxford.

SHIMODAIRA, H. and M. HASEGAWA, 1999  Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114-1116.

SMALLWOOD, S. and S. A. MOYER, 1993  Promoter analysis of the vesicular stomatitis virus RNA polymerase. Virology 192:254-263.[Medline]

SOKAL, R. S., and F. J. ROHLF, 1995 Biometry, Ed. 3. W. H. Freeman, New York.

STILLMAN, E. A. and M. A. WHITT, 1997  Mutational analyses of the intergenic dinucleotide and the transcriptional start sequence of vesicular stomatitis virus (VSV) define sequences required for efficient termination and initiation of VSV transcripts. J. Virol. 71:2127-2137.[Abstract]

STILLMAN, E. A. and M. A. WHITT, 1998  The length and sequence composition of vesicular stomatitis virus intergenic regions affect mRNA levels and the site of transcript initiation. J. Virol. 72:5565-5572.[Abstract/Free Full Text]

WICHMAN, H. A., M. R. BADGETT, L. A. SCOTT, C. M. BOULIANNE, and J. J. BULL, 1999  Different trajectories of parallel evolution during viral adaptation. Science 285:422-424.[Abstract/Free Full Text]

WRIGHT, S., 1931  Evolution in Mendelian populations. Genetics 16:97-159.[Free Full Text]




This article has been cited by other articles:


Home page
Mol Biol EvolHome page
S. K. Remold, A. Rambaut, and P. E. Turner
Evolutionary Genomics of Host Adaptation in Vesicular Stomatitis Virus
Mol. Biol. Evol., June 1, 2008; 25(6): 1138 - 1147.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
P. Carrasco, F. de la Iglesia, and S. F. Elena
Distribution of Fitness and Virulence Effects Caused by Single-Nucleotide Substitutions in Tobacco Etch Virus
J. Virol., December 1, 2007; 81(23): 12979 - 12984.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
S. Collins, J. de Meaux, and C. Acquisti
Adaptive Walks Toward a Moving Optimum
Genetics, June 1, 2007; 176(2): 1089 - 1099.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
R. Sanjuan, J. M. Cuevas, A. Moya, and S. F. Elena
Epistasis and the Adaptability of an RNA Virus
Genetics, July 1, 2005; 170(3): 1001 - 1008.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
S. Zarate and I. S. Novella
Vesicular Stomatitis Virus Evolution during Alternation between Persistent Infection in Insect Cells and Acute Infection in Mammalian Cells Is Dominated by the Persistence Phase
J. Virol., November 15, 2004; 78(22): 12236 - 12242.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
R. Sanjuan, A. Moya, and S. F. Elena
The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus
PNAS, June 1, 2004; 101(22): 8396 - 8401.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
E. C. Holmes
Molecular Clocks and the Puzzle of RNA Virus Origins
J. Virol., April 1, 2003; 77(7): 3893 - 3897.
[Full Text] [PDF]