Genetics, Vol. 155, 1683-1692, August 2000, Copyright © 2000

Nonrandom Spatial Distribution of Synonymous Substitutions in the GP63 Gene From Leishmania

Fernando Alvarez-Valin (a), José Francisco Tort (b), and Giorgio Bernardi (c)
a Sección Biomatemática, Facultad de Ciencias, Montevideo 11400, Uruguay,
b Departamento de Genética, Facultad de Medicina, Montevideo 11400, Uruguay
c Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, I-80121, Napoli, Italy

Corresponding author: Fernando Alvarez-Valin, Sección Biomatemática, Facultad de Ciencias, Igua 4225, Montevideo 11400, Uruguay. E-mail: falvarez@fcien.edu.uy

Communicating editor: S. YOKOYAMA


*  ABSTRACT

In this work we analyze the variability in substitution rates in the GP63 gene from Leishmania. By using a sliding window to estimate substitution rates along the gene, we found that the rate of synonymous substitutions along the GP63 gene is highly correlated with both the rate of amino acid substitution and codon bias. Furthermore, we show that comparisons involving genes that represent independent phylogenetic lines yield very similar divergence/conservation patterns, thus suggesting that deterministic forces (i.e., nonstochastic forces such as selection) generated these patterns. We present evidence indicating that the variability in substitution rates is unambiguously related to functionally relevant features. In particular, there is a clear relationship between rates and the tertiary structure of the encoded protein since all divergent segments are located on the surface of the molecule and facing one side (almost parallel to the cell membrane) on the exposed surface of the organism. Remarkably, the protein segments encoded by these variable regions encircle the active site in a funnel-like distribution. These results strongly suggest that the pattern of nucleotide divergence and, notably, of synonymous divergence is affected by functional constraints.


SYNONYMOUS (silent) substitution rates vary greatly among different genes of a given species (LI et al. 1985; BERNARDI et al. 1993; WOLFE and SHARP 1993). This disparity in synonymous rates has been interpreted as the result of variation in the rate and pattern of mutation among different regions of the genome (WOLFE et al. 1989), differences in base composition (MORIYAMA and GOJOBORI 1992; BERNARDI et al. 1993), and selection for codon usage. Indeed, the rate of synonymous substitutions is inversely proportional to the strength of codon bias in genes from enterobacteria (SHARP and LI 1987a), Drosophila (SHARP and LI 1989), Caenorhabditis (STENICO et al. 1994), and Mycobacterium (DE MIRANDA et al. 2000). This relationship between synonymous rates and codon biases is postulated to depend upon selection for increasing the efficiency of translation. Two lines of evidence support this hypothesis. First, highly expressed genes exhibit higher codon biases (GOUY and GAUTIER 1982) and lower synonymous rates than do genes expressed at lower levels. Second, the preferred codons in highly expressed genes (optimal codons) of Escherichia coli (IKEMURA 1981), Saccharomyces cerevisiae (BENNETZEN and HALL 1982; IKEMURA 1982), and Drosophila are recognized by the most abundant tRNAs. It is therefore to be expected that, in highly expressed genes, selection for maintaining optimal codons tends to lower synonymous rates.

Several authors reported that synonymous and nonsynonymous rates are correlated in genes of mammals (FITCH 1980; GRAUR 1985; LI et al. 1985; WOLFE and SHARP 1993; MOUCHIROUD et al. 1995), bacteria (SHARP and LI 1987a; DE MIRANDA et al. 1999), and Drosophila (COMERON and KREITMAN 1998). Several alternative hypotheses have been advanced to explain this correlation. One of them is that a systematic variation in the mutation rates in different regions of the genome is responsible for the variability in substitution rates and that this variability would be the cause of the correlation between synonymous and nonsynonymous distances (WOLFE et al. 1989). However, OHTA and INA (1995) have shown that even under the assumption that the variability in synonymous rates reflects the underlying variation in mutation rates, the expected correlation coefficient would be much lower (one-half) than the observed values. In addition, INA (1995) has presented evidence showing that the correlation could only be explained by this mutationist hypothesis if the variation in the mutation rate were correlated with functional constraints (i.e., constrained amino acids tend to mutate less). An alternative mutationist hypothesis has been proposed by WOLFE and SHARP (1993). These authors suggested that the correlation could arise as a result of the fixation of doublet mutations, that is, those mutations that affect two consecutive nucleotide positions. Certainly it is plausible that this kind of mutation may contribute to the correlation, since approximately 47% of them produce at the same time a synonymous and a nonsynonymous change. However, strong evidence has been presented against this hypothesis. In the first place, MOUCHIROUD et al. (1995) have shown that in genes from mammals the correlation under consideration remains rather high and statistically significant even after removing from the alignments those codons that underwent substitutions in adjacent positions (and that are thus putatively derived from doublet mutations). This hypothesis also has been rejected for Drosophila genes on the basis that the synonymous distances at the third codon position, Ks3, are not correlated with the nonsynonymous distances at the first codon position, Ka1 (COMERON and KREITMAN 1998).

As an alternative to these mutationist hypotheses, two different selectionist hypotheses have been proposed. MOUCHIROUD et al. (1995) suggested that similar constraints (i.e., negative selection) acting on synonymous and nonsynonymous mutations could be responsible for the correlation. These authors found that the synonymous rate is gene specific since independent processes of divergence (human-calf; rat-mouse) produce strongly correlated distances. This hypothesis has been gaining support from several lines of evidence. CACCIO et al. (1995) reported that in mammalian genes the degree of synonymous divergence in duet codons is correlated with that in quartet codons. In addition, ZOUBAK et al. (1995) showed that conserved positions (especially in GC3-rich genes) exhibit a synonymous base composition that differs substantially from what would be expected in sequences subjected to a random substitution process. Positive selection has also been proposed as the responsible factor for the correlation (LIPMAN and WILBUR 1985). According to this view, a nonsynonymous change could favor a synonymous substitution from one nonpreferred codon to a more preferred one for those cases in which the amino acids involved differ in their preference at third codon positions.

The correlation between synonymous and nonsynonymous distances is observed not only when the genes are considered as a whole (i.e., across genes), but also at the intragenic level (i.e., within genes). We have investigated the intragenic variability in synonymous and nonsynonymous distances in genes from mammals and monocots (ALVAREZ-VALIN et al. 1998, 1999). Our results show that the number of genes displaying significant intragenic correlation coefficients is much higher than random expectation. These intragenic correlations can be interpreted in terms of a common constraint hypothesis (negative selection), since this variability in substitution rates is also related to the synonymous GC level, thus suggesting a link between amino acid conservation, synonymous conservation, and codon usage. Similar analyses involving Drosophila genes yielded contradictory results. While no intragenic correlation was found between the synonymous rate and the strength of codon biases in the Xdh gene (COMERON and AGUADE 1996), a strong negative correlation was observed for the gene encoding the large subunit of RNA polymerase II (LLOPART and AGUADE 1999).

SMITH and HURST (1998) have proposed a mutationist hypothesis to explain the variability in substitution rates at the intragenic level. According to these authors the distribution of conserved and divergent regions along the gene would reflect the distribution of mutational hotspots, in particular CpG dinucleotides. However, further evidence supporting the common constraint hypothesis to explain the correlation at the intragenic level was recently presented by our group (CHIUSANO et al. 1999). We found that in genes from mammals, the secondary structure of proteins affects both the substitution rates and the base composition at the third position of codons. In this regard, it is worth mentioning that in regions predicted to be α-helix, β-sheet, or coil, the rates of both synonymous and nonsynonymous substitutions are significantly different. This suggests that different selective constraints associated with the different kinds of structures are affecting both synonymous and nonsynonymous rates in a similar way.

In this work we have investigated substitution rates at the intragenic level by analyzing the variation in synonymous and nonsynonymous substitutions along the coding sequence of the surface metalloproteinase GP63 from Leishmania. The genus Leishmania, belonging to the family Trypanosomatidae, comprises parasitic protozoa that cause several diseases that affect humans and other mammals. The life cycle of Leishmania includes two stages, promastigotes and amastigotes. The former are inoculated by the sandfly vector into the host skin. After inoculation, promastigotes must survive cellular and humoral immune responses. Finally, promastigotes are phagocytized by macrophages and transformed into amastigotes, the obligate intracellular stage. GP63 plays a pivotal role in this process by facilitating phagocytosis of promastigotes by macrophages (RUSSELL and WILHELM 1986; SOTERIADOU et al. 1992; PUENTES et al. 1999) and by helping them to survive intracellularly in phagolysosomes (CHAUDHURI et al. 1989). As a result of the biological and medical importance of the GP63 protein, a considerable amount of information about its sequence, function, biochemistry, and structure has accumulated in the last few years. This provides an excellent opportunity to conduct an analysis of substitution rates and their possible connection with biologically relevant features of encoded proteins. In particular, we investigated the relationship of the substitution rates along the gene with the three-dimensional structure of the protein. Moreover, to determine whether the intragenic pattern of divergence was conserved, we performed comparisons of independent processes of divergence.


*  MATERIALS AND METHODS

Sequences data set:
In this work we analyzed a set of 19 genes (listed in Table 1) encoding the surface metalloproteinase (GP63) of Leishmania. This gene belongs to a multigene family that has a variable number of members and a different organization in different Leishmania species (BUTTON et al. 1989). The simplest organization of this multigene family is that observed in Leishmania major (from the Old World, producing cutaneous leishmaniasis) where a cluster of 5 almost identical gene copies is followed by 2 more divergent genes (BUTTON et al. 1989). While in L. donovani this gene family presents an organization similar to that of L. major (WEBB et al. 1991), other species from the same species complex, namely the L. donovani/chagasi/infantum species complex (from the New World, producing visceral leishmaniasis), exhibit a much more complicated organization consisting of three groups of genes arranged in a single cluster. For the case of L. chagasi, this gene cluster contains 18 genes of which 4 tandem copies are expressed in promastigotes in the stationary phase of growth, 12 genes are expressed in the early (logarithmic) phase of growth, and 2 genes are constitutively expressed (ROBERTS et al. 1993). The organization is even more complex in L. guyanensis where several gene clusters that are located in different chromosomes have been described (STEINKRAUS et al. 1993).


 

 
Table 1. Leishmania genes analyzed in this work

The sequences were aligned at the amino acid level (translated sequences) using the multiple alignment program CLUSTALW (THOMPSON et al. 1994).

Substitution rate analysis:
The substitution rates along the gene were measured using a sliding window. Pairwise nucleotide distances (synonymous and nonsynonymous) within each window were estimated by the method of COMERON (1995) as implemented in the computer program K-estimator. For those windows where the method was inapplicable (because the argument of the logarithm was negative), we used the method of NEI and GOJOBORI (1986) with the modifications suggested by ZHANG et al. (1998). This modification corrects for transition/transversion biases and mainly affects the way of counting the number of synonymous and nonsynonymous sites in the third codon positions of twofold degenerate codons. The correction was done according to the transition/transversion ratio observed at the third codon positions of quartet codons. Estimates obtained using the modified version of the Nei and Gojobori method, as well as the original version of this method, are almost identical to those obtained by the Comeron method.
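The windowing step and the failure case of the logarithmic correction can be illustrated with a minimal sketch. It is not the Comeron method or the K-estimator program: the function names are hypothetical, the per-window quantity is a plain proportion of differences at third codon positions, and the correction shown is the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), chosen only because it makes the "negative argument of the logarithm" problem easy to see. The default window size of 30 codons and the one-codon step are the values given in the Figure 1 legend.

```python
import math

def sliding_windows(n_codons, size=30, step=1):
    """Yield (start, end, label) for each window over codons [start, end).

    start/end are 0-based codon indices; label is the 1-based number of the
    codon used to name the window (15 for the first 30-codon window).
    """
    for start in range(0, n_codons - size + 1, step):
        yield start, start + size, start + size // 2

def jc_distance(p):
    """Jukes-Cantor distance for a proportion of differences p.

    Returns None when the argument of the logarithm is not positive,
    i.e. when the correction is inapplicable for that window.
    """
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return None
    return -0.75 * math.log(arg)

def third_position_p(seq_a, seq_b, start, end):
    """Proportion of differing third codon positions in codons [start, end).

    Illustrative stand-in for a real synonymous-distance estimator.
    """
    diffs = sum(seq_a[3 * i + 2] != seq_b[3 * i + 2] for i in range(start, end))
    return diffs / (end - start)

def distance_profile(seq_a, seq_b, size=30, step=1):
    """Return [(label, distance or None), ...] for two aligned, gap-free coding sequences."""
    n_codons = min(len(seq_a), len(seq_b)) // 3
    return [
        (label, jc_distance(third_position_p(seq_a, seq_b, start, end)))
        for start, end, label in sliding_windows(n_codons, size, step)
    ]
```

A window whose distance comes back as `None` is the kind of case for which a fallback estimator would be needed.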

The codon adaptation index (CAI) of SHARP and LI (1987b) was used to measure codon biases. CAI was calculated using the reference set of highly expressed genes presented in ALVAREZ et al. (1994).
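As a rough illustration of how a CAI value is obtained, the sketch below computes the relative adaptiveness of each codon (its count in a reference set divided by the count of the most frequent synonymous codon) and then takes the geometric mean over the codons of a sequence. The codon table and reference counts are invented toy values, not the reference set of ALVAREZ et al. (1994), and skipping codons with no reference count is a simplifying assumption of this sketch.

```python
import math
from collections import defaultdict

def relative_adaptiveness(ref_counts, codon_to_aa):
    """w(codon) = count / max count among its synonymous codons in the reference set."""
    best = defaultdict(int)
    for codon, n in ref_counts.items():
        aa = codon_to_aa[codon]
        best[aa] = max(best[aa], n)
    return {
        codon: n / best[codon_to_aa[codon]]
        for codon, n in ref_counts.items()
        if best[codon_to_aa[codon]] > 0
    }

def cai(codons, w):
    """Geometric mean of w over the codons with a positive w; None if there are none."""
    logs = [math.log(w[c]) for c in codons if w.get(c, 0.0) > 0.0]
    if not logs:
        return None
    return math.exp(sum(logs) / len(logs))

# Toy data (hypothetical): two amino acids with two synonymous codons each.
TOY_CODON_TO_AA = {"CTG": "L", "CTC": "L", "AAG": "K", "AAA": "K"}
TOY_REF_COUNTS = {"CTG": 80, "CTC": 20, "AAG": 90, "AAA": 10}
TOY_W = relative_adaptiveness(TOY_REF_COUNTS, TOY_CODON_TO_AA)
```

With these toy weights, a window made only of the most frequent codons has a CAI of 1, and any less frequent codon lowers the value. Applying `cai` to the codons of each sliding window would give a profile comparable in form to the dotted line of Figure 1.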

Analyses of protein structure:
The crystallographic coordinates of L. major surface glycoprotein (SCHLAGENHAUF et al. 1998; id. code: 1lml) were retrieved from the Protein Data Bank. Specific residues or regions of the gene were localized on the three-dimensional structure of the protein using the computer program Raswin 2.6 (R. Sayle, Glaxo Wellcome Research and Development, Stevenage, Hertfordshire, UK).


*  RESULTS AND DISCUSSION

Within-gene covariation between synonymous and nonsynonymous substitutions:
Fig 1 shows the variation in the rates of synonymous and nonsynonymous substitutions within metalloproteinase genes. Each value in the graph corresponds to the average, in each window, of all pairwise distances. The first point that is evident from this figure is that the rates of nucleotide substitution are far from uniform for both synonymous and nonsynonymous positions. On the contrary, some regions of the gene are rather well conserved, while other regions are much more divergent.



 
Figure 1. Profiles of synonymous (thin line) and nonsynonymous (thick line) distances and codon adaptation index (dotted line). The window size used was 30 codons, shifting 1 codon at a time. Each window was labeled according to the codon falling in the middle so that the first window is assigned to codon 15. The correlation coefficients were calculated using nonoverlapping windows to ensure the independence of sampling points.

A second and more striking point is that the profiles of synonymous and nonsynonymous distances exhibit a strong covariation resulting in a very high correlation coefficient (r = 0.87, P ≈ 0). This means that those regions of the gene that are less divergent at the amino acid level are also less divergent at the synonymous level, whereas those sectors of the gene that are not conserved at the amino acid level are also less conserved at the synonymous level. Intragenic correlations between synonymous and nonsynonymous distances have already been described for mammalian and monocot genes (ALVAREZ-VALIN et al. 1998, 1999). However, in no case studied was the correlation as strong as the one observed in the GP63 gene.
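Because adjacent sliding windows share 29 of their 30 codons, the correlation coefficients are computed on nonoverlapping windows only, as stated in the Figure 1 legend. A minimal sketch of that step follows, assuming each profile is a list with one value per window position (step of one codon); the function names are illustrative and the significance test is not shown.

```python
import math

def nonoverlapping(profile, size=30):
    """Keep every size-th point of a step-1 sliding profile.

    Windows starting at codons 0, size, 2*size, ... share no codons,
    so the retained points can be treated as independent samples.
    """
    return profile[::size]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def profile_correlation(ks_profile, ka_profile, size=30):
    """Correlation between two sliding profiles using nonoverlapping windows only."""
    return pearson(nonoverlapping(ks_profile, size), nonoverlapping(ka_profile, size))
```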

A third point is that the profile of the CAI has a pattern of variation that is inverse to those of the substitutions. The regions with higher CAI present low substitution rates, while the regions with lower CAI show higher substitution rates. These covariations are very significant as indicated by their correlation coefficients (r = -0.74, P < 10^-3 and r = -0.57, P < 10^-2, for synonymous and nonsynonymous substitution rates, respectively). This result indicates that those gene segments with a lower rate of nucleotide substitutions have a high frequency of optimal codons.

The last point is that the rate of synonymous substitutions is extremely low (reaching a value of almost zero in a few windows) in a segment of 30 amino acids (between positions 340 and 370). Moreover, the frequency of major (translationally optimal) codons is very high in this same region. As a consequence the ratio Ka/Ks is very high (>4). Nonsynonymous distances are significantly higher than synonymous ones for almost all pairwise comparisons (t-test). HUGHES and NEI (1988) have proposed that this kind of behavior (Ka/Ks > 1) may indicate positive selection. However, contrary to what would be expected under positive selection acting toward increasing amino acid variability, the rate of nonsynonymous substitutions is much lower in this region than in many other regions of the gene where Ka < Ks. Additional evidence against positive selection acting on this segment of the gene is that conservative amino acid changes are more frequent than radical ones (14 radical and 27 conservative amino acid substitutions). Note that if selection were favoring diversity at the protein level, then radical (nonconservative) amino acid changes would occur with a high frequency (see HUGHES 1994 and references therein). All these elements taken together suggest that the effect observed in this region (Ka > Ks), rather than being explained by positive selection at the amino acid level, is due to purifying selection acting against synonymous changes to maintain major codons. It is noteworthy that a similar situation has been recently described in the nef-1 gene from HIV-1 (ZANOTTO et al. 1999).

For the purpose of testing if the observed pattern of divergence is governed by a deterministic force, such as natural selection, it becomes necessary to analyze processes of divergence between phylogenetically independent lineages. Therefore, one should know the evolutionary relationships among the genes under study, in order to determine which comparisons are phylogenetically independent. The phylogenetic tree presented in Fig 2A shows that the processes of divergence between the sequences referred to as 1 and 2, 3 and 4, and 5 and 6 are independent since they do not share any common branch. Consequently, we obtained the profiles of synonymous and nonsynonymous distances for the three pairs of homologous genes. Fig 2B shows the variation in the rate of synonymous and nonsynonymous substitutions between sequences 1 and 2, 3 and 4, and 5 and 6, respectively. The similarity among the three pairs of profiles is so evident that it can be appreciated even by visual inspection. A very similar pattern of variation is obtained in other pairwise comparisons with the only exception of those involving couples of genes that are very closely related (data not shown). These results show that the pattern of conservation/divergence along the GP63 gene is indeed due to deterministic forces.
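The independence criterion used here (the paths joining the two members of each pair share no branch of the tree) can be checked mechanically. The sketch below is a minimal illustration on an invented toy tree encoded as a child-to-parent dictionary; the node names and topology are hypothetical and do not reproduce the tree of Fig 2A.

```python
def edges_to_root(node, parent):
    """Set of (child, parent) edges on the path from node up to the root."""
    edges = set()
    while node in parent:
        edges.add((node, parent[node]))
        node = parent[node]
    return edges

def path_edges(a, b, parent):
    """Edges on the path joining a and b.

    Edges shared by both root paths lie above the common ancestor and cancel,
    so the path is the symmetric difference of the two edge sets.
    """
    return edges_to_root(a, parent) ^ edges_to_root(b, parent)

def are_independent(pairs, parent):
    """True if no branch is used by more than one pairwise comparison."""
    used = set()
    for a, b in pairs:
        path = path_edges(a, b, parent)
        if path & used:
            return False
        used |= path
    return True

# Toy tree (hypothetical): three cherries, (s1,s2), (s3,s4), and (s5,s6).
TOY_PARENT = {
    "s1": "X", "s2": "X", "s3": "Y", "s4": "Y", "s5": "Z", "s6": "Z",
    "X": "W", "Y": "W", "W": "R", "Z": "R",
}
```

On this toy tree the comparisons s1-s2, s3-s4, and s5-s6 are independent, whereas s1-s3 and s2-s4 are not, because both paths run along the branches joining X and Y to their common ancestor.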



 
Figure 2. Comparison of the conservation/divergence patterns among phylogenetically independent lineages. (a) Phylogenetic tree depicting the evolutionary relationships among the genes used in this work. The tree was built by the neighbor-joining method (SAITOU and NEI 1987) from amino acid distances estimated using the Poisson correction. Numbers near nodes are bootstrap values (1000 pseudoreplicates). Each gene is represented by its GenBank locus name along with the species symbol, where Lg stands for L. guyanensis; Lp, L. panamensis; La, L. amazonensis; Lme, L. mexicana; Lm, L. major; Li, L. infantum; and Ld, L. donovani. (b) Profiles of synonymous (dotted line) and nonsynonymous (continuous line) distances between the sequences referred to as 1 and 2, 3 and 4, and 5 and 6 in Fig 2A. The correlation coefficients between the profiles of synonymous and nonsynonymous distances (calculated using nonoverlapping windows) are indicated in each case.
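For reference, the Poisson correction mentioned in the legend converts the observed proportion p of differing amino acid sites into a distance d = -ln(1 - p). A minimal sketch follows; the function names are illustrative, and ignoring any alignment column that contains a gap is an assumption of this sketch rather than a detail stated in the text.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing residues over aligned columns without gaps ('-')."""
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        return None
    return sum(x != y for x, y in columns) / len(columns)

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p).

    Returns None when no columns can be compared or when p = 1,
    where the logarithm is undefined.
    """
    p = p_distance(seq_a, seq_b)
    if p is None or p >= 1.0:
        return None
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is the kind of input a neighbor-joining program takes.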

Segmental gene conversion and covariation of substitution rates:
Because the genes used in this study belong to a multigene family, it is possible that segmental gene conversion could be responsible for the observed pattern of divergence. Gene conversion is a process of unidirectional transfer of genetic material between members of a multigenic family, in which two homologous sequences interact in such a way that one becomes identical to the other (JACKSON and FINK 1981). Taking into account that, after the occurrence of gene conversion, the converted segments of the interacting genes become identical, it is possible that, when we compare two given members of one of these families, we find patches with very little or no differentiation, both at the synonymous and nonsynonymous levels (these patches correspond to the segments that were converted relatively recently), and patches with greater differentiation.

Therefore, segmental gene conversion might produce an intragenic pattern of nucleotide divergence in which synonymous and nonsynonymous substitutions strongly covary. Nevertheless, it is very unlikely that gene conversion would produce the same or a similar output (i.e., intragenic pattern of divergence) in independent evolutionary lineages, unless segmental gene conversion took place recurrently in the same region of the gene. Therefore, the results presented in the previous section showing that independent processes of divergence yield similar conserved patterns strongly suggest that gene conversion is not responsible for the observed patterns. The assumption that gene conversion produces different patterns of conservation/divergence in independent comparisons is based on what has been observed in the chorion locus from the silkmoth Bombyx mori. This locus contains 15 tandemly arranged gene pairs. When genes from this locus are compared, patches of high similarity and divergence along the gene are observed. Each individual comparison produces a unique pattern of patches with high similarity indicating gene conversion events. In other words, under gene conversion the spatial distribution of patches with high similarity changes from comparison to comparison (EICKBUSH and BURKE 1985, 1986). It is possible to argue that contrary to the situation observed in the silkmoth chorion gene family, gene conversion could be deterministic in the GP63 multigene family.

A further step in the analysis was the comparison of the profiles of divergence obtained for Leishmania genes with those obtained for GP63 genes from other Trypanosomatidae. As gene conversion can occur only between gene copies in the same genome, it could explain the observed patterns in comparisons between genes from the same species. It could, however, also affect those comparisons involving paralogous genes from closely related species. The latter possibility remains open because some conversion events could have taken place in the common ancestor of two species, and it would still be possible to detect their footprints if the species under consideration are not too distant. On the other hand, if in comparisons involving genes from more distant species we obtain a similar pattern of conservation/divergence, we can discard gene conversion as the underlying force since the genes under consideration were diverging in different genomes for the majority of the time.

We obtained the profile of synonymous and nonsynonymous divergence between each Leishmania GP63 gene and the only GP63 gene available in Crithidia fasciculata. C. fasciculata is another trypanosomatid that, in contrast to Leishmania, parasitizes only insects. The average profile between Leishmania genes and the C. fasciculata gene (as well as each individual profile) is very similar to the average profile obtained for Leishmania genes alone (Fig 3; Table 2). That is, those regions that are conserved in the Leishmania/Crithidia profiles are also conserved in the profiles from Leishmania alone, while the regions that are more divergent in the Leishmania/Crithidia profiles are also more divergent in the profiles from Leishmania alone. This observation is almost impossible to reconcile with gene conversion even if gene conversion were a deterministic process, because the C. fasciculata gene branches off much before the divergence among Leishmania genes (SCHLAGENHAUF et al. 1998).



 
Figure 3. Profiles of synonymous (thin line) and nonsynonymous (thick line) distances in comparisons involving genes from Leishmania and C. fasciculata. (a) Average profiles including Leishmania genes alone. These profiles are similar to those presented in Fig 1 but the 5' region is absent because it is not available in C. fasciculata. (b) Profiles of synonymous and nonsynonymous distances between Leishmania genes and the GP63 gene from C. fasciculata. These profiles are the averages of the pairwise profiles between each gene from Leishmania and the gene from C. fasciculata.


 

 
Table 2. Summary of the correlation coefficients between profiles of synonymous and nonsynonymous distances involving Leishmania species alone and the average profile between each Leishmania gene and the C. fasciculata gene

Moreover, comparisons involving GP63 genes from Trypanosoma brucei lead to similar conclusions. Specifically, we found that profiles of synonymous and nonsynonymous divergence between two GP63 genes from T. brucei and also the profile of nonsynonymous distances between T. brucei and Leishmania genes exhibit a very significant correlation with the profiles of synonymous and nonsynonymous distances obtained using only Leishmania genes (not shown).

Relationship between the substitution rates and functional constraints:
The strong correlation between the intragenic distributions of synonymous and nonsynonymous substitutions, distributions that persist when comparisons involving independent processes of divergence are considered, indicates that neither the synonymous nor the nonsynonymous divergence is random in the GP63 gene. Rather, these results show that deterministic forces affect both kinds of nucleotide substitutions in a similar way.

SMITH and HURST (1998) have suggested that a putative deterministic force could be the distribution of mutational hotspots along the gene. Certainly, if the localization of these mutational hotspots were conserved among genes, it is then likely that independent processes of divergence would give similar intragenic patterns of synonymous differentiation. However, this hypothesis does not predict any relationship between synonymous and nonsynonymous divergences, contrary to what is observed in the GP63 gene. Moreover, this mutationist hypothesis does not predict any relationship between the substitution rates and features known to be functionally relevant. It is worth mentioning that the gene conversion hypothesis does not predict any relationship between the substitution rates and functionally relevant features either. By contrast, if selection were the deterministic force at work, variation in substitution rates would be correlated with functionally important features.

To test these hypotheses, we first analyzed those amino acid positions directly involved in the catalytic mechanism. These correspond to residues His264, Glu265, His268, His334, and Met345 (MACDONALD et al. 1995). As expected, these catalytic residues are fully conserved at the amino acid level. As far as the synonymous changes in these codons are concerned, only His334 varies while the remaining ones are totally conserved. Residues located in the neighborhood of catalytic sites show substantial conservation at both the synonymous and nonsynonymous levels. No drastic amino acid changes affect these positions, consistent with the fact that they create the necessary framework for catalysis. Moreover, there is experimental evidence showing that the tetrapeptide SRYD plays an important role in parasite-macrophage interaction, very likely as the adhesion site, helping in parasite internalization (SOTERIADOU et al. 1992; PUENTES et al. 1999). This segment of the protein is highly conserved at both the amino acid and the synonymous level in the corresponding gene. It must be taken into account, however, that all these sites represent a very small fraction of the whole molecule and cannot explain by themselves the global pattern of variation already described.

A second element that should be considered, to explain the patterns of conservation/divergence observed, is the secondary structure of the protein. We failed to detect any obvious relationship between the conservation/divergence pattern and the linear distribution of α-helices, coils, or β-strands (not shown). This may seem to contradict our previous results on mammalian genes (CHIUSANO et al. 1999). However, it should be taken into account that a different approach was followed in that case and that a much larger number of codons was analyzed. Indeed, the codons encoding the same structural elements were pooled together irrespective of their location in the tertiary structure. Moreover, given that the differences in substitution rates among the different kinds of secondary elements are not very large, they can become evident only when a large number of codons is used.

To further investigate any possible relationship between the structure and nucleotide distances, we localized those regions of the gene that are more divergent at the synonymous and nonsynonymous level (i.e., the peaks of divergence) on the three-dimensional structure of GP63 (Fig 4). We found that the distribution of the variable regions is clearly nonrandom in three-dimensional space. In the first place, according to what would be expected from structural constraints, all divergent segments are located on the surface of the molecule, and none of them participates directly in the structural core of the enzyme. CREIGHTON and DARBY (1989) pointed out that since structural constraints act on internal amino acid residues that are important for maintaining the folding of a protein, they are expected to change less than those on the surface. Indeed, KIMURA and OHTA (1973) have noted that the rate of amino acid substitutions in hemoglobins is about twice as high at the surface as in buried amino acids. CHOTHIA and LESK (1986) found that an exponential relationship exists between changes in surface amino acid residues and buried ones. As for the particular case of the GP63 protein, SCHLAGENHAUF et al. (1998) noted that amino acid variability is correlated with structural flexibility. Therefore the results obtained for the GP63 protein are compatible with the idea that the external regions are more free to vary due to their lower structural constraints. However, we would like to stress that while amino acid substitutions and protein structure have already been related, this is the first time that the divergence at both synonymous and nonsynonymous sites is found to be clearly correlated with the protein structure.



Figure 4. Three different views of the three-dimensional structure of the GP63 protein. Divergent regions are shown in cyan. The only variable region close to the anchoring site is shown in green. The amino acids composing the catalytic site are represented in green. The Zn atom appears in red. The membrane anchoring tail is represented in red. The amino acids that mark the beginning of the nonresolved segments of the protein (residues Asn407, Ala412, Ala498, and Ser505) are indicated to show the location of these nonresolved segments in the three-dimensional structure of the protein.

A second and more remarkable observation is that all but one of the variable regions face one side of the molecule, opposite to the anchoring site, and thus would be located on the exposed surface of the organism, almost parallel to the cell membrane. The remaining variable region is also on the surface but located beside the anchoring site. Most striking, though, is the fact that these surface variable regions create a kind of "funnel" that ends at the active site (Fig 4B). Two small segments of the protein (residues 408–411 and 499–504) were not resolved in the crystallographic analysis because of their weak electron density (SCHLAGENHAUF et al. 1998). These nonresolved segments are part of the variable regions. In spite of this lack of information, it is still possible to see in Fig 4 that these variable regions also lie on the upper external surface of the molecule. In summary, it can be stated that nucleotide divergence and protein tertiary structure are clearly related in GP63 proteins.

An interesting aspect that deserves mention is that some of the variable regions coincide with segments that have been indicated as relevant in eliciting immunological responses. Indeed, sera from dogs infected by L. infantum preferentially recognize one of the variable regions located toward the carboxyl end of the GP63 protein (MORALES et al. 1997). Moreover, the same region is reported to induce lymphocytic responses of the Th1 type (involved in cellular immunity) in mice (SOARES et al. 1994) and also to produce high levels of gamma interferon in humans affected by leishmaniasis (RUSSO et al. 1993).


*  CONCLUSIONS

Two alternative hypotheses have been proposed to explain the variability in nucleotide substitution rates: mutation and selection. In contrasting these hypotheses, particular attention has been paid to the fact that synonymous and nonsynonymous substitution rates are correlated in genes from different organisms. In this work we present results that are significant for discriminating between these two hypotheses.

Here we show that in the gene encoding the GP63 protein there is a strong intragenic correlation between the rates of synonymous and nonsynonymous changes. We also show that the patterns of variation in substitution rates are clearly reproducible, since comparisons performed on independent lines of divergence yield remarkably similar results. On this basis, as well as on the basis of comparisons involving other trypanosomatid species, it is possible to state that only a deterministic force may produce such a reproducible pattern, thus ruling out segmental gene conversion as a possible explanation of this phenomenon.
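As a minimal sketch of how such an intragenic correlation could be checked, assume that window-wise estimates of the synonymous (Ks) and nonsynonymous (Ka) distances have already been obtained with some estimation method. The function name `pearson_r` and the numerical values below are hypothetical illustrations, not the code or the data used in this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-window distances (one value per sliding window)
ks_windows = [0.10, 0.35, 0.12, 0.40, 0.08]
ka_windows = [0.02, 0.15, 0.03, 0.20, 0.01]

r = pearson_r(ks_windows, ka_windows)
print(f"r(Ks, Ka) = {r:.3f}")
```

The same function could be applied to the Ks profiles of two independent pairwise comparisons to quantify how reproducible the pattern is. Note that overlapping sliding windows share codons, so neighboring points are not independent and a naive significance test on r would be overconfident.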

The results of different approaches taken to differentiate between mutationist and selectionist hypotheses unambiguously favor the latter type of explanation. The fact that codon positions known to be functionally relevant (such as those encoding residues that participate in catalytic activity) are fully conserved is in agreement with the selectionist point of view. More significant yet is the fact that the substitution rates are clearly correlated with the three-dimensional structure of the encoded protein. These two observations are impossible to reconcile with a mutationist hypothesis.

The common constraint hypothesis (negative selection) proposed by MOUCHIROUD et al. (1995) may explain the variability in synonymous and nonsynonymous substitution rates as well as the correlation between these rates. Under this hypothesis, the divergent regions would be less constrained from the functional standpoint and consequently would be freer to vary. The correlation would be caused by common constraints affecting both synonymous and nonsynonymous changes. The constraining force could be translational accuracy, since this force may affect synonymous variation at constrained amino acids in order to maintain a strong codon bias. In this regard it is worth mentioning that evidence has been presented showing that functionally important amino acids tend to be preferentially encoded by major codons, a preference that has been attributed to the reduction of error rates during peptide elongation (AKASHI 1994). Moreover, there is experimental evidence showing that in E. coli genes the replacement of an optimal codon by a minor (synonymous) one produces an almost 10-fold increase in the rate of translational errors at the amino acid where the replacement occurred (PRECUP and PARKER 1987). As a consequence, in those regions of the gene that are conserved at the amino acid level (i.e., encoding putatively important amino acids), one would expect conservation of translationally optimal codons, resulting in turn in lower synonymous rates. This hypothesis is in keeping with our results, since in the GP63 gene conserved regions also exhibit a higher frequency of translationally optimal codons.
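The frequency of translationally optimal codons in a gene segment can be illustrated with a minimal sketch. It assumes that a set of optimal codons is supplied by the user; the set and the sequence below are hypothetical placeholders and do not represent the optimal codons of Leishmania or any part of the GP63 gene. Codons without synonymous alternatives (ATG, TGG) and stop codons are excluded from the count, which is one common convention.

```python
NON_DEGENERATE = {"ATG", "TGG"}   # Met and Trp have no synonymous codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frequency_of_optimal_codons(cds, optimal_codons):
    """Fraction of informative codons in `cds` that belong to `optimal_codons`."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Keep only codons for which a synonymous choice exists
    informative = [c for c in codons
                   if c not in NON_DEGENERATE and c not in STOP_CODONS]
    if not informative:
        raise ValueError("no informative codons in sequence")
    n_optimal = sum(1 for c in informative if c in optimal_codons)
    return n_optimal / len(informative)


# Hypothetical optimal codons and a hypothetical in-frame segment
hypothetical_optimal = {"CTG", "GCC", "AAG"}
segment = "ATGCTGGCCAAGCTTTGG"   # ATG CTG GCC AAG CTT TGG

fop = frequency_of_optimal_codons(segment, hypothetical_optimal)
print(fop)
```

Applied separately to conserved and divergent windows, such a function would allow the two classes of regions to be compared for codon bias.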

It is difficult, however, to explain, on the basis of negative selection as dictated by structural constraints, why, among the external regions, those located around the active site are the most variable. One possible explanation is that the lateral external regions of the protein participate in side-to-side interactions with other membrane proteins or with other molecules, and therefore they are not as free to vary as are the segments located in the upper region. Nevertheless, it is not possible to rule out that positive selection may also contribute to this behavior, since the localization of the variable regions suggests that they might participate in ligand interactions. In particular, it is likely that these regions participate in the docking of different protein substrates. Consequently, the different variants might represent enzymes exhibiting variable substrate affinities. In this respect it is worth mentioning that accelerated amino acid substitution rates have been observed in protein regions that participate in ligand interactions. Examples of this are the complementarity-determining regions of immunoglobulins (TANAKA and NEI 1989), T-cell receptors and peptide-binding regions in the MHC genes (HUGHES and YEAGER 1998). Nevertheless, we could not find any evidence of positive selection acting to increase amino acid diversity in these external regions (i.e., Ka > Ks). Innovative approaches are needed to assess whether positive selection is also contributing to the pattern of divergence in the GP63 gene family.
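The Ka > Ks criterion mentioned above can be illustrated with a minimal sketch, assuming that per-window estimates of Ka and Ks are already available. The function name and the values are hypothetical and are not taken from this study.

```python
def windows_with_ka_above_ks(ka_values, ks_values):
    """Return the indices of the windows in which Ka exceeds Ks."""
    if len(ka_values) != len(ks_values):
        raise ValueError("Ka and Ks series must have the same length")
    return [i for i, (ka, ks) in enumerate(zip(ka_values, ks_values))
            if ka > ks]


# Hypothetical per-window distances (one value per sliding window)
ka_windows = [0.02, 0.15, 0.03, 0.20, 0.01]
ks_windows = [0.10, 0.35, 0.12, 0.40, 0.08]

flagged = windows_with_ka_above_ks(ka_windows, ks_windows)
print(flagged)  # empty list: no window has Ka > Ks in this toy example
```

This criterion is conservative: positive selection acting on a few codons can be masked by purifying selection on the remaining codons of the same window, so an empty result does not by itself exclude positive selection.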

Manuscript received February 2, 2000; Accepted for publication April 21, 2000.
*  LITERATURE CITED

AKASHI, H., 1994  Synonymous codon usage in Drosophila melanogaster, natural selection and translational accuracy. Genetics 136:927-935.

ALVAREZ, F., C. ROBELLO, and M. VIGNALI, 1994  Evolution of codon usage and base contents in kinetoplastid protozoans. Mol. Biol. Evol. 11:790-802.

ALVAREZ-VALIN, F., K. JABBARI, and G. BERNARDI, 1998  Synonymous and nonsynonymous substitutions in mammalian genes: intragenic correlations. J. Mol. Evol. 46:37-44.

ALVAREZ-VALIN, F., K. JABBARI, N. CARELS, and G. BERNARDI, 1999  Synonymous and nonsynonymous substitutions in genes from Gramineae: intragenic correlations. J. Mol. Evol. 49:330-342.

BENNETZEN, J. L. and B. D. HALL, 1982  Codon selection in yeast. J. Biol. Chem. 257:3026-3031.

BERNARDI, G., D. MOUCHIROUD, and C. GAUTIER, 1993  Silent substitutions in mammalian genomes and their evolutionary implications. J. Mol. Evol. 37:583-589.

BUTTON, L., D. G. RUSSELL, H. L. KLEIN, E. MEDINA-ACOSTA, and R. E. KARESS et al., 1989  Genes encoding the major surface glycoprotein in Leishmania are tandemly linked at a single chromosomal locus and are constitutively transcribed. Mol. Biochem. Parasitol. 32:271-284.

CACCIÒ, S., S. ZOUBAK, G. D'ONOFRIO, and G. BERNARDI, 1995  Nonrandom frequency patterns of synonymous substitutions in homologous mammalian genes. J. Mol. Evol. 40:280-292.

CHAUDHURI, G., M. CHAUDHURI, A. PAN, and K. P. CHANG, 1989  Surface acid proteinase (gp63) of Leishmania mexicana. A metalloenzyme capable of protecting liposome-encapsulated proteins from phagolysosomal degradation by macrophages. J. Biol. Chem. 264:7483-7489.

CHIUSANO, M. L., G. D'ONOFRIO, F. ALVAREZ-VALIN, K. JABBARI, and G. COLONNA et al., 1999  Correlations of nucleotide substitution rates and base composition of mammalian coding sequences with protein structure. Gene 238:23-31.

CHOTHIA, C. and A. M. LESK, 1986  The relation between the divergence of sequence and structure in proteins. EMBO J. 5:823-826.

COMERON, J. and M. AGUADÉ, 1996  Synonymous substitutions in the Xdh gene of Drosophila: heterogeneous distribution along the coding region. Genetics 144:1053-1062.

COMERON, J. M., 1995  A method for the number of synonymous and nonsynonymous substitutions per site. J. Mol. Evol. 41:1152-1159.

COMERON, J. M. and M. KREITMAN, 1998  The correlation between synonymous and nonsynonymous substitutions in Drosophila: mutation, selection or relaxed constraints? Genetics 150:767-775.

CREIGHTON, T. E. and N. J. DARBY, 1989  Functional evolutionary divergence of proteolytic enzymes and their inhibitors. Trends Biochem. Sci. 14:319-324.

DE MIRANDA, A. B., F. ALVAREZ-VALIN, K. JABBARI, W. M. DEGRAVE, and G. BERNARDI, 2000  Gene expression, amino acid conservation and hydrophobicity are the main factors shaping codon preferences in Mycobacterium tuberculosis and Mycobacterium leprae. J. Mol. Evol. 50:45-55.

EICKBUSH, T. H. and W. D. BURKE, 1985  Silkmoth chorion gene families contain patchwork patterns of sequence homology. Proc. Natl. Acad. Sci. USA 82:2014-2018.

EICKBUSH, T. H. and W. D. BURKE, 1986  The silkmoth late chorion locus. II. Gradients of gene conversion in two paired multigene families. J. Mol. Biol. 190:357-366.

FITCH, W. M., 1980  Estimating the total number of nucleotide substitutions since the common ancestor of a pair of genes: comparison of several methods and three beta hemoglobin messenger RNA's. J. Mol. Evol. 16:153-209.

GOUY, M. and C. GAUTIER, 1982  Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055-7074.

GRAUR, D., 1985  Aminoacid composition and the evolutionary rate of proteins. J. Mol. Evol. 22:53-62.

HUGHES, A. L., 1994  The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. Ser. B 256:119-124.

HUGHES, A. L. and M. NEI, 1988  Pattern of nucleotide substitution at major histocompatibility complex loci reveals overdominant selection. Nature 335:167-170.

HUGHES, A. L. and M. YEAGER, 1998  Natural selection at major histocompatibility complex loci of vertebrates. Annu. Rev. Genet. 32:415-435.

IKEMURA, T., 1981  Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in proteins genes. J. Mol. Biol. 146:1-21.

IKEMURA, T., 1982  Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in proteins genes. J. Mol. Biol. 158:573-587.

INA, Y., 1995  Correlation between synonymous and non synonymous substitutions and variation in the synonymous substitution numbers, pp. 105–113 in Current Topics on Molecular Evolution, edited by M. NEI and N. TAKAHATA. Institute of Molecular Evolutionary Genetics, Penn State University, University Park, PA.

JACKSON, J. A. and G. R. FINK, 1981  Gene conversion between duplicated genetic elements in yeast. Nature 292:306-311.

KIMURA, M. and T. OHTA, 1973  Mutation and evolution at the molecular level. Genetics 73(Suppl.):19-35.

LI, W. H., C. I. WU, and C. C. LUO, 1985  A new method for estimating synonymous and nonsynonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide codon changes. Mol. Biol. Evol. 2:150-174.

LIPMAN, D. J. and W. J. WILBUR, 1985  Interaction of silent and replacement changes in eukaryotic coding sequences. J. Mol. Evol. 21:161-167.

LLOPART, A. and M. AGUADÉ, 1999  Synonymous rates in the RpII215 gene of Drosophila: variation among species and across the coding region. Genetics 152:269-280.

MACDONALD, M. H., C. J. MORRISON, and W. R. MCMASTER, 1995  Analysis of the active site and activation mechanism of the Leishmania surface metalloproteinase GP63. Biochim. Biophys. Acta 1253:199-207.

MORALES, G., G. CARRILLO, J. M. REQUENA, F. GUZMAN, and L. C. GOMEZ et al., 1997  Mapping of the antigenic determinants of the Leishmania infantum gp63 protein recognized by antibodies elicited during canine visceral leishmaniasis. Parasitology 114:507-516.

MORIYAMA, E. N. and T. GOJOBORI, 1992  Rates of synonymous substitutions and base composition of nuclear genes in Drosophila. Genetics 130:855-864.

MOUCHIROUD, D., C. GAUTIER, and G. BERNARDI, 1995  Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of non-synonymous substitutions. J. Mol. Evol. 40:107-113.

NEI, M. and T. GOJOBORI, 1986  Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

OHTA, T. and Y. INA, 1995  Variation in synonymous substitutions rates among mammalian genes and correlations between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717-720.

PRECUP, J. and J. PARKER, 1987  Missense misreading of asparagine codons as a function of codon identity and context. J. Biol. Chem. 262:11351-11356.

PUENTES, F., F. GUZMAN, V. MARIN, C. ALONSO, and M. E. PATARROYO et al., 1999  Leishmania: fine mapping of the Leishmanolysin molecule's conserved core domains involved in binding and internalisation. Exp. Parasitol. 93:7-22.

ROBERTS, S. C., K. G. SWIHART, M. W. AGEY, R. RAMAMOORTHY, and M. E. WILSON et al., 1993  Sequence diversity and organization of the msp gene family encoding gp63 of Leishmania chagasi. Mol. Biochem. Parasitol. 62:157-171.

RUSSELL, D. G. and H. WILHELM, 1986  The involvement of the major surface glycoprotein (gp63) of Leishmania promastigotes in attachment to macrophages. J. Immunol. 136:2613-2621.

RUSSO, D. M., A. JARDIM, E. M. CARVALHO, P. R. SLEATH, and R. J. ARMITAGE et al., 1993  Mapping human T cell epitopes in Leishmania gp63. Identification of cross-reactive and species-specific epitopes. J. Immunol. 150:932-939.

SAITOU, N. and M. NEI, 1987  The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SCHLAGENHAUF, E., R. ETGES, and P. METCALF, 1998  The crystal structure of the Leishmania major surface proteinase leishmanolysin (gp63). Structure 6:1035-1046.

SHARP, P. M. and W. H. LI, 1987a  The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222-230.

SHARP, P. M. and W. H. LI, 1987b  The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281-1295.

SHARP, P. M. and W. H. LI, 1989  On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402.

SMITH, N. G. and L. D. HURST, 1998  Molecular evolution of an imprinted gene: repeatability of patterns of evolution within the mammalian insulin-like growth factor type II receptor. Genetics 150:823-833.

SOARES, L. R., E. E. SERCARZ, and A. MILLER, 1994  Vaccination of the Leishmania major susceptible BALB/c mouse. I. The precise selection of peptide determinant influences CD4+ T cell subset expression. Int. Immunol. 6:785-794.

SOTERIADOU, K. P., M. S. REMOUNDOS, M. C. KATSIKAS, A. K. TZINIA, and V. TSIKARIS et al., 1992  The Ser-Arg-Tyr-Asp region of the major surface glycoprotein of Leishmania mimics the Arg-Gly-Asp-Ser cell attachment region of fibronectin. J. Biol. Chem. 267:13980-13985.

STEINKRAUS, H. B., J. M. GREER, D. C. STEPHENSON, and P. J. LANGER, 1993  Sequence heterogeneity and polymorphic gene arrangements of the Leishmania guyanensis gp63 genes. Mol. Biochem. Parasitol. 62:173-185.

STENICO, M., A. T. LLOYD, and P. M. SHARP, 1994  Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22:2437-2446.

TANAKA, T. and M. NEI, 1989  Positive darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447-459.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

WEBB, J. R., L. L. BUTTON, and W. R. MCMASTER, 1991  Heterogeneity of the genes encoding the major surface glycoprotein of Leishmania donovani. Mol. Biochem. Parasitol. 48:173-184.

WOLFE, K. H. and P. M. SHARP, 1993  Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.

WOLFE, K. H., P. M. SHARP, and W. H. LI, 1989  Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

ZANOTTO, P. M., E. G. KALLAS, R. F. DE SOUZA, and E. C. HOLMES, 1999  Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153:1077-1089.

ZHANG, J., H. F. ROSENBERG, and M. NEI, 1998  Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713.

ZOUBAK, S., G. D'ONOFRIO, S. CACCIÒ, and G. BERNARDI, 1995  Specific compositional patterns of synonymous positions in homologous mammalian genes. J. Mol. Evol. 40:293-307.



