Genetics, Vol. 156, 1157-1167, November 2000, Copyright © 2000

Two Genes Become One: The Genes Encoding Heterochromatin Protein SU(VAR)3-9 and Translation Initiation Factor Subunit eIF-2{gamma} Are Joined to a Dicistronic Unit in Holometabolic Insects

Veiko Krauss and Gunter Reuter
Institute of Genetics, Martin Luther University Halle-Wittenberg, D-06108 Halle, Germany

Corresponding author: Veiko Krauss, Department of Genetics, University of Leipzig, D-04103 Leipzig, Johannisallee 21–23, Germany. E-mail: krauss@rz.uni-leipzig.de

Communicating editor: J. A. BIRCHLER


*  ABSTRACT

The Drosophila suppressor of position-effect variegation Su(var)3-9 encodes a heterochromatin-associated protein that is evolutionarily conserved. In contrast to its yeast and mammalian orthologs, the Drosophila Su(var)3-9 gene is fused with the locus encoding the {gamma} subunit of translation initiation factor eIF2. Synthesis of the two unrelated proteins is resolved by alternative splicing. A similar dicistronic Su(var)3-9/eIF-2{gamma} transcription unit was found in Clytus arietis, Leptinotarsa decemlineata, and Scoliopterix libatrix, representing two different orders of holometabolic insects (Coleoptera and Lepidoptera). In all these species the N terminus of eIF-2{gamma}, which is encoded by the first two exons, is fused to SU(VAR)3-9. In contrast to Drosophila melanogaster, RT-PCR analysis in the two coleopteran and the lepidopteran species demonstrated the usage of a nonconserved splice donor site located within the 3' end of the SU(VAR)3-9 ORF, resulting in removal of the Su(var)3-9-specific stop codon from the mRNA and complete in-frame fusion of the SU(VAR)3-9 and eIF-2{gamma} ORFs. In the centipede Lithobius forficatus eIF-2{gamma} and Su(var)3-9 are unconnected. Conservation of the dicistronic Su(var)3-9/eIF-2{gamma} transcription unit in the studied insects indicates its origin before radiation of holometabolic insects and represents a useful tool for molecular phylogenetic analysis in arthropods.


During recent years molecular analysis has revealed several interesting exceptional types of gene organization in eukaryotes (BLUMENTHAL 1998). These include cases where several genes are arranged into a multicistronic unit expressed from a single promoter. The polycistronic pre-mRNA is processed by alternative or trans-splicing to monocistronic mRNAs. In Caenorhabditis elegans ~25% of the genes are estimated to be contained in polycistronic units (ZORIO et al. 1994). Dicistronic transcription units that are resolved by alternative splicing were documented in C. elegans unc-17/cha-1, Drosophila ChAT/VAChT, and mammalian UOG-1/GDF-1 (LEE 1991; ALFONSO et al. 1994; KITAMOTO et al. 1998). A similar gene structure was found for the sesB/Ant2 genes in Drosophila (ZHANG et al. 1999). In contrast, the Adh/Adhr locus of Drosophila melanogaster is transcribed as one dicistronic mRNA (BROGNA and ASHBURNER 1997).

Here we describe another dicistronic transcription unit that evolved by the fusion of two functionally unrelated genes: the suppressor of position-effect variegation Su(var)3-9 and the gene encoding the {gamma} subunit of translation initiation factor eIF-2. The eIF-2{gamma} protein binds GTP and the initiator Met-tRNA at translation initiation (ERICKSON and HANNIG 1996), whereas SU(VAR)3-9 is a heterochromatin-associated protein involved in chromatin condensation (TSCHIERSCH et al. 1994; SCHOTTA and REUTER 2000). Heterochromatin-associated proteins (HPs) were first identified by dominant modifier mutations of position-effect variegation (PEV) in Drosophila (REUTER and SPIERER 1992; WEILER and WAKIMOTO 1995; WALLRATH 1998). Three different HPs have been studied in more detail: the chromo domain protein HP1 (JAMES and ELGIN 1986; EISSENBERG et al. 1992), the zinc finger protein SU(VAR)3-7 (CLEARD et al. 1997), and the chromo and SET domain protein SU(VAR)3-9 (TSCHIERSCH et al. 1994). HP1 and SU(VAR)3-9 are evolutionarily conserved. Su(var)3-9 orthologs were identified in Schizosaccharomyces pombe (Clr4; IVANOVA et al. 1998) and in mice and humans (SUV39h1/SUV39H1; AAGAARD et al. 1999). Molecular analysis of the Su(var)3-9 genomic region in Drosophila revealed a unique gene structure (TSCHIERSCH et al. 1994). The D. melanogaster SU(VAR)3-9 sequences are encoded by a specific exon within intron 2 of eIF-2{gamma}. The mRNAs of the two genes are produced by alternative splicing. SU(VAR)3-9 becomes fused to the N-terminal 80 amino acids of eIF-2{gamma} encoded by the first two exons. This unusual gene structure is not found in fission yeast or mammals. Human eIF-2{gamma} and SUV39H1 are both X chromosomal genes but are located at distant regions (Xp21 and Xp11.2, respectively; GERAGHTY et al. 1993; EHRMANN et al. 1998).

Here we show that the fusion of Su(var)3-9 and eIF-2{gamma} into a dicistronic transcription unit is conserved in holometabolic insects. It is not found in the centipede Lithobius forficatus. In D. melanogaster SU(VAR)3-9 is fused to the 80 N-terminal amino acids of eIF-2{gamma}. In the coleopteran and lepidopteran species studied, alternative splicing at a nonconserved splice donor site at the 3' end of the SU(VAR)3-9 open reading frame (ORF) additionally results in a complete fusion of the SU(VAR)3-9 and eIF-2{gamma} ORFs.


*  MATERIALS AND METHODS

Study specimens:
D. erecta was received from Umea European Drosophila Stock Center. Adult specimens of Leptinotarsa decemlineata and L. forficatus were captured around Halle (Sachsen-Anhalt, Germany). Adult Clytus arietis and Scoliopterix libatrix were captured near Nebra (Sachsen-Anhalt) and Ruhla (Thüringen, Germany), respectively. From each species one individual was used for RNA isolation (Trizol reagent; Life Technologies) and RT-PCR analysis. Two other individuals from each species were used for independent DNA isolation and PCR analysis of genomic fragments to evaluate DNA polymorphisms.

Sampling DNA sequences:
Sequences of primers and their positions within the corresponding genes can be found in Table 1 and Fig 1, respectively. Su(var)3-9-specific sequences of ~1 kb length were isolated with the degenerate primer pair chromo800 and chromo1790 from C. arietis and S. libatrix. Partial eIF-2{gamma} cDNAs from C. arietis, S. libatrix, L. decemlineata, and L. forficatus were isolated by reverse transcriptase (RT)-PCR using degenerate primers EF120 and EF440. Thereafter, for each of the studied species, a separate sequence sampling strategy was applied: In Scoliopterix, genomic PCR fragments were generated using the primer pairs EF120/Sco4-1 and Sco4-3/EF440. The genomic region including exons 1 and 2 was amplified by inverse PCR using primers Sco1-5, Sco2, and Sco4-3. The exon-intron structure at the 5' end of Scoliopterix eIF-2{gamma} was determined after RT-PCR using the primers Sco5-1, Sco5-2 (5' to the putative translation start point), and primer Sco8-2 (inside exon 3). Similarly, the splicing of Su(var)3-9 was determined by RT-PCR using the primers Sco5-1, Sco5-2, Sco1-4, and Sco4-2 for the 5' end; and Sco4-3, Sco4-4, Sco8-1, and Sco8-2 at the 3' end of the Su(var)3-9-specific exon (2a). 5' rapid amplification of cDNA ends (RACE) and 3' RACE products could not be generated from this species.



Figure 1. The Su(var)3-9/eIF-2{gamma} dicistronic transcription unit of four insect species and the eIF-2{gamma} gene of the centipede Lithobius. The commonly accepted phylogenetic relationships (FORTEY and THOMAS 1998) of the analyzed arthropod species are indicated by the tree. Dm, D. melanogaster; Sl, S. libatrix; Ca, C. arietis; Ld, L. decemlineata; Lf, L. forficatus; and pA, poly(A) tail. The nomenclature of the exon-intron structure is given at the top. Introns are in open boxes and exons are in solid boxes. The eIF-2{gamma} transcript part of L. decemlineata containing undetermined intron positions is in a shaded box. All eIF-2{gamma} exons are in yellow boxes, the chromo domain coding regions are indicated in red, the SAC domain coding regions are in green, and the SET domain coding regions are marked blue. Positions and directions of the used degenerate PCR primers are shown at the D. melanogaster gene structure. Positions of the used species-specific primers are given under each exon-intron scheme. Downstream-directed oligonucleotides are red and upstream-directed oligonucleotides are blue.


 
Table 1. Primers used for PCR analysis

In Clytus, genomic PCR fragments were amplified using the primer pairs EF120/Clyrev1 and Cly3-9-4/EF440. The sequence of the eIF-2{gamma} transcript was determined by overlapping 5' RACE (primers ClyEF2 and ClyEF3) and 3' RACE (primers Cly5, ClyEF1, and ClyEF4). With the help of primer pairs ClyEF4/ClyEF8 and ClyEF5/ClyEF7, the position and sizes of introns 3, 4, 5, and 6 were determined. The exon-intron structure of Su(var)3-9 was analyzed by RT-PCR using primer pairs Cly5/Clyrev1, Cly3-9-4/ClyEF2, and Cly3-9-5/ClyEF7.

In Leptinotarsa, the sequence of the eIF-2{gamma}-transcript was determined with overlapping 5' RACE (primers Lep2 and Lep7) and 3' RACE (primers Lep5, Lep1, and Lep15) experiments. The sequence of exons 1 and 2 and of introns 1 and 2a, as well as the 5' part of the Su(var)3-9-specific exon 2a were determined using inverse PCR. Circularized genomic DNA was amplified using primer Lepinv in combination with Lep8, Lepint1, Lep9, Lep10, Lep11, and Lep12. The exon-intron structure of Su(var)3-9 was determined by RT-PCR using primer pairs Lep5/Lep13, Lep14/Lep2, and Lep17/Lep16.

For the centipede L. forficatus a 1.7-kb genomic and a 320-bp cDNA product were amplified by genomic PCR and RT-PCR using primer pair EF120/EF440.

Sequences were determined by direct sequencing or sequencing of two or three independent clones from different genomic PCR reactions. In addition, all transcribed regions were sequenced as RT-PCR products (directly or as a clone). Therefore, the transcript sequences were determined from three independent specimens of each species. PCR fragments were cloned using the pGEM-T PCR cloning kit (Promega, Madison, WI) according to the manufacturer's conditions.

5' RACE, 3' RACE, and RT-PCR:
5' RACE experiments were carried out on 1–2 µg total RNA using the 5' RACE kit version 2 (Life Technologies) according to the manufacturer's instructions. 3' RACE was carried out on 1–2 µg total RNA with 200 units M-MLV reverse transcriptase (Life Technologies). For first-strand cDNA synthesis a poly(T)-primer with anchor sequences was used. After second-strand synthesis cDNA was amplified with anchor oligo and gene-specific primers. The 3' RACE amplification was performed with a nested gene-specific primer. Reverse transcription of total RNA was carried out with M-MLV reverse transcriptase (Life Technologies) and subsequent PCR was done using standard conditions.

Sequencing:
PCR products from 5' RACE, 3' RACE, and RT-PCR were gel eluted and directly sequenced using an ABI 377 sequencer. Genomic PCR fragments from D. erecta and products of inverse genomic PCR were directly sequenced. For DNA sequence analysis, MacVector 5.0 (Oxford Molecular, Palo Alto, CA) software was used. The GenBank/EMBL numbers of the genomic and cDNA sequences are, for D. melanogaster, AJ290956; D. erecta, AJ290957; L. forficatus, AJ290958; S. libatrix, AJ290959 and AJ290960; C. arietis, AJ290961, AJ290962, and AJ290963; and L. decemlineata, AJ290964 and AJ290965.

Phylogenetic analysis:
Multiple protein sequence alignments were performed by means of the ClustalW algorithm of the SeqPup program (D. G. Gilbert), with corrections made by eye. For phylogenetic analysis we used the maximum-likelihood tree reconstruction method of the PUZZLE 4.0 program (STRIMMER and HAESELER 1997) on the basis of the JTT model of amino acid substitution (JONES et al. 1992). Quartet puzzling (1000 steps) was performed to infer support values for internal branches. Trees were drawn using Treeview (R. Page). The alignments were also analyzed by bootstrapping (branch-and-bound search method, 100 steps) with the maximum parsimony algorithm of PAUP 3.1 (SWOFFORD 1993). Alignments are accessible at http://www.uni-leipzig.de/~genetics/ForschVK.htm.


*  RESULTS

The Su(var)3-9/eIF-2{gamma} dicistronic transcription unit of holometabolic insects:
The fusion of the two functionally unrelated Su(var)3-9 and eIF-2{gamma} genes into a dicistronic transcription unit was first indicated in D. melanogaster (TSCHIERSCH et al. 1994). The eIF-2{gamma} locus of D. melanogaster consists of five exons (Fig 1). Most of the ORF of the SU(VAR)3-9 heterochromatin protein is situated within the large intron 2 of eIF-2{gamma}. 5' RACE experiments proved that both the Su(var)3-9 and eIF-2{gamma} mRNAs use the same transcription initiation site 109 bp before the translation initiation codon of both proteins (accession no. AJ290956), suggesting that the two mRNAs for the unrelated proteins are transcribed as a single pre-mRNA, which is alternatively spliced. The two genes share a common promoter and both mRNAs contain the first two exons, resulting in an 80-amino-acid N-terminal extension of SU(VAR)3-9 in D. melanogaster.

An identical genomic organization of Su(var)3-9/eIF-2{gamma} was found in D. erecta, which is 9–15 million years evolutionarily distant (POWELL 1997). To study the evolutionary conservation of this unusual gene structure within insects, we analyzed C. arietis, L. decemlineata (both Coleoptera), and S. libatrix (Lepidoptera). Genomic and cDNA fragments of the Su(var)3-9/eIF-2{gamma} region in these species were isolated by a combination of genomic PCR using degenerate primers, inverse genomic PCR, 5' and 3' RACE, as well as RT-PCR (cf. MATERIALS AND METHODS).

Sequence analysis of these DNA fragments proved the location of the SU(VAR)3-9 ORF within intron 2 of eIF-2{gamma} in all these species (Fig 1). In 5' RACE experiments the transcription initiation site of eIF-2{gamma} was determined in Clytus and Leptinotarsa. Sequence comparison with Su(var)3-9-specific RT-PCR fragments showed that exons 1 and 2 are present in both the Su(var)3-9 and the eIF-2{gamma} mRNAs.

The cDNA sequences of eIF-2{gamma} were completed with 3' RACE or RT-PCR experiments. Surprisingly, cDNA sequences that cover exon 2a [Su(var)3-9] and exon 3 (eIF-2{gamma}) could be amplified by RT-PCR in all studied coleopteran and lepidopteran species. Direct sequencing of these RT-PCR products showed that in all the non-Drosophilid species the Su(var)3-9 exon 2a was fused in-frame with exon 3 of eIF-2{gamma}. This was confirmed by RT-PCR for every species using two different primer pairs. First, the region between the SET domain-encoding sequence of Su(var)3-9 and exon 3 of eIF-2{gamma} was amplified. Afterward, for the two coleopteran species primer pairs were used that amplified the sequence of the fusion transcript between Su(var)3-9 exon 2a (primer Cly3-9-5 or Lep17) and the 3' untranslated region of eIF-2{gamma} (primer ClyEF7 or Lep16, respectively; cf. Fig 1). In C. arietis, besides the Su(var)3-9-eIF-2{gamma} in-frame fusion transcript, a polyadenylated Su(var)3-9 transcript variant not fused with eIF-2{gamma} exons 3–7 was detected in 3' RACE experiments. Within Su(var)3-9 exon 2a in each of the studied species a specific 5' splice site is found (Fig 2). The in-frame fusion of the SU(VAR)3-9 and eIF-2{gamma} ORFs results in small terminal deletions in the ORF of exon 2a. Furthermore, in S. libatrix two different amplification products were isolated, indicating the usage of two alternative 5' splice sites (Fig 2). This was confirmed by an RT-PCR blot using primers Sco4-4/Sco8-2 for PCR and a 300-bp genomic fragment covering exon 3 within eIF-2{gamma} for hybridization (Fig 3). According to the Drosophila 5' splice site scoring table (MOUNT 1993), the best possible splice site between the SET domain and the conserved stop is used. In the Drosophila species studied no splice consensus sequence is found in the corresponding region of Su(var)3-9. Remarkably, only three single nucleotide substitutions between Clytus and Leptinotarsa cause the use of a slightly differently positioned 5' splice site.
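The idea of ranking candidate 5' splice sites in a stretch of sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it counts matches to the general donor consensus MAG|GTRAGT instead of using the weighted scoring table of MOUNT (1993), whose values are not reproduced here, and the input sequence is an invented toy string, not sequence from any of the studied species.

```python
# Simplified stand-in for a splice-site scoring table: count matches to the
# general 5' splice site consensus MAG|GTRAGT (M = A or C, R = A or G).
# ASSUMPTION: a plain match count replaces the weighted table of MOUNT (1993).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"  # 3 exonic positions followed by 6 intronic positions
EXON_PART = 3


def donor_score(window):
    """Number of positions in a 9-nt window that match the donor consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, DONOR_CONSENSUS))


def candidate_donors(seq):
    """Return (score, position of GT) for every GT with a full 9-nt window, best first."""
    seq = seq.upper()
    hits = []
    for i in range(EXON_PART, len(seq) - 5):
        if seq[i:i + 2] == "GT":
            window = seq[i - EXON_PART:i + 6]
            hits.append((donor_score(window), i))
    return sorted(hits, reverse=True)


# Hypothetical toy sequence containing three GT dinucleotides.
toy_sequence = "TTCCAGGTAAGTCCGTCCATGA"
ranked = candidate_donors(toy_sequence)
```

In this toy input the GT at position 6 sits in a window that matches the consensus at all nine positions and therefore ranks first. A weighted table would refine the ranking but follows the same scan-and-score pattern.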



Figure 2. Splicing of intron 2b in Su(var)3-9/eIF-2{gamma}. For nomenclature see Fig 1. Specific exons of Su(var)3-9 are green; exons of eIF-2{gamma} are yellow. The out-of-frame ORF of the pink boxed eIF-2{gamma} exon 3 sequence in Scoliopterix is in parentheses. Note that exon 3 in Scoliopterix is identical in both cases, but used in different reading frames. Also in Clytus two splice variants of Su(var)3-9 exist, one with and one without exons 3–7.



Figure 3. RT-PCR amplification of Scoliopterix Su(var)3-9 and eIF-2{gamma} transcripts. Oligo(dT)-primed cDNA from adults was amplified by using the Su(var)3-9-specific primer pair Sco4-4 and Sco8-2 or the eIF-2{gamma}-specific primer pair Sco5-2 and Sco8-2, respectively. The amplified regions span the intron 2b of Su(var)3-9 or introns 1 and 2 of eIF-2{gamma}, respectively. Fragments of 451 and 264 bp correspond to the major and minor splice variants of Su(var)3-9. The 351-bp fragment represents the transcript of eIF-2{gamma}. The RT-PCR reaction separated on a 1% agarose gel was hybridized after blotting with the indicated eIF-2{gamma}-exon3-specific 300-bp fragment. The three additional fragments in the Su(var)3-9 lane might indicate the presence of three other transcript variants. The additional signal in the eIF-2{gamma} lane is likely due to a partially spliced transcript containing intron 1.
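The two Su(var)3-9 fragment sizes in Figure 3 allow a quick arithmetic check of the reading frame. The sketch below assumes that both products are bounded by the same primers (Sco4-4 and Sco8-2) and differ only in the position of the 5' splice site, so that the length difference equals the distance between the two donor sites; the function name is illustrative.

```python
def frame_offset(length_a, length_b):
    """Length difference of two amplification products, modulo the codon size of 3."""
    return abs(length_a - length_b) % 3


major_bp, minor_bp = 451, 264          # fragment sizes given in the Figure 3 legend
difference_bp = major_bp - minor_bp    # 187 bp
offset = frame_offset(major_bp, minor_bp)
```

Under that assumption the difference is 187 bp, which is not a multiple of 3 (remainder 1). The two splice variants would therefore join exon 3 in different reading frames, in line with the note in the Figure 2 legend that exon 3 of Scoliopterix is used in different frames.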

The dicistronic organization of Su(var)3-9/eIF-2{gamma} was found in all the holometabolic insect species studied (Fig 1). The mRNAs of Su(var)3-9 and eIF-2{gamma} in holometabolic insects are produced by alternative splicing, resulting in an N-terminal fusion of the SU(VAR)3-9 protein with the first 80 amino acids of eIF-2{gamma}. A further fusion of SU(VAR)3-9 at the C terminus with the eIF-2{gamma} protein appears to be caused by the usage of a 5' splice site at the 3' end of the SU(VAR)3-9 ORF in the coleopteran and lepidopteran species (Fig 1 and Fig 2).
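How a donor site upstream of the stop codon yields a continuous ORF can be illustrated with a toy example. All sequences below are invented placeholders, not sequence from the studied species, and the helper names are hypothetical; the point is only that joining the upstream exon at a donor position that lies in frame removes the exon-internal stop codon and extends the ORF into the downstream exon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds):
    """Index of the first in-frame stop codon (frame 0), or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def splice(upstream_exon, donor_pos, downstream_exon):
    """Join the upstream exon, cut at the donor position, to the downstream exon."""
    return upstream_exon[:donor_pos] + downstream_exon


# Toy stand-ins: an "exon 2a" ORF ending in TGA, with a GT donor at index 12,
# and an "exon 3" that is read in frame 0 and ends with its own stop codon.
toy_exon_2a = "ATGGCTGAAAAGGTAAGTTGA"
toy_exon_3 = "GGTCTGCAATAA"

unspliced_stop = first_stop(toy_exon_2a)           # stop of the upstream ORF
fused = splice(toy_exon_2a, 12, toy_exon_3)        # donor used: stop is spliced out
fused_stop = first_stop(fused)                     # stop now lies in the downstream exon
```

Here the unspliced toy ORF terminates at index 18, whereas the spliced product runs through to the stop codon of the downstream exon at index 21, its last codon. Shifting the donor by one nucleotide would change the frame in which the downstream exon is read.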

Phylogenetic analysis of SU(VAR)3-9 orthologs:
Because of high sequence conservation between all known eIF-2{gamma} genes, the first AUG used as a start codon can unambiguously be deduced. There is evidence for the use of the same start site of translation in SU(VAR)3-9. The first AUG in Su(var)3-9 mRNAs is in the same sequence context as in the eIF-2{gamma} transcripts. In both Leptinotarsa and Scoliopterix, in-frame stop codons are found 5' to the first AUG codon. No other methionine codon N-terminal to the chromobox is conserved between all the studied insect species. Furthermore, in Clytus and Leptinotarsa no other AUG codon is found between the putative start AUG and the chromobox.

The sequence data available allow detailed analysis of conserved and more variable regions within SU(VAR)3-9 and eIF-2{gamma} (Fig 4). No significant sequence conservation between different insect orders is found within regions II, IV, and VII, which are located between the conserved regions of SU(VAR)3-9. The highest degree of sequence identity between all the known SU(VAR)3-9 orthologs is found in the SET domain (35%), followed by the SET-domain-associated cysteine-rich (SAC; 16%) and chromo domains (12%). Phylogenetic trees were computed using the complete SU(VAR)3-9 sequence or individual regions within the gene. If the SET and SAC domains are computed together (Fig 5), all known SU(VAR)3-9 orthologs are branched together with 93% quartet puzzle and 67% bootstrap support, excluding other similar SET domain proteins such as Dm E(Z) (JONES and GELBART 1993), Hs G9a (MILNER and CAMPBELL 1993), and Ce R05D3.11 (accession no. P34544). Our results suggest a common origin of all SU(VAR)3-9 orthologs, independent of the diagnostic feature, which is the occurrence of both the chromo and SET domains within all SU(VAR)3-9 proteins.
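A per-domain identity value of this kind can be computed from a multiple alignment as the share of columns in which every sequence carries the same residue. The sketch below is one plausible way to do this, not the procedure used in the study: the treatment of gaps (columns containing a gap are skipped) is an assumption, and the three aligned strings are invented toy data.

```python
def percent_identity(alignment):
    """Percentage of gap-free alignment columns in which all residues are identical.

    ASSUMPTION: columns containing a gap character are excluded from the count.
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scored = identical = 0
    for column in zip(*alignment):
        if "-" in column:
            continue
        scored += 1
        if len(set(column)) == 1:
            identical += 1
    return 100.0 * identical / scored if scored else 0.0


# Hypothetical toy alignment of three sequences (7 columns, one with a gap).
toy_alignment = ["MKVLA-G", "MRVLASG", "MKVIASG"]
identity = percent_identity(toy_alignment)
```

In the toy alignment four of the six gap-free columns are fully conserved, giving about 66.7%. Applied separately to the column ranges of each domain, the same function would give one value per domain.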



Figure 4. Alignment of SU(VAR)3-9 orthologs. Above each amino acid sequence the regional subdivision and the degree of conservation are shown. Sequences of conserved regions are colored: The common region with eIF-2{gamma} is yellow boxed, the chromo domain is red, the SAC domain is green, and the SET domain is blue. A block of a weakly conserved sequence [consensus E(RL)(LV)(SQ)(EF)] immediately C-terminal to the common region with eIF-2{gamma} is gray boxed. The SAC domain (HUANG et al. 1998) contains parts before and after the SET domain. Sequences that are black boxed at the C terminus are deleted as a result of the in-frame fusion of SU(VAR)3-9 and eIF-2{gamma} ORFs in the indicated species. Abbreviations (in addition to Fig 1): Sp Clr4p, S. pombe Clr4p (IVANOVA et al. 1998); Dm Suv39, D. melanogaster Su(var)3-9 (TSCHIERSCH et al. 1994); Hs SUV39H1, Homo sapiens SUV39H1 (AAGAARD et al. 1999).



Figure 5. SAC and SET domain-based tree of SU(VAR)3-9-related protein sequences. Numbers refer to support above (quartet puzzling) and below (bootstrap) internal branches, and branch length reflects maximum-likelihood distances. Abbreviations (in addition to Fig 1 and Fig 4): Dm E(Z), D. melanogaster E(Z) (JONES and GELBART 1993); Hs G9a, H. sapiens G9a (MILNER and CAMPBELL 1993); Ce R05D3.11, C. elegans ORF (accession no. P34544).

Characterization of the eIF-2{gamma} gene structure in arthropods:
The eIF-2{gamma} gene was isolated from Saccharomyces cerevisiae, S. pombe, and humans (HANNIG et al. 1993; GASPAR et al. 1994; DORRIS et al. 1995). Further eIF-2{gamma} homologous sequences could be identified by BLAST searches (ALTSCHUL et al. 1997) in Arabidopsis thaliana [two different genes, accession nos. AC002411 and AL021713, verified by overlapping expressed sequence tags (EST)] and C. elegans (genomic clone Y74C10, verified by overlapping EST). We isolated eIF-2{gamma} homologous sequences from five holometabolic insects (D. melanogaster, D. erecta, S. libatrix, C. arietis, and L. decemlineata) as well as from the centipede L. forficatus (Fig 6). In all analyzed insect species eIF-2{gamma} is associated with Su(var)3-9 within a dicistronic transcription unit. In contrast, in all noninsect species studied these genes are independent.



Figure 6. eIF-2{gamma} alignment. Above the amino acid sequence the three parts of the GTP binding domain (NARANDA et al. 1995) are boxed in gray. The N-terminal prolongation in budding yeast is boxed. A deletion of this region is without consequences (ERICKSON et al. 1997). In all sequences, identified locations of loss-of-function point mutations are shaded gray (DORRIS et al. 1995; NARANDA et al. 1995; ERICKSON and HANNIG 1996; ERICKSON et al. 1997). Known intron positions are boxed in black. Under the sequence of the N-terminal amino acid block a secondary structure prediction (ROST 1996) for the eIF-2{gamma} proteins is shown. "Disturbed" indicates the prediction of secondary structure for the N-terminal eIF-2{gamma} region fused to SU(VAR)3-9. {alpha}-helices and ß-sheets are minimal consensus predictions of structures for all shown insect proteins. Abbreviations (in addition to Fig 1 and Fig 4): Sc GCD11p, S. cerevisiae eIF-2{gamma} (HANNIG et al. 1993); Ce, C. elegans.

A maximum-likelihood tree that is based on eIF-2{gamma} amino acid sequences is shown in Fig 7 and is consistent with a recent consensus species phylogeny (MADDISON 1997). However, KEELING et al. (1998) placed C. elegans on different nodes in an eIF-2{gamma} gene tree. This may be due to a partially incorrect protein sequence contained in the WORMPEP (rel. 16) database (Y39G10A 246.C). We corrected this sequence with the help of overlapping ESTs and improved the alignment. The resulting perfect agreement of the eIF-2{gamma} gene tree with the species tree argues for a true orthologous relationship of all compared eIF-2{gamma} sequences, irrespective of their quite distinct gene structure in insects, and underlines the usefulness of the eIF-2{gamma} sequence and gene structure as an excellent molecular marker for phylogenetics.



Figure 7. Maximum-likelihood tree of eIF-2{gamma} protein sequences. Numbers refer to support above (quartet puzzling) and below (bootstrap) internal branches, and branch length reflects maximum-likelihood distances.


*  DISCUSSION

In only a few cases have polycistronic transcription units in higher eukaryotes been found where unrelated proteins are specified by a single pre-mRNA after alternative splicing (BLUMENTHAL 1998). The unc-17/cha-1 dicistronic cluster, which was first discovered in C. elegans (ALFONSO et al. 1994), is also found in insects and mammals (BERRARD et al. 1995; KITAMOTO et al. 1998), indicating an early origin during evolution as well as a functional relevance of this structure. In contrast, our analysis of the Su(var)3-9/eIF-2{gamma} dicistronic transcription unit showed that it is only found in insects. Therefore, we conclude that the fusion of these two functionally completely unrelated genes into a dicistronic unit arose before radiation of holometabolic insects after the crown arthropod lines (chelicerates, crustaceans, myriapods, and hexapods) had been separated. Suggesting a monophyletic origin of this gene arrangement, we have to place the event between 280 and 545 million years ago (FORTEY and THOMAS 1998).

In contrast to already known cases of polycistronic clusters (ALFONSO et al. 1994; ZHANG et al. 1999) where the shared exons are noncoding, the fused Su(var)3-9 and eIF-2{gamma} genes are unique in sharing two N-terminal coding exons. The insertion of Su(var)3-9 into an intron of eIF-2{gamma} could have been caused either by retroinsertion of an intron-free Su(var)3-9 gene or by a translocation-like process. The ancient Su(var)3-9 gene appears to have been lost: in none of the studied species was any indication of other Su(var)3-9 or eIF-2{gamma} homologous sequences detected, so it is unlikely that gene duplicates exist.

In the dicistronic transcription unit SU(VAR)3-9 becomes fused with the 80 N-terminal amino acids of eIF-2{gamma}. Our data suggest that the same translation start site is used for both proteins, also indicating that the original Su(var)3-9 gene was not completely inserted into eIF-2{gamma}. Probably, the ancient promoter and at least one exon remained at the original site. In all studied species such as yeast, D. melanogaster, and mammals, the SU(VAR)3-9 protein is associated with heterochromatin and is connected with gene silencing (TSCHIERSCH et al. 1994; IVANOVA et al. 1998; AAGAARD et al. 1999). This functional conservation indicates that the N-terminal addition of eIF-2{gamma} amino acid sequences to SU(VAR)3-9 in insects does not interfere with its function. This is also supported by studies with transgenic Drosophila lines expressing N-terminally truncated SU(VAR)3-9 (AAGAARD et al. 1999). Overexpression of both the Drosophila full-length as well as the N-terminally truncated protein results in enhancement of gene silencing in position-effect variegation.

In Coleopterans and Lepidopterans an in-frame fusion of the SU(VAR)3-9 and eIF-2{gamma} encoding sequences raises the question about possible implications for function of the deduced fusion protein. The N-terminal ~200 amino acids of eIF-2{gamma} represent the conserved G domain, involved in both GTP and Met-tRNA binding (DORRIS et al. 1995; ERICKSON and HANNIG 1996; ERICKSON et al. 1997). In the putative fusion protein, amino acids of the G domain essential for GTP and Met-tRNA binding become separated by the inserted SU(VAR)3-9 sequence. Structure predictions (PredictProtein server; http://www2.ebi.ac.uk/~rost/predictprotein/; ROST 1996) indicate a complete disruption of the conserved secondary structure of the whole N-terminal region of eIF-2{gamma} within the putative SU(VAR)3-9/eIF-2{gamma} fusion protein (Fig 6). Furthermore, in the SU(VAR)3-9-specific sequence immediately following the N-terminal 80–82 amino acids of eIF-2{gamma} in all the species studied, a helical structure is predicted for the first 5–10 amino acids. In this region a weak sequence conservation with a consensus of E(R/K)(L/V)(S/Q)(E/F) is found (gray sequence block in Fig 4), which could be efficient in total disruption of protein secondary structure within the N-terminal part of the eIF-2{gamma} G domain fused to SU(VAR)3-9.

Within all Su(var)3-9 genes the position of the stop codon is conserved, indicating that this position predates the evolution of the dicistronic gene structure. In the coleopteran and lepidopteran species studied, where a transcript with a complete in-frame fusion of the SU(VAR)3-9 and eIF-2{gamma} ORFs is found, a nonconserved 5' splice donor site located at the 3' end of the Su(var)3-9 ORF is used. In the putative SU(VAR)3-9/eIF-2{gamma} chimeric protein of coleopteran and lepidopteran species ~470 most highly conserved amino acid positions of eIF-2{gamma} are added C-terminally to the putative SU(VAR)3-9 proteins. Whether such a fusion protein is stable or post-translationally processed is not yet known. However, the fusions of SU(VAR)3-9 with eIF-2{gamma} sequences might also have promoted modifications of SU(VAR)3-9 function.

The dicistronic Su(var)3-9/eIF-2{gamma} gene structure in holometabolic insects represents an exceptionally useful tool for phylogenetic analysis within arthropods. There are considerable differences in phylogenetic concepts within this phylum (FORTEY and THOMAS 1998). The fusion of the two conserved Su(var)3-9 and eIF-2{gamma} genes into a dicistronic transcription unit is unlikely to occur several times independently. This gene arrangement therefore should represent a synapomorphy (shared derived character) for a monophyletic group of arthropods. Both the genomic organization as well as sequence conservation of Su(var)3-9 and eIF-2{gamma} allow a comprehensive molecular analysis of arthropod systematics and evolution.


*  ACKNOWLEDGMENTS

We are grateful to Andreas Fischer for supplying specimens. We thank Dr. Michael Ashburner for critical reading of the manuscript and helpful comments. This work was supported by grants from the Deutsche Forschungsgemeinschaft and the Fonds der Chemischen Industrie (to G.R.).

Manuscript received January 25, 2000; Accepted for publication July 13, 2000.


*  LITERATURE CITED

AAGAARD, L., G. LAIBLE, P. SELENKO, M. SCHMID, and R. DORN et al., 1999  Functional mammalian homologues of the Drosophila PEV-modifier Su(var)3-9 encode centromere-associated proteins that complex with the heterochromatin component M31. EMBO J. 18:1923-1938.

ALFONSO, A., K. GRUNDAHL, J. R. MCMANUS, J. M. ASBURY, and J. B. RAND, 1994  Alternative splicing leads to two cholinergic proteins in Caenorhabditis elegans. J. Mol. Biol. 241:627-630.

ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. ZHANG, and Z. ZHANG et al., 1997  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

BERRARD, S., H. VAROQUI, R. CERVINI, M. ISRAEL, and J. MALLET et al., 1995  Coregulation of two embedded gene products, choline acetyltransferase and the vesicular acetylcholine transporter. J. Neurochem. 65:939-942.

BLUMENTHAL, T., 1998  Gene clusters and polycistronic transcription in eukaryotes. Bioessays 20:480-487.

BROGNA, S. and M. ASHBURNER, 1997  The Adh-related gene of Drosophila melanogaster is expressed as a functional dicistronic messenger RNA: multigenic transcription in higher organisms. EMBO J. 16:2023-2031.

CLEARD, F., M. DELATTRE, and P. SPIERER, 1997  SU(VAR)3-7, a Drosophila heterochromatin-associated protein and companion of HP1 in the genomic silencing of position-effect variegation. EMBO J. 16:5280-5288.

DORRIS, D. R., F. L. ERICKSON, and E. M. HANNIG, 1995  Mutations in GCD11, the structural gene for eIF-2{gamma} in yeast, alter translational regulation of GCN4 and the selection of the start site for protein synthesis. EMBO J. 14:2239-2249.

EHRMANN, I. E., P. S. ELLIS, S. MAZEYRAT, S. DUTHIE, and N. BROCKDORFF et al., 1998  Characterization of genes encoding translation initiation factor eIF-2{gamma} in mouse and human: sex chromosome localization, escape from X-inactivation and evolution. Hum. Mol. Genet. 7:1725-1737.

EISSENBERG, J. C., G. D. MORRIS, G. REUTER, and T. HARTNETT, 1992  The heterochromatin-associated protein HP-1 is an essential protein in Drosophila with dosage-dependent effects on position-effect variegation. Genetics 131:345-352.

ERICKSON, F. L. and E. M. HANNIG, 1996  Ligand interactions with eukaryotic translation initiation factor 2: role of the {gamma}-subunit. EMBO J. 15:6311-6320.

ERICKSON, F. L., L. D. HARDING, D. R. DORRIS, and E. M. HANNIG, 1997  Functional analysis of homologs of translation initiation factor 2{gamma} in yeast. Mol. Gen. Genet. 253:711-719.

FORTEY, R. A., and R. H. THOMAS, 1998 Arthropod Relationships. Chapman & Hall, London.

GASPAR, N. J., T. G. KINZY, B. J. SCHERER, M. HÜMBELIN, and J. W. B. HERSHEY et al., 1994  Translation initiation factor eIF-2: cloning and expression of the human cDNA encoding the {gamma}-subunit. J. Biol. Chem. 269:3415-3422.

GERAGHTY, M. T., L. C. BRODY, L. S. MARTIN, M. MARBLE, and W. KEARNS et al., 1993  The isolation of cDNAs from OATL1 at Xp11.2 using a 480-kb YAC. Genomics 16:440-446.

HANNIG, E. M., A. M. CIGAN, B. A. FREEMAN, and T. G. KINZY, 1993  GCD11, a negative regulator of GCN4 expression, encodes the {gamma} subunit of eIF-2 in Saccharomyces cerevisiae. Mol. Cell. Biol. 13:506-520.

HUANG, N., E. V. BAUR, J.-M. GARNIER, T. LEROUGE, and J.-L. VONESCH et al., 1998  Two distinct nuclear receptor interaction domains in NSD1, a novel SET protein that exhibits characteristics of both corepressors and coactivators. EMBO J. 17:3398-3412.

IVANOVA, A. V., M. J. BONADUCE, S. V. IVANOV, and A. J. S. KLAR, 1998  The chromo and SET domains of the Clr4 protein are essential for silencing in fission yeast. Nat. Genet. 19:192-195.

JAMES, T. C. and S. C. R. ELGIN, 1986  Identification of a nonhistone chromosomal protein associated with heterochromatin in Drosophila melanogaster and its gene. Mol. Cell. Biol. 6:3862-3872.

JONES, R. S. and W. M. GELBART, 1993  The Drosophila polycomb-group gene enhancer of zeste contains a region with sequence similarity to trithorax. Mol. Cell. Biol. 13:6357-6366.

JONES, D. T., W. R. TAYLOR, and J. M. THORNTON, 1992  The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.

KEELING, P. J., N. M. FAST, and G. I. MCFADDEN, 1998  Evolutionary relationship between translation initiation factor eIF-2{gamma} and selenocysteine-specific elongation factor SELB: change of function in translation factors. J. Mol. Evol. 47:649-655.

KITAMOTO, T., W. WANG, and P. M. SALVATERRA, 1998  Structure and organization of the Drosophila cholinergic locus. J. Biol. Chem. 273:2706-2713.

LEE, S. J., 1991  Expression of growth/differentiation factor 1 in the nervous system: conservation of a dicistronic structure. Proc. Natl. Acad. Sci. USA 88:4250-4254.

MADDISON, D. R., 1997 The Tree of Life homepage. http://phylogeny.arizona.edu/tree

MILNER, C. M. and D. R. CAMPBELL, 1993  The G9a gene in the human major histocompatibility complex encodes a novel protein containing ankyrin-like repeats. Biochem. J. 290:811-818.

MOUNT, S. M., 1993 Messenger RNA splicing signals in Drosophila genes, pp. 333–358 in An Atlas of Drosophila Genes, edited by G. MARONI. Oxford University Press, Oxford/New York.

NARANDA, T., I. SIRANGELO, B. J. FABBRI, and J. W. B. HERSHEY, 1995  Mutations in the NKXD consensus element indicate that GTP binds to the {gamma}-subunit of translation initiation factor eIF2. FEBS Lett. 372:249-252.

POWELL, J. R., 1997 Progress and Prospects in Evolutionary Biology: The Drosophila Model. Oxford University Press, Oxford/New York.

REUTER, G. and P. SPIERER, 1992  Position-effect variegation and chromatin proteins. Bioessays 14:605-612.

ROST, B., 1996  PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol. 266:525-539.

SCHOTTA, G. and G. REUTER, 2000  Controlled expression of tagged proteins in Drosophila using a new modular P-element vector system. Mol. Gen. Genet. 262:916-920.

STRIMMER, K., and A. V. HAESELER, 1997 PUZZLE 4.0: Maximum Likelihood Analysis for Nucleotide, Amino Acid, and Two-State Data. http://evolution.genetics.washington.edu/phylip/software.html

SWOFFORD, D. L., 1993 PAUP: Phylogenetic Analysis Using Parsimony. Illinois Natural History Survey, Champaign, IL.

TSCHIERSCH, B., A. HOFMANN, V. KRAUSS, R. DORN, and G. KORGE et al., 1994  The protein encoded by the Drosophila position-effect variegation suppressor gene Su(var)3-9 combines domains of antagonistic regulators of homeotic gene complexes. EMBO J. 13:3822-3831.

WALLRATH, L. L., 1998  Unfolding the mysteries of heterochromatin. Curr. Opin. Genet. Dev. 8:147-153.

WEILER, K. S. and B. T. WAKIMOTO, 1995  Heterochromatin and gene expression in Drosophila. Annu. Rev. Genet. 29:577-605.

ZHANG, Y. Q., J. ROOTE, S. BROGNA, A. W. DAVIS, and D. N. BARBASH et al., 1999  Stress sensitive B encodes an adenine nucleotide translocase in Drosophila melanogaster. Genetics 153:891-903.

ZORIO, D. A. R., N. N. CHENG, T. BLUMENTHAL, and J. SPIETH, 1994  Operons as a common form of chromosomal organization in C. elegans. Nature 372:270-272.



