Genetics, Vol. 164, 433-442, June 2003, Copyright © 2003

Phospholipase C-{gamma} Contains Introns Shared by src Homology 2 Domains in Many Unrelated Proteins

Charlene M. Manning (1, a), Wendy R. Mathews (2, a), Leah P. Fico (a), and Justin R. Thackeray (a)
a Biology Department, Clark University, Worcester, Massachusetts 01610

Corresponding author: Justin R. Thackeray, Clark University, 950 Main St., Worcester, MA 01610. E-mail: jthackeray@clarku.edu

Communicating editor: J. HEY


*  ABSTRACT

Many proteins with novel functions were created by exon shuffling around the time of the metazoan radiation. Phospholipase C-{gamma} (PLC-{gamma}) is typical of proteins that appeared at this time, containing several different modules that probably originated elsewhere. To gain insight into both PLC-{gamma} evolution and structure-function relationships within the Drosophila PLC-{gamma} encoded by small wing (sl), we cloned and sequenced the PLC-{gamma} homologs from Drosophila pseudoobscura and D. virilis and compared their gene structure and predicted amino acid sequences with PLC-{gamma} homologs in other animals. PLC-{gamma} has been well conserved throughout, although structural differences suggest that the role of tyrosine phosphorylation in enzyme activation differs between vertebrates and invertebrates. Comparison of intron positions demonstrates that extensive intron loss has occurred during invertebrate evolution and also reveals the presence of conserved introns in both the N- and C-terminal PLC-{gamma} SH2 domains that are present in SH2 domains in many other genes. These and other conserved SH2 introns suggest that the SH2 domains in PLC-{gamma} are derived from an ancestral domain that was shuffled not only into PLC-{gamma}, but also into many other unrelated genes during animal evolution.


Four distinct types of phospholipase C (PLC) protein, ß, {gamma}, {delta}, and {epsilon}, are produced in mammals by a rather diverse family of more than 10 genes. All PLCs catalyze the hydrolysis of the membrane phospholipid phosphatidylinositol 4,5-bisphosphate [PI(4,5)P2] into inositol 1,4,5-trisphosphate (InsP3) and diacylglycerol. The former is bound by specific receptors embedded in the endoplasmic reticulum membrane, leading to a transient increase in intracellular calcium, while the latter is a direct activator of protein kinase C. The pattern of expression, mechanism of activation, and cellular function vary considerably among the four types (reviewed by REBECCHI and PENTYALA 2000; LOPEZ et al. 2001; SONG et al. 2001). All PLC proteins have three domains in common: a C2 domain and catalytic domains X and Y; three of the PLC types, ß, {gamma}, and {delta}, also share an N-terminal pleckstrin homology (PH) domain and EF hands. PLC-{delta} may be the ancestral form, because the ß, {gamma}, and {epsilon} types are absent from plants and simple eukaryotes such as yeast.

PLC-{gamma} is particularly interesting from an evolutionary standpoint, because it has a central region between the X and Y catalytic domains that is unique among PLC subtypes. This central region contains one src homology 3 (SH3) and two SH2 domains within a split PH domain, implying a series of shufflings and duplications to produce the modern PLC-{gamma} structure from an ancestral PLC form. These events were apparently completed before the parazoan-eumetazoan split, because the sponge Ephydatia fluviatilis has a PLC-{gamma} homolog with identical structure to all other animal forms (KOYANAGI et al. 1998). The SH2 and SH3 domains of PLC-{gamma} are typical examples of the many widely distributed modules that are found in many different proteins involved in signal transduction; each domain is presumed to have arisen once and then been spread both by gene duplication and by being co-opted into existing genes by exon shuffling via retrotransposition, illegitimate recombination, or long interspersed nuclear element-mediated 3' transduction (LONG 2001). In the case of PLC-{gamma} these acquired domains permit specific protein:protein interactions: the SH2 domains are thought to allow PLC-{gamma} to bind to specific phosphorylated tyrosine residues on the cytoplasmic face of activated receptor tyrosine kinases, whereas the SH3 domain is essential in allowing PLC-{gamma} to activate the phosphatidylinositol-3-OH kinase [PI(3)K] enhancer (YE et al. 2002).

PLC-{gamma} is involved in regulating many aspects of cell physiology, including proliferation, differentiation, and motility (reviewed by REBECCHI and PENTYALA 2000). PLC-{gamma} activation is triggered by the binding of a wide variety of growth factors, cytokines, and immunoglobulins to their membrane-bound receptors. Two distinct PLC-{gamma} homologs encoded by different genes, PLC-{gamma}1 and PLC-{gamma}2, are present in mammals, whereas all invertebrates described to date have only a single homolog. In Drosophila, the PLC-{gamma} homolog is encoded by small wing (sl; EMORI et al. 1994; THACKERAY et al. 1998), a gene first identified by Bridges in 1915 (LINDSLEY and ZIMM 1992). Flies lacking sl function are viable, with mild defects in eye and wing development due to overactivation of the epidermal growth factor receptor signaling pathway (THACKERAY et al. 1998). This weak phenotype contrasts markedly with the more significant effects resulting from loss of PLC-{gamma} function in vertebrates: PLC-{gamma}1 knockout mice die in early embryogenesis (JI et al. 1997), while mice lacking PLC-{gamma}2 are viable, but have defects in B cell development (HASHIMOTO et al. 2000; WANG et al. 2000). The amino acid sequence of Sl is generally similar to its mammalian homologs (EMORI et al. 1994), suggesting that its mode of activation and cellular function are conserved. However, subtle differences in its activation may exist; for example, two of the three tyrosines in mammalian PLC-{gamma}1 that become phosphorylated during activation are missing in Sl. In an effort to determine the functional significance of these and other differences between vertebrate and invertebrate PLC-{gamma}, we cloned and sequenced two additional Drosophila PLC-{gamma} homologs. We present here a comparison of both the amino acid sequence and the gene structure of these sl homologs in Drosophila, as well as among a variety of other invertebrate and vertebrate PLC-{gamma} homologs. Our analysis not only sheds light on PLC-{gamma} function and evolution, but also provides evidence for a common origin of SH2 domains in all proteins.


*  MATERIALS AND METHODS

{lambda}-library construction and screening:
{lambda}-libraries of Drosophila pseudoobscura and D. virilis genomic DNA were constructed in {lambda}FIXII (Stratagene, La Jolla, CA) and {lambda}GEM11 (Promega, Madison, WI), respectively. Briefly, genomic DNA was prepared from adults by standard methods, partially digested with DpnII, and separated on a 0.5% agarose gel. Fragments in the 10- to 20-kb range were recovered onto dialysis membrane and the ends partially filled in with dCTP and dTTP. The genomic fragments were then ligated overnight at 4° to XhoI-digested {lambda}DNA that had been partially filled in with dATP and dGTP. The ligated DNA was packaged in vitro using Packagene extracts according to the manufacturer's instructions (Stratagene). Because the D. virilis library did not contain any clones corresponding to the 3' end of the Drosophila PLC-{gamma} homolog (sl), we obtained a D. virilis {lambda}-genomic library from Thomas Kaufman (Indiana University) that was originally constructed by Ronald Blackman (BLACKMAN and MESELSON 1986). Amplified libraries were initially screened with a 1.8-kb BamHI genomic fragment from D. melanogaster sl that includes the SH2 and SH3 domains of PLC-{gamma}. About 100,000 plaques from each species were lifted onto nylon membranes and probed with the radiolabeled BamHI fragment in standard filter hybridizations (SAMBROOK et al. 1989). Posthybridization washes were performed at low stringency (2.5x SSC, 0.1% SDS at 65°). Positively hybridizing plaques were purified by two further rounds of plating and hybridization. Purified plaques were obtained and {lambda}-genomic DNA was extracted by the plate lysate method (SAMBROOK et al. 1989).

DNA sequencing:
Restriction fragments containing sl exons were identified by Southern blot hybridization using a variety of probes corresponding to the sl open reading frame (ORF). For the D. virilis gene, several of these fragments were subcloned into pBluescript KS+ (Stratagene) and sequenced on both strands by primer walking. Most of the D. pseudoobscura gene was sequenced by transposon tagging of a 4.8-kb PstI/HindIII subclone with the EZ::TN<KAN-2> kit, using the conditions recommended by the manufacturer (Epicentre Technologies, Madison, WI). The 5' end of the D. pseudoobscura gene sequence was determined by primer walking within an overlapping 2.1-kb PstI/SacI fragment. For each species, the subcloned fragments included the entire sl homolog. Sequencing reactions on purified plasmid DNA were performed using the ABI Prism Big Dye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA). Sequencing reactions were purified by ethanol precipitation with pellet paint (Novagen, Madison, WI) and separated on a Perkin-Elmer (Norwalk, CT) ABI Prism 377 DNA sequencer. Sequences were edited with Sequencher 4.1 (Gene Codes, Ann Arbor, MI) and further analyzed with MacVector 6.01 (Accelrys, Madison, WI).

Phylogenetic analysis:
Amino acid sequences were aligned using either ClustalW 1.4 (THOMPSON et al. 1994) or ClustalX 1.6 (THOMPSON et al. 1997) and in some cases improved by eye. Unrooted trees were generated from the aligned sequences by PAUP (version 4.0b10; SWOFFORD 2002) using bootstrap with heuristic search and 1000 replicates; gaps were treated as missing; parsimony was used as the optimality criterion; starting trees were obtained by stepwise addition; and branch swapping was by tree-bisection-reconnection.
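The resampling step that underlies a bootstrap analysis of this kind can be illustrated with a minimal sketch. It is not the PAUP implementation: it only shows how pseudoreplicate alignments are drawn by sampling columns with replacement, and the toy sequences, the function name bootstrap_alignment, and the random seed are all hypothetical. In a full analysis a parsimony search would be run on every pseudoreplicate and the resulting trees summarized as support values.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: alignment columns resampled with replacement."""
    n_cols = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment (not PLC-gamma data); '-' marks a gap.
toy_alignment = {
    "seqA": "MKV-LT",
    "seqB": "MRVALT",
    "seqC": "MKIALS",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "pseudoreplicates, each with",
      len(replicates[0]["seqA"]), "columns")
```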

Tyrosine phosphorylation prediction:
The probability of a given tyrosine being phosphorylated was determined using the NetPhos algorithm (http://www.cbs.dtu.dk/services/NetPhos/), which uses a neural network method to compare tyrosines and their local context to tyrosines known to be phosphorylated (BLOM et al. 1999). A score >0.5 predicts that a tyrosine is in a context that will be phosphorylated in vivo.


*  RESULTS

Isolation of sl homologs from D. virilis and D. pseudoobscura:
We constructed {lambda}-libraries of D. pseudoobscura and D. virilis genomic DNA and screened them with various genomic probes from the D. melanogaster PLC-{gamma} homolog encoded by sl. Several clones were isolated and mapped by Southern blotting and various restriction fragments were subcloned and sequenced from each species. We compared the two putative sl homologs with the D. melanogaster sl sequence by dot-matrix analysis and identified sequences corresponding to all four sl exons. Each presumed exon shows a high level of DNA sequence identity to D. melanogaster sl (data not shown) and the splice junctions all conform to the well-established consensus sequences previously described in Drosophila (MOUNT et al. 1992). The intron/exon structure has remained unchanged, with minor differences in the length of introns 1 and 2 among the three species. The conserved exon/intron structure and very similar DNA sequences strongly suggest that the genes we describe here are the sl homologs in D. pseudoobscura and D. virilis.

Phylogenetic analysis of PLC-{gamma} proteins:
The PLC-{gamma} homolog encoded by sl is the only one present in the now almost-complete D. melanogaster genome sequence, and only a single gene has been identified in the other invertebrate genomes sequenced to date. By contrast, a gene duplication event produced separate {gamma}1 and {gamma}2 subtypes at some point in the vertebrate lineage; each subtype has identical domain structure and similar sequence, but distinct functions. To reveal the relationships between PLC-{gamma} homologs, we produced a translation of the putative ORFs from both D. pseudoobscura and D. virilis and compared them to all other complete PLC-{gamma} homologs in the GenBank database, including D. melanogaster, the mosquito Anopheles gambiae, the sponge E. fluviatilis, the nematode Caenorhabditis elegans, the cow Bos taurus ({gamma}1), the rat Rattus rattus ({gamma}1 and {gamma}2), and Homo sapiens ({gamma}1 and {gamma}2). The best tree from this comparison is shown in Fig 1A. The tree divides the taxa unambiguously into a mammalian clade and a dipteran clade and gives moderate support to placing the nematode C. elegans closer to the arthropods than to the mammals. The branching arrangement among species of Drosophila is consistent with previous molecular phylogenies of these species (e.g., RUSSO et al. 1995). The phylogram also suggests that the PLC-{gamma}1/{gamma}2 duplication event occurred long before the mammalian radiation, because there is an ~10-fold greater number of changes between the PLC-{gamma}1/{gamma}2 duplication and the mammalian radiation (383 and 401 changes in the {gamma}1 and {gamma}2 branches, respectively) compared to between 21 and 62 changes in each lineage since this radiation. Although we cannot assume that rates of evolution among paralogs in the vertebrate lineage have been constant, at the very least this disparity suggests that the PLC-{gamma}1/{gamma}2 duplication did not immediately precede the mammalian radiation.
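The approximately 10-fold figure can be checked with simple arithmetic on the branch lengths quoted above. The sketch below is only an illustration: the change counts are those given in the text, but using the midpoint of the 21 to 62 range as a single representative post-radiation value is an assumption made here, not a calculation described by the authors.

```python
# Changes on each branch between the duplication and the mammalian radiation
# (values quoted in the text).
pre_radiation_changes = {"gamma1": 383, "gamma2": 401}

# Range of changes in each lineage since the mammalian radiation (from the text).
post_radiation_range = (21, 62)

def fold_range(pre, post_low, post_high):
    """Smallest and largest fold difference implied by the post-radiation range."""
    return pre / post_high, pre / post_low

# Assumption for illustration: take the midpoint of the range as representative.
midpoint = sum(post_radiation_range) / 2
fold_at_midpoint = {name: changes / midpoint
                    for name, changes in pre_radiation_changes.items()}

for name, changes in pre_radiation_changes.items():
    low, high = fold_range(changes, *post_radiation_range)
    print(f"{name}: {low:.1f}- to {high:.1f}-fold "
          f"(about {fold_at_midpoint[name]:.1f}-fold at the midpoint)")
```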




Figure 1. (A) Phylogram of PLC-{gamma} homologs. The number of changes between each sequence is represented by the length of each branch, and the single best tree is shown. Numbers at each node indicate bootstrap values. (B) Alignment of vertebrate and invertebrate PLC-{gamma} proteins in the region where tyrosine phosphorylation occurs in mammalian PLC-{gamma}. The amino acid sequence between the C-terminal SH2 domain and the SH3 domain was aligned using ClustalW 1.4 and improved by eye. Dark-gray shading indicates a position where at least half the proteins have the same amino acid; light-gray shading indicates a position where at least half the proteins have an amino acid with similar biochemical properties. The tyrosines known to be phosphorylated in human PLC-{gamma}1 and PLC-{gamma}2 are indicated by a circled P above the alignment.

Sequence conservation among PLC-{gamma} proteins:
The 11 PLC-{gamma} homologs shown in Fig 1A are well conserved throughout the sequence, with the highest level in the X and Y catalytic domains (data not shown; full alignment is available at http://prism.clarku.edu/departments/biol/faculty/Thackeray/PLCgamma.pdf). This is particularly noticeable in the N-terminal half of region X, where there are two blocks of 9 and 10 amino acids, each with 100% identity among all 11 species, the two largest contiguous blocks anywhere in the alignment. The two PH domains show a lower level of conservation than the catalytic regions, especially in C. elegans, in which most of the sequence fails to align clearly with the other species. However, despite the low level of conservation, two PH domains are recognized in the C. elegans sequence by an InterPro motif scan (data not shown). The SH3, C2, and both of the SH2 domains all show consistently high levels of sequence identity, confirming their critical importance to PLC-{gamma} function. The region with the weakest level of conservation lies between the N-terminal PH domain and region X, where four EF hand motifs were previously identified in PLC-{gamma}.

Tyrosine phosphorylation sites in PLC-{gamma}:
PLC-{gamma}1 becomes phosphorylated on three tyrosines, Y771, Y783, and Y1254, during its activation in growth factor signaling pathways (KIM et al. 1990; WAHL et al. 1990), although Y783 appears to be the only one required for activation in vivo (KIM et al. 1991) and no tyrosine corresponding to Y1254 exists in PLC-{gamma}2. Fig 1B shows an alignment of a 50-amino-acid region from 12 PLC-{gamma} homologs, centered on the region homologous to Y771/Y783 of human PLC-{gamma}1. This alignment includes additional sequences homologous to PLC-{gamma} from the toad Xenopus laevis and the purple sea urchin Paracentrotus lividus, which were excluded from the full alignment because they are incomplete; however, because both proteins contain the X and Y catalytic domains and all three SH domains, they are very likely to be true PLC-{gamma} homologs. Although the alignment is excellent among the five vertebrate sequences, as would be expected given the demonstrated importance of tyrosine phosphorylation in regulating vertebrate PLC-{gamma} activity, the alignment is surprisingly weak in the seven invertebrate proteins. With the exception of the sponge, E. fluviatilis, all of the invertebrates lack a tyrosine at the position homologous to Y771. Although the sponge sequence aligns poorly with the other proteins in this region, it has four tyrosines arranged as two paired residues in this region, two of which align near Y771 and are each in a context that suggests that they are likely to be phosphorylated in vivo (NetPhos scores are 0.595 and 0.787 for the N- and C-terminal residues, respectively). By contrast, the presence of a tyrosine in a conserved context at a position homologous to mammalian PLC-{gamma}1 Y783 in all the invertebrates strongly suggests that this tyrosine acts as a regulatory site in all PLC-{gamma} homologs. However, while each tyrosine in the A. gambiae, P. lividus, E. fluviatilis, and C. elegans sequences at this position has a relatively high NetPhos score that suggests that phosphorylation is likely (0.749, 0.909, 0.491, 0.420, respectively), each of the Drosophila tyrosines at this position has a much lower NetPhos score (0.154).
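As an illustration of how the >0.5 NetPhos cutoff stated in MATERIALS AND METHODS applies to the scores reported for the tyrosine homologous to Y783, the following minimal sketch tabulates each score against that cutoff. The scores are those quoted above; the dictionary layout and variable names are hypothetical, and the sketch does not run NetPhos itself.

```python
# NetPhos scores for the tyrosine homologous to Y783, as quoted in the text.
y783_homolog_scores = {
    "A. gambiae": 0.749,
    "P. lividus": 0.909,
    "E. fluviatilis": 0.491,
    "C. elegans": 0.420,
    "Drosophila (each species)": 0.154,
}

# Cutoff stated in MATERIALS AND METHODS: a score >0.5 predicts phosphorylation.
THRESHOLD = 0.5

predicted = {species: score > THRESHOLD
             for species, score in y783_homolog_scores.items()}

for species, score in y783_homolog_scores.items():
    verdict = "above" if predicted[species] else "not above"
    print(f"{species}: {score:.3f} ({verdict} the {THRESHOLD} cutoff)")
```

Read this way, the E. fluviatilis and C. elegans scores fall just below the strict cutoff while remaining well above the Drosophila value.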

Evolution of gene structure in PLC-{gamma} homologs:
When comparing the Drosophila PLC-{gamma} homologs, we noted that the second and third introns almost precisely bracket the central region containing the SH2-SH2-SH3 domain, which is unique among PLC subtypes. Although these two introns are not symmetric as would be expected if these domains were originally inserted as a unit (LONG et al. 1995), there is evidence that "sliding" of introns occurs occasionally (ROGOZIN et al. 2000); this pair of introns might therefore be vestiges of the shuffling events that presumably created the {gamma}-form from an ancestral PLC gene lacking SH domains. To reveal whether these introns are conserved in other PLC-{gamma} genes, we examined five homologs for which the complete gene structure is available: A. gambiae, C. elegans, Mus musculus ({gamma}2), and H. sapiens ({gamma}1 and {gamma}2). The PLC-{gamma} gene has changed in size quite considerably during evolution; the three invertebrate genes are relatively small at 4.2, 5.7, and 6.2 kb in D. melanogaster, A. gambiae, and C. elegans, respectively, whereas the mammalian genes range from 30.5 kb for mouse PLC-{gamma}2 to 36.9 and 172 kb for human PLC-{gamma}1 and PLC-{gamma}2. The number of introns correlates well with gene size: only 3 introns are in the small Drosophila genes, 8 in the slightly larger Anopheles gene, 11 in C. elegans, and 31 in all three mammalian genes.
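The relationship between gene size and intron number can be quantified from the figures given above. The sketch below computes a Spearman rank correlation; the gene sizes and intron counts are those quoted in the text, whereas the choice of a rank correlation (and the helper names average_ranks and pearson) is an assumption made for illustration, since the text does not say how the correlation was assessed.

```python
from math import sqrt

# (gene size in kb, number of introns), as quoted in the text.
genes = {
    "D. melanogaster": (4.2, 3),
    "A. gambiae": (5.7, 8),
    "C. elegans": (6.2, 11),
    "mouse gamma2": (30.5, 31),
    "human gamma1": (36.9, 31),
    "human gamma2": (172.0, 31),
}

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

sizes = [size for size, _ in genes.values()]
introns = [count for _, count in genes.values()]

# Spearman's rho is the Pearson correlation of the ranks.
rho = pearson(average_ranks(sizes), average_ranks(introns))
print(f"Spearman rank correlation: {rho:.2f}")
```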

The position of every intron in all eight PLC-{gamma} genes for which the gene structure has been determined is shown in Fig 2A. Because the level of conservation across much of the PLC-{gamma} protein is high, in most cases it is possible to determine whether two introns at homologous positions are identical by descent or simply happen to have been inserted at similar locations since the species diverged. This can be determined by comparing both the amino acid alignment at the splice site and the intron phase. We found that all 31 introns in each of the three mammalian genes are at conserved locations; similarly, all three Drosophila introns are at invariant positions within that genus (data not shown). We also found that 7 of the 31 mammalian introns are also present in at least one of the other species; these are indicated as introns a–g in Fig 2A, and their exact position is shown in alignments in Fig 2B. Four additional sites (marked with shaded triangles in Fig 2A) probably represent shared intron positions, because they are in the same phase at equivalent regions of the protein; however, the alignment in these regions is too weak to be certain whether they are identical by descent. One member of this uncertain group is the intron at the C-terminal end of the SH3 domain that we first identified in D. melanogaster. Four of the 8 introns in A. gambiae (marked as a, c, f, and g in Fig 2, A and B) and 4 of the 11 in C. elegans (b, d, e, and g in Fig 2, A and B) are shared with an intron in the mammalian PLC-{gamma} genes; one of this group (g) is shared between A. gambiae, C. elegans, and the mammalian genes. Although the phase 0 intron at the C-terminal end of the SH2-SH2-SH3 region may be conserved among PLC-{gamma} genes, the one near the N-terminal end of the first SH2 domain seems to be unique to the Drosophila lineage.
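The two criteria used above, position in the amino acid alignment and intron phase, can be expressed as a minimal sketch. The Intron record, the function names, and the example coordinates are hypothetical and serve only to make the comparison concrete; phase is taken in the usual sense of the number of bases of a codon that lie upstream of the intron (0, 1, or 2). As the text notes, a match on both criteria is still only as reliable as the local alignment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Intron:
    gene: str
    aln_column: int  # column of the protein alignment where the intron falls
    phase: int       # 0 = between codons, 1 or 2 = after that base of a codon

def intron_phase(coding_nt_upstream):
    """Phase from the number of coding nucleotides upstream of the intron."""
    return coding_nt_upstream % 3

def candidate_shared(a, b):
    """Two introns are candidates for identity by descent only if they fall at
    the same alignment column and interrupt the reading frame in the same phase."""
    return a.aln_column == b.aln_column and a.phase == b.phase

# Hypothetical coordinates, for illustration only.
intron_x = Intron("gene_1", aln_column=412, phase=intron_phase(1234))
intron_y = Intron("gene_2", aln_column=412, phase=intron_phase(1051))
intron_z = Intron("gene_3", aln_column=412, phase=intron_phase(1050))

print(candidate_shared(intron_x, intron_y))  # same column, same phase
print(candidate_shared(intron_x, intron_z))  # same column, different phase
```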





Figure 2. (A) Intron positions in eight PLC-{gamma} genes. The location of every intron in each species is shown by a triangle above a schematic representation of the PLC-{gamma} protein; the phase of splicing for each intron is indicated by the number above it. Solid triangles indicate a conserved intron in more than one species; shaded triangles indicate introns that may be conserved, but poor local alignment prevents them from being definitively identified as conserved. Each conserved intron is given a letter beneath the protein. Because the length of the protein differs in each species, particularly at the C-terminal end, the length of each domain is approximate. "D. mel/pse/vir" indicates introns in D. melanogaster, D. pseudoobscura, and D. virilis; all three species have introns of the same phase at identical locations. Similarly, all 31 introns in the human PLC-{gamma}1 and PLC-{gamma}2 and in the mouse PLC-{gamma}2 break the gene at the homologous codons in the same phase. The central conserved introns in the N- and C-terminal SH2 domains of the vertebrate genes are indicated by solid circles. (B) Protein alignments around the conserved introns. For each of the conserved introns shown in A, the ClustalW 1.4 alignment of amino acid sequence around it is shown, together with the position of an intron indicated by a circle. (C) Alignment of SH2 domain sequences around a conserved intron from human PLC-{gamma}1 and PLC-{gamma}2 proteins. The alignment was generated as described above; intron positions are indicated following the scheme used in B.

Conserved introns in the SH2 domains of PLC-{gamma} and many diverse proteins:
When we compared PLC-{gamma} intron positions between species, we noted that an intron within the mammalian N-terminal SH2 domains breaks at the homologous residue and in the same phase as the conserved intron f in the C-terminal SH2 domain (each of these two sites is identified by a solid circle in Fig 2A). Because this intron breaks the seventh amino acid of the conserved C ß-strand of SH2 domains (nomenclature of KURIYAN and COWBURN 1997), we refer to it as the ßC7 intron. An amino acid alignment around the ßC7 intron of both N- and C-terminal SH2 domains from mouse and human PLC-{gamma}1 and PLC-{gamma}2 is shown in Fig 2C. Although the codon split by this intron encodes different amino acids in the N-terminal and C-terminal SH2 domains (tryptophan and arginine, respectively), the identical phase and quality of the alignment on either side of the ßC7 intron strongly suggest that the introns in question are homologous.

A conserved intron in the N- and C-terminal SH2 domains of PLC-{gamma} implies that the two domains are derived from the same ancestral SH2 either by duplication within an ancestral PLC-{gamma} gene or by two independent rounds of "exon shuffling" from exterior sources, which both happened to carry an intron at the same position. To distinguish between these two models, we compared both the protein sequence and the intron position from additional mammalian genes containing an SH2 domain, identified at random in the GenBank database. Human and mouse genes were targeted both because our data (above) and those of others (e.g., SCHMITT and BROWER 2001) suggest that there is less selective pressure to lose introns in mammals and because the genome projects from these species provide cDNA evidence for gene structures. From a random sample of 30 SH2-domain-containing proteins we identified 27 SH2 domains that contain at least one intron. An alignment of these SH2 domains together with the five in PLC-{gamma} homologs that have an intron is shown in Fig 3, as well as the position of all introns within them. We found the ßC7 intron in a further 9 SH2 domains: six adaptor proteins (Eat2, SAP, TSAd, Shb, Shd, and a Shb homolog), a nonreceptor tyrosine phosphatase (PTPN6), and both SH2 domains in the PI 3-kinase regulatory subunit, p85. This means that the presence of the ßC7 intron in both of the PLC-{gamma} SH2 domains is not proof of a duplication event having occurred within PLC-{gamma}, because the prevalence of the ßC7 intron strongly suggests that it is likely to have been present in an ancestral SH2 domain.



Figure 3. Position of all introns within 32 SH2 domains from 29 proteins. The SH2 domains were identified at random among mouse and human proteins from the GenBank database using Entrez and the gene structure was obtained from LocusLink. The domains were aligned using ClustalW 1.4 and improved in some places by eye; none of the changes affected regions around putative conserved introns. The position of each intron is indicated by a triangle, with a number indicating the phase of splicing. The affected amino acid is indicated by a circle in the case of phase 1 and 2 introns and by an oval between the affected amino acids in the case of phase 0 introns. Conserved introns are indicated by a solid triangle, possibly conserved introns are indicated by a shaded triangle, and open triangles represent introns that appear to be unique to one lineage. Blnk, mouse, NP_032554; Clnk, human, XP_093920; Eat2, human, XP_086490; Hck, human, XP_009539; IRS-5, mouse, NP_061295; Jak1, human, XM_059180; Jak2, human, AF058925; Nck2, human, XP_087122; PI3Kp85R, human, XP_027982; PLC-{gamma}, A. gambiae, EAA05135; PLC-{gamma}, C. elegans, NP_496205, Wormbase T01E8.3; PLC-{gamma}, D. melanogaster, A53970, FBgn0003416; PLC-{gamma}1, human, CAA18537; PTPN6, human, NP_536859; SAP, mouse, NP_035494; SH2D3C, human, NP_005480; ShbHom, human, XP_060121; Shb, human, XP_032304; Shc, human, NP_003020; Shd, human, XP_031857; Stap1, mouse, NP_064376; Stat1, human, NP_644671; Stat3, mouse, NP_035616; Supt6h, human, XP_017037; Syp, human, XM_069074; Tensin, human, XP_029631; TSAd, mouse, AAH34847; Tyk2, human, XP_008893; Zap70, human, XP_047776.

Another very common conserved intron, ßA0, is present at the exact N-terminal end of the classically defined SH2 domain and is in the same phase as ßC7 (Fig 3). The ßA0 intron is present in 11 of the 32 SH2 domains and is spread among just as disparate a group of proteins as ßC7: a nonreceptor tyrosine kinase (Hck), a nonreceptor tyrosine phosphatase (PTPN6), a phospholipase (PLC-{gamma}), two transcription factors (Stat1 and -3), and several adaptor proteins (SH2D3C, Stap1, Shb, Shd, and a Shb homolog). In all, 53% of the SH2 domains in this group (17 out of 32) have one or both of the ßC7 and ßA0 introns. Five additional introns are shared by between 2 and 5 SH2 domains in each case. The pattern of shared intron positions shows that 22 of the 32 SH2 domains we examined can be related to one another by one or more shared intron sites (Fig 4). This demonstrates that SH2 domains not only are similar in sequence, but also share a common evolutionary heritage that is still visible in their gene structure.
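The idea of relating SH2 domains to one another through shared intron sites can be sketched as a search for connected groups. The example below is a simplification and rests on several assumptions: it uses only the ßC7 and ßA0 carriers named in the text, it labels them at the level of the protein rather than the individual SH2 domain, and the keys bC7 and bA0 are ASCII stand-ins for the intron names. It therefore does not reproduce the full 32-domain analysis summarized in Fig 4.

```python
from collections import defaultdict

# Proteins named in the text as carrying each intron (protein-level labels;
# a simplification, because some proteins have two SH2 domains).
carriers = {
    "bC7": {"Eat2", "SAP", "TSAd", "Shb", "Shd", "ShbHom",
            "PTPN6", "PI3Kp85R", "PLC-gamma"},
    "bA0": {"Hck", "PTPN6", "PLC-gamma", "Stat1", "Stat3",
            "SH2D3C", "Stap1", "Shb", "Shd", "ShbHom"},
}

def linked_groups(carriers):
    """Group proteins that are connected through one or more shared introns."""
    neighbours = defaultdict(set)
    for members in carriers.values():
        for member in members:
            neighbours[member] |= members
    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the shared-intron links
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups

groups = linked_groups(carriers)
print(len(groups), "linked group(s); sizes:", [len(g) for g in groups])

# The percentage quoted in the text: 17 of 32 SH2 domains.
percent_with_either = round(100 * 17 / 32)
print(percent_with_either, "% of domains carry one or both introns")
```

Because several proteins carry both introns, the two carrier sets collapse into a single linked group in this partial data set.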



Figure 4. Venn diagram showing interrelationships between SH2 domains determined by shared introns.

Relationship between PLC-{gamma} SH2 domains:
If the two PLC-{gamma} SH2 domains were produced by an initial recruitment event from another gene followed by a duplication, they should be more similar to each other than to SH2 domains in other proteins; whereas if they were derived by two independent recruitment events, they should be no more alike than SH2 domains in other proteins. We compared the human PLC-{gamma} SH2 domains to the SH2 domains from the set of 27 randomly selected proteins described above (i.e., excluding SH2 domains from other PLC-{gamma} proteins) using PAUP (Fig 5). Probably because SH2 domains are short and the alignment is weak in many places, most nodes were supported very weakly by bootstrap analysis. Although this analysis did not separate the PLC-{gamma} N- and C-terminal SH2 domains into a separate branch, they are each other's closest relatives in terms of the number of changes between them, consistent with a duplicative model of origin. The only SH2 domains grouped with a high level of confidence by the phylogenetic analysis were the Shd, Shb, and Shb homolog adaptors, the transcription factors Stat1 and Stat3, the N-terminal SH2 domains from the tyrosine phosphatases Syp and PTPN6, and the adaptors Eat-2 and SAP. We also examined whether the SH2 duplication in PLC-{gamma} occurred more than once during animal evolution. When we compared all SH2 domains from human PLC-{gamma}1, E. fluviatilis, D. melanogaster, and C. elegans, the N-terminal and C-terminal domains separated into distinct clades (not shown) with a moderate level of bootstrap support (75). This is consistent with a single SH2 duplication event occurring in the PLC-{gamma} gene of a common ancestor of all these species.



Figure 5. Unrooted phylogram of 29 mammalian SH2 domains. The number of changes between each sequence is represented by the length of each branch. The phylogram shown is one of three best trees; nodes with a bootstrap value >50 are indicated and the corresponding branches are shown with thick lines. The distribution of the ßC7 and ßA0 introns is also indicated.


*  DISCUSSION

We isolated the PLC-{gamma} homologs from D. pseudoobscura and D. virilis and compared both the gene structure and the protein sequence with other PLC-{gamma} homologs from another insect, a nematode, and several vertebrates. PLC-{gamma} is unusual among large proteins, in that almost the entire ORF is composed of recognized domains common to many other signaling proteins, which is reflected in a high level of conservation throughout. The most highly conserved parts of PLC-{gamma} are the catalytic domains, particularly within region X, where several residues have been proposed to either coordinate a calcium ion or stabilize the transition state during catalysis in PLC-{delta} (ESSEN et al. 1997). This strongly suggests that the same catalytic mechanism is also used by the PLC-{gamma} forms. The importance of this region is also underscored by the fact that two mutations of sl, which encodes the Drosophila PLC-{gamma} homolog, also lie within these catalytic domains (MANKIDY et al. 2003).

Although it has been known for some time that both PLC-{gamma}1 and -{gamma}2 become phosphorylated on tyrosine residues following activation, the function of this phosphorylation remains unclear (reviewed by REBECCHI and PENTYALA 2000). Our analysis shows that the pattern of PLC-{gamma} phosphorylation must differ between vertebrates and invertebrates, because none of the invertebrate PLC-{gamma} homologs have tyrosines that correspond to Y771 or to Y1254 in the C-terminal tail, both of which are phosphorylated in PLC-{gamma}1. Our results are therefore consistent with reports that Y783 has the major role in regulation of PLC-{gamma} activation (KIM et al. 1991) and suggest that Y771 and Y1254 are involved in vertebrate-specific modes of PLC-{gamma} regulation. The relatively low NetPhos score for the tyrosine homologous to Y783 in all three Drosophila species may indicate different substrate preferences among Drosophila tyrosine kinases; it is also possible that this tyrosine is not phosphorylated in Drosophila. We are currently testing, by in vitro mutagenesis of this residue in a genomic transformation construct, whether phosphorylation of this tyrosine is necessary for Sl function (M. STAPLES and J. THACKERAY, unpublished data).

An interesting and still unanswered question is the date of the PLC-{gamma} gene duplication in the vertebrate lineage. Our phylogenetic analysis suggests that it occurred after the vertebrate/invertebrate split, but well before the mammalian radiation. The PLC-{gamma}1 and PLC-{gamma}2 isoforms have partially overlapping patterns of expression, but do show significant specialization; in particular, the PLC-{gamma}2 isoform is a major contributor to immune system function. For example, transgenic mice carrying a knockout mutation in PLC-{gamma}2 are viable, but show very specific defects in B-cell development (HASHIMOTO et al. 2000; WANG et al. 2000), whereas PLC-{gamma}1 knockout mice die in early development (JI et al. 1997). B cells play a key role in adaptive immunity, a physiological response to infection that invertebrates lack, which is thought to have arisen early in vertebrate evolution (HUGHES and YEAGER 1997). Although the evidence is currently circumstantial, it may be that early vertebrates found a use for a duplicated PLC-{gamma} gene as a regulator of development in the many specialized cell types needed for the adaptive immune response.

There has been much discussion in the literature as to whether the recent evolution of genes primarily reflects loss or gain of introns. Comparison of orthologous genes between puffer fish and humans suggests that intron number and position have changed little in the vertebrate lineage, although intron size has expanded greatly in Homo (ELGAR et al. 1996; YEO et al. 1997). A recent study of ß-integrin genes in the coral Acropora millepora, Drosophila, C. elegans, and humans suggests that intron loss has occurred frequently within the human lineage, but to an even greater extent within the fly and nematode lineages (SCHMITT and BROWER 2001). All but 1 of the 26 coral ß-integrin introns were found in at least one other phylum, suggesting that the coral gene structure represents the ancestral state; without it, many of the introns in the remaining species would have appeared to be unique to one lineage. Our comparison of PLC-{gamma} homologs fits remarkably well with this picture. First, the fact that the human PLC-{gamma}1 and PLC-{gamma}2 genes have 31 identical introns demonstrates a complete lack of change in gene structure for several hundred million years in the vertebrate lineage. Our analysis also shows that the last common ancestor of nematodes, insects, and vertebrates must have had a minimum of 8 PLC-{gamma} introns: the 7 shown in Fig 2A, plus the additional ßC7 intron in the N-terminal SH2 domain. At the very least, therefore, we can say that there has been substantial loss of introns in the Drosophila PLC-{gamma} lineage, because all 8 of these ancestral introns are missing. The structure of a PLC-{gamma} gene from a sponge or coral will be needed to determine whether the many apparently unique PLC-{gamma} introns are actually derived from an ancestral gene, as is the case in the ß-integrin genes.
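The logic behind a minimum count of ancestral introns can be sketched as follows, assuming that an intron found at the same position in two or more lineages was inherited from their common ancestor and not gained in parallel. The lineage names and intron labels are hypothetical placeholders and do not reproduce the positions in Fig 2A.

```python
from collections import Counter


def infer_ancestral_introns(introns_by_lineage):
    """Treat any intron position seen in at least two lineages as ancestral."""
    counts = Counter(
        position
        for positions in introns_by_lineage.values()
        for position in set(positions)
    )
    return {position for position, n in counts.items() if n >= 2}


def lost_introns(introns_by_lineage, lineage):
    """Ancestral introns that are absent from the given lineage."""
    ancestral = infer_ancestral_introns(introns_by_lineage)
    return ancestral - set(introns_by_lineage[lineage])


# Hypothetical intron positions per lineage (labels are invented).
toy_introns = {
    "vertebrate": {"i1", "i2", "i3", "v1"},
    "nematode": {"i1", "i2", "n1"},
    "mosquito": {"i2", "i3"},
    "fly": {"f1"},
}

print(sorted(infer_ancestral_introns(toy_introns)))
print(sorted(lost_introns(toy_introns, "fly")))
```

Because the rule only counts shared positions, it gives a lower bound: introns that survive in a single lineage are not counted as ancestral, which is why an outgroup such as a sponge or coral gene would be informative.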

During our comparison of PLC-{gamma} genes, we discovered an intron, ßC7, that is present not only in the human and mosquito N-terminal SH2 domains, but also in the human C-terminal SH2 domain and SH2 domains from nine other unrelated proteins, including a phosphatase, a kinase, and several adaptor proteins. Another intron at the SH2 N-terminal end, ßA0, was found at almost exactly the same frequency in the SH2 domain sample, not only in PLC-{gamma} and the other proteins just mentioned for ßC7, but also in two transcription factors. Their presence in such a diverse group of proteins indicates that these are very old introns indeed, present in a very early SH2 domain that was subsequently co-opted by a varied collection of proteins during eukaryotic evolution. The majority of SH2 domains in our sample, 22 of 32, can be linked together by one or more common introns, demonstrating that most, if not all, SH2 domains are the result of divergent and not convergent evolution.
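Linking domains through shared introns amounts to finding connected groups: two domains belong to the same group if a chain of shared intron positions joins them. The following is a minimal sketch of that idea. The domain names and intron assignments are hypothetical, and the ASCII labels "bA0", "bC7", and "aA2" are only stand-ins for the intron names used in the text.

```python
from collections import defaultdict


def linked_groups(introns_by_domain):
    """Group domains connected through at least one shared intron position."""
    domains_by_intron = defaultdict(set)
    for domain, introns in introns_by_domain.items():
        for intron in introns:
            domains_by_intron[intron].add(domain)

    seen, groups = set(), []
    for start in introns_by_domain:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over domains that share introns
            domain = stack.pop()
            if domain in group:
                continue
            group.add(domain)
            for intron in introns_by_domain[domain]:
                stack.extend(domains_by_intron[intron] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical sample: domain name -> conserved intron positions it carries.
toy_sample = {
    "domain_A": {"bA0", "bC7"},
    "domain_B": {"bC7"},
    "domain_C": {"bA0", "aA2"},
    "domain_D": {"aA2"},
    "domain_E": {"x1"},   # intron not shared with any other domain
    "domain_F": set(),    # no introns within the domain
}

groups = linked_groups(toy_sample)
largest = max(groups, key=len)
print(len(largest), "of", len(toy_sample), "domains are linked")
```

Note that domain_D joins the main group only through domain_C, which is the same kind of indirect link that connects the Jak family to the other SH2 domains.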

How ancient are the conserved SH2 introns? Although ßC7 and ßA0 may well represent introns from one of the earliest SH2 domains to appear in evolution, there are three reasons to believe that these introns do not represent ancient introns in the sense implied by the exon theory of genes (GILBERT et al. 1997). First, no SH2 domains have been identified in yeast (HUNTER and PLOWMAN 1997) and none have been found in plants, suggesting that this domain arose more recently than the period during which these primal exon-shuffling events occurred. Second, the 40-amino-acid region between ßA0 and ßC7 contains four three-dimensional features: three forming short ß-sheets and one an {alpha}-helical domain, whereas the exon theory suggests that ancient introns are correlated with protein features 15, 22, or 30 amino acids long (GILBERT et al. 1997). Third, both the ßA0 and the ßC7 are phase 2 introns, whereas phase 0 introns are thought to have predominated in the original exon-shuffling events. Major exon shuffling involving many families of signaling proteins is thought to have occurred before the parazoan-eumetazoan split (SUGA et al. 2001), so it may be that the ßA0 and ßC7 introns were present in a proto-SH2 domain before these reshufflings took place, allowing them to be spread to many varied and still extant proteins. The fact that one of the conserved introns, ßA0, lies at the exact N-terminal end of the normally defined SH2 domain is highly unlikely to be coincidental; it probably marks the N-terminal boundary of the shuffled region. This raises the question of what has happened at the C-terminal end; although we have found no evidence for a conserved intron at this end, it may be that the originally shuffled exon containing the C-terminal half of the SH2 domain contained some unnecessary sequence that has drifted in size and sequence sufficiently that it can no longer be recognized.
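Intron phase follows from where the intron interrupts the reading frame: phase 0 introns fall between codons, phase 1 introns after the first base of a codon, and phase 2 introns after the second. A minimal sketch of that definition follows. The coding-sequence positions are hypothetical and chosen only so that both introns come out as phase 2; they are not the actual ßA0 and ßC7 coordinates.

```python
def intron_phase(coding_bases_before_intron):
    """Phase = number of bases of the interrupted codon that precede the intron."""
    if coding_bases_before_intron < 0:
        raise ValueError("position must be non-negative")
    return coding_bases_before_intron % 3


def exon_is_symmetric(upstream_phase, downstream_phase):
    """An exon flanked by introns of the same phase can move without a frameshift."""
    return upstream_phase == downstream_phase


# Hypothetical positions: coding bases preceding each intron.
upstream_position = 32                        # 10 full codons + 2 bases
downstream_position = upstream_position + 120  # 120 coding bases further on

upstream = intron_phase(upstream_position)
downstream = intron_phase(downstream_position)
print(upstream, downstream, exon_is_symmetric(upstream, downstream))
```

When two introns share the same phase, the coding sequence between them is a multiple of three bases long, so a segment bounded by two phase 2 introns could in principle be inserted into another phase 2 intron while preserving the reading frame.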

In any ancient family of proteins, divergent evolution will inevitably lead to some members being difficult to identify as belonging to the group without resorting to comparisons of three-dimensional structure. This idea has recently been applied to SH2 domains, demonstrating that domains in the Janus kinase (Jak) family are indeed bona fide SH2 domains (AL-LAZIKANI et al. 2001). Our data confirm that the SH2 domains in the Janus kinase family members Jak1, Jak2 (and Jak3, because the introns in Jak2 and Jak3 are identical; data not shown), and Tyk2 can be linked to the main SH2 line by the shared {alpha}A2 intron. This demonstrates the utility of intron position comparison: it can confirm evolutionary connections even when overall amino acid similarity is low. Another intriguing example in our data set is SupT6H, which clearly shares a phase 0 intron with Blnk, a well-established adaptor protein with a functional SH2 domain (YABLONSKI and WEISS 2001). SupT6H encodes a protein involved in transcriptional regulation and associates with both the phosphorylated form of RNA polymerase II and the actively expressed chromatin (KAPLAN et al. 2000). A "degenerate" SH2 domain was reported in the human SupT6H sequence when it was first sequenced (CHIANG et al. 1996), and it is tempting to speculate that, although the SH2 domain has indeed diverged in sequence, SupT6H in fact contains an active SH2 domain that binds a protein involved in transcriptional regulation, possibly RNA polymerase II itself. Tyrosine kinase activity has been demonstrated in chromatin (PALANGAT and ROY 1995), so there may be a phosphotyrosine target for the SupT6H SH2 domain after all.


*  FOOTNOTES

Sequence data from this article have been deposited with the EMBL/GenBank Data libraries under the accession nos. AF543827 and AF543828.
1 Present address: Department of Genetics, HHMI, Harvard Medical School, 200 Longwood Ave., Boston, MA 02115.
2 Present address: Department of Biology, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218.


*  ACKNOWLEDGMENTS

We thank Thom Kaufman for sending us the D. virilis genomic library, Rishikesh Mankidy and Tetteh Abbeyquaye for experimental advice, Deborah Robertson and David Hibbett for assistance with the phylogenetic analysis and helpful discussions, and Manyuan Long for his comments on the manuscript. This work was supported by grant no. R15 GM-55883-01 from the National Institutes of Health to J.R.T.

Manuscript received September 10, 2002; Accepted for publication March 13, 2003.


*  LITERATURE CITED

AL-LAZIKANI, B., F. B. SHEINERMAN, and B. HONIG, 2001 Combining multiple structure and sequence alignments to improve sequence detection and alignment: application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. USA 98:14796-14801.

BLACKMAN, R. K., and M. MESELSON, 1986 Interspecific nucleotide sequence comparisons used to identify regulatory and structural features of the Drosophila hsp82 gene. J. Mol. Biol. 188:499-515.

BLOM, N., S. GAMMELTOFT, and S. BRUNAK, 1999 Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294:1351-1362.

CHIANG, P. W., S. WANG, P. SMITHIVAS, W. J. SONG, and S. RAMAMOORTHY et al., 1996 Identification and analysis of the human and murine putative chromatin structure regulator SUPT6H and Supt6h. Genomics 34:328-333.

ELGAR, G., R. SANDFORD, S. APARICIO, A. MACRAE, and B. VENKATESH et al., 1996 Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes). Trends Genet. 12:145-150.

EMORI, Y., R. SUGAYA, H. AKIMARU, S.-I. HIGASHIJIMA, and E. SHISHIDO et al., 1994 Drosophila phospholipase C-{gamma} expressed predominantly in blastoderm cells at cellularization and in endodermal cells during later embryonic stages. J. Biol. Chem. 269:19474-19479.

ESSEN, L. O., O. PERISIC, M. KATAN, Y. WU, and M. F. ROBERTS et al., 1997 Structural mapping of the catalytic mechanism for a mammalian phosphoinositide-specific phospholipase C. Biochemistry 36:1704-1718.

GILBERT, W., S. J. DE SOUZA, and M. LONG, 1997 Origin of genes. Proc. Natl. Acad. Sci. USA 94:7698-7703.

HASHIMOTO, A., K. TAKEDA, M. INABA, M. SEKIMATA, and T. KAISHO et al., 2000 Cutting edge: essential role of phospholipase C-{gamma}2 in B cell development and function. J. Immunol. 165:1738-1742.

HUGHES, A. L., and M. YEAGER, 1997 Molecular evolution of the vertebrate immune system. Bioessays 19:777-786.

HUNTER, T., and G. D. PLOWMAN, 1997 The protein kinases of budding yeast: six score and more. Trends Biochem. Sci. 22:18-22.

JI, Q. S., G. E. WINNIER, K. D. NISWENDER, D. HORSTMAN, and R. WISDOM et al., 1997 Essential role of the tyrosine kinase substrate phospholipase C-{gamma}1 in mammalian growth and development. Proc. Natl. Acad. Sci. USA 94:2999-3003.

KAPLAN, C. D., J. R. MORRIS, C. WU, and F. WINSTON, 2000 Spt5 and spt6 are associated with active transcription and have characteristics of general elongation factors in D. melanogaster. Genes Dev. 14:2623-2634.

KIM, H. K., J. W. KIM, A. ZILBERSTEIN, B. MARGOLIS, and J. G. KIM et al., 1991 PDGF stimulation of inositol phospholipid hydrolysis requires PLC-{gamma}1 phosphorylation on tyrosine residues 783 and 1254. Cell 65:435-441.

KIM, J. W., S. S. SIM, U. H. KIM, S. NISHIBE, and M. I. WAHL et al., 1990 Tyrosine residues in bovine phospholipase C-{gamma} phosphorylated by the epidermal growth factor receptor in vitro. J. Biol. Chem. 265:3940-3943.

KOYANAGI, M., K. ONO, H. SUGA, N. IWABE, and T. MIYATA, 1998 Phospholipase C cDNAs from sponge and hydra: antiquity of genes involved in the inositol phospholipid signaling pathway. FEBS Lett. 439:66-70.

KURIYAN, J., and D. COWBURN, 1997 Modular peptide recognition domains in eukaryotic signaling. Annu. Rev. Biophys. Biomol. Struct. 26:259-288.

LINDSLEY, D. L., and G. G. ZIMM, 1992 The Genome of Drosophila melanogaster. Academic Press, San Diego.

LONG, M., 2001 Evolution of novel genes. Curr. Opin. Genet. Dev. 11:673-680.

LONG, M., C. ROSENBERG, and W. GILBERT, 1995 Intron phase correlations and the evolution of the intron/exon structure of genes. Proc. Natl. Acad. Sci. USA 92:12495-12499.

LOPEZ, I., E. C. MAK, J. DING, H. E. HAMM, and J. W. LOMASNEY, 2001 A novel bifunctional phospholipase c that is regulated by G{alpha}12 and stimulates the Ras/mitogen-activated protein kinase pathway. J. Biol. Chem. 276:2758-2765.

MANKIDY, R., J. HASTINGS, and J. R. THACKERAY, 2003 Distinct phospholipase C-{gamma}-dependent signaling pathways in the Drosophila eye and wing are revealed by a new small wing allele. Genetics 164:553-563.

MOUNT, S. M., C. BURKS, G. HERTZ, G. D. STORMO, and O. WHITE et al., 1992 Splicing signals in Drosophila: intron size, information content, and consensus sequences. Nucleic Acids Res. 20:4255-4262.

PALANGAT, M., and D. ROY, 1995 Phosphorylation of tyrosine residues of RNA polymerase II and other nuclear proteins by active chromatin tyrosine kinase(s). Biochem. Biophys. Res. Commun. 209:356-364.

REBECCHI, M. J., and S. N. PENTYALA, 2000 Structure, function, and control of phosphoinositide-specific phospholipase C. Physiol. Rev. 80:1291-1335.

ROGOZIN, I. B., J. LYONS-WEILER, and E. V. KOONIN, 2000 Intron sliding in conserved gene families. Trends Genet. 16:430-432.

RUSSO, C. A., N. TAKEZAKI, and M. NEI, 1995 Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12:391-404.

SAMBROOK, J., E. F. FRITSCH, and T. MANIATIS, 1989 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

SCHMITT, D. M., and D. L. BROWER, 2001 Intron dynamics and the evolution of integrin ß-subunit genes: maintenance of an ancestral gene structure in the coral, Acropora millepora. J. Mol. Evol. 53:703-710.

SONG, C., C. D. HU, M. MASAGO, K. KARIYAI, and Y. YAMAWAKI-KATAOKA et al., 2001 Regulation of a novel human phospholipase C, PLC{epsilon}, through membrane targeting by Ras. J. Biol. Chem. 276:2752-2757.

SUGA, H., K. KATOH, and T. MIYATA, 2001 Sponge homologs of vertebrate protein tyrosine kinases and frequent domain shufflings in the early evolution of animals before the parazoan-eumetazoan split. Gene 280:195-201.

SWOFFORD, D. L., 2002 PAUP*. Phylogenetic Analysis Using Parsimony. Sinauer Associates, Sunderland, MA.

THACKERAY, J. R., P. C. GAINES, P. EBERT, and J. R. CARLSON, 1998 small wing encodes a phospholipase C-{gamma} that acts as a negative regulator of R7 development in Drosophila. Development 125:5033-5042.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN, and D. G. HIGGINS, 1997 The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876-4882.

WAHL, M. I., S. NISHIBE, J. W. KIM, H. KIM, and S. G. RHEE et al., 1990 Identification of two epidermal growth factor-sensitive tyrosine phosphorylation sites of phospholipase C-{gamma} in intact HSC-1 cells. J. Biol. Chem. 265:3944-3948.

WANG, D., J. FENG, R. WEN, J. C. MARINE, and M. Y. SANGSTER et al., 2000 Phospholipase C{gamma}2 is essential in the functions of B cell and several Fc receptors. Immunity 13:25-35.

YABLONSKI, D., and A. WEISS, 2001 Mechanisms of signaling by the hematopoietic-specific adaptor proteins, SLP-76 and LAT and their B cell counterpart, BLNK/SLP-65. Adv. Immunol. 79:93-128.

YE, K., B. AGHDASI, H. R. LUO, J. L. MORIARITY, and F. Y. WU et al., 2002 Phospholipase C{gamma}1 is a physiological guanine nucleotide exchange factor for the nuclear GTPase PIKE. Nature 415:541-544.

YEO, G. S., G. ELGAR, R. SANDFORD, and S. BRENNER, 1997 Cloning and sequencing of complement component C9 and its linkage to DOC-2 in the pufferfish Fugu rubripes. Gene 200:203-211.



