Genetics, Vol. 164, 1519-1535, August 2003, Copyright © 2003

Diversity and Linkage of Genes in the Self-Incompatibility Gene Family in Arabidopsis lyrata

Deborah Charlesworth[a], Barbara K. Mable[2,a], Mikkel H. Schierup[b], Carolina Bartolomé[a], and Philip Awadalla[3,a]
[a] Institute of Cell, Animal and Population Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh EH9 3JT, United Kingdom
[b] Department of Ecology and Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark

Corresponding author: Deborah Charlesworth, Animal and Population Biology, University of Edinburgh, Ashworth Laboratories, King's Bldgs., West Mains Rd., Edinburgh EH9 3JT, United Kingdom. E-mail: deborah.charlesworth@ed.ac.uk

Communicating editor: M. K. UYENOYAMA


*  ABSTRACT

We report studies of seven members of the S-domain gene family in Arabidopsis lyrata, a member of the Brassicaceae that has a sporophytic self-incompatibility (SI) system. Orthologs for five loci are identifiable in the self-compatible relative A. thaliana. Like the Brassica stigmatic incompatibility protein locus (SRK), some of these genes have kinase domains. We show that several of these genes are unlinked to the putative A. lyrata SRK, Aly13. These genes have much lower nonsynonymous and synonymous polymorphism than Aly13 in the S-domains within natural populations, and differentiation between populations is higher, consistent with balancing selection at the Aly13 locus. One gene (Aly8) is linked to Aly13 and has high diversity. No departures from neutrality were detected for any of the loci. Comparing different loci within A. lyrata, sites corresponding to hypervariable regions in the Brassica S-loci (SLG and SRK) and in comparable regions of Aly13 have greater replacement site divergence than the rest of the S-domain. This suggests that the high polymorphism in these regions of incompatibility loci is due to balancing selection acting on sites within or near these regions, combined with low selective constraints.


IN Brassica, control of pollen-stigma interactions at the stigmatic interface involves highly polymorphic recognition genes of the self-incompatibility (SI) system. It is of interest to understand which regions of the proteins that these genes encode have recognition functions, how this affects the polymorphism in the coding sequence and surrounding genome regions, and how the two recognition genes maintain their coadaptation to produce functional incompatibility types. To understand the evolution of the self-incompatibility loci, it will be helpful to study them in the context of the gene families to which they belong. Doing this allows one to evaluate the possibility of exchanges between loci by gene conversion. It also makes it possible to compare sequence evolution of loci that are involved in incompatibility, and are thus under balancing selection, with similar sequences not under such selection.

The S-locus region contains loci belonging to two distinct gene families. "S-domain genes," members of the plant receptor-like protein-kinase gene family (BOYES and NASRALLAH 1993; WALKER 1994), are important for the stigma recognition functions, while pollen specificities are controlled by the SCR gene (S-locus cysteine rich, also called SP11), a member of the pollen coat protein PCP family (STEPHENSON et al. 1997; DOUGHTY et al. 1998; SCHOPFER et al. 1999; TAKAYAMA et al. 2000; VANOOSTHUYSE et al. 2001). These different genes are linked in a region whose length differs among different haplotypes (BOYES and NASRALLAH 1993; GORING and ROTHSTEIN 1996; YU et al. 1996; CONNER et al. 1998). In the Brassica S-locus region, there are two S-domain genes, SLG (S-locus glycoprotein) and SRK (S-receptor kinase), with expression chiefly in stigma epidermal cells. The SRK gene is essential for self-incompatibility (GORING and ROTHSTEIN 1996; CUI et al. 2000; TAKASAKI et al. 2000), while SLG, a closely linked S-domain gene without a kinase domain, is homologous to exon 1 of SRK and is nonessential for recognition, although it may have a role in the incompatibility phenotype (TAKASAKI et al. 2000).

Further S-domain genes are known in Brassica and related plants (LUU et al. 2001), most of them not linked to the S-locus (KAI et al. 2001), although linked ones are found in some haplotypes (SUZUKI et al. 1997, 1999; KUSABA et al. 2000), some of them apparently pseudogenes (YU et al. 1996; KAI et al. 2001). Most of the functional members of this gene family are presumably involved in processes other than pollination, although in Brassica some encode secreted stigma glycoproteins (NASRALLAH and NASRALLAH 1993) and some S-domain proteins are necessary for correct pollen-stigma adhesion (LUU et al. 1997; TAKAYAMA et al. 2000). In Arabidopsis thaliana the S-domain gene family has ~40 members (SHIU and BLEECKER 2001), and the Brassica SRK sequences are most similar to the A. thaliana Ark genes (PASTUGLIA et al. 2002). Other members of this gene family are not linked to the S-locus.

Studies of sequence diversity of Brassica S-locus genes have until recently concentrated on the SLG gene, but some data from the S-domains and a portion of the kinase domain of SRK of some haplotypes have been published (KUSABA et al. 1997; NISHIO et al. 1997). The broader view of the S-locus genes as members of a gene family has not been emphasized in Brassica, although some data are from loci other than SLG and SRK, and it is clear that both SRK and SLG genes are much more polymorphic than other S-domain genes that have been studied (DWYER et al. 1991; HINATA et al. 1995; KUSABA et al. 1997; WATANABE et al. 1997, 1998; SAKAMOTO et al. 1998). For loci other than SLG and SRK, sample sizes are very small. SCR also appears to be highly polymorphic (WATANABE et al. 2000; KIMURA et al. 2002), although no comparison of diversity with other members of this gene family has been published. The Brassica data are mainly from cultivars, not from random samples from natural populations.

Here we report results of population genetic studies of several S-domain loci in natural populations of A. lyrata. We characterize diversity at several different S-domain loci for comparison with SRK to establish whether SRK indeed has an unusually polymorphic S-domain, as expected for a gene under balancing selection. Balancing selection is not expected for S-domain genes that are not involved in SI (although they could have experienced other forms of selection, for instance, directional selection subsequent to gene duplication in the evolution of the gene family).

A. lyrata is a self-incompatible, predominantly diploid member of the Brassicaceae, but is only distantly related to Brassica (ROLLINS 1993). Silent site divergence of putatively orthologous genes between A. lyrata or A. thaliana and Brassica ranges from ~0.2 to >1 without Jukes-Cantor correction (reviewed in WRIGHT et al. 2002). We previously described several S-domain loci in A. lyrata (CHARLESWORTH et al. 2000; SCHIERUP et al. 2001). We refer to these as the Aly loci. Of these, Aly13 is highly polymorphic within A. lyrata, and segregation with incompatibility groups in families suggests that it is the ortholog of the Brassica SRK gene (CHARLESWORTH et al. 2000; SCHIERUP et al. 2001). The same gene was identified by KUSABA et al. (2001), who isolated the complete sequence, including the kinase domain, from stigma mRNA of an A. lyrata plant heterozygous for two S-alleles and also showed cosegregation of sequence variants with the two progeny incompatibility groups in self-fertilized progeny; they named the gene A. lyrata SRK. Two sequences in our families closely match those of the two alleles in the plant studied by KUSABA et al. (2001): the S-domain of Aly13-13 is almost identical to that of the SRKa allele, while Aly13-20 matches allele SRKb.

As expected for the S-locus, Aly13 sequences are exceptionally polymorphic at both synonymous and replacement sites (SCHIERUP et al. 2001). S-domain diversity is even higher than in the S-domains of Brassica SRK or SLG loci (HINATA et al. 1995; CHARLESWORTH and AWADALLA 1998). To evaluate the variability, it is important to compare it with that of other loci. As few diversity data are currently available from natural populations of plants, including A. lyrata, we have obtained new diversity data by studying other S-domain loci, which are the ideal "reference" loci for tests of whether Aly13 is more polymorphic than other loci.

Data from other Aly loci also allow us to compare levels of selective constraint in different regions of the S-domain. It is often suggested that the hypervariable (HV) regions in the extracellular S-domain are the most important for recognition (NASRALLAH and NASRALLAH 1989; KIMURA et al. 2002). However, this remains uncertain, and effects of amino acid differences elsewhere in the domain are also likely (NASRALLAH 1997; MIEGE et al. 2001). Variability in the S-locus region will be affected by a number of factors, and it is important to distinguish the possibilities clearly.

First, in regions of the protein where amino acid variants alter specificities, balancing selection will promote variation, so we expect high nonsynonymous diversity in the genomic sequences, as observed in both Brassica (e.g., NASRALLAH and NASRALLAH 1989) and A. lyrata (SCHIERUP et al. 2001). Second, closely linked synonymous sites will also have high diversity, because balancing selection maintains different functional alleles for long time periods (e.g., WRIGHT 1939; VEKEMANS and SLATKIN 1994), allowing sequence differentiation of alleles (e.g., STROBECK 1972; HUGHES et al. 1990; TAKAHATA 1990; NORDBORG et al. 1996); this prediction also fits the data on S-alleles (NASRALLAH and NASRALLAH 1989; CHARLESWORTH and AWADALLA 1998; SCHIERUP et al. 2001). Amino acid variants may be affected in the same way, so that such variants need not all be associated with specificity differences. For the same reason, other genome regions closely linked to the S-locus may also have high polymorphism. A number of cases of additional S-domain genes located close to the Brassica S-locus are known, and it is of interest to see whether they are highly polymorphic (as expected if linkage to the S-loci is very close) or have low diversity (which would imply that recombination occurs and that diversity is high only at sites that are physically very close to sites under balancing selection). We have already mentioned the highly polymorphic SLG locus in the Brassica S-locus region, but some other linked genes have been reported to have low diversity (HINATA et al. 1995). Among the Aly loci studied here, we find no A. lyrata ortholog of SLG (in agreement with KUSABA et al. 2001), but we observe high polymorphism in a different linked S-domain gene (Aly8).

A third important influence on diversity is that selective constraints may differ between different regions of the protein (NASRALLAH 1997). Low selective constraints may allow certain regions to have particularly high polymorphism due to linkage to the sites under balancing selection. Different selective constraints can be detected using comparisons with reference loci, and we use this approach to test whether this could contribute to the high variability of the hypervariable regions. A sample of seven putative alleles showed that the Aly13 S-domain sequences have peaks of replacement site polymorphism in the regions equivalent to the Brassica S-locus HV regions (SCHIERUP et al. 2001). To test whether this is because the same regions are important for recognition functions in both species, we must exclude the possibility that these regions are low constraint regions.

A final reason for studying other loci is that the analysis of sequence data to infer selection is complicated by population subdivision. To assess the effects of demographic and historical processes that can generate patterns that may be mistaken for evidence of selection, it is necessary to have reference loci that are not under strong balancing selection, but instead are evolving more or less neutrally. For example, genetic differentiation between populations can cause haplotype structure that may be difficult to distinguish from balancing selection unless other data are available to show the true situation (e.g., CHARLESWORTH et al. 1997). Tests for selection such as TAJIMA's (1989a) test are thus affected by population subdivision and will often be unable to detect selection when samples from such populations are pooled (SCHIERUP et al. 2000). When there is subdivision, this test cannot distinguish between balancing selection and differentiation between populations, because Tajima's D is affected by both selection and population subdivision (which, like balancing selection, causes positive D values; TAJIMA 1989a,b). Directional selection at a locus causes negative D values, and pooling is therefore conservative (ignoring subdivision may obscure this form of selection). Data on diversity at the Aly13 locus (the putative A. lyrata SRK) must therefore be interpreted in the light of such information. Finally, a locus experiencing balancing selection is expected to show less population subdivision than loci not experiencing such selection (SCHIERUP et al. 2000).

Here we describe analyses of diversity and selection at several A. lyrata S-domain loci and assess the implications for our understanding of balancing selection and its effects on sequence diversity within S-loci and in their genomic neighborhood.


*  MATERIALS AND METHODS

A. lyrata plant material and DNA preparation:
Seeds were collected from four populations of A. lyrata (see details in CHARLESWORTH et al. 2000). A. lyrata populations are widely distributed in northern North America, and European populations formerly classified as A. petraea are now considered to be the same species as A. lyrata, so both are referred to here as A. lyrata (following KOCH et al. 2000). The populations, and the abbreviations we shall use for them, were as follows: Two samples from North America (kindly provided by R. Mauricio) were from North Carolina (NC) and Indiana Dunes, Indiana (IN), and two from Europe were from Braemar, Scotland (kindly provided by R. Ennos), and the Reykjanes Peninsula, Iceland (some of them provided by E. Thorhallsdottir). Individual plants were grown in the greenhouse from seeds from these populations. DNA was extracted from leaves using a CTAB protocol (JUNGHANS and METZLAFF 1990).

Primers, amplification, cloning, and sequencing:
S-domain primers: Primers were designed on the basis of sequence alignments of Brassica SLG and SRK loci (Table 1) and used to amplify A. lyrata genomic DNA. Because SLG and SRK are members of a gene family, our initial primers were based on the most conserved regions of the Brassica S-domain and should amplify multiple A. lyrata S-domain genes, particularly those most similar to SRK. The S-domains of most Brassica oleracea and B. campestris SLG and SRK alleles have no introns (TANTIKANJANA et al. 1993; HATAKEYAMA et al. 1998; CABRILLAC et al. 1999), so A. lyrata S-domains should be similar in length to those of Brassica. To distinguish between sequences from different loci, specific primers were designed from the A. lyrata sequences amplified. The primer sequences and amplification conditions are given in CHARLESWORTH et al. (2000).


 

 
Table 1. Primers used

Kinase domain reverse primers: To test whether each S-domain sequence had a kinase domain downstream from the S-domain, we used specific forward primers for the loci identified in A. lyrata with reverse primers based on either Brassica SRK locus kinase domains (srk4r and srk5r) or an A. lyrata SRK kinase sequence kindly provided by J. B. Nasrallah (srknasr1, srknasr4, and srknasr3; see Table 1).

Cloning and sequencing: Because some primers amplify more than one locus, and also because of the high variability of some of the putative loci (see below), PCR products of the expected size were generally cloned before sequencing [using the Invitrogen (San Diego) TOPO TA cloning kit]. To detect sequence variants and differentiate between loci (see below), the cloned amplification products were digested with four- and six-cutter restriction enzymes and fragments were separated electrophoretically.

Sequences were obtained using standard cycle sequencing protocols for the Applied Biosystems (Foster City, CA) model 377 sequencer with the Big Dye sequencing kit, either using M13 universal primers for clones or by direct sequencing using primers specific to the original amplified product. All Aly3, Aly9, Aly10.1, and Aly14 sequences were obtained by direct sequencing. The more variable loci were sequenced from cloned PCR products. In most cases, at least two clones were sequenced per individual to check for PCR errors. However, since this was not always done, some diversity values may be slightly overestimated, and an excess of singletons may have been produced; this does not affect our general conclusions. Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under POPSET accession nos. AY186752–AY186777. The full population sequence set can be obtained from the authors by request.

Sequence alignments and analyses:
Sequences were aligned using ClustalX (THOMPSON et al. 1994), followed by manual adjustments based on the inferred amino acid alignments using Se-Al v. 1.0 (RAMBAUT 1996). A. lyrata S-domain sequences were aligned to two representative B. oleracea SRK alleles from class I (pollen dominant: SRK9 and SRK45) and class II (pollen recessive: SRK5 and SRK15) nucleotide sequences and to several A. thaliana S-domain genes (see Table 2 for accession numbers and chromosomal locations). The nucleotide sequence of the apparent A. thaliana ortholog of Aly13 (KUSABA et al. 2001) was also included (T6K22.100: accession no. AL031187). Conserved amino acid residues common to S-domain loci (KUSABA et al. 1997) were used to anchor the alignments. For visual comparison of relative divergence within and between sequence types, phylogenetic trees were reconstructed with PAUP* 4.0b10 (SWOFFORD 2002), using the minimum evolution function under an HKY85 model of substitution (HASEGAWA et al. 1985). Heuristic searches were conducted with initial trees obtained by simple stepwise addition, followed by branch swapping using the tree bisection-reconnection routine implemented in PAUP*. Relative support for individual nodes was assessed by bootstrapping (1000 replicates) using neighbor joining (SAITOU and NEI 1987). For these analyses, sequences from Aly14 and their putative ortholog At14 (see discussion below) were excluded because of the short length of the Aly14 sequences. For the other sequence types, all unique sequences were included in the analysis, except for Aly13, for which a subset of 10 alleles with known linkage relationships (SCHIERUP et al. 2001) was included. The analysis was based on the region from bp 496 to 1365 of the alignment.


 

 
Table 2. Synonymous (above diagonal) and nonsynonymous divergence values (below diagonal) between A. lyrata S-domain Aly sequences and those of potentially orthologous A. thaliana S-domain kinases

To estimate nucleotide divergence between sequences, synonymous (Ks) and nonsynonymous substitutions per site (Ka) were calculated using the method of NEI and GOJOBORI (1986) with the MEGA2 program (KUMAR et al. 2000). For divergence between pairs of paralogous loci within species or for divergence between species, all regions present in both sequences were included. The comparisons therefore involve slightly different lengths of sequence (see Fig 2 and Table 2).
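As a small illustration of the final step of such divergence estimates: the Nei-Gojobori method converts the observed proportion of differing synonymous (or nonsynonymous) sites into substitutions per site with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The sketch below shows only that correction. The function name and the example proportions are hypothetical, and the counting of synonymous and nonsynonymous sites (done here by MEGA2) is not reproduced.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical proportions of differing sites, not values from this study.
for p in (0.05, 0.20, 0.30):
    print(f"p = {p:.2f}  corrected d = {jukes_cantor(p):.4f}")
```

The correction is negligible for small p and grows quickly as p approaches 0.75, which is why uncorrected and corrected silent-site divergence values can differ noticeably for the more diverged comparisons.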



 
Figure 1. Unrooted gene tree of the A. lyrata S-domain sequences, together with two sequences each of Brassica class I and II SRK alleles, chosen to represent the full SRK diversity, and putative A. thaliana orthologs. The two A. lyrata SRK sequences of KUSABA et al. (2001) are denoted by AlSRKa and AlSRKb (note that these sequences are the same as Aly13-13 and -20, respectively). For Aly3, Aly7, Aly8, Aly9, Aly10.1, and Aly10.2, all unique sequences found are included, whereas for Aly13, 10 representative sequences with known linkage relationships are shown (SCHIERUP et al. 2001; the accession numbers are AF328990–AF329000 and AY186763–AY186777). Aly14 and its putative ortholog are not included. The tree is based on nucleotide distances using the HKY85 substitution model and the minimum evolution function. Bootstrap values exceeding 80% (based on 1000 neighbor-joining replicates) are indicated on the tree. Note that, except for Aly13, diversity within the sequence types is much lower than divergence between them (see also Table 5). In general, relationships within the unlinked sequence types were not resolved with any certainty but relationships between them were strongly supported. Relationships of Aly13 alleles to one another and to the other loci were not resolved, although the putatively orthologous pseudogene from A. thaliana, T6K22.100 (indicated by ψ), is close to some of the Aly13 alleles. Predicted relationships to putative orthologs from A. thaliana for several other sequence types (Table 2) were also strongly supported. Accession numbers of the Brassica SRK alleles are as follows: SRK5, Y18259; SRK15, Y18260; SRK45, E15795; and SRK9, D30049. Accession numbers for A. thaliana sequences are given in Table 2.



 
Figure 2. Structure of the different S-domain sequence types. Solid blocks indicate the portions of the S-domains sequenced in A. lyrata and the corresponding regions of the putative A. thaliana orthologs (see text). Alignment gaps relative to Brassica S-domains are indicated by open regions, and their positions and the approximate positions of the HV regions in the amino acid sequence of the S-domain are shown at the top. Indels that were polymorphic among sequences from a given locus are indicated by a "P." For sequences in which an intron has been detected 3' to the S-domain and a kinase domain is present 3' to this, these regions are indicated at the end of the diagrams; details of the kinase domains or the introns within these domains are not shown, and these regions are not drawn to scale. Cysteine residues conserved in all the loci are indicated by "C" above the diagram, and the sequences of the motif characteristic of S-domain kinases (WALKER 1994) are shown. For Aly10.1 sequences, the positions of the large deletions discussed in the text are shown as vertically hatched regions for the four different types of sequence (10.1 A, B1, B2, and B3); the deletions are shown relative to the Ark1 sequence.

Nonsynonymous and synonymous diversity values (πa and πs) within species were estimated using a set of putative alleles of the different loci sequenced from a common small sample of individuals from the four populations. (Some individuals did not yield sequences for some loci, sometimes because the DNA sample was used up, so we were unable to obtain exactly the same samples for all loci; Table 5 shows the sample sizes.) The MEGA2 software was used for diversity estimation. Proseq v. 2.9 (FILATOV and CHARLESWORTH 1999) was used to do TAJIMA's D (1989a) tests and to test for population subdivision in A. lyrata using the Kst statistic (HUDSON et al. 1992). These analyses used all sequences available from the study populations for each locus; i.e., sequences were included that were excluded by the sampling scheme used to estimate within-population diversity. The critical values for Kst were obtained by 1000 random permutations of the sequences between the populations (HUDSON et al. 1992).
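The two summary statistics used here can be sketched in a few lines. The code below is a minimal illustration, not the MEGA2 or Proseq implementation: it computes nucleotide diversity over all sites (it does not separate synonymous from nonsynonymous sites as πs and πa do) and Tajima's D with the standard constants from TAJIMA (1989a). It assumes an ungapped alignment of equal-length sequences, and the function names and example data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_diversity(seqs):
    """Return (k, pi): mean pairwise differences and the same value per site."""
    n = len(seqs)
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    k = total / (n * (n - 1) / 2)
    return k, k / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D for an ungapped alignment of at least four sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    # Number of segregating sites: alignment columns with more than one state.
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    if S == 0:
        return float("nan")  # D is undefined without polymorphism
    k, _ = pairwise_diversity(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical toy alignment with three singleton variants.
sample = ["AAAA", "AAAT", "AATA", "ATAA"]
print(pairwise_diversity(sample), tajimas_d(sample))
```

An excess of singletons, as in the toy alignment, pushes D below zero, whereas intermediate-frequency variants (as expected under balancing selection or when subdivided populations are pooled) push it above zero.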


 

 
Table 3. Results of linkage tests between Aly loci and the self-incompatibility locus in family 98E-15


 

 
Table 4. Results of linkage tests between Aly8 variants and the self-incompatibility locus in family MS00-36


 

 
Table 5. Nucleotide diversity estimates and summary statistics for seven S-domain loci sampled from four A. lyrata populations

Recombination in the Aly loci was tested by two types of analysis. Correlation analyses were done with the r2 program written by M. H. Schierup (http://www.brics.dk/~compbio/r2). Only segregating sites with frequencies >0.1 were included. Where only one or two sequences had gaps, the site was included (although the sequences with gaps were excluded from the analysis); otherwise, gap regions were excluded from all the sequences. Significance of the correlation coefficients of two measures of linkage disequilibrium with distance (r² or D') was determined using 5000 random permutations of the variable sites. The second analysis used the composite likelihood finite sites extension to HUDSON's (2000) method (MCVEAN et al. 2002), implemented in the LDhat program (http://www.stats.ox.ac.uk/~mcvean). All sites were included in this analysis. Maximum-likelihood estimates of the scaled recombination rate were obtained from the likelihood curve evaluated at 20 points between ρ = 0 and ρ = 50. Significance against ρ = 0 was tested by 1000 random permutations. Minimum numbers of recombination events were estimated by HUDSON and KAPLAN's (1985) estimator using DnaSP v. 3.5 (ROZAS and ROZAS 1999).
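The logic of the correlation analysis can be sketched as follows. This is one plausible reading of the procedure, not the code of the r2 program: r² is computed for every pair of biallelic sites, correlated with the physical distance between the sites, and the observed correlation is compared with correlations obtained after randomly reassigning positions to sites. All names and the toy data are hypothetical; the sketch assumes every site is biallelic and polymorphic and that the r² values are not all identical.

```python
import random
from itertools import combinations


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, each given as a string of alleles."""
    n = len(site_a)
    a0, b0 = site_a[0], site_b[0]
    pa = sum(x == a0 for x in site_a) / n
    pb = sum(y == b0 for y in site_b) / n
    pab = sum(x == a0 and y == b0 for x, y in zip(site_a, site_b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ld_decay_test(positions, sites, n_perm=5000, seed=0):
    """Correlation of r^2 with distance and a one-sided permutation P value."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(sites)), 2))
    r2 = [r_squared(sites[i], sites[j]) for i, j in pairs]

    def corr(pos):
        return pearson([abs(pos[i] - pos[j]) for i, j in pairs], r2)

    observed = corr(positions)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between sites and positions
        if corr(shuffled) <= observed + 1e-12:
            hits += 1
    return observed, hits / n_perm


# Hypothetical data: two tightly linked pairs of sites far apart from each other.
positions = [10, 20, 300, 310]
sites = ["AAAATTTT", "AAAATTTT", "AATTAATT", "AATTAATT"]
print(ld_decay_test(positions, sites, n_perm=200))
```

A significantly negative correlation, meaning that linkage disequilibrium declines with distance, is the signature expected when recombination has occurred within the sequenced region.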

To test whether the parts of the S-domain sequences that are hypervariable in the Aly13 sequences also evolve unusually in the other A. lyrata S-domain genes, we compared the levels of divergence and polymorphism in different regions of Aly13 with diversity in the sequences of each putative locus. Following KUSABA et al. (1997), the positions of the HV regions in the S-domains of the Brassica SLG and SRK type I alleles initially recognized by DWYER et al. (1994) correspond to the following positions in our alignments: HV1 corresponds to amino acid residues 215–236; HV2 to 293–331; HV3 to 354–368; and the C-terminal region to residues 446–456. McDonald-Kreitman tests (MCDONALD and KREITMAN 1991) were used to test for selection acting on the A. lyrata S-domain loci using DnaSP v. 3.5.
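The McDonald-Kreitman test compares the ratio of nonsynonymous to synonymous changes among fixed differences between species with the same ratio among polymorphisms within species, using a 2 x 2 contingency table. A minimal sketch of the test on such a table is given below, using a two-sided Fisher's exact test written with the standard library. The function names and the counts are hypothetical, and the sketch does not reproduce how DnaSP classifies sites as fixed or polymorphic.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """McDonald-Kreitman test on counts of fixed and polymorphic changes."""
    return fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)


# Hypothetical counts, not data from this study.
print(mk_test(fixed_nonsyn=12, fixed_syn=20, poly_nonsyn=5, poly_syn=25))
```

Under neutrality the two ratios are expected to be equal, so a small P value indicates an excess (or deficit) of nonsynonymous fixed differences relative to nonsynonymous polymorphism.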

To test whether more changes have occurred in any of the A. lyrata Aly genes than in their A. thaliana orthologs since their divergence from outgroup sequences, relative rates were evaluated using Tajima's one-parameter test (TAJIMA 1993), which does not assume any specific substitution model. As no sequence data are available from orthologs in closely related species, the closest paralogous A. thaliana S-domain gene was used as an outgroup for each comparison (for Aly8/Ark3, this was AtS1; for Aly9/AtS1, this was Ark3; for Aly10.1/Ark1, this was Ark2; and for Aly10.2/Ark2, this was Ark1).
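As an illustration of the relative-rate test, the sketch below counts, in a three-sequence alignment, the sites at which only the first sequence differs from the other two (m1) and the sites at which only the second differs (m2). Under equal rates these counts are expected to be equal, and (m1 - m2)²/(m1 + m2) is compared with a chi-square distribution with 1 d.f. The function name and the toy sequences are hypothetical, and columns containing a gap character are simply skipped, which is an assumption of this sketch.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi2, p) for Tajima's one-parameter relative-rate test."""
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o) or a == b:
            continue
        if b == o:
            m1 += 1  # change unique to the lineage of seq1
        elif a == o:
            m2 += 1  # change unique to the lineage of seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Hypothetical toy sequences: four changes unique to the first sequence.
print(tajima_relative_rate("AAAATTTT", "AAAAAAAA", "AAAAAAAA"))
```

Because the outgroups used here are paralogs rather than orthologs from a closely related species, many sites will differ in all three sequences; such sites carry no information about which ingroup lineage changed and do not enter m1 or m2.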

Tests for linkage between the A. lyrata self-incompatibility locus and the S-domain loci:
To test Aly S-domain loci for linkage to the self-incompatibility locus, we used full-sib families in which the incompatibility groups of progeny plants had been determined by hand-pollinations between individuals or in which Aly13 genotypes had been determined, so that it was known that one or both parent plants were Aly13 heterozygotes (see SCHIERUP et al. 2001). The two parents and the progeny were also scored using PCR amplification with primers specific for any of the other Aly sequence types that sequencing showed to be heterozygous in one or the other parent. We used digestion with restriction enzymes to score variants for these putative loci; the details are described below.


*  RESULTS

Amplification of S-domain sequences:
Several combinations of primers designed to match conserved regions within the first exon (the S-domain) of the Brassica S-gene family (see MATERIALS AND METHODS) were used in the initial screening of A. lyrata genomic DNA for S-domain sequences. Amplifications from a single individual from Scotland yielded PCR products of the size predicted from Brassica S-gene sequences. Five different sequence types were initially identified using the six-cutter restriction enzymes EcoRI, HindIII, and BamHI, and these were sequenced (Aly7, Aly8, Aly9, Aly10, and Aly13; CHARLESWORTH et al. 2000; SCHIERUP et al. 2001).

BLAST searches of these sequences showed homology to Brassica and A. thaliana S-domain loci (see below), and they could all readily be aligned with Brassica S-allele sequences and with members of the S-domain gene family in A. thaliana. The portions of the S-domains sequenced (see Fig 2) in all the Aly loci identified, with the exception of Aly7 and some of the Aly10.1 alleles (see below), are open reading frames. There are no stop codons in any of the S-domains, and all indels, including the few that are polymorphic within loci (see Fig 2), are multiples of three nucleotides. No introns were found in any of the A. lyrata S-domain sequences.

Design of specific primers for S-domain sequence types and evidence that they represent different loci:
Given the very high diversity of the A. lyrata Aly13 sequences, which we ascribe to a single incompatibility locus, as explained above, it is important to investigate in detail the extent and nature of the S-domain gene family to check that the Aly13 sequences truly come from a single locus. We therefore used sequence variants to help classify the S-domain sequences into sets belonging to different loci.

Specific primer pairs were designed for individual sequences (CHARLESWORTH et al. 2000; Table 1). All individuals tested by amplification of genomic DNA, representing all four populations sampled, have all of the different sequence types. These sequences therefore suggest at least five different loci. A further locus, Aly10.2, was subsequently found in amplifications using primers initially designed from Aly10.1 sequences. Amplifications using primers designed to be specific for Aly13 (13seq1F and SLGR; see SCHIERUP et al. 2001) yielded another sequence type, Aly14 (accession no. AY186762; as only short sequences were obtained from a few plants, this sequence type is not included in most of the analyses below). This yields a total of seven loci, in addition to the Aly13 locus. Although there is some diversity within most of the sequence types, which we describe in detail below, the different sequences are diverged at both silent and replacement sites (Table 2) and fall into strongly supported groups in a phylogenetic analysis (Fig 1), consistent with the other evidence that they represent distinct loci. The five A. thaliana gene sequences in Fig 1 and Table 2 will be discussed later. Unlike these putative S-domain loci, no single primer amplifies all sequences of the Aly13 type (MABLE et al. 2003). The Aly13 sequences are highly variable, consistent with other data suggesting that these sequences represent a single highly polymorphic A. lyrata S-locus, rather than several different loci (SCHIERUP et al. 2001).

A. thaliana orthologs of the Aly genes and structure of the loci:
Comparing the Aly sequences with A. thaliana S-domain receptor kinase genes (Table 2), we can identify probable orthologs for five genes (Fig 1). For three loci (Aly3, Aly7, and Aly9), we could not identify kinase domains by PCR (see MATERIALS AND METHODS); for brevity, we refer to these as "nonkinase domain" sequences, although it is possible that a kinase domain exists but was not detected. No orthologs can be identified for two of these loci, Aly3 and Aly7. For Aly9, AtS1 is a potential ortholog (see Table 2 for accession numbers). This is the probable ortholog of SLR1 in Brassica (DWYER et al. 1994) and, like Aly9, appears not to have a kinase domain.

We tentatively identify orthologs of Aly8, Aly10.1, and Aly10.2 as the three kinase domain loci Ark3, Ark1, and Ark2, respectively (Table 2). These three A. lyrata genes have quite similar S-domains, which amplify with the same forward primers (see Table 1), and kinase domains were detectable for all three. A possible ortholog of Aly14 (which we have not tested for the presence of a kinase domain) is an anonymous kinase domain sequence, which we denote by At14 (contig accession AL161566.2|ATCHRIV66, position 170101).

For the putative orthologous pairs, silent site divergence from A. thaliana for the S-domain ranges from 18 to 30%, and replacement site divergence is between 3 and 12% (see Table 2). Divergence values in the three kinase domain sequence types were similar (based on the coding sequence of exons 1–7, synonymous and nonsynonymous divergence values between Aly8 and Ark3 were 0.34 and 0.049, respectively; for Aly10.1 vs. Ark1, the values were 0.22 and 0.07, and for Aly10.2 vs. Ark2, 0.19 and 0.065). The values for both the S- and kinase domains are higher than values for most orthologous sequence comparisons between these two species (WRIGHT et al. 2002), but within reasonable limits (silent divergence exceeding ~30% after Jukes-Cantor correction is unlikely for true orthologs). The high divergence for Aly8 is in part due to its diversity within A. lyrata (discussed in more detail below).
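The Jukes-Cantor correction mentioned above converts an observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The following is a minimal sketch of that formula only. The function name and the input proportions are illustrative assumptions, and it is not implied that the values in Table 2 were computed in this way.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        # The correction is undefined once p reaches the saturation value 0.75.
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Illustrative raw proportions (hypothetical, not taken from Table 2).
for p in (0.05, 0.18, 0.30):
    print(f"observed p = {p:.2f}  corrected d = {jukes_cantor(p):.3f}")
```

The correction grows faster than p itself, which is why high raw divergence values translate into disproportionately large corrected distances.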

Fig 2 summarizes the structure of the S-domain sequence types identified in A. lyrata and their putative A. thaliana orthologs within the regions that were sequenced for the eight putative A. lyrata loci. The inferred amino acid sequences all share the 12 cysteine residues present in the Brassica SLG, SRK, and SLR S-domain sequences of other Brassicaceae (KUSABA et al. 1997), as well as the A. thaliana Ark loci. We searched our sequences for the motif of 10 amino acids described in the S-domains of other plant receptor-like protein kinases (WALKER 1994; WQSFDYPTDT in all Brassica SRK sequences). The motif is present in Aly3 and Aly9 and in several Aly13 sequences (13-2, -3, -4, -7, -8, -14, and -20). The Aly8, Aly10.1, and Aly10.2 sequences differ in the 6th amino acid of the motif (F replaces Y), and the same is true for Aly13 sequences 13-1, -5, -9, -13, -15, and -23 (Fig 2).

Aly7 sequences:
As mentioned previously, some Aly7 sequences have a single base-pair insertion (at position 751 within a region of four TA repeats; see Fig 3, "Aly7(+)" sequences). This disrupts the reading frame and creates a downstream stop codon. A 9-bp insertion is also present in this set of sequences, relative to the in-frame Aly7(-) sequences (Fig 3). Overall, 26% of the 27 Aly7 sequences classified (either by sequencing or by amplifying with primers specific to each haplotype; see Table 1) were of the 7(+) type. These sequences could represent a separate locus or could be allelic to the other Aly7 sequences. The sequences with and without the insertion form two haplotypes. There is significant linkage disequilibrium between the two types, between sites separated by 550 nucleotides, and between closer sites (Fig 3). This might suggest two distinct loci, but there are few pairs of sites for which linkage disequilibrium is complete, so the sequences appear to have recombined. Alternatively, interlocus gene conversion may have occurred, so this does not conclusively rule out the possibility of two loci. Nucleotide similarity is otherwise high between the two sequence types. Removing the insertion from the Aly7(+) sequences, the mean divergence from the other Aly7 sequences is 0.017 for synonymous sites, and slightly higher (0.019) for nonsynonymous sites, suggesting that the sequences are evolving neutrally. However, net divergence is very small, given the diversity within the Aly7(-) sequences. The sequence diversity of the Aly7(+) sequences is lower than that of Aly7(-), and only slightly lower than the divergence between the two, and again the sequences appear to be evolving neutrally (among Aly7(+) sequences, πs = 0.0098 and πa = 0.0107). Tajima's D is significantly negative (D = -1.65, P < 0.05) for the Aly7(+) sequences, but is not significant for the 7(-) sequences.
Although relationships within loci were not well resolved (Fig 1), no evidence was found for separation of the Aly7(+) and Aly7(-) sequence types in the gene tree.



Figure 3. Sequence variants of the Aly7 locus. The numbers of the polymorphic sites are indicated at the top, and the rest of the figure shows the variants present in the sequences of the two haplotypes from the different populations. Sites in linkage disequilibrium are shaded, and regions of missing sequence are blacked out.

If the Aly7 gene is duplicated, both sequences should be detectable in all individuals, but this is not the case. Aly7(+) sequences have been found in only three of the four populations studied (two in the North Carolina population, one in the Indiana population, and three in the Scottish population). This suggests a single locus or else a duplication that is absent from the Iceland population. Consistent with the single-locus hypothesis, we find plants with both sequence types (apparent heterozygotes) as well as apparent homozygotes, and three individuals heterozygous for two different 7(-) sequences had no sign of sequences with the frame-altering 7(+) insertion. Finally, if the two haplotypes represent alleles, the haplotypes should segregate in the progeny of the apparent heterozygotes. Using primers specific for the two different haplotypes to score 11 progeny (family 99E-10) of such a plant (98E17-4), crossed with an Aly7(-) homozygote (98E17-6), 5 were apparent heterozygotes and 6 apparent homozygotes (-/-); i.e., we find the expected 1:1 ratio. We therefore conclude that the Aly7(+) sequences are probably null alleles of the same locus as the Aly7(-) sequences. In the further analyses below, the Aly7(+) sequences containing the frameshift are omitted.
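The 5:6 segregation among the 11 progeny can be checked against the 1:1 expectation with an exact binomial test or a one-degree-of-freedom chi-square statistic. The sketch below is only an illustration: the function names are hypothetical, and the choice of test is an assumption, since the text does not state how the ratio was evaluated.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

def chi_square_1to1(a, b):
    """Chi-square statistic (1 d.f.) for counts a and b against a 1:1 ratio."""
    expected = (a + b) / 2
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected

heterozygotes, homozygotes = 5, 6  # counts reported for family 99E-10
n = heterozygotes + homozygotes
print("exact two-sided P:", binomial_two_sided_p(heterozygotes, n))
print("chi-square (1 d.f.):", round(chi_square_1to1(heterozygotes, homozygotes), 3))
```

With 11 progeny, 5:6 is the closest possible split to 1:1, so neither statistic can reject allelic segregation. A sample this small also has little power to detect moderate segregation distortion.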

Aly10.1 sequences:
Four types of Aly10.1 alleles have been found, and they are shown in Fig 2. Relative to the type "A" sequences, "B1" sequences have a 227-bp deletion beginning 99 bp from the end of the S-domain and leaving only 7 bp of intron 1 and an in-frame stop codon 5 bp before the deletion. "B2" sequences have a further deletion of 25 bp starting at bp 654, which changes the reading frame, while "B3" sequences have a 223-bp deletion starting at bp 446 (which also changes the reading frame). The A allele type, which presumably encodes a functional protein, is the commonest (77% overall, out of 44 alleles sequenced) and is present in all four populations studied, whereas B1 and B2 alleles were seen only in the U.S. populations, and B3 only in the Scottish population. Apart from a single B1/B2 plant, all individuals had at least one allele of type A. No evidence of grouping by alleles was found in the gene tree analysis (Fig 1).

Tests for linkage between the A. lyrata self-incompatibility locus and the S-domain loci:
We tested for linkage of the Aly S-domain loci and the self-incompatibility locus, using families whose parents were heterozygous for one or more Aly loci (see MATERIALS AND METHODS). Linkage between Aly13 variants and the S-locus in both sibships, and in several other families, has already been reported (CHARLESWORTH et al. 2000; SCHIERUP et al. 2001). Since variants that do not segregate as alleles of course define different loci, we hoped that tests for linkage would help to show whether or not sequence variants are from a single locus, in addition to what can be deduced from similarities and differences between the sequences.

For the sibship 98E-15 (see CHARLESWORTH et al. 2000; SCHIERUP et al. 2001), direct sequencing reactions for Aly3, Aly8, Aly10.1, Aly10.2, and Aly9 from the parents of this family showed that the parent 97F13-5 was heterozygous for a number of sites in two loci (Table 3). Two restriction enzymes, AciI and BpuAI, were used to test for segregation of Aly3 alleles. Sequence variants at Aly3 behave as allelic, and this locus frequently recombined with the S-locus. In the same family, a length polymorphism at the end of the S-domain of the Aly10.1 locus segregates as expected if the variants are allelic (Table 3) and also recombines with the S-locus. The results for another sibship, 98G-23, confirmed this conclusion for Aly10.1.

For Aly8, however, some variants showed linkage to the S-locus. Table 4 shows another sibship in which both parents were double heterozygotes and in which the Aly8 variants again cosegregated with Aly13 sequences. Linkage of variants was detected in several other sibships by scoring Aly13 variants known from our previous work to show linkage to the S-locus (Aly13-4, -5, -9, -13, -16, and -22 from several different natural populations). These results for Aly8 are consistent with the fact that Ark3, its A. thaliana ortholog, is linked to the putative SRK ortholog of this species, which is a pseudogene (KUSABA et al. 2001). However, other very similar sequences, in addition to two different linked variants, were sometimes amplified with primers for Aly8, suggesting that Aly8 may represent two separate loci. The linkage of the second locus with the S-locus is not known, and it is even possible that haplotypes vary in the numbers of Aly8 genes linked to Aly13, similarly to the variable number of linked SCR genes in A. lyrata (KUSABA et al. 2001). Until physical maps of different haplotypes are available, this cannot be resolved.

To examine further the possibility of paralogous loci, we aligned all our Aly8 sequences to test whether they cluster into two sets with fixed differences between them. However, we found no evidence for any such haplotype structure in the complete sequence data set. Moreover, we could not identify variants that characterize the set of linked or the set of recombining sequences from several families. In other words, there are no sites in linkage disequilibrium that allow us to define site states characteristic of the two putative loci and that might allow us to distinguish the loci on the basis of their sequences. This is also clear in Fig 1, in which there is no evident split into two Aly8 types.

Finally, there is linkage disequilibrium between the Aly8 and the Aly13 loci. We studied a set of Aly8 sequences that cosegregate with various incompatibility alleles of independent origins (scored using restriction enzyme digestion of Aly13 PCR products to determine the Aly13 sequence types). Four Aly13 types were represented twice in the sample. There were a total of 38 nucleotide sites polymorphic among the Aly8 sequences in this sample. Between pairs of Aly8 sequences from the four pairs of haplotypes whose Aly13 sequences match, the mean proportion of these variable sites that differ was 7% (range 0–17%). Between Aly8s from haplotypes with different Aly13s, the differences were much greater (the mean proportion of difference in the 21 comparisons is 41%, and the range is 20–71%). The similarities between the most similar Aly8 sequences are underestimated, because some of the differences may be PCR errors (these sequences are from cloned PCR products, as this highly polymorphic locus cannot be directly sequenced).

Within- and between-population variability of the different Aly loci:
Within-population and total diversity: We estimated sequence variability of seven S-domain Aly loci, at least two of which are not closely linked to the S-locus (see above). Table 5 summarizes the results for each of the four populations studied, as well as for the total sample. The mean within-population synonymous site diversity values (πs) are mostly <1%. πa/πs values, based on the within-population diversity values (Table 5), are mostly rather high for the loci studied here and for the Aly13 locus (although the very high Aly9 value is based on few variable sites). The extremely high diversity for Aly8 (species-wide πs = 7.5% and πa > 1%) is partly, but not entirely, attributable to the fact that this sequence type could represent more than one locus, as just explained. This is discussed further below.
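For readers unfamiliar with the diversity measure, π is the mean proportion of sites that differ between pairs of sequences in a sample. The sketch below computes it over all aligned sites of a toy alignment. The sequences and the function name are hypothetical. Estimating πs and πa separately requires classifying sites as synonymous or nonsynonymous, which this sketch does not attempt.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are needed")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatches for every pair, then average per pair and per site.
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * length)

# Hypothetical toy alignment (3 sequences, 10 sites).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(round(nucleotide_diversity(toy), 4))
```

The same function applied to a pooled sample gives the total diversity, and applied to each population separately gives the within-population values.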

Between-population diversity: When all sites were used in the analysis, tests for spatial structure (HUDSON et al. 1992) detected significant population subdivision in A. lyrata for several loci. The exceptions are Aly10.1 (with low diversity), the highly polymorphic Aly13 locus, and Aly8 (Table 5). Aly8 shows some evidence of population structure for silent sites (a moderate Kst value, significant at the 5% level), but this is difficult to interpret, given our inability to assign different sequences to individual loci. Overall, the data therefore indicate isolation between the geographically distant populations studied. Clearly, however, the high variability at the Aly8 locus is not merely a consequence of population subdivision. This sequence type is highly variable within all four populations studied (see above).

Recombination and linkage disequilibrium:
To test whether the putative alleles of each of the loci identified from the sequences are truly allelic, we tested for recombination in the S-domain loci other than Aly13. Aly13 was excluded because its polymorphism levels are high and balancing selection is likely, which violates the assumptions of the analysis. Except for the Aly9 locus, which has little variability, both tests used suggest recombination (or some other form of exchange) in all the putative loci (Table 6). In the Aly8 sequences, many exchange events are detected using HUDSON and KAPLAN's (1985) estimator, even though these sequences probably come from at least two loci (see above). This suggests the possibility of gene conversion between the different loci.
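Estimators of the minimum number of exchange events of this kind rest on the four-gamete criterion: if all four two-site haplotypes occur at a pair of biallelic sites, at least one exchange (or a repeated mutation) must have occurred between them. The following is a simplified sketch in that spirit, assuming gap-free haplotype strings. The function names and toy data are hypothetical, and this is not the software used for Table 6.

```python
from itertools import combinations

def four_gamete_intervals(haplotypes):
    """Return (i, j) pairs of biallelic sites at which all four gametes occur."""
    n_sites = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    return [(i, j) for i, j in combinations(biallelic, 2)
            if len(set(zip(columns[i], columns[j]))) == 4]

def min_exchange_events(haplotypes):
    """Greedy count of four-gamete intervals whose interiors do not overlap,
    giving a lower bound on the number of exchange events."""
    count, last_end = 0, -1
    for start, end in sorted(four_gamete_intervals(haplotypes), key=lambda iv: iv[1]):
        if start >= last_end:  # intervals may share an endpoint site
            count += 1
            last_end = end
    return count

# Hypothetical toy haplotypes (4 sequences, 3 variable sites).
toy = ["AAA", "ATA", "TAT", "TTT"]
print(four_gamete_intervals(toy), min_exchange_events(toy))
```

Because the criterion cannot distinguish reciprocal crossing over from gene conversion, a high count in sequences drawn from more than one locus is compatible with interlocus conversion, as suggested above for Aly8.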


 
Table 6. Tests for recombination in the S-domains of the Aly loci

Tests for selection within A. lyrata:
Tajima's tests: Tajima's D statistic (TAJIMA 1989a) was calculated for each putative locus to check whether the sequences appear to be evolving neutrally. Since the sample sizes are too small to perform the test within populations, we pooled the sequences from the different populations. For most loci, the Tajima's D values were negative, but did not differ significantly from zero, although a significant negative value (P < 0.01) was found for Aly10.1. Only Aly3 gave a positive Tajima's D value, but this is nonsignificant and may be attributable to population subdivision, consistent with the high Kst for this locus. There is thus no evidence from this test for balancing selection acting at any of the loci. This includes the Aly8 sequences and the highly polymorphic Aly13 locus, which is likely to be the A. lyrata self-incompatibility locus.
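Tajima's D compares two estimates of diversity, the mean number of pairwise differences and the number of segregating sites scaled by a sample-size constant. An excess of rare variants gives negative values, and an excess of intermediate-frequency variants (as expected under balancing selection or population subdivision) gives positive values. A minimal sketch using the standard formulas follows. The function name and toy alignments are hypothetical, and no significance test is included.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    # S: number of segregating sites; k: mean number of pairwise differences.
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if S == 0:
        raise ValueError("D is undefined without segregating sites")
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical toy samples: all singletons vs. two deeply split haplotypes.
print(round(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]), 3))
print(tajimas_d(["AA", "AA", "TT", "TT"]) > 0)
```

Pooling samples from differentiated populations tends to shift D upward, which is why a positive value at a locus with high Kst need not indicate selection.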

Patterns of evolution in the S-domain sequences: Replacement site polymorphism in the putative A. lyrata self-incompatibility locus, Aly13, is significantly higher in the regions corresponding to the Brassica SRK and SLG hypervariable regions than in the rest of the sequence (SCHIERUP et al. 2001). This is not found for the further loci with S-domain sequences studied here. In these, including the Aly8 sequences, variability is not especially high in these regions.

However, the low polymorphism at most of these loci (see above) makes it difficult to detect differences in diversity among different sequence regions. We therefore also estimated nonsynonymous divergence among the A. lyrata paralogs and between the three loci with putative orthologs in A. thaliana (Aly8, -10.1, and -10.2; see Table 2) and those orthologs. Divergence for regions corresponding to the Brassica SLG and SRK and Aly13 hypervariable (HV) regions was compared with divergence elsewhere in the S-domain sequences (see MATERIALS AND METHODS for the positions assigned to these regions). Among A. lyrata paralogs, the regions that correspond to non-HV regions of the S-domain accumulate fewer substitutions per nonsynonymous site than do those corresponding to HV regions (the mean for HV is 60% higher than that for non-HV regions); synonymous divergence is saturated and comparisons are not informative for such sites. Of the 15 comparisons, 14 show HV nonsynonymous divergence greater than non-HV divergence (Fig 4, open and shaded bars, respectively; the difference is significant with P < 0.0005 by a paired sign test, although it must be realized that the tests are not independent). Thus either these regions are under lower selective constraint than the remainder of the sequence or directional selection has caused divergence specifically in these regions in the different loci. There is no such clear effect between the Aly loci and their A. thaliana orthologs (Fig 4, solid bars).
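The sign test quoted above can be reproduced from the binomial distribution: under the null hypothesis each comparison is equally likely to favor the HV or the non-HV region. The sketch below uses a hypothetical function name and computes both the one-sided and the two-sided tail. The text does not state which tail was used, and the calculation treats the 15 comparisons as independent, which they are not.

```python
from math import comb

def sign_test_p(successes, n):
    """Return (one_sided, two_sided) exact sign-test P-values for observing
    at least `successes` positive differences out of n, with p = 0.5."""
    upper = sum(comb(n, k) for k in range(successes, n + 1)) / 2**n
    lower = sum(comb(n, k) for k in range(0, successes + 1)) / 2**n
    return upper, min(1.0, 2 * min(upper, lower))

one_sided, two_sided = sign_test_p(14, 15)
print(f"one-sided P = {one_sided:.6f}, two-sided P = {two_sided:.6f}")
```

With 14 of 15 comparisons in the same direction, the one-sided tail is 16/32768, or about 0.00049, and the two-sided value is twice that.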



Figure 4. Comparisons of divergence in HV vs. other regions of the S-domain sequences. The figure shows nonsynonymous site divergence values between the different paralogous Aly S-domain sequences in A. lyrata, and divergence of these sequences from the putative orthologous loci in A. thaliana (shown as solid). The total lengths of sequences compared can be seen in Table 2, and the HV regions are as described in MATERIALS AND METHODS and include ~200 bp.

MCDONALD-KREITMAN (1991) tests did not detect evidence of directional selection driving divergence between paralogous loci specifically in the HV regions (although this test indicated a significant excess of nonsynonymous polymorphic sites in the non-HV region of Aly3, when compared with divergence from several of the other loci; the reason for this result is unknown, but there is no evidence for balancing selection as this locus does not have high diversity). The relative rate tests also give no indication of any overall deviation from equal rates of evolution of the orthologous pairs of genes since divergence (data not shown). Overall, we conclude that the fairly high Ka/Ks values in the S-domains and the high nonsynonymous divergence in HV regions are due largely to low selective constraints, rather than to diversifying selection.
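A McDonald-Kreitman comparison reduces to a 2 x 2 table of nonsynonymous and synonymous counts for fixed differences and for polymorphic sites. The sketch below evaluates such a table with Fisher's exact test. The counts are hypothetical, the function name is illustrative, and the use of Fisher's exact test is an assumption, since this section does not state which statistic was applied.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-9)))

# Hypothetical counts: rows = fixed, polymorphic; columns = nonsyn, syn.
fixed_nonsyn, fixed_syn = 12, 20
poly_nonsyn, poly_syn = 9, 6
p = fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)
print(f"two-sided P = {p:.4f}")
```

An excess of nonsynonymous fixed differences would point to directional selection, whereas an excess of nonsynonymous polymorphism, as reported for the non-HV region of Aly3, points the other way.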


*  DISCUSSION

The S-domain gene family:
The S-domain loci studied here clearly form part of an ancient gene family with members in all other angiosperms tested, including distantly related species such as maize (WALKER 1994; ANSALDI et al. 2000). The loci show different degrees of divergence, but most differ considerably. An exception is the two or more genes apparently required to account for the Aly8 results, which could be a recent duplication, but further work is needed to clarify the situation with respect to these sequences. Five of the eight A. lyrata loci identified have plausible orthologs in A. thaliana, which is highly self-compatible, and not surprisingly, therefore, its Aly13 ortholog is a pseudogene (KUSABA et al. 2001). Several of the genes are expressed both in nonflower tissues and in flowers in A. thaliana (we have no data on the expression of these genes in A. lyrata). Ark1 is expressed in leaves, flower buds, and stigmas (TOBIAS et al. 1992; DWYER et al. 1994; TOBIAS and NASRALLAH 1996; SUZUKI et al. 1997, 1999); Ark2 is expressed specifically during maturation of cotyledons, leaves, and sepals; and, on the basis of promoter expression, Ark3 is detected specifically in roots and in the root-hypocotyl transition zone (DWYER et al. 1994). The nonkinase AtS1 is expressed specifically in stigmatic papillary cells (ISOGAI et al. 1988; LALONDE et al. 1989).

There is no evidence for major birth and death of members of this gene family between A. lyrata and A. thaliana, since most loci can be identified in both species. However, duplication of the pollen-expressed SCR gene was found in one of the two haplotypes studied by KUSABA et al. (2001), and different numbers of copies of SRK are known in Brassica haplotypes (CABRILLAC et al. 1999). There may also be further members of this family that we have not studied. PCR amplifications occasionally yielded further S-domain sequences that either were very different from those described here or proved not to be closely linked to the S-locus (M. H. SCHIERUP, unpublished results). Furthermore, two of the Aly loci without detectable kinase domains (Aly3 and Aly7) have no evident A. thaliana orthologs, and, as discussed in the next section, two loci, Aly7 and Aly10.1, may be pseudogenes.

Pseudogenes:
Pseudogene S-domain genes have been found in the Brassica S-locus region (SUZUKI et al. 1999), and this possibility must be considered for two of our putative loci. The Aly7 sequences with the inserted nucleotide might suggest that these sequences merely represent a pseudogene, but our segregation evidence suggests that they are allelic (presumably null alleles) at the same locus as the Aly7(-) sequences. Moreover, both types of sequence appear to be quite old, since there are numerous fixed differences between them, and both include considerable diversity and are found in most, if not all, A. lyrata populations. This would argue against this locus being a pseudogene; 7(+) appears to have more singletons than 7(-), consistent with its being a derived sequence type. The significantly negative Tajima's D for the null alleles [7(+) sequences] suggests a recent increase in frequency of this allele and is conservative, given the other evidence for population subdivision. However, there is more diversity among 7(+) alleles than would be expected if this haplotype rose to high frequency by a recent selective sweep, which would imply low or zero diversity (HUDSON et al. 1994).

The Aly10.1 sequences containing deletions may also represent a pseudogene. Our diversity analysis included the different types of alleles of this locus, excluding the deletion regions, and nonsynonymous diversity was low, as was the πa/πs ratio (see Table 5), suggesting that loss of function occurred recently. It therefore seems most likely that the B1, B2, and B3 alleles (see Fig 2) are null alleles.

Other examples of polymorphic null alleles are known, sometimes at frequencies as high as those found for the Aly7 and Aly10.1 sequences (e.g., OXTOBY et al. 1991; GIBSON et al. 1992; CHARMLEY et al. 1993; MOMBAERTS 2001). One example is the deletion of a disease resistance gene in A. thaliana, for which there is evidence for balancing selection (STAHL et al. 1999), but there is no evidence for this at the Aly7 or Aly10.1 loci. Replacement polymorphism in the Aly7 sequences without the insertion is only slightly lower than that at silent sites. This high πa/πs ratio might suggest a locus that has lost function or is in an early stage of doing so and is evolving neutrally; but this is not certain, since similar or even higher πa/πs values are found for other Aly loci (see Table 5). Loss of function could also explain the higher diversity in Aly7 than in most of the other loci. We are unable to compare divergence of the nonfunctional and potentially functional Aly7 alleles, since no A. thaliana ortholog can be identified. The absence of an ortholog is, however, consistent with this gene being a nonfunctional duplicate in A. lyrata.

Levels and patterns of diversity:
The S-domain loci studied here have a range of nucleotide diversity values, including widely differing silent site diversity. Lack of evidence for balancing selection and only moderate diversity levels are also reported for the Brassica SLR1, SLA, and SLB loci, which are not linked to the incompatibility locus (although sample sizes are very small; HINATA et al. 1995; SAKAMOTO et al. 1998; WATANABE et al. 1998; LUU et al. 2001). For A. lyrata, the diversity estimate for 1.6 kb of the alcohol dehydrogenase gene (Adh) yielded a mean within-population π-value for all nucleotide sites of 0.1%; the total diversity, including three different populations, was 0.38% (SAVOLAINEN et al. 2000). Higher values are found at other loci, particularly in the non-U.S. subspecies petraea (WRIGHT et al. 2002). All except one of our S-domain loci also have diversity values higher than that of Adh, but there is no reason to suspect balancing selection. Ratios of nonsynonymous-to-synonymous site diversity within species or divergence between species for plant nuclear genes generally range from 0.1 to 0.2 (LI 1997; LIU 1998; WRIGHT et al. 2002). Published values for Brassica S-domain genes are, however, much higher (HINATA et al. 1995), and the same is true in our data (see Table 5; for divergence between paralogs within A. lyrata, the mean value is 0.55 for the sequence regions corresponding to the SRK HV regions and 0.24 for the rest of the S-domain sequence). This suggests that selective constraints are low in the S-domain, especially in the parts that are hypervariable in SRK, which would accord with the ability of the S-domain to generate highly diverse SRK alleles.

The difference between the reference loci studied here and the otherwise similar S-domain Aly13 sequences (the putative A. lyrata S-locus) therefore supports the view that the Aly13 silent and amino acid diversity is unusually high due to the maintenance of the polymorphism of incompatibility alleles. Moreover, variation at the Aly13 locus is similar in all populations, and Kst does not differ significantly from zero. This is as expected for loci experiencing balancing selection (SCHIERUP et al. 2000) and is similar to what has been found in the fungus Schizophyllum commune, where no population structure was detectable for mating-type alleles (RAPER et al. 1958) in contrast to strong structure for polymorphic allozymes (JAMES et al. 1999). The same is true for Aly8, although analysis of the synonymous diversity suggests some differentiation between populations. In contrast, several of the other Aly loci are significantly differentiated among the populations studied here, indicating some degree of isolation between these geographically distant populations; the same is observed for several other non-S-domain loci (WRIGHT et al. 2002). Since subdivision obscures evidence of selection when samples are pooled (SCHIERUP et al. 2000), the absenc