Genetics, Vol. 155, 685-698, June 2000, Copyright © 2000

Effect of Inversion Polymorphism on the Neutral Nucleotide Variability of Linked Chromosomal Regions in Drosophila

Arcadio Navarro (1, a), Antonio Barbadilla (a), and Alfredo Ruiz (a)
(a) Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain

Corresponding author: Arcadio Navarro, Institute of Cell, Animal and Population Biology, University of Edinburgh, Ashworth Lab, King's Bldg's., W. Mains Rd., Edinburgh EH9 3JT, United Kingdom. E-mail: arcadi@holyrood.ed.ac.uk

Communicating editor: A. G. CLARK


*  ABSTRACT

Recombination is a main factor determining nucleotide variability in different regions of the genome. Chromosomal inversions, which are ubiquitous in the genus Drosophila, are known to reduce and redistribute recombination, and thus their specific effect on nucleotide variation may be of major importance as an explanatory factor for levels of DNA variation. Here, we use the coalescent approach to study this effect. First, we develop analytical expressions to predict nucleotide variability in old inversion polymorphisms that have reached mutation-drift-flux equilibrium. The effects on nucleotide variability of a new arrangement appearing in the population and reaching a stable polymorphism are then studied by computer simulation. We show that inversions modulate nucleotide variability in a complex way. The establishment of an inversion polymorphism involves a partial selective sweep that eliminates part of the variability in the population. This is followed by a slow convergence to the equilibrium values. During this convergence, regions close to the breakpoints exhibit much lower variability than central regions. However, at equilibrium, regions close to the breakpoints have higher levels of variability and differentiation between arrangements than regions in the middle of the inverted segment. The implications of these findings for overall variability levels during the evolution of Drosophila species are discussed.


CHROMOSOMAL inversion polymorphisms have been a cornerstone in the study of evolution throughout the history of population genetics. Since the establishment of the modern synthesis, inversions have been a privileged system for studying such diverse subjects as phylogenies, geographical clines, temporal cycles, and meiotic drive and, of course, for seeking evidence of natural selection (see KRIMBAS and POWELL 1992 for a review). In fact, the first studies on the role of natural selection in the maintenance of genetic polymorphisms, either in nature or in experimental populations, used inversions because they could be detected by means of simple cytological techniques and their frequency changes could easily be followed through the generations (DOBZHANSKY 1970; LEWONTIN 1981). The onset of electrophoresis and the allozyme era was followed by an intense search for linkage disequilibria between allozyme loci associated with inversion polymorphisms and between allozyme loci and the inversions themselves, because they were thought to be generated by epistatic selection (PRAKASH and LEWONTIN 1968; ZAPATA and ALVAREZ 1987, 1992, 1993; KRIMBAS and POWELL 1992). Now, in the DNA era, inversions may still be useful places to look for selection (KREITMAN and WAYNE 1994; DEPAULIS et al. 1999), because, by reducing recombination, inversions can act as amplifiers of the effects on nucleotide polymorphism of selective phenomena such as hitchhiking with favorable mutations (KAPLAN et al. 1989; AQUADRO and BEGUN 1993; AQUADRO et al. 1994) or deleterious background selection (CHARLESWORTH et al. 1993; CHARLESWORTH 1994; HUDSON 1994; HUDSON and KAPLAN 1995).

Recombination affects levels of nucleotide polymorphism. In Drosophila, it accounts for one-quarter of the variance among genes in nucleotide diversity (see MORIYAMA and POWELL 1996 for a review) and an increasing amount of evidence of the same trend is being gathered in other organisms, such as Mus domesticus (NACHMAN 1997), several species of the genus Lycopersicon (STEPHAN and LANGLEY 1998), and humans (NACHMAN et al. 1998). It is also well established that recombination rates are strongly influenced by inversions (STURTEVANT and BEADLE 1936), the main reason being that, in the heterokaryotypes, crossing-over events within the inversion loop give rise to nonfunctional or nonviable aneuploid meiotic products, and recombination results only from multiple crossovers and gene conversion. Also, inversions are ubiquitous in Drosophila: more than three-quarters of all the species in the genus are polymorphic for paracentric inversions (SPERLICH and PFRIEM 1986; KRIMBAS and POWELL 1992). It is, therefore, clear that inversions can strongly influence nucleotide polymorphism levels. This influence can take several forms. First, inversions reduce and redistribute recombination in heterokaryotypes (NAVARRO and RUIZ 1997; NAVARRO et al. 1997). Hence, the dynamics of selective sweeps and background selection in chromosomes segregating for different arrangements may be different (and their effects conceivably larger) than in chromosomes without inversions. Second, recombination is not uniformly distributed along chromosomes (LINDSLEY and SANDLER 1977; ASHBURNER 1989; TRUE et al. 1996) and when inversions change the position of genes, they also change their recombinational context in homokaryotypes. Third, because inversion polymorphisms are maintained by balancing selection (DOBZHANSKY 1970), they will increase the average life expectancy of nucleotide variability linked to them (STROBECK 1983; HUDSON and KAPLAN 1988; KAPLAN et al. 1988). Finally, the latter effect can be just the opposite with new inversions. A recently appeared inversion increasing its frequency may produce a selective sweep that can potentially eliminate variability in large segments of the chromosome. All these different factors may have powerful and contradictory effects on variability. The aims of this work are to obtain theoretical predictions concerning the amount and pattern of nucleotide variability associated with inversion polymorphisms and to shed some light on the overall effect of inversions on the level of nucleotide variability within species.

The basic tools to carry out such studies have been developed in recent years using the coalescent approach. Theoretical and simulation studies concerning DNA variability under balancing selection (HUDSON and KAPLAN 1988; KAPLAN et al. 1988; HEY 1991; NORDBORG 1997) or under subdivision and migration (SLATKIN 1987; STROBECK 1987; TAJIMA 1989a, 1993; NOTOHARA 1990; NORDBORG 1997) are providing a detailed picture of the properties of the amount of DNA polymorphism in a population. Although the analogy between inversion systems and balancing selection or subdivided populations is clear, because all of them produce a structured population, the coalescent approach has never been explicitly applied to inversions. One of the causes of this vacuum may be found in the scarcity and somewhat contradictory nature of empirical information about the degree of exchange of genetic information between arrangements along the inverted chromosome (KRIMBAS and POWELL 1992; NAVARRO et al. 1997). Also, the lack of detailed theoretical predictions of the effect of inversions on recombination makes it difficult to obtain realistic recombination values for every position along the inverted chromosomal region. A recent theoretical study by NAVARRO et al. (1997) provides such results. Given the physical and genetic lengths of an inversion, theoretical predictions of recombination and gene flux caused by crossing over and gene conversion between arrangements can be obtained for every site along the chromosome for heterokaryotypes.

The work presented here deals with the effect on neutral nucleotide variability of both new and old inversion polymorphisms. That is, we exclusively consider the effect on variability of inversions themselves. We focus on the two most common measures of DNA variability: the number of segregating sites and the average number of pairwise differences in a sample of DNA sequences (WATTERSON 1975; TAJIMA 1983, 1993). We study these two variability measures in a population of DNA sequences linked to a chromosome segregating for two arrangements. First, we analytically derive equations for the mutation-drift-flux equilibrium case, in which the inversion polymorphism is precisely balanced (i.e., inversion frequencies do not change from one generation to another). It is assumed that the polymorphism was established long enough ago for the DNA variability in the population to have reached mutation-drift equilibrium. Second, we explore by means of computer simulation cases in which DNA variability is not at equilibrium because the inversion polymorphism was recently established. Changing the age of the polymorphism in the simulations allows us to study the approach to the equilibrium values of DNA variability previously derived using analytical methods.


*  MODELS AND METHODS

We study the properties of a sample of n DNA sequences at a locus linked to a chromosome segregating for two arrangements, Standard (St) and Inversion (In), at frequencies p and q, respectively. The two arrangements differ by a single paracentric inversion and St is the older one. We denote by N the population size and by Φ the per generation probability of gene exchange between arrangements, i.e., the probability that a DNA sequence recombines with the other arrangement, which happens only in heterokaryotypes and is referred to as the probability of gene flux (NAVARRO et al. 1997). It is assumed that karyotype frequencies are maintained approximately at Hardy-Weinberg equilibrium. Accordingly, the probability that a DNA sequence linked to a St chromosome descends from a sequence linked to an In chromosome in the previous generation is qΦ; the converse probability for an In chromosome is pΦ. Following the infinite-sites model (KIMURA 1969), we assume that the DNA sequence is so large that every new mutation takes place in a previously unmutated site. The mutants are selectively neutral and µ is the mutation rate per sequence and per generation.

In the development of our analytical results, we make use of the analogy between inversion polymorphisms and subdivided populations. Thus, although results can be obtained in several other ways (see, for example, HUDSON and KAPLAN 1988; HUDSON 1990; or NORDBORG 1997), we follow the approach of TAJIMA (1989a, 1993) to obtain equations giving the number of segregating sites in a sample of n alleles taken at random from a population that has reached mutation-drift-flux equilibrium. For the simulation studies, we use the standard principles of the coalescent process to construct the genealogical tree of a sample and the associated time for each branch (HUDSON 1990). The analogy between inversion polymorphisms and subdivided populations is also used to adapt the general coalescent process to a population segregating for two chromosomal arrangements.

We follow a method analogous to the one described in STROBECK (1987) to construct phylogenetic trees for samples taken from such a population. For every sample, the simulation starts by generating a maximum of four random numbers, each drawn from the appropriate exponential distribution and each representing the time until one of the four possible events affecting the sample (no simultaneous events are allowed): the time until the most recent flux event (for each arrangement with one or more alleles in the sample, tΦ(St → In) and tΦ(In → St)) and the time until the most recent coalescence event (within each arrangement with two or more alleles in the sample, tC(St) and tC(In)). The smallest of these four times is chosen and the sample is modified by the creation of the corresponding branches and nodes. The chosen time is associated with the newly created branches and the process starts over. The simulation stops when the most recent common ancestor of the sample is reached.
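The loop just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used in the paper: the function name `simulate_tree_length` is hypothetical, the per-generation rates are the continuous-time analogues implied by the model (2Np St chromosomes, 2Nq In chromosomes, flux probabilities qΦ and pΦ), and only the total branch length of the genealogy is tracked instead of the full tree.

```python
import random


def simulate_tree_length(i, j, N, p, phi, rng):
    """Total branch length (in generations) of one simulated genealogy.

    i, j : sampled sequences linked to St and In chromosomes
    N    : diploid population size; p: frequency of St; phi: gene flux
    """
    q = 1.0 - p
    total = 0.0
    while i + j > 1:
        # Per-generation rates of the four possible events.
        rates = {
            "coal_St": i * (i - 1) / (4.0 * N * p),
            "coal_In": j * (j - 1) / (4.0 * N * q),
            "flux_St_to_In": i * q * phi,
            "flux_In_to_St": j * p * phi,
        }
        # One exponential waiting time for each event that can occur.
        times = {e: rng.expovariate(r) for e, r in rates.items() if r > 0}
        if not times:
            raise ValueError("no event possible (phi must be > 0)")
        event = min(times, key=times.get)   # the earliest event wins
        total += (i + j) * times[event]     # every lineage ages by that time
        if event == "coal_St":
            i -= 1
        elif event == "coal_In":
            j -= 1
        elif event == "flux_St_to_In":
            i, j = i - 1, j + 1
        else:
            i, j = i + 1, j - 1
    return total
```

Under the infinite-sites model, multiplying the average of many such branch lengths by µ gives an estimate of the expected number of segregating sites for the chosen starting sample (i, j).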

To study nucleotide variability in a new arrangement appearing in the population and reaching a stable polymorphism, we use the simulation method outlined in BRAVERMAN et al. (1995) after adapting it to overdominant, instead of directional, selection. Let the three karyotypes St/St, St/In, and In/In have fitnesses 1 - s1, 1, and 1 - s2, respectively. We start by constructing the tree in the way described in the previous paragraph, with the frequencies of the arrangements in the population being p̂ = s2/(s1 + s2) and q̂ = s1/(s1 + s2). At a given time, we make the simulation enter a selective phase as described in BRAVERMAN et al. (1995). At the beginning of this phase the frequency of arrangement In is changed to q = q̂ - ε, where ε = 1/(2N) (BRAVERMAN et al. 1995), that is, we remove one of the In gametes from the population. Then, the deterministic equation for Δq under overdominant selection is used to change allele frequencies every generation. Because the simulation works backward in time, by removing a gamete we allow the frequency equilibrium to be broken and q to decrease deterministically. The process continues until q ≤ ε, i.e., until only one In gamete is left. At this point, this In gamete is mutated to St and the selective phase is exited. The standard coalescent process starts over again in a population formed exclusively by St chromosomes.
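As an illustration of the deterministic frequency change during the selective phase, the sketch below builds the sequence of In frequencies that a backward-in-time simulation would visit. It assumes the standard one-locus recursion for the fitnesses given above, Δq = pq(s1p - s2q)/w̄ with w̄ = 1 - s1p² - s2q², and the equilibrium frequency s1/(s1 + s2). Iterating forward from ε = 1/(2N) and reversing the list is one simple way to get the backward trajectory, and the function names are hypothetical.

```python
def delta_q(q, s1, s2):
    """One-generation change in the frequency of In under overdominance.

    Fitnesses: St/St = 1 - s1, St/In = 1, In/In = 1 - s2.
    """
    p = 1.0 - q
    mean_fitness = 1.0 - s1 * p * p - s2 * q * q
    return p * q * (s1 * p - s2 * q) / mean_fitness


def backward_trajectory(N, s1, s2, max_generations=10_000_000):
    """Frequencies of In during the selective phase, most recent first."""
    eps = 1.0 / (2 * N)            # a single In gamete
    q_hat = s1 / (s1 + s2)         # equilibrium frequency (assumed form)
    q = eps
    forward = [q]
    # Iterate forward in time until q is within one gamete of equilibrium.
    while q < q_hat - eps and len(forward) < max_generations:
        q += delta_q(q, s1, s2)
        forward.append(q)
    return forward[::-1]           # reverse: the simulation runs backward
```

The length of the returned list is the duration of the selective phase in generations for the chosen N, s1, and s2.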

The events of coalescence or gene flux between arrangements during the selective phase are simulated in the same way as in BRAVERMAN et al. (1995). Time advances on a per generation basis. For each generation the probabilities of the four possible events are computed. Subtracting the sum of the four probabilities from one gives us the probability of no events taking place during that generation. The probabilities of zero events are multiplied every generation until the product is less than a random number drawn from a uniform distribution between zero and one. When that happens, one of the four events is chosen, taking into account the probability of that event in the current generation. Of course, the simulation is also exited if the most recent common ancestor of the sample is reached during the selective phase. The main difference between our algorithm and that used by BRAVERMAN et al. (1995), which considered directional selection, is that the differential equation describing the change of allele frequencies with time under overdominant selection lacks an analytical solution (NEI 1987; NAGYLAKI 1992) and, therefore, we compute the transition probabilities and change q on a per generation basis.
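The per generation draw described above can be sketched as follows. The callback `event_probs` is a hypothetical stand-in for whatever routine returns the four event probabilities at the arrangement frequencies of a given generation; the function name and the `end_generation` argument are likewise assumptions made for the example.

```python
import random


def next_event(event_probs, start_generation, end_generation, rng):
    """Walk back one generation at a time until an event occurs.

    event_probs(g) returns {event_name: probability} for generation g.
    Returns (generation, event_name), or (end_generation, None) if the
    selective phase ends before any event happens.
    """
    u = 1.0 - rng.random()      # uniform on (0, 1]
    no_event = 1.0              # running product of "no event" probabilities
    for g in range(start_generation, end_generation):
        probs = event_probs(g)
        no_event *= 1.0 - sum(probs.values())
        if no_event < u:
            # Choose among the events in proportion to their
            # probabilities in the current generation.
            names = list(probs)
            weights = [probs[name] for name in names]
            return g, rng.choices(names, weights=weights)[0]
    return end_generation, None
```

Because the probabilities are recomputed in every generation, the same loop works whether or not the frequency trajectory has a closed form.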

We illustrate and discuss our results using parameter values from Drosophila because most of the evidence on inversions and on nucleotide variability comes from this genus. The mutation rate per nucleotide per generation of Drosophila melanogaster ranges between 10⁻⁸ and 10⁻⁹ (POWELL 1997). In the same species, the population size has been estimated to be of the order of 10⁶ (KREITMAN 1983; POWELL 1997), the average θ (= 4Nµ) per nucleotide being ~0.005 (HUDSON 1993). To avoid the use of many decimals we will focus all through this article on alleles of 100 nucleotides and, therefore, on a θ value of 0.5.
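The arithmetic behind these parameter values can be written out explicitly. The per-allele mutation rate below is not stated in this paragraph; it is inferred from θ = 4Nµ with N = 10⁶ and agrees with the value of µ quoted in the figure legends.

```latex
\begin{align}
\theta_{\text{allele}} &= 100 \times \theta_{\text{nucleotide}} = 100 \times 0.005 = 0.5,\\
\mu_{\text{allele}} &= \frac{\theta_{\text{allele}}}{4N} = \frac{0.5}{4 \times 10^{6}} = 1.25 \times 10^{-7},\\
\mu_{\text{nucleotide}} &= \frac{\mu_{\text{allele}}}{100} = 1.25 \times 10^{-9}.
\end{align}
```

The implied per-nucleotide rate falls inside the quoted range of 10⁻⁹ to 10⁻⁸.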

Inversions affect our model by modifying gene flux rates all along the inverted segment. According to NAVARRO et al. (1997) and NAVARRO and RUIZ (1997), the gene flux per nucleotide and per generation between arrangements will range between Φ = 10⁻² in the center of a large inversion and Φ = 10⁻⁸ in regions close to the breakpoints of a short inversion. This predicted range includes most of the empirically estimated gene flux values available in the literature: 10⁻⁴ in the central region of inversion In(3L)Payne of D. melanogaster (PAYNE 1924); 10⁻⁵ in the central region of inversion In(3R)P18 of D. melanogaster (CHOVNICK 1973); and 10⁻⁷ near the breakpoints of O3+4/OSt heterokaryotypes in D. subobscura (ROZAS and AGUADE 1994). Details on how to obtain Φ values for any site along the chromosome can be found in NAVARRO et al. (1997).


*  ANALYTICAL RESULTS

We use the coalescent approach to study the variability of n DNA sequences, among which i sequences are randomly chosen from St chromosomes and j (= n - i) sequences from In chromosomes. Let Q(i, j) represent the state of the sample. In terms of the genealogical relationships of the sequences in the sample, that is, going back in the past, there are four possible adjacent states into which Q(i, j) can move in a single generation, namely, Q(i - 1, j), Q(i, j - 1), Q(i - 1, j + 1), and Q(i + 1, j - 1). The first two changes represent common ancestor events and the latter two represent gene flux events. The probabilities of these events are (derived following HUDSON 1983 and TAJIMA 1989a):

(1a)


(1b)


(1c)


(1d)

It follows from these equations that any Q(i, j) will finally converge to Q(1, 0) or Q(0, 1) unless Φ = 0.
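As a sketch of what the four transition probabilities look like, one form consistent with the model stated in MODELS AND METHODS is given below, in the order Q(i - 1, j), Q(i, j - 1), Q(i - 1, j + 1), Q(i + 1, j - 1). It assumes 2Np St chromosomes and 2Nq In chromosomes, at most one event per generation, and the flux probabilities qΦ and pΦ per sequence. It is a reconstruction from those definitions, not a transcription of the published equations.

```latex
\begin{align}
P[Q(i,j) \to Q(i-1,j)]   &= \frac{i(i-1)}{4Np},\\
P[Q(i,j) \to Q(i,j-1)]   &= \frac{j(j-1)}{4Nq},\\
P[Q(i,j) \to Q(i-1,j+1)] &= i\,q\,\Phi,\\
P[Q(i,j) \to Q(i+1,j-1)] &= j\,p\,\Phi.
\end{align}
```

The first two terms are the number of pairs within an arrangement, i(i - 1)/2 or j(j - 1)/2, times the probability 1/(2Np) or 1/(2Nq) that a given pair shares a parent chromosome.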

Let S(i, j) be the expected number of segregating sites in a sample in state Q(i, j) taken at random from a population at mutation-drift-flux equilibrium. Given the infinite-sites model, the number of segregating sites is the number of mutations that take place while Q(i, j) is converging to Q(1, 0) or Q(0, 1). To calculate this number we must first consider the sojourn time of the sample, i.e., the expected number of generations during which Q(i, j) does not change. The probability that Q(i, j) changes to one of the four adjacent states in a single generation is

(2)

Therefore, the sojourn time until the first change is geometrically distributed with mean 1/P(i, j) generations. While there are n alleles, nµ mutations take place every generation; hence, C(i, j), the expected number of mutations taking place while Q(i, j) remains the same, is

(3)

where F = 4NΦpq and θ = 4Nµ.
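Under the same assumptions (standard coalescence probabilities within each arrangement and flux probabilities qΦ and pΦ), the total probability of change and the expected number of mutations during the sojourn can be sketched as follows. The second line multiplies numerator and denominator by 4N and uses F = 4NΦpq; the exact algebraic arrangement in the published equations may differ.

```latex
\begin{align}
P(i,j) &= \frac{i(i-1)}{4Np} + \frac{j(j-1)}{4Nq} + (iq + jp)\,\Phi,\\
C(i,j) &= \frac{n\mu}{P(i,j)}
        = \frac{n\,\theta}{\dfrac{i\,(i-1+F)}{p} + \dfrac{j\,(j-1+F)}{q}},
\qquad n = i + j .
\end{align}
```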

Given that Q(i, j) changes, the conditional probabilities that it changes to each one of the four adjacent states are easily obtained from Equations 1a–1d. With those probabilities and (3) we can readily obtain an iterative expression for S(i, j),

(4)

where

and

Of course, S(1, 0) = S(0, 1) = 0 by definition. Equation 4 can also be obtained from previous results on balancing selection (HUDSON and KAPLAN 1988; KAPLAN et al. 1988; HEY 1991; NORDBORG 1997).
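A recursion of the kind described can be sketched by conditioning on the first change of state: the expected number of segregating sites is the expected number of mutations during the sojourn plus the expectation in whichever adjacent state is reached. The symbol D(i, j) is introduced here only for compactness and is not notation from the paper; the expression is derived from the definitions of F and θ, under the coalescence and flux probabilities assumed above.

```latex
\begin{align}
D(i,j) &= \frac{i\,(i-1+F)}{p} + \frac{j\,(j-1+F)}{q},\\
S(i,j) &= \frac{1}{D(i,j)}\Big[\, n\theta
          + \frac{i(i-1)}{p}\,S(i-1,j)
          + \frac{j(j-1)}{q}\,S(i,j-1) \\
       &\qquad\qquad
          + \frac{iF}{p}\,S(i-1,j+1)
          + \frac{jF}{q}\,S(i+1,j-1) \Big].
\end{align}
```

Because flux moves the sample between states with the same n, the n + 1 values S(i, n - i) have to be solved jointly for each n, using the values already obtained for n - 1.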

From (4), S(i, j) can be computed for every value of i and j. For instance, when n = 2,

(5a)


(5b)


(5c)

And solving these equations, we get

(6a)


(6b)


(6c)
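For n = 2 the system can be solved by hand. Assuming the recursion that follows from conditioning on the first event, with F = 4NΦpq and θ = 4Nµ, the three coupled equations and their solution take the form below. The order S(2, 0), S(1, 1), S(0, 2) is assumed to match (5a)–(5c) and (6a)–(6c).

```latex
\begin{align}
S(2,0) &= \frac{\theta p + F\,S(1,1)}{1+F}, &
S(1,1) &= \frac{2\theta pq}{F} + p\,S(2,0) + q\,S(0,2), &
S(0,2) &= \frac{\theta q + F\,S(1,1)}{1+F},
\end{align}
\begin{align}
S(2,0) &= \theta\,\frac{p\,(1+2q) + F}{1+F}, &
S(1,1) &= \theta\left(1 + \frac{2pq}{F}\right) = \theta\left(1 + \frac{1}{2N\Phi}\right), &
S(0,2) &= \theta\,\frac{q\,(1+2p) + F}{1+F}.
\end{align}
```

In this form S(1, 1) does not depend on the arrangement frequencies, and pS(2, 0) + qS(0, 2) = θ for any F > 0.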

To ascertain the effect of inversions on nucleotide variability in the population as a whole, we must consider the expected number of segregating sites in a sample of n sequences taken at random from the entire population, S(n). Because we are assuming that chromosome arrangements are in Hardy-Weinberg equilibrium frequencies, S(n) can be easily obtained from

(7)
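The following sketch computes S(i, j) numerically for any sample size and then averages over arrangement classes. It assumes the recursion obtained by conditioning on the first event (scaled rates i(i - 1)/p, j(j - 1)/q, iF/p, and jF/q) and binomial sampling of arrangements with probabilities p and q for the whole-population value. The function names are hypothetical, and F must be greater than zero.

```python
from math import comb


def expected_segregating_sites(n, theta, F, p):
    """Return {(i, j): S(i, j)} for all samples with 1 <= i + j <= n."""
    q = 1.0 - p
    S = {(1, 0): 0.0, (0, 1): 0.0}
    for m in range(2, n + 1):
        sub, diag, sup, rhs = [], [], [], []
        for i in range(m + 1):
            j = m - i
            coal_st, coal_in = i * (i - 1) / p, j * (j - 1) / q
            flux_st, flux_in = i * F / p, j * F / q
            diag.append(coal_st + coal_in + flux_st + flux_in)
            sub.append(-flux_st)    # coefficient of S(i - 1, j + 1)
            sup.append(-flux_in)    # coefficient of S(i + 1, j - 1)
            known = m * theta       # mutations during the sojourn
            if i >= 2:
                known += coal_st * S[(i - 1, j)]
            if j >= 2:
                known += coal_in * S[(i, j - 1)]
            rhs.append(known)
        # The system is tridiagonal in i: forward elimination ...
        for i in range(1, m + 1):
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        # ... followed by back substitution.
        x = [0.0] * (m + 1)
        x[m] = rhs[m] / diag[m]
        for i in range(m - 1, -1, -1):
            x[i] = (rhs[i] - sup[i] * x[i + 1]) / diag[i]
        for i in range(m + 1):
            S[(i, m - i)] = x[i]
    return S


def population_segregating_sites(n, theta, F, p):
    """S(n): binomial average of S(i, n - i) over arrangement classes."""
    S = expected_segregating_sites(n, theta, F, p)
    q = 1.0 - p
    return sum(comb(n, i) * p**i * q**(n - i) * S[(i, n - i)]
               for i in range(n + 1))
```

Calling `population_segregating_sites(2, theta, F, p)` gives the expected number of pairwise differences for two sequences drawn at random from the whole population.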

In addition, Equations 6a–6c (and Equation 7 with n = 2) give us the average number of pairwise differences between the alleles in our sample, E(k), which is equal to the expected number of segregating sites in a sample of two alleles (HUDSON 1990; TAJIMA 1993). E(k) has an advantage over S: under a neutral model with no recombination, it gives a direct estimate of θ (TAJIMA 1993); that is, it is independent of n. On the other hand, when the neutral theory is correct and there is no population subdivision, we can obtain an estimate of θ from S in the following way (WATTERSON 1975):

(8)
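The estimator referred to here is the standard one of WATTERSON (1975), which divides the number of segregating sites by the sum of reciprocals up to n - 1. It is written out below in its usual textbook form; the hat notation is an assumption of this sketch.

```latex
\begin{equation}
\hat{\theta} = \frac{S}{\displaystyle\sum_{i=1}^{n-1} \frac{1}{i}} .
\end{equation}
```

For n = 2 the denominator equals 1, so the estimate coincides with the number of differences between the two sequences.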

Both variability measures, E(k) and θ̂ (the estimate of θ obtained from S through Equation 8), combine in Tajima's method for testing the neutral mutation hypothesis (TAJIMA 1989b) and are quite useful for studying the effect of inversions on DNA variability. Equations for the variance of k(n) under some specific conditions can be found in the Appendix.

The results presented so far allow us to study the effect on variability of a precisely balanced inversion polymorphism that reached mutation-drift-flux equilibrium a long time ago. Table 1 gives the values of θ̂ and E(k) together with its standard deviation (when formulas are available; see Appendix) under different arrangement frequencies and under different sample sizes for samples taken at random either from the entire population or from a given arrangement class. As expected, the behavior of θ̂ is dependent on the sample size, mainly for low flux rates, while E(k) does not depend on n. Of course, the standard deviation of k decreases with increasing sample size, although it remains exceedingly large, as always happens with pairwise measures, mainly when no intragenic recombination is allowed (TAJIMA 1983; HUDSON 1990).


 

 
Table 1. Theoretical expectations for mutation-flux-drift equilibrium

Flux rates affect the variability in the population as a whole, which increases as flux decreases. Flux rates of 10⁻² or higher make E(k) and θ̂ equal to their values in a population without inversions (in our case θ = 0.5). On the other hand, flux rates <10⁻⁴ produce a rapid departure of DNA variability from its state in a population without inversions. Given that heterokaryotypes always have large regions with gene flux rates <10⁻⁴, for instance, regions around inversion breakpoints (NAVARRO et al. 1997), it follows that inversions will have a strong effect on nucleotide variability.

The frequency of the chromosome arrangements in the population also has a remarkable effect on variability. The maximum values of E(k) and θ̂ for the whole population are reached with intermediate inversion frequencies (Table 1). Table 1 shows further effects of inversions. We can see that, with intermediate frequencies, variability is scarcely reduced within each arrangement when compared to variability in a population without inversions (in which θ = 0.5). On the other hand, if frequencies are not intermediate, variability in the lowest frequency arrangement is notably reduced. This reduction is caused by the smaller number of gametes carrying that arrangement, which boosts the loss of genetic variability by drift. In contrast, the variability of the most frequent arrangement increases. This variability increase tends to balance the low variability in the less frequent arrangement because of gene flux acting as conservative migration, which agrees with the invariant property of structured populations noted by MARUYAMA (1971) and NAGYLAKI (1982). Also, it is worth noting that variability in the low frequency arrangement, though reduced, is still higher than that expected in an isolated population of the same size.

Both variability increases can be explained by the same mechanism. With low flux and a lot of drift (mainly in the low frequency arrangement), the two kinds of chromosomes are highly differentiated and, therefore, almost every allele coming by recombination from the other arrangement will be absent in the recipient arrangement. These new alleles add new variability at a higher rate than mutation. This effect overpowers drift and increases with decreasing flux. It only disappears with gene flux rates <<10⁻⁸ (i.e., very close to zero and smaller than the mutation rate we assume). In that case E(k) and θ̂, both in St and In, converge to their values in neutral populations of sizes Np and Nq, respectively.

Differentiation between arrangements can be measured by means of the number of pairwise differences between an In allele and a St allele (Equation 6b). As we can see in Table 2, equilibrium pairwise differences between arrangements do not depend on inversion frequencies and standard deviations are practically unaffected by them, which agrees with previous results (NORDBORG 1997).


 

 
Table 2. Theoretical expectations for mutation-flux-drift equilibrium


*  SIMULATION RESULTS

The simulation program described in MODELS AND METHODS allows us to obtain E(k) and θ̂ for different values of the selection coefficients and the age of the inversion polymorphism. Using this tool, we are able to study the approach to mutation-drift-flux equilibrium for nucleotide variability in any given inversion polymorphism. Taking into account the size of the standard deviations we are dealing with (Table 1 and Table 2), all the values given in this section are based on 100,000 runs of our program.

In Table 3 and Table 4 we can see the values of E(k) and θ̂ 100 generations after the stabilization of three different polymorphisms (with the frequencies of the new arrangement, In, being 0.5, 0.8, and 0.1), that is, 100 generations after the end of the selective phase that produced the stable inversion polymorphisms we are studying. During this short amount of time, no new variability has appeared in the population and gene flux has had no time to homogenize the variability between arrangements. Thus, the footprint left by the origin of the new inversion is still visible. As expected, it is quite similar to the trail generated by a selective sweep (BRAVERMAN et al. 1995). However, given that we are considering overdominant selection, our sweep is partial; i.e., it stops before the inversion reaches fixation and, therefore, it does not eliminate all the genetic variability in the population. St chromosomes are left with an amount of variability proportional to their frequency in the population. Moreover, some of the nucleotide variability is rescued by gene flux, because it allows In chromosomes to receive variants that would otherwise be lost. The smaller the selection coefficients affecting the arrangements, the slower the selective phase and the larger the amount of variability maintained in the population.


 

 
Table 3. Simulation results


 

 
Table 4. Simulation results

However, the variability differences caused by selection coefficients differing by as much as an order of magnitude are not very important (compare Table 3 and Table 4). The reason for that must be sought in the approach of arrangement frequencies to equilibrium. Under overdominant selection, In frequencies increase in a sigmoidal way and, therefore, for much of the time since the appearance of the inversion its frequency is either close to zero, which makes it irrelevant, or close to the equilibrium point, q̂ = s1/(s1 + s2), which does not change with the magnitude of the selection coefficients. The only relevant variability differences are built up during the linear increase phase and the amount of time spent in that phase is always short when compared to the total length of the genealogy. For example, when s1 = s2 = 0.1, 57 generations are needed for an inversion to increase its frequency from 0.05 to 0.45. When s1 = s2 = 0.01, the same increase needs 600 generations; i.e., selection coefficients make a difference of only 543 generations in a tree that can have >10⁵ generations. Hence, from now on we use the data in Table 3 as the starting point to study the approach to mutation-drift-flux equilibrium.
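The generation counts quoted above can be checked approximately by iterating the deterministic recursion for overdominant selection. The sketch below assumes the standard form Δq = pq(s1p - s2q)/w̄ with w̄ = 1 - s1p² - s2q²; the function name is hypothetical, and the counts it returns may differ from the reported 57 and 600 by a generation or so, depending on how the end points are treated.

```python
def generations_between(q_start, q_end, s1, s2):
    """Generations needed for In to rise from q_start to q_end.

    Fitnesses: St/St = 1 - s1, St/In = 1, In/In = 1 - s2.
    """
    q_hat = s1 / (s1 + s2)
    if not 0.0 < q_start < q_end < q_hat:
        raise ValueError("need 0 < q_start < q_end < s1 / (s1 + s2)")
    q, generations = q_start, 0
    while q < q_end:
        p = 1.0 - q
        mean_fitness = 1.0 - s1 * p * p - s2 * q * q
        q += p * q * (s1 * p - s2 * q) / mean_fitness
        generations += 1
    return generations


strong = generations_between(0.05, 0.45, 0.1, 0.1)    # s1 = s2 = 0.1
weak = generations_between(0.05, 0.45, 0.01, 0.01)    # s1 = s2 = 0.01
print(strong, weak, weak - strong)
```

A tenfold change in the selection coefficients changes the duration of this phase by a few hundred generations, which is small relative to genealogies of more than 10⁵ generations.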

The convergence to the equilibrium variability in the population as a whole is drawn in Fig 1A. During the first million generations, almost no new diversity is added to the population. Gene flux, having higher rates than mutation, plays a very important role during this phase because it homogenizes variabilities within the two arrangements. Only after the first several million generations has mutation added enough variability to reach the equilibrium. With high gene flux (Φ = 10⁻²) the equilibrium point is independent of the frequency of inversions. On the other hand, lower gene flux (Φ = 10⁻⁶) makes the equilibrium variabilities higher for intermediate arrangement frequencies. Note that the equilibrium points obtained by simulation are equivalent to those obtained analytically in the previous section.






 
Figure 1. Simulation results. Approach of nucleotide variability to mutation-drift-flux equilibrium. Abscissa: decimal logarithm of the number of generations since the stabilization of the polymorphism. Ordinate: (a) E(k) for the entire population; (b) E(k) for the In chromosomes; (c) E(k) for the St chromosomes; (d) E(k) for two alleles taken at random, one from the pool of St chromosomes and the other from the pool of In chromosomes. Φ = 10⁻² stands for gene flux in the center of an average inversion and Φ = 10⁻⁶ for gene flux around the breakpoints. N = 10⁶ and µ = 1.25 × 10⁻⁷, so the equilibrium E(k) for a population without inversions is 0.5 (dotted line).

Fig 1B shows the changes in E(k) between two In alleles during the convergence to equilibrium. The convergence process within an average inversion is plotted in Fig 2A. As we can see in these figures, gene flux plays a key role in determining both the amount of variability that is lost during the origin of the inversion polymorphism and the speed at which this lost variability is recovered. With low gene flux, nucleotide variability within the newly appeared arrangement is zero, or very close to zero, during the first 10⁵–10⁶ generations. Convergence to mutation-drift-flux equilibrium is slow because of the scarce amount of variability incoming from St chromosomes. On the other hand, with high rates of gene flux the variability within In chromosomes is very close to the variability left in St chromosomes after the partial sweep and the convergence to equilibrium is faster. Moreover, during the first million generations, gene flux is the main cause of the increase of variability within inversion chromosomes because it adds new variability (imported from standard chromosomes) at higher rates than mutation.





 
Figure 2. Simulation results. E(k) for different positions along the inverted chromosome. Arrangement frequencies reached equilibrium an infinite time ago (solid lines), 1 million generations ago (long dashed lines), or 100 generations ago (short dashed lines). Note how the mutation-drift-flux equilibrium is built up. (a) E(k) for two random In alleles. (b) E(k) for two random St alleles. (c) E(k) for two alleles taken at random from the entire population. N = 10⁶, µ = 1.25 × 10⁻⁷ (θ = 0.5). We consider an average inversion (30 cM long, lying 10 cM from the centromere) at frequency q = 0.5; gene flux values for 21 evenly spaced sites along the inverted segment are obtained according to NAVARRO et al. (1997).

The process that is meanwhile taking place within St chromosomes is represented in Fig 1C and Fig 2B. In this case, of course, gene flux has little influence on the initial variability. It does, however, affect the way in which variability changes, as well as the equilibrium points. We can see that with low flux rates, during the first 10⁵–10⁶ generations, variability within St chromosomes decreases. This can be explained by a sink-source mechanism: the relatively great allele diversity stored in St chromosomes is transferred by flux to In chromosomes, where low flux rates forced an initial elimination of variability. This process lasts until the homogenization of the two arrangements; hence, variability decreases within St chromosomes while increasing within In chromosomes. In chromosomes can undergo a similar temporary variability decrease if flux rates are high and the new inversion reaches a high frequency (Fig 1B).

In relation to the time dynamics of the pairwise differences between arrangements, Fig 1D and Fig 2C show that, as proved in ANALYTICAL RESULTS, the equilibrium values of E(k) are dependent only on gene flux. On the other hand, during the first 10^5–10^6 generations of polymorphism, the pairwise differences between In and St chromosomes are dependent only on arrangement frequencies.
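The frequency independence of the equilibrium differentiation can be checked numerically. The sketch below assumes a standard two-class structured coalescent (an assumption of this example, which may not match the paper's equations term by term): two In lineages coalesce at rate 1/(2Nq), two St lineages at rate 1/(2Np), an In lineage switches to the St background at rate φp and an St lineage to In at rate φq. The function names `equilibrium_times` and `expected_k` are hypothetical.

```python
def equilibrium_times(N, phi, q):
    """Mean coalescence times (In-In, St-St, In-St) at equilibrium, in generations.

    Solves the three first-step equations of an assumed two-class structured
    coalescent by substitution.
    """
    p = 1.0 - q
    r_in, r_st = 1.0 / (2 * N * q), 1.0 / (2 * N * p)   # coalescence rates
    m_in, m_st = phi * p, phi * q                       # background-switching rates
    # Within-class times written as T = a + b * T_between.
    a_in, b_in = 1.0 / (r_in + 2 * m_in), 2 * m_in / (r_in + 2 * m_in)
    a_st, b_st = 1.0 / (r_st + 2 * m_st), 2 * m_st / (r_st + 2 * m_st)
    # T_between = 1/phi + p * T_st + q * T_in, solved for T_between.
    t_between = (1.0 / phi + p * a_st + q * a_in) / (1.0 - p * b_st - q * b_in)
    return a_in + b_in * t_between, a_st + b_st * t_between, t_between


def expected_k(N, mu, phi, q):
    """Expected pairwise differences under infinite sites: E(k) = 2 * mu * E(T)."""
    return tuple(2.0 * mu * t for t in equilibrium_times(N, phi, q))


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        print(q, expected_k(N=10**6, mu=1.25e-7, phi=1e-7, q=q))
```

Under these assumptions the between-arrangement time works out to 2N + 1/φ for any q, so E(k) between arrangements is θ + 2µ/φ (about 3.0 for N = 10^6, µ = 1.25 × 10^-7, φ = 10^-7), whereas the within-arrangement values do change with q.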


*  DISCUSSION

On its way to the establishment of a balanced polymorphism, a newly arisen inversion sweeps a lot of variability from the population. Just after the stabilization of the arrangement frequencies, the chromosomes bearing the newly appeared arrangement will have almost no variability (Table 3 and Table 4). As the polymorphism grows older, a slow convergence to mutation-drift-flux equilibrium starts. At this equilibrium, the level of DNA polymorphism in the population as a whole can be higher than in a population of the same size without segregating arrangements (Table 1 and Fig 1). Which of these two effects of an inversion, to reduce or to increment variability, will prevail depends on the time that it takes to reach equilibrium. The convergence to the equilibrium values proceeds at very different speeds, strongly depending on gene flux rates, and, thus, it proceeds differently in different regions of the inverted segment.

In regions close to the breakpoints, flux rates are very low (Fig 2) and, therefore (1) the strength of the partial sweep is greater, and almost no variability is left within the new arrangement; and (2) the linkage disequilibria generated by the frequency increment of the new inversion will persist for a long time (NAVARRO et al. 1996). Around the breakpoints, only new mutations supply variability to the new arrangement and it is unlikely that new mutants will be exchanged between arrangements. Therefore, in regions close to the breakpoints of In chromosomes, variability will be very low until enough new mutants are added, which will take >10^6 generations (Fig 1B and Fig 2A). Around the breakpoints of St chromosomes, variability is initially greater than that of In chromosomes, but decreases afterward over several generations (Fig 1C) because gene flux tends to homogenize the two arrangements and no variability was left in the In chromosomes. Although nucleotide variability in relation to inversions has been studied by several authors (AQUADRO et al. 1986; AGUADE 1988; BENASSI et al. 1993; ROZAS and AGUADE 1993, ROZAS and AGUADE 1994; WESLEY and EANES 1994; POPADIC and ANDERSON 1995; POPADIC et al. 1995; ANDOLFATTO et al. 1999; CACERES et al. 1999; DEPAULIS et al. 1999; ROZAS et al. 1999), the only available study in which nucleotide variability was surveyed simultaneously at different positions of a chromosome segregating for two arrangements separated by a single inversion has been carried out by HASSON and EANES (1996). Our theoretical expectations are consistent with their findings. First, the breakpoints of inversion In(3L)Payne of D. melanogaster host 20 times less polymorphism (π = 0.0003) than the breakpoints of St chromosomes (π = 0.0060). Furthermore, the Hsp83 gene locus, which is close to, but not exactly at, the distal breakpoint, presents higher levels of variability (π = 0.0067) and lower levels of differentiation between arrangements (Nei's d = 0.0053) than the breakpoint itself (π = 0.0058, d = 0.0068), although the differences are not statistically significant (HASSON and EANES 1996).

In the population as a whole, nucleotide variability around breakpoints is low during the first 10^5–10^6 generations. The higher the frequency of In chromosomes, the lower the levels of DNA polymorphism (Fig 1A). At equilibrium, on the other hand, low gene flux rates induce substantial differentiation between St and In chromosomes, which causes an increment in the level of variability of the whole population (Table 1, Fig 1D and Fig 2C). This enhancement of polymorphism levels is due to the extension of the average lifetime of mutants caused by balancing selection (HUDSON and KAPLAN 1988; KAPLAN et al. 1988) and it is greater with lower gene flux and intermediate arrangement frequencies.

Gene flux rates are higher in the center of the inverted regions and hence (1) gene flux preserves some of the starting variability from the initial sweep by sheltering it in the new arrangement; and (2) the differentiation between arrangements will decrease at a steady rate, as new mutations are exchanged and some of the variability stored in St chromosomes enters the inversion by gene flux. Inverted chromosomes, therefore, have a higher starting level of polymorphism with higher flux rates (Fig 1B). The smaller the frequency of inversions and the greater the flux, the higher the starting polymorphism level. On the contrary, neither the initial variability within St chromosomes (Fig 1C) nor the initial differentiation between St and In (Fig 1D) is affected by gene flux rates. The main differences between the central region and the regions around the breakpoints arise from the buildup of the equilibrium in the central region of inversions and on the equilibrium state itself. When equilibrium has been achieved, higher flux rates make the amount of variability in the central zone smaller than that of the regions around breakpoints (Fig 2C). On the other hand, higher flux rates during the convergence to equilibrium allow for a rapid increase in polymorphism levels and a decrease in differences between arrangements, which starts at about generation 10^5 (Fig 1). Again this result is consistent with the finding by HASSON and EANES (1996) of higher levels of nucleotide variability in the Est-6 gene (π = 0.0192), which lies approximately in the middle of inversion In(3L)Payne, than at the breakpoints of the inversion (π = 0.0058). Also, the silent polymorphism levels for Est-6 were roughly similar in both arrangements (π = 0.0162 in St and π = 0.0200 in In).

In this analysis we have focused on neutral variability without considering any explicit source for the overdominance of the inversion. The selective maintenance of inversion polymorphisms has been the subject of abundant theoretical work. Some models consider associations of the inversion with either a single gene or a group of genes with additive relationships (e.g., NEI et al. 1967; OHTA and KOJIMA 1968). Under those models, the establishment of an inversion polymorphism would be a rare event, because, eventually, the linkage disequilibrium between the selected loci and the inversion would break down, rendering the polymorphism unstable and allowing the inversion to drift away until fixation or loss. The most widely accepted models for the maintenance of inversion polymorphisms consider their association with a complex of genes where epistatic selection maintains gametic disequilibrium. Essentially, a new inversion can reach a stable polymorphism only if it occurs in chromosomes carrying an excess gametic type (CHARLESWORTH and CHARLESWORTH 1973; CHARLESWORTH 1974; see KRIMBAS and POWELL 1992 for a review). Under these conditions, a stable polymorphism can be reached because certain recombination events within heterokaryotypes generate unfavored gametic types that would be eliminated by selection. This would have the additional effect of reducing recombination even further and, thus, increasing differentiation between arrangements.

Inversions are the most common form of chromosomal change in the evolutionary history of Drosophila. More than 28,000 paracentric inversions are estimated to be currently segregating in natural populations of Drosophila and >42,000 paracentric inversions have become fixed during the evolution of the genus (SORSA 1988). To what extent has all this continuous chromosomal reorganization affected DNA variability? RANZ et al. (1997) studied the divergence of chromosomal element E between D. melanogaster and D. repleta. Although evolutionary rates may vary between elements and lineages, a rate of fixation of inversions of approximately one inversion per million years was estimated for the E element (RANZ et al. 1997, RANZ et al. 1999). Because most inversions will never reach fixation and will be lost after segregating for some time, we can consider a million years as an overestimate of the average lifespan of an inversion. This time is equivalent to 5 × 10^6–10^7 generations if we take 5–10 generations per year as an average for the Drosophila genus (ASHBURNER 1989; POWELL 1997). An examination of Fig 1 yields the conclusion that at least 10^7 generations are needed to reach mutation-drift-flux equilibrium. It follows that equilibrium is very unlikely to be achieved within the inverted segment. If gene flux rates are ≤10^-4, inversions can increase variability levels when mutation-drift-flux equilibrium is reached (Table 1). However, at least 10^7 generations are needed to achieve equilibrium, and most of the time variability is lower in low gene flux regions (Fig 1 and Fig 2). This fact may imply that, other things being equal, chromosomes and/or species having high levels of inversion polymorphism will have lower levels of DNA polymorphism. It has been pointed out by AKASHI (1996), BEGUN (1996), and MORIYAMA and POWELL (1996) that the autosomes of D. melanogaster, which are moderately polymorphic for inversions, have lower variability than those of D. simulans, which have no known polymorphic inversions. Stronger hitchhiking and background selection in D. melanogaster have been proposed as possible causes of this striking correlation (BEGUN 1996). Our results show that the presence of inversions in the evolutionary history of D. melanogaster may help to explain its lower levels of nucleotide polymorphism without appealing to other selective forces. Although some caution must be raised because D. melanogaster levels of variability may have been underestimated (BEGUN 1996; LABATE et al. 1999), the extant data fit the expected pattern. Of course, had the inversions been maintained for a longer time, or had the mutation rates been higher, inversions would produce the opposite effect and increase variability. On the other hand, factors, like fluctuating selection, that cause oscillations in the frequencies of the arrangements will boost the loss of neutral variability. Further research concerning all these possibilities is currently under way.
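The time-scale argument in this paragraph is simple arithmetic, spelled out below with the figures quoted in the text (one million years as an upper bound on inversion lifespan, 5–10 generations per year, at least 10^7 generations to reach equilibrium). The constant and function names are hypothetical labels for this illustration.

```python
# Figures quoted in the text; the names are labels chosen for this sketch.
MAX_LIFESPAN_YEARS = 1_000_000          # overestimate of the average inversion lifespan
GENERATIONS_PER_YEAR = (5, 10)          # assumed range for the genus Drosophila
GENERATIONS_TO_EQUILIBRIUM = 10**7      # lower bound read from Fig 1


def lifespan_in_generations(years=MAX_LIFESPAN_YEARS, per_year=GENERATIONS_PER_YEAR):
    """Convert a lifespan in years into a (low, high) range of generations."""
    return tuple(years * g for g in per_year)


def reaches_equilibrium(generations, needed=GENERATIONS_TO_EQUILIBRIUM):
    """True only if the lifespan strictly exceeds the minimum time to equilibrium."""
    return generations > needed


if __name__ == "__main__":
    low, high = lifespan_in_generations()
    print(low, high, reaches_equilibrium(low), reaches_equilibrium(high))
```

Even the upper end of the range only matches the minimum time required, and it is itself an overestimate of the lifespan, which is why equilibrium is expected to be rare.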


*  FOOTNOTES

1 Present address: Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.


*  ACKNOWLEDGMENTS

We thank P. Andolfatto, N. Barton, A. Berry, E. Betrán, B. Charlesworth, A. Clark, F. Depaulis, J. Rozas, and two anonymous reviewers for valuable discussion and criticism. Work was supported by a Formació del Personal Investigador (FPI) fellowship from the DGU (Generalitat de Catalunya, Spain) to A.N. and grant PB95-0607 from the DGICYT (Ministerio de Educación y Ciencia, Spain) to A.R.

Manuscript received August 15, 1999; Accepted for publication February 14, 2000.


*  APPENDIX

Variances for k(n), the number of pairwise differences in a sample of n alleles, can be easily found for n = 2 by developing an expression for pairwise identities and using it as the moment generating function of the distribution of coalescence times (HUDSON 1990). Doing so, we obtain the rather bulky expressions

(A1a)

(A1b)

(A1c)

These equations can be simplified if we assume equal arrangement frequencies (p = q):

(A2a)


(A2b)

If n increases, the mean number of pairwise differences remains the same but, of course, the variance decreases. However, as variances decrease the expressions giving them increase in size and in number (for example, if n = 10 one has to obtain 11 different enormous expressions).

Variances for the simplest case (p = q) can be obtained with some pain following WAKELEY (1996). We end up with two expressions equivalent to those in WAKELEY (1996). The first one gives the variance of k(n) when all the n alleles are linked to the same arrangement:

(A3)

The second one gives the variance of k(n) when i alleles are linked to St and j alleles to In:

(A4)


*  LITERATURE CITED

AGUADÉ, M., 1988 Restriction map variation at the Adh locus of Drosophila melanogaster in inverted and noninverted chromosomes. Genetics 119:135-140.

AKASHI, H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297-1307.

ANDOLFATTO, P., J. D. WALL, and M. KREITMAN, 1999 Unusual haplotype structure at the proximal breakpoint of In(2L)t in a natural population of Drosophila melanogaster. Genetics 153:1297-1311.

AQUADRO, C. F., and D. J. BEGUN, 1993 Evidence for an implication of genetic hitchhiking in the Drosophila genome, pp. 159–178 in Mechanisms of Molecular Evolution, edited by N. TAKAHATA and A. G. CLARK. Sinauer Associates, Sunderland, MA.

AQUADRO, C. F., S. F. DESSE, M. M. BLAND, C. H. LANGLEY, and C. C. LAURIE-AHLBERG, 1986 Molecular population genetics of the alcohol dehydrogenase gene region of Drosophila melanogaster. Genetics 114:1165-1190.

AQUADRO, C. F., D. J. BEGUN and E. C. KINDAHL, 1994 Selection, recombination and DNA polymorphism in Drosophila, pp. 46–56 in Non-Neutral Evolution, edited by B. GOLDING. Chapman & Hall, New York.

ASHBURNER, M., 1989 Drosophila: A Laboratory Handbook. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

BEGUN, D. J., 1996 Population genetics of silent and replacement variation in Drosophila simulans and D. melanogaster: X/autosomes differences? Mol. Biol. Evol. 13:1405-1407.

BÉNASSI, V., S. AULARD, S. MAZEAU, and M. VEUILLE, 1993 Molecular variation of Adh and P6 genes in an African population of Drosophila melanogaster and its relation to chromosomal inversions. Genetics 134:789-799.

BRAVERMAN, J. M., R. R. HUDSON, N. L. KAPLAN, C. H. LANGLEY, and W. STEPHAN, 1995 The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140:783-796.

CÁCERES, M., J. M. RANZ, A. BARBADILLA, M. LONG, and A. RUIZ, 1999 Generation of a widespread Drosophila inversion by a transposable element. Science 285:415-418.

CHARLESWORTH, B., 1974 Inversion polymorphism in a two-locus genetic system. Genet. Res. 23:259-280.

CHARLESWORTH, B., 1994 The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63:213-227.

CHARLESWORTH, B. and D. CHARLESWORTH, 1973 Selection of new inversions in multilocus genetic systems. Genet. Res. 21:167-183.

CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

CHOVNICK, A., 1973 Gene conversion and transfer of genetic information within the inverted region of inversion heterozygotes. Genetics 75:123-131.

DEPAULIS, F., L. BRAZIER, and M. VEUILLE, 1999 Selective sweep at the Drosophila melanogaster Suppressor of Hairless locus and its association with the In(2L)t inversion polymorphism. Genetics 152:1017-1024.

DOBZHANSKY, TH., 1970 Genetics of the Evolutionary Process. Columbia University Press, New York.

HASSON, E. and W. EANES, 1996 Contrasting histories of three gene regions associated with In(3L)Payne of Drosophila melanogaster. Genetics 144:1565-1575.

HEY, J., 1991 A multi-dimensional coalescent process applied to multiallelic selection models and migration models. Theor. Popul. Biol. 39:30-48.

HUDSON, R. R., 1983 Properties of a neutral alleles model with intragenic recombination. Theor. Popul. Biol. 23:183-201.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.

HUDSON, R. R., 1993 The how and why of generating gene genealogies, pp. 23–36 in Mechanisms of Molecular Evolution, edited by N. TAKAHATA and A. G. CLARK. Sinauer Associates, Sunderland, MA.

HUDSON, R. R., 1994 How can the low levels of Drosophila sequence variation in regions of the genome with low levels of recombination be explained. Proc. Natl. Acad. Sci. USA 91:6815-6818.

HUDSON, R. R. and N. L. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120:831-840.

HUDSON, R. R. and N. L. KAPLAN, 1995 Deleterious background selection with recombination. Genetics 141:1605-1617.

KAPLAN, N. L., T. DARDEN, and R. R. HUDSON, 1988 The coalescent process in models with selection. Genetics 120:841-848.

KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.

KIMURA, M., 1969 The rate of molecular evolution considered from the standpoint of population genetics. Proc. Natl. Acad. Sci. USA 63:1181-1188.

KREITMAN, M., 1983 Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304:412-417.

KREITMAN, M., and M. L. WAYNE, 1994 Organization of genetic variation at the molecular level: lessons from Drosophila, pp. 157–183 in Molecular Ecology and Evolution: Approaches and Implications, edited by D. SCHIERWATER, B. STREIT, G. P. WAGNER and R. DE SALLE. Birkhäuser, Basel.

KRIMBAS, C. B., and J. R. POWELL, 1992 Drosophila Inversion Polymorphism. CRC Press, Boca Raton, FL.

LABATE, J. A., C. H. BIERMANN, and W. F. EANES, 1999 Nucleotide variation at the runt locus in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 16:724-731.

LEWONTIN, R. C., 1981 The scientific work of Th. Dobzhansky, pp. 93–115 in Dobzhansky's Genetics of Natural Populations, edited by R. C. LEWONTIN, J. A. MOORE, W. B. PROVINE and B. W