Genetics. Published Articles Ahead of Print: June 18, 2008, Copyright © 2008
doi:10.1534/genetics.107.082198


A more recent version of this article appeared on July 1, 2008.


REGULAR RESEARCH PAPERS

Testing for Neutrality in Samples With Sequencing Errors

1 Université Pierre et Marie Curie

* To whom correspondence should be addressed. E-mail: achaz@abi.snv.jussieu.fr.

Submitted on September 21, 2007
Revised on January 11, 2008
Accepted on April 18, 2008


Abstract

Many datasets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We show that in the presence of those errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of datasets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of θ that ignore singletons, as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious datasets.
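
To illustrate why ignoring singletons can remove the bias described above, here is a minimal Python sketch. It is not taken from the article, whose abstract gives no formulas; it assumes only the textbook expectation under the standard neutral model that the number of sites carrying a derived variant in i of n sequences is θ/i, and it assumes an unfolded frequency spectrum in which every sequencing error appears as a singleton. The function names are hypothetical, and the singleton-free estimator shown is one plausible Watterson-type construction rather than necessarily the exact estimators defined in the paper.

```python
def harmonic(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def theta_watterson(sfs):
    """Watterson-type estimate of theta from an unfolded frequency spectrum.

    sfs[i - 1] is the number of sites whose derived variant is carried by
    i of the n sequences, for i = 1 .. n - 1 (so n = len(sfs) + 1).
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def theta_watterson_no_singletons(sfs):
    """Same idea with singletons dropped (an assumption of this sketch).

    Under neutrality E[S - xi_1] = theta * (a_n - 1), so dividing the count
    of non-singleton sites by (a_n - 1) gives an unbiased estimate.
    """
    n = len(sfs) + 1
    if n < 3:
        raise ValueError("need at least 3 sequences to have non-singleton classes")
    return sum(sfs[1:]) / (harmonic(n) - 1.0)


def add_singleton_errors(sfs, n_errors):
    """Return a copy of the spectrum with n_errors extra singleton sites."""
    noisy = list(sfs)
    noisy[0] += n_errors
    return noisy


# Expected neutral spectrum for theta = 6 and n = 7 sequences: theta / i.
clean = [6.0 / i for i in range(1, 7)]
noisy = add_singleton_errors(clean, 10)

print("Watterson, clean data:        ", theta_watterson(clean))
print("Watterson, with errors:       ", theta_watterson(noisy))
print("No singletons, with errors:   ", theta_watterson_no_singletons(noisy))
```

In this toy spectrum the classical estimate is inflated by the extra singletons, while the singleton-free estimate is unchanged, because the errors touch only the frequency class that it discards.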

Key Words: coalescent theory, neutrality tests, sequencing errors