Originally published as Genetics Published Articles Ahead of Print on June 18, 2008.
Genetics, Vol. 179, 1409-1424, July 2008, Copyright © 2008
doi:10.1534/genetics.107.082198
Testing for Neutrality in Samples With Sequencing Errors
Guillaume Achaz1
Systématique, Adaptation et Evolution (UMR 7138) and Atelier de Bioinformatique, Université Pierre et Marie Curie-Paris VI, 75005 Paris, France
1 Address for correspondence: Atelier de Bioinformatique, Université Pierre et Marie Curie 4, place Jussieu, Boîte courrier 1202, 75005 Paris, France.
E-mail: achaz{at}abi.snv.jussieu.fr
Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We thus show that in the presence of those errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of θ that ignore singletons, as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.
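To make the idea of singleton-free estimators concrete, the following is a minimal Python sketch, not the article's own implementation. It assumes an unfolded site frequency spectrum and relies only on the standard neutral expectation E[ξ_i] = θ/i, where ξ_i is the number of sites at which the derived allele is carried by i of the n sequences. The function name `theta_estimators`, the dictionary keys, and the toy numbers are hypothetical choices made for illustration; the normalizations of the two singleton-free estimators are derived here from that expectation and may be written differently in the full text.

```python
from math import comb


def theta_estimators(sfs):
    """Estimate theta from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites where the derived allele is
    carried by exactly i of the n sampled sequences (i = 1 .. n-1).
    Assumption: standard neutral model, so E[xi_i] = theta / i.
    """
    n = len(sfs) + 1
    if n < 3:
        raise ValueError("need at least 3 sequences to drop singletons")

    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number H_{n-1}
    s = sum(sfs)                             # number of segregating sites
    singletons = sfs[0]
    # Sum over sites of i * (n - i): total count of pairwise differences
    pair_sum = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1))

    return {
        # Classical estimators, which include singletons
        "watterson": s / a_n,
        "pi": pair_sum / comb(n, 2),
        # Same estimators with the singleton class removed and the
        # normalization adjusted so they stay unbiased under neutrality
        "watterson_no_singletons": (s - singletons) / (a_n - 1.0),
        "pi_no_singletons": (pair_sum - (n - 1) * singletons) / comb(n - 1, 2),
    }


# Toy example: the expected neutral spectrum for n = 10 and theta = 10,
# then the same spectrum with 5 extra singletons standing in for errors.
n, theta = 10, 10.0
clean = [theta / i for i in range(1, n)]
noisy = clean.copy()
noisy[0] += 5

for label, spectrum in (("clean", clean), ("noisy", noisy)):
    print(label, theta_estimators(spectrum))
```

On the clean spectrum all four estimators recover the same θ. Once extra singletons are added, the two classical estimators are inflated while the singleton-free ones are unchanged, which is the behavior the abstract describes.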