Variant calling, and in particular calling SNPs (Single Nucleotide Polymorphisms), is a fundamental task in genomics. While existing packages offer excellent performance on calling SNPs that have uniquely mapped reads, they suffer in loci where the reads are multiply mapped and are unable to make reliable calls there. Variants in multiply mapped loci can arise, for example, in long segmental duplications, and can play an important role in evolution and disease. In this paper, we develop a new SNP caller named abSNP, which offers three innovations. (a) abSNP calls SNPs from RNA-Seq data. Since RNA-Seq data is primarily sampled from gene regions, this method is inexpensive. (b) abSNP is able to make calls on repetitive gene regions by carefully exploiting the quality scores of multiply mapped reads. (c) abSNP exploits a specific feature of RNA-Seq data, namely the varying abundance of different genes, in order to identify which repetitive copy a particular read is sampled from. We demonstrate that the proposed method offers significant performance gains on repetitive regions in simulated data. In particular, the algorithm is able to achieve near-perfect sensitivity on high-coverage SNPs, even when the reads are multiply mapped.
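To make points (b) and (c) concrete, the following is a minimal sketch of how base quality scores and gene abundance could be combined to decide which repetitive copy a multiply mapped read came from. It is not the abSNP implementation: the function names, the independent-error model, and the toy sequences and abundances are all assumptions made for illustration.

```python
from math import prod


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def read_likelihood(read: str, quals: list[int], ref: str) -> float:
    """P(read | sampled from ref).

    Assumption: base errors are independent, and an erroneous base is
    equally likely to be any of the three other nucleotides.
    """
    terms = []
    for base, q, ref_base in zip(read, quals, ref):
        e = phred_to_error(q)
        terms.append(1 - e if base == ref_base else e / 3)
    return prod(terms)


def copy_posterior(read, quals, candidates, abundance):
    """Posterior over candidate loci: abundance prior times read likelihood."""
    weights = {
        name: abundance[name] * read_likelihood(read, quals, ref)
        for name, ref in candidates.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical toy data: two near-identical gene copies differing at the last base.
candidates = {"geneA": "ACGT", "geneB": "ACGA"}
abundance = {"geneA": 1.0, "geneB": 9.0}  # geneB is expressed nine times more

# A confident final base favours geneA despite its lower abundance.
confident = copy_posterior("ACGT", [30, 30, 30, 30], candidates, abundance)

# A low-quality final base lets the abundance prior favour geneB instead.
doubtful = copy_posterior("ACGT", [30, 30, 30, 3], candidates, abundance)
```

In this toy setting the same read is assigned to different copies depending on how trustworthy the distinguishing base is, which is the kind of interplay between quality scores and abundance that the abstract describes.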
|Original language||English (US)|
|Title of host publication||17th International Workshop on Algorithms in Bioinformatics, WABI 2017|
|Editors||Knut Reinert, Russell Schwartz|
|Publisher||Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing|
|State||Published - Aug 1 2017|
|Event||17th International Workshop on Algorithms in Bioinformatics, WABI 2017 - Boston, United States|
Duration: Aug 21 2017 → Aug 23 2017
|Name||Leibniz International Proceedings in Informatics, LIPIcs|
Bibliographical note
Funding Information:
The work of SK and SM was supported, in part, by U.S. National Institutes of Health grant 5R01HG008164-02 (SK and SM) and U.S. National Science Foundation CAREER grant 1651236 (SK). The work of DNT was supported in part by the Center for the Science of Information and in part by NIH grant R01HG008164.
© Shunfu Mao, Soheil Mohajer, Kannan Ramachandran, David Tse, and Sreeram Kannan.
- Abundance Estimation
- Multiply Mapped Reads
- Repetitive Region
- SNP Calling