Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS

Anne Katrin Emde, Marcel H. Schulz, David Weese, Ruping Sun, Martin Vingron, Vera M. Kalscheuer, Stefan A. Haas, Knut Reinert

Research output: Contribution to journal, Article, peer-review

52 Scopus citations

Abstract

Motivation: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map.

Results: Here we present a method for 'split' read mapping, where the prefix and suffix matches of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant.
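To make the idea of a split read concrete, the following is a toy sketch in Python. It is not the SplazerS algorithm: it assumes exact matching only (no sequencing errors), handles deletions only, and the function name `find_deletion_split` and the parameter `min_anchor` are hypothetical names chosen for illustration. It looks for a read prefix and a read suffix that both match the reference, separated by a run of reference bases that the read skips.

```python
from typing import Optional, Tuple


def find_deletion_split(
    reference: str, read: str, min_anchor: int = 4
) -> Optional[Tuple[int, int, int]]:
    """Return (prefix_start, deletion_start, deletion_length), or None.

    Toy exact-match split mapping (illustrative only): the read prefix
    read[:k] must occur in the reference, and the read suffix read[k:]
    must occur at least one base downstream of the prefix match. The
    skipped reference bases are reported as a deletion.
    """
    n = len(read)
    # Try every split point that leaves at least min_anchor bases per side.
    for k in range(min_anchor, n - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        p = reference.find(prefix)
        while p != -1:
            prefix_end = p + k  # first reference base not covered by the prefix
            # Require a gap of at least one base before the suffix match.
            s = reference.find(suffix, prefix_end + 1)
            if s != -1:
                return p, prefix_end, s - prefix_end
            p = reference.find(prefix, p + 1)
    return None


# Hypothetical example data: the read skips reference[8:14] ("GCCTTA").
reference = "GATTACAGGCCTTAACGTGCATCG"
read = reference[2:8] + reference[14:20]

print(find_deletion_split(reference, read))  # (2, 8, 6)
```

In this example the prefix anchors at reference position 2, the breakpoint is at position 8, and six reference bases are missing from the read. A practical split mapper additionally has to tolerate mismatches, handle insertions, and resolve breakpoints that are ambiguous in repetitive sequence, none of which this sketch attempts.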

Original language: English (US)
Article number: bts019
Pages (from-to): 619-627
Number of pages: 9
Journal: Bioinformatics
Volume: 28
Issue number: 5
State: Published - Mar 2012
Externally published: Yes

Bibliographical note

Funding Information:
European Union’s Seventh Framework Program under grant agreement number 241995, project GENCODYS; International Max Planck Research School for Computational Biology and Scientific Computing.
