Modeling read counts for CNV detection in exome sequencing data

Michael I. Love, Alena Myšičková, Ruping Sun, Vera Kalscheuer, Martin Vingron, Stefan A. Haas

Research output: Contribution to journal › Article › peer-review

41 Scopus citations

Abstract

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
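As a rough illustration of the kind of model the abstract describes, the sketch below decodes a copy-number path from per-target read counts with a small hidden Markov model. It is not the exomeCopy implementation: the Poisson emission, the single fixed transition probability, the use of background depth as the only covariate (no GC-content term), and the name `viterbi_copy_number` are all simplifying assumptions made for this example.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of observing k reads under a Poisson with mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, states=(0, 1, 2, 3, 4),
                        normal=2, stay_prob=0.99, eps=0.01):
    """Return the most likely copy-number state for each target (hypothetical helper).

    counts     -- observed read counts per target in the sample
    background -- expected read counts per target at the normal copy number
    eps        -- floor on the expected count so copy number 0 tolerates stray reads
    """
    if not counts:
        return []
    n = len(states)
    # Assumption: equal probability of switching to any other state.
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def emit(t, s):
        # Expected depth scales linearly with copy number relative to normal.
        mu = background[t] * states[s] / normal
        return poisson_logpmf(counts[t], max(mu, eps))

    # Uniform prior over starting states.
    score = [math.log(1.0 / n) + emit(0, s) for s in range(n)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][j] + emit(t, j))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    s = max(range(n), key=lambda i: score[i])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return [states[s] for s in path]


if __name__ == "__main__":
    sample = [100, 98, 105, 52, 47, 50, 101, 99]   # made-up counts
    control = [100] * len(sample)                  # made-up background depth
    print(viterbi_copy_number(sample, control))
```

In this toy input, the three targets with roughly half the background depth are the ones a heterozygous deletion would produce. A model closer to the one described would presumably replace the Poisson with an overdispersed count distribution and fit the covariate effects rather than fixing them.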

Original language: English (US)
Article number: 52
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 10
Issue number: 1
State: Published - 2011
Externally published: Yes

Bibliographical note

Funding Information:
Author Notes: We thank our collaborators on the XLID project, Prof. Dr. H.-Hilger Ropers, Wei Chen, Hao Hu, Reinhard Ullmann and the EUROMRX consortium for providing the XLID data, validation of CNVs and for helpful discussion. We also thank Ho-Ryun Chung for suggestions. Part of this work was financed by the European Union's Seventh Framework Program under grant agreement number 241995, project GENCODYS.

Keywords

  • CNV
  • HMM
  • copy number variant
  • exome sequencing
  • hidden Markov model
  • targeted sequencing
