Modeling read counts for CNV detection in exome sequencing data

Michael I. Love; Alena Myšičková; Ruping Sun; Vera Kalscheuer; Martin Vingron; Stefan A. Haas

doi:10.2202/1544-6115.1732

Research output: Contribution to journal › Article › peer-review

53 Scopus citations

Abstract

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
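As a rough illustration of the idea summarized in the abstract, the sketch below decodes a copy-number path from per-target read counts with a small hidden Markov model. It is not the exomeCopy implementation: the Poisson emission model, the copy-number states, the `stay` probability, the `eps` floor, and the function names are all assumptions made for this example, and positional covariates such as GC-content are left out. The background depth stands in for the control-set read depth the abstract mentions.

```python
import math


def poisson_logpmf(k, mu):
    """Log of the Poisson probability of observing count k with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, states=(0, 1, 2, 3, 4),
                        normal=2, stay=0.99, eps=0.1):
    """Most likely copy-number path given per-target read counts.

    counts     -- observed read counts per target region (sample)
    background -- expected counts per region at normal copy number (controls)
    All remaining parameters are illustrative assumptions.
    """
    n_states = len(states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n_states - 1))

    def emission(t, s):
        # Expected depth scales with copy number; eps keeps the mean
        # positive for a homozygous deletion (copy number 0).
        mu = background[t] * max(states[s] / normal, eps)
        return poisson_logpmf(counts[t], mu)

    # Uniform prior over starting states.
    score = [-math.log(n_states) + emission(0, s) for s in range(n_states)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at position t.
            best_prev = max(
                range(n_states),
                key=lambda p: score[p] + (log_stay if p == s else log_move),
            )
            trans = log_stay if best_prev == s else log_move
            new_score.append(score[best_prev] + trans + emission(t, s))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


# Toy data: four targets at roughly half the background depth.
background = [100] * 12
counts = [100, 98, 103, 101, 52, 49, 50, 47, 99, 102, 100, 97]
path = viterbi_copy_number(counts, background)
```

On this toy input the middle four targets, whose counts are about half the background, are assigned copy number 1 (a heterozygous deletion) and the rest copy number 2. Real exome data is typically more variable than a Poisson model allows, so this should be read only as a picture of how a background-scaled emission and a sticky transition matrix combine.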

Original language	English (US)
Article number	52
Journal	Statistical Applications in Genetics and Molecular Biology
Volume	10
Issue number	1
DOIs	https://doi.org/10.2202/1544-6115.1732
State	Published - 2011
Externally published	Yes

Bibliographical note

Funding Information:
We thank our collaborators on the XLID project, Prof. Dr. H.-Hilger Ropers, Wei Chen, Hao Hu, Reinhard Ullmann and the EUROMRX consortium, for providing the XLID data, for validation of CNVs and for helpful discussion. We also thank Ho-Ryun Chung for suggestions. Part of this work was financed by the European Union's Seventh Framework Program under grant agreement number 241995, project GENCODYS.

Keywords

CNV
HMM
copy number variant
exome sequencing
hidden Markov model
targeted sequencing

Cite this

@article{ce7f31b62f0845e78017dd60817cdb5c,

title = "Modeling read counts for CNV detection in exome sequencing data",

abstract = "Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.",

keywords = "CNV, HMM, copy number variant, exome sequencing, hidden Markov model, targeted sequencing",

author = "Love, {Michael I.} and Alena My{\v s}i{\v c}kov{\'a} and Ruping Sun and Vera Kalscheuer and Martin Vingron and Haas, {Stefan A.}",

note = "Funding Information: We thank our collaborators on the XLID project, Prof. Dr. H.-Hilger Ropers, Wei Chen, Hao Hu, Reinhard Ullmann and the EUROMRX consortium, for providing the XLID data, for validation of CNVs and for helpful discussion. We also thank Ho-Ryun Chung for suggestions. Part of this work was financed by the European Union's Seventh Framework Program under grant agreement number 241995, project GENCODYS.",

year = "2011",

doi = "10.2202/1544-6115.1732",

language = "English (US)",

volume = "10",

journal = "Statistical Applications in Genetics and Molecular Biology",

issn = "1544-6115",

publisher = "Berkeley Electronic Press",

number = "1",

}

TY - JOUR

T1 - Modeling read counts for CNV detection in exome sequencing data

AU - Love, Michael I.

AU - Myšičková, Alena

AU - Sun, Ruping

AU - Kalscheuer, Vera

AU - Vingron, Martin

AU - Haas, Stefan A.

N1 - Funding Information: We thank our collaborators on the XLID project, Prof. Dr. H.-Hilger Ropers, Wei Chen, Hao Hu, Reinhard Ullmann and the EUROMRX consortium, for providing the XLID data, for validation of CNVs and for helpful discussion. We also thank Ho-Ryun Chung for suggestions. Part of this work was financed by the European Union's Seventh Framework Program under grant agreement number 241995, project GENCODYS.

PY - 2011

Y1 - 2011

N2 - Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.

AB - Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.

KW - CNV

KW - HMM

KW - copy number variant

KW - exome sequencing

KW - hidden Markov model

KW - targeted sequencing

UR - http://www.scopus.com/inward/record.url?scp=82955184653&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=82955184653&partnerID=8YFLogxK

U2 - 10.2202/1544-6115.1732

DO - 10.2202/1544-6115.1732

M3 - Article

C2 - 23089826

AN - SCOPUS:82955184653

SN - 1544-6115

VL - 10

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

IS - 1

M1 - 52

ER -