Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model

Peng J. Wei; Wei Pan

doi:10.1093/bioinformatics/btm612

Biostatistics

Research output: Contribution to journal, peer-reviewed article

56 Scopus citations

Abstract

Motivation: It is a common task in genomic studies to identify a subset of the genes satisfying certain conditions, such as differentially expressed genes or regulatory target genes of a transcription factor (TF). This can be formulated as a statistical hypothesis testing problem. Most existing approaches treat the genes as having an identical and independent distribution a priori, testing each gene independently or testing some subsets of the genes one by one. On the other hand, it is known that the genes work coordinately as dictated by gene networks. Treating genes equally and independently ignores the important information contained in gene networks, leading to inefficient analysis and reduced power.

Results: We propose incorporating gene network information into statistical analysis of genomic data. Specifically, rather than treating the genes equally and independently a priori in a standard mixture model, we assume that gene-specific prior probabilities are correlated as induced by a gene network: while the genes are allowed to have different prior probabilities, those neighboring ones in the network have similar prior probabilities, reflecting their shared biological functions. We applied the two approaches to a real ChIP-chip dataset (and simulated data) to identify the transcriptional target genes of TF GCN4. The new method was found to be more powerful in discovering the target genes.
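The abstract describes the model only in words. As a rough illustration of the difference between one shared prior and network-correlated, gene-specific priors, the Python sketch below fits a two-component mixture to per-gene scores z_i, with z_i ~ pi_i N(mu1, 1) + (1 - pi_i) N(0, 1). Without a network, all genes share a single prior pi (the standard mixture model). With a network, each pi_i is reset at every iteration to the average posterior of gene i and its neighbors, so adjacent genes end up with similar priors. The neighbor-averaging update, the N(0, 1) null, the unit variances, the starting value mu1 = 2.0, the toy data, and the helper names fit_mixture and neighbors_from_edges are all hypothetical choices made for illustration. This is not the authors' estimation procedure, which builds the correlation among priors into a formal spatial model.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def neighbors_from_edges(n, edges):
    """Hypothetical helper: adjacency lists for an undirected gene network."""
    nbrs = {i: [] for i in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs


def fit_mixture(z, neighbors=None, mu1=2.0, n_iter=50, eps=1e-3):
    """Fit z_i ~ pi_i * N(mu1, 1) + (1 - pi_i) * N(0, 1) by EM-style iterations.

    neighbors=None: standard model, one prior pi shared by all genes.
    neighbors=dict: illustrative network model, pi_i is the mean posterior of
    gene i and its neighbors (a heuristic smoothing step, not the paper's model).
    Returns (posterior target probabilities, priors pi_i, final mu1).
    """
    n = len(z)
    prior = [0.5] * n
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that gene i comes from the N(mu1, 1) component.
        for i, zi in enumerate(z):
            a = prior[i] * normal_pdf(zi, mu1)
            b = (1.0 - prior[i]) * normal_pdf(zi, 0.0)
            denom = a + b
            post[i] = a / denom if denom > 0 else prior[i]
        # Update the alternative mean; the null stays fixed at N(0, 1).
        total = sum(post)
        if total > 0:
            mu1 = sum(r * zi for r, zi in zip(post, z)) / total
        # Update the priors, kept inside [eps, 1 - eps].
        if neighbors is None:
            shared = min(max(total / n, eps), 1.0 - eps)
            prior = [shared] * n
        else:
            prior = []
            for i in range(n):
                group = [i] + list(neighbors.get(i, []))
                avg = sum(post[j] for j in group) / len(group)
                prior.append(min(max(avg, eps), 1.0 - eps))
    return post, prior, mu1


# Hypothetical toy data: genes 0-3 form one network module, genes 4-9 another.
z = [2.5, 2.8, 1.2, 2.6, 0.1, -0.3, 0.2, -0.5, 0.4, 0.0]
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
post_std, _, _ = fit_mixture(z)
post_net, _, _ = fit_mixture(z, neighbors_from_edges(len(z), edges))
print(f"gene 2: standard {post_std[2]:.2f}, network {post_net[2]:.2f}")
```

In this toy example gene 2 has only a modest score, but its network neighbors score highly, so the neighbor-averaged prior raises its posterior relative to the shared-prior fit. That is, in a very simplified form, the kind of borrowing of information across neighboring genes that the abstract describes.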

Original language	English (US)
Pages (from-to)	404-411
Number of pages	8
Journal	Bioinformatics
Volume	24
Issue number	3
DOI	https://doi.org/10.1093/bioinformatics/btm612
Publication status	Published, February 2008

Bibliographical note

Funding Information:
This research was partially supported by NIH grant HL65462 and a UM AHC Faculty Research Development grant. The authors thank Stuart Levine and Rick Young for sharing the binding data. The authors thank the reviewers for helpful and constructive comments.

Cite this

@article{17541c468a58415dba68f372500c785b,
  title = "Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model",
  abstract = "Motivation: It is a common task in genomic studies to identify a subset of the genes satisfying certain conditions, such as differentially expressed genes or regulatory target genes of a transcription factor (TF). This can be formulated as a statistical hypothesis testing problem. Most existing approaches treat the genes as having an identical and independent distribution a priori, testing each gene independently or testing some subsets of the genes one by one. On the other hand, it is known that the genes work coordinately as dictated by gene networks. Treating genes equally and independently ignores the important information contained in gene networks, leading to inefficient analysis and reduced power. Results: We propose incorporating gene network information into statistical analysis of genomic data. Specifically, rather than treating the genes equally and independently a priori in a standard mixture model, we assume that gene-specific prior probabilities are correlated as induced by a gene network: while the genes are allowed to have different prior probabilities, those neighboring ones in the network have similar prior probabilities, reflecting their shared biological functions. We applied the two approaches to a real ChIP-chip dataset (and simulated data) to identify the transcriptional target genes of TF GCN4. The new method was found to be more powerful in discovering the target genes.",
  author = "Wei, Peng J. and Pan, Wei",
  note = "Funding Information: This research was partially supported by NIH grant HL65462 and a UM AHC Faculty Research Development grant. The authors thank Stuart Levine and Rick Young for sharing the binding data. The authors thank the reviewers for helpful and constructive comments.",
  year = "2008",
  month = feb,
  doi = "10.1093/bioinformatics/btm612",
  language = "English (US)",
  volume = "24",
  pages = "404--411",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "3",
}

TY  - JOUR
T1  - Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model
AU  - Wei, Peng J.
AU  - Pan, Wei
N1  - Funding Information: This research was partially supported by NIH grant HL65462 and a UM AHC Faculty Research Development grant. The authors thank Stuart Levine and Rick Young for sharing the binding data. The authors thank the reviewers for helpful and constructive comments.
PY  - 2008/2
Y1  - 2008/2
N2  - Motivation: It is a common task in genomic studies to identify a subset of the genes satisfying certain conditions, such as differentially expressed genes or regulatory target genes of a transcription factor (TF). This can be formulated as a statistical hypothesis testing problem. Most existing approaches treat the genes as having an identical and independent distribution a priori, testing each gene independently or testing some subsets of the genes one by one. On the other hand, it is known that the genes work coordinately as dictated by gene networks. Treating genes equally and independently ignores the important information contained in gene networks, leading to inefficient analysis and reduced power. Results: We propose incorporating gene network information into statistical analysis of genomic data. Specifically, rather than treating the genes equally and independently a priori in a standard mixture model, we assume that gene-specific prior probabilities are correlated as induced by a gene network: while the genes are allowed to have different prior probabilities, those neighboring ones in the network have similar prior probabilities, reflecting their shared biological functions. We applied the two approaches to a real ChIP-chip dataset (and simulated data) to identify the transcriptional target genes of TF GCN4. The new method was found to be more powerful in discovering the target genes.
AB  - Motivation: It is a common task in genomic studies to identify a subset of the genes satisfying certain conditions, such as differentially expressed genes or regulatory target genes of a transcription factor (TF). This can be formulated as a statistical hypothesis testing problem. Most existing approaches treat the genes as having an identical and independent distribution a priori, testing each gene independently or testing some subsets of the genes one by one. On the other hand, it is known that the genes work coordinately as dictated by gene networks. Treating genes equally and independently ignores the important information contained in gene networks, leading to inefficient analysis and reduced power. Results: We propose incorporating gene network information into statistical analysis of genomic data. Specifically, rather than treating the genes equally and independently a priori in a standard mixture model, we assume that gene-specific prior probabilities are correlated as induced by a gene network: while the genes are allowed to have different prior probabilities, those neighboring ones in the network have similar prior probabilities, reflecting their shared biological functions. We applied the two approaches to a real ChIP-chip dataset (and simulated data) to identify the transcriptional target genes of TF GCN4. The new method was found to be more powerful in discovering the target genes.
UR  - http://www.scopus.com/inward/record.url?scp=38849163722&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=38849163722&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btm612
DO  - 10.1093/bioinformatics/btm612
M3  - Article
C2  - 18083717
AN  - SCOPUS:38849163722
SN  - 1367-4803
VL  - 24
SP  - 404
EP  - 411
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 3
ER  -
