Abstract
Building reliable predictive models from multiple complementary genomic data sources is a crucial step towards successful cancer treatment and a full understanding of the underlying biological principles. To tackle this challenging data integration problem, we propose a hypergraph-based learning algorithm called HyperGene that integrates microarray gene expression data and protein-protein interactions for cancer outcome prediction and biomarker identification. HyperGene is a robust two-step iterative method that alternately finds the optimal outcome prediction and the optimal weighting of the marker genes, guided by a protein-protein interaction network. Under the hypothesis that cancer-related genes tend to interact with each other, the HyperGene algorithm uses a protein-protein interaction network as prior knowledge by imposing a consistent weighting of interacting genes. Our experimental results on two large-scale breast cancer gene expression datasets show that HyperGene, using a curated protein-protein interaction network, achieves significantly improved cancer outcome prediction. Moreover, HyperGene can also retrieve many known cancer genes as highly weighted marker genes.
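The two-step alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that samples are hypergraph vertices and genes are hyperedges (the abstract does not state this), uses standard normalized hypergraph label propagation for the prediction step, and substitutes a simplified graph-smoothed update for the gene-weighting step. The function names, the parameters `alpha` and `rho`, and the toy data are all hypothetical.

```python
import numpy as np

def propagate_labels(H, w, y, alpha=0.5):
    """Step 1 (assumed form): with gene weights w fixed, score samples by
    label propagation on the normalized hypergraph."""
    dv = H @ w                          # weighted degree of each sample
    de = H.sum(axis=0)                  # number of samples in each hyperedge
    Hn = H / np.sqrt(dv)[:, None]       # rows scaled by dv^(-1/2)
    theta = Hn @ np.diag(w / de) @ Hn.T
    n = H.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * theta, y)

def update_gene_weights(H, w, f, A, rho=1.0):
    """Step 2 (simplified surrogate, not the paper's rule): score each gene by
    how consistently its samples are predicted, then smooth the scores over
    the interaction network so interacting genes get similar weights."""
    dv = H @ w
    de = H.sum(axis=0)
    s = (H.T @ (f / np.sqrt(dv))) ** 2 / de   # per-gene agreement score
    L = np.diag(A.sum(axis=1)) - A            # graph Laplacian of the network
    m = H.shape[1]
    # Minimizes ||w - s||^2 + rho * w^T L w in closed form.
    w_new = np.linalg.solve(np.eye(m) + rho * L, s)
    w_new = np.maximum(w_new, 1e-12)          # keep weights strictly positive
    return w_new * m / w_new.sum()            # rescale so weights sum to m

def alternate(H, y, A, n_iter=10, alpha=0.5, rho=1.0):
    """Alternate between the two steps, then return final scores and weights."""
    w = np.ones(H.shape[1])
    for _ in range(n_iter):
        f = propagate_labels(H, w, y, alpha)
        w = update_gene_weights(H, w, f, A, rho)
    return propagate_labels(H, w, y, alpha), w

# Toy data (hypothetical): 6 samples, 4 genes; H[i, j] = 1 if sample i
# belongs to the hyperedge of gene j.
H = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
y = np.array([1, 1, 0, 0, -1, -1], dtype=float)   # 0 marks an unlabeled sample
A = np.array([[0, 1, 0, 0],                       # genes 0-1 and 2-3 interact
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

f, w = alternate(H, y, A)
print("sample scores:", np.round(f, 3))
print("gene weights:", np.round(w, 3))
```

In this sketch the sign of each entry of `f` serves as the predicted outcome for the corresponding sample, and larger entries of `w` indicate genes that the procedure treats as stronger markers.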
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008 |
Pages | 293-302 |
Number of pages | 10 |
State | Published - Dec 1 2008 |
Event | 8th IEEE International Conference on Data Mining, ICDM 2008 - Pisa, Italy. Duration: Dec 15 2008 → Dec 19 2008 |