Predicting the biological functions of genes is an important problem in basic biology research, with applications in drug discovery and gene therapy. Previous studies have shown that either gene expression data or protein-protein interaction data alone can be used to predict gene function. In particular, clustering of gene expression profiles has been widely used for this purpose. In this paper, we first propose a new method for gene function prediction using protein-protein interaction data, designed so that its results can be readily combined with predictions based on clustering gene expression profiles. We then propose a new method that combines the predictions from the two data sources by weighting the evidence provided by each. Using protein-protein interaction data downloaded from the GRID database and published gene expression profiles from 300 microarray experiments for the yeast S. cerevisiae, we show that this combined analysis improves predictive performance over using either data source alone in a cross-validated analysis of the MIPS gene annotations. Finally, we propose a logistic regression method that is flexible enough to combine information from any number of data sources while remaining computationally feasible.
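As a rough illustration of the last idea, the sketch below fits a logistic regression that combines one evidence score per data source into a single probability that a gene has a given function. It is not the authors' implementation: the two features (a hypothetical interaction-based score and a hypothetical expression-cluster score), the toy values, and the names `fit_logistic` and `predict_proba` are all assumptions made for this example. Adding a further data source would only add a column to `X`.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this gene under the current model.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Combined probability that a gene has the function, given its scores."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data invented for illustration (not taken from GRID or MIPS).
# Each row: [interaction-based score, expression-cluster score] for one gene.
X = [
    [0.9, 0.8], [0.8, 0.3], [0.7, 0.9], [0.6, 0.7],  # genes with the function
    [0.2, 0.1], [0.3, 0.4], [0.1, 0.3], [0.4, 0.2],  # genes without it
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
p_high = predict_proba(w, b, [0.9, 0.9])  # strong evidence from both sources
p_low = predict_proba(w, b, [0.1, 0.1])   # weak evidence from both sources
```

In this sketch the fitted weights play the role of per-source evidence weights: a source whose score separates annotated from unannotated genes more cleanly tends to receive a larger coefficient.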
|Original language||English (US)|
|Number of pages||19|
|Journal||Journal of Bioinformatics and Computational Biology|
|State||Published - Dec 2005|
Bibliographical note
Funding Information:
We thank the editor and two reviewers for many constructive comments. This work was partially supported by NIH grant R01-HL65462. GX was also supported by a Merck fellowship.
Copyright 2008 Elsevier B.V., All rights reserved.
- Cluster analysis
- Combining p-values
- Logistic regression
- Naive Bayes
- Weighted average