As various genome sequencing projects have already been completed or are near completion, genome researchers are shifting their focus to functional genomics. Functional genomics represents the next phase, that expands the biological investigation to studying the functionality of genes of a single organism as well as studying and correlating the functionality of genes across many different organisms. Recently developed methods for monitoring genome-wide mRNA expression changes hold the promise of allowing us to inexpensively gain insights into the function of unknown genes. In this paper we focus on evaluating the feasibility of using supervised machine learning methods for determining the function of genes based solely on their expression profiles. We experimentally evaluate the performance of traditional classification algorithms such as support vector machines and k-nearest neighbors on the yeast genome, and present new approaches for classification that improve the overall recall with moderate reductions in precision. Our experiments show that the accuracies achieved for different classes varies dramatically. In analyzing these results we show that the achieved accuracy is highly dependent on whether or not the genes of that class were significantly active during the various experimental conditions, suggesting that gene expression profiles can become a viable alternative to sequence similarity searches provided that the genes are observed under a wide range of experimental conditions.
|Original language||English (US)|
|Number of pages||20|
|Journal||International Journal on Artificial Intelligence Tools|
|State||Published - Aug 2005|
Bibliographical noteFunding Information:
∗This work was supported in part by NSF ACI-0133464 and ACI-0312828; the Digital Technology Center at the University of Minnesota; and by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory (ARL) under Cooperative Agreement number DAAD19-01-2-0014. The content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.
- Expression profiles
- Gene classification