Consensus clustering of gene expression data and its application to gene function prediction

Guanghua Xiao; Wei Pan

doi:10.1198/106186007X237838

Biostatistics

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Predicting functions of genes is an important issue in biology. Clustering gene expression profiles has been widely used for gene function prediction, but most clustering methods are unstable and sensitive to input parameters such as starting values and number of clusters. In this article, we develop a novel consensus clustering method to address the instability issue and thus improve the performance of clustering methods. The biological function of an unannotated gene is predicted based on the most enriched functional category in its consensus cluster. The MIPS gene annotations are used to evaluate the predictive performance. It is shown that the consensus clustering-based classification method has a significantly better predictive performance than a previously used clustering-based classification method while performing as well as support vector machines (SVMs). In addition to the obvious applicability of consensus clustering to unsupervised learning, the method's advantages in supervised learning include its being a multiclass classifier that can be trained much faster than SVMs, its generality to include any of the many existing clustering algorithms, and its flexibility to be integrated with other predictive models built with other types of data, suggesting its potential for further improved performance. As a concrete example, we consider its combined use with protein-protein interaction data for gene function prediction. It is shown that the combined analysis has a significantly higher predictive accuracy and a much broader functional coverage than using either data source alone.
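The abstract does not say how the consensus is formed, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes a co-association scheme: k-means is rerun with different starting values and numbers of clusters, the fraction of runs in which two genes co-cluster is recorded, and consensus clusters are read off by thresholding that fraction. An unannotated gene is then assigned the category most enriched among its annotated cluster-mates, scored here with a hypergeometric tail probability. All function names, the 0.5 threshold, and the choice of k-means and of the enrichment test are illustrative assumptions.

```python
import random
from math import comb

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def co_association(points, ks, runs, seed=0):
    """Fraction of base clusterings in which each pair of genes lands in one cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for k in ks:                      # vary the number of clusters
        for _ in range(runs):         # vary the starting values
            labels = kmeans(points, k, rng)
            total += 1
            for i in range(n):
                for j in range(n):
                    counts[i][j] += labels[i] == labels[j]
    return [[c / total for c in row] for row in counts]

def consensus_clusters(coassoc, threshold=0.5):
    """Connected components of the graph joining pairs with co-association > threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def hypergeom_tail(N, K, n, x):
    """P(X >= x) when drawing n genes from N, of which K carry the category."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(x, min(K, n) + 1)) / comb(N, n)

def predict_function(gene, labels, annotations):
    """Category with the smallest enrichment p-value among annotated cluster-mates."""
    annotated = [g for g in range(len(labels)) if g in annotations and g != gene]
    mates = [g for g in annotated if labels[g] == labels[gene]]
    if not mates:
        return None                   # no annotated neighbor, so no prediction
    best = None
    for cat in set(annotations[g] for g in mates):
        K = sum(annotations[g] == cat for g in annotated)
        x = sum(annotations[g] == cat for g in mates)
        p = hypergeom_tail(len(annotated), K, len(mates), x)
        if best is None or p < best[0]:
            best = (p, cat)
    return best[1]
```

In use, `points` would be a list of expression profiles (one tuple per gene) and `annotations` a dictionary from gene index to a functional category label, for instance one drawn from MIPS. Chaining `co_association`, `consensus_clusters`, and `predict_function` then yields a predicted category, or `None`, for each unannotated gene.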

Original language	English (US)
Pages (from-to)	733-751
Number of pages	19
Journal	Journal of Computational and Graphical Statistics
Volume	16
Issue number	3
DOIs	https://doi.org/10.1198/106186007X237838
State	Published - Sep 2007

Bibliographical note

Funding Information:
The authors are grateful to the two reviewers, an AE, and the editor for helpful comments. GX was supported by a Merck Fellowship; WP was partially supported by NIH grant HL65462 and a UM AHC FRD grant.

Keywords

Classification
Cross-validation
Gene annotation
Integrative analysis
Microarray
Protein-protein interaction

Cite this

@article{a9242f1f5d084613a6a1d74880b1d84b,

title = "Consensus clustering of gene expression data and its application to gene function prediction",

abstract = "Predicting functions of genes is an important issue in biology. Clustering gene expression profiles has been widely used for gene function prediction, but most clustering methods are unstable and sensitive to input parameters such as starting values and number of clusters. In this article, we develop a novel consensus clustering method to address the instability issue and thus improve the performance of clustering methods. The biological function of an unannotated gene is predicted based on the most enriched functional category in its consensus cluster. The MIPS gene annotations are used to evaluate the predictive performance. It is shown that the consensus clustering-based classification method has a significantly better predictive performance than a previously used clustering-based classification method while performing as well as support vector machines (SVMs). In addition to the obvious applicability of consensus clustering to unsupervised learning, the method's advantages in supervised learning include its being a multiclass classifier that can be trained much faster than SVMs, its generality to include any of the many existing clustering algorithms, and its flexibility to be integrated with other predictive models built with other types of data, suggesting its potential for further improved performance. As a concrete example, we consider its combined use with protein-protein interaction data for gene function prediction. It is shown that the combined analysis has a significantly higher predictive accuracy and a much broader functional coverage than using either data source alone.",

keywords = "Classification, Cross-validation, Gene annotation, Integrative analysis, Microarray, Protein-protein interaction",

author = "Guanghua Xiao and Wei Pan",

note = "Funding Information: The authors are grateful to the two reviewers, an AE, and the editor for helpful comments. GX was supported by a Merck Fellowship; WP was partially supported by NIH grant HL65462 and a UM AHC FRD grant.",

year = "2007",

month = sep,

doi = "10.1198/106186007X237838",

language = "English (US)",

volume = "16",

pages = "733--751",

journal = "Journal of Computational and Graphical Statistics",

issn = "1061-8600",

publisher = "American Statistical Association",

number = "3",

}

TY - JOUR

T1 - Consensus clustering of gene expression data and its application to gene function prediction

AU - Xiao, Guanghua

AU - Pan, Wei

N1 - Funding Information: The authors are grateful to the two reviewers, an AE, and the editor for helpful comments. GX was supported by a Merck Fellowship; WP was partially supported by NIH grant HL65462 and a UM AHC FRD grant.

PY - 2007/9

Y1 - 2007/9

AB - Predicting functions of genes is an important issue in biology. Clustering gene expression profiles has been widely used for gene function prediction, but most clustering methods are unstable and sensitive to input parameters such as starting values and number of clusters. In this article, we develop a novel consensus clustering method to address the instability issue and thus improve the performance of clustering methods. The biological function of an unannotated gene is predicted based on the most enriched functional category in its consensus cluster. The MIPS gene annotations are used to evaluate the predictive performance. It is shown that the consensus clustering-based classification method has a significantly better predictive performance than a previously used clustering-based classification method while performing as well as support vector machines (SVMs). In addition to the obvious applicability of consensus clustering to unsupervised learning, the method's advantages in supervised learning include its being a multiclass classifier that can be trained much faster than SVMs, its generality to include any of the many existing clustering algorithms, and its flexibility to be integrated with other predictive models built with other types of data, suggesting its potential for further improved performance. As a concrete example, we consider its combined use with protein-protein interaction data for gene function prediction. It is shown that the combined analysis has a significantly higher predictive accuracy and a much broader functional coverage than using either data source alone.

KW - Classification

KW - Cross-validation

KW - Gene annotation

KW - Integrative analysis

KW - Microarray

KW - Protein-protein interaction

UR - http://www.scopus.com/inward/record.url?scp=35348969985&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35348969985&partnerID=8YFLogxK

U2 - 10.1198/106186007X237838

DO - 10.1198/106186007X237838

M3 - Article

AN - SCOPUS:35348969985

SN - 1061-8600

VL - 16

SP - 733

EP - 751

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

IS - 3

ER -
