Are random forests better than support vector machines for microarray-based cancer classification?

Alexander Statnikov; Constantin F. Aliferis

Institute for Health Informatics

Research output: Contribution to journal › Article › peer-review

59 Scopus citations

Abstract

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate decision support algorithms available for microarray gene expression data is critical to developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, found that random forest classifiers outperform support vector machines. In the present paper we point to several biases of this prior work and conduct a new unbiased evaluation of the two algorithms. Our experiments using 18 diagnostic and prognostic datasets show that support vector machines outperform random forests, often by a large margin.

Original language	English (US)
Pages (from-to)	686-690
Number of pages	5
Journal	AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State	Published - 2007

Cite this

@article{4cd989272bf5469eaee290b3c4835ff9,

title = "Are random forests better than support vector machines for microarray-based cancer classification?",

abstract = "Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate decision support algorithms available for microarray gene expression data is critical to developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered {"}best of class{"} algorithms for classification of such data. Recent work, however, found that random forest classifiers outperform support vector machines. In the present paper we point to several biases of this prior work and conduct a new unbiased evaluation of the two algorithms. Our experiments using 18 diagnostic and prognostic datasets show that support vector machines outperform random forests, often by a large margin.",

author = "Statnikov, Alexander and Aliferis, {Constantin F.}",

year = "2007",

language = "English (US)",

pages = "686--690",

journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",