Robust feature selection technique using rank aggregation

Chandrima Sarkar; Sarah Cooley; Jaideep Srivastava

doi:10.1080/08839514.2014.883903

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can leave researchers in a dilemma over which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, our algorithm improves the classification accuracy by approximately 3-4% in a dataset with fewer than 500 features and by more than 5% in a dataset with more than 500 features, across a wide range of classifiers. © 2014
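
The abstract does not say which aggregation rule the authors use, so the following is only a minimal sketch of the general idea of rank aggregation, assuming a simple Borda count. The function name `borda_aggregate`, the feature names, and the three example rankings are hypothetical and are not taken from the article. The robustness index (RI) is not sketched here because the abstract does not define it.

```python
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine several best-first feature rankings into one consensus ranking.

    Assumption: Borda count is used as the aggregation rule (the article's
    actual rule is not stated in the abstract). A feature at 0-based position
    p in a ranking of n features earns n - p points. Points are summed over
    all rankings and features are sorted by descending total, with ties
    broken alphabetically so the result is deterministic.
    """
    if not rankings:
        return []
    features = set(rankings[0])
    scores: Dict[str, int] = {f: 0 for f in features}
    for ranking in rankings:
        # Every ranker must order exactly the same features, without repeats.
        if len(ranking) != len(features) or set(ranking) != features:
            raise ValueError("every ranking must order the same set of features")
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings from three feature selection methods (illustrative only).
rankings = [
    ["f1", "f2", "f3", "f4"],  # e.g. an information-gain style ranker
    ["f2", "f1", "f4", "f3"],  # e.g. a chi-squared style ranker
    ["f1", "f3", "f2", "f4"],  # e.g. a correlation-based ranker
]
consensus = borda_aggregate(rankings)
top_k = consensus[:2]  # keep the two highest-ranked features
print(consensus, top_k)
```

The consensus list can then be truncated to the top k features and passed to any classifier, which is where agreement across classifiers, the property the abstract calls robustness, would be measured.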

Original language	English (US)
Pages (from-to)	243-257
Number of pages	15
Journal	Applied Artificial Intelligence
Volume	28
Issue number	3
DOIs	https://doi.org/10.1080/08839514.2014.883903
State	Published - Mar 16 2014

Bibliographical note

Funding Information:
The AML data resource in this work was supported by the National Institutes of Health/NCI grant P01 111412, PI Jeffrey S. Miller, M.D., utilizing the Masonic Cancer Center, University of Minnesota Oncology Medical Informatics shared resources. We would like to thank Atanu Roy for his critical reviews and technical feedback during the development of this research.

Cite this

@article{ef09730ad7874baeb3c053f79b581a39,

title = "Robust feature selection technique using rank aggregation",

abstract = "Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can leave researchers in a dilemma over which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, our algorithm improves the classification accuracy by approximately 3-4% in a dataset with fewer than 500 features and by more than 5% in a dataset with more than 500 features, across a wide range of classifiers. {\textcopyright} 2014",

author = "Chandrima Sarkar and Sarah Cooley and Jaideep Srivastava",

note = "Funding Information: The AML data resource in this work was supported by the National Institutes of Health/NCI grant P01 111412, PI Jeffrey S. Miller, M.D., utilizing the Masonic Cancer Center, University of Minnesota Oncology Medical Informatics shared resources. We would like to thank Atanu Roy for his critical reviews and technical feedback during the development of this research.",

year = "2014",

month = mar,

day = "16",

doi = "10.1080/08839514.2014.883903",

language = "English (US)",

volume = "28",

pages = "243--257",

journal = "Applied Artificial Intelligence",

issn = "0883-9514",

publisher = "Taylor and Francis Ltd.",

number = "3",

}

TY - JOUR

T1 - Robust feature selection technique using rank aggregation

AU - Sarkar, Chandrima

AU - Cooley, Sarah

AU - Srivastava, Jaideep

N1 - Funding Information: The AML data resource in this work was supported by the National Institutes of Health/NCI grant P01 111412, PI Jeffrey S. Miller, M.D., utilizing the Masonic Cancer Center, University of Minnesota Oncology Medical Informatics shared resources. We would like to thank Atanu Roy for his critical reviews and technical feedback during the development of this research.

PY - 2014/3/16

Y1 - 2014/3/16

N2 - Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can leave researchers in a dilemma over which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, our algorithm improves the classification accuracy by approximately 3-4% in a dataset with fewer than 500 features and by more than 5% in a dataset with more than 500 features, across a wide range of classifiers. © 2014

AB - Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can leave researchers in a dilemma over which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, our algorithm improves the classification accuracy by approximately 3-4% in a dataset with fewer than 500 features and by more than 5% in a dataset with more than 500 features, across a wide range of classifiers. © 2014

UR - http://www.scopus.com/inward/record.url?scp=84896352368&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84896352368&partnerID=8YFLogxK

U2 - 10.1080/08839514.2014.883903

DO - 10.1080/08839514.2014.883903

M3 - Article

AN - SCOPUS:84896352368

SN - 0883-9514

VL - 28

SP - 243

EP - 257

JO - Applied Artificial Intelligence

JF - Applied Artificial Intelligence

IS - 3

ER -
