Exploiting unlabeled data in ensemble methods

Kristin P. Bennett; Ayhan Demiriz; Richard Maclin

Computer Science (Duluth)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

147 Scopus citations

Abstract

An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space as measured on both the labeled and unlabeled data. Unlike alternative approaches, ASSEMBLE does not require a semi-supervised learning method for the base classifier. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets using both decision trees and neural networks support the proposed method.
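The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D decision stumps, the nearest-neighbor initialization of pseudo-classes, the AdaBoost-style exponential weights and step size, and all function names (`fit_stump`, `assemble_sketch`, and so on) are assumptions chosen only to keep the example small and self-contained. It covers the two-class case with labels in {-1, +1}.

```python
import math

def fit_stump(xs, ys, ws):
    """Return (threshold, polarity, weighted_error) of the best 1-D decision stump."""
    pts = sorted(set(xs))
    # Candidate thresholds: one below all points, then midpoints between neighbors.
    thresholds = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = None
    for t in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (pol if x > t else -pol) != y)
            if best is None or err < best[2]:
                best = (t, pol, err)
    return best

def stump_predict(stump, x):
    t, pol, _ = stump
    return pol if x > t else -pol

def ensemble_score(ensemble, x):
    """Weighted vote of all base classifiers built so far."""
    return sum(alpha * stump_predict(s, x) for alpha, s in ensemble)

def sign(v):
    return 1 if v >= 0 else -1

def assemble_sketch(x_lab, y_lab, x_unl, rounds=10):
    # Assumption: initial pseudo-classes come from the nearest labeled point.
    pseudo = [y_lab[min(range(len(x_lab)), key=lambda j: abs(x_lab[j] - x))]
              for x in x_unl]
    xs = list(x_lab) + list(x_unl)
    ensemble = []
    for _ in range(rounds):
        ys = list(y_lab) + pseudo
        # Cost-sensitive weights: points with a small margin get more weight.
        ws = [math.exp(-y * ensemble_score(ensemble, x)) for x, y in zip(xs, ys)]
        total = sum(ws)
        ws = [w / total for w in ws]
        # Build the next base classifier on labeled plus pseudo-labeled data.
        stump = fit_stump(xs, ys, ws)
        err = stump[2]
        if err >= 0.5:
            break  # no better than chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, stump))
        # Reassign pseudo-classes using the updated ensemble.
        pseudo = [sign(ensemble_score(ensemble, x)) for x in x_unl]
        if err == 0:
            break  # weighted data is fit perfectly; further rounds add nothing
    return ensemble
```

Calling `assemble_sketch([0.0, 10.0], [-1, 1], [1.0, 2.0, 3.0, 7.0, 8.0, 9.0])` returns a list of `(alpha, stump)` pairs, and `sign(ensemble_score(ensemble, x))` classifies a new point `x`. Any cost-sensitive base learner could stand in for `fit_stump`, which is the property the abstract emphasizes.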

Original language	English (US)
Title of host publication	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors	D. Hand, D. Keim, R. Ng
Pages	289-296
Number of pages	8
State	Published - Dec 1 2002
Event	KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Edmonton, Alta, Canada; Duration: Jul 23 2002 → Jul 26 2002

Other

Other	KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory	Canada
City	Edmonton, Alta
Period	7/23/02 → 7/26/02

Keywords

Boosting
Classification
Ensemble learning
Semi-supervised learning

Cite this

Bennett, KP, Demiriz, A & Maclin, R 2002, Exploiting unlabeled data in ensemble methods. in D Hand, D Keim & R Ng (eds), Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 289-296, KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alta, Canada, 7/23/02.

@inproceedings{30c2290a9ee0422598cce8eba20ebaff,

title = "Exploiting unlabeled data in ensemble methods",

abstract = "An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning {"}pseudo-classes{"} to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space as measured on both the labeled and unlabeled data. Unlike alternative approaches, ASSEMBLE does not require a semi-supervised learning method for the base classifier. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets using both decision trees and neural networks support the proposed method.",

keywords = "Boosting, Classification, Ensemble learning, Semi-supervised learning",

author = "Bennett, {Kristin P.} and Ayhan Demiriz and Richard Maclin",

year = "2002",

month = dec,

day = "1",

language = "English (US)",

pages = "289--296",

editor = "D. Hand and D. Keim and R. Ng",

booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

note = "KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 23-07-2002 Through 26-07-2002",

}

TY - GEN

T1 - Exploiting unlabeled data in ensemble methods

AU - Bennett, Kristin P.

AU - Demiriz, Ayhan

AU - Maclin, Richard

PY - 2002/12/1

Y1 - 2002/12/1

N2 - An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space as measured on both the labeled and unlabeled data. Unlike alternative approaches, ASSEMBLE does not require a semi-supervised learning method for the base classifier. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets using both decision trees and neural networks support the proposed method.

KW - Boosting

KW - Classification

KW - Ensemble learning

KW - Semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=0242456809&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0242456809&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0242456809

SP - 289

EP - 296

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

A2 - Hand, D.

A2 - Keim, D.

A2 - Ng, R.

T2 - KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Y2 - 23 July 2002 through 26 July 2002

ER -
