Big data clustering via random sketching and validation

Panagiotis A. Traganitis; Konstantinos Slavakis; Georgios B. Giannakis

doi:10.1109/ACSSC.2014.7094614

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

As the number and dimensionality of data increase, development of new efficient processing tools has become a necessity. The present paper introduces a novel dimensionality reduction approach for fast and efficient clustering of high-dimensional data. The new methods extend random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to the dimensionality reduction problem. The advocated random sketching and validation K-means (SkeVa K-means) and Divergence SkeVa algorithms can achieve high performance, with the latter being able to afford a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
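The hypothesize-and-validate idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SkeVa K-means: the abstract does not specify how sketches are drawn or how candidates are scored, so sampling feature indices uniformly, scoring by within-cluster cost on a second random feature subset, and every function and parameter name below (`sketch_and_validate`, `d`, `trials`, and so on) are assumptions made purely for illustration.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def project(points, idx):
    """Keep only the feature columns listed in idx (the 'sketch')."""
    return [[p[j] for j in idx] for p in points]

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def within_cluster_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = [sum(col) / len(members) for col in zip(*members)]
            cost += sum(sqdist(p, mean) for p in members)
    return cost

def sketch_and_validate(points, k, d, trials, seed=0):
    """Hypothetical RANSAC-style loop (an assumption, not the paper's method):
    cluster on d random features, score the labels on d other random
    features, and keep the lowest-cost candidate."""
    rng = random.Random(seed)
    n_features = len(points[0])
    best_labels, best_cost = None, float("inf")
    for _ in range(trials):
        sketch_idx = rng.sample(range(n_features), d)   # hypothesize
        labels = kmeans(project(points, sketch_idx), k, rng)
        val_idx = rng.sample(range(n_features), d)      # validate
        cost = within_cluster_cost(project(points, val_idx), labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost
```

Each trial runs K-means on only `d` of the available features, which is where a scheme of this kind would save computation; the validation step then guards against an unlucky draw of features.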

Original language	English (US)
Title of host publication	Conference Record of the 48th Asilomar Conference on Signals, Systems and Computers
Editors	Michael B. Matthews
Publisher	IEEE Computer Society
Pages	1046-1050
Number of pages	5
ISBN (Electronic)	9781479982974
DOIs	https://doi.org/10.1109/ACSSC.2014.7094614
State	Published - Apr 24 2015
Event	48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2014 - Pacific Grove, United States; Duration: Nov 2 2014 → Nov 5 2014

Publication series

Name	Conference Record - Asilomar Conference on Signals, Systems and Computers
Volume	2015-April
ISSN (Print)	1058-6393

Other

Other	48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2014
Country/Territory	United States
City	Pacific Grove
Period	11/2/14 → 11/5/14

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

Keywords

Clustering
K-means
big data
feature selection
high-dimensional data
random sampling and consensus
random sketching and validation

Cite this

Traganitis, P. A., Slavakis, K., & Giannakis, G. B. (2015). Big data clustering via random sketching and validation. In M. B. Matthews (Ed.), Conference Record of the 48th Asilomar Conference on Signals, Systems and Computers (pp. 1046-1050). Article 7094614 (Conference Record - Asilomar Conference on Signals, Systems and Computers; Vol. 2015-April). IEEE Computer Society. https://doi.org/10.1109/ACSSC.2014.7094614

@inproceedings{1114c8e875e4410ba470e6f6572efe72,
  title = "Big data clustering via random sketching and validation",
  abstract = "As the number and dimensionality of data increase, development of new efficient processing tools has become a necessity. The present paper introduces a novel dimensionality reduction approach for fast and efficient clustering of high-dimensional data. The new methods extend random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to the dimensionality reduction problem. The advocated random sketching and validation K-means (SkeVa K-means) and Divergence SkeVa algorithms can achieve high performance, with the latter being able to afford a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.",
  keywords = "Clustering, K-means, big data, feature selection, high-dimensional data, random sampling and consensus, random sketching and validation",
  author = "Traganitis, {Panagiotis A.} and Slavakis, Konstantinos and Giannakis, {Georgios B.}",
  note = "Publisher Copyright: {\textcopyright} 2014 IEEE; 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2014; Conference date: 02-11-2014 through 05-11-2014",
  year = "2015",
  month = apr,
  day = "24",
  doi = "10.1109/ACSSC.2014.7094614",
  language = "English (US)",
  series = "Conference Record - Asilomar Conference on Signals, Systems and Computers",
  publisher = "IEEE Computer Society",
  pages = "1046--1050",
  editor = "Matthews, {Michael B.}",
  booktitle = "Conference Record of the 48th Asilomar Conference on Signals, Systems and Computers",
}

TY - GEN
T1 - Big data clustering via random sketching and validation
AU - Traganitis, Panagiotis A.
AU - Slavakis, Konstantinos
AU - Giannakis, Georgios B.
PY - 2015/4/24
Y1 - 2015/4/24
AB - As the number and dimensionality of data increase, development of new efficient processing tools has become a necessity. The present paper introduces a novel dimensionality reduction approach for fast and efficient clustering of high-dimensional data. The new methods extend random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to the dimensionality reduction problem. The advocated random sketching and validation K-means (SkeVa K-means) and Divergence SkeVa algorithms can achieve high performance, with the latter being able to afford a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
KW - Clustering
KW - K-means
KW - big data
KW - feature selection
KW - high-dimensional data
KW - random sampling and consensus
KW - random sketching and validation
UR - http://www.scopus.com/inward/record.url?scp=84940479443&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84940479443&partnerID=8YFLogxK
U2 - 10.1109/ACSSC.2014.7094614
DO - 10.1109/ACSSC.2014.7094614
M3 - Conference contribution
AN - SCOPUS:84940479443
T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers
SP - 1046
EP - 1050
BT - Conference Record of the 48th Asilomar Conference on Signals, Systems and Computers
A2 - Matthews, Michael B.
PB - IEEE Computer Society
T2 - 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2014
Y2 - 2 November 2014 through 5 November 2014
ER -
