I/O Scalable Bregman co-clustering

Kuo Wei Hsu; Arindam Banerjee; Jaideep Srivastava

doi:10.1007/978-3-540-68125-0_90

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Consider an M×N matrix whose (i,j)-th entry represents the affinity between the i-th entity of the first type and the j-th entity of the second type. Co-clustering simultaneously clusters both types of entities, using the affinities as the information guiding the clustering. Because co-clustering achieves clustering and dimensionality reduction at the same time, it is finding application in a variety of problems. The recently proposed Bregman co-clustering algorithm converts the co-clustering task into a search for an optimal approximation matrix. Although the algorithm is much more scalable, its memory-based implementations have a severe computational bottleneck. In this paper we show that a significant fraction of the computations performed by the Bregman co-clustering algorithm map naturally to those performed by an online analytical processing (OLAP) engine, making the latter a well-suited data management engine for the algorithm. Based on this observation, we have developed a version of the Bregman co-clustering algorithm that works on top of OLAP. Our experiments show that this version is much more scalable, achieving an order-of-magnitude performance improvement over the memory-based implementation. We believe this unlocks the power of this novel technique for application to much larger datasets.
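The correspondence between co-clustering computations and OLAP-style aggregation can be illustrated with a small sketch. It is not the authors' implementation: it assumes the simplest case, in which the approximation matrix is built from co-cluster (block) means, and it uses an in-memory SQLite database as a stand-in for an OLAP engine. The table names (affinity, row_map, col_map), the column names, and both function names are hypothetical.

```python
import sqlite3


def cocluster_means_sql(entries, row_cluster, col_cluster):
    """Compute co-cluster (block) means with a GROUP BY aggregation.

    entries: list of (i, j, value) triples of the affinity matrix.
    row_cluster / col_cluster: dicts mapping a row / column index to a cluster id.
    All table and column names are hypothetical, chosen for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE affinity (i INTEGER, j INTEGER, value REAL)")
    conn.execute("CREATE TABLE row_map (i INTEGER PRIMARY KEY, g INTEGER)")
    conn.execute("CREATE TABLE col_map (j INTEGER PRIMARY KEY, h INTEGER)")
    conn.executemany("INSERT INTO affinity VALUES (?, ?, ?)", entries)
    conn.executemany("INSERT INTO row_map VALUES (?, ?)", list(row_cluster.items()))
    conn.executemany("INSERT INTO col_map VALUES (?, ?)", list(col_cluster.items()))
    # The aggregation is pushed to the database: one mean per (row cluster, column cluster).
    rows = conn.execute(
        """
        SELECT r.g, c.h, AVG(a.value)
        FROM affinity AS a
        JOIN row_map AS r ON a.i = r.i
        JOIN col_map AS c ON a.j = c.j
        GROUP BY r.g, c.h
        """
    ).fetchall()
    conn.close()
    return {(g, h): mean for g, h, mean in rows}


def cocluster_means_memory(entries, row_cluster, col_cluster):
    """Memory-based reference: the same block means via explicit loops."""
    sums, counts = {}, {}
    for i, j, value in entries:
        key = (row_cluster[i], col_cluster[j])
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# A 4x4 affinity matrix with two row clusters and two column clusters.
matrix = [
    [1.0, 2.0, 9.0, 8.0],
    [2.0, 1.0, 8.0, 9.0],
    [7.0, 8.0, 1.0, 2.0],
    [8.0, 7.0, 2.0, 1.0],
]
entries = [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]
row_cluster = {0: 0, 1: 0, 2: 1, 3: 1}
col_cluster = {0: 0, 1: 0, 2: 1, 3: 1}

sql_means = cocluster_means_sql(entries, row_cluster, col_cluster)
memory_means = cocluster_means_memory(entries, row_cluster, col_cluster)
```

Both functions return the same four block means. The sketch only shows that the summary statistics can be phrased as a grouped aggregation that a database engine evaluates without loading the full matrix into application memory; the cluster-reassignment steps of the algorithm are not covered.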

Original language	English (US)
Title of host publication	Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings
Pages	896-903
Number of pages	8
DOIs	https://doi.org/10.1007/978-3-540-68125-0_90
State	Published - 2008
Event	12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 - Osaka, Japan
Duration	May 20 2008 → May 23 2008

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5012 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008
Country/Territory	Japan
City	Osaka
Period	5/20/08 → 5/23/08

Keywords

Bregman co-clustering
Data cube
OLAP
SQL

Cite this

Hsu, K. W., Banerjee, A., & Srivastava, J. (2008). I/O Scalable Bregman co-clustering. In Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings (pp. 896-903). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5012 LNAI). https://doi.org/10.1007/978-3-540-68125-0_90

I/O Scalable Bregman co-clustering. / Hsu, Kuo Wei; Banerjee, Arindam; Srivastava, Jaideep.
Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings. 2008. p. 896-903 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5012 LNAI).

Hsu, KW, Banerjee, A & Srivastava, J 2008, I/O Scalable Bregman co-clustering. in Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5012 LNAI, pp. 896-903, 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008, Osaka, Japan, 5/20/08. https://doi.org/10.1007/978-3-540-68125-0_90

@inproceedings{fda16cde594d44508dd44de6be447dc1,

title = "I/O Scalable Bregman co-clustering",

keywords = "Bregman co-clustering, Data cube, OLAP, SQL",

author = "Hsu, {Kuo Wei} and Arindam Banerjee and Jaideep Srivastava",

year = "2008",

doi = "10.1007/978-3-540-68125-0_90",

language = "English (US)",

isbn = "3540681248",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

pages = "896--903",

booktitle = "Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings",

note = "12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 ; Conference date: 20-05-2008 Through 23-05-2008",

}

TY - GEN

T1 - I/O Scalable Bregman co-clustering

AU - Hsu, Kuo Wei

AU - Banerjee, Arindam

AU - Srivastava, Jaideep

PY - 2008

Y1 - 2008

KW - Bregman co-clustering

KW - Data cube

KW - OLAP

KW - SQL

UR - http://www.scopus.com/inward/record.url?scp=44649095199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44649095199&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-68125-0_90

DO - 10.1007/978-3-540-68125-0_90

M3 - Conference contribution

AN - SCOPUS:44649095199

SN - 3540681248

SN - 9783540681243

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 896

EP - 903

BT - Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings

T2 - 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008

Y2 - 20 May 2008 through 23 May 2008

ER -
