I/O Scalable Bregman co-clustering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Consider an M×N matrix whose (i,j)-th entry represents the affinity between the i-th entity of the first type and the j-th entity of the second type. Co-clustering clusters both types of entities simultaneously, using the affinities as the information that guides the clustering. Co-clustering has been found to achieve clustering and dimensionality reduction at the same time, and it is therefore finding application in a variety of problems. The recently proposed Bregman co-clustering algorithm converts the co-clustering task into a search for an optimal approximation matrix. It is much more scalable, but memory-based implementations have a severe computational bottleneck. In this paper we show that a significant fraction of the computations performed by the Bregman co-clustering algorithm map naturally to those performed by an on-line analytical processing (OLAP) engine, making the latter a well-suited data management engine for the algorithm. Based on this observation, we have developed a version of the Bregman co-clustering algorithm that works on top of OLAP. Our experiments show that this version is much more scalable, achieving an order-of-magnitude performance improvement over the memory-based implementation. We believe this unlocks the power of this novel technique for application to much larger datasets.
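
To illustrate the kind of mapping the abstract describes, here is a minimal sketch, not the paper's implementation. It assumes the simplest summary statistic a co-clustering step might need, the mean affinity of each (row cluster, column cluster) block, and computes it with a SQL GROUP BY, the sort of aggregate an OLAP engine evaluates. The table names (affinity, row_assign, col_assign), the function name cocluster_means, and the use of SQLite as a stand-in for an OLAP engine are all hypothetical choices made for this example.

```python
import sqlite3


def cocluster_means(entries, row_cluster, col_cluster):
    """Return {(row_cluster, col_cluster): mean affinity} via a SQL GROUP BY.

    entries      -- iterable of (i, j, value) triples of the affinity matrix
    row_cluster  -- dict mapping row index i to its row-cluster id
    col_cluster  -- dict mapping column index j to its column-cluster id
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: one fact table plus two cluster-assignment tables.
    con.execute("CREATE TABLE affinity (i INTEGER, j INTEGER, val REAL)")
    con.execute("CREATE TABLE row_assign (i INTEGER PRIMARY KEY, g INTEGER)")
    con.execute("CREATE TABLE col_assign (j INTEGER PRIMARY KEY, h INTEGER)")
    con.executemany("INSERT INTO affinity VALUES (?, ?, ?)", list(entries))
    con.executemany("INSERT INTO row_assign VALUES (?, ?)",
                    list(row_cluster.items()))
    con.executemany("INSERT INTO col_assign VALUES (?, ?)",
                    list(col_cluster.items()))
    # Join each matrix entry to its cluster labels, then aggregate per block.
    rows = con.execute(
        """
        SELECT r.g, c.h, AVG(a.val)
        FROM affinity AS a
        JOIN row_assign AS r ON r.i = a.i
        JOIN col_assign AS c ON c.j = a.j
        GROUP BY r.g, c.h
        """
    ).fetchall()
    con.close()
    return {(g, h): mean for g, h, mean in rows}


# Toy 3x2 affinity matrix: rows 0 and 1 share a cluster, row 2 is alone.
entries = [
    (0, 0, 1.0), (0, 1, 3.0),
    (1, 0, 2.0), (1, 1, 4.0),
    (2, 0, 10.0), (2, 1, 20.0),
]
row_cluster = {0: 0, 1: 0, 2: 1}
col_cluster = {0: 0, 1: 1}
means = cocluster_means(entries, row_cluster, col_cluster)
```

In this sketch the matrix never has to be held as a dense in-memory array; the engine scans the stored entries and returns only the per-block aggregates, which is the property that makes such a formulation attractive for larger datasets.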

Original language: English (US)
Title of host publication: Advances in Knowledge Discovery and Data Mining - 12th Pacific-Asia Conference, PAKDD 2008, Proceedings
Pages: 896-903
Number of pages: 8
State: Published - 2008
Event: 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2008 - Osaka, Japan
Duration: May 20, 2008 to May 23, 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5012 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Bregman co-clustering
  • Data cube
  • OLAP
  • SQL
