Co-clustering as multilinear decomposition with sparse latent factors

Evangelos E. Papalexakis; Nicholas D. Sidiropoulos

doi:10.1109/ICASSP.2011.5946731

Co-clustering as multilinear decomposition with sparse latent factors

Evangelos E. Papalexakis, Nicholas D. Sidiropoulos

Electrical and Computer Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Scopus citations

Abstract

The K-means clustering problem seeks to partition the columns of a data matrix in subsets, such that columns in the same subset are 'close' to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce 'coherent' groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important, for PARAFAC is unique under mild conditions - and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity pays a collateral dividend: as one increases the number of fitted co-clusters, new co-clusters are added without affecting those previously extracted. An important corollary is that co-clusters can be extracted incrementally; this implies that the algorithm scales well for large datasets. We demonstrate the validity of our approach using the ENRON corpus, as well as synthetic data.

Original language	English (US)
Title of host publication	2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages	2064-2067
Number of pages	4
DOIs	https://doi.org/10.1109/ICASSP.2011.5946731
State	Published - Aug 18 2011
Event	36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic Duration: May 22 2011 → May 27 2011

Other

Other	36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory	Czech Republic
City	Prague
Period	5/22/11 → 5/27/11

Access

10.1109/ICASSP.2011.5946731

OpenUrl availability

Full text

Cite this

Papalexakis, EE & Sidiropoulos, ND 2011, Co-clustering as multilinear decomposition with sparse latent factors. in 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings., 5946731, pp. 2064-2067, 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, Czech Republic, 5/22/11. https://doi.org/10.1109/ICASSP.2011.5946731

@inproceedings{8abbcf63a1b34e68969343dc743d8501,

title = "Co-clustering as multilinear decomposition with sparse latent factors",

abstract = "The K-means clustering problem seeks to partition the columns of a data matrix in subsets, such that columns in the same subset are 'close' to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce 'coherent' groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important, for PARAFAC is unique under mild conditions - and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity pays a collateral dividend: as one increases the number of fitted co-clusters, new co-clusters are added without affecting those previously extracted. An important corollary is that co-clusters can be extracted incrementally; this implies that the algorithm scales well for large datasets. We demonstrate the validity of our approach using the ENRON corpus, as well as synthetic data.",

author = "Papalexakis, {Evangelos E.} and Sidiropoulos, {Nicholas D.}",

year = "2011",

month = aug,

day = "18",

doi = "10.1109/ICASSP.2011.5946731",

language = "English (US)",

isbn = "9781457705397",

pages = "2064--2067",

booktitle = "2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings",

note = "36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 ; Conference date: 22-05-2011 Through 27-05-2011",

}

TY - GEN

T1 - Co-clustering as multilinear decomposition with sparse latent factors

AU - Papalexakis, Evangelos E.

AU - Sidiropoulos, Nicholas D.

PY - 2011/8/18

Y1 - 2011/8/18

N2 - The K-means clustering problem seeks to partition the columns of a data matrix in subsets, such that columns in the same subset are 'close' to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce 'coherent' groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important, for PARAFAC is unique under mild conditions - and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity pays a collateral dividend: as one increases the number of fitted co-clusters, new co-clusters are added without affecting those previously extracted. An important corollary is that co-clusters can be extracted incrementally; this implies that the algorithm scales well for large datasets. We demonstrate the validity of our approach using the ENRON corpus, as well as synthetic data.

AB - The K-means clustering problem seeks to partition the columns of a data matrix in subsets, such that columns in the same subset are 'close' to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce 'coherent' groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important, for PARAFAC is unique under mild conditions - and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity pays a collateral dividend: as one increases the number of fitted co-clusters, new co-clusters are added without affecting those previously extracted. An important corollary is that co-clusters can be extracted incrementally; this implies that the algorithm scales well for large datasets. We demonstrate the validity of our approach using the ENRON corpus, as well as synthetic data.

UR - http://www.scopus.com/inward/record.url?scp=80051642158&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051642158&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2011.5946731

DO - 10.1109/ICASSP.2011.5946731

M3 - Conference contribution

AN - SCOPUS:80051642158

SN - 9781457705397

SP - 2064

EP - 2067

BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings

T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011

Y2 - 22 May 2011 through 27 May 2011

ER -

Co-clustering as multilinear decomposition with sparse latent factors

Abstract

Other

Access

OpenUrl availability

Other files and links

Fingerprint

Cite this