Residual Bayesian co-clustering for matrix approximation

Hanhuai Shan, Arindam Banerjee

Research output: Contribution to conference (paper, peer-reviewed)

11 Scopus citations

Abstract

In recent years, matrix approximation for missing-value prediction has emerged as an important problem in a variety of domains, such as recommendation systems, e-commerce and online advertising. While matrix-factorization-based algorithms typically have good approximation accuracy, they can be slow, especially on large matrices. Further, such algorithms cannot naturally make predictions for new rows or columns. In this paper, we propose residual Bayesian co-clustering (RBC), which learns a generative model of the matrix from its non-missing entries and uses the model to predict the missing entries. RBC extends Bayesian co-clustering by taking row and column biases into account. The model allows rows and columns to have mixed memberships in multiple clusters, and can naturally predict entries in new rows and columns that were not used in training, given only a few non-missing entries in them. We propose two variational-inference-based algorithms for learning the model and predicting missing entries. One of them leads to a parallel RBC that can achieve significant speed-ups. The efficacy of RBC is demonstrated through extensive experimental comparisons with state-of-the-art algorithms on real-world datasets.
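The abstract describes RBC only at a high level: row and column biases, plus co-clusters to which rows and columns belong with mixed memberships. The sketch below illustrates one plausible reading of the "residual" idea, assuming a prediction of the form global mean + row bias + column bias + membership-weighted co-cluster residual. The function names (`fit_biases`, `predict`), the simple averaging used to estimate the biases, and the fixed memberships `pi_row`, `pi_col` and residual means `theta` are all hypothetical illustrations; the paper learns its model with variational inference, which this sketch does not attempt.

```python
import numpy as np


def fit_biases(R, mask):
    """Estimate a global mean, row biases and column biases from observed entries.

    R: (n, m) float array; mask: (n, m) boolean array, True where R is observed.
    Hypothetical estimator: plain averaging of observed deviations.
    """
    mu = R[mask].mean()  # global mean over observed entries only
    counts_r = mask.sum(axis=1)
    # Row bias: mean deviation of a row's observed entries from mu (0 if none observed).
    b_row = np.where(
        counts_r > 0, ((R - mu) * mask).sum(axis=1) / np.maximum(counts_r, 1), 0.0
    )
    # Column bias: mean of what remains after removing mu and the row bias.
    resid = (R - mu - b_row[:, None]) * mask
    counts_c = mask.sum(axis=0)
    b_col = np.where(counts_c > 0, resid.sum(axis=0) / np.maximum(counts_c, 1), 0.0)
    return mu, b_row, b_col


def predict(mu, b_row, b_col, pi_row, pi_col, theta, i, j):
    """Predict entry (i, j) as biases plus a membership-weighted co-cluster residual.

    pi_row: (n, K) row memberships; pi_col: (m, L) column memberships;
    theta: (K, L) hypothetical mean residual of each co-cluster.
    """
    residual = pi_row[i] @ theta @ pi_col[j]  # sum_k sum_l pi_row[i,k] theta[k,l] pi_col[j,l]
    return mu + b_row[i] + b_col[j] + residual
```

Because the biases absorb row- and column-level offsets, the co-cluster term only has to model what is left over; with `theta` set to zero the prediction reduces to the bias-only baseline.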

Original language: English (US)
Pages: 223-234
Number of pages: 12
State: Published - 2010
Event: 10th SIAM International Conference on Data Mining, SDM 2010 - Columbus, OH, United States
Duration: Apr 29 2010 to May 1 2010

Other

Conference: 10th SIAM International Conference on Data Mining, SDM 2010
Country/Territory: United States
City: Columbus, OH
Period: 4/29/10 to 5/1/10

