Residual Bayesian co-clustering for matrix approximation

Hanhuai Shan; Arindam Banerjee

doi:10.1137/1.9781611972801.20

Computer Science and Engineering

Research output: Contribution to conference › Paper › peer-review

11 Scopus citations

Abstract

In recent years, matrix approximation for missing value prediction has emerged as an important problem in a variety of domains such as recommendation systems, e-commerce and online advertising. While matrix-factorization-based algorithms typically have good approximation accuracy, they can be slow, especially for large matrices. Further, such algorithms cannot naturally make predictions for new rows or columns. In this paper, we propose residual Bayesian co-clustering (RBC), which learns a generative model of the matrix from the non-missing entries and uses the model to predict the missing entries. RBC extends Bayesian co-clustering by taking row and column biases into account. The model allows rows and columns to have mixed memberships in multiple clusters, and can naturally handle prediction on new rows and columns that were not used in training, given only a few non-missing entries in them. We propose two variational-inference-based algorithms for learning the model and predicting missing entries. One of the proposed algorithms leads to a parallel RBC that can achieve significant speed-ups. The efficacy of RBC is demonstrated by extensive experimental comparisons with state-of-the-art algorithms on real-world datasets.
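The abstract only outlines how RBC works. As a rough illustration of combining row and column biases with co-clustering for missing value prediction, the following Python sketch assumes hard (single) co-cluster labels that are given in advance, and predicts an entry as a global mean plus a row bias, a column bias, and the mean residual of the entry's co-cluster block. The functions `fit_residual_coclusters` and `predict_entry` are hypothetical; the sketch does not implement the paper's mixed-membership generative model or its variational inference algorithms.

```python
import numpy as np


def fit_residual_coclusters(R, mask, row_labels, col_labels, k_rows, k_cols):
    """Fit a global mean, row/column biases, and per-co-cluster residual means.

    Assumption (not from the paper): co-cluster labels are fixed hard
    assignments, and biases are simple averages over observed entries.
    """
    mu = R[mask].mean()
    dev = np.where(mask, R - mu, 0.0)
    # Row bias: average deviation from the global mean over observed entries.
    row_bias = dev.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    # Column bias: average deviation left after removing the row bias.
    dev = np.where(mask, dev - row_bias[:, None], 0.0)
    col_bias = dev.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    # Residuals after removing the global mean and both biases.
    resid = np.where(mask, dev - col_bias[None, :], 0.0)
    # Average residual inside each (row cluster, column cluster) block.
    rows, cols = np.nonzero(mask)
    block_sum = np.zeros((k_rows, k_cols))
    block_cnt = np.zeros((k_rows, k_cols))
    np.add.at(block_sum, (row_labels[rows], col_labels[cols]), resid[rows, cols])
    np.add.at(block_cnt, (row_labels[rows], col_labels[cols]), 1.0)
    block_mean = block_sum / np.maximum(block_cnt, 1.0)
    return mu, row_bias, col_bias, block_mean


def predict_entry(i, j, model, row_labels, col_labels):
    """Predict entry (i, j) as mean + row bias + column bias + block residual."""
    mu, row_bias, col_bias, block_mean = model
    return mu + row_bias[i] + col_bias[j] + block_mean[row_labels[i], col_labels[j]]


if __name__ == "__main__":
    R = np.array([[4.0, 5.0, 2.0],
                  [3.0, 4.0, 1.0],
                  [5.0, 6.0, 3.0]])
    mask = np.ones_like(R, dtype=bool)
    mask[0, 2] = False  # treat entry (0, 2) as missing
    row_labels = np.array([0, 0, 1])
    col_labels = np.array([0, 1, 1])
    model = fit_residual_coclusters(R, mask, row_labels, col_labels, 2, 2)
    print(predict_entry(0, 2, model, row_labels, col_labels))
```

The fixed labels stand in for the cluster memberships that RBC learns from the data; the sketch only illustrates an additive bias-plus-co-cluster prediction, and a prediction for a new row would similarly need an estimate of that row's bias and cluster label from its few observed entries.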

Original language	English (US)
Pages	223-234
Number of pages	12
DOIs	https://doi.org/10.1137/1.9781611972801.20
State	Published - 2010
Event	10th SIAM International Conference on Data Mining, SDM 2010 - Columbus, OH, United States Duration: Apr 29 2010 → May 1 2010

Cite this

@conference{8cdc94e1fac64037b2c1c058d8b9a12f,
  title = "Residual {Bayesian} co-clustering for matrix approximation",
  author = "Hanhuai Shan and Arindam Banerjee",
  booktitle = "10th SIAM International Conference on Data Mining, SDM 2010",
  year = "2010",
  pages = "223--234",
  doi = "10.1137/1.9781611972801.20",
  language = "English (US)",
  note = "Conference date: 29-04-2010 through 01-05-2010",
}

TY  - CONF
T1  - Residual Bayesian co-clustering for matrix approximation
AU  - Shan, Hanhuai
AU  - Banerjee, Arindam
PY  - 2010
Y1  - 2010
AB  - In recent years, matrix approximation for missing value prediction has emerged as an important problem in a variety of domains such as recommendation systems, e-commerce and online advertising. While matrix-factorization-based algorithms typically have good approximation accuracy, they can be slow, especially for large matrices. Further, such algorithms cannot naturally make predictions for new rows or columns. In this paper, we propose residual Bayesian co-clustering (RBC), which learns a generative model of the matrix from the non-missing entries and uses the model to predict the missing entries. RBC extends Bayesian co-clustering by taking row and column biases into account. The model allows rows and columns to have mixed memberships in multiple clusters, and can naturally handle prediction on new rows and columns that were not used in training, given only a few non-missing entries in them. We propose two variational-inference-based algorithms for learning the model and predicting missing entries. One of the proposed algorithms leads to a parallel RBC that can achieve significant speed-ups. The efficacy of RBC is demonstrated by extensive experimental comparisons with state-of-the-art algorithms on real-world datasets.
UR  - http://www.scopus.com/inward/record.url?scp=84877254919&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84877254919&partnerID=8YFLogxK
U2  - 10.1137/1.9781611972801.20
DO  - 10.1137/1.9781611972801.20
M3  - Paper
AN  - SCOPUS:84877254919
SP  - 223
EP  - 234
T2  - 10th SIAM International Conference on Data Mining, SDM 2010
Y2  - 29 April 2010 through 1 May 2010
ER  - 
