TY - GEN
T1 - Towards a scalable kNN CF algorithm
T2 - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006
AU - Rashid, Al Mamunur
AU - Lam, Shyong K.
AU - LaPitz, Adam
AU - Karypis, George
AU - Riedl, John
PY - 2007
Y1 - 2007
N2 - Collaborative Filtering (CF)-based recommender systems bring mutual benefits to both users and the operators of the sites with too much information. Users benefit as they are able to find items of interest from an unmanageable number of available items. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple NEAREST NEIGHBOR-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.
AB - Collaborative Filtering (CF)-based recommender systems bring mutual benefits to both users and the operators of the sites with too much information. Users benefit as they are able to find items of interest from an unmanageable number of available items. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple NEAREST NEIGHBOR-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.
UR - http://www.scopus.com/inward/record.url?scp=38549134336&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38549134336&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-77485-3_9
DO - 10.1007/978-3-540-77485-3_9
M3 - Conference contribution
AN - SCOPUS:38549134336
SN - 354077484X
SN - 9783540774846
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 147
EP - 166
BT - Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers
PB - Springer Verlag
Y2 - 20 August 2006 through 20 August 2006
ER -