Towards a scalable kNN CF algorithm: Exploring effective applications of clustering

Al Mamunur Rashid; Shyong K. Lam; Adam LaPitz; George Karypis; John Riedl

doi:10.1007/978-3-540-77485-3_9

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Collaborative Filtering (CF)-based recommender systems bring mutual benefits to both users and the operators of the sites with too much information. Users benefit as they are able to find items of interest from an unmanageable number of available items. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple nearest neighbor-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing it with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.
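
The two stages named in the abstract (compress the ratings into a clustering model, then run nearest-neighbor prediction against that model) can be illustrated with a minimal sketch. The abstract does not specify the clustering method, the similarity measure, or the prediction formula, so everything below is an assumption made for illustration: a k-means-style loop over mean-centered ratings, cosine similarity over co-rated items, evenly spaced initial centroids (chosen only to keep the toy run reproducible), and the hypothetical function names `center`, `similarity`, `build_centroids`, and `predict`. It is not the authors' implementation.

```python
import math

def center(profile):
    """Return (mean rating, ratings minus that mean) for a non-empty profile."""
    mean = sum(profile.values()) / len(profile)
    return mean, {item: r - mean for item, r in profile.items()}

def similarity(a, b):
    """Cosine similarity restricted to items present in both sparse vectors."""
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_centroids(ratings, k, iterations=10):
    """Assumed stage 1: compress all user profiles into k centroid profiles."""
    centered = {u: center(p)[1] for u, p in ratings.items()}
    users = sorted(centered)
    # Deterministic, evenly spaced seeds (an illustrative simplification).
    centroids = [dict(centered[users[i * len(users) // k]]) for i in range(k)]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for u in users:
            best = max(range(k), key=lambda c: similarity(centered[u], centroids[c]))
            members[best].append(u)
        for c, group in enumerate(members):
            if not group:
                continue  # an empty cluster keeps its previous centroid
            sums, counts = {}, {}
            for u in group:
                for item, value in centered[u].items():
                    sums[item] = sums.get(item, 0.0) + value
                    counts[item] = counts.get(item, 0) + 1
            centroids[c] = {item: sums[item] / counts[item] for item in sums}
    return centroids

def predict(profile, item, centroids, n_neighbors=1):
    """Assumed stage 2: treat centroids as surrogate neighbors for kNN prediction."""
    mean, centered = center(profile)
    scored = [(similarity(centered, c), c) for c in centroids if item in c]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n_neighbors]
    weight = sum(abs(s) for s, _ in neighbors)
    if weight == 0:
        return mean  # no usable neighbor: fall back to the user's own mean
    return mean + sum(s * c[item] for s, c in neighbors) / weight

# Toy data: u1/u2 like items A and B, u3/u4 like items C and D.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 1, "D": 1},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 5},
    "u4": {"A": 2, "B": 1, "C": 4, "D": 5},
}
centroids = build_centroids(ratings, k=2)
new_user = {"A": 5, "B": 5, "C": 2}
prediction = predict(new_user, "D", centroids)
print(round(prediction, 2))
```

In this sketch the online cost of a prediction depends on the number of centroids rather than the number of users, which is the kind of saving the abstract's "compresses data tremendously" points to; how the paper itself realizes and measures that saving is not stated here.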

Original language	English (US)
Title of host publication	Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers
Publisher	Springer Verlag
Pages	147-166
Number of pages	20
ISBN (Print)	354077484X, 9783540774846
DOIs	https://doi.org/10.1007/978-3-540-77485-3_9
State	Published - 2007
Event	8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006 - Philadelphia, PA, United States Duration: Aug 20 2006 → Aug 20 2006

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	4811 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006
Country/Territory	United States
City	Philadelphia, PA
Period	8/20/06 → 8/20/06

Cite this

Rashid, A. M., Lam, S. K., LaPitz, A., Karypis, G., & Riedl, J. (2007). Towards a scalable kNN CF algorithm: Exploring effective applications of clustering. In Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers (pp. 147-166). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4811 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-540-77485-3_9

Towards a scalable kNN CF algorithm: Exploring effective applications of clustering. / Rashid, Al Mamunur; Lam, Shyong K.; LaPitz, Adam et al.
Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers. Springer Verlag, 2007. p. 147-166 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4811 LNAI).

Rashid, AM, Lam, SK, LaPitz, A, Karypis, G & Riedl, J 2007, Towards a scalable kNN CF algorithm: Exploring effective applications of clustering. in Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4811 LNAI, Springer Verlag, pp. 147-166, 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Philadelphia, PA, United States, 8/20/06. https://doi.org/10.1007/978-3-540-77485-3_9

Rashid AM, Lam SK, LaPitz A, Karypis G, Riedl J. Towards a scalable kNN CF algorithm: Exploring effective applications of clustering. In Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers. Springer Verlag. 2007. p. 147-166. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-540-77485-3_9

Rashid, Al Mamunur ; Lam, Shyong K. ; LaPitz, Adam et al. / Towards a scalable kNN CF algorithm : Exploring effective applications of clustering. Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers. Springer Verlag, 2007. pp. 147-166 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{5247ef28e7854949b35e60743ee93489,

title = "Towards a scalable kNN CF algorithm: Exploring effective applications of clustering",

abstract = "Collaborative Filtering (CF)-based recommender systems bring mutual benefits to both users and the operators of the sites with too much information. Users benefit as they are able to find items of interest from an unmanageable number of available items. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple nearest neighbor-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing it with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.",

author = "Rashid, {Al Mamunur} and Lam, {Shyong K.} and Adam LaPitz and George Karypis and John Riedl",

year = "2007",

doi = "10.1007/978-3-540-77485-3_9",

language = "English (US)",

isbn = "354077484X",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "147--166",

booktitle = "Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers",

note = "8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006 ; Conference date: 20-08-2006 Through 20-08-2006",

}

TY - GEN

T1 - Towards a scalable kNN CF algorithm: Exploring effective applications of clustering

T2 - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006

AU - Rashid, Al Mamunur

AU - Lam, Shyong K.

AU - LaPitz, Adam

AU - Karypis, George

AU - Riedl, John

PY - 2007

Y1 - 2007

AB - Collaborative Filtering (CF)-based recommender systems bring mutual benefits to both users and the operators of the sites with too much information. Users benefit as they are able to find items of interest from an unmanageable number of available items. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple nearest neighbor-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing it with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.

UR - http://www.scopus.com/inward/record.url?scp=38549134336&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38549134336&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-77485-3_9

DO - 10.1007/978-3-540-77485-3_9

M3 - Conference contribution

AN - SCOPUS:38549134336

SN - 354077484X

SN - 9783540774846

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 147

EP - 166

BT - Advances in Web Mining and Web Usage Analysis - 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Revised Papers

PB - Springer Verlag

Y2 - 20 August 2006 through 20 August 2006

ER -
