L2Knng: Fast exact K-nearest neighbor graph construction with L2-norm pruning

David C. Anastasiu; George Karypis

doi:10.1145/2806416.2806534

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

24 Scopus citations

Abstract

The k-nearest neighbor graph is often used as a building block in information retrieval, clustering, online advertising, and recommender system algorithms. The complexity of constructing the exact k-nearest neighbor graph is quadratic in the number of objects that are compared, and most existing methods solve the problem approximately. We present L2Knng, an efficient algorithm that finds the exact cosine similarity k-nearest neighbor graph for a set of sparse high-dimensional objects. Our algorithm quickly builds an approximate solution to the problem, identifying many of the most similar neighbors, and then uses theoretical bounds on the similarity of two vectors, based on the ℓ2-norm of part of the vectors, to find each object's exact k-neighborhood. We perform an extensive evaluation of our algorithm, comparing against both exact and approximate baselines, and demonstrate the efficiency of our method across a variety of real-world datasets and neighborhood sizes. Our approximate and exact L2Knng variants compute the k-nearest neighbor graph up to an order of magnitude faster than their respective baselines.
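The abstract does not spell out the bounds themselves. As a rough illustration of ℓ2-norm pruning in general, here is a minimal Python sketch, assuming unit-normalized, nonzero sparse vectors stored as (dimension, weight) lists sorted by dimension, and k ≥ 1. During a merge-join dot product, the part not yet computed is at most the product of the ℓ2-norms of the two unprocessed suffixes (Cauchy-Schwarz), so a candidate is abandoned as soon as that bound cannot beat the object's current k-th best similarity. The seeding step (objects sharing the highest-weight feature) and all function names (`normalize`, `suffix_sq_norms`, `bounded_dot`, `knn_graph`) are hypothetical stand-ins for the two-phase idea; this is not the L2Knng algorithm from the paper, and it still visits all O(n²) pairs.

```python
import heapq
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse {dim: weight} vector to unit length (assumes it is nonzero)
    and return it as a list of (dim, weight) pairs sorted by dim."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return sorted((d, w / norm) for d, w in vec.items() if w != 0)


def suffix_sq_norms(pairs):
    """suffix[t] is the squared l2-norm of pairs[t:]; suffix[len(pairs)] == 0."""
    suffix = [0.0] * (len(pairs) + 1)
    for t in range(len(pairs) - 1, -1, -1):
        suffix[t] = suffix[t + 1] + pairs[t][1] ** 2
    return suffix


def bounded_dot(x, sx, y, sy, threshold):
    """Merge-join dot product of two sorted sparse vectors with l2-norm pruning.

    By Cauchy-Schwarz, the final dot product is at most
    partial + ||x[px:]|| * ||y[py:]||, so return None as soon as that bound
    cannot exceed `threshold`.
    """
    px = py = 0
    partial = 0.0
    while px < len(x) and py < len(y):
        if partial + math.sqrt(sx[px] * sy[py]) <= threshold:
            return None  # pruned: this pair cannot beat the current k-th neighbor
        dx, wx = x[px]
        dy, wy = y[py]
        if dx == dy:
            partial += wx * wy
            px += 1
            py += 1
        elif dx < dy:
            px += 1
        else:
            py += 1
    return partial


def knn_graph(vectors, k):
    """Exact cosine k-NN graph as {i: [(sim, j), ...]}, sorted by decreasing sim."""
    data = [normalize(v) for v in vectors]
    suffix = [suffix_sq_norms(p) for p in data]
    n = len(data)
    heaps = [[] for _ in range(n)]  # per-object min-heaps of (sim, j)
    seen = [set() for _ in range(n)]

    # Phase 1 (hypothetical heuristic): seed each heap with objects sharing the
    # object's highest-weight feature, so pruning thresholds start out high.
    top = [max(p, key=lambda dw: dw[1])[0] for p in data]
    by_top = defaultdict(list)
    for i, f in enumerate(top):
        by_top[f].append(i)
    for i in range(n):
        for j in by_top[top[i]][: k + 1]:
            if j != i and len(heaps[i]) < k:
                sim = bounded_dot(data[i], suffix[i], data[j], suffix[j], -math.inf)
                heapq.heappush(heaps[i], (sim, j))
                seen[i].add(j)

    # Phase 2: exact pass over all remaining pairs, pruned against each heap's minimum.
    for i in range(n):
        for j in range(n):
            if j == i or j in seen[i]:
                continue
            full = len(heaps[i]) == k
            threshold = heaps[i][0][0] if full else -math.inf
            sim = bounded_dot(data[i], suffix[i], data[j], suffix[j], threshold)
            if sim is None:
                continue
            if not full:
                heapq.heappush(heaps[i], (sim, j))
            elif sim > threshold:
                heapq.heapreplace(heaps[i], (sim, j))
    return {i: sorted(h, reverse=True) for i, h in enumerate(heaps)}
```

For example, `knn_graph([{0: 1.0, 1: 1.0}, {0: 1.0}, {2: 1.0}], k=1)` links object 0 to object 1 with similarity about 0.707 (1/√2). A pruned pair can at best tie the current k-th similarity, so the result is exact up to tie-breaking; a better seed only raises the thresholds earlier and lets more merge-joins stop early.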

Original language	English (US)
Title of host publication	CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management
Publisher	Association for Computing Machinery
Pages	791-800
Number of pages	10
ISBN (Electronic)	9781450337946
DOIs	https://doi.org/10.1145/2806416.2806534
State	Published - Oct 17 2015
Event	24th ACM International Conference on Information and Knowledge Management, CIKM 2015 - Melbourne, Australia Duration: Oct 19 2015 → Oct 23 2015

Publication series

Name	International Conference on Information and Knowledge Management, Proceedings
Volume	19-23-Oct-2015

Bibliographical note

Publisher Copyright:
© 2015 ACM.

Keywords

Cosine similarity
K-nearest neighbor graph
Similarity search
Top-k

Cite this

Anastasiu, D. C., & Karypis, G. (2015). L2Knng: Fast exact K-nearest neighbor graph construction with L2-norm pruning. In CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management (pp. 791-800). (International Conference on Information and Knowledge Management, Proceedings; Vol. 19-23-Oct-2015). Association for Computing Machinery. https://doi.org/10.1145/2806416.2806534

@inproceedings{5f53e2fe3669496989e73636076b7bac,
  title = "L2Knng: Fast exact K-nearest neighbor graph construction with L2-norm pruning",
  abstract = "The k-nearest neighbor graph is often used as a building block in information retrieval, clustering, online advertising, and recommender systems algorithms. The complexity of constructing the exact k-nearest neighbor graph is quadratic on the number of objects that are compared, and most existing methods solve the problem approximately. We present L2Knng, an efficient algorithm that finds the exact cosine similarity k-nearest neighbor graph for a set of sparse high-dimensional objects. Our algorithm quickly builds an approximate solution to the problem, identifying many of the most similar neighbors, and then uses theoretic bounds on the similarity of two vectors, based on the ℓ2-norm of part of the vectors, to find each object's exact k-neighborhood. We perform an extensive evaluation of our algorithm, comparing against both exact and approximate baselines, and demonstrate the efficiency of our method across a variety of real-world datasets and neighborhood sizes. Our approximate and exact L2Knng variants compute the k-nearest neighbor graph up to an order of magnitude faster than their respective baselines.",
  keywords = "Cosine similarity, K-nearest neighbor graph, Similarity search, Top-k",
  author = "Anastasiu, {David C.} and George Karypis",
  note = "Publisher Copyright: {\textcopyright} 2015 ACM.; 24th ACM International Conference on Information and Knowledge Management, CIKM 2015 ; Conference date: 19-10-2015 Through 23-10-2015",
  year = "2015",
  month = oct,
  day = "17",
  doi = "10.1145/2806416.2806534",
  language = "English (US)",
  series = "International Conference on Information and Knowledge Management, Proceedings",
  publisher = "Association for Computing Machinery",
  pages = "791--800",
  booktitle = "CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management",
}

