PL2AP: Fast parallel cosine similarity search

David C. Anastasiu, George Karypis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Solving the AllPairs similarity search problem entails finding all pairs of vectors in a high-dimensional sparse dataset that have a similarity value higher than a given threshold. The output of this problem is a crucial component in many real-world applications, such as clustering, online advertising, recommender systems, near-duplicate document detection, and query refinement. A number of serial algorithms have been proposed that solve the problem by pruning many of the possible similarity candidates for each query object after accessing only a few of their non-zero values. The pruning process results in unpredictable memory access patterns that can reduce search efficiency. In this context, we introduce pL2AP, which efficiently solves the AllPairs cosine similarity search problem in a multi-core environment. Our method uses a number of cache-tiling optimizations, combined with fine-grained dynamically balanced parallel tasks, to solve the problem 1.5x-238x faster than existing parallel baselines on datasets with hundreds of millions of non-zeros.
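To make the problem statement concrete, the following is a minimal serial sketch of AllPairs cosine similarity search over sparse vectors. It is not pL2AP and includes none of the pruning, cache-tiling, or parallel task scheduling the abstract describes; it only illustrates what "all pairs above a threshold" means. The names `normalize` and `all_pairs`, the dict-based sparse representation, and the inverted-index layout are assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a sparse vector (dict: feature -> weight) to unit L2 norm."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()} if norm > 0 else {}


def all_pairs(vectors, threshold):
    """Return {(j, i): cosine} for every pair j < i with cosine > threshold.

    Hypothetical baseline: each vector is compared only against vectors
    already in an inverted index, so every pair is scored exactly once.
    No candidate pruning is attempted.
    """
    unit = [normalize(v) for v in vectors]
    index = defaultdict(list)  # feature -> [(vector id, weight), ...]
    result = {}
    for i, query in enumerate(unit):
        # Accumulate dot products with previously indexed vectors.
        # For unit-length vectors the dot product equals the cosine.
        acc = defaultdict(float)
        for f, w in query.items():
            for j, wj in index[f]:
                acc[j] += w * wj
        for j, sim in acc.items():
            if sim > threshold:
                result[(j, i)] = sim
        # Index the query so that later vectors can find it.
        for f, w in query.items():
            index[f].append((i, w))
    return result


if __name__ == "__main__":
    data = [
        {"a": 1.0, "b": 1.0},
        {"a": 1.0, "b": 1.0, "c": 0.1},
        {"c": 1.0},
    ]
    print(all_pairs(data, threshold=0.5))
```

With the sample data, only the pair (0, 1) exceeds the 0.5 threshold. Because this sketch scores every pair that shares a feature, its cost grows quickly with dataset size, which is the motivation for the candidate pruning that the serial algorithms mentioned above perform.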

Original language: English (US)
Title of host publication: Proceedings of the 5th Workshop on Irregular Applications
Subtitle of host publication: Architectures and Algorithms, IA3 2015
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450340014
State: Published - Nov 15 2015
Event: 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2015 - Austin, United States
Duration: Nov 15 2015 → …

Publication series

Name: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2015

Other

Other: 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2015
Country/Territory: United States
City: Austin
Period: 11/15/15 → …

Bibliographical note

Funding Information:
This work was supported in part by NSF (IIS-0905220, OCI-1048018, CNS-1162405, IIS-1247632, IIP-1414153, IIS-1447788), Army Research Office (W911NF-14-1-0316), Intel Software and Services Group, and the Digital Technology Center at the University of Minnesota. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.

Publisher Copyright:
© 2015 ACM.

Keywords

  • Bounded cosine similarity graph
  • Cosine similarity
  • Similarity join
  • Similarity search
