Efficient identification of Tanimoto nearest neighbors

David C. Anastasiu, George Karypis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Scopus citations

Abstract

Tanimoto, or (extended) Jaccard, is an important similarity measure that has seen prominent use in fields such as data mining and chemoinformatics. Many of the existing state-of-the-art methods for market-basket analysis, plagiarism and anomaly detection, compound database search, and ligand-based virtual screening rely heavily on identifying Tanimoto nearest neighbors. Given the rapidly increasing size of data that must be analyzed, new algorithms are needed that can speed up nearest neighbor search, yet provide reliable results. While many search algorithms address the complexity of the task by retrieving only some of the nearest neighbors, we propose a method that finds all of the exact nearest neighbors efficiently by leveraging recent advances in similarity search filtering. We provide tighter filtering bounds for the Tanimoto coefficient and show that our method, TAPNN, greatly outperforms existing baselines across a variety of real-world datasets and similarity thresholds.
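For readers unfamiliar with the measure, the following is a minimal sketch of the standard extended Jaccard (Tanimoto) coefficient for real-valued vectors, T(a, b) = a·b / (‖a‖² + ‖b‖² − a·b), together with a naive all-pairs search that keeps every pair at or above a similarity threshold. This is not TAPNN and uses none of the filtering bounds from the paper; it only illustrates the exact search problem that such methods aim to speed up. The function names and the threshold parameter `eps` are hypothetical, chosen for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Extended Jaccard (Tanimoto) coefficient of two equal-length real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: ||a||^2 + ||b||^2 - a.b; it is zero only when both vectors are zero.
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def brute_force_neighbors(vectors, eps):
    """Naive baseline (an assumption for illustration, not the paper's method):
    compare every pair and keep (i, j, sim) with i < j and sim >= eps."""
    result = []
    for i, j in combinations(range(len(vectors)), 2):
        sim = tanimoto(vectors[i], vectors[j])
        if sim >= eps:
            result.append((i, j, sim))
    return result


if __name__ == "__main__":
    data = [[1, 1, 0], [1, 1, 1], [0, 0, 1]]
    for i, j, sim in brute_force_neighbors(data, eps=0.5):
        print(i, j, round(sim, 3))
```

On binary vectors the coefficient reduces to the ordinary Jaccard index (size of the intersection over size of the union). The naive search performs a quadratic number of comparisons, which is the cost that exact filtering-based approaches try to avoid.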

Original language: English (US)
Title of host publication: Proceedings - 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 156-165
Number of pages: 10
ISBN (Electronic): 9781509052066
State: Published - Dec 22 2016
Event: 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016 - Montreal, Canada
Duration: Oct 17 2016 to Oct 19 2016

Publication series

Name: Proceedings - 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016

Other

Other: 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016
Country/Territory: Canada
City: Montreal
Period: 10/17/16 to 10/19/16

Keywords

  • All-pairs
  • Extended Jaccard
  • Graph construction
  • NNG
  • Nearest neighbors
  • Similarity graph
  • Similarity search
  • Tanimoto
