Fast attribute-based table clustering using Predicate-Trees: A vertical data mining approach

Arjun G. Roy, Arijit Chatterjee, Mohammad K. Hossain, William Perrizo

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

With technological advancements, massive amounts of data are being collected in various domains. For instance, since the advent of digital image technology and remote sensing imagery (RSI), NASA and the U.S. Geological Survey, through the Landsat Data Continuity Mission, have been capturing images of Earth at resolutions down to 15 meters. Likewise, consider the Internet, where the growth of social media, blog websites, etc. generates exponential amounts of textual data on a daily basis. Since clustering of data is time-consuming, much of this data is archived before it is properly analyzed. In this paper, we propose two novel and extremely fast algorithms: imgFAUST (Fast Attribute-based Unsupervised and Supervised Table Clustering) for images, and a variation called docFAUST for textual data. Both algorithms are based on Predicate-Trees, which are compressed, lossless, and data-mining-ready data structures. Without compromising much on accuracy, our algorithms are fast and can be used effectively for high-speed analysis of image data and documents.

Original language: English (US)
Pages (from-to): 139-146
Number of pages: 8
Journal: Journal of Computational Methods in Sciences and Engineering
Volume: 12
Issue number: SUPPL. 1
State: Published - 2012
Externally published: Yes

Keywords

  • Data mining
  • Predicate Trees
  • big data
  • vertical data processing
