High performance data mining

Vipin Kumar, Mahesh V. Joshi, Eui Hong Han, Pang Ning Tan, Michael Steinbach

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Recent years have seen explosive growth in the availability of many kinds of data, creating an unprecedented opportunity to develop automated, data-driven techniques for extracting useful knowledge. Data mining, an important step in this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden in the data [SAD+93, CHY96]. The field builds on ideas from diverse disciplines such as machine learning, pattern recognition, statistics, database systems, and data visualization. However, techniques developed in these traditional disciplines are often unsuitable because of unique characteristics of today's data sets, such as their enormous size, high dimensionality, and heterogeneity. Effective parallel algorithms are therefore needed for various data mining techniques. Designing such algorithms is challenging; the main focus of the paper is a description of the parallel formulations of two important data mining algorithms: discovery of association rules and induction of decision trees for classification. We also briefly discuss an application of data mining to the analysis of large data sets collected by Earth-observing satellites, which must be processed to better understand global-scale changes in biosphere processes and patterns.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Jose M. L. M. Palma, A. Augusto Sousa, Jack Dongarra, Vicente Hernandez
Publisher: Springer Verlag
Pages: 111-125
Number of pages: 15
ISBN (Print): 3540008527, 9783540008521
State: Published - 2003

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2565
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349