HICAP: Hierarchical clustering with pattern preservation

Hui Xiong, Michael S Steinbach, Pang Ning Tan, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

26 Scopus citations

Abstract

This paper describes a new approach to clustering, pattern preserving clustering, which produces more easily interpretable and usable clusters. This approach is motivated by the following observation: while there are usually strong patterns in the data (patterns that may be key for the analysis and description of the data), these patterns are often split among different clusters by current clustering approaches. This is, perhaps, not surprising, since clustering algorithms have no built-in knowledge of these patterns and may often have goals that conflict with preserving patterns, e.g., minimizing the distance of points to their nearest cluster centroids. Also, patterns typically overlap, i.e., they may involve some of the same objects, and if the clustering algorithm produces disjoint clusters, then some patterns must be split when the objects are clustered. In this paper we describe a technique for pattern preserving clustering that first finds patterns composed of tightly connected groups of objects or attributes and then, starting from these patterns, performs agglomerative clustering using the Group Average (UPGMA) technique. We present the results of some experiments on document data that compare our approach, HIerarchical Clustering with PAttern Preservation (HICAP), to two other clustering techniques: bisecting K-means and traditional UPGMA. These results show that, despite the extra constraint of pattern preservation, HICAP performs very much like traditional UPGMA with respect to the cluster evaluation criteria of entropy and F-measure. More importantly, we also illustrate how patterns, if preserved, can aid cluster interpretation.
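The two-stage idea in the abstract (start from patterns, then merge with Group Average) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the patterns have already been found and are given as disjoint sets of object indices, and it uses cosine similarity, a common choice for document data. The names `cosine`, `group_average`, and `seeded_upgma` are hypothetical, and how HICAP finds patterns or handles overlapping ones is not covered here.

```python
from itertools import combinations
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_average(c1, c2, points):
    """Group Average (UPGMA) similarity: mean pairwise similarity
    between the members of two clusters."""
    total = sum(cosine(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def seeded_upgma(points, seed_patterns, k):
    """Agglomerate down to k clusters.

    Assumption: seed_patterns are disjoint sets of indices into points.
    Each seed starts as one cluster; every other object starts as a singleton.
    """
    seeded = set().union(*seed_patterns)
    clusters = [frozenset(p) for p in seed_patterns]
    clusters += [frozenset([i]) for i in range(len(points)) if i not in seeded]
    while len(clusters) > k:
        # Merge the pair of clusters with the highest group-average similarity.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: group_average(pair[0], pair[1], points))
        clusters = [c for c in clusters if c != a and c != b]
        clusters.append(a | b)
    return clusters


# Toy data: objects 0-1 and 2-3 form the two seed patterns, object 4 is unseeded.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.3]]
clusters = seeded_upgma(points, [{0, 1}, {2, 3}], k=2)
```

Because clusters are only ever merged by union and never split, each seed pattern stays inside a single cluster at every level of the hierarchy, which is the preservation property the abstract refers to.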

Original language: English (US)
Title of host publication: Proceedings of the Fourth SIAM International Conference on Data Mining
Editors: M.W. Berry, U. Dayal, C. Kamath, D. Skillicorn
Pages: 279-290
Number of pages: 12
State: Published - Jun 22 2004
Event: Proceedings of the Fourth SIAM International Conference on Data Mining - Lake Buena Vista, FL, United States
Duration: Apr 22 2004 to Apr 24 2004

Other

Other: Proceedings of the Fourth SIAM International Conference on Data Mining
Country/Territory: United States
City: Lake Buena Vista, FL
Period: 4/22/04 to 4/24/04

Keywords

  • Cluster Analysis
  • Hyperclique Pattern
  • Pattern Preserving Clustering
