Chameleon, an agglomerative hierarchical clustering algorithm, is described. To identify the most similar pair of clusters, the algorithm accounts for both interconnectivity and closeness, and it considers the internal characteristics of the clusters themselves when modeling the degree of interconnectivity and closeness between each pair of clusters. A sparse graph representation allows the algorithm to scale to large data sets and to work with data sets that are available only in similarity space and not in metric spaces. The clusters in the data set are found in two phases: a graph-partitioning algorithm first clusters the data items into several relatively small subclusters, and a second algorithm then finds the genuine clusters by repeatedly combining these subclusters.
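The merging phase described above can be illustrated with a minimal Python sketch. It is not the published implementation. The graph-partitioning phase is not implemented, so the initial subclusters are supplied by the caller. Total and mean edge weights serve as simple stand-ins for the interconnectivity and closeness measures. The function names, the exponent `alpha`, and the toy data are hypothetical choices made for illustration, and all similarities are assumed to be positive and symmetric.

```python
from itertools import combinations


def knn_graph(sim, k):
    """Sparse k-nearest-neighbor graph from a dense, symmetric similarity matrix."""
    n = len(sim)
    edges = {}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            edges[(min(i, j), max(i, j))] = sim[i][j]
    return edges


def cross_edges(edges, a, b):
    """Weights of edges with one endpoint in cluster a and the other in cluster b."""
    return [w for (u, v), w in edges.items()
            if (u in a and v in b) or (u in b and v in a)]


def internal_edges(edges, c):
    """Weights of edges with both endpoints inside cluster c."""
    return [w for (u, v), w in edges.items() if u in c and v in c]


def merge_score(edges, a, b, alpha=2.0):
    """Simplified score: connection between a and b relative to their internals."""
    cross = cross_edges(edges, a, b)
    if not cross:
        return 0.0
    cross_sum, cross_mean = sum(cross), sum(cross) / len(cross)
    sums, means = [], []
    for c in (a, b):
        inner = internal_edges(edges, c)
        # Assumption: a cluster without internal edges reuses the cross statistics.
        sums.append(sum(inner) if inner else cross_sum)
        means.append(sum(inner) / len(inner) if inner else cross_mean)
    rel_inter = cross_sum / ((sums[0] + sums[1]) / 2)
    size = len(a) + len(b)
    rel_close = cross_mean / (len(a) / size * means[0] + len(b) / size * means[1])
    return rel_inter * rel_close ** alpha


def merge_subclusters(edges, subclusters, n_clusters, alpha=2.0):
    """Repeatedly merge the best-scoring pair until n_clusters remain."""
    clusters = [frozenset(c) for c in subclusters]
    while len(clusters) > max(n_clusters, 1):
        score, i, j = max(
            (merge_score(edges, clusters[i], clusters[j], alpha), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if score == 0.0:  # no remaining pair shares an edge
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy data: two groups of four items, similar within a group, dissimilar across.
group = [0, 0, 0, 0, 1, 1, 1, 1]
sim = [[0.9 if group[i] == group[j] else 0.1 for j in range(8)] for i in range(8)]
edges = knn_graph(sim, k=3)
clusters = merge_subclusters(edges, [{0, 1}, {2, 3}, {4, 5}, {6, 7}], n_clusters=2)
result = sorted(sorted(c) for c in clusters)
print(result)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because the sketch needs only a similarity matrix, it never requires coordinates in a metric space. A production version would replace the exhaustive pair search and the edge scans with indexed data structures.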
Bibliographical note
Funding Information:
This work was partially supported by Army Research Office contract DA/DAAG55-98-1-0441 and the Army High-Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the government’s position or policy, and no official endorsement should be inferred. This work was also supported by an IBM Partnership Award. Access to computing facilities was provided by the Army High-Performance Computing Research Center and the Minnesota Supercomputer Institute.