Principal direction divisive partitioning

Research output: Contribution to journalArticlepeer-review

238 Scopus citations

Abstract

We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix which is very sparse. It is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.

Original languageEnglish (US)
Pages (from-to)325-344
Number of pages20
JournalData Mining and Knowledge Discovery
Volume2
Issue number4
DOIs
StatePublished - 1998

Bibliographical note

Funding Information:
This research was partially supported by NSF grants CCR-9405380 and CCR-9628786.

Fingerprint

Dive into the research topics of 'Principal direction divisive partitioning'. Together they form a unique fingerprint.

Cite this