Multi-way clustering on relation graphs

Arindam Banerjee; Sugato Basu; Srujana Merugu

doi:10.1137/1.9781611972771.14

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

86 Scopus citations

Abstract

A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the natural structure of this type of heterogeneous relational data is essential both for exploratory analysis and for performing various predictive modeling tasks. In this paper, we propose a principled multi-way clustering framework for relational data, wherein different types of entities are simultaneously clustered based not only on their intrinsic attribute values, but also on the multiple relations between the entities. To achieve this, we introduce a relation graph model that describes all the known relations between the different entity classes, in which each relation between a given set of entity classes is represented in the form of a multi-modal tensor over an appropriate domain. Our multi-way clustering formulation is driven by the objective of capturing the maximal "information" in the original relation graph, i.e., accurately approximating the set of tensors corresponding to the various relations. This formulation is applicable to all Bregman divergences (a broad family of loss functions that includes squared Euclidean distance and KL-divergence), and also permits analysis of mixed data types using convex combinations of appropriate Bregman loss functions. Furthermore, we present a large family of structurally different multi-way clustering schemes that preserve various linear summary statistics of the original data. We accomplish the above generalizations by extending a recently proposed key theoretical result, namely the minimum Bregman information principle [1], to the relation graph setting. We also describe an efficient multi-way clustering algorithm based on alternate minimization that generalizes a number of other recently proposed clustering methods.
Empirical results on datasets obtained from real-world domains (e.g., movie recommendations, newsgroup articles) demonstrate the generality and efficacy of our framework.
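
As a rough illustration of the loss family the abstract refers to, the sketch below evaluates a generic Bregman divergence, d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, and shows how two choices of the convex function phi recover squared Euclidean distance and (on probability vectors) KL-divergence. It also forms a convex combination of the two losses, which is one way to read "mixed data types". This is not the authors' algorithm or code; all function names and the way the two losses are combined are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(phi: Callable[[Vector], float],
            grad_phi: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """Generic Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


# phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x: Vector) -> float:
    return sum(v * v for v in x)


def sq_norm_grad(x: Vector) -> List[float]:
    return [2.0 * v for v in x]


# phi(x) = sum_i x_i log x_i (negative entropy) yields KL-divergence
# when x and y are strictly positive probability vectors.
def neg_entropy(x: Vector) -> float:
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x: Vector) -> List[float]:
    return [math.log(v) + 1.0 for v in x]


def mixed_loss(weight: float,
               real_x: Vector, real_y: Vector,
               prob_x: Vector, prob_y: Vector) -> float:
    """Convex combination of two Bregman losses (hypothetical helper).

    `weight` in [0, 1] is placed on the squared Euclidean term for the
    real-valued part; the remainder goes to the KL term for the
    probability-valued part.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    euclid = bregman(sq_norm, sq_norm_grad, real_x, real_y)
    kl = bregman(neg_entropy, neg_entropy_grad, prob_x, prob_y)
    return weight * euclid + (1.0 - weight) * kl
```

Because every term is a Bregman divergence and the weights are non-negative, the combined loss is itself a Bregman divergence on the concatenated data, which is why a single framework can cover both data types.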

Original language	English (US)
Title of host publication	Proceedings of the 7th SIAM International Conference on Data Mining
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	145-156
Number of pages	12
ISBN (Print)	9780898716306
DOIs	https://doi.org/10.1137/1.9781611972771.14
State	Published - 2007
Externally published	Yes
Event	7th SIAM International Conference on Data Mining - Minneapolis, MN, United States
Duration	Apr 26 2007 → Apr 28 2007

Cite this

Multi-way clustering on relation graphs. / Banerjee, Arindam; Basu, Sugato; Merugu, Srujana.
Proceedings of the 7th SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics Publications, 2007. p. 145-156 (Proceedings of the 7th SIAM International Conference on Data Mining).

Banerjee, A, Basu, S & Merugu, S 2007, Multi-way clustering on relation graphs. in Proceedings of the 7th SIAM International Conference on Data Mining. Proceedings of the 7th SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics Publications, pp. 145-156, 7th SIAM International Conference on Data Mining, Minneapolis, MN, United States, 4/26/07. https://doi.org/10.1137/1.9781611972771.14

@inproceedings{bb9b514c3d734143903039e5f300be2a,

title = "Multi-way clustering on relation graphs",

author = "Arindam Banerjee and Sugato Basu and Srujana Merugu",

year = "2007",

doi = "10.1137/1.9781611972771.14",

language = "English (US)",

isbn = "9780898716306",

series = "Proceedings of the 7th SIAM International Conference on Data Mining",

publisher = "Society for Industrial and Applied Mathematics Publications",

pages = "145--156",

booktitle = "Proceedings of the 7th SIAM International Conference on Data Mining",

note = "7th SIAM International Conference on Data Mining ; Conference date: 26-04-2007 Through 28-04-2007",

}

TY - GEN

T1 - Multi-way clustering on relation graphs

AU - Banerjee, Arindam

AU - Basu, Sugato

AU - Merugu, Srujana

PY - 2007

Y1 - 2007

UR - http://www.scopus.com/inward/record.url?scp=70449102572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449102572&partnerID=8YFLogxK

U2 - 10.1137/1.9781611972771.14

DO - 10.1137/1.9781611972771.14

M3 - Conference contribution

AN - SCOPUS:70449102572

SN - 9780898716306

T3 - Proceedings of the 7th SIAM International Conference on Data Mining

SP - 145

EP - 156

BT - Proceedings of the 7th SIAM International Conference on Data Mining

PB - Society for Industrial and Applied Mathematics Publications

T2 - 7th SIAM International Conference on Data Mining

Y2 - 26 April 2007 through 28 April 2007

ER -
