An efficient algorithm for discovering frequent subgraphs

Michihiro Kuramochi; George Karypis

doi:10.1109/TKDE.2004.33

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

248 Scopus citations

Abstract

Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the data sets in these domains. An alternate way of modeling the objects in these data sets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper, we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph data sets. We experimentally evaluate the performance of FSG using a variety of real and synthetic data sets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in data sets containing more than 200,000 graph transactions and scales linearly with respect to the size of the data set.
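The abstract frames frequent pattern discovery as finding subgraphs that occur frequently over a set of graph transactions. The following is a minimal sketch of that formulation. It assumes that support is the fraction of transactions containing a label-preserving copy of the candidate subgraph, and it counts the support of a single candidate by brute force. It is not FSG: plain backtracking stands in for FSG's own candidate generation and counting. The LabeledGraph class, the contains and support functions, the min_support threshold, and the toy graphs are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


class LabeledGraph:
    """Undirected graph with vertex and edge labels (hypothetical representation, no self-loops)."""

    def __init__(self, vertex_labels: Dict[int, str], edges: Dict[Tuple[int, int], str]) -> None:
        self.vlabel: Dict[int, str] = dict(vertex_labels)
        # Store each undirected edge once, keyed by its unordered endpoint pair.
        self.elabel: Dict[FrozenSet[int], str] = {frozenset(e): lab for e, lab in edges.items()}
        self.adj: Dict[int, Set[int]] = {v: set() for v in self.vlabel}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def contains(host: LabeledGraph, pattern: LabeledGraph) -> bool:
    """Return True if `host` has a subgraph isomorphic to `pattern`.

    The match must preserve vertex and edge labels; it need not be induced.
    Plain backtracking, exponential in the worst case.
    """
    order = list(pattern.vlabel)
    mapping: Dict[int, int] = {}  # pattern vertex -> host vertex
    used: Set[int] = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True  # every pattern vertex has been mapped consistently
        p = order[i]
        for h in host.vlabel:
            if h in used or host.vlabel[h] != pattern.vlabel[p]:
                continue
            # Each pattern edge to an already-mapped vertex must exist in the host with the same label.
            if all(
                host.elabel.get(frozenset((h, mapping[q]))) == pattern.elabel[frozenset((p, q))]
                for q in pattern.adj[p]
                if q in mapping
            ):
                mapping[p] = h
                used.add(h)
                if extend(i + 1):
                    return True
                del mapping[p]  # undo and try the next candidate host vertex
                used.discard(h)
        return False

    return extend(0)


def support(dataset: List[LabeledGraph], pattern: LabeledGraph) -> float:
    """Fraction of graph transactions that contain `pattern` at least once."""
    return sum(contains(g, pattern) for g in dataset) / len(dataset)


# Toy data set of three graph transactions (made-up, molecule-like).
g1 = LabeledGraph({0: "C", 1: "C", 2: "O"}, {(0, 1): "single", (1, 2): "single"})
g2 = LabeledGraph({0: "C", 1: "O"}, {(0, 1): "single"})
g3 = LabeledGraph({0: "C", 1: "O"}, {(0, 1): "double"})
dataset = [g1, g2, g3]

pattern = LabeledGraph({0: "C", 1: "O"}, {(0, 1): "single"})
min_support = 0.5  # hypothetical minimum support threshold
s = support(dataset, pattern)
print(f"support = {s:.3f}, frequent = {s >= min_support}")
```

On the toy data the single-bonded C-O pattern occurs in g1 and g2 but not in g3, whose bond is labeled "double". The script therefore prints `support = 0.667, frequent = True`. Counting each transaction at most once, however many embeddings it has, is what distinguishes this transaction-style support from counting all occurrences.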

Original language	English (US)
Pages (from-to)	1038-1051
Number of pages	14
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	16
Issue number	9
DOIs	https://doi.org/10.1109/TKDE.2004.33
State	Published - Sep 2004

Bibliographical note

Funding Information:
This work was supported by the US National Science Foundation CCR-9972519, EIA-9986042, ACI-9982274 and ACI-0133464, by the US Army Research Office contract DA/DAAG55-98-1-0441, and by the US Army High Performance Computing Research Center contract number DAAH04-95-C-0008. Access to computing facilities was provided by the Minnesota Supercomputing Institute. An earlier version of this work appeared in [29].

Keywords

Chemical compound data sets
Data mining
Frequent pattern discovery
Scientific data sets

Cite this

@article{79b110e0e4c34a2898529aa4a4846beb,
  title = "An efficient algorithm for discovering frequent subgraphs",
  abstract = "Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the data sets in these domains. An alternate way of modeling the objects in these data sets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper, we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph data sets. We experimentally evaluate the performance of FSG using a variety of real and synthetic data sets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in data sets containing more than 200,000 graph transactions and scales linearly with respect to the size of the data set.",
  keywords = "Chemical compound data sets, Data mining, Frequent pattern discovery, Scientific data sets",
  author = "Michihiro Kuramochi and George Karypis",
  note = "Funding Information: This work was supported by the US National Science Foundation CCR-9972519, EIA-9986042, ACI-9982274 and ACI-0133464, by the US Army Research Office contract DA/DAAG55-98-1-0441, and by the US Army High Performance Computing Research Center contract number DAAH04-95-C-0008. Access to computing facilities was provided by the Minnesota Supercomputing Institute. An earlier version of this work appeared in [29].",
  year = "2004",
  month = sep,
  doi = "10.1109/TKDE.2004.33",
  language = "English (US)",
  volume = "16",
  pages = "1038--1051",
  journal = "IEEE Transactions on Knowledge and Data Engineering",
  issn = "1041-4347",
  publisher = "IEEE Computer Society",
  number = "9",
}

TY  - JOUR
T1  - An efficient algorithm for discovering frequent subgraphs
AU  - Kuramochi, Michihiro
AU  - Karypis, George
N1  - Funding Information: This work was supported by the US National Science Foundation CCR-9972519, EIA-9986042, ACI-9982274 and ACI-0133464, by the US Army Research Office contract DA/DAAG55-98-1-0441, and by the US Army High Performance Computing Research Center contract number DAAH04-95-C-0008. Access to computing facilities was provided by the Minnesota Supercomputing Institute. An earlier version of this work appeared in [29].
PY  - 2004/9
Y1  - 2004/9
N2  - Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the data sets in these domains. An alternate way of modeling the objects in these data sets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper, we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph data sets. We experimentally evaluate the performance of FSG using a variety of real and synthetic data sets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in data sets containing more than 200,000 graph transactions and scales linearly with respect to the size of the data set.
AB  - Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the data sets in these domains. An alternate way of modeling the objects in these data sets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper, we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph data sets. We experimentally evaluate the performance of FSG using a variety of real and synthetic data sets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in data sets containing more than 200,000 graph transactions and scales linearly with respect to the size of the data set.
KW  - Chemical compound data sets
KW  - Data mining
KW  - Frequent pattern discovery
KW  - Scientific data sets
UR  - http://www.scopus.com/inward/record.url?scp=4544385908&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=4544385908&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2004.33
DO  - 10.1109/TKDE.2004.33
M3  - Article
AN  - SCOPUS:4544385908
SN  - 1041-4347
VL  - 16
SP  - 1038
EP  - 1051
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
IS  - 9
ER  -
