Finding frequent patterns using length-decreasing support constraints

Masakazu Seno; George Karypis

doi:10.1007/s10618-005-0364-0

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

Finding prevalent patterns in large amounts of data has been one of the major problems in the area of data mining. Particularly, the problem of finding frequent itemset or sequential patterns in very large databases has been studied extensively over the years, and a variety of algorithms have been developed for each problem. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of these two problems. In general, patterns that contain only a few items will tend to be interesting if they have a high support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we want to find all the frequent patterns whose support decreases as a function of their length without having to find many uninteresting infrequent short patterns. Developing such algorithms is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short infrequent patterns. In this paper we present two algorithms, LPMiner and SLPMiner. Given a length-decreasing support constraint, LPMiner finds all the frequent itemset patterns from an itemset database, and SLPMiner finds all the frequent sequential patterns from a sequential database. Each of these two algorithms combines a well-studied efficient algorithm for constant-support-based pattern discovery with three effective database pruning methods that dramatically reduce the runtime. Our experimental evaluations show that both LPMiner and SLPMiner, by effectively exploiting the length-decreasing support constraint, are up to two orders of magnitude faster, and their runtime increases gradually as the average length of the input patterns increases.
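As a rough illustration of the idea in the abstract, the sketch below brute-forces itemsets under a length-decreasing support constraint. It is not LPMiner or SLPMiner and uses none of their pruning methods; the toy database, the threshold function `min_support`, and all other names are assumptions made for this example only.

```python
from itertools import combinations

# Toy transaction database (assumed for illustration only).
TRANSACTIONS = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
    {"b"},
]


def min_support(length):
    """Hypothetical length-decreasing constraint (absolute counts).

    Length 1 needs support 4, length 2 needs 3, length 3 and above need 2.
    """
    return max(2, 5 - length)


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_patterns(transactions, threshold):
    """Exhaustively enumerate itemsets and keep those meeting threshold(len)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo), transactions)
            if s >= threshold(k):
                result[combo] = s
    return result


patterns = frequent_patterns(TRANSACTIONS, min_support)
for pattern, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(pattern, count)
```

In this toy run the four-item pattern is reported even though some of its subsets, such as the single item `c`, fail their own stricter threshold. That is the situation the abstract describes: an infrequent short pattern cannot simply be discarded together with all of its extensions, so downward closure alone does not prune the search.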

Original language	English (US)
Pages (from-to)	197-228
Number of pages	32
Journal	Data Mining and Knowledge Discovery
Volume	10
Issue number	3
DOIs	https://doi.org/10.1007/s10618-005-0364-0
State	Published - May 2005

Bibliographical note

Funding Information:
This work was supported by NSF CCR-9972519, EIA-9986042, ACI-9982274, ACI-0133464, and by Army High Performance Computing Research Center contract number DA/DAAG55-98-1-0441. Access to computing facilities was provided by the Minnesota Supercomputing Institute.

Keywords

Association rules
Data-mining
Frequent pattern discovery
Scalability

Cite this

@article{eef58df4d44b4bbeb24ff40f23b27edb,

title = "Finding frequent patterns using length-decreasing support constraints",

abstract = "Finding prevalent patterns in large amounts of data has been one of the major problems in the area of data mining. Particularly, the problem of finding frequent itemset or sequential patterns in very large databases has been studied extensively over the years, and a variety of algorithms have been developed for each problem. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of these two problems. In general, patterns that contain only a few items will tend to be interesting if they have a high support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we want to find all the frequent patterns whose support decreases as a function of their length without having to find many uninteresting infrequent short patterns. Developing such algorithms is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short infrequent patterns. In this paper we present two algorithms, LPMiner and SLPMiner. Given a length-decreasing support constraint, LPMiner finds all the frequent itemset patterns from an itemset database, and SLPMiner finds all the frequent sequential patterns from a sequential database. Each of these two algorithms combines a well-studied efficient algorithm for constant-support-based pattern discovery with three effective database pruning methods that dramatically reduce the runtime. Our experimental evaluations show that both LPMiner and SLPMiner, by effectively exploiting the length-decreasing support constraint, are up to two orders of magnitude faster, and their runtime increases gradually as the average length of the input patterns increases.",

keywords = "Association rules, Data-mining, Frequent pattern discovery, Scalability",

author = "Masakazu Seno and George Karypis",

note = "Funding Information: This work was supported by NSF CCR-9972519, EIA-9986042, ACI-9982274, ACI-0133464, and by Army High Performance Computing Research Center contract number DA/DAAG55-98-1-0441. Access to computing facilities was provided by the Minnesota Supercomputing Institute.",

year = "2005",

month = may,

doi = "10.1007/s10618-005-0364-0",

language = "English (US)",

volume = "10",

pages = "197--228",

journal = "Data Mining and Knowledge Discovery",

issn = "1384-5810",

publisher = "Springer Netherlands",

number = "3",

}

TY - JOUR

T1 - Finding frequent patterns using length-decreasing support constraints

AU - Seno, Masakazu

AU - Karypis, George

N1 - Funding Information: This work was supported by NSF CCR-9972519, EIA-9986042, ACI-9982274, ACI-0133464, and by Army High Performance Computing Research Center contract number DA/DAAG55-98-1-0441. Access to computing facilities was provided by the Minnesota Supercomputing Institute.

PY - 2005/5

Y1 - 2005/5

AB - Finding prevalent patterns in large amounts of data has been one of the major problems in the area of data mining. Particularly, the problem of finding frequent itemset or sequential patterns in very large databases has been studied extensively over the years, and a variety of algorithms have been developed for each problem. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of these two problems. In general, patterns that contain only a few items will tend to be interesting if they have a high support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we want to find all the frequent patterns whose support decreases as a function of their length without having to find many uninteresting infrequent short patterns. Developing such algorithms is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short infrequent patterns. In this paper we present two algorithms, LPMiner and SLPMiner. Given a length-decreasing support constraint, LPMiner finds all the frequent itemset patterns from an itemset database, and SLPMiner finds all the frequent sequential patterns from a sequential database. Each of these two algorithms combines a well-studied efficient algorithm for constant-support-based pattern discovery with three effective database pruning methods that dramatically reduce the runtime. Our experimental evaluations show that both LPMiner and SLPMiner, by effectively exploiting the length-decreasing support constraint, are up to two orders of magnitude faster, and their runtime increases gradually as the average length of the input patterns increases.

KW - Association rules

KW - Data-mining

KW - Frequent pattern discovery

KW - Scalability

UR - http://www.scopus.com/inward/record.url?scp=22044458573&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=22044458573&partnerID=8YFLogxK

U2 - 10.1007/s10618-005-0364-0

DO - 10.1007/s10618-005-0364-0

M3 - Article

AN - SCOPUS:22044458573

SN - 1384-5810

VL - 10

SP - 197

EP - 228

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

IS - 3

ER -
