Abstract
Sequential pattern mining aims to find subsequences in a sequence database that appear together in timestamp order. Although many sequential pattern mining techniques exist, they ignore the temporal relationship information between the itemsets in the subsequences, even though this information is important in many real-world applications. For example, even if healthcare providers know that symptom Y frequently occurs after symptom X, it is also valuable for them to estimate when Y will occur after X so that they can provide treatment at the right time. Considering temporal relationship information in sequential pattern mining raises new issues, such as designing a data structure to store this information and traversing that structure efficiently to discover patterns without rescanning the database. In this paper, we propose an algorithm called Minits-AllOcc (MINIng Timed Sequential Pattern for All-time Occurrences) that finds sequential patterns, together with the transition times between their itemsets, based on all possible occurrences of a pattern in the database. We also propose a parallel multicore CPU version of this algorithm, called MMinits-AllOcc (Multicore Minits-AllOcc), to handle Big Data. Extensive experiments on real and synthetic datasets show the advantages of this approach over the brute-force method. The multicore CPU version of the algorithm is also shown to outperform the single-core version on Big Data by 2.5X.
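The abstract does not describe the data structure behind Minits-AllOcc, so the following is only a minimal sketch of a brute-force computation of the kind it is compared against, not the authors' algorithm. It assumes that each sequence is a list of (timestamp, itemset) pairs sorted by timestamp, and it handles only a two-itemset pattern <X, Y>. For that pattern it counts support (the number of sequences in which X occurs strictly before Y) and collects the transition time t_Y - t_X for every pair of occurrences, not just the first. The names `Sequence` and `transition_times` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical input format: (timestamp, itemset) pairs sorted by timestamp.
Sequence = List[Tuple[int, FrozenSet[str]]]


def transition_times(
    db: List[Sequence], x: FrozenSet[str], y: FrozenSet[str]
) -> Tuple[int, List[int]]:
    """Brute-force scan for the two-itemset pattern <x, y> (illustrative only).

    Returns (support, times): support counts sequences containing x strictly
    before y; times holds t_y - t_x for every (x, y) occurrence pair.
    """
    support = 0
    times: List[int] = []
    for seq in db:
        found = False
        for i, (tx, items_x) in enumerate(seq):
            if not x <= items_x:  # x must be a subset of this itemset
                continue
            # Pair this occurrence of x with every later occurrence of y.
            for ty, items_y in seq[i + 1:]:
                if y <= items_y and ty > tx:
                    times.append(ty - tx)
                    found = True
        if found:
            support += 1
    return support, times


db = [
    [(0, frozenset({"X"})), (3, frozenset({"Y"})), (5, frozenset({"X", "Y"}))],
    [(1, frozenset({"Y"})), (2, frozenset({"X"})), (6, frozenset({"Z"}))],
    [(0, frozenset({"X"})), (4, frozenset({"Y"}))],
]
support, times = transition_times(db, frozenset({"X"}), frozenset({"Y"}))
print(support, times, sum(times) / len(times))  # 2 [3, 5, 4] 4.0
```

The mean transition time shown here is just one illustrative summary of when Y tends to follow X. Repeating this scan for every candidate pattern is exactly the cost that motivates storing occurrence and timing information in a dedicated structure instead of rescanning the database.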
Original language | English (US) |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining - 25th Pacific-Asia Conference, PAKDD 2021, Proceedings |
Editors | Kamal Karlapalem, Hong Cheng, Naren Ramakrishnan, R. K. Agrawal, P. Krishna Reddy, Jaideep Srivastava, Tanmoy Chakraborty |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 668-685 |
Number of pages | 18 |
ISBN (Print) | 9783030757618 |
State | Published - 2021 |
Event | 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021 - Virtual, Online; Duration: May 11 2021 → May 14 2021 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12712 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021 |
---|---|
City | Virtual, Online |
Period | 5/11/21 → 5/14/21 |
Bibliographical note
Publisher Copyright: © 2021, Springer Nature Switzerland AG.
Keywords
- Multicore
- Parallel sequential pattern mining
- Sequential pattern mining
- Timed sequential pattern