ParCube: Sparse parallelizable CANDECOMP-PARAFAC tensor decomposition

Evangelos E. Papalexakis; Christos Faloutsos; Nicholas D. Sidiropoulos

doi:10.1145/2729980

Electrical and Computer Engineering

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

How can we efficiently decompose a tensor into sparse factors when the data do not fit in memory? Tensor decompositions have gained steadily increasing popularity in data-mining applications; however, the current state-of-the-art decomposition algorithms operate on main memory and do not scale to truly large datasets. In this work, we propose PARCUBE, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to producing sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm's correctness, and we validate our claims through extensive experiments, including four different real-world datasets (ENRON, LBNL, FACEBOOK and NELL), demonstrating its effectiveness for data-mining practitioners. In particular, we are the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that PARCUBE enables us to handle very large datasets effectively and efficiently. Finally, we make our highly scalable parallel implementation publicly available, enabling reproducibility of our work.
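
For readers unfamiliar with the model named in the title, a CANDECOMP-PARAFAC (CP) decomposition approximates a three-mode tensor by a sum of R rank-one terms, one column per mode in each of three factor matrices. The following is a minimal sketch of a plain, dense, in-memory alternating-least-squares fit in NumPy. It is not the PARCUBE algorithm, which per the abstract targets data that do not fit in memory and produces sparse factors. The function names (`khatri_rao`, `unfold`, `cp_als`, `reconstruct`), the tensor sizes, and the iteration count are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    rank = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, rank)


def unfold(X, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def cp_als(X, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a dense 3-mode tensor by ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the other two factors fixed and solve a linear
            # least-squares problem for the factor of this mode.
            U, V = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(U, V)
            gram = (U.T @ U) * (V.T @ V)  # equals kr.T @ kr
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def reconstruct(factors):
    """Sum of rank-one terms: X[i, j, k] = sum_r A[i, r] B[j, r] C[k, r]."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


if __name__ == "__main__":
    # Synthetic tensor that is exactly rank 2 (sizes chosen arbitrarily).
    rng = np.random.default_rng(42)
    truth = [rng.standard_normal((dim, 2)) for dim in (6, 5, 4)]
    X = reconstruct(truth)

    estimate = cp_als(X, rank=2)
    rel_err = np.linalg.norm(X - reconstruct(estimate)) / np.linalg.norm(X)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

Each inner step is an ordinary least-squares solve, so the fit error never increases from one update to the next. The whole tensor is held in memory here, which is exactly the limitation the abstract says PARCUBE is designed to avoid.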

Original language	English (US)
Article number	3
Journal	ACM Transactions on Knowledge Discovery from Data
Volume	10
Issue number	1
DOIs	https://doi.org/10.1145/2729980
State	Published - Jul 1 2015

Bibliographical note

Publisher Copyright:
© 2015 ACM.

Keywords

Algorithms
H.2.8 [database applications]: Data mining
H.3.3 [information search and retrieval]: Clustering
PARAFAC decomposition
Parallel algorithms
Performance
Randomized algorithms
Sampling
Sparsity
Tensors

Cite this

@article{9f7dc125f1cd41b39a74a26b8f945eb0,

title = "ParCube: Sparse parallelizable CANDECOMP-PARAFAC tensor decomposition",

abstract = "How can we efficiently decompose a tensor into sparse factors, when the data do not fit in memory? Tensor decompositions have gained a steadily increasing popularity in data-mining applications; however, the current state-of-art decomposition algorithms operate on main memory and do not scale to truly large datasets. In this work, we propose PARCUBE, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to produce sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm's correctness and we experimentally validate our claims through extensive experiments, including four different real world datasets (ENRON, LBNL, FACEBOOK and NELL), demonstrating its effectiveness for data-mining practitioners. In particular, we are the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that PARCUBE enables us to handle effectively and efficiently very large datasets. Finally, we make our highly scalable parallel implementation publicly available, enabling reproducibility of our work.",

keywords = "Algorithms, H.2.8 [database applications]: Data mining, H.3.3 [information search and retrieval]: Clustering, PARAFAC decomposition, Parallel algorithms, Performance, Randomized algorithms, Sampling, Sparsity, Tensors",

author = "Papalexakis, {Evangelos E.} and Faloutsos, Christos and Sidiropoulos, {Nicholas D.}",

note = "Publisher Copyright: {\textcopyright} 2015 ACM.",

year = "2015",

month = jul,

day = "1",

doi = "10.1145/2729980",

language = "English (US)",

volume = "10",

journal = "ACM Transactions on Knowledge Discovery from Data",

issn = "1556-4681",

publisher = "Association for Computing Machinery (ACM)",

number = "1",

}

TY - JOUR

T1 - ParCube: Sparse parallelizable CANDECOMP-PARAFAC tensor decomposition

AU - Papalexakis, Evangelos E.

AU - Faloutsos, Christos

AU - Sidiropoulos, Nicholas D.

PY - 2015/7/1

Y1 - 2015/7/1

N2 - How can we efficiently decompose a tensor into sparse factors, when the data do not fit in memory? Tensor decompositions have gained a steadily increasing popularity in data-mining applications; however, the current state-of-art decomposition algorithms operate on main memory and do not scale to truly large datasets. In this work, we propose PARCUBE, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to produce sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm's correctness and we experimentally validate our claims through extensive experiments, including four different real world datasets (ENRON, LBNL, FACEBOOK and NELL), demonstrating its effectiveness for data-mining practitioners. In particular, we are the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that PARCUBE enables us to handle effectively and efficiently very large datasets. Finally, we make our highly scalable parallel implementation publicly available, enabling reproducibility of our work.

KW - Algorithms

KW - H.2.8 [database applications]: Data mining

KW - H.3.3 [information search and retrieval]: Clustering

KW - PARAFAC decomposition

KW - Parallel algorithms

KW - Performance

KW - Randomized algorithms

KW - Sampling

KW - Sparsity

KW - Tensors

UR - http://www.scopus.com/inward/record.url?scp=84938355609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938355609&partnerID=8YFLogxK

U2 - 10.1145/2729980

DO - 10.1145/2729980

M3 - Article

AN - SCOPUS:84938355609

SN - 1556-4681

VL - 10

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

IS - 1

M1 - 3

ER -
