TY - JOUR
T1 - Turbo-SMT
T2 - Parallel coupled sparse matrix-tensor factorizations and applications
AU - Papalexakis, Evangelos E.
AU - Mitchell, Tom M.
AU - Sidiropoulos, Nicholas D.
AU - Faloutsos, Christos
AU - Talukdar, Partha Pratim
AU - Murphy, Brian
N1 - Publisher Copyright: © 2016 Wiley Periodicals, Inc.
PY - 2016/8/1
Y1 - 2016/8/1
N2 - How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like ‘edible’, ‘fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm and parallelizes it, producing sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the workhorse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a FACEBOOK dataset (users, ‘friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies.
AB - How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like ‘edible’, ‘fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm and parallelizes it, producing sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the workhorse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a FACEBOOK dataset (users, ‘friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies.
KW - algorithm
KW - coupled matrix-tensor factorization
KW - fMRI data
KW - neurosemantics
KW - parallel
KW - sparse
KW - speedup
KW - tensor
UR - http://www.scopus.com/inward/record.url?scp=84978520160&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978520160&partnerID=8YFLogxK
U2 - 10.1002/sam.11315
DO - 10.1002/sam.11315
M3 - Article
C2 - 27672406
AN - SCOPUS:84978520160
SN - 1932-1864
VL - 9
SP - 269
EP - 290
JO - Statistical Analysis and Data Mining
JF - Statistical Analysis and Data Mining
IS - 4
ER -