Efficient and distributed algorithms for large-scale generalized canonical correlations analysis

Xiao Fu, Kejun Huang, Evangelos E. Papalexakis, Hyun Ah Song, Partha Pratim Talukdar, Nicholas D. Sidiropoulos, Christos Faloutsos, Tom Mitchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Generalized canonical correlation analysis (GCCA) aims at extracting common structure from multiple 'views', i.e., high-dimensional matrices representing the same objects in different feature domains; it is an extension of classical two-view CCA. Existing (G)CCA algorithms have serious scalability issues, since they involve square root factorization of the correlation matrices of the views. The memory and computational complexity associated with this step grow as quadratic and cubic functions of the problem dimension (the number of samples / features), respectively. To circumvent such difficulties, we propose a GCCA algorithm whose memory and computational costs scale linearly in the problem dimension and the number of nonzero data elements, respectively. Consequently, the proposed algorithm can easily handle very large sparse views whose sample and feature dimensions both exceed 100,000, while the current approaches can only handle thousands of features / samples. Our second contribution is a distributed algorithm for GCCA, which computes the canonical components of different views in parallel and thus can further reduce the runtime significantly (by ≥ 30% in experiments) if multiple cores are available. Judiciously designed synthetic and real-data experiments using a multilingual dataset are employed to showcase the effectiveness of the proposed algorithms.
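To make the quadratic-versus-linear memory claim concrete, the following back-of-the-envelope sketch compares the storage for a dense correlation matrix with a cost that grows linearly in the problem dimension. It is not the authors' algorithm: the float64 element size and the number of retained components `k` are illustrative assumptions that the abstract does not specify.

```python
# Rough memory arithmetic only; nothing here implements GCCA itself.
# Assumptions (not from the abstract): float64 storage, k = 10 components.

BYTES_PER_FLOAT64 = 8


def dense_correlation_bytes(dim: int) -> int:
    """Bytes for a dense dim x dim float64 matrix (quadratic in dim)."""
    return dim * dim * BYTES_PER_FLOAT64


def linear_factor_bytes(dim: int, k: int = 10) -> int:
    """Bytes for a dim x k float64 matrix (linear in dim); k is hypothetical."""
    return dim * k * BYTES_PER_FLOAT64


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


if __name__ == "__main__":
    for dim in (1_000, 10_000, 100_000):
        dense = to_gb(dense_correlation_bytes(dim))
        linear = to_gb(linear_factor_bytes(dim))
        print(f"dim={dim:>7}: dense {dense:10.3f} GB | linear {linear:.6f} GB")
```

Under these assumptions, a dimension of 100,000 needs about 80 GB for one dense correlation matrix, which is consistent with the abstract's remark that existing approaches handle only thousands of features or samples.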

Original language: English (US)
Title of host publication: Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
Editors: Francesco Bonchi, Xindong Wu, Ricardo Baeza-Yates, Josep Domingo-Ferrer, Zhi-Hua Zhou
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 871-876
Number of pages: 6
ISBN (Electronic): 9781509054725
DOIs
State: Published - Jan 31 2017
Event: 16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain
Duration: Dec 12 2016 to Dec 15 2016

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 16th IEEE International Conference on Data Mining, ICDM 2016
Country: Spain
City: Barcelona, Catalonia
Period: 12/12/16 to 12/15/16

Keywords

  • Distributed GCCA
  • Large-scale generalized canonical correlation analysis
  • Multilingual word embeddings
