Abstract
Generalized canonical correlation analysis (GCCA) aims at extracting common structure from multiple 'views', i.e., high-dimensional matrices representing the same objects in different feature domains; it is an extension of classical two-view CCA. Existing (G)CCA algorithms have serious scalability issues, since they involve square root factorization of the correlation matrices of the views. The memory and computational complexity associated with this step grow as a quadratic and cubic function of the problem dimension (the number of samples / features), respectively. To circumvent such difficulties, we propose a GCCA algorithm whose memory and computational costs scale linearly in the problem dimension and the number of nonzero data elements, respectively. Consequently, the proposed algorithm can easily handle very large sparse views whose sample and feature dimensions both exceed 100,000, whereas current approaches can only handle thousands of features / samples. Our second contribution is a distributed algorithm for GCCA, which computes the canonical components of different views in parallel and thus can further reduce the runtime significantly (by ≥ 30% in experiments) if multiple cores are available. Judiciously designed synthetic and real-data experiments using a multilingual dataset are employed to showcase the effectiveness of the proposed algorithms.
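To make the problem setting concrete, the following is a minimal sketch of one common GCCA formulation (often called MAX-VAR), solved by simple alternating least squares on small dense matrices. It is an illustration under stated assumptions, not the algorithm proposed in the paper: it uses dense NumPy solves, so it does not have the linear memory and computational scaling described above. The names `gcca_alternating`, `views`, `k`, and `n_iter` are hypothetical. The sketch does show why per-view work can in principle be parallelized: each view's least-squares fit depends only on that view and the current shared representation.

```python
import numpy as np


def gcca_alternating(views, k, n_iter=50, seed=0):
    """Illustrative MAX-VAR GCCA via alternating least squares.

    views : list of (n_samples, n_features_i) dense arrays, rows aligned
            so that row j of every view describes the same object.
    k     : number of shared canonical components to extract.

    Returns the shared representation G (n_samples x k, orthonormal
    columns) and one projection matrix Q_i per view.
    """
    rng = np.random.default_rng(seed)
    n_samples = views[0].shape[0]

    # Start from a random shared representation with orthonormal columns.
    G, _ = np.linalg.qr(rng.standard_normal((n_samples, k)))

    for _ in range(n_iter):
        # Per-view step: fit each view to G by least squares.
        # These solves are independent of one another across views.
        Qs = [np.linalg.lstsq(X, G, rcond=None)[0] for X in views]

        # Shared step: the orthonormal G closest to all projected views
        # is the polar factor of their sum (an orthogonal Procrustes step).
        S = sum(X @ Q for X, Q in zip(views, Qs))
        U, _, Vt = np.linalg.svd(S, full_matrices=False)
        G = U @ Vt

    # Refit the projections against the final G.
    Qs = [np.linalg.lstsq(X, G, rcond=None)[0] for X in views]
    return G, Qs
```

Calling `gcca_alternating([X1, X2, X3], k=2)` on row-aligned views returns a shared `G` whose columns are orthonormal, together with one projection per view. When all views span the same latent subspace, the residuals `X_i @ Q_i - G` vanish up to numerical precision.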
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016 |
Editors | Francesco Bonchi, Josep Domingo-Ferrer, Ricardo Baeza-Yates, Zhi-Hua Zhou, Xindong Wu |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 871-876 |
Number of pages | 6 |
ISBN (Electronic) | 9781509054725 |
State | Published - Jul 2 2016 |
Event | 16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain Duration: Dec 12 2016 → Dec 15 2016 |
Publication series
Name | Proceedings - IEEE International Conference on Data Mining, ICDM |
---|---|
Volume | 0 |
ISSN (Print) | 1550-4786 |
Other
Other | 16th IEEE International Conference on Data Mining, ICDM 2016 |
---|---|
Country/Territory | Spain |
City | Barcelona, Catalonia |
Period | 12/12/16 → 12/15/16 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- Distributed GCCA
- Large-scale generalized canonical correlation analysis
- Multilingual word embeddings