Translation invariant word embeddings

Matt Gardner, Kejun Huang, Evangelos Papalexakis, Xiao Fu, Partha Talukdar, Christos Faloutsos, Nicholas Sidiropoulos, Tom Mitchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations

Abstract

This work focuses on the task of finding latent vector representations of the words in a corpus. In particular, we address the issue of what to do when there are multiple languages in the corpus. Prior work has, among other techniques, used canonical correlation analysis to project pre-trained vectors in two languages into a common space. We propose a simple and scalable method that is inspired by the notion that the learned vector representations should be invariant to translation between languages. We show empirically that our method outperforms prior work on multilingual tasks, matches the performance of prior work on monolingual tasks, and scales linearly with the size of the input data (and thus the number of languages being embedded).
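As a rough illustration of the CCA baseline mentioned in the abstract, the sketch below projects pre-trained vectors from two languages into a common space using translation pairs. This is not the authors' implementation: the random data, the dimensions, and the use of scikit-learn's CCA are assumptions made purely for illustration.

# Minimal sketch of the CCA baseline (NOT the paper's proposed method).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Hypothetical pre-trained embeddings for translation pairs from a
# bilingual dictionary: row i of X is a source-language word vector and
# row i of Y the vector of its translation (n_pairs x embedding_dim).
X = rng.standard_normal((1000, 300))
Y = rng.standard_normal((1000, 300))

# Fit CCA on the dictionary pairs and map both sides into the shared
# space; cross-lingual similarity is then measured between X_c and Y_c.
cca = CCA(n_components=40, max_iter=500)
X_c, Y_c = cca.fit_transform(X, Y)

The paper's proposed method, by contrast, builds translation invariance into the embedding objective itself rather than projecting pre-trained vectors after the fact.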

Original language: English (US)
Title of host publication: Conference Proceedings - EMNLP 2015
Subtitle of host publication: Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics (ACL)
Pages: 1084-1088
Number of pages: 5
ISBN (Electronic): 9781941643327
State: Published - 2015
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
Duration: Sep 17, 2015 - Sep 21, 2015

Publication series

Name: Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015
Country/Territory: Portugal
City: Lisbon
Period: 9/17/15 - 9/21/15

Bibliographical note

Publisher Copyright:
© 2015 Association for Computational Linguistics.
