DGL-KE: Training Knowledge Graph Embeddings at Scale

Da Zheng; Xiang Song; Chao Ma; Zeyuan Tan; Zihao Ye; Jin Dong; Hao Xiong; Zheng Zhang; George Karypis

doi:10.1145/3397271.3401172

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

98 Scopus citations

Abstract

Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains, and their embeddings are increasingly used to harness that information in various information retrieval and machine learning tasks. However, the ever-growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package to efficiently compute knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and in 30 minutes on an EC2 cluster of 4 machines with 48 cores per machine. These results represent a 2× to 5× speedup over the best competing approaches. DGL-KE is available at https://github.com/awslabs/dgl-ke.

Original language	English (US)
Title of host publication	SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher	Association for Computing Machinery, Inc
Pages	739-748
Number of pages	10
ISBN (Electronic)	9781450380164
DOIs	https://doi.org/10.1145/3397271.3401172
State	Published - Jul 25 2020
Externally published	Yes
Event	43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020 - Virtual, Online, China
Duration	Jul 25 2020 → Jul 30 2020

Publication series

Name	SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference	43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
Country/Territory	China
City	Virtual, Online
Period	7/25/20 → 7/30/20

Bibliographical note

Publisher Copyright:
© 2020 ACM.

Keywords

distributed training
knowledge graph embeddings
large scale

Cite this

Zheng, D., Song, X., Ma, C., Tan, Z., Ye, Z., Dong, J., Xiong, H., Zhang, Z., & Karypis, G. (2020). DGL-KE: Training Knowledge Graph Embeddings at Scale. In SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 739-748). (SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery, Inc. https://doi.org/10.1145/3397271.3401172

@inproceedings{f3ee82f6f9c0482396c819369c66c3f1,
  title = "DGL-KE: Training Knowledge Graph Embeddings at Scale",
  abstract = "Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package to efficiently compute knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine. These results represent a 2× ∼ 5× speedup over the best competing approaches. DGL-KE is available on https://github.com/awslabs/dgl-ke.",
  keywords = "distributed training, knowledge graph embeddings, large scale",
  author = "Da Zheng and Xiang Song and Chao Ma and Zeyuan Tan and Zihao Ye and Jin Dong and Hao Xiong and Zheng Zhang and George Karypis",
  note = "Publisher Copyright: {\textcopyright} 2020 ACM.; 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020 ; Conference date: 25-07-2020 Through 30-07-2020",
  year = "2020",
  month = jul,
  day = "25",
  doi = "10.1145/3397271.3401172",
  language = "English (US)",
  series = "SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval",
  publisher = "Association for Computing Machinery, Inc",
  pages = "739--748",
  booktitle = "SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval",
}

TY - GEN
T1 - DGL-KE
T2 - 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
AU - Zheng, Da
AU - Song, Xiang
AU - Ma, Chao
AU - Tan, Zeyuan
AU - Ye, Zihao
AU - Dong, Jin
AU - Xiong, Hao
AU - Zhang, Zheng
AU - Karypis, George
PY - 2020/7/25
Y1 - 2020/7/25

AB - Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package to efficiently compute knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine. These results represent a 2× ∼ 5× speedup over the best competing approaches. DGL-KE is available on https://github.com/awslabs/dgl-ke.

KW - distributed training
KW - knowledge graph embeddings
KW - large scale
UR - http://www.scopus.com/inward/record.url?scp=85090146028&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090146028&partnerID=8YFLogxK
U2 - 10.1145/3397271.3401172
DO - 10.1145/3397271.3401172
M3 - Conference contribution
AN - SCOPUS:85090146028
T3 - SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 739
EP - 748
BT - SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 25 July 2020 through 30 July 2020
ER -
