MPI for Big Data: New tricks for an old dog

Dominique Lasalle, George Karypis

Research output: Contribution to journal / Article / peer-review


Abstract

The processing of massive amounts of data on clusters with a finite amount of memory has become an important problem facing the parallel/distributed computing community. While MapReduce-style technologies provide an effective means for addressing various problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we present a runtime system for traditional MPI programs that enables the efficient and transparent out-of-core execution of distributed-memory parallel programs. This system, called BDMPI, leverages the semantics of MPI's API to orchestrate the execution of a large number of MPI processes on far fewer compute nodes, so that the running processes maximize the amount of computation they perform with the data fetched from the disk. BDMPI enables the development of efficient out-of-core parallel distributed-memory codes without the high engineering and algorithmic complexities associated with multiple levels of blocking. BDMPI achieves significantly better performance than existing technologies on a single node as well as on a small cluster, and performs within 30% of optimized out-of-core implementations.
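The scheduling idea in the abstract (run many more MPI processes than fit in memory, and let a process keep running until MPI semantics force it to wait) can be illustrated with a toy, single-machine model. This is a minimal sketch under our own assumptions, not BDMPI's implementation: ranks are modeled as Python generators that yield `("send", dest, value)` or `("recv", src)`, at most `max_resident` ranks are "in memory" at once with least-recently-used eviction, and each swap-in is counted as a "load" standing in for reading a process's data back from disk. The names `simulate`, `ring_worker`, and `max_resident` are hypothetical.

```python
from collections import OrderedDict, defaultdict, deque


def simulate(programs, max_resident):
    """Toy model (hypothetical, not BDMPI's code): run generator 'ranks'
    until each one finishes or blocks forever.

    A rank runs until it blocks on a recv with no queued message, then
    gives up its turn. Returns (results, loads), where loads counts
    swap-ins of non-resident ranks (a stand-in for disk reads).
    """
    n = len(programs)
    mailbox = defaultdict(deque)   # (src, dst) -> queued message values
    waiting_on = {}                # blocked rank -> src it waits for
    pending = [None] * n           # value to resume each rank with
    runnable = deque(range(n))
    resident = OrderedDict()       # LRU-ordered set of in-memory ranks
    results = [None] * n
    loads = 0

    while runnable:
        rank = runnable.popleft()
        # Bring the rank "into memory", evicting the LRU rank if full.
        if rank in resident:
            resident.move_to_end(rank)
        else:
            if len(resident) >= max_resident:
                resident.popitem(last=False)
            resident[rank] = True
            loads += 1
        gen = programs[rank]
        # Run as long as possible: until a recv blocks or the rank ends.
        while True:
            try:
                op = gen.send(pending[rank])
            except StopIteration as stop:
                results[rank] = stop.value
                resident.pop(rank, None)   # finished ranks free memory
                break
            pending[rank] = None
            if op[0] == "send":
                _, dest, value = op
                mailbox[(rank, dest)].append(value)
                # Wake the receiver if it is blocked on this sender.
                if waiting_on.get(dest) == rank:
                    del waiting_on[dest]
                    pending[dest] = mailbox[(rank, dest)].popleft()
                    runnable.append(dest)
            else:
                _, src = op
                if mailbox[(src, rank)]:
                    pending[rank] = mailbox[(src, rank)].popleft()
                else:
                    waiting_on[rank] = src   # block and yield the turn
                    break
    return results, loads


def ring_worker(rank, size):
    """Hypothetical example rank: pass a value around a ring."""
    yield ("send", (rank + 1) % size, rank)
    got = yield ("recv", (rank - 1) % size)
    return rank + got


if __name__ == "__main__":
    ranks = [ring_worker(r, 4) for r in range(4)]
    print(simulate(ranks, max_resident=2))  # → ([3, 1, 3, 5], 4)
```

In this model, four ranks share two memory slots. Rank 0 blocks on its receive, the other ranks run to completion, and rank 0 resumes without being reloaded, so four loads suffice. With `max_resident=1`, rank 0 is evicted and must be reloaded, which raises the count to five. The trade-off this sketch is meant to show is that running each process until it must wait on communication reduces how often its data has to be fetched again.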

Original language: English (US)
Pages (from-to): 754-767
Number of pages: 14
Journal: Parallel Computing
Volume: 40
Issue number: 10
State: Published - Dec 2014

Bibliographical note

Funding Information:
This work was supported in part by NSF (IOS-0820730, IIS-0905220, OCI-1048018, CNS-1162405, and IIS-1247632) and the Digital Technology Center at the University of Minnesota. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.

Publisher Copyright:
© 2014 Published by Elsevier B.V.

Keywords

  • Big Data
  • Distributed
  • High performance
  • Out-of-core
