Dynamically negotiating capacity between on-demand and batch clusters

Feng Liu, Kate Keahey, Pierre Riteau, Jon B Weissman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

In an era of rapid experimental expansion, data analysis needs are quickly outpacing the capabilities of small institutional clusters, and experimental groups are looking to integrate HPC resources into their workflows. We propose one way of reconciling the on-demand needs of experimental analytics with batch-managed HPC resources: a system that dynamically moves nodes between an on-demand cluster configured with cloud technology (OpenStack) and a traditional HPC cluster managed by a batch scheduler (Torque). We evaluate this system experimentally, both with real-life traces representing two years of a specific institutional need and with synthetic traces that capture generalized characteristics of potential batch and on-demand workloads. Our results for the real-life scenario show that our approach could reduce the current investment in on-demand infrastructure by 82% while improving the mean batch wait time by almost an order of magnitude (8x).

Original language: English (US)
Title of host publication: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 493-503
Number of pages: 11
ISBN (Electronic): 9781538683842
State: Published - Mar 11 2019
Event: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States
Duration: Nov 11 2018 to Nov 16 2018

Publication series

Name: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

Conference

Conference: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Country: United States
City: Dallas
Period: 11/11/18 to 11/16/18

Bibliographical note

Funding Information:
This material is based upon work supported by the U.S. Department of Energy under DOE-LAB-14-1003 and by the NSF under award NSF-1443080. Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation.

Publisher Copyright:
© 2018 IEEE.

Keywords

  • Computers and information processing
  • Distributed computing
  • Grid computing
  • Metacomputing
