A new metric for robustness with application to job scheduling

Darin A England, Jon B Weissman, Jayashree Sadagopan

Research output: Contribution to journal › Conference article › peer-review

23 Scopus citations

Abstract

Scheduling strategies for parallel and distributed computing have mostly been oriented toward performance, while striving to achieve some notion of fairness. With the increase in size, complexity, and heterogeneity of today's computing environments, we argue that, in addition to performance metrics, scheduling algorithms should be designed for robustness. That is, they should have the ability to maintain performance under a wide variety of operating conditions. Although robustness is easy to define, there are no widely used metrics for this property. To this end, we present a methodology for characterizing and measuring the robustness of a system to a specific disturbance. The methodology is easily applied to many types of computing systems and it does not require sophisticated mathematical models. To illustrate its use, we show three applications of our technique to job scheduling: one supporting a previous result with respect to backfilling, one examining overload control in a streaming video server, and one comparing two different scheduling strategies for a distributed network service. The last example also demonstrates how consideration of robustness leads to better system design, as we were able to devise a new and effective scheduling heuristic.

Original language: English (US)
Pages (from-to): 135-143
Number of pages: 9
Journal: Proceedings of the IEEE International Symposium on High Performance Distributed Computing
State: Published - 2005
Event: 14th IEEE International Symposium on High Performance Distributed Computing, HPDC-14 - Research Triangle Park, NC, United States
Duration: Jul 24 2005 to Jul 27 2005
