A new metric for robustness with application to job scheduling

Darin A England; Jon B Weissman; Jayashree Sadagopan

Research output: Contribution to journal › Conference article › peer-review

Abstract

Scheduling strategies for parallel and distributed computing have mostly been oriented toward performance, while striving to achieve some notion of fairness. With the increase in size, complexity, and heterogeneity of today's computing environments, we argue that, in addition to performance metrics, scheduling algorithms should be designed for robustness. That is, they should have the ability to maintain performance under a wide variety of operating conditions. Although robustness is easy to define, there are no widely used metrics for this property. To this end, we present a methodology for characterizing and measuring the robustness of a system to a specific disturbance. The methodology is easily applied to many types of computing systems, and it does not require sophisticated mathematical models. To illustrate its use, we show three applications of our technique to job scheduling: one supporting a previous result with respect to backfilling, one examining overload control in a streaming video server, and one comparing two different scheduling strategies for a distributed network service. The last example also demonstrates how consideration of robustness leads to better system design, as we were able to devise a new and effective scheduling heuristic.

Original language	English (US)
Pages (from-to)	135-143
Number of pages	9
Journal	Proceedings of the IEEE International Symposium on High Performance Distributed Computing
State	Published - 2005
Event	14th IEEE International Symposium on High Performance Distributed Computing, HPDC-14, Research Triangle Park, NC, United States. Duration: Jul 24, 2005 to Jul 27, 2005


Cite this

@article{610316048847483daedfc20fda0a878d,
  title = "A new metric for robustness with application to job scheduling",
  abstract = "Scheduling strategies for parallel and distributed computing have mostly been oriented toward performance, while striving to achieve some notion of fairness. With the increase in size, complexity, and heterogeneity of today's computing environments, we argue that, in addition to performance metrics, scheduling algorithms should be designed for robustness. That is, they should have the ability to maintain performance under a wide variety of operating conditions. Although robustness is easy to define, there are no widely used metrics for this property. To this end, we present a methodology for characterizing and measuring the robustness of a system to a specific disturbance. The methodology is easily applied to many types of computing systems and it does not require sophisticated mathematical models. To illustrate its use, we show three applications of our technique to job scheduling; one supporting a previous result with respect to backfilling, one examining overload control in a streaming video server, and one comparing two different scheduling strategies for a distributed network service. The last example also demonstrates how consideration of robustness leads to better system design as we were able to devise a new and effective scheduling heuristic.",
  author = "England, {Darin A} and Weissman, {Jon B} and Sadagopan, Jayashree",
  year = "2005",
  language = "English (US)",
  pages = "135--143",
  journal = "Proceedings of the IEEE International Symposium on High Performance Distributed Computing",
  issn = "1082-8907",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  note = "14th IEEE International Symposium on High Performance Distributed Computing, HPDC-14; Conference date: 24-07-2005 through 27-07-2005",
}

TY  - JOUR
T1  - A new metric for robustness with application to job scheduling
AU  - England, Darin A
AU  - Weissman, Jon B
AU  - Sadagopan, Jayashree
PY  - 2005
Y1  - 2005
N2  - Scheduling strategies for parallel and distributed computing have mostly been oriented toward performance, while striving to achieve some notion of fairness. With the increase in size, complexity, and heterogeneity of today's computing environments, we argue that, in addition to performance metrics, scheduling algorithms should be designed for robustness. That is, they should have the ability to maintain performance under a wide variety of operating conditions. Although robustness is easy to define, there are no widely used metrics for this property. To this end, we present a methodology for characterizing and measuring the robustness of a system to a specific disturbance. The methodology is easily applied to many types of computing systems and it does not require sophisticated mathematical models. To illustrate its use, we show three applications of our technique to job scheduling; one supporting a previous result with respect to backfilling, one examining overload control in a streaming video server, and one comparing two different scheduling strategies for a distributed network service. The last example also demonstrates how consideration of robustness leads to better system design as we were able to devise a new and effective scheduling heuristic.
AB  - Scheduling strategies for parallel and distributed computing have mostly been oriented toward performance, while striving to achieve some notion of fairness. With the increase in size, complexity, and heterogeneity of today's computing environments, we argue that, in addition to performance metrics, scheduling algorithms should be designed for robustness. That is, they should have the ability to maintain performance under a wide variety of operating conditions. Although robustness is easy to define, there are no widely used metrics for this property. To this end, we present a methodology for characterizing and measuring the robustness of a system to a specific disturbance. The methodology is easily applied to many types of computing systems and it does not require sophisticated mathematical models. To illustrate its use, we show three applications of our technique to job scheduling; one supporting a previous result with respect to backfilling, one examining overload control in a streaming video server, and one comparing two different scheduling strategies for a distributed network service. The last example also demonstrates how consideration of robustness leads to better system design as we were able to devise a new and effective scheduling heuristic.
UR  - http://www.scopus.com/inward/record.url?scp=27544501646&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=27544501646&partnerID=8YFLogxK
M3  - Conference article
AN  - SCOPUS:27544501646
SN  - 1082-8907
SP  - 135
EP  - 143
JO  - Proceedings of the IEEE International Symposium on High Performance Distributed Computing
JF  - Proceedings of the IEEE International Symposium on High Performance Distributed Computing
T2  - 14th IEEE International Symposium on High Performance Distributed Computing, HPDC-14
Y2  - 24 July 2005 through 27 July 2005
ER  - 
