Abstract
Large-scale distributed applications are subject to frequent disruptions due to resource contention and failure. Such disruptions are inherently unpredictable and, therefore, robustness is a desirable property for the distributed operating environment. In this work, we describe and evaluate a robust topology for applications that operate on a spanning tree overlay network. Unlike previous work that is adaptive or reactive in nature, we take a proactive approach to robustness. The topology itself is able to simultaneously withstand disturbances and exhibit good performance. We present both centralized and distributed algorithms to construct the topology, and then demonstrate its effectiveness through analysis and simulation of two classes of distributed applications: data collection in sensor networks and data dissemination in divisible load scheduling. The results show that our robust spanning trees achieve a desirable trade-off for two opposing metrics where traditional forms of spanning trees do not. In particular, the trees generated by our algorithms exhibit both resilience to data loss and low power consumption for sensor networks. When used as the overlay network for divisible load scheduling, they display both robustness to link congestion and low values for the makespan of the schedule.
Original language | English (US) |
---|---|
Pages (from-to) | 608-620 |
Number of pages | 13 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 18 |
Issue number | 5 |
State | Published - May 2007 |
Bibliographical note
Funding Information: The authors would like to acknowledge the support of the US National Science Foundation under grants CNS-0305641 and ITR-0325949, the US Department of Energy’s Office of Science under grant DE-FG02-03ER25554, and the Minnesota Supercomputing Institute for Digital Simulation and Advanced Computation and the Digital Technology Center at the University of Minnesota. Bharadwaj Veeravalli would like to acknowledge the support under grant (0520150024/ R-263-000-350-592) by A*STAR SERC through the National Grid Office, Singapore.
Keywords
- Distributed computing
- Divisible load scheduling
- Fault tolerance
- Graph theory
- Robustness
- Wireless sensor networks