The guardian model and primitives for exception handling in distributed systems

Robert Miller, Anand Tripathi

Research output: Contribution to journal / Article / Peer-review

24 Scopus citations

Abstract

This paper presents an abstraction called the guardian for exception handling in distributed and concurrent systems that use coordinated exception handling. The model addresses two fundamental problems of distributed exception handling in a group of asynchronous processes. The first is performing recovery when multiple exceptions are signaled concurrently. The second is determining the correct context in which each process should execute its exception handling actions. Schemes proposed in the past address these problems in two ways. They structure a distributed program as atomic actions based on conversations or transactions, and they resolve multiple concurrent exceptions into a single one. The guardian in a distributed program represents the abstraction of a global exception handler. It encapsulates the rules for handling concurrent exceptions and directs each process to the semantically correct context for executing its recovery actions. This paper presents its programming primitives and the underlying distributed execution model. In contrast to existing approaches, the model is more basic and can be used to implement or enhance existing schemes. Several examples illustrate its capabilities. Finally, its advantages and limitations are discussed in comparison with existing approaches.
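The abstract describes the guardian only in terms of its responsibilities. The Python sketch below is a hypothetical illustration of those two responsibilities and is not the paper's primitives or execution model. All of the following are assumptions: the names `Guardian`, `Process`, `signal`, and `recover`, the rule-table format, and the representation of a process's contexts as a stack of names. The sketch also runs in a single address space, so it ignores distribution and asynchronous messaging.

The sketch shows one global handler that collects the exceptions a group of processes signals concurrently. It selects a single rule covering the whole set. It then directs every process to the context in which that process should run its recovery actions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical rule format: (exceptions that must all have been signaled,
#                            exception to raise in every process,
#                            name of the context in which to handle it)
Rule = Tuple[FrozenSet[str], str, str]


@dataclass
class Process:
    name: str
    contexts: List[str]  # stack of entered contexts, outermost first


@dataclass
class Guardian:
    rules: List[Rule]  # checked in order, so list the most specific rules first
    signaled: Dict[str, str] = field(default_factory=dict)

    def signal(self, proc: Process, exc: str) -> None:
        """Record an exception signaled by one process of the group."""
        self.signaled[proc.name] = exc

    def recover(self, procs: List[Process]) -> Dict[str, Tuple[str, str]]:
        """Choose one rule for the concurrently signaled exceptions, then
        direct every process to the rule's target context, discarding any
        contexts nested inside it."""
        raised = set(self.signaled.values())
        for required, resolved, target in self.rules:
            if required <= raised:  # all required exceptions were signaled
                break
        else:
            raise LookupError(f"no rule covers {sorted(raised)}")
        directives = {}
        for p in procs:
            # Unwind to the innermost occurrence of the target context.
            # Assumes every process has entered it (otherwise ValueError).
            depth = len(p.contexts) - 1 - p.contexts[::-1].index(target)
            del p.contexts[depth + 1:]
            directives[p.name] = (resolved, target)
        self.signaled.clear()
        return directives


if __name__ == "__main__":
    guardian = Guardian(rules=[
        (frozenset({"timeout", "disk_full"}), "abort_transfer", "session"),
        (frozenset({"timeout"}), "retry", "transfer"),
        (frozenset(), "failure", "session"),  # catch-all: empty set always matches
    ])
    a = Process("A", ["session", "transfer"])
    b = Process("B", ["session", "transfer", "checksum"])
    guardian.signal(a, "timeout")
    guardian.signal(b, "disk_full")
    print(guardian.recover([a, b]))
    print(a.contexts, b.contexts)
```

Rule order matters in this sketch. The combined `timeout` plus `disk_full` rule is listed first so that those two concurrent signals are handled together rather than as a lone `timeout`. For brevity, each combination maps to a single exception, as the earlier resolution-based schemes do. The abstract describes the guardian as more basic than those schemes, so a real guardian's rules need not be limited to this form.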

Original language: English (US)
Pages (from-to): 1008-1022
Number of pages: 15
Journal: IEEE Transactions on Software Engineering
Volume: 30
Issue number: 12
State: Published - Dec 2004

Bibliographical note

Funding Information:
An initial version of this paper was presented at the IEEE Symposium for Reliable and Distributed Systems in October 2002. The authors thank Sujay Patankar and Devdatta Kulkarni for further refinements and testing of this implementation, and the three anonymous reviewers for their insightful and constructive comments in improving the presentation of this paper. This work was supported in part by US National Science Foundation grants ANI 0087514 and ITR 0082215. R. Miller thanks IBM for its support through the Work Study Program.

Keywords

  • Concurrent programming
  • Distributed programming
  • Fault tolerance
