Reducing latency via redundant requests: Exact analysis

Kristen Gardner; Samuel Zbarsky; Sherwin Doroudi; Mor Harchol-Balter; Esa Hyytiä; Alan Scheller-Wolf

doi:10.1145/2796314.2745873

Research output: Contribution to journal › Conference article › peer-review

111 Scopus citations

Abstract

Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However there is no exact analysis of systems with redundancy. This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution on the state of the system. In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the "gain" to redundant classes and "pain" to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple Exponential distribution and that of the non-redundant class follows a Generalized Hyperexponential. Second, fully redundant classes are "immune" to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and Join-the-Shortest-Queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.

Original language	English (US)
Pages (from-to)	347-360
Number of pages	14
Journal	Performance Evaluation Review
Volume	43
Issue number	1
DOIs	https://doi.org/10.1145/2796314.2745873
State	Published - Jun 24 2015
Externally published	Yes
Event	ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2015 - Portland, United States
Duration	Jun 15 2015 → Jun 19 2015

Bibliographical note

Funding Information:
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1252522; was funded by NSF-CMMI-1334194 and NSF-CSR-1116282, by the Intel Science and Technology Center for Cloud Computing, and by a Google Faculty Research Award 2015/16; and has been supported by the Academy of Finland in TOP-Energy project (grant no. 268992).

Publisher Copyright:
© Copyright 2015 ACM.

Keywords

Markov chain analysis
Redundancy

Cite this

@article{af23b326d23f4d269965209895bd0020,

title = "Reducing latency via redundant requests: Exact analysis",

abstract = "Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However there is no exact analysis of systems with redundancy. This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution on the state of the system. In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the {"}gain{"} to redundant classes and {"}pain{"} to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple Exponential distribution and that of the non-redundant class follows a Generalized Hyperexponential. Second, fully redundant classes are {"}immune{"} to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and Join-the-Shortest-Queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.",

keywords = "Markov chain analysis, Redundancy",

author = "Kristen Gardner and Samuel Zbarsky and Sherwin Doroudi and Mor Harchol-Balter and Esa Hyyti{\"a} and Alan Scheller-Wolf",

note = "Funding Information: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1252522; was funded by NSF-CMMI-1334194 and NSF-CSR-1116282, by the Intel Science and Technology Center for Cloud Computing, and by a Google Faculty Research Award 2015/16; and has been supported by the Academy of Finland in TOP-Energy project (grant no. 268992). Publisher Copyright: {\textcopyright} Copyright 2015 ACM.; ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2015 ; Conference date: 15-06-2015 Through 19-06-2015",

year = "2015",

month = jun,

day = "24",

doi = "10.1145/2796314.2745873",

language = "English (US)",

volume = "43",

pages = "347--360",

journal = "Performance Evaluation Review",

issn = "0163-5999",

publisher = "Association for Computing Machinery (ACM)",

number = "1",

}

TY - JOUR

T1 - Reducing latency via redundant requests: Exact analysis

T2 - ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2015

AU - Gardner, Kristen

AU - Zbarsky, Samuel

AU - Doroudi, Sherwin

AU - Harchol-Balter, Mor

AU - Hyytiä, Esa

AU - Scheller-Wolf, Alan

N1 - Funding Information: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1252522; was funded by NSF-CMMI-1334194 and NSF-CSR-1116282, by the Intel Science and Technology Center for Cloud Computing, and by a Google Faculty Research Award 2015/16; and has been supported by the Academy of Finland in TOP-Energy project (grant no. 268992). Publisher Copyright: © Copyright 2015 ACM.

PY - 2015/6/24

Y1 - 2015/6/24

AB - Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However there is no exact analysis of systems with redundancy. This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution on the state of the system. In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the "gain" to redundant classes and "pain" to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple Exponential distribution and that of the non-redundant class follows a Generalized Hyperexponential. Second, fully redundant classes are "immune" to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and Join-the-Shortest-Queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.

KW - Markov chain analysis

KW - Redundancy

UR - http://www.scopus.com/inward/record.url?scp=84955602825&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84955602825&partnerID=8YFLogxK

U2 - 10.1145/2796314.2745873

DO - 10.1145/2796314.2745873

M3 - Conference article

AN - SCOPUS:84955602825

SN - 0163-5999

VL - 43

SP - 347

EP - 360

JO - Performance Evaluation Review

JF - Performance Evaluation Review

IS - 1

Y2 - 15 June 2015 through 19 June 2015

ER -
