Risk intelligence: Profitting from uncertainty in data processing system

Si Zheng; Yunhuai Liu; Shanshan Li; Tian He; Xiangke Liao

doi:10.1109/ICPP.2013.55

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Fault tolerance is essential in extreme-scale data processing systems. Proactive fault-tolerance schemes (such as speculative execution in the MapReduce framework) can dramatically improve the response time of job executions when failures become the norm rather than the exception. Efficient proactive fault-tolerance schemes require precise knowledge of task executions, which has been an open challenge for decades. To address this issue, in this paper we design and implement RiskI, a profile-based prediction algorithm in conjunction with a risk-aware task assignment algorithm that accelerates task executions while taking the uncertain nature of tasks into account. Our design demonstrates that this inherent uncertainty brings not only great challenges but also new opportunities: with careful design, we can benefit from such uncertainty. We implement the idea in Hadoop 0.21.0, and the experimental results show that, compared with the traditional LATE algorithm, the response time can be improved by 46% with the same system throughput.
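The abstract measures RiskI against the LATE (Longest Approximate Time to End) speculative-execution heuristic. The Python sketch below is offered only as background, under stated assumptions. It shows a simplified form of LATE's time-to-end estimate, (1 - progress) / (progress / elapsed), omitting LATE's slow-node check and its cap on concurrent speculative tasks. It then adds a hypothetical uncertainty-aware score illustrating how variability in task runtimes can be an opportunity as well as a challenge. The `Task` type, `expected_gain`, `slow_fraction` and all other names are illustrative assumptions. This is not RiskI's algorithm, which the abstract does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    """Hypothetical view of a running task attempt."""
    task_id: str
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds since the attempt started


def progress_rate(task: Task) -> float:
    return task.progress / task.elapsed


def late_time_left(task: Task) -> float:
    """LATE-style estimate: (1 - progress) / progress_rate."""
    return (1.0 - task.progress) / progress_rate(task)


def late_pick(tasks: list[Task], slow_fraction: float = 0.25) -> Task | None:
    """Simplified LATE: among the slowest `slow_fraction` of running tasks
    (by progress rate), pick the one with the longest estimated time left.
    Omits LATE's slow-node check and cap on concurrent speculative tasks."""
    running = [t for t in tasks if 0.0 < t.progress < 1.0 and t.elapsed > 0.0]
    if not running:
        return None
    by_rate = sorted(running, key=progress_rate)
    k = max(1, int(len(by_rate) * slow_fraction))
    return max(by_rate[:k], key=late_time_left)


def expected_gain(remaining: float, copy_samples: list[float]) -> float:
    """Hypothetical uncertainty-aware score (not RiskI): expected time saved
    by launching a backup attempt now, whose runtime follows profiled samples.
    The task ends when the first attempt finishes, so each sample saves
    remaining - min(remaining, sample). Raises StatisticsError if empty."""
    return remaining - mean(min(remaining, s) for s in copy_samples)
```

Consider a task with 100 s remaining and profiled backup runtimes of 50, 150, 200 and 80 s. The mean backup runtime (120 s) exceeds the remaining time, so a rule based on point estimates would not speculate. Yet `expected_gain` returns 17.5 s, because a backup sometimes finishes first. The sketch ignores the slot cost of the extra attempt, which is what trades response time against system throughput.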

Original language	English (US)
Title of host publication	Proceedings
Subtitle of host publication	International Conference on Parallel Processing - The 42nd Annual Conference, ICPP 2013
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	458-467
Number of pages	10
ISBN (Print)	9780769551173
DOIs	https://doi.org/10.1109/ICPP.2013.55
State	Published - 2013
Event	42nd Annual International Conference on Parallel Processing, ICPP 2013 - Lyon, France Duration: Oct 1 2013 → Oct 4 2013

Publication series

Name	Proceedings of the International Conference on Parallel Processing
ISSN (Print)	0190-3918

Other

Other	42nd Annual International Conference on Parallel Processing, ICPP 2013
Country/Territory	France
City	Lyon
Period	10/1/13 → 10/4/13

Keywords

Data processing systems
Fault-tolerance
MapReduce
Prediction
Risk-management
Task assignment

Cite this

Zheng, S., Liu, Y., Li, S., He, T., & Liao, X. (2013). Risk intelligence: Profitting from uncertainty in data processing system. In Proceedings: International Conference on Parallel Processing - The 42nd Annual Conference, ICPP 2013 (pp. 458-467). Article 6687379 (Proceedings of the International Conference on Parallel Processing). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPP.2013.55

Risk intelligence: Profitting from uncertainty in data processing system. / Zheng, Si; Liu, Yunhuai; Li, Shanshan et al.
Proceedings: International Conference on Parallel Processing - The 42nd Annual Conference, ICPP 2013. Institute of Electrical and Electronics Engineers Inc., 2013. p. 458-467 6687379 (Proceedings of the International Conference on Parallel Processing).

Zheng, S, Liu, Y, Li, S, He, T & Liao, X 2013, Risk intelligence: Profitting from uncertainty in data processing system. in Proceedings: International Conference on Parallel Processing - The 42nd Annual Conference, ICPP 2013, 6687379, Proceedings of the International Conference on Parallel Processing, Institute of Electrical and Electronics Engineers Inc., pp. 458-467, 42nd Annual International Conference on Parallel Processing, ICPP 2013, Lyon, France, 10/1/13. https://doi.org/10.1109/ICPP.2013.55

Zheng S, Liu Y, Li S, He T, Liao X. Risk intelligence: Profitting from uncertainty in data processing system. In Proceedings: International Conference on Parallel Processing - The 42nd Annual Conference, ICPP 2013. Institute of Electrical and Electronics Engineers Inc. 2013. p. 458-467. 6687379. (Proceedings of the International Conference on Parallel Processing). doi: 10.1109/ICPP.2013.55

@inproceedings{1cd5f2e995f643559fee701e8cafef44,

title = "Risk intelligence: Profitting from uncertainty in data processing system",

keywords = "Data processing systems, Fault-tolerance, MapReduce, Prediction, Risk-management, Task assignment",

author = "Si Zheng and Yunhuai Liu and Shanshan Li and Tian He and Xiangke Liao",

year = "2013",

doi = "10.1109/ICPP.2013.55",

language = "English (US)",

isbn = "9780769551173",

series = "Proceedings of the International Conference on Parallel Processing",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "458--467",

booktitle = "Proceedings",

note = "42nd Annual International Conference on Parallel Processing, ICPP 2013 ; Conference date: 01-10-2013 Through 04-10-2013",

}

TY - GEN

T1 - Risk intelligence

T2 - 42nd Annual International Conference on Parallel Processing, ICPP 2013

AU - Zheng, Si

AU - Liu, Yunhuai

AU - Li, Shanshan

AU - He, Tian

AU - Liao, Xiangke

PY - 2013

Y1 - 2013

KW - Data processing systems

KW - Fault-tolerance

KW - MapReduce

KW - Prediction

KW - Risk-management

KW - Task assignment

UR - http://www.scopus.com/inward/record.url?scp=84893271059&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893271059&partnerID=8YFLogxK

U2 - 10.1109/ICPP.2013.55

DO - 10.1109/ICPP.2013.55

M3 - Conference contribution

AN - SCOPUS:84893271059

SN - 9780769551173

T3 - Proceedings of the International Conference on Parallel Processing

SP - 458

EP - 467

BT - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 1 October 2013 through 4 October 2013

ER -
