TY - JOUR
T1 - On interference-aware provisioning for cloud-based big data processing
AU - Yuan, Yi
AU - Wang, Haiyang
AU - Wang, Dan
AU - Liu, Jiangchuan
PY - 2013/8/15
Y1 - 2013/8/15
AB - Recent advances in cloud-based big data analysis offer a convenient means of exploring voluminous data sets elastically and cost-efficiently. Following this trend, industry leaders such as Amazon, Google and IBM have deployed various big data systems on their cloud platforms, aiming to capture the huge global market. While these cloud systems greatly facilitate the implementation of big data analysis, their real-world applicability remains largely unclear. In this paper, we take the first steps towards a better understanding of big data systems on cloud platforms. Using the typical MapReduce framework as a case study, we find that its pipeline-based design integrates computation-intensive operations (such as mapping and reducing) with I/O-intensive operations (such as shuffling). These computation-intensive and I/O-intensive operations seriously degrade each other's performance and largely reduce system efficiency, especially on low-end virtual machines (VMs). To make matters worse, our measurements also indicate that more than 90% of a task's lifetime is subject to such interference. This unavoidably reduces the applicability of cloud-based big data processing and makes overall performance hard to predict. To address this problem, we remodel the resource provisioning problem in cloud-based big data systems and present an interference-aware solution that intelligently allocates MapReduce jobs to different VMs. Our evaluation results show that our new model can accurately predict job completion time across different configurations and significantly improve the user experience for this new generation of data processing services.
UR - http://www.scopus.com/inward/record.url?scp=84881330111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881330111&partnerID=8YFLogxK
U2 - 10.1109/IWQoS.2013.6550282
DO - 10.1109/IWQoS.2013.6550282
M3 - Conference article
AN - SCOPUS:84881330111
SN - 1548-615X
SP - 201
EP - 206
JO - IEEE International Workshop on Quality of Service, IWQoS
JF - IEEE International Workshop on Quality of Service, IWQoS
M1 - 6550282
T2 - 2013 IEEE/ACM 21st International Symposium on Quality of Service, IWQoS 2013
Y2 - 3 June 2013 through 4 June 2013
ER -