Multi-query optimization in wide-area streaming analytics

Albert Jonathan, Abhishek Chandra, Jon Weissman

Research output: Chapter in Book/Report/Conference proceedingConference contribution

33 Scopus citations

Abstract

Wide-area data analytics has gained much attention in recent years due to the increasing need for analyzing data that are geographically distributed. Many of such queries often require real-time analysis on data streams that are continuously being generated across multiple locations. Yet, analyzing these geo-distributed data streams in a timely manner is very challenging due to the highly heterogeneous and limited bandwidth availability of the wide-area network (WAN). This paper examines the opportunity of applying multi-query optimization in the context of wide-area streaming analytics, with the goal of utilizing WAN bandwidth efficiently while achieving high throughput and low latency execution. Our approach is based on the insight that many streaming analytics queries often exhibit common executions, whether in consuming a common set of input data or performing the same data processing. In this work, we study different types of sharing opportunities and propose a practical online algorithm that allows streaming analytics queries to share their common executions incrementally. We further address the importance of WAN awareness in applying multi-query optimization. Without WAN awareness, sharing executions in a wide-area environment may lead to performance degradation. We have implemented our WAN-aware multi-query optimization in a prototype implementation based on Apache Flink. Experimental evaluation using Twitter traces on a real wide-area system deployment across geo-distributed EC2 data centers shows that our technique is able to achieve 21% higher throughput while saving WAN bandwidth consumption by 33% compared to a WAN-aware, sharing-agnostic system.

Original languageEnglish (US)
Title of host publicationSoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing
PublisherAssociation for Computing Machinery, Inc
Pages412-425
Number of pages14
ISBN (Electronic)9781450360111
DOIs
StatePublished - Oct 11 2018
Event2018 ACM Symposium on Cloud Computing, SoCC 2018 - Carlsbad, United States
Duration: Oct 11 2018Oct 13 2018

Publication series

NameSoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing

Other

Other2018 ACM Symposium on Cloud Computing, SoCC 2018
Country/TerritoryUnited States
CityCarlsbad
Period10/11/1810/13/18

Bibliographical note

Funding Information:
The authors would like to thank the anonymous SoCC reviewers for their valuable comments and feedback. The work is supported by grant NSF CNS-1619254 and CNS-1717834.

Publisher Copyright:
© 2018 Association for Computing Machinery.

Keywords

  • Execution sharing
  • Geo-distributed systems
  • Multi-query optimization
  • Stream processing systems

Fingerprint

Dive into the research topics of 'Multi-query optimization in wide-area streaming analytics'. Together they form a unique fingerprint.

Cite this