Trends in big data analytics

Karthik Kambatla, Giorgos Kollias, Vipin Kumar, Ananth Grama

Research output: Contribution to journal › Article › peer-review

609 Scopus citations

Abstract

One of the major applications of future-generation parallel and distributed systems is in big-data analytics. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size. Beyond their sheer magnitude, these datasets and the considerations of their associated applications pose significant challenges for method and software development. Datasets are often distributed, and their size and privacy considerations warrant distributed techniques. Data often resides on platforms with widely varying computational and network capabilities. Considerations of fault tolerance, security, and access control are critical in many applications (Dean and Ghemawat, 2004; Apache Hadoop). Analysis tasks often have hard deadlines, and data quality is a major concern in yet other applications. For most emerging applications, data-driven models and methods capable of operating at scale are as yet unknown. Even when known methods can be scaled, validation of results is a major issue. Characteristics of hardware platforms and the software stack fundamentally impact data analytics. In this article, we provide an overview of the state of the art and focus on emerging trends to highlight the hardware, software, and application landscape of big-data analytics.

Original language: English (US)
Pages (from-to): 2561-2573
Number of pages: 13
Journal: Journal of Parallel and Distributed Computing
Volume: 74
Issue number: 7
State: Published - Jul 2014

Bibliographical note

Funding Information:
Ananth Grama is the Director of the Computational Science and Engineering program and Professor of Computer Science at Purdue University. He also serves as the Associate Director of the Center for Science of Information. Ananth received his B.Eng. from the Indian Institute of Technology, Roorkee (1989), his M.S. from Wayne State University (1990), and his Ph.D. from the University of Minnesota (1996). His research interests lie in parallel and distributed systems, numerical methods, large-scale data analysis, and their applications. Ananth is a recipient of the National Science Foundation CAREER award (1998) and the University Faculty Scholar Award (2002–07), and is a Fellow of the American Association for the Advancement of Science (2013).

Keywords

  • Analytics
  • Big-data
  • Data centers
  • Distributed systems
