TY - GEN
T1 - Large-scale neural modeling in MapReduce and Giraph
AU - Yang, Shuo
AU - Spielman, Nicholas D.
AU - Jackson, Jadin C.
AU - Rubin, Brad S.
PY - 2014
Y1 - 2014
N2 - One of the most crucial challenges in scientific computing is scalability. Hadoop, an open-source implementation of the MapReduce parallel programming model developed by Google, has emerged as a powerful platform for performing large-scale scientific computing at very low cost. In this paper, we explore the use of Hadoop to model large-scale neural networks. A neural network is most naturally modeled by a graph structure with iterative processing. We first present an improved graph algorithm design pattern in MapReduce called Mapper-side Schimmy. Experiments show that applying our design pattern, combined with current best practices, can reduce the running time of a simulation on a neural network with 100,000 neurons and 2.3 billion edges by 64%. MapReduce, however, is inherently inefficient for iterative graph processing. To address this limitation of the MapReduce model, we then explore the use of Giraph, an open-source large-scale graph processing framework that sits on top of Hadoop, to implement graph algorithms with a vertex-centric approach. We show that our Giraph implementation boosted performance by 91% compared to a basic MapReduce implementation and by 60% compared to our improved Mapper-side Schimmy algorithm.
UR - http://www.scopus.com/inward/record.url?scp=84906559156&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906559156&partnerID=8YFLogxK
U2 - 10.1109/EIT.2014.6871824
DO - 10.1109/EIT.2014.6871824
M3 - Conference contribution
AN - SCOPUS:84906559156
SN - 9781479947744
T3 - IEEE International Conference on Electro Information Technology
SP - 556
EP - 561
BT - 2014 IEEE International Conference on Electro/Information Technology, EIT 2014
PB - IEEE Computer Society
T2 - 2014 IEEE International Conference on Electro/Information Technology, EIT 2014
Y2 - 5 June 2014 through 7 June 2014
ER -