DLion: Decentralized distributed deep learning in micro-clouds

Rankyung Hong; Abhishek Chandra

Computer Science and Engineering

Research output: Contribution to conference › Paper › peer-review

13 Scopus citations

Abstract

Deep learning is a popular technique for building models from large quantities of input data for applications in many domains. With the proliferation of edge devices such as sensors and mobile devices, large volumes of data are generated at a rapid pace all over the world. Migrating large amounts of data into centralized data centers over WAN environments is often infeasible for cost, performance, or privacy reasons. Moreover, there is an increasing need for incremental or online deep learning over newly generated data in real time. These trends require rethinking the traditional training approach to deep learning. To handle the computation on distributed input data, micro-clouds (small-scale clouds deployed near edge devices in many different locations) provide an attractive alternative for data locality reasons. However, existing distributed deep learning systems do not support training in micro-clouds, due to the unique characteristics and challenges of this environment. In this paper, we examine the key challenges of deep learning in micro-clouds: computation and network resource heterogeneity at the inter- and intra-micro-cloud levels, and their scale. We present DLion, a decentralized distributed deep learning system for such environments. It employs techniques specifically designed to address the above challenges to reduce training time, enhance model accuracy, and provide system scalability. We have implemented a prototype of DLion in TensorFlow, and our preliminary experiments show promising results towards achieving accurate and efficient distributed deep learning in micro-clouds.

Original language	English (US)
State	Published - 2019
Event	11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019 - Renton, United States. Duration: Jul 8 2019 → …

Conference

Conference	11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019
Country/Territory	United States
City	Renton
Period	7/8/19 → …

Cite this

@conference{e2ede30788af4079aa3add996eaebc17,

title = "DLion: Decentralized distributed deep learning in micro-clouds",

abstract = "Deep learning is a popular technique for building models from large quantities of input data for applications in many domains. With the proliferation of edge devices such as sensor and mobile devices, large volumes of data are generated at rapid pace all over the world. Migrating large amounts of data into centralized data center(s) over WAN environments is often infeasible due to cost, performance or privacy reasons. Moreover, there is an increasing need for incremental or online deep learning over newly generated data in real-time. These trends require rethinking of the traditional training approach to deep learning. To handle the computation on distributed input data, micro-clouds—small-scale clouds deployed near edge devices in many different locations—provide an attractive alternative for data locality reasons. However, existing distributed deep learning systems do not support training in micro-clouds, due to the unique characteristics and challenges in this environment. In this paper, we examine the key challenges of deep learning in micro-clouds: computation and network resource heterogeneity at inter- and intra micro-cloud levels and their scale. We present DLion, a decentralized distributed deep learning system for such environments. It employs techniques specifically designed to address the above challenges to reduce training time, enhance model accuracy, and provide system scalability. We have implemented a prototype of DLion in TensorFlow and our preliminary experiments show promising results towards achieving accurate and efficient distributed deep learning in micro-clouds.",

author = "Rankyung Hong and Abhishek Chandra",

year = "2019",

language = "English (US)",

note = "11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019 ; Conference date: 08-07-2019",

}

TY - CONF

T1 - DLion

T2 - 11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019

AU - Hong, Rankyung

AU - Chandra, Abhishek

PY - 2019

Y1 - 2019

N2 - Deep learning is a popular technique for building models from large quantities of input data for applications in many domains. With the proliferation of edge devices such as sensor and mobile devices, large volumes of data are generated at rapid pace all over the world. Migrating large amounts of data into centralized data center(s) over WAN environments is often infeasible due to cost, performance or privacy reasons. Moreover, there is an increasing need for incremental or online deep learning over newly generated data in real-time. These trends require rethinking of the traditional training approach to deep learning. To handle the computation on distributed input data, micro-clouds—small-scale clouds deployed near edge devices in many different locations—provide an attractive alternative for data locality reasons. However, existing distributed deep learning systems do not support training in micro-clouds, due to the unique characteristics and challenges in this environment. In this paper, we examine the key challenges of deep learning in micro-clouds: computation and network resource heterogeneity at inter- and intra micro-cloud levels and their scale. We present DLion, a decentralized distributed deep learning system for such environments. It employs techniques specifically designed to address the above challenges to reduce training time, enhance model accuracy, and provide system scalability. We have implemented a prototype of DLion in TensorFlow and our preliminary experiments show promising results towards achieving accurate and efficient distributed deep learning in micro-clouds.

UR - http://www.scopus.com/inward/record.url?scp=85083213248&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85083213248&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85083213248

Y2 - 8 July 2019

ER -
