TY - JOUR
T1 - Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities
AU - Sadeghi, Alireza
AU - Sheikholeslami, Fatemeh
AU - Giannakis, Georgios B.
N1 - Publisher Copyright:
© 2007-2012 IEEE.
PY - 2018/2
Y1 - 2018/2
N2 - Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus, enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
AB - Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus, enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
KW - Caching
KW - Markov decision process (MDP)
KW - Q-learning
KW - dynamic popularity profile
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85040051761&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040051761&partnerID=8YFLogxK
U2 - 10.1109/JSTSP.2017.2787979
DO - 10.1109/JSTSP.2017.2787979
M3 - Article
AN - SCOPUS:85040051761
SN - 1932-4553
VL - 12
SP - 180
EP - 190
JO - IEEE Journal on Selected Topics in Signal Processing
JF - IEEE Journal on Selected Topics in Signal Processing
IS - 1
ER -