Small basestations (SBs) equipped with caching units have the potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours and serve them to edge users during peak periods. To prefetch intelligently, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allows for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
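To illustrate the flavor of the approach, the sketch below shows tabular Q-learning applied to a toy single-file caching decision. All numbers and names (popularity levels, transition matrix `P`, hit reward, refresh cost, learning parameters) are illustrative assumptions, not values from the paper; the paper's actual formulation covers multiple files, global/local popularity, and a linear function approximation.

```python
import random

# Toy sketch (hypothetical parameters): a single file's local popularity
# evolves as a Markov chain; the caching agent learns via Q-learning whether
# caching pays off, trading hit rewards against a cache-refresh cost.

POP_LEVELS = 3                      # illustrative popularity levels 0..2
ACTIONS = (0, 1)                    # 0: do not cache, 1: cache (refresh)
ALPHA, GAMMA, EPS = 0.05, 0.9, 0.1  # learning rate, discount, exploration
REFRESH_COST = 0.4                  # assumed cost of fetching over backhaul

# Assumed local popularity Markov chain: P[level][next_level]
P = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

def step(level, action, rng):
    """Environment step: a request arrives with probability growing in the
    popularity level; reward = hit reward (if cached) minus refresh cost."""
    hit = rng.random() < (level + 1) / (POP_LEVELS + 1)
    reward = (1.0 if (hit and action == 1) else 0.0) - REFRESH_COST * action
    nxt = rng.choices(range(POP_LEVELS), weights=P[level])[0]
    return nxt, reward

def train(steps=20000, seed=0):
    """Epsilon-greedy Q-learning over (popularity level, action) pairs."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(POP_LEVELS) for a in ACTIONS}
    s = 0
    for _ in range(steps):
        if rng.random() < EPS:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a, rng)
        # Standard Q-learning update toward the one-step bootstrapped target.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2
    return Q
```

With these assumed rewards, caching is worthwhile only when popularity is high, so the learned greedy policy caches at the top popularity level; the paper's function-approximation variant replaces the table `Q` with a linear model to scale to many files.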
|Original language||English (US)|
|Number of pages||11|
|Journal||IEEE Journal on Selected Topics in Signal Processing|
|State||Published - Feb 2018|
Bibliographical note
Funding Information:
Manuscript received July 14, 2017; revised November 13, 2017; accepted December 11, 2017. Date of publication December 29, 2017; date of current version February 16, 2018. This work was supported by the National Science Foundation under Grants 1423316, 1508993, 1514056, and 1711471. The guest editor coordinating the review of this manuscript and approving it for publication was Prof. Liang Xiao. (Corresponding author: Alireza Sadeghi.) The authors are with the Digital Technology Center and the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: email@example.com; firstname.lastname@example.org; georgios@umn.edu).
Keywords
- Markov decision process (MDP)
- dynamic popularity profile
- reinforcement learning