In this paper, we consider a package delivery drone that is also expected to perform aerial monitoring as a secondary mission. To integrate this secondary mission, we use a reward function that represents the value of the information gathered through aerial monitoring. We define the pickup and delivery tasks with time window temporal logic (TWTL) specifications and use reinforcement learning (RL) to maximize the expected sum of rewards. The high-level decision-making of the drone is modeled as a Markov decision process (MDP). We extend previous work in which a model-free RL algorithm was used to solve this optimization problem, and we propose a modified Dyna-Q algorithm to address the shortage of online samples. We provide extensive simulation results that compare the performance of the model-free and hybrid RL algorithms in this application and investigate the effect of different system parameters on overall performance.
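For readers unfamiliar with the hybrid approach named in the abstract, the following is a minimal sketch of textbook tabular Dyna-Q, which interleaves model-free Q-learning updates with planning updates replayed from a learned model. It is not the modified algorithm proposed in the paper and it does not encode TWTL specifications. The function names `dyna_q` and `chain_step`, the five-state chain environment, and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def dyna_q(step, actions, start_state, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=100, seed=0):
    """Textbook tabular Dyna-Q (illustrative; not the paper's modified variant)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # q[(state, action)] -> estimated return
    model = {}               # (state, action) -> (reward, next_state, done)

    def greedy(state):
        # Break ties at random so the untrained agent still explores.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # direct RL from the real sample
            model[(s, a)] = (r, s2, done)    # remember the observed transition
            for _ in range(planning_steps):  # planning from simulated samples
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
            if done:
                break
    return q


def chain_step(state, action):
    """Hypothetical toy MDP: states 0..4 on a line, reward 1 on reaching state 4."""
    nxt = min(max(state + action, 0), 4)
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done


q = dyna_q(chain_step, actions=[-1, 1], start_state=0)
policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
```

Each real transition triggers `planning_steps` extra backups drawn from the stored model, which is how Dyna-style methods extract more learning from a limited number of online samples. Setting `planning_steps=0` reduces the sketch to plain model-free Q-learning.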
|Original language||English (US)|
|Title of host publication||2021 International Conference on Unmanned Aircraft Systems, ICUAS 2021|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||10|
|State||Published - Jun 15 2021|
|Event||2021 International Conference on Unmanned Aircraft Systems, ICUAS 2021 - Athens, Greece|
Duration: Jun 15 2021 → Jun 18 2021
Bibliographical note
Publisher Copyright: © 2021 IEEE.