We propose a new algorithm for learning the model parameters of a partially observable Markov decision process (POMDP) based on coupled canonical polyadic decomposition (CPD). Coupled CPD for a set of tensors extends CPD for individual tensors: it has improved identifiability properties and admits an analogous simultaneous diagonalization (SD) algorithm that recovers the latent factors uniquely and efficiently. We explain how to form a set of three-way tensors from the trajectory of a POMDP under a stationary memoryless policy, so that coupled CPD can then be applied to recover the model parameters, with identifiability and computational guarantees.
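The abstract does not say how the three-way tensors are built. As a rough illustration only, the sketch below simulates a small random POMDP under a stationary memoryless policy (the action depends only on the current observation). It then forms one empirical three-way tensor per action from consecutive observation triples. This per-action construction, the function names `simulate_pomdp` and `empirical_tensors`, and all model sizes are assumptions made for the example, not details taken from the paper. The coupled CPD step itself is not shown.

```python
import numpy as np

def simulate_pomdp(T, O, pi, n_steps, rng):
    """Simulate a POMDP trajectory under a stationary memoryless policy.

    T[a, s, s2] : transition probabilities P(s2 | s, a)
    O[s, o]     : observation probabilities P(o | s)
    pi[o, a]    : policy P(a | o), a function of the current observation only
    """
    n_actions, n_states, _ = T.shape
    n_obs = O.shape[1]
    s = rng.integers(n_states)  # arbitrary initial hidden state
    obs = np.empty(n_steps, dtype=int)
    acts = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        obs[t] = rng.choice(n_obs, p=O[s])
        acts[t] = rng.choice(n_actions, p=pi[obs[t]])
        s = rng.choice(n_states, p=T[acts[t], s])
    return obs, acts

def empirical_tensors(obs, acts, n_obs, n_actions):
    """Assumed construction: for each action a, the empirical joint
    distribution of (o_{t-1}, o_t, o_{t+1}) over time steps with a_t = a."""
    X = np.zeros((n_actions, n_obs, n_obs, n_obs))
    # Accumulate counts of observation triples, indexed by the middle action.
    np.add.at(X, (acts[1:-1], obs[:-2], obs[1:-1], obs[2:]), 1.0)
    counts = X.sum(axis=(1, 2, 3), keepdims=True)
    return X / np.maximum(counts, 1.0)  # avoid division by zero for unseen actions

# Hypothetical model sizes, chosen only for the demonstration.
n_states, n_obs, n_actions = 3, 4, 2
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
O = rng.dirichlet(np.ones(n_obs), size=n_states)
pi = rng.dirichlet(np.ones(n_actions), size=n_obs)

obs, acts = simulate_pomdp(T, O, pi, n_steps=2000, rng=rng)
X = empirical_tensors(obs, acts, n_obs, n_actions)
print(X.shape)  # one n_obs x n_obs x n_obs tensor per action
```

Because every tensor in the set comes from the same trajectory and the same observation model, tensors of this kind are natural candidates for a joint (coupled) decomposition. Whether this matches the paper's construction is not established by the abstract.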
|Original language||English (US)|
|Title of host publication||2019 IEEE Data Science Workshop, DSW 2019 - Proceedings|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||5|
|State||Published - Jun 2019|
|Event||2019 IEEE Data Science Workshop, DSW 2019 - Minneapolis, United States; Duration: Jun 2 2019 → Jun 5 2019|
Bibliographical note
Funding Information:
M. Hong is supported by NSF grant CMMI-1727757 and AFOSR grant 15RT0767.
© 2019 IEEE.
Keywords
- coupled CPD
- partially observable Markov decision process
- reinforcement learning
- tensor decomposition