TY - GEN
T1 - Natural gradients for state and output feedback control
AU - Lamperski, Andrew
PY - 2016/12/27
Y1 - 2016/12/27
N2 - Policy gradient methods for approximate optimal control and reinforcement learning fix a parameterized form of the controller and then perform gradient descent on the cost-to-go function. In reinforcement learning for stochastic state-feedback problems, it has been shown that the natural gradient of the cost-to-go function can be approximated via samples of the state and step-cost, using no information about the plant model. There, the natural gradient is the gradient with respect to the Riemannian metric defined by the Fisher information matrix of the controller parameters. We give a general method for approximating the natural gradient for nonlinear output-feedback stochastic control problems with dynamic controllers. For linear systems, we give explicit formulas to compute the natural gradient when the plant matrices are known, in both the state-feedback and output-feedback cases.
AB - Policy gradient methods for approximate optimal control and reinforcement learning fix a parameterized form of the controller and then perform gradient descent on the cost-to-go function. In reinforcement learning for stochastic state-feedback problems, it has been shown that the natural gradient of the cost-to-go function can be approximated via samples of the state and step-cost, using no information about the plant model. There, the natural gradient is the gradient with respect to the Riemannian metric defined by the Fisher information matrix of the controller parameters. We give a general method for approximating the natural gradient for nonlinear output-feedback stochastic control problems with dynamic controllers. For linear systems, we give explicit formulas to compute the natural gradient when the plant matrices are known, in both the state-feedback and output-feedback cases.
UR - http://www.scopus.com/inward/record.url?scp=85010815568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85010815568&partnerID=8YFLogxK
U2 - 10.1109/CDC.2016.7798555
DO - 10.1109/CDC.2016.7798555
M3 - Conference contribution
AN - SCOPUS:85010815568
T3 - 2016 IEEE 55th Conference on Decision and Control, CDC 2016
SP - 1984
EP - 1989
BT - 2016 IEEE 55th Conference on Decision and Control, CDC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 55th IEEE Conference on Decision and Control, CDC 2016
Y2 - 12 December 2016 through 14 December 2016
ER -