Efficient learning in linearly solvable MDP models

Ang Li; Paul R Schrater

Psychology (Twin Cities)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Linearly solvable Markov Decision Process (MDP) models are a powerful subclass of problems with a simple structure that allows the policy to be written directly in terms of the uncontrolled (passive) dynamics of the environment and the goals of the agent. However, there have been no learning algorithms for this class of models. In this research, we develop a robust learning approach to linearly solvable MDPs. To exploit the simple solution for general problems, we show how to construct passive dynamics from any transition matrix, use Bayesian updating to estimate the model parameters, and apply approximate and efficient Bayesian exploration to speed learning. In addition, we reduce the computational cost of learning using intermittent Bayesian updating and policy solving. We also give a polynomial theoretical time complexity bound for the convergence of our learning algorithm, and demonstrate a linear bound for the subclass of reinforcement learning problems in which the transition error depends only on the agent itself. We present test results for our algorithm in a grid world, comparing it with the BEB algorithm. The results show that our algorithm learns more than the BEB algorithm without losing convergence speed, and that its advantage increases as the environment becomes more complex. We also show that our algorithm's performance is more stable after convergence. Finally, we show how to apply our approach to the Cellular Telephones problem by defining the passive dynamics.
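
To illustrate what "written directly in terms of the passive dynamics" can look like, here is a minimal Python sketch. It assumes the standard first-exit formulation of linearly solvable MDPs, in which the desirability z = exp(-v) satisfies a linear equation and the optimal policy reweights the passive dynamics by z. The function names, the five-state chain world, the cost values, and the symmetric Dirichlet prior are all hypothetical choices for illustration. None of them is taken from the paper, and this is not the authors' learning algorithm.

```python
import math

def solve_lmdp(passive, cost, terminal, iters=1000, tol=1e-12):
    """Iterate z(x) = exp(-q(x)) * sum_y p(y|x) z(y) for non-terminal states.

    Assumed first-exit setting: terminal states keep z(x) = exp(-q(x)).
    """
    n = len(passive)
    z = [math.exp(-cost[x]) if x in terminal else 1.0 for x in range(n)]
    for _ in range(iters):
        new_z = list(z)
        for x in range(n):
            if x in terminal:
                continue
            expected = sum(p * z[y] for y, p in enumerate(passive[x]))
            new_z[x] = math.exp(-cost[x]) * expected
        delta = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if delta < tol:
            break
    return z

def optimal_policy(passive, z):
    """Optimal transitions: u*(y|x) proportional to p(y|x) * z(y)."""
    policy = []
    for row in passive:
        weights = [p * zy for p, zy in zip(row, z)]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy

def posterior_mean(counts, prior=1.0):
    """Posterior mean of one transition row under a symmetric Dirichlet prior
    (an assumed, generic form of Bayesian updating from observed counts)."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]

# Hypothetical chain world: states 0..4, state 4 is the goal (terminal).
# Passive dynamics are an unbiased random walk, reflecting at state 0.
P = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
q = [0.2, 0.2, 0.2, 0.2, 0.0]  # per-step state costs, zero at the goal

z = solve_lmdp(P, q, terminal={4})
pi = optimal_policy(P, z)
for x, row in enumerate(pi):
    print(x, [round(p, 3) for p in row])
```

In this toy setting the controlled transitions shift probability toward the goal state while staying within the support of the passive dynamics. In a learning loop, `posterior_mean` could supply the estimated passive dynamics from observed transition counts before each call to `solve_lmdp`.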

Original language	English (US)
Title of host publication	IJCAI 2013 - Proceedings of the 23rd International Joint Conference on Artificial Intelligence
Pages	248-253
Number of pages	6
State	Published - Dec 1 2013
Event	23rd International Joint Conference on Artificial Intelligence, IJCAI 2013 - Beijing, China; Duration: Aug 3 2013 → Aug 9 2013

Publication series

Name	IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)	1045-0823

Other

Other	23rd International Joint Conference on Artificial Intelligence, IJCAI 2013
Country/Territory	China
City	Beijing
Period	8/3/13 → 8/9/13

Cite this

@inproceedings{1ba3f808e85d4248a41c7d655c03b388,

title = "Efficient learning in linearly solvable MDP models",

abstract = "Linearly solvable Markov Decision Process (MDP) models are a powerful subclass of problems with a simple structure that allows the policy to be written directly in terms of the uncontrolled (passive) dynamics of the environment and the goals of the agent. However, there have been no learning algorithms for this class of models. In this research, we develop a robust learning approach to linearly solvable MDPs. To exploit the simple solution for general problems, we show how to construct passive dynamics from any transition matrix, use Bayesian updating to estimate the model parameters, and apply approximate and efficient Bayesian exploration to speed learning. In addition, we reduce the computational cost of learning using intermittent Bayesian updating and policy solving. We also give a polynomial theoretical time complexity bound for the convergence of our learning algorithm, and demonstrate a linear bound for the subclass of reinforcement learning problems in which the transition error depends only on the agent itself. We present test results for our algorithm in a grid world, comparing it with the BEB algorithm. The results show that our algorithm learns more than the BEB algorithm without losing convergence speed, and that its advantage increases as the environment becomes more complex. We also show that our algorithm's performance is more stable after convergence. Finally, we show how to apply our approach to the Cellular Telephones problem by defining the passive dynamics.",

author = "Li, Ang and Schrater, {Paul R}",

year = "2013",

month = dec,

day = "1",

language = "English (US)",

isbn = "9781577356332",

series = "IJCAI International Joint Conference on Artificial Intelligence",

pages = "248--253",

booktitle = "IJCAI 2013 - Proceedings of the 23rd International Joint Conference on Artificial Intelligence",

note = "23rd International Joint Conference on Artificial Intelligence, IJCAI 2013 ; Conference date: 3 August 2013 through 9 August 2013",

}

TY - GEN

T1 - Efficient learning in linearly solvable MDP models

AU - Li, Ang

AU - Schrater, Paul R

PY - 2013/12/1

Y1 - 2013/12/1

N2 - Linearly solvable Markov Decision Process (MDP) models are a powerful subclass of problems with a simple structure that allows the policy to be written directly in terms of the uncontrolled (passive) dynamics of the environment and the goals of the agent. However, there have been no learning algorithms for this class of models. In this research, we develop a robust learning approach to linearly solvable MDPs. To exploit the simple solution for general problems, we show how to construct passive dynamics from any transition matrix, use Bayesian updating to estimate the model parameters, and apply approximate and efficient Bayesian exploration to speed learning. In addition, we reduce the computational cost of learning using intermittent Bayesian updating and policy solving. We also give a polynomial theoretical time complexity bound for the convergence of our learning algorithm, and demonstrate a linear bound for the subclass of reinforcement learning problems in which the transition error depends only on the agent itself. We present test results for our algorithm in a grid world, comparing it with the BEB algorithm. The results show that our algorithm learns more than the BEB algorithm without losing convergence speed, and that its advantage increases as the environment becomes more complex. We also show that our algorithm's performance is more stable after convergence. Finally, we show how to apply our approach to the Cellular Telephones problem by defining the passive dynamics.

UR - http://www.scopus.com/inward/record.url?scp=84896062012&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84896062012&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84896062012

SN - 9781577356332

T3 - IJCAI International Joint Conference on Artificial Intelligence

SP - 248

EP - 253

BT - IJCAI 2013 - Proceedings of the 23rd International Joint Conference on Artificial Intelligence

T2 - 23rd International Joint Conference on Artificial Intelligence, IJCAI 2013

Y2 - 3 August 2013 through 9 August 2013

ER -
