Q-Learning for robust satisfaction of signal temporal logic specifications

Derya Aksaray, Austin Jones, Zhaodan Kong, Mac Schwager, Calin Belta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

109 Scopus citations

Abstract

This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. We discuss that Q-learning is not directly applicable to these problems because, based on the quantitative semantics of STL, the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of STL synthesis problems that can be solved via Q-learning, and we derive some performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.
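To illustrate why the robustness degree does not fit the usual sum-of-rewards objective, the sketch below computes the robustness of two simple specifications over a finite trajectory and then smooths the resulting max with a log-sum-exp. The abstract does not say which approximation the paper uses, so the log-sum-exp, the function names, the sample signal, and the parameter `beta` are all assumptions made for illustration only.

```python
import math

def robustness_eventually(signal, threshold):
    """Robustness of 'eventually signal > threshold': the largest margin
    reached anywhere on the trajectory (a max, not a sum of rewards)."""
    return max(x - threshold for x in signal)

def robustness_always(signal, threshold):
    """Robustness of 'always signal > threshold': the smallest margin
    seen along the trajectory (a min, not a sum of rewards)."""
    return min(x - threshold for x in signal)

def soft_max(values, beta):
    """Log-sum-exp approximation of max(values) (assumed for illustration).
    For n values it satisfies max <= soft_max <= max + log(n) / beta."""
    m = max(values)  # subtract the max before exponentiating for stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

# Hypothetical trajectory of a scalar signal and a hypothetical threshold.
signal = [0.2, 0.9, 1.4, 0.7]
threshold = 1.0
margins = [x - threshold for x in signal]

exact = robustness_eventually(signal, threshold)
approx = soft_max(margins, beta=20.0)
worst = robustness_always(signal, threshold)
```

Because the log-sum-exp is built from a sum of per-step terms, it is the kind of quantity that can be accumulated step by step, and a larger `beta` tightens the gap to the exact max at the cost of a more sharply peaked objective.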

Original language: English (US)
Title of host publication: 2016 IEEE 55th Conference on Decision and Control, CDC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6565-6570
Number of pages: 6
ISBN (Electronic): 9781509018376
State: Published - Dec 27 2016
Externally published: Yes
Event: 55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States
Duration: Dec 12 2016 to Dec 14 2016

Bibliographical note

Publisher Copyright:
© 2016 IEEE.
