Q-Learning for robust satisfaction of signal temporal logic specifications

Derya Aksaray; Austin Jones; Zhaodan Kong; Mac Schwager; Calin Belta

doi:10.1109/CDC.2016.7799279


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

109 Scopus citations

Abstract

This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems in which the desired STL specification is enforced by maximizing either the probability of satisfaction or the expected robustness degree, a measure quantifying the quality of satisfaction. We argue that Q-learning is not directly applicable to these problems because, under the quantitative semantics of STL, neither the probability of satisfaction nor the expected robustness degree has the standard Q-learning objective form, i.e., an expected (discounted) sum of per-step rewards. To resolve this issue, we propose an approximation of the STL synthesis problems that can be solved via Q-learning, and we derive performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.
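
A minimal Python sketch of why STL robustness does not fit Q-learning directly, and how a smooth surrogate can make it fit. Every setting here is our assumption, not the paper's: a hypothetical 1-D corridor of cells 0 to 4 whose cell index is the signal x_t, the single specification F_[0,T](x >= 4), a slip probability SLIP, and a log-sum-exp smoothing of the max with sharpness BETA. The exact robustness of this "eventually" formula is max_t (x_t - 4), which is not a sum over time steps. Since (1/BETA) log sum_t exp(BETA r_t) over-approximates max_t r_t by at most log(n)/BETA, maximizing the additive reward exp(BETA (x_t - 4)) approximately maximizes robustness and matches the standard Q-learning update. The abstract does not state which approximation the authors use, so treat this as an illustrative stand-in, not their construction; all names (step, reward, q_learning, and so on) are hypothetical.

```python
import math
import random

# Hypothetical toy setup (NOT from the paper): a 1-D corridor of cells 0..4 whose
# cell index is the signal x_t, and the spec F_[0,T](x >= GOAL).
N_CELLS = 5
GOAL = 4
T = 6               # finite horizon of the temporal operator
ACTIONS = (-1, 1)   # move left / move right
SLIP = 0.1          # assumed chance that the chosen move is reversed
BETA = 2.0          # assumed log-sum-exp sharpness; larger means closer to max


def robustness_eventually(trace, threshold):
    """Exact STL robustness of F(x >= threshold) on a finite trace: max_t (x_t - threshold)."""
    return max(x - threshold for x in trace)


def lse_robustness(trace, threshold, beta=BETA):
    """Smooth log-sum-exp surrogate; over-approximates the max by at most log(len)/beta."""
    return math.log(sum(math.exp(beta * (x - threshold)) for x in trace)) / beta


def step(x, a, rng):
    """Stochastic dynamics, used only as a simulator; the learner never reads SLIP."""
    if rng.random() < SLIP:
        a = -a
    return min(max(x + a, 0), N_CELLS - 1)


def reward(x):
    """Additive surrogate reward exp(beta * (x_t - GOAL)).

    Maximizing the sum of these terms maximizes the argument of the log in
    lse_robustness, so it approximately maximizes the non-additive robustness.
    """
    return math.exp(BETA * (x - GOAL))


def q_learning(episodes=20000, alpha=0.1, eps=0.2, seed=0):
    """Tabular, finite-horizon Q-learning; the Q-table is indexed by (t, x, a)."""
    rng = random.Random(seed)
    Q = {(t, x, a): 0.0 for t in range(T) for x in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        x = 0
        for t in range(T):
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(t, x, b)])
            x_next = step(x, a, rng)
            # No bootstrap term after the last step of the horizon.
            future = 0.0 if t == T - 1 else max(Q[(t + 1, x_next, b)] for b in ACTIONS)
            Q[(t, x, a)] += alpha * (reward(x_next) + future - Q[(t, x, a)])
            x = x_next
    return Q


def greedy_trace(Q, rng):
    """Roll out the greedy policy once and return the visited signal values."""
    x, trace = 0, [0]
    for t in range(T):
        a = max(ACTIONS, key=lambda b: Q[(t, x, b)])
        x = step(x, a, rng)
        trace.append(x)
    return trace


if __name__ == "__main__":
    Q = q_learning()
    rng = random.Random(1)
    traces = [greedy_trace(Q, rng) for _ in range(1000)]
    rhos = [robustness_eventually(tr, GOAL) for tr in traces]
    p_sat = sum(r >= 0 for r in rhos) / len(rhos)
    print(f"P(satisfaction) ~ {p_sat:.2f}, mean robustness ~ {sum(rhos) / len(rhos):.2f}")
```

Because the horizon is finite, the Q-table is indexed by time as well as by cell. Specifications with nested temporal operators would need a richer state (for example, a window of recent states), which this sketch does not attempt.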

Original language	English (US)
Title of host publication	2016 IEEE 55th Conference on Decision and Control, CDC 2016
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6565-6570
Number of pages	6
ISBN (Electronic)	9781509018376
DOIs	https://doi.org/10.1109/CDC.2016.7799279
State	Published - Dec 27 2016
Externally published	Yes
Event	55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States. Duration: Dec 12 2016 → Dec 14 2016

Publication series

Name	2016 IEEE 55th Conference on Decision and Control, CDC 2016

Other

Other	55th IEEE Conference on Decision and Control, CDC 2016
Country/Territory	United States
City	Las Vegas
Period	12/12/16 → 12/14/16

Bibliographical note

Publisher Copyright:
© 2016 IEEE.

Cite this

Aksaray, D., Jones, A., Kong, Z., Schwager, M., & Belta, C. (2016). Q-Learning for robust satisfaction of signal temporal logic specifications. In 2016 IEEE 55th Conference on Decision and Control, CDC 2016 (pp. 6565-6570). Article 7799279 (2016 IEEE 55th Conference on Decision and Control, CDC 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CDC.2016.7799279

Q-Learning for robust satisfaction of signal temporal logic specifications. / Aksaray, Derya; Jones, Austin; Kong, Zhaodan et al.
2016 IEEE 55th Conference on Decision and Control, CDC 2016. Institute of Electrical and Electronics Engineers Inc., 2016. p. 6565-6570, 7799279 (2016 IEEE 55th Conference on Decision and Control, CDC 2016).


Aksaray, D, Jones, A, Kong, Z, Schwager, M & Belta, C 2016, Q-Learning for robust satisfaction of signal temporal logic specifications. in 2016 IEEE 55th Conference on Decision and Control, CDC 2016, 7799279, 2016 IEEE 55th Conference on Decision and Control, CDC 2016, Institute of Electrical and Electronics Engineers Inc., pp. 6565-6570, 55th IEEE Conference on Decision and Control, CDC 2016, Las Vegas, United States, 12/12/16. https://doi.org/10.1109/CDC.2016.7799279

@inproceedings{354f23f88eca4b18893fb99e3fbc237b,
  title = "{Q}-Learning for robust satisfaction of signal temporal logic specifications",
  abstract = "This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. We discuss that Q-learning is not directly applicable to these problems because, based on the quantitative semantics of STL, the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of STL synthesis problems that can be solved via Q-learning, and we derive some performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.",
  author = "Derya Aksaray and Austin Jones and Zhaodan Kong and Mac Schwager and Calin Belta",
  note = "Publisher Copyright: {\textcopyright} 2016 IEEE.; 55th IEEE Conference on Decision and Control, CDC 2016 ; Conference date: 12-12-2016 Through 14-12-2016",
  year = "2016",
  month = dec,
  day = "27",
  doi = "10.1109/CDC.2016.7799279",
  language = "English (US)",
  series = "2016 IEEE 55th Conference on Decision and Control, CDC 2016",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "6565--6570",
  booktitle = "2016 IEEE 55th Conference on Decision and Control, CDC 2016",
}

TY  - GEN
T1  - Q-Learning for robust satisfaction of signal temporal logic specifications
AU  - Aksaray, Derya
AU  - Jones, Austin
AU  - Kong, Zhaodan
AU  - Schwager, Mac
AU  - Belta, Calin
PY  - 2016/12/27
Y1  - 2016/12/27
N2  - This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. We discuss that Q-learning is not directly applicable to these problems because, based on the quantitative semantics of STL, the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of STL synthesis problems that can be solved via Q-learning, and we derive some performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.
AB  - This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. We discuss that Q-learning is not directly applicable to these problems because, based on the quantitative semantics of STL, the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of STL synthesis problems that can be solved via Q-learning, and we derive some performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.
UR  - http://www.scopus.com/inward/record.url?scp=85010739515&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85010739515&partnerID=8YFLogxK
U2  - 10.1109/CDC.2016.7799279
DO  - 10.1109/CDC.2016.7799279
M3  - Conference contribution
AN  - SCOPUS:85010739515
T3  - 2016 IEEE 55th Conference on Decision and Control, CDC 2016
SP  - 6565
EP  - 6570
BT  - 2016 IEEE 55th Conference on Decision and Control, CDC 2016
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 55th IEEE Conference on Decision and Control, CDC 2016
Y2  - 12 December 2016 through 14 December 2016
ER  - 
