Bisimulation for Markov decision processes through families of functional expressions

Norm Ferns; Doina Precup; Sophia Knight

doi:10.1007/978-3-319-06880-0_17

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof amounts to a slight modification of previous techniques [2,3] used to prove equivalence with a fixed-point pseudometric on the state-space of a labelled Markov process and making heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on Markov decision processes [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels.
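As an illustration of the integral probability metrics mentioned in the abstract, the following is a minimal sketch for the finite case only. It assumes a finite state set and a finite family of real-valued test functions, so that the supremum over the family becomes a maximum: d_F(P, Q) = max over f in F of |∫f dP − ∫f dQ|. The names `integrate`, `ipm`, the three-state example, and the chosen family are hypothetical and do not come from the paper, which works with continuous state spaces and families of functional expressions.

```python
from typing import Callable, Dict, Sequence

# A finite probability distribution: state name -> probability.
Dist = Dict[str, float]
# A real-valued test function on states.
TestFn = Callable[[str], float]


def integrate(f: TestFn, dist: Dist) -> float:
    """Expectation of f under a finite distribution."""
    return sum(prob * f(state) for state, prob in dist.items())


def ipm(family: Sequence[TestFn], p: Dist, q: Dist) -> float:
    """Integral probability metric over a finite family of test functions.

    Returns the largest gap between the expectations of any f in the
    family under p and under q (a max here, a sup in general).
    """
    return max(abs(integrate(f, p) - integrate(f, q)) for f in family)


# Hypothetical three-state example (not taken from the paper).
p: Dist = {"s0": 0.5, "s1": 0.5, "s2": 0.0}
q: Dist = {"s0": 0.5, "s1": 0.0, "s2": 0.5}

# Hypothetical family of indicator-style test functions.
family = [
    lambda s: 1.0 if s == "s0" else 0.0,          # detects mass on s0
    lambda s: 1.0 if s == "s1" else 0.0,          # detects mass on s1
    lambda s: 1.0 if s in ("s1", "s2") else 0.0,  # cannot tell s1 from s2
]

full_distance = ipm(family, p, q)
coarse_distance = ipm([family[0], family[2]], p, q)
print(full_distance, coarse_distance)
```

With the full family the two distributions are separated, while the coarser family that cannot tell `s1` from `s2` assigns them distance zero. This is why such a quantity is in general only a pseudometric, and why the choice of family determines which states are distinguished.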

Original language	English (US)
Title of host publication	Horizons of the Mind
Subtitle of host publication	A Tribute to Prakash Panangaden - Essays Dedicated to Prakash Panangaden on the Occasion of His 60th Birthday
Publisher	Springer Verlag
Pages	319-342
Number of pages	24
ISBN (Print)	9783319068794
DOIs	https://doi.org/10.1007/978-3-319-06880-0_17
State	Published - 2014
Externally published	Yes
Event	PrakashFest Conference - Oxford, United Kingdom; Duration: May 19 2014 → May 22 2014

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	8464 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	PrakashFest Conference
Country/Territory	United Kingdom
City	Oxford
Period	5/19/14 → 5/22/14

Cite this

Ferns, N., Precup, D., & Knight, S. (2014). Bisimulation for Markov decision processes through families of functional expressions. In Horizons of the Mind: A Tribute to Prakash Panangaden - Essays Dedicated to Prakash Panangaden on the Occasion of His 60th Birthday (pp. 319-342). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8464 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-06880-0_17

@inproceedings{c3e1db73a1ce44c3a43c7bbfd4aa0635,

title = "Bisimulation for Markov decision processes through families of functional expressions",

author = "Norm Ferns and Doina Precup and Sophia Knight",

year = "2014",

doi = "10.1007/978-3-319-06880-0_17",

language = "English (US)",

isbn = "9783319068794",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "319--342",

booktitle = "Horizons of the Mind",

note = "PrakashFest Conference ; Conference date: 19-05-2014 Through 22-05-2014",

}

TY - GEN

T1 - Bisimulation for Markov decision processes through families of functional expressions

AU - Ferns, Norm

AU - Precup, Doina

AU - Knight, Sophia

PY - 2014

Y1 - 2014

UR - http://www.scopus.com/inward/record.url?scp=84902476531&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902476531&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-06880-0_17

DO - 10.1007/978-3-319-06880-0_17

M3 - Conference contribution

AN - SCOPUS:84902476531

SN - 9783319068794

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 319

EP - 342

BT - Horizons of the Mind

PB - Springer Verlag

T2 - PrakashFest Conference

Y2 - 19 May 2014 through 22 May 2014

ER -
