Provably efficient neural GTD algorithm for off-policy learning

Hoi To Wai; Zhuoran Yang; Zhaoran Wang; Mingyi Hong

Electrical and Computer Engineering

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using m neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as m → ∞.
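
As a rough illustration of the min-max recasting mentioned in the abstract, the sketch below applies the standard identity x^2 = max_y (2xy - y^2) to a squared Bellman error. The notation (V_\theta, g_\omega, \rho, \delta_\theta) is assumed here for illustration and may differ from the paper's own formulation, scaling, and constraints.

```latex
% Sketch only; the symbols below are assumptions, not the paper's notation.
% V_\theta: primal NN value estimate; g_\omega: dual NN;
% \rho(s,a): importance ratio between target and behavior policies;
% \delta_\theta(s,a,s') = r(s,a) + \gamma V_\theta(s') - V_\theta(s): TD error.
\begin{align*}
\mathrm{MSBE}(\theta)
  &= \mathbb{E}_{s}\Big[\big(\mathbb{E}[\rho(s,a)\,\delta_\theta(s,a,s') \mid s]\big)^2\Big] \\
  &= \max_{g}\ \mathbb{E}\big[\,2\,\rho(s,a)\,\delta_\theta(s,a,s')\,g(s) - g(s)^2\,\big],
\end{align*}
% using x^2 = \max_y (2xy - y^2), attained at y = x, applied for each state s,
% with the maximum taken over all square-integrable functions g of the state.
% Restricting g to a dual NN g_\omega gives the min-max problem
\[
\min_{\theta}\ \max_{\omega}\ \mathbb{E}\big[\,2\,\rho(s,a)\,\delta_\theta(s,a,s')\,g_\omega(s) - g_\omega(s)^2\,\big].
\]
```

The inner expectation no longer appears inside a square, so both the primal and dual parameters can be updated from single sampled transitions, which is what makes a stochastic gradient descent-ascent scheme of the GTD type applicable.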

Original language	English (US)
Journal	Advances in Neural Information Processing Systems
Volume	2020-December
State	Published - 2020
Event	34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration	Dec 6 2020 → Dec 12 2020

Bibliographical note

Funding Information:
Acknowledgement & Funding Disclosure: The authors would like to thank Mr. Alan Lun (CUHK) for conducting the preliminary numerical experiments in this paper. H.-T. Wai is supported by the CUHK Direct Grant #4055113. M. Hong is supported in part by NSF under Grants CCF-1651825, CMMI-172775, CIF-1910385 and by AFOSR under grant 19RT0424.

Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.

Cite this

@article{85c4d422cc044bfaaec0e09048861b1c,

title = "Provably efficient neural GTD algorithm for off-policy learning",

abstract = "This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using $m$ neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as $m \to \infty$.",

author = "Wai, {Hoi To} and Zhuoran Yang and Zhaoran Wang and Mingyi Hong",

note = "Funding Information: Acknowledgement \& Funding Disclosure: The authors would like to thank Mr. Alan Lun (CUHK) for conducting the preliminary numerical experiments in this paper. H.-T. Wai is supported by the CUHK Direct Grant \#4055113. M. Hong is supported in part by NSF under Grants CCF-1651825, CMMI-172775, CIF-1910385 and by AFOSR under grant 19RT0424. Publisher Copyright: {\textcopyright} 2020 Neural information processing systems foundation. All rights reserved.; 34th Conference on Neural Information Processing Systems, NeurIPS 2020 ; Conference date: 06-12-2020 Through 12-12-2020",

year = "2020",

language = "English (US)",

volume = "2020-December",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

}

TY - JOUR

T1 - Provably efficient neural GTD algorithm for off-policy learning

AU - Wai, Hoi To

AU - Yang, Zhuoran

AU - Wang, Zhaoran

AU - Hong, Mingyi

N1 - Funding Information: Acknowledgement & Funding Disclosure: The authors would like to thank Mr. Alan Lun (CUHK) for conducting the preliminary numerical experiments in this paper. H.-T. Wai is supported by the CUHK Direct Grant #4055113. M. Hong is supported in part by NSF under Grants CCF-1651825, CMMI-172775, CIF-1910385 and by AFOSR under grant 19RT0424. Publisher Copyright: © 2020 Neural information processing systems foundation. All rights reserved.

PY - 2020

Y1 - 2020

N2 - This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using m neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as m → ∞.

AB - This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using m neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as m → ∞.

UR - http://www.scopus.com/inward/record.url?scp=85108419441&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85108419441&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85108419441

SN - 1049-5258

VL - 2020-December

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Y2 - 6 December 2020 through 12 December 2020

ER -
