Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards

Sakshi Arya; Yuhong Yang

doi:10.1016/j.spl.2020.108818

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some reasonable assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.

Original language	English (US)
Article number	108818
Journal	Statistics and Probability Letters
Volume	164
DOIs	https://doi.org/10.1016/j.spl.2020.108818
State	Published - Sep 2020

Bibliographical note

Publisher Copyright:
© 2020

Keywords

Delayed rewards
Histogram method
Multi-armed bandit with covariates
Strong consistency

Cite this

@article{a42f1211fa69486c870711cb8df3e6e7,
  title = "Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards",
  abstract = "We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some reasonable assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.",
  keywords = "Delayed rewards, Histogram method, Multi-armed bandit with covariates, Strong consistency",
  author = "Sakshi Arya and Yuhong Yang",
  note = "Publisher Copyright: {\textcopyright} 2020",
  year = "2020",
  month = sep,
  doi = "10.1016/j.spl.2020.108818",
  language = "English (US)",
  volume = "164",
  journal = "Statistics and Probability Letters",
  issn = "0167-7152",
  publisher = "Elsevier",
}

TY - JOUR
T1 - Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
AU - Arya, Sakshi
AU - Yang, Yuhong
PY - 2020/9
Y1 - 2020/9
N2 - We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some reasonable assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.
AB - We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some reasonable assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.
KW - Delayed rewards
KW - Histogram method
KW - Multi-armed bandit with covariates
KW - Strong consistency
UR - http://www.scopus.com/inward/record.url?scp=85085141251&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085141251&partnerID=8YFLogxK
U2 - 10.1016/j.spl.2020.108818
DO - 10.1016/j.spl.2020.108818
M3 - Article
AN - SCOPUS:85085141251
SN - 0167-7152
VL - 164
JO - Statistics and Probability Letters
JF - Statistics and Probability Letters
M1 - 108818
ER -
