Alterations in choice behavior by manipulations of world model

C. S. Green; C. Benson; D. Kersten; P. Schrater

doi:10.1073/pnas.1001709107

Psychology (Twin Cities)

Research output: Contribution to journal › Article › peer-review

74 Scopus citations

Abstract

How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching" - a consistent example of suboptimal choice behavior seen in humans - occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.
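
As a rough illustration of why probability matching is described as suboptimal, the sketch below compares the expected hit rate of a matching strategy with that of always choosing the more likely option. It assumes a simple two-option task in which exactly one option pays off on each trial, independently, with fixed probability p; this setup and the function names are illustrative assumptions, not the task or model used in the paper.

```python
def expected_accuracy_matching(p: float) -> float:
    """Expected hit rate when option A is chosen with probability p.

    Assumes option A is the rewarded option with probability p on each
    trial, independently of the choice (hypothetical two-option task).
    """
    # Correct either by choosing A when A pays off, or B when B pays off.
    return p * p + (1.0 - p) * (1.0 - p)


def expected_accuracy_maximizing(p: float) -> float:
    """Expected hit rate when the more likely option is always chosen."""
    return max(p, 1.0 - p)


if __name__ == "__main__":
    p = 0.7  # illustrative reward probability, not a value from the paper
    print(f"matching:   {expected_accuracy_matching(p):.2f}")    # 0.58
    print(f"maximizing: {expected_accuracy_maximizing(p):.2f}")  # 0.70
```

Under these assumptions, matching is never better than maximizing, and the two coincide only when p is 0, 0.5, or 1.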

Original language	English (US)
Pages (from-to)	16401-16406
Number of pages	6
Journal	Proceedings of the National Academy of Sciences of the United States of America
Volume	107
Issue number	37
DOIs	https://doi.org/10.1073/pnas.1001709107
State	Published - Sep 14 2010

Keywords

Decision making
Probability matching
Reinforcement learning

Cite this

@article{1bb4be35404242a39ced921baec0795c,

title = "Alterations in choice behavior by manipulations of world model",

abstract = "How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) {"}probability matching{"} - a consistent example of suboptimal choice behavior seen in humans - occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.",

keywords = "Decision making, Probability matching, Reinforcement learning",

author = "Green, {C. S.} and C. Benson and D. Kersten and P. Schrater",

year = "2010",

month = sep,

day = "14",

doi = "10.1073/pnas.1001709107",

language = "English (US)",

volume = "107",

pages = "16401--16406",

journal = "Proceedings of the National Academy of Sciences of the United States of America",

issn = "0027-8424",

publisher = "National Academy of Sciences",

number = "37",

}

TY - JOUR

T1 - Alterations in choice behavior by manipulations of world model

AU - Green, C. S.

AU - Benson, C.

AU - Kersten, D.

AU - Schrater, P.

PY - 2010/9/14

Y1 - 2010/9/14

N2 - How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching" - a consistent example of suboptimal choice behavior seen in humans - occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.

AB - How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching" - a consistent example of suboptimal choice behavior seen in humans - occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.

KW - Decision making

KW - Probability matching

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=77958005364&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958005364&partnerID=8YFLogxK

U2 - 10.1073/pnas.1001709107

DO - 10.1073/pnas.1001709107

M3 - Article

C2 - 20805507

AN - SCOPUS:77958005364

SN - 0027-8424

VL - 107

SP - 16401

EP - 16406

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

IS - 37

ER -
