TY - JOUR
T1 - Alterations in choice behavior by manipulations of world model
AU - Green, C. S.
AU - Benson, C.
AU - Kersten, D.
AU - Schrater, P.
PY - 2010/9/14
Y1 - 2010/9/14
N2 - How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching" - a consistent example of suboptimal choice behavior seen in humans - occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.
AB - How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching" - a consistent example of suboptimal choice behavior seen in humans - occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.
KW - Decision making
KW - Probability matching
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=77958005364&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77958005364&partnerID=8YFLogxK
U2 - 10.1073/pnas.1001709107
DO - 10.1073/pnas.1001709107
M3 - Article
C2 - 20805507
AN - SCOPUS:77958005364
SN - 0027-8424
VL - 107
SP - 16401
EP - 16406
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 37
ER -