Table of Contents

Chapter 2: Evaluative Feedback
The n-Armed Bandit Problem
The Exploration/Exploitation Dilemma
Action-Value Methods
ε-Greedy Action Selection
10-Armed Testbed
ε-Greedy Methods on the 10-Armed Testbed
Softmax Action Selection
Binary Bandit Tasks
Contingency Space
Linear Learning Automata
Performance on Binary Bandit Tasks A and B
Incremental Implementation
Tracking a Nonstationary Problem
Optimistic Initial Values
Reinforcement Comparison
Performance of a Reinforcement Comparison Method
Pursuit Methods
Performance of a Pursuit Method
Associative Search
Conclusions

Author: Andy Barto
Email: barto@cs.umass.edu