Table of Contents

Chapter 6: Temporal Difference Learning
TD Prediction
Simple Monte Carlo
Simplest TD Method
cf. Dynamic Programming
TD Bootstraps and Samples
Example: Driving Home
Driving Home
Advantages of TD Learning
Random Walk Example
TD and MC on the Random Walk
Optimality of TD(0)
Random Walk under Batch Updating
You are the Predictor
Learning an Action-Value Function
Sarsa: On-Policy TD Control
Windy Gridworld
Results of Sarsa on the Windy Gridworld
Q-Learning: Off-Policy TD Control
Cliffwalking
Actor-Critic Methods
Actor-Critic Details
Dopamine Neurons and TD Error
Average Reward Per Time Step
R-Learning
Access-Control Queuing Task
Afterstates
Summary
Author: Andy Barto
Email: barto@cs.umass.edu