Table of Contents

Chapter 6: Temporal Difference Learning
TD Prediction
Simple Monte Carlo
Simplest TD Method
cf. Dynamic Programming
TD Bootstraps and Samples
Example: Driving Home
Driving Home
Advantages of TD Learning
Random Walk Example
TD and MC on the Random Walk
Optimality of TD(0)
Random Walk under Batch Updating
You are the Predictor
Learning an Action-Value Function
Sarsa: On-Policy TD Control
Windy Gridworld
Results of Sarsa on the Windy Gridworld
Q-Learning: Off-Policy TD Control
Cliffwalking
Actor-Critic Methods
Actor-Critic Details
Dopamine Neurons and TD Error
Average Reward Per Time Step
R-Learning
Access-Control Queuing Task
Afterstates
Summary
Author: Andy Barto
Email: barto@cs.umass.edu