Table of Contents
Chapter 7: Eligibility Traces
N-step TD Prediction
Mathematics of N-step TD Prediction
Learning with N-step Backups
Random Walk Examples
A Larger Example
Averaging N-step Returns
Forward View of TD(λ)
λ-return Weighting Function
Relation to TD(0) and MC
Forward View of TD(λ) II
λ-return on the Random Walk
Backward View of TD(λ)
On-line Tabular TD(λ)
Backward View
Relation of Backwards View to MC & TD(0)
Forward View = Backward View
On-line versus Off-line on Random Walk
Control: Sarsa(λ)
Sarsa(λ) Algorithm
Sarsa(λ) Gridworld Example
Three Approaches to Q(λ)
Watkins's Q(λ)
Peng's Q(λ)
Naïve Q(λ)
Comparison Task
Comparison Results
Convergence of the Q(λ)'s
Eligibility Traces for Actor-Critic Methods
Replacing Traces
Replacing Traces Example
Why Replacing Traces?
More Replacing Traces
Implementation Issues
Variable λ
Conclusions
Something Here is Not Like the Other
Author: Andy Barto
Email: barto@cs.umass.edu