Table of Contents

Chapter 9: Planning and Learning
Models
Planning
Planning Cont.
Learning, Planning, and Acting
Direct vs. Indirect RL
The Dyna Architecture (Sutton 1990)
The Dyna-Q Algorithm
Dyna-Q on a Simple Maze
Dyna-Q Snapshots: Midway in 2nd Episode
When the Model is Wrong: Blocking Maze
Shortcut Maze
What is Dyna-Q+?
Prioritized Sweeping
Prioritized Sweeping
Prioritized Sweeping vs. Dyna-Q
Rod Maneuvering (Moore and Atkeson 1993)
Full and Sample (One-Step) Backups
Full vs. Sample Backups
Trajectory Sampling
Trajectory Sampling Experiment
Heuristic Search
Summary
Author: Andy Barto
Email: barto@cs.umass.edu