Table of Contents

Chapter 11: Case Studies
TD-Gammon
A Few Details
Multi-layer Neural Network
Summary of TD-Gammon Results
Samuel's Checkers Player
Samuel's Backups
The Basic Idea
More Samuel Details
The Acrobot
Acrobot Learning Curves for Sarsa(λ)
Typical Acrobot Learned Behavior
Elevator Dispatching
Semi-Markov Q-Learning
Passenger Arrival Patterns
Control Strategies
The Elevator Model (from Lewis, 1991)
State Space
Actions
Constraints
Performance Criteria
Average Squared Wait Time
Algorithm
Computing Rewards
Neural Networks
Elevator Results
Dynamic Channel Allocation
Job-Shop Scheduling
Author: Andy Barto
Email: barto@cs.umass.edu