Table of Contents
Chapter 3: The Reinforcement Learning Problem
The Agent-Environment Interface
The Agent Learns a Policy
Getting the Degree of Abstraction Right
Goals and Rewards
Returns
Returns for Continuing Tasks
An Example
Another Example
A Unified Notation
The Markov Property
Markov Decision Processes
An Example Finite MDP
Recycling Robot MDP
Value Functions
Bellman Equation for a Policy π
More on the Bellman Equation
Gridworld
Golf
Optimal Value Functions
Optimal Value Function for Golf
Bellman Optimality Equation for V*
Bellman Optimality Equation for Q*
Why Optimal State-Value Functions are Useful
What About Optimal Action-Value Functions?
Solving the Bellman Optimality Equation
Summary
Author: Andy Barto
Email: barto@cs.umass.edu