Chapter 3: The Reinforcement Learning Problem

9/21/99


Table of Contents

Chapter 3: The Reinforcement Learning Problem

The Agent-Environment Interface

The Agent Learns a Policy

Getting the Degree of Abstraction Right

Goals and Rewards

Returns

Returns for Continuing Tasks

An Example

Another Example

A Unified Notation

The Markov Property

Markov Decision Processes

An Example Finite MDP

Recycling Robot MDP

Value Functions

Bellman Equation for a Policy π

More on the Bellman Equation

Gridworld

Golf

Optimal Value Functions

Optimal Value Function for Golf

Bellman Optimality Equation for V*

Bellman Optimality Equation for Q*

Why Optimal State-Value Functions are Useful

What About Optimal Action-Value Functions?

Solving the Bellman Optimality Equation

Summary

Author: Andy Barto

Email: barto@cs.umass.edu
