
CMPSCI 687

Reinforcement Learning

Spring 2006

Course Information

This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to learning from interaction to achieve goals in stochastic and incompletely-known environments. Reinforcement learning has adapted key ideas from machine learning, operations research, control theory, psychology, and neuroscience to produce some strikingly successful engineering applications. The focus is on algorithms for learning what actions to take, and when to take them, so as to optimize long-term performance. This may involve sacrificing immediate reward to obtain greater reward in the long-term or just to obtain more information about the environment. The course will cover Markov decision processes, dynamic programming, temporal-difference learning, Monte Carlo reinforcement learning methods, eligibility traces, the role of function approximation, and the integration of learning and planning. We will also introduce policy gradient methods, methods for partially observable problems, hierarchical learning, and connections to the brain's reward systems.

Lecture: Tuesday & Thursday 9:30-10:45, CMPS 150

Prerequisites: Interest in learning approaches to artificial intelligence; basic probability theory; computer programming ability. If you have passed Math 515 or equivalent, you have enough basic probability theory. If you have passed a programming course at the level of CMPSCI 287, you have enough programming ability; knowledge of C++ is recommended. Please talk with the instructor if you want to take the course but have doubts about your qualifications.

Credit: 3 units

Instructor: Andrew Barto, barto [at] cs [dot] umass [dot] edu, 545-2109

Office hours: Tuesdays 11:00-12:00 (except 2/14, 3/14, 4/11, 4/18, and 5/9) and Wednesdays 1:30-3:00 (except 4/19), in CMPS 272. There are no office hours during Spring Break.

Teaching assistant: Andrew Stout, [andrew's last name]@cs.umass.edu

Office hours: TBA

Required book: We will be using the textbook by R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Cambridge, MA: MIT Press, 1998). The book's web page gives a full description of the book, including a detailed look at what will be covered in this course. I did not order the book through the Textbook Annex or a local bookstore. The full text of the book is on the web, so you don't really need to buy it (though the printed book would be more convenient!)

The Plan: The plan is to cover the complete contents of the book, plus supplementary readings that will be made available when needed. Some of these readings will be assigned, and the course schedule will indicate when you should have finished each of them; the others are suggested readings. See the detailed schedule via the Schedule link at the bottom of the page. The schedule is subject to revision!

Required work:

Written exercises: Several exercise sets will be assigned. Most, but not all, of the exercises come from the textbook. The assignments and their due dates are indicated on the schedule. Hand in paper copies at the beginning of class on the day they are due. All exercises will be marked and returned to you. In most cases, answer sheets for each exercise set will be made available at the end of the class on which the set is due, so you must turn in your exercises on time. You are expected to spend time studying the answers provided. Because the class is large, you should work in teams of two on these exercise sets: each team hands in one paper per exercise set. If a team cannot reach consensus on the answer to a question, team members may hand in separate answers. You will be asked to tell the TA who is on each team, and we expect that teams will ordinarily remain the same throughout the term.
Programming exercises: Each team of two will complete four exercises that require programming. For each one, the team will hand in the results of its work on paper at the beginning of class on the day it is due. The programming assignments and their due dates are indicated on the schedule. Assignment details will appear under the Homework link at the bottom of the page.
Exams: There will be a closed-book in-class midterm and a closed-book final exam during the exam period. The midterm date is on the schedule. The final exam date has been announced: Monday May 22, 8:00 am, LGRT323. If you have an unavoidable conflict with the final exam date, you need to e-mail me at least two weeks before the exam.

Grading:

Midterm (20%), Final (25%)
Written homeworks (30%)
Programming homeworks (25%)

Related Courses Elsewhere

Rich Sutton's course at the University of Alberta

There is a lot of useful information on Rich's site, including information about the RL Python toolkit.

Peter Stone's course at the University of Texas at Austin

A student's ideas on Exercise 7.3

Yishay Mansour's course at Tel Aviv University
Nikos Vlassis's course at the University of Amsterdam
USC
Marco Wiering's course at Utrecht University
B. Ravindran's course at the Indian Institute of Technology Madras

Another Really Useful Link

Worldwide reinforcement learning research