Course description: This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to learning from interaction to achieve goals in stochastic and uncertain environments. Reinforcement learning has adapted key ideas from machine learning, operations research, control theory, psychology, and neuroscience to produce some strikingly successful engineering applications. The focus is on algorithms that learn what actions to take, and when to take them, so as to optimize long-term performance. This may involve sacrificing immediate reward to obtain greater reward in the long term, or simply to obtain more information about the environment. The course will cover Markov decision processes, dynamic programming, temporal-difference learning, Monte Carlo reinforcement learning methods, eligibility traces, the role of neural networks, and the integration of learning and planning.
Lecture: Tuesday & Thursday 9:30-10:45, LGRC A339
Prerequisites: Interest in learning approaches to artificial intelligence; basic probability theory; computer programming ability. If you have passed Math 515 or equivalent, you have enough basic probability theory. If you have passed a programming course at the level of CMPSCI 287, you have enough programming ability; knowledge of any reasonably standard programming language is fine. Please talk with the instructor if you want to take the course but have doubts about your qualifications; it may suffice to do some background reading.
Credit: 3 units
Instructor: Andrew Barto, barto@cs.umass.edu, 545-2109
Teaching assistant: To be announced
Required book: We will be using a textbook by R. S. Sutton and the instructor: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998. (Clicking on the title will take you to a full description of the book, from which you can obtain a detailed look at what will be covered in this course.) The book is available at the textbook annex.
Schedule: Roughly, the plan is to cover one chapter of the book each week, starting with chapter 1 on September 7 and 12, chapter 2 on September 14 and 19, and so on. Click Lectures below for a detailed schedule.
Written Exercises: There will be a set of exercises for each chapter, comprising most of the non-programming exercises in that chapter. These will be due on the last day the chapter is covered in class (generally the second day of the chapter). All exercises will be marked and returned to you. Answer sheets for each week's exercises will be made available at the class at which the exercises are due, so you must turn in your exercises on time!
Programming Exercises: Each student will complete a number of programming exercises during the course. Some of these may be programming exercises from the book, but others will be assigned as well. Students will implement most of the algorithms in the context of constructing a learning "agent". The focus will be on trying out various learning algorithms, not on elaborate programs. We will allow time for students to present the results of these exercises in class as appropriate.
Exams: There will be a closed-book in-class midterm and a closed-book final exam during the exam period.
Grading: