|
Course description: This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to learning from interaction to achieve goals in stochastic and incompletely-known environments. Reinforcement learning has adapted key ideas from machine learning, operations research, control theory, psychology, and neuroscience to produce some strikingly successful engineering applications. The focus is on algorithms for learning what actions to take, and when to take them, so as to optimize long-term performance. This may involve sacrificing immediate reward to obtain greater reward in the long-term or just to obtain more information about the environment. The course will cover Markov decision processes, dynamic programming, temporal-difference learning, Monte Carlo reinforcement learning methods, eligibility traces, the role of function approximation, and the integration of learning and planning.
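To make the "sacrificing immediate reward" idea concrete, here is a minimal sketch (not course material, and not the book's notation) of an epsilon-greedy agent on a two-armed bandit: with small probability it forgoes the apparently best action in order to gather more information about the environment. The arm means, epsilon, and step count are illustrative choices.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Estimate arm values by sampling; occasionally explore at random."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # current value estimate for each arm
    counts = [0] * n_arms        # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)   # explore: give up immediate reward for information
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit current knowledge
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward from the chosen arm
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental sample mean
    return estimates

estimates = epsilon_greedy_bandit([1.0, 2.0])
```

After enough steps, the estimates single out the better arm even though exploration occasionally sacrificed reward along the way.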
Lecture: Tuesday & Thursday 9:30-10:45, CMPS 140
Prerequisites: Interest in learning approaches to artificial intelligence; basic probability theory; computer programming ability. If you have passed Math 515 or its equivalent, you have enough basic probability theory. If you have passed a programming course at the level of CMPSCI 287, you have enough programming ability; knowledge of C++ is recommended. Please talk with the instructor if you want to take the course but have doubts about your qualifications.
Credit: 3 units
Instructor: Andrew Barto, barto@cs.umass.edu, 545-2109
Teaching assistant: James Davis, jdavis@cs.umass.edu, 545-1596
Required book: We will be using a textbook by R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998. (Clicking on the title will take you to a full description of the book, from which you can obtain a detailed look at what will be covered in this course.) The book is available at the textbook annex.
Schedule: Roughly, the plan is to cover one chapter of the book each week, starting with Chapter 1 on September 6 and 11, Chapter 2 on September 13 and 18, and so on. A detailed schedule will be provided when the class begins.

Required work: This will depend on how many students end up taking the course; we will adjust the type of required work once we see how large the class is going to be. But here is the basic idea. There will be a set of exercises for each chapter, comprising most of the non-programming exercises in that chapter. These will usually be due on the last day the chapter is covered in class (generally the second day). All exercises will be marked and returned to you. Answer sheets for each exercise set will be made available at the end of the class on which the exercises are due, so you must turn in your exercises on time. You are expected to spend time studying the answers provided. If the class is large, we may allow these exercises to be done in teams, or we may simply check that you have handed in answers without thoroughly grading all of them.
Programming Exercises: Each student will complete a number of projects requiring programming during the course and, for each, will hand in the results of their work (details and due dates to be designated). Our goal is to use the RoboCup domain as a source of class projects. Take a look at this web page, following the links to the simulation league. More on this later.
Exams: There will be a closed-book in-class midterm and a closed-book final exam during the exam period.
Grading: This will depend on how we end up organizing the course, once I know how many students will be taking it. I will post details very shortly.