CMPSCI 687

Reinforcement Learning

Spring 2006

Course Schedule

(subject to revision!!)

Lecture	Date	Class topic	Reading	Homework assigned	Homework due
Lecture 1 [ full | printable ]	Tu Jan 31	Introduction and course overview	Chapter 1	Exercise Set 1: ex. 1.1-1.5
Lecture 2 [ full | printable ]	Th Feb 2	Introduction continued; evaluative feedback	Chapter 2
Lecture 3	Tu Feb 7	Evaluative feedback continued	Chapter 2	Exercise Set 2: ex. 2.3, 2.4, 2.5, 2.6, 2.8, 2.16, additional exercise	Exercise Set 1
Lecture 4 [ full | printable ]	Th Feb 9	The reinforcement learning problem	Chapter 3	Programming Exercise 1
Lecture 5	Tu Feb 14	The reinforcement learning problem continued		Exercise Set 3: ex. 3.2, 3.4, 3.5, 3.8 (omit final part re eq. 3.10), 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.15, 3.17	Exercise Set 2
Lecture 6 [ full | printable ]	Th Feb 16	Dynamic programming	Chapter 4
Lecture 7	Th Feb 23	Dynamic programming continued	Samuel's paper	Exercise Set 4: ex. 4.1, 4.2, 4.3, 4.5, 4.9	Exercise Set 3
Lecture 8 [ full | printable ]	Tu Feb 28	Monte Carlo methods	Chapter 5	Programming Exercise 2	Programming Exercise 1
Lecture 9	Th Mar 2	Monte Carlo methods continued	Importance sampling reading	Exercise Set 5: ex. 5.1, 5.2, 5.5	Exercise Set 4
Lecture 10	Tu Mar 7	The lost lecture
Lecture 11 [ full | printable ]	Th Mar 9	Temporal-difference methods	Chapter 6	Exercise Set 6: ex. 6.1, 6.2, 6.8, 6.9, 6.10, 6.12	Exercise Set 5
Lecture 12	Tu Mar 14	Temporal-difference methods continued
Lecture 13	Th Mar 16	Temporal-difference methods and dopamine	Schultz, Dayan, and Montague; Redish		Exercise Set 6, Programming Exercise 2
Spring Break
Lecture 14	Tu Mar 28	Review for midterm	Minsky's paper	Programming Exercise 3
Lecture 15	Th Mar 30	In-class midterm: Chapters 1-6
Lecture 16 [ full | printable ]	Tu Apr 4	Eligibility traces	Chapter 7	Exercise Set 7: ex. 7.2, 7.6
Lecture 17 [ full | printable ]	Th Apr 6	Function approximation	Chapter 8	Exercise Set 8: ex. 8.1, 8.2, 8.5, 8.6, 8.7	Exercise Set 7
Lecture 18	Tu Apr 11	Function approximation continued
Lecture 19 [ full | printable ]	Th Apr 13	Model-based methods	Chapter 9	Exercise Set 9: ex. 9.1, 9.2, 9.3, 9.5	Exercise Set 8
Lecture 20 [ full | printable ]	Tu Apr 18	Guest lecture: Özgür Şimşek, Temporal Abstraction in RL	Sutton, Precup, and Singh		Short critique of today's reading (not a summary)
Lecture 21 [ full | printable ]	Th Apr 20	Guest lecture: Özgür Şimşek, Intrinsically Motivated RL	Barto, Singh, and Chentanez		Short critique of today's reading (not a summary)
Lecture 22	Tu Apr 25	Model-based methods continued	Chapter 10	Programming Exercise 4
Lecture 23 [ full | printable ]	Th Apr 27	Case studies	Chapter 11		Exercise Set 9
Lecture 24 [ full | printable ]	Tu May 2	Partial observability	See this site and this paper (the latter is not required)		Programming Exercise 3
Lecture 25 [ full | printable ]	Th May 4	Policy gradient methods	Williams' 1992 REINFORCE paper
Lecture 26	Tu May 9	Policy gradient methods continued	Kohl and Stone; Grudic, Kumar, and Ungar	Critique of at least one of these papers
Lecture 27	Th May 11	Transfer	Taylor and Stone; Konidaris and Barto	Critique of at least one of these papers
Lecture 28	Tu May 16	Review for final exam			Programming Exercise 4
FINAL EXAM	Mon May 22	Cumulative final exam: 8:00 AM, LGRT 323