CMPSCI 687

Reinforcement Learning

Spring 2006


Course Schedule

(subject to revision!!)

| Lecture | Date | Class topic | Reading | Homework assigned | Homework Due |
|---|---|---|---|---|---|
| Lecture 1 [ full / printable ] | Tu Jan 31 | Introduction and course overview | Chapter 1 | Exercise Set 1: ex. 1.1-1.5 | |
| Lecture 2 [ full / printable ] | Th Feb 2 | Introduction continued; evaluative feedback | Chapter 2 | | |
| Lecture 3 | Tu Feb 7 | Evaluative feedback continued | Chapter 2 | Exercise Set 2: ex. 2.3, 2.4, 2.5, 2.6, 2.8, 2.16, additional exercise | Exercise Set 1 |
| Lecture 4 [ full / printable ] | Th Feb 9 | The reinforcement learning problem | Chapter 3 | Programming Exercise 1 | |
| Lecture 5 | Tu Feb 14 | The reinforcement learning problem continued | | Exercise Set 3: ex. 3.2, 3.4, 3.5, 3.8 (omit final part re eq. 3.10), 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.15, 3.17 | Exercise Set 2 |
| Lecture 6 [ full / printable ] | Th Feb 16 | Dynamic programming | Chapter 4 | | |
| Lecture 7 | Th Feb 23 | Dynamic programming continued | Samuel's paper | Exercise Set 4: ex. 4.1, 4.2, 4.3, 4.5, 4.9 | Exercise Set 3 |
| Lecture 8 [ full / printable ] | Tu Feb 28 | Monte Carlo methods | Chapter 5 | Programming Exercise 2 | Programming Exercise 1 |
| Lecture 9 | Th Mar 2 | Monte Carlo methods continued | Importance sampling reading | Exercise Set 5: ex. 5.1, 5.2, 5.5 | Exercise Set 4 |
| Lecture 10 | Tu Mar 7 | The lost lecture | | | |
| Lecture 11 [ full / printable ] | Th Mar 9 | Temporal-difference methods | Chapter 6 | Exercise Set 6: ex. 6.1, 6.2, 6.8, 6.9, 6.10, 6.12 | Exercise Set 5 |
| Lecture 12 | Tu Mar 14 | Temporal-difference methods continued | | | |
| Lecture 13 | Th Mar 16 | Temporal-difference methods and dopamine | Schultz, Dayan, & Montague; Redish | | Exercise Set 6, Programming Exercise 2 |
| Spring Break | | | | | |
| Lecture 14 | Tu Mar 28 | Review for midterm | Minsky's paper | Programming Exercise 3 | |
| Lecture 15 | Th Mar 30 | In-class midterm: Chapters 1 - 6 | | | |
| Lecture 16 [ full / printable ] | Tu Apr 4 | Eligibility traces | Chapter 7 | Exercise Set 7: ex. 7.2, 7.6 | |
| Lecture 17 [ full / printable ] | Th Apr 6 | Function approximation | Chapter 8 | Exercise Set 8: ex. 8.1, 8.2, 8.5, 8.6, 8.7 | Exercise Set 7 |
| Lecture 18 | Tu Apr 11 | Function approximation continued | | | |
| Lecture 19 [ full / printable ] | Th Apr 13 | Model-based methods | Chapter 9 | Exercise Set 9: ex. 9.1, 9.2, 9.3, 9.5 | Exercise Set 8 |
| Lecture 20 [ full / printable ] | Tu Apr 18 | Guest Lecture: Özgür Şimşek. Temporal Abstraction in RL | Sutton, Precup, and Singh | | Short critique of today's reading (not a summary) |
| Lecture 21 [ full / printable ] | Th Apr 20 | Guest Lecture: Özgür Şimşek. Intrinsically-Motivated RL | Barto, Singh, and Chentanez | | Short critique of today's reading (not a summary) |
| Lecture 22 | Tu Apr 25 | Model-based methods continued | Chapter 10 | Programming Exercise 4 | |
| Lecture 23 [ full / printable ] | Th Apr 27 | Case studies | Chapter 11 | | Exercise Set 9 |
| Lecture 24 [ full / printable ] | Tu May 2 | Partial Observability | See this site and this paper (though the latter is not required) | | Programming Exercise 3 |
| Lecture 25 [ full / printable ] | Th May 4 | Policy gradient methods | Williams' 1992 REINFORCE paper | | |
| Lecture 26 | Tu May 9 | Policy gradient methods continued | Kohl and Stone; Grudic, Kumar, and Ungar | | Critique of at least one of these papers |
| Lecture 27 | Th May 11 | Transfer | Taylor and Stone; Konidaris and Barto | | Critique of at least one of these papers |
| Lecture 28 | Tu May 16 | Review for final exam | | | Programming Exercise 4 |
| FINAL EXAM | Mon May 22 | Cumulative Final Exam: 8:00 AM, LGRT 323 | | | |