Lecture | Date | Class topic | Reading | Homework assigned | Homework due |
---|---|---|---|---|---|
Lecture 1 | Tu Jan 31 | Introduction and course overview | Chapter 1 | Exercise Set 1: ex. 1.1-1.5 | |
Lecture 2 | Th Feb 2 | Introduction continued; evaluative feedback | Chapter 2 | | |
Lecture 3 | Tu Feb 7 | Evaluative feedback continued | Chapter 2 | Exercise Set 2: ex. 2.3, 2.4, 2.5, 2.6, 2.8, 2.16, additional exercise | Exercise Set 1 |
Lecture 4 | Th Feb 9 | The reinforcement learning problem | Chapter 3 | Programming Exercise 1 | |
Lecture 5 | Tu Feb 14 | The reinforcement learning problem continued | | Exercise Set 3: ex. 3.2, 3.4, 3.5, 3.8 (omit the final part regarding eq. 3.10), 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.15, 3.17 | Exercise Set 2 |
Lecture 6 | Th Feb 16 | Dynamic programming | Chapter 4 | | |
Lecture 7 | Th Feb 23 | Dynamic programming continued | Samuel's paper | Exercise Set 4: ex. 4.1, 4.2, 4.3, 4.5, 4.9 | Exercise Set 3 |
Lecture 8 | Tu Feb 28 | Monte Carlo methods | Chapter 5 | Programming Exercise 2 | Programming Exercise 1 |
Lecture 9 | Th Mar 2 | Monte Carlo methods continued | Importance sampling reading | Exercise Set 5: ex. 5.1, 5.2, 5.5 | Exercise Set 4 |
Lecture 10 | Tu Mar 7 | The lost lecture | |||
Lecture 11 | Th Mar 9 | Temporal-difference methods | Chapter 6 | Exercise Set 6: ex. 6.1, 6.2, 6.8, 6.9, 6.10, 6.12 | Exercise Set 5 |
Lecture 12 | Tu Mar 14 | Temporal-difference methods continued | |||
Lecture 13 | Th Mar 16 | Temporal-difference methods and dopamine | Schultz, Dayan, and Montague; Redish | | Exercise Set 6, Programming Exercise 2 |
Spring Break | |||||
Lecture 14 | Tu Mar 28 | Review for midterm | Minsky's paper | Programming Exercise 3 | |
Lecture 15 | Th Mar 30 | In-class midterm: Chapters 1-6 | | | |
Lecture 16 | Tu Apr 4 | Eligibility traces | Chapter 7 | Exercise Set 7: ex. 7.2, 7.6 | |
Lecture 17 | Th Apr 6 | Function approximation | Chapter 8 | Exercise Set 8: ex. 8.1, 8.2, 8.5, 8.6, 8.7 | Exercise Set 7 |
Lecture 18 | Tu Apr 11 | Function approximation continued | |||
Lecture 19 | Th Apr 13 | Model-based methods | Chapter 9 | Exercise Set 9: ex. 9.1, 9.2, 9.3, 9.5 | Exercise Set 8 |
Lecture 20 | Tu Apr 18 | Guest Lecture (Özgür Şimşek): Temporal Abstraction in RL | Sutton, Precup, and Singh | Short critique of today's reading (not a summary) | |
Lecture 21 | Th Apr 20 | Guest Lecture (Özgür Şimşek): Intrinsically Motivated RL | Barto, Singh, and Chentanez | Short critique of today's reading (not a summary) | |
Lecture 22 | Tu Apr 25 | Model-based methods continued | Chapter 10 | Programming Exercise 4 | |
Lecture 23 | Th Apr 27 | Case studies | Chapter 11 | | Exercise Set 9 |
Lecture 24 | Tu May 2 | Partial observability | See this site and this paper (though the latter is not required) | | Programming Exercise 3 |
Lecture 25 | Th May 4 | Policy gradient methods | Williams' 1992 REINFORCE paper | | |
Lecture 26 | Tu May 9 | Policy gradient methods continued | Kohl and Stone; Grudic, Kumar, and Ungar | Critique of at least one of these papers | |
Lecture 27 | Th May 11 | Transfer | Taylor and Stone; Konidaris and Barto | Critique of at least one of these papers | |
Lecture 28 | Tu May 16 | Review for final exam | | | Programming Exercise 4 |
FINAL EXAM | Mon May 22 | Cumulative final exam: 8:00 AM, LGRT 323 | | | |