Overview
The last decade has seen a surge of interest, both in research and in
industry, in machine learning algorithms and systems for sequential
decision making, in which an agent interacts with an unknown
environment to accomplish some goal. In this course, we will
investigate the algorithmic principles and theoretical foundations of
sequential decision making, starting from the simplest problem
settings and gradually increasing in complexity.
We begin with multi-armed bandits, the simplest decision-making
setting, and then add in the challenge of generalization through the
frameworks of linear and contextual bandits. Finally, we consider
general reinforcement learning settings, both with and without
function approximation.
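To give a concrete feel for the simplest setting, here is a minimal, illustrative sketch of the UCB1 algorithm for Bernoulli multi-armed bandits (not course code; the function name and interface are invented for this example):

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Illustrative UCB1 sketch for Bernoulli bandits.

    arm_means: true success probabilities (unknown to the learner).
    horizon:   number of rounds T.
    Returns per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # cumulative reward of each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # pick the arm maximizing empirical mean + confidence bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward
```

Over a long enough horizon, the confidence bonus shrinks for well-explored arms, so the algorithm concentrates its pulls on the arm with the highest empirical mean while still occasionally revisiting the others.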
Requirements:
- Two homework assignments involving proofs and algorithm design, 50% of course grade.
- A research-based project, 50% of course grade. Project guidelines and some project ideas are available.
Prerequisites: This is an advanced, theory-heavy course.
Exposure to algorithms and proofs at the level of an undergraduate
algorithms course (CSOR 4231) is absolutely essential. A strong grasp
of machine learning (COMS 4771), probability, and statistics is
preferred. If you have taken, and were comfortable with, COMS 4773,
then you should be well-prepared for this course. If you do not meet
these requirements, please email the instructor.
Readings and resources
There is no required textbook for this course. However, you may find the following resources useful. They are listed roughly in order of relevance to the course.
Homeworks
- Homework 1. Released 1/31, due 3/3. (Solutions)
- Homework 2. Released 3/21, due 4/21. (Solutions)
Feel free to use this LaTeX template and style file.
Projects
Project guidelines are available here. Important dates are:
- Project Proposals. Due 3/11 by email.
- Project Presentations. 5/2 in class.
- Project Writeup. Due 5/9 by email.
Lecture Schedule
| Date | Lecture Topics | Readings | Assignments |
|------|----------------|----------|-------------|
| 1/24 | Course overview, learning theory background | Understanding ML: Appendix B, Ch 4-5; Notes | |
| 1/31 | Multi-armed bandits: UCB, Thompson Sampling, Exp3 | Rakhlin-Sridharan: Ch 18; Lattimore-Szepesvari: Ch 11; Notes | HW 1 released |
| 2/7 | Structured bandits: LinUCB, combinatorial bandits, Lipschitz bandits | Lattimore-Szepesvari: Ch 4, 7, 19; RL Monograph: Ch 6; Notes | |
| 2/14 | Contextual bandits | | |
| 2/21 | Markov decision process basics | | |
| 2/28 | Policy optimization and policy gradient methods | RL Monograph: Ch 11; Notes | |
| 3/7 | PG methods: convergence and statistics | RL Monograph: Ch 12-13; Notes | |
| 3/14 | NO CLASS -- Spring break | | |
| 3/21 | Tabular methods: UCB-VI | | HW 2 released |
| 3/28 | Linear function approximation: upper bounds | RL Monograph: Ch 3 and 8; Notes | |
| 4/4 | Linear function approximation: lower bounds | | |
| 4/11 | General function approximation: statistical theory | | |
| 4/18 | General function approximation: algorithms | | |
| 4/25 | Imitation learning | RL Monograph: Ch 15; Notes | |
| 5/2 | Project Presentations | | |