Bandits and Reinforcement Learning

COMS 6998-11, Spring 2022

Akshay Krishnamurthy


When: Monday 4:10-6:00
Where: 602 Northwest Corner Building
Office Hours: Wednesday 4-5 on Zoom, or by appointment

Overview

The last decade has seen a surge of interest, both in research and in industry, in machine learning algorithms and systems for sequential decision making, in which an agent interacts with an unknown environment to accomplish some goal. In this course, we will investigate the algorithmic principles and theoretical foundations of sequential decision making, starting from the simplest problem settings and gradually increasing in complexity. We begin with multi-armed bandits, the simplest decision-making setting, and then add in the challenge of generalization through the frameworks of linear and contextual bandits. Finally, we consider general reinforcement learning settings, both with and without function approximation.
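To make the multi-armed bandit setting concrete, here is a minimal sketch of the classic UCB1 algorithm (one of the algorithms covered in the 1/31 lecture): pull each arm once, then repeatedly pull the arm with the highest upper confidence bound. The Bernoulli reward model and all names (`ucb1`, `pull`, the horizon) are illustrative choices, not part of the course materials.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(a)` returns a reward in [0, 1].

    Returns the number of times each arm was pulled.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            # optimism in the face of uncertainty: empirical mean plus bonus
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running-mean update
    return counts

# Usage: two Bernoulli arms with success probabilities 0.3 and 0.7.
random.seed(0)
probs = [0.3, 0.7]
counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
              n_arms=2, horizon=2000)
```

Over 2000 rounds the confidence bonuses shrink and the better arm (probability 0.7) is pulled far more often, which is exactly the exploration/exploitation trade-off the course formalizes with regret bounds.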

Requirements:

  • Two homework assignments involving proofs and algorithm design, 50% of course grade.
  • A research-based project, 50% of course grade. Project guidelines and some project ideas are available.

Prerequisites: This is an advanced, theory-heavy course. Exposure to algorithms and proofs at the level of an undergraduate algorithms course (CSOR 4231) is absolutely essential. A strong grasp of machine learning (COMS 4771), probability, and statistics is preferred. If you have taken, and were comfortable with, COMS 4773, then you should be well-prepared for this course. If you do not meet these requirements, please email the instructor.

Readings and resources

There is no required textbook for this course. However, you may find the following resources useful. They are listed roughly in order of relevance to the course.

Homeworks

  1. Homework 1. Released 1/31, due 3/3. (Solutions)
  2. Homework 2. Released 3/21, due 4/21. (Solutions)
Feel free to use this LaTeX template and style file.

Projects

Project guidelines are available here. Important dates are:
  1. Project Proposals. Due 3/11 by email.
  2. Project Presentations. 5/2 in class.
  3. Project Writeup. Due 5/9 by email.

Lecture Schedule

Each entry lists the date, lecture topics, readings, and any assignments.
1/24 Course overview, learning theory background
  • Understanding ML: Appendix B, Ch 4-5
  • Notes
1/31 Multi-armed bandits: UCB, Thompson Sampling, Exp3
  • Rakhlin-Sridharan: Ch 18
  • Lattimore-Szepesvari: Ch 11
  • Notes
HW 1 released
2/7 Structured bandits: LinUCB, Combinatorial bandits, Lipschitz bandits
  • Lattimore-Szepesvari: Ch 4, 7, 19
  • RL Monograph: Ch 6
  • Notes
2/14 Contextual bandits
2/21 Markov Decision Process basics
  • RL Monograph: Ch 1
  • Notes
2/28 Policy optimization and policy gradient methods
  • RL Monograph: Ch 11
  • Notes
3/7 PG Methods: Convergence and statistics
  • RL Monograph: Ch 12-13
  • Notes
3/14 NO CLASS -- Spring break
3/21 Tabular methods: UCB-VI
  • RL Monograph: Ch 7
  • Notes
HW 2 released
3/28 Linear function approximation: upper bounds
  • RL Monograph: Ch 3 and Ch 8
  • Notes
4/4 Linear function approximation: lower bounds
  • RL Monograph: Ch 5
  • Notes
4/11 General function approximation: statistical theory
4/18 General function approximation: algorithms
4/25 Imitation Learning
  • Notes
  • RL Monograph: Ch 15
5/2 Project Presentations