Bandits and Reinforcement Learning

COMS 6998-11, Spring 2022

Akshay Krishnamurthy


When: Monday 4:10-6:00
Where: 602 Northwest Corner Building
Office Hours: Wednesday 4-5 on Zoom, or by appointment

Overview

The last decade has seen a surge of interest, both in research and in industry, in machine learning algorithms and systems for sequential decision making, in which an agent interacts with an unknown environment to accomplish some goal. In this course, we will investigate the algorithmic principles and theoretical foundations of sequential decision making, starting from the simplest problem settings and gradually increasing in complexity. We begin with multi-armed bandits, the simplest decision-making setting, and then add in the challenge of generalization through the frameworks of linear and contextual bandits. Finally, we consider general reinforcement learning settings, both with and without function approximation.
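To make the multi-armed bandit setting concrete, here is a minimal sketch of the classic UCB1 algorithm (one of the algorithms covered in the 1/31 lecture): pull each arm once, then repeatedly pull the arm with the highest upper confidence bound. The Bernoulli reward model and all names (`ucb1`, `pull`, the horizon) are illustrative choices, not part of the course materials.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(a)` returns a reward in [0, 1].

    Returns the number of times each arm was pulled.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            # optimism in the face of uncertainty: empirical mean plus bonus
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running-mean update
    return counts

# Usage: two Bernoulli arms with success probabilities 0.3 and 0.7.
random.seed(0)
probs = [0.3, 0.7]
counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
              n_arms=2, horizon=2000)
```

Over 2000 rounds the confidence bonuses shrink and the better arm (probability 0.7) is pulled far more often, which is exactly the exploration/exploitation trade-off the course formalizes with regret bounds.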

Requirements:

  • Two homework assignments involving proofs and algorithm design, 50% of course grade.
  • A research-based project, 50% of course grade. Project guidelines and some project ideas are available.

Prerequisites: This is an advanced, theory-heavy course. Exposure to algorithms and proofs at the level of an undergraduate algorithms course (CSOR 4231) is absolutely essential. A strong grasp of machine learning (COMS 4771), probability, and statistics is preferred. If you have taken, and were comfortable with, COMS 4773, then you should be well-prepared for this course. If you do not meet these requirements, please email the instructor.

Readings and resources

There is no required textbook for this course. However, you may find the following resources useful. They are listed roughly in order of relevance to the course.

Homeworks

  1. Homework 1. Released 1/31, due 3/3. (Solutions)
  2. Homework 2. Released 3/21, due 4/21. (Solutions)
Feel free to use this LaTeX template and style file.

Projects

Project guidelines are available here. Important dates are:
  1. Project Proposals. Due 3/11 by email.
  2. Project Presentations. 5/2 in class.
  3. Project Writeup. Due 5/9 by email.

Lecture Schedule

Each entry lists the date, lecture topics, readings, and any assignments.
1/24 Course overview, learning theory background
  • Understanding ML: Appendix B, Ch 4-5
  • Notes
1/31 Multi-armed bandits: UCB, Thompson Sampling, Exp3
  • Rakhlin-Sridharan: Ch 18
  • Lattimore-Szepesvari: Ch 11
  • Notes
HW 1 released
2/7 Structured bandits: LinUCB, Combinatorial bandits, Lipschitz bandits
  • Lattimore-Szepesvari: Ch 4, 7, 19
  • RL Monograph: Ch 6
  • Notes
2/14 Contextual bandits
2/21 Markov Decision Process basics
  • RL Monograph: Ch 1
  • Notes
2/28 Policy optimization and policy gradient methods
  • RL Monograph: Ch 11
  • Notes
3/7 PG Methods: Convergence and statistics
  • RL Monograph: Ch 12-13
  • Notes
3/14 NO CLASS -- Spring break
3/21 Tabular methods: UCB-VI
  • RL Monograph: Ch 7
  • Notes
HW 2 released
3/28 Linear function approximation: upper bounds
  • RL Monograph: Ch 3 and Ch 8
  • Notes
4/4 Linear function approximation: lower bounds
  • RL Monograph: Ch 5
  • Notes
4/11 General function approximation: statistical theory
4/18 General function approximation: algorithms
4/25 Imitation Learning
  • Notes
  • RL Monograph: Ch 15
5/2 Project Presentations