This course will focus on modern machine learning approaches to learn from human demonstrations, preferences, feedback, and other multimodal signals, with the goal of aligning agent goals and behaviors with human values and desires. For the purposes of both safety and practicality, it is increasingly important for AI systems to be well-aligned with human users as their capabilities improve and they are deployed more frequently in real-world settings. While the standard ML paradigm assumes that learning objectives are directly provided as part of the problem specification, emerging research in alignment suggests that it is often infeasible to do so accurately, requiring such objectives to be inferred from human data. This course will provide the basic tools to address these important issues, covering topics such as behavioral cloning, inverse reinforcement learning, preference elicitation, active learning, learning from feedback, value alignment, bounded rationality, and best practices for human studies. We will examine applications including robotics, large language models, and self-driving cars.
There will be no textbook. Links to all required readings will be provided in the class schedule.
There are no formal prerequisites, but students are strongly encouraged to have solid programming skills and a background in linear algebra, probability and statistics, multivariate calculus, and graduate-level machine learning.
Schedule
Thu 02/01 — Course Overview [Slides]
Tue 02/06 — Behavior cloning [Slides]
Thu 02/08 — Reinforcement Learning [Slides]
Tue 02/13 — Reward Specification [Slides]
Thu 02/15 — Interactive RL [Slides]
Tue 02/20 — Inverse Reinforcement Learning [Slides]
Thu 02/22 — No Class
Tue 02/27 — Bayesian Inverse Reinforcement Learning [Slides]
Thu 02/29 — MOVED to 03/07: Adversarial Imitation Learning [Slides]
Tue 03/05 — CANCELLED: AI Safety and Alignment
Thu 03/07 — Adversarial Imitation Learning [Slides]
Tue 03/12 — Preference Learning and RLHF 1 [Slides]
Thu 03/14 — Models of Human Preference [Slides]
Tue 03/19 — No Class
Thu 03/21 — No Class
Tue 03/26 — Preference Learning and RLHF 2 [Slides]
Thu 03/28 — Performance Guarantees for RLHF [Slides]
Tue 04/02 — Open Challenges in RLHF [Slides]
Thu 04/04 — Improving Human Modeling Assumptions [Slides]
Tue 04/09 — Fine-grained Preferences [Slides]
Thu 04/11 — AI Risk, Mitigation, and Counterarguments [Slides]
Tue 04/16 — Cooperation and Corrigibility [Slides]
Thu 04/18 — Active Reward Learning and Optimal Teaching [Slides]
Tue 04/23 — Reward Learning from Multimodal Human Signals [Slides]
Thu 04/25 — Reward Design from Natural Language [Slides]
Tue 04/30 — Scalable Oversight [Slides]
Thu 05/02 — RLAIF [Slides]
Tue 05/07 — Final Project Presentations
Thu 05/09 — Final Project Presentations
Grading
Grades will be calculated as follows, using a scale that includes both plus and minus letter grades:
- 15% Role presentations:
Each class will be a mix of lecture and discussion. To add depth to the discussion, we will draw on ideas from Colin Raffel's roleplaying seminar model. A few times per semester, each student will be assigned one of the following roles for a paper and asked to produce a written report, as well as kick off discussion in class by presenting (from their seat; no slides) a summary of their findings in approximately 5 minutes:
- TMLR Reviewer: The machine learning journal TMLR uses a peer review process that focuses on technical correctness and quality, rather than subjective novelty. For reference, the review guidelines can be found here: https://jmlr.org/tmlr/reviewer-guide.html. Write a review that discusses: (1) The claims being made by the authors; (2) Whether all claims are supported by sufficient (and correct) arguments, theory, or empirical evidence, and if not, what you'd like to see; (3) Clarifying questions you'd like to ask the authors; (4) Whether the results appear to be reproducible from information contained in the paper. Not all of these will be applicable for every paper, so use your judgement.
- Archaeologist: This paper was found buried underground in the desert. You're an archaeologist who must determine where this paper sits in the context of previous and subsequent work. Find and report on one older paper cited within the current paper that substantially influenced the current paper and one newer paper that cites this current paper. If the paper is too new to have been meaningfully cited, report on a second older paper instead. Discuss in detail how the papers influenced each other from both conceptual and technical perspectives, as well as how they differ, and what their main results are.
- Academic Researcher: Imagine that you're an academic researcher and this paper was just released. Propose a follow-up research project that builds on these ideas, addresses a key limitation of the current paper, or investigates something about the paper's analysis or experiments that you are skeptical of. If this is an older paper with a lot of follow-up work already existing, feel free to come up with an idea that instead builds on a paper that was influenced by this one. Feel free to talk to other students to brainstorm. If you get really stuck, then simply report on the limitations of this work and highlight challenges that would be valuable to address, even if you don't have a proposal for how you might address them. Alternatively, propose something on a smaller scale, such as an additional experiment or hypothesis to examine that would have made the paper stronger.
Grades will be assigned only for the quality of the written report, but students are expected to be well-prepared to present their findings in class.
- 25% Reading critiques:
For everyone who isn't assigned a role for a particular paper, a written critique of each reading (usually two) for each class will be due by 8:00 PM the previous night via Gradescope.
Each critique should include all of the following:
- A short summary of the main contribution(s) of the paper in your own words (roughly two sentences).
- A short description of how the paper differs from prior work.
- One strength and one weakness of the proposed method, core argument, or experiments.
- At least one question or comment that you'd like me to address during class or that could spur discussion.
In all cases, the written critique should provide non-trivial insight into the reading.
To get full credit, you must show that you understood and thought critically about the core concepts presented.
- 25% Programming assignments:
There will be several programming assignments, in which machine learning algorithms will be implemented and evaluated.
Each assignment will require students to turn in code as well as a short written report.
- 25% Final project:
Roughly halfway through the semester, students will propose topics of their own choosing for a large final programming project.
These projects may be completed alone, though working in groups of up to 3 students is encouraged.
A rough guideline is that the project should produce about half a standard conference paper's worth of material
(in both technical content and length, i.e., about 4 double-column pages in LaTeX).
These projects are a chance to dive deeply into any topic of interest related to the course. Students are encouraged
to tie this work into the primary research that they are already pursuing, as long as it relates to human-centric ML.
Example projects could include extending an algorithm in a novel way, comparing several algorithms on an interesting problem, or
designing a new approach to attack a problem relevant to the class. In all cases, there should be a novel intellectual
contribution, as well as empirical results on a problem of interest.
- 10% Attendance and participation:
Attendance is mandatory and participation in discussion is an important element of the course. Students should aim to participate in the discussion at least once a week.
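As a rough illustration of how the weights above combine, the sketch below computes a weighted course score in Python. The category names, the `weighted_total` helper, and the example scores are hypothetical, and it assumes each category has already been reduced to a single 0-100 score, which the breakdown above does not specify.

```python
# Category weights as listed in the grading breakdown (they sum to 100%).
# The dictionary keys are illustrative names, not official identifiers.
WEIGHTS = {
    "role_presentations": 0.15,
    "reading_critiques": 0.25,
    "programming_assignments": 0.25,
    "final_project": 0.25,
    "attendance_participation": 0.10,
}


def weighted_total(scores):
    """Combine per-category scores (each on a 0-100 scale) into one course score.

    Assumption: every category is already averaged to a single number;
    how items within a category are combined is not stated in the syllabus.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example = {
    "role_presentations": 90,
    "reading_critiques": 80,
    "programming_assignments": 100,
    "final_project": 90,
    "attendance_participation": 100,
}
print(round(weighted_total(example), 2))
```

With these made-up scores the weighted total works out to 91 points out of 100.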
Late work policy: Reading critiques will not be accepted late, since their main goal is to provide fuel for discussion.
However, each student can skip up to three critiques without penalty.
No other extensions will be given for critiques except in highly unusual circumstances, so please save these for times of necessity.
All other assignments can be turned in up to one week late, at a loss of 5 points (out of 100) per late day (though this cannot go beyond the final day of classes).
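The late penalty for non-critique assignments can be sketched as follows. This is only an illustration under stated assumptions: `late_adjusted_score` is a hypothetical helper, `days_late` is taken to be a whole number of days, work more than one week late is assumed to earn no credit, and the additional cutoff at the final day of classes is not modeled.

```python
def late_adjusted_score(raw_score, days_late):
    """Apply the late penalty: 5 points (out of 100) per late day.

    Assumptions (not stated in the policy): days_late is a whole number,
    work more than 7 days late receives 0, and scores never drop below 0.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        # Past the one-week window, so the work is no longer accepted.
        return 0
    return max(0, raw_score - 5 * days_late)


# An assignment that would have earned 90 points, turned in 3 days late.
print(late_adjusted_score(90, 3))
```

In this example the 3-day delay costs 15 points, leaving 75.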
Attendance policy: Given that discussion is an important element of the course, attendance is mandatory and an element of grading, as described above.
Grades will be assigned using both plus and minus grades as follows:
93-100: A
90-93: A-
87-90: B+
83-87: B
80-83: B-
77-80: C+
73-77: C
<73: F
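The scale above can be read as a simple threshold lookup, sketched below in Python. Because adjacent ranges share an endpoint (for example, 93 appears in both the A and A- rows), this sketch assumes the lower bound of each range is inclusive; that reading, along with the `CUTOFFS` table and `letter_grade` helper, is an assumption for illustration and not a statement of official policy.

```python
# (lower bound, letter) pairs taken from the scale, highest first.
CUTOFFS = [
    (93, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (73, "C"),
]


def letter_grade(score):
    """Map a 0-100 course score to a letter grade.

    Assumption: a score exactly on a boundary receives the higher grade.
    """
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    # Anything below the lowest cutoff (73) is an F on this scale.
    return "F"


for score in (95, 93, 89.5, 73, 72.9):
    print(score, letter_grade(score))
```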
Generative AI Policy
Generative AI tools (e.g. ChatGPT) can only be used in the context of background research to better understand topics covered in the class. For example, it is permissible to ask ChatGPT for a summary of how inverse reinforcement learning differs from behavioral cloning. Please be aware that generative AI tools often provide inaccurate information and should always be verified against other sources. You are not permitted to use generative AI tools to assist with any part of completing your reading summaries, written homeworks, or coding assignments. This policy clearly forbids copying text or code directly from these sources, but it is also not acceptable to summarize the output of generative AI tools, or to use an answer from them as a starting point for your own work. To be clear: all written work should be done on your own from scratch, with generative AI used only in the limited case of generic background research that is not specific to any particular reading summary or homework question. Violation of this policy will be reported as a violation of the university's academic honesty standards and punished accordingly.
Academic Honesty Statement
Since the integrity of the academic enterprise of any institution of higher education requires honesty in scholarship and research, academic honesty is required of all students at the University of Massachusetts Amherst. Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty. Appropriate sanctions may be imposed on any student who has committed an act of academic dishonesty. Instructors should take reasonable steps to address academic misconduct. Any person who has reason to believe that a student has committed academic dishonesty should bring such information to the attention of the appropriate course instructor as soon as possible. Instances of academic dishonesty not related to a specific course should be brought to the attention of the appropriate department Head or Chair. Since students are expected to be familiar with this policy and the commonly accepted standards of academic integrity, ignorance of such standards is not normally sufficient evidence of lack of intent (http://www.umass.edu/dean_students/codeofconduct/acadhonesty/).
Accommodation Statement
The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements. For further information, please visit Disability Services (https://www.umass.edu/disability/).
Title IX Statement
In accordance with Title IX of the Education Amendments of 1972 that prohibits gender-based discrimination in educational settings that receive federal funds, the University of Massachusetts Amherst is committed to providing a safe learning environment for all students, free from all forms of discrimination, including sexual assault, sexual harassment, domestic violence, dating violence, stalking, and retaliation. This includes interactions in person or online through digital platforms and social media. Title IX also protects against discrimination on the basis of pregnancy, childbirth, false pregnancy, miscarriage, abortion, or related conditions, including recovery. There are resources here on campus to support you. A summary of the available Title IX resources (confidential and non-confidential) can be found at the following link: https://www.umass.edu/titleix/resources. You do not need to make a formal report to access them. If you need immediate support, you are not alone. Free and confidential support is available 24 hours a day / 7 days a week / 365 days a year at the SASA Hotline 413-545-0800.