CS 690: Human-Centric Machine Learning

Course Info

Semester: Spring 2024
Credits: 3
Time: Tu/Th 11:30-12:45
Location: CS 142
Website: https://people.cs.umass.edu/~sniekum/classes/HCML-S24/desc.php

Instructor: Scott Niekum
Prof. Office Hours: By appointment (CS 374 or Zoom)

TA: Rohan Pandey
TA Office Hours: Tues 2:30-3:30 and Wed 4:00-5:00 on Zoom. See Piazza for the Zoom link.

Piazza: https://piazza.com/umass/spring2024/cs609

Course Objectives

This course will focus on modern machine learning approaches to learn from human demonstrations, preferences, feedback, and other multimodal signals, with the goal of aligning agent goals and behaviors with human values and desires. For the purposes of both safety and practicality, it is increasingly important for AI systems to be well-aligned with human users as their capabilities improve and they are deployed more frequently in real-world settings. While the standard ML paradigm assumes that learning objectives are directly provided as part of the problem specification, emerging research in alignment suggests that it is often infeasible to do so accurately, requiring such objectives to be inferred from human data. This course will provide the basic tools to address these important issues, covering topics such as behavioral cloning, inverse reinforcement learning, preference elicitation, active learning, learning from feedback, value alignment, bounded rationality, and best practices for human studies. We will examine applications including robotics, large language models, and self-driving cars.

There will be no textbook. Links to all required readings will be provided in the class schedule.

There are no formal prerequisites, but students are strongly recommended to have solid programming skills and a background in linear algebra, probability and statistics, multivariate calculus, and graduate-level machine learning.

Schedule

Thu 02/01 — Course Overview [Slides]

Tue 02/06 — Behavior cloning [Slides]

Behavioral Cloning from Observation

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Assignment 1 — Due by 11:59pm on Monday 2/19 via Gradescope.

Thu 02/08 — Reinforcement Learning [Slides]

Introduction (Part 1 and Part 2) and Algorithm Docs (Proximal Policy Optimization): Spinning Up in Deep RL

Optional for more background: Sutton/Barto RL Book

Optional: Proximal Policy Optimization Algorithms

Optional: Soft Actor-Critic

Tue 02/13 — Reward Specification [Slides]

The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications

Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals

Thu 02/15 — Interactive RL [Slides]

Interactively shaping agents via human reinforcement: The TAMER framework

Interactive Learning from Policy-Dependent Human Feedback

Tue 02/20 — Inverse Reinforcement Learning [Slides]

Apprenticeship Learning via Inverse Reinforcement Learning

Maximum Entropy Inverse Reinforcement Learning

Assignment 2 — Due by 11:59pm on Monday 3/4 via Gradescope.

Thu 02/22 — No Class

Tue 02/27 — Bayesian Inverse Reinforcement Learning [Slides]

Bayesian Inverse Reinforcement Learning

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

Thu 02/29 — MOVED to 03/07 Adversarial Imitation Learning [Slides]

Tue 03/05 — CANCELLED AI Safety and Alignment

Scalable Agent Alignment via Reward Modeling: A Research Direction

Unsolved Problems in ML Safety

Thu 03/07 — Adversarial Imitation Learning [Slides]

Generative Adversarial Imitation Learning

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning

Final project proposal — Due by 11:59pm on Friday 3/15 via Gradescope.

Tue 03/12 — Preference Learning and RLHF 1 [Slides]

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

Training language models to follow instructions with human feedback

Assignment 3 — Due by 11:59pm on Wednesday 3/27 via Gradescope.

Thu 03/14 — Models of Human Preference [Slides]

Models of Human Preference for Learning Reward Functions

First 3 pages only: Preference Reversal in Multiattribute Choice

Tue 03/19 — No Class

Thu 03/21 — No Class

Tue 03/26 — Preference Learning and RLHF 2 [Slides]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Contrastive Preference Learning: Learning from Human Feedback without RL

Thu 03/28 — Performance Guarantees for RLHF [Slides]

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Value Alignment Verification

Tue 04/02 — Open Challenges in RLHF [Slides]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Thu 04/04 — Improving Human Modeling Assumptions [Slides]

Preference Transformer: Modeling Human Preferences using Transformers for RL

Human Irrationality: Both Bad and Good for Reward Inference

Final project check-in — Due by 11:59pm on Wednesday 4/10 via Gradescope.

Tue 04/09 — Fine-grained Preferences [Slides]

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Let’s Verify Step by Step

Thu 04/11 — AI Risk, Mitigation, and Counterarguments [Slides]

Anthropic's Responsible Scaling Policy

Counterarguments to the Basic AI Risk Case

Final project writeup — Due by 11:59pm on Friday 5/10 via Gradescope.

Tue 04/16 — Cooperation and Corrigibility [Slides]

Cooperative Inverse Reinforcement Learning

The Off-Switch Game

Thu 04/18 — Active Reward Learning and Optimal Teaching [Slides]

Active Reward Learning from Critiques

Machine Teaching for Inverse Reinforcement Learning

Tue 04/23 — Reward Learning from Multimodal Human Signals [Slides]

Efficiently Guiding Imitation Learning Agents with Human Gaze

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Thu 04/25 — Reward Design from Natural Language [Slides]

Reward Design with Language Models

Eureka: Human-Level Reward Design via Coding Large Language Models

Tue 04/30 — Scalable Oversight [Slides]

Measuring Progress on Scalable Oversight for Large Language Models

Constitutional AI: Harmlessness from AI Feedback

Thu 05/02 — RLAIF [Slides]

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Tue 05/07 — Final Project Presentations

Thu 05/09 — Final Project Presentations

Grading

Grades will be calculated as follows, using a scale that includes both plus and minus letter grades:

15% Role presentations:

Each class will be a mix of lecture and discussion. To add depth to the discussion, we will draw on ideas from Colin Raffel's roleplaying seminar model. A few times per semester, each student will be assigned one of the following roles for a paper and asked to produce a written report, as well as kick off discussion in class by presenting (from their seat; no slides) a summary of their findings in approximately 5 minutes:
- TMLR Reviewer: The machine learning journal TMLR uses a peer review process that focuses on technical correctness and quality, rather than subjective novelty. For reference, the review guidelines can be found here: https://jmlr.org/tmlr/reviewer-guide.html. Write a review that discusses: (1) The claims being made by the authors; (2) Whether all claims are supported by sufficient (and correct) arguments, theory, or empirical evidence, and if not, what you'd like to see; (3) Clarifying questions you'd like to ask the authors; (4) Whether the results appear to be reproducible from information contained in the paper. Not all of these will be applicable for every paper, so use your judgement.
- Archaeologist: This paper was found buried underground in the desert. You’re an archaeologist who must determine where this paper sits in the context of previous and subsequent work. Find and report on one older paper cited within the current paper that substantially influenced the current paper and one newer paper that cites this current paper. If the paper is too new to have been meaningfully cited, report on a second older paper instead. Discuss in detail how the papers influenced each other from both conceptual and technical perspectives, as well as how they differ, and what their main results are.
- Academic Researcher: Imagine that you're an academic researcher and this paper was just released. Propose a follow-up research project that builds on these ideas, addresses a key limitation of the current paper, or investigates something about the paper's analysis or experiments that you are skeptical of. If this is an older paper with a lot of follow-up work already existing, feel free to come up with an idea that instead builds on a paper that was influenced by this one. Feel free to talk to other students to brainstorm. If you get really stuck, then simply report on the limitations of this work and highlight challenges that would be valuable to address, even if you don't have a proposal for how you might address them. Alternatively, propose something on a smaller scale, such as an additional experiment or hypothesis to examine that would have made the paper stronger.
Grades will be assigned only for the quality of the written report, but students are expected to be well-prepared to present their findings in class.
25% Reading critiques:

For everyone who isn't assigned a role for a particular paper, a written critique of each reading (usually two) for each class will be due by 8:00 PM the previous night via Gradescope. Each critique should include all of the following:
- A short summary of the main contribution(s) of the paper in your own words (roughly two sentences)
- A short description of how the paper differs from prior work.
- One strength and one weakness of the proposed method, core argument, or experiments
- At least one question / comment that you'd like me to address during class or that could spur discussion
In all cases, the written critique should provide non-trivial insight into the reading. To get full credit, you must show that you understood and thought critically about the core concepts presented.
25% Programming assignments:

There will be several programming assignments, in which machine learning algorithms will be implemented and evaluated. Each assignment will require students to turn in code as well as a short written report.
25% Final project:

Roughly halfway through the semester, students will propose topics of their own choosing for a large final programming project. These projects may be completed alone, though students are encouraged to work in groups of up to 3. A rough guideline is that the project should produce about half a standard conference paper's worth of material (this means both technical content and length: about 4 double-column pages in LaTeX).

These projects are a chance to dive deeply into any topic of interest related to the course. Students are encouraged to tie this work into the primary research that they are already pursuing, as long as it relates to human-centric ML.

Example projects could include extending an algorithm in a novel way, comparing several algorithms on an interesting problem, or designing a new approach to attack a problem relevant to the class. In all cases, there should be a novel intellectual contribution, as well as empirical results on a problem of interest.
10% Attendance and participation:

Attendance is mandatory and participation in discussion is an important element of the course. Students should aim to participate in the discussion at least once a week.
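
As a worked illustration of the weights listed above, the sketch below combines five component scores into a single course percentage. The component scores in the usage lines are made-up values, and the function and dictionary names are hypothetical; only the weights come from this syllabus. The mapping to letter grades is left out on purpose.

```python
# Weights (in percent) taken from the Grading section of this syllabus.
WEIGHTS = {
    "role_presentations": 15,
    "reading_critiques": 25,
    "programming_assignments": 25,
    "final_project": 25,
    "attendance_participation": 10,
}


def course_percentage(scores: dict) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Sum weight * score, then divide by the total weight (100).
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Hypothetical component scores, for illustration only.
example_scores = {
    "role_presentations": 90,
    "reading_critiques": 80,
    "programming_assignments": 85,
    "final_project": 88,
    "attendance_participation": 100,
}
example_total = course_percentage(example_scores)  # 86.75
```

With these example values the weighted total is 86.75.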

Late work policy: Reading critiques will not be accepted late, since their main goal is to provide fuel for discussion. However, each student can skip up to three critiques without penalty. No other extensions will be given for critiques except in highly unusual circumstances, so please save these for times of necessity.

All other assignments can be turned in up to one week late, at a loss of 5 points (out of 100) per late day (though this cannot go beyond the final day of classes).
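
The late penalty above can be sketched as a small calculation. This is an illustration under assumptions rather than an official grading script: it assumes lateness is counted in whole days, that a score cannot drop below 0, and that work more than seven days late is not accepted. It does not model the cutoff at the final day of classes. The function name is hypothetical.

```python
from typing import Optional

PENALTY_PER_DAY = 5   # points lost per late day (out of 100), from the policy
MAX_LATE_DAYS = 7     # "up to one week late"


def late_adjusted_score(raw_score: float, days_late: int) -> Optional[float]:
    """Return the score after the late penalty, or None if too late to accept."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        return None  # assumption: not accepted after one week
    # Assumption: the penalized score is floored at 0.
    return max(0.0, raw_score - PENALTY_PER_DAY * days_late)


two_days_late = late_adjusted_score(90, 2)   # 90 - 2 * 5 = 80
too_late = late_adjusted_score(90, 8)        # None
```

For example, an assignment that would have earned 90 points and is submitted two days late receives 80 points.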

Attendance policy: Given that discussion is an important element of the course, attendance is mandatory and an element of grading, as described above.

Grades will be assigned using both plus and minus grades as follows:

93-100: A
90-93: A-
87-90: B+
83-87: B
80-83: B-
77-80: C+
73-77: C
<73: F

Generative AI Policy

Generative AI tools (e.g. ChatGPT) can only be used in the context of background research to better understand topics covered in the class. For example, it is permissible to ask ChatGPT for a summary of how inverse reinforcement learning differs from behavioral cloning. Please be aware that generative AI tools often provide inaccurate information and should always be verified against other sources. You are not permitted to use generative AI tools to assist with any part of completing your reading summaries, written homeworks, or coding assignments. This policy clearly forbids copying text or code directly from these sources, but it is also not acceptable to summarize the output of generative AI tools, or to use an answer from them as a starting point for your own work. To be clear: all written work should be done on your own from scratch, with generative AI used only in the limited case of generic background research that is not specific to any particular reading summary or homework question. Violation of this policy will be reported as a violation of the university's academic honesty standards and punished accordingly.

Academic Honesty Statement

Since the integrity of the academic enterprise of any institution of higher education requires honesty in scholarship and research, academic honesty is required of all students at the University of Massachusetts Amherst. Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty. Appropriate sanctions may be imposed on any student who has committed an act of academic dishonesty. Instructors should take reasonable steps to address academic misconduct. Any person who has reason to believe that a student has committed academic dishonesty should bring such information to the attention of the appropriate course instructor as soon as possible. Instances of academic dishonesty not related to a specific course should be brought to the attention of the appropriate department Head or Chair. Since students are expected to be familiar with this policy and the commonly accepted standards of academic integrity, ignorance of such standards is not normally sufficient evidence of lack of intent (http://www.umass.edu/dean_students/codeofconduct/acadhonesty/).

Accommodation Statement

The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements. For further information, please visit Disability Services (https://www.umass.edu/disability/).

Title IX Statement

In accordance with Title IX of the Education Amendments of 1972 that prohibits gender-based discrimination in educational settings that receive federal funds, the University of Massachusetts Amherst is committed to providing a safe learning environment for all students, free from all forms of discrimination, including sexual assault, sexual harassment, domestic violence, dating violence, stalking, and retaliation. This includes interactions in person or online through digital platforms and social media. Title IX also protects against discrimination on the basis of pregnancy, childbirth, false pregnancy, miscarriage, abortion, or related conditions, including recovery. There are resources here on campus to support you. A summary of the available Title IX resources (confidential and non-confidential) can be found at the following link: https://www.umass.edu/titleix/resources. You do not need to make a formal report to access them. If you need immediate support, you are not alone. Free and confidential support is available 24 hours a day / 7 days a week / 365 days a year at the SASA Hotline 413-545-0800.