CS 690S: AI Alignment


Course Info

Semester: Spring 2026
Credits: 3
Time: Wed/Fri 1:00-2:15
Location: LGRC A104A
Website: https://people.cs.umass.edu/~sniekum/classes/LFH-S2/desc.php

Instructor: Scott Niekum
Prof. Office Hours: By appointment (CSL 343 or Zoom)

TA: Shreyas Chaudhari
TA Office Hours: TBD

Piazza: https://piazza.com/umass/spring2026/cs609

Course Objectives

This course will focus on modern machine learning approaches to align agent objectives and behaviors with human values and desires. For the purposes of both safety and practicality, it is increasingly important for AI systems to be well-aligned as their capabilities increase and they are deployed more frequently in real-world settings. While the standard ML paradigm assumes that objective functions are provided as part of the problem specification, alignment research examines the profound challenges associated with specifying or learning about such objectives. This course covers a core set of topics that examine AI Alignment from a variety of angles, including behavioral cloning, inverse reinforcement learning, reinforcement learning from human feedback, robustness, scalable oversight, and mechanistic interpretability. We will examine applications of alignment ranging from robotics to large language models. Thus, the course aims to provide a broad overview of how AI researchers and practitioners have historically tried to design objectives and control the behaviors of AI systems, rather than adhering to any particular definition of alignment.

There will be no textbook. Links to all required readings will be provided in the class schedule.

There are no formal prerequisites, but students are strongly recommended to have solid programming skills; a background in linear algebra, probability and statistics, multivariate calculus, and graduate-level machine learning; and at least some familiarity with reinforcement learning.

Schedule


Fri 01/30 — Course Overview [Slides]
Wed 02/04 — Behavior cloning [Slides]
Fri 02/06 — Limitations of behavior cloning [Slides]
Wed 02/11 — Reinforcement Learning [Slides]
Fri 02/13 — Reward hacking [Slides]
Wed 02/18 — Interactive RL [Slides]
Fri 02/20 — Inverse reinforcement learning [Slides]
Wed 02/25 — Maximum entropy inverse reinforcement learning [Slides]
Fri 02/27 — Bayesian inverse reinforcement learning [Slides]
Wed 03/04 — Occupancy matching [Slides]
Fri 03/06 — Cooperation and Corrigibility [Slides]
Wed 03/11 — Reinforcement learning from human feedback: direct methods [Slides]
Fri 03/13 — Models of human preference [Slides]
Wed 03/18 — No Class — Spring break
Fri 03/20 — No Class — Spring break
Wed 03/25 — Reinforcement learning from human feedback: indirect methods [Slides]
Fri 03/27 — Performance Guarantees for RLHF [Slides]
Wed 04/01 — Reinforcement learning from AI feedback [Slides]
Fri 04/03 — Reward design and inference from multimodal signals [Slides]
Wed 04/08 — Mechanistic interpretability [Slides]
Fri 04/10 — Circuit breakers and unlearning [Slides]
Wed 04/15 — AI risks [Slides]
Fri 04/17 — Red teaming [Slides]
Wed 04/22 — Adversarial robustness [Slides]
Fri 04/24 — No Class — Holiday
Wed 04/29 — Deception and frontiers of misalignment [Slides]
Fri 05/01 — Scalable oversight [Slides]
Wed 05/06 — Final Project Presentations
Fri 05/08 — Final Project Presentations

Grading

Grades will be calculated as follows, using a scale that includes both plus and minus letter grades:

  • 20% Reading critiques:

    A written critique of each reading (usually two) for each class will be due by 11:59 PM the previous night via Gradescope. Each critique should be formatted as a numbered list that addresses all of the following:

    • (1) A short summary of the main contribution(s) of the paper in your own words (roughly two sentences).
    • (2) A short description of how the paper differs from prior work.
    • (3) One strength and one weakness of the proposed method, core argument, or experiments.
    • (4) At least one question or comment that you'd like me to address during class or that could spur discussion.
    • (5) One idea for potential follow-up work, or a citation and short description of existing follow-up work that cites this paper.

    In all cases, the written critique should provide non-trivial insight into the reading. To get full credit, you must show that you understood and thought critically about the core concepts presented.


  • 20% Homework assignments:

    There will be several programming assignments, in which machine learning algorithms will be implemented and evaluated. Each assignment will require students to turn in code as well as a short written report.


  • 20% Quizzes:

    Since generative AI is permitted to be used for any part of the course (other than these quizzes), there will be frequent short in-class quizzes that test basic knowledge about the readings and homeworks. They will be designed so that studying should not be required -- as long as the assignments were completed without over-reliance on AI, you should have all the knowledge you need to do well on the quizzes.


  • 40% Final project:

    Roughly halfway through the semester, students will propose topics of their own choosing for a large final programming project. These projects may be completed alone, though working in groups of up to 3 students is encouraged. A rough guideline is that the project should produce about half a standard conference paper's worth of material (in both technical content and length: about 4 double-column pages in LaTeX).

    These projects are a chance to dive deeply into any topic of interest related to the course. Students are encouraged to tie this work into the primary research they are already pursuing, as long as it relates broadly to AI alignment.

    Example projects could include extending an algorithm in a novel way, comparing several algorithms on an interesting problem, or designing a new approach to attack a problem relevant to the class. In all cases, there should be a novel intellectual contribution, as well as empirical results on a problem of interest.


Late work policy: Reading critiques will not be accepted late, since their main goal is to provide fuel for discussion. However, each student can skip up to three critiques without penalty. No other extensions will be given for critiques except in highly unusual circumstances, so please save these for times of necessity.

All other assignments can be turned in up to one week late, at a loss of 2 points (out of 100) per late day (though this cannot go beyond the final day of classes). However, please note that in-class quizzes may include material from homework assignments immediately after the due date, so completing them on time is strongly encouraged.
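
As a rough illustration of the late penalty arithmetic above, the following Python sketch assumes the penalty is a flat subtraction of 2 points per late day from a score out of 100. The function name `late_adjusted_score` is hypothetical, and treating work more than seven days late as receiving no credit is an assumption made for illustration, not part of the stated policy. The sketch also ignores the final-day-of-classes cutoff.

```python
def late_adjusted_score(raw_score: float, days_late: int) -> float:
    """Hypothetical helper: apply a 2-point-per-day late penalty to a 0-100 score."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        # Assumption: work more than one week late earns no credit.
        return 0.0
    # Lose 2 points (out of 100) per late day, never dropping below zero.
    return max(0.0, raw_score - 2 * days_late)
```

Under these assumptions, a homework scored 90 and turned in three days late would count as 84.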

Attendance policy: Given that discussion is an important element of the course, and that there will be frequent in-class quizzes, attendance is mandatory.

Grades will be assigned using both plus and minus grades as follows:

93-100: A
90-93: A-
87-90: B+
83-87: B
80-83: B-
77-80: C+
73-77: C
<73: F
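
To make the grade arithmetic concrete, here is a minimal Python sketch that combines the four category weights from the Grading section and maps the result onto the scale above. The names `WEIGHTS`, `CUTOFFS`, `weighted_total`, and `letter_grade` are hypothetical, each category score is assumed to be on a 0-100 scale, and the choice to give a score that falls exactly on a boundary (for example, 93) the higher grade is an assumption, since the listed ranges overlap at their endpoints.

```python
# Category weights from the Grading section (fractions of the final grade).
WEIGHTS = {"critiques": 0.20, "homework": 0.20, "quizzes": 0.20, "project": 0.40}

# Lower cutoff for each letter grade, highest first.
# Assumption: a score exactly on a boundary receives the higher grade.
CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
           (80, "B-"), (77, "C+"), (73, "C")]


def weighted_total(scores: dict[str, float]) -> float:
    """Combine per-category scores (each 0-100) into a weighted total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total: float) -> str:
    """Return the first letter whose cutoff the total meets, else F."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"
```

For example, category scores of 100 (critiques), 90 (homework), 80 (quizzes), and 95 (project) give a weighted total of 92, which this sketch maps to an A-.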

Generative AI Policy

Generative AI tools can be used for any aspect of the class, other than in-class quizzes. However, it is strongly recommended that you use them sparingly and with a great deal of review, to ensure that you learn the material sufficiently well. This is both for your own learning benefit and to ensure that you perform well on the quizzes that cover this material. Aside from the use of generative AI, all assignments must be completed alone unless otherwise specified.

Academic Integrity Statement

UMass Amherst is strongly committed to academic integrity, which is defined as completing all academic work without cheating, lying, stealing, or receiving unauthorized assistance from any other person, or using any source of information not appropriately authorized or attributed. As a community, we hold each other accountable and support each other’s knowledge and understanding of academic integrity. Academic dishonesty is prohibited in all programs of the University and includes but is not limited to: Cheating, fabrication, plagiarism, lying, and facilitating dishonesty, via analogue and digital means. Sanctions may be imposed on any student who has committed or participated in an academic integrity infraction. Any person who has reason to believe that a student has committed an academic integrity infraction should bring such information to the attention of the appropriate course instructor as soon as possible. All students at the University of Massachusetts Amherst have read and acknowledged the Commitment to Academic Integrity and are knowingly responsible for completing all work with integrity and in accordance with the policy: (https://www.umass.edu/senate/book/academic-regulations-academic-integrity-policy)

In this course, you are encouraged to discuss the readings and concepts with classmates, but all work must be your own. Programming assignments and homework must be completed alone, except for teams for the final project. Generative AI may be used in accordance with the policy listed above.

Accommodation Statement

The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements. For further information, please visit Disability Services (https://www.umass.edu/disability/)

Title IX Statement

In accordance with Title IX of the Education Amendments of 1972 that prohibits gender-based discrimination in educational settings that receive federal funds, the University of Massachusetts Amherst is committed to providing a safe learning environment for all students, free from all forms of discrimination, including sexual assault, sexual harassment, domestic violence, dating violence, stalking, and retaliation. This includes interactions in person or online through digital platforms and social media. Title IX also protects against discrimination on the basis of pregnancy, childbirth, false pregnancy, miscarriage, abortion, or related conditions, including recovery. There are resources here on campus to support you. A summary of the available Title IX resources (confidential and non-confidential) can be found at the following link: https://www.umass.edu/titleix/resources. You do not need to make a formal report to access them. If you need immediate support, you are not alone. Free and confidential support is available 24 hours a day / 7 days a week / 365 days a year at the SASA Hotline 413-545-0800.