This course will focus on modern machine learning approaches to align agent objectives and behaviors with human values and desires. For the purposes of both safety and practicality, it is increasingly important for AI systems to be well-aligned as their capabilities increase and they are deployed more frequently in real-world settings. While the standard ML paradigm assumes that objective functions are provided as part of the problem specification, alignment research examines the profound challenges associated with specifying or learning about such objectives. This course covers a core set of topics that examine AI Alignment from a variety of angles, including behavioral cloning, inverse reinforcement learning, reinforcement learning from human feedback, robustness, scalable oversight, and mechanistic interpretability. We will examine applications of alignment ranging from robotics to large language models. Thus, the course aims to provide a broad overview of how AI researchers and practitioners have historically tried to design objectives and control the behaviors of AI systems, rather than adhering to any particular definition of alignment.
There will be no textbook. Links to all required readings will be provided in the class schedule.
There are no formal prerequisites, but strong programming skills and a background in linear algebra, probability and statistics, multivariate calculus, and graduate-level machine learning are strongly recommended, along with at least some familiarity with reinforcement learning.
Letter grades, including plus and minus grades, will be assigned as follows:
93-100: A
Generative AI tools can be used for any aspect of the class other than in-class quizzes. However, it is strongly recommended that you use them sparingly and review their output carefully, to ensure that you learn the material sufficiently well. This is both for your own learning benefit and to ensure that you perform well on the quizzes that cover this material. Aside from the use of generative AI, all assignments must be completed alone unless otherwise specified.
UMass Amherst is strongly committed to academic integrity, which is defined as completing all academic work without cheating, lying, stealing, or receiving unauthorized assistance from any other person, or using any source of information not appropriately authorized or attributed. As a community, we hold each other accountable and support each other’s knowledge and understanding of academic integrity. Academic dishonesty is prohibited in all programs of the University and includes but is not limited to: Cheating, fabrication, plagiarism, lying, and facilitating dishonesty, via analogue and digital means. Sanctions may be imposed on any student who has committed or participated in an academic integrity infraction. Any person who has reason to believe that a student has committed an academic integrity infraction should bring such information to the attention of the appropriate course instructor as soon as possible. All students at the University of Massachusetts Amherst have read and acknowledged the Commitment to Academic Integrity and are knowingly responsible for completing all work with integrity and in accordance with the policy: (https://www.umass.edu/senate/book/academic-regulations-academic-integrity-policy)
In this course, you are encouraged to discuss the readings and concepts with classmates, but all work must be your own. Programming assignments and homework must be completed alone; the only exception is team work on the final project. Generative AI may be used in accordance with the policy listed above.
The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements. For further information, please visit Disability Services (https://www.umass.edu/disability/)
In accordance with Title IX of the Education Amendments of 1972 that prohibits gender-based discrimination in educational settings that receive federal funds, the University of Massachusetts Amherst is committed to providing a safe learning environment for all students, free from all forms of discrimination, including sexual assault, sexual harassment, domestic violence, dating violence, stalking, and retaliation. This includes interactions in person or online through digital platforms and social media. Title IX also protects against discrimination on the basis of pregnancy, childbirth, false pregnancy, miscarriage, abortion, or related conditions, including recovery. There are resources here on campus to support you. A summary of the available Title IX resources (confidential and non-confidential) can be found at the following link: https://www.umass.edu/titleix/resources. You do not need to make a formal report to access them. If you need immediate support, you are not alone. Free and confidential support is available 24 hours a day / 7 days a week / 365 days a year at the SASA Hotline 413-545-0800.