Machine Learning

CS 689, Spring 2023, UMass Amherst CS

Instructor: Brendan O’Connor, brenocon AT cs.umass.edu

All materials are accessible via the Piazza page for this course.

Course Description: Machine learning is the computational study of artificial systems that can adapt to novel situations, discover patterns from data, and improve performance with practice. This course will cover the mathematical foundation of supervised and unsupervised learning. The course will provide a state-of-the-art overview of the field, with an emphasis on implementing and deriving learning algorithms for a variety of models from first principles.

Overview: The information below is designed to help you decide whether this is the right course for you at this time and how to be successful in the course. Machine learning at the PhD level aims to prepare students to participate in machine learning research. It requires both strong mathematical foundations and the ability to implement algorithms with a high degree of precision and computational efficiency. Specifically, the course requires a solid undergraduate-level background in linear algebra, vector calculus, multivariate probability, and numerical programming in Python. Students who need to acquire or substantially review this background material should plan to spend significant additional time on assignments.

The three main steps you can take to succeed in the course are:

Make sure 689 is the right course for you and this is the right time to take it. MS students with no background in ML are strongly encouraged to take 589 prior to taking 689, unless they have extremely strong backgrounds in all areas (e.g., a dual major in math and CS, or an undergraduate degree in CS and a prior MS in math). MS/PhD students who are interested in applied machine learning are also strongly encouraged to take 589 before (or instead of) taking 689. 589 counts as a 500-level elective for MS/PhD students. MS students who want experience with the mathematical foundations of machine learning and MS/PhD students who plan to conduct research in ML or a related area (vision, NLP, AI, etc.) should take 689.
Set up your schedule to accommodate the course. All students are strongly advised against taking 689 in combination with any other PhD-level core course unless they have extremely strong backgrounds in all areas. You can make up gaps in background at the same time you learn primary course material, but you will need to be prepared to devote extra time to the course to do so.
Start addressing weaknesses in your background now. 689 starts with the assumption that you have sufficient background knowledge of linear algebra, vector calculus, multivariate probability, and Python, and will integrate aspects of these topics together from the outset (e.g., using differential calculus to derive a method for optimizing the parameters of a multivariate probability density over a vector space and then implementing the method in Python). The course does not cover background topics, but to help you prepare we have assembled a reading list that covers what you need to know to get started in the course. Reviewing all of the material below with a focus on weaker areas is a good strategy for all students. The specific sources below may cover material at a deeper level than is included in some undergrad CS programs (for example, computational complexity of linear algebra operations).
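As a rough illustration of the kind of integration described in the last step, the sketch below fits a multivariate Gaussian by maximum likelihood. It is not taken from the course materials: the function names gaussian_mle and avg_log_likelihood are hypothetical, and the choice of a Gaussian is an assumption made for the example. Setting the gradient of the log-likelihood to zero yields the sample mean and the divide-by-N sample covariance, which the code computes with NumPy.

```python
import numpy as np


def gaussian_mle(X):
    """Closed-form maximum likelihood estimates for a multivariate Gaussian.

    X is an (n, d) array of n observations. Setting the gradient of the
    log-likelihood to zero gives the sample mean and the biased
    (divide-by-n) sample covariance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / n
    return mu, sigma


def avg_log_likelihood(X, mu, sigma):
    """Average Gaussian log-density of the rows of X under (mu, sigma)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    centered = X - mu
    # Solve a linear system rather than forming the explicit inverse.
    solved = np.linalg.solve(sigma, centered.T).T
    mahalanobis = np.sum(centered * solved, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + mahalanobis)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 3))  # synthetic data, for illustration only
    mu_hat, sigma_hat = gaussian_mle(data)
    print("estimated mean:", mu_hat)
    print("average log-likelihood:", avg_log_likelihood(data, mu_hat, sigma_hat))
```

If reading this code and reconstructing the derivation behind it both feel comfortable, that is a reasonable sign the background topics are in good shape; if not, the reading list is the place to start.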

Suggested Reading List: Covering the math in the order listed below is likely to be most helpful. For calculus, either Corral or Marsden and Tromba can be used. Marsden and Tromba is more detailed, but Corral is sufficient. All texts are open access or freely available through the UMass Library (links provided), except for Marsden and Tromba. Students should feel free to discuss background material among themselves on Piazza using the background tag.

Zico Kolter. Linear Algebra Review and Reference (2008 version), and also videos. Sometimes very brief, but covers most of the necessary linear algebra and multivariate calculus topics in this course.
Stephen Boyd and Lieven Vandenberghe. Introduction to Applied Linear Algebra.
- Chapter 1: Vectors
- Chapter 2.1: Linear Functions
- Chapter 3: Norm and Distance
- Chapter 5: Linear Independence
- Chapter 6: Matrices
- Chapter 8: Linear Equations (Can skip 8.2)
- Chapter 10: Matrix Multiplication
- Chapter 11: Matrix Inverses
Stephen Boyd and Lieven Vandenberghe. Convex Optimization. (Covers additional linear algebra background missing from the Applied text)
- Appendix A.1, A.3, A.4, A.5
- Appendix C.1, C.2, C.3, C.4
Michael Corral. Vector Calculus.
- Chapter 1: Vectors in Euclidean Space (1.1 to 1.6, 1.8)
- Chapter 2: Functions of Several Variables (2.1 to 2.5)
- Chapter 3: Double Integrals (3.1, 3.3, 3.4, 3.7)
Marsden and Tromba. Vector Calculus
- Chapter 1: Geometry of Euclidean Space (1.1, 1.2, 1.3, 1.5)
- Chapter 2: Differentiation (2.1, 2.2, 2.3, 2.5, 2.6)
- Chapter 3: Higher Order Derivatives (3.1, 3.3)
- Chapter 4: Vector Valued Functions (4.1)
- Chapter 5: Double and Triple Integrals (5.1, 5.2, 5.5)
Bishop. Pattern Recognition and Machine Learning (probability from an ML perspective)
- Chapter 1: Introduction (1.2)
- Chapter 2: Probability Distributions (2.1, 2.2, 2.3, 2.4)
Murphy. Machine Learning: A Probabilistic Perspective (more probability from an ML perspective)
- Chapter 2: Probability
Scipy Lecture Notes (Python background)