University of Massachusetts Amherst
College of Information and Computer Sciences
Introduction to core machine learning models and algorithms for
classification, regression, dimensionality reduction, and clustering.
The course will cover the mathematical foundations behind the most common
machine learning algorithms and their effective use in solving real-world problems.
Requires a strong mathematical background and knowledge
of one high-level programming language such as Python.
This course will introduce core machine learning models and algorithms for
classification, regression, clustering, and dimensionality reduction.
On the theory side, the course will cover the mathematical foundations
behind the most common machine learning algorithms.
It will focus on understanding models and the relationships between them.
On the applied side, the course will focus on effectively using machine
learning methods to solve real-world problems with an emphasis on
model selection, regularization, design of experiments, and
presentation and interpretation of results.
The course will be taught in a flipped-classroom format: students watch
assigned pre-recorded videos, and lecture time is reserved for
discussions, including Q&A on the lecture topics, exercises,
connecting the lecture abstractions to real-world applications,
implementation considerations, and demos. The assignments will involve
both mathematical problems and implementation tasks.
Knowledge of a high-level programming language is absolutely necessary.
Python is most commonly used, but languages such as MATLAB, R, Scala, or Julia
would also be suitable.
Strong foundations in linear algebra, calculus, probability and statistics
are essential for the successful completion of this course.
Lectures: Monday & Wednesday 2:30-3:45pm.
Credit: 3 units
- Homeworks: 50%
- Midterm: 30%. In class. Date TBA.
- Mini-Project: 10%. Assignment based on an open challenge.
- Checkpoint Quizzes: 10%
- Extra credit: participation, in class and on Piazza.
Class materials will be posted to the Moodle course.
Discussions will take place on Piazza or Moodle.
Introduction to Machine Learning. Simple classifiers
Bishop, Sections 1.2.1-1.2.4 (Probability Theory). ESL Sections 2.3.2 and 2.5.
- Definition of Machine Learning
- Relationship to other fields
- Course overview
- Learning problem formulation
- Regression vs classification; supervised vs unsupervised; parametric vs nonparametric models
- K-NN classifiers
- Decision trees
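For concreteness, here is a minimal NumPy sketch of the k-NN classifier named above (Euclidean distance, majority vote); the toy data and function name are invented for illustration:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = np.argsort(dists)[:k]               # indices of the k closest points
        preds.append(np.bincount(y_train[nearest]).argmax())  # majority label
    return np.array(preds)

# Toy example: two well-separated 2-D clusters with labels 0 and 1
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([[0.5, 0.5], [5.5, 5.5]]), k=3))  # -> [0 1]
```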
Probability and estimation
Bishop, 2.1 Binary Variables, 2.2 Multinomial Variables
- Random variable independence
- Bayes rule
- Maximum likelihood estimator (MLE)
- Maximum a posteriori estimator (MAP)
Advanced: Mitchell, Estimating Probabilities
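As a worked example of the MLE and MAP bullets, in the spirit of Bishop 2.1 on binary variables, the sketch below estimates a coin's heads probability from flips, with an assumed Beta(2, 2) prior for the MAP estimate; the data is made up:

```python
import numpy as np

flips = np.array([1, 1, 0, 1, 0, 1, 1, 1])   # 1 = heads; toy data
n_heads, n = flips.sum(), len(flips)

# MLE for a Bernoulli parameter: the fraction of heads
theta_mle = n_heads / n

# MAP with a Beta(a, b) prior: (heads + a - 1) / (n + a + b - 2)
a, b = 2.0, 2.0                               # assumed prior pseudo-counts
theta_map = (n_heads + a - 1) / (n + a + b - 2)

print(theta_mle, theta_map)   # 0.75 vs 0.7: the prior pulls the estimate toward 0.5
```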
Naive Bayes
Mitchell, 3.1 and 3.2, Naive Bayes
- Bayes Optimal Classifiers
- Conditional Independence
- Naive Bayes
- Learning for Naive Bayes
- Gaussian Naive Bayes
- Naive Bayes use case: the Bag of Words model
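A minimal sketch of the bag-of-words use case above, here using scikit-learn (a library choice of ours, not prescribed by the course); the tiny corpus and labels are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free money now", "meeting at noon", "free offer click now", "lunch meeting today"]
labels = [1, 0, 1, 0]                      # 1 = spam, 0 = ham (toy labels)

vec = CountVectorizer()                    # bag-of-words: word counts per document
X = vec.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free lunch now"])))
```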
Linear Discriminant Analysis (LDA)
ESL 4.1-4.3 (p. 101-102, 106-110). Bishop 4.1.1-4.1.4 Discriminant Functions. Bishop 4.2 Probabilistic Generative Models
- Fitting linear responses
- Fitting by least squares
- Maximizing conditional likelihood
- LDA - model class conditional densities as multivariate Gaussians
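To illustrate the LDA bullets, a short sketch fitting the LDA model (Gaussian class-conditional densities with a shared covariance) on synthetic data; scikit-learn is an assumed tool here:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian classes with a shared covariance, matching the LDA assumptions
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[3, 3], scale=1.0, size=(100, 2))
X, y = np.vstack([X0, X1]), np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[0.5, 0.5], [2.5, 3.0]]))   # -> [0 1]
```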
Logistic Regression (LR)
ESL Section 4.4 (p. 119-120, 127-132)
- Generative vs discriminative classifiers
- Classification using the logistic function
- Gradient methods to solve LR: gradient descent, stochastic gradient descent (see the sketch below)
- MLE and MAP estimates for LR
Advanced: Mitchell, 3.3, Logistic Regression
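A minimal NumPy sketch of the gradient-descent bullet above: batch gradient descent on the average logistic regression log-loss (the toy data and step size are invented):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the negative log-likelihood of logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ w)             # predicted P(y = 1 | x)
        grad = X.T @ (p - y) / len(y)  # gradient of the average log-loss
        w -= lr * grad
    return w

# Toy 1-D data with an intercept column
X = np.array([[1, -2.0], [1, -1.0], [1, 1.0], [1, 2.0]])
y = np.array([0, 0, 1, 1])
w = fit_logreg(X, y)
print(sigmoid(X @ w).round(2))         # probabilities increase with the feature
```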
Generalization and Evaluation
- Training error and generalization error
- Hypothesis space, model capacity
- Generalization, overfitting, underfitting, bias-variance trade-off
- Regularization, model selection, cross-validation
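To make the model selection and cross-validation bullets concrete, a small scikit-learn sketch (the dataset, model, and grid of regularization values are our own choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
for C in [0.01, 1.0, 100.0]:   # inverse regularization strength
    scores = cross_val_score(LogisticRegression(C=C, max_iter=1000), X, y, cv=5)
    print(C, scores.mean())    # pick C by held-out performance, not training error
```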
(Optional) Deep dive: Machine Learning Theory
Nina Balcan, Notes on generalization guarantees.
- Theoretical model of ML
- Generalization bounds
- Consistent learning
- PAC learning
- Agnostic learning. Relationship to the bias/variance tradeoff
- Infinite hypothesis space. VC dimension. Sauer's lemma
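To make the generalization bounds bullet concrete, one standard result covered in such notes is the sample-complexity bound for consistent learning over a finite hypothesis class, which can be stated as:

```latex
% With probability at least 1 - \delta over m i.i.d. training samples,
% every h \in H that is consistent with the training data satisfies
\operatorname{err}(h) \;\le\; \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```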
Support Vector Machines
ESL Section 12.3. ESL Section 12.3.6 (p. 434-438). Bishop 6.1, 6.2 (p. 291-299).
- Maximizing the margin
- Hinge loss vs logistic loss
- Basis expansions and kernels
- The kernel trick
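A short scikit-learn sketch contrasting a linear SVM with an RBF-kernel SVM on data that is not linearly separable (the dataset and parameters are illustrative choices, not course requirements):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# A linear SVM cannot separate the moons; the RBF kernel implicitly maps the
# data into a higher-dimensional space where a linear separator exists.
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, clf.score(X, y))
```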
Ensemble Methods
ESL Chapter 16 (p. 605-622). Bishop Sections 14.3, 14.4 (p. 657-665).
- Introduction to ensembles
- Random forests
- Boosting. AdaBoost
- (Optional) Deep dive: analysis of AdaBoost.
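For reference, a minimal scikit-learn sketch of the two ensemble methods above on synthetic data (all settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for clf in [RandomForestClassifier(n_estimators=100, random_state=0),
            AdaBoostClassifier(n_estimators=100, random_state=0)]:
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```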
Linear Regression, Ridge, and Lasso
ESL Sections 3.1, 3.2.1 (p. 43-51). ESL Sections 3.4.1-3.4.3 (p. 61-73).
- Regression intro
- Linear regression
- Ordinary least squares (see the sketch below)
- Ridge regression and the Lasso
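A minimal NumPy sketch of the OLS and ridge estimators referenced above, using their closed forms (the synthetic data and penalty value are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

# Ordinary least squares: w = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(w_ols.round(2), w_ridge.round(2))  # ridge shrinks coefficients toward zero
```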
Regression trees and smoothing
ESL 6.1 and 6.2 (p. 191-200). ESL 9.2.1, 9.2.2 (p. 305-308).
- Regression trees
- Feature selection
- Kernel smoothing
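To illustrate the kernel smoothing bullet, a NumPy sketch of the Nadaraya-Watson kernel-weighted average, one of the smoothers treated in ESL 6.1 (the data and bandwidth are invented):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.5):
    """Kernel-weighted average: smooth y as a function of x with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + 0.2 * rng.normal(size=100)
print(nadaraya_watson(x, y, np.array([1.0, 3.0, 5.0])))  # roughly sin(1), sin(3), sin(5)
```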
Neural Networks and Deep Learning
ESL 11.3 Neural Networks
- The Multilayer Perceptron (MLP)
- Nonlinear Activations
- Universal Function Approximation
- Convolutional Neural Networks (CNNs) for vision
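A minimal forward pass for the MLP named above, showing where the nonlinear activation enters (the layer sizes and weights are arbitrary illustrations):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with a ReLU nonlinearity, followed by a linear output layer."""
    h = relu(W1 @ x + b1)   # hidden representation
    return W2 @ h + b2      # output, e.g. class scores

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)   # 4 hidden units -> 3 outputs
print(mlp_forward(np.array([1.0, -1.0]), W1, b1, W2, b2))
```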
Backpropagation and Sequential Neural Networks
ESL 11.4 Fitting Neural Networks. ESL 11.5 Some Issues in Training Neural Networks.
- Training neural networks
- Learning rates and acceleration
- Recurrent neural networks (RNN)
- Long Short-Term Memory (LSTM)
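A compact NumPy sketch of backpropagation for a one-hidden-layer regression network with squared loss; all sizes, rates, and data are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + X[:, 1]              # toy regression target

W1, b1 = 0.5 * rng.normal(size=(8, 2)), np.zeros(8)
w2, b2 = 0.5 * rng.normal(size=8), 0.0
lr = 0.01

for _ in range(2000):
    # Forward pass
    H = np.tanh(X @ W1.T + b1)             # hidden activations, shape (n, 8)
    pred = H @ w2 + b2
    err = pred - y                         # d(loss)/d(pred), up to a constant

    # Backward pass: chain rule, averaged over the batch
    grad_w2 = H.T @ err / len(y)
    grad_b2 = err.mean()
    dH = np.outer(err, w2) * (1 - H ** 2)  # backprop through tanh
    grad_W1 = dH.T @ X / len(y)
    grad_b1 = dH.mean(axis=0)

    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    w2 -= lr * grad_w2; b2 -= lr * grad_b2

# Training MSE after fitting; should be far below the initial error
print(np.mean((np.tanh(X @ W1.T + b1) @ w2 + b2 - y) ** 2))
```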
Linear Dimensionality Reduction and SVD
ESL Section 14.5.1 (p. 534-536).
- Dimensionality reduction overview
- Linear dimensionality reduction
- Singular Value Decomposition (SVD)
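A short NumPy sketch of the SVD bullet above: factor a matrix and keep only the top-k singular directions (the matrix and k are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))

# Thin SVD: A = U diag(s) V^T, singular values in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k reconstruction keeps the top-k singular directions
k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
print(np.linalg.norm(A - A_k))   # error of the best rank-5 approximation (Eckart-Young)
```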
Principal Components Analysis
Bishop 12.1 Principal Component Analysis (p.559-569).
- Eigenvalue decomposition
- Direction of maximum variance
- Principal Component Analysis (PCA)
- Connection between PCA and SVD
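To illustrate the PCA-SVD connection above, a NumPy sketch showing that the eigenvalues of the sample covariance equal the squared singular values of the centered data matrix, scaled by n - 1:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                    # center the data first

# PCA via eigendecomposition of the sample covariance
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

# PCA via SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(np.sort(s ** 2 / (len(X) - 1)), eigvals))  # True
```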
Sparse Coding, NMF, ICA and Kernel PCA
ESL Section 14.6 (p.553-557). ESL Section 14.7 (p.557-570).
- Sparse coding
- Nonnegative matrix factorization
- Independent Component Analysis (ICA)
- Kernel PCA
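A minimal scikit-learn sketch of NMF and ICA on synthetic data (component counts and iteration limits are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import NMF, FastICA

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(100, 10)))     # NMF requires nonnegative data

W = NMF(n_components=3, random_state=0, max_iter=500).fit_transform(X)  # X ~ W @ H, entries >= 0
S = FastICA(n_components=3, random_state=0).fit_transform(X - X.mean(axis=0))
print(W.shape, S.shape)                    # low-dimensional codes for each sample
```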
Clustering
ESL 14.3.4-14.3.11 (k-means). ESL 8.5 (EM).
- K-means (see the sketch below)
- Mixture models
- Expectation Maximization (EM)
- Exhaustive clustering
- Hierarchical clustering
- Spectral clustering
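A minimal NumPy sketch of k-means (Lloyd's algorithm), as referenced in the readings and bullets above; the toy clusters are invented:

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Plain k-means: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)), rng.normal([4, 4], 0.5, (50, 2))])
labels, centers = kmeans(X, k=2)
print(centers.round(1))   # roughly the two true cluster means
```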
Exam exception policy: If you have any special needs or circumstances pertaining to an exam, you must talk to the instructor at least two weeks before the exam.
Late homework policy: If you cannot turn in a homework on time, you must discuss it with the instructor at least one day in advance.
Regrade policy: Any requests for regrading must be submitted within a week of receiving the grade and preferably discussed during office hours. Each TA will be responsible for a different part of the homework, as indicated when the assignment is issued, so please direct questions appropriately. Only contact the instructors after discussing the issue with the TAs.
Many of the materials created for this course are the intellectual property of the course instructors and of the professors whose courses served as a basis for some of the lectures. This includes, but is not limited to, the syllabus, lectures and course notes. Except to the extent not protected by copyright law, any use, distribution or sale of such materials requires the permission of the instructor. Please be aware that it is a violation of university policy to reproduce, for distribution or sale, class lectures or class notes, unless copyright has been explicitly waived by the faculty member.