- Intro
- Calculus review
- Linear regression in one variable
- Intro to MATLAB
- Homework 1
- Linear algebra review
- Multivariate linear regression
- Homework 2
- Logistic regression
- Nonlinearity, overfitting, regularization
- Homework 3
- Multiclass classification
- Evaluation methodology
- Instance-based classification
- Decision trees
- Support vector machines
- Weka tutorial
- Soft-margin SVMs
- Kernel SVMs
- Designing and debugging ML systems
- Clustering: K-means
- Dimensionality reduction: principal component analysis
- Probability review
- Bayes / Naive Bayes
- Hidden Markov models
- Bayes nets
- K-means
- Gaussian mixture models
- EM algorithm
- Odds and ends

*Outline is subject to change*

- What is machine learning?
- Course logistics
- An example: curve fitting

- Motivation: optimization
- Derivatives: intuition and rules
- Convex, concave functions
- Find minimum of convex function by setting derivative to zero

- Paradigm: supervised learning
- Linear regression
- Setup
- Cost function
- Minimize cost function (one parameter—slope only) by setting derivative to zero
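The one-parameter case above has a closed-form answer: for h(x) = w·x with cost J(w) = (1/2m) Σ (w·xᵢ − yᵢ)², setting dJ/dw = 0 gives w = Σ xᵢyᵢ / Σ xᵢ². A minimal sketch (in Python for illustration, though the course uses MATLAB; the function name is mine):

```python
def fit_slope(xs, ys):
    """Closed-form slope for y ≈ w*x, found by setting the derivative to zero.

    J(w) = (1/2m) * sum((w*x - y)^2)
    dJ/dw = (1/m) * sum((w*x - y) * x) = 0  =>  w = sum(x*y) / sum(x^2)
    """
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Exactly linear data y = 3x recovers w = 3.
print(fit_slope([1, 2, 3], [3, 6, 9]))  # -> 3.0
```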

- Gradient descent
- Geometry of functions in higher dimensions
- Contour plots

- GD for linear regression with two parameters (slope, intercept)
- Intuition of partial derivatives
- Partial derivatives of the linear regression cost function
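The two-parameter gradient-descent loop can be sketched as follows (a Python illustration under my own naming; the key point is that both parameters are updated simultaneously from the old values):

```python
def gd_linear_regression(xs, ys, alpha=0.1, iters=1000):
    """Gradient descent for h(x) = b + w*x with *simultaneous* updates.

    Partial derivatives of J(b, w) = (1/2m) * sum((b + w*x - y)^2):
      dJ/db = (1/m) * sum(b + w*x - y)
      dJ/dw = (1/m) * sum((b + w*x - y) * x)
    """
    m = len(xs)
    b, w = 0.0, 0.0
    for _ in range(iters):
        errs = [b + w * x - y for x, y in zip(xs, ys)]
        grad_b = sum(errs) / m
        grad_w = sum(e * x for e, x in zip(errs, xs)) / m
        # Update both parameters together, using the *old* (b, w) in both gradients.
        b, w = b - alpha * grad_b, w - alpha * grad_w
    return b, w

b, w = gd_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])  # data from y = 1 + 2x
```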

- Interactive session to show basic / important MATLAB features
- From Brown tutorial:
- Basics
- Comments
- Suppressing output
- Statements separated by commas, semicolons, or newlines
- help / doc

- Types, assignments, literals
- Entering vectors and matrices
- Accessing entries and submatrices

- Operations on vectors and matrices
- Elementwise operators and functions
- Other vector and matrix functions

- Control flow (briefly)
- Functions
- Multiple inputs / outputs
- Assignment of multiple outputs

- Basics
- Did not cover
- Debugging
- Plotting
- Load and save
- Formatting strings: disp / sprintf / fprintf

- Advanced topics (to cover in future classes)
- logical indexing
- cell arrays
- structs

- Exercises
- Partial derivatives
- Intuition and geometry

- Problems
- 1D linear regression derivations
- Implement 1D linear regression by gradient descent
- Convergence of gradient descent
- Run 1D linear regression on own data

- Follow-up notes
- Feature normalization
- Gradient descent: *simultaneous* updates of all parameters

- Motivation: want to move to more complicated ML setups
- Many inputs \(x_1, \ldots, x_n\)
- More complex functions, e.g. polynomials

- Linear algebra
- Succinct language for linear expressions of many variables
- Saves coding
- Inspires new ML methods

- Matrices
- Vectors
- Matrix-Matrix multiplication (and special cases)
- Transpose
- Inverse
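For concreteness, matrix-matrix multiplication and transpose can be sketched in a few lines (plain Python for illustration; in the course these are MATLAB built-ins):

```python
def matmul(A, B):
    """Matrix-matrix product: C[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    """Swap rows and columns: (A^T)[i][j] = A[j][i]."""
    return [list(row) for row in zip(*A)]

A = [[1, 2], [3, 4]]
print(matmul(A, [[1, 0], [0, 1]]))  # multiplying by the identity leaves A unchanged
print(transpose(A))                 # -> [[1, 3], [2, 4]]
```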

- MATLAB pointers
- Concatenation of vectors / matrices
- Subscripted assignment

- First multivariate prediction models
- Geometry of *linear functions* in high dimensions
- “Tilted” planes through the origin
- (Affine function = linear function translated away from origin)

- Contours are parallel lines
- Gradient is vector orthogonal to contours
- Length of gradient = “slope” of plane

- Multivariate linear regression
- Motivation
- Model
- Cost function
- Normal equations
- Gradient descent
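The normal equations θ = (XᵀX)⁻¹Xᵀy solve least squares in closed form. For the two-parameter case (intercept plus one slope) the 2×2 inverse can be written out by hand; a sketch in Python for illustration (naming is mine):

```python
def normal_equations_2(xs, ys):
    """Solve theta = (X^T X)^{-1} X^T y for X = [1, x] (intercept + slope),
    using the closed-form inverse of the 2x2 matrix X^T X."""
    m = len(xs)
    # Entries of X^T X and X^T y.
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    b = (sxx * sy - sx * sxy) / det   # intercept
    w = (m * sxy - sx * sy) / det     # slope
    return b, w

print(normal_equations_2([0, 1, 2, 3], [1, 3, 5, 7]))  # -> (1.0, 2.0)
```

Unlike gradient descent, this needs no learning rate and no iteration, but inverting XᵀX becomes expensive as the number of features grows.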

- Features
- Normalization
- Feature design
- Non-linearity by feature expansion
- Polynomial regression
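Feature expansion makes a linear model fit non-linear functions: each scalar input is mapped to a vector of powers, and ordinary linear regression is run on the expanded rows. A minimal sketch (illustrative Python; the function name is mine):

```python
def poly_features(x, degree):
    """Expand scalar x into [1, x, x^2, ..., x^degree], so a *linear* model
    in these features represents a polynomial in x."""
    return [x ** d for d in range(degree + 1)]

print(poly_features(2, 3))  # -> [1, 2, 4, 8]
```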

- Linear algebra exercises
- Normal equations
- Features
- Feature engineering
- Feature normalization

- Stochastic gradient descent?
- Polynomial regression

- First classifier
- Widely used “workhorse” of predictive stats and ML
- Examples
- MNIST: 4 vs. 9
- Breast cancer

- Outline
- Classification
- Model
- Cost function
- Gradient descent
- Decision boundaries
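The model and cost function above can be sketched per-example (Python for illustration, under my own naming): the sigmoid squashes a linear score into a probability, and the log loss penalizes confident wrong predictions heavily.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, p):
    """Cost for one example with label y in {0, 1} and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(sigmoid(0))  # -> 0.5, the decision boundary
print(log_loss(1, 0.9) < log_loss(1, 0.5))  # confident correct predictions cost less
```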

- Fit a non-linear function using linear models
- What is Overfitting?
- How to Diagnose Overfitting
- Regularization

- Logistic regression
- Feature normalization
- Log loss

- One-vs-all
- One-vs-one

- Different evaluation goals
- Estimate performance of deployed system
- Model selection
- Compare algorithms

- Data splits
- Train / validation / test
- Cross-validation
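K-fold cross-validation partitions the data into k folds; each fold serves once as the validation set while the remaining folds form the training set. A sketch of the index bookkeeping (illustrative Python, names mine):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds and return (train, val) index
    pairs, one per fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        splits.append((train, val))
    return splits

for train, val in kfold_indices(6, 3):
    print(val, train)  # each index appears in exactly one validation fold
```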

- Classification performance measures
- Accuracy
- Confusion matrix
- Precision
- Recall
- F1
- Precision-recall curve
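The measures above follow directly from confusion-matrix counts; a sketch (illustrative Python, names mine):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts:
    precision = tp/(tp+fp), recall = tp/(tp+fn), F1 = their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(prf1(tp=8, fp=2, fn=8))  # precision 0.8, recall 0.5
```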

- Tools
- Grid search
- Training curve
- Precision-recall curve

- First inherently non-linear methods
- Different learning paradigm: no training phase
- Example: time-series classification
- Nearest neighbor
- k-NN
- Kernel regression
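The "no training phase" paradigm is visible in a k-NN sketch: all the work happens at prediction time, by ranking stored examples by distance (illustrative Python, names mine):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (point, label) pairs; there is no training step."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), 'a'), ((0, 1), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b')]
print(knn_predict(train, (1, 0), k=3))  # -> 'a'
```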

- Linear separators
- Idea of margin
- Formulate max-margin optimization problem

- Guest lecture by Kevin Winner, UMass Ph.D. student
- Learn about Weka
- Comprehensive ML toolkit
- Easy to use
- Java based: both interactive GUI and programmatic API

- What if data is not linearly separable?
- Penalty in optimization problem
- “Hinge-loss”
- Comparison with logistic regression
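The hinge loss and the logistic loss can be compared directly on the {−1, +1} label convention (illustrative Python, names mine): the hinge loss is exactly zero once an example is correctly classified with margin at least 1, while the logistic loss only approaches zero.

```python
import math

def hinge_loss(y, score):
    """Soft-margin SVM penalty for label y in {-1, +1}: zero outside the
    margin, growing linearly inside or on the wrong side."""
    return max(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    """Logistic regression's loss on the same {-1, +1} convention."""
    return math.log(1.0 + math.exp(-y * score))

print(hinge_loss(+1, 2.0))     # -> 0.0 (outside the margin: no penalty)
print(hinge_loss(+1, 0.5))     # -> 0.5 (inside the margin)
print(logistic_loss(+1, 2.0))  # small, but never exactly zero
```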

- SVM dual training and prediction
- Dot-products
- Kernel trick: redefine dot-product
- Feature expansions
- Example kernels
- SVM with RBF kernel visualization
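Since dual SVM training touches the data only through dot products, a kernel can replace that dot product. The RBF kernel is a one-liner (illustrative Python, names mine):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2), an
    implicit dot product in an infinite-dimensional feature space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1, 2], [1, 2]))             # -> 1.0 for identical points
print(rbf_kernel([0, 0], [3, 4], gamma=0.1))  # decays toward 0 with distance
```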

- TBD


- Fourth Hour
- Probability spaces and events
- Conditional probability
- Random variables
- Expected value

- Bayes rule
- Generative model
- Naive Bayes
- Text classification
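A naive Bayes text classifier reduces to counting: estimate log-priors log P(c) and Laplace-smoothed log-likelihoods log P(word | c) from training documents, then pick the class with the highest summed log-probability. A minimal sketch (illustrative Python, all names mine):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns, per class, a log-prior
    and Laplace-smoothed log-likelihoods log P(word | class)."""
    labels = Counter(label for _, label in docs)
    counts = {c: Counter() for c in labels}
    for words, label in docs:
        counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    model = {}
    for c in labels:
        total = sum(counts[c].values())
        model[c] = (math.log(labels[c] / len(docs)),
                    {w: math.log((counts[c][w] + 1) / (total + len(vocab)))
                     for w in vocab})
    return model

def predict_nb(model, words):
    """Pick the class maximizing log P(c) + sum over words of log P(word | c)."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik.get(w, 0.0) for w in words)
    return max(model, key=score)

model = train_nb([(["good", "great"], "pos"), (["bad", "awful"], "neg")])
print(predict_nb(model, ["good"]))  # -> 'pos'
```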


- This may be too hard… there is no way to get there

- Stochastic gradient descent