Outline is subject to change

Intro

• What is machine learning?
• Course logistics
• An example: curve fitting

Calculus review

• Motivation: optimization
• Derivatives: intuition and rules
• Convex, concave functions
• Find minimum of convex function by setting derivative to zero

Linear regression in one variable

• Linear regression
• Setup
• Cost function
• Minimize cost function (one parameter—slope only) by setting derivative to zero
• Geometry of functions in higher dimensions
• Contour plots
• Gradient descent (GD) for linear regression with two parameters (slope, intercept)
• Intuition of partial derivatives
• Partial derivatives of the linear regression cost function

Intro to MATLAB

• Interactive session to show basic / important MATLAB features
• From Brown tutorial:
1. Basics
• Suppressing output
• Statements separated by commas, semicolons, or newlines
• help / doc
2. Types, assignments, literals
• Entering vectors and matrices
• Accessing entries and submatrices
3. Operations on vectors and matrices
• Elementwise operators and functions
• Other vector and matrix functions
4. Control flow (briefly)
5. Functions
• Multiple inputs / outputs
• Assignment of multiple outputs
• Did not cover
1. Debugging
2. Plotting
3. Formatting strings: disp / sprintf / fprintf
• Advanced topics (to cover in future classes)
• logical indexing
• cell arrays
• structs

Homework 1

• Exercises
• Partial derivatives
• Intuition and geometry
• Problems
• 1D linear regression derivations
• Implement 1D linear regression by gradient descent
• Run 1D linear regression on own data
• Follow-up notes
• Feature normalization

Linear algebra review

• Motivation: want to move to more complicated ML setups
• Many inputs $$x_1, \ldots, x_n$$
• More complex functions, e.g. polynomials
• Linear algebra
• Succinct language for linear expressions of many variables
• Saves coding
• Inspires new ML methods
• Matrices
• Vectors
• Matrix-Matrix multiplication (and special cases)
• Transpose
• Inverse

Multivariate linear regression

• MATLAB pointers
• Concatenation of vectors / matrices
• Subscripted assignment
• First multivariate prediction models
• Geometry of linear functions in high dimensions
• “Tilted” planes through the origin
• (Affine function = linear function translated away from origin)
• Contours are parallel lines
• Gradient is vector orthogonal to contours
• Length of gradient = “slope” of plane
• Multivariate linear regression
• Motivation
• Model
• Cost function
• Normal equations
• Features
• Normalization
• Feature design
• Non-linearity by feature expansion
• Polynomial regression

Homework 2

• Linear algebra exercises
• Normal equations
• Features
• Feature engineering
• Feature normalization
• Polynomial regression

Logistic regression

• First classifier
• Widely used “workhorse” of predictive stats and ML
• Examples
• MNIST: 4 vs. 9
• Breast cancer
• Outline
• Classification
• Model
• Cost function
• Decision boundaries

Nonlinearity, overfitting, regularization

• Fit a non-linear function using linear models
• What is overfitting?
• How to diagnose overfitting
• Regularization

Homework 3

• Logistic regression
• Feature normalization
• Log loss

• One-vs-all
• One-vs-many

Evaluation methodology

• Different evaluation goals
• Estimate performance of deployed system
• Model selection
• Compare algorithms
• Data splits
• Train / validation / test
• Cross-validation
• Classification performance measures
• Accuracy
• Confusion matrix
• Precision
• Recall
• F1
• Precision-recall curve
• Tools
• Grid search
• Training curve
• Precision-recall curve

Instance-based classification

• First inherently non-linear methods
• Different learning paradigm: no training phase
• Example: time-series classification
• Nearest neighbor
• k-NN
• Kernel regression

Support vector machines

• Linear separators
• Idea of margin
• Formulate max-margin optimization problem

Weka tutorial

• Guest lecture by Kevin Winner, UMass Ph.D. student
• Comprehensive ML toolkit
• Easy to use
• Java based: both interactive GUI and programmatic API

Soft-margin SVMs

• What if data is not linearly separable?
• Penalty in optimization problem
• “Hinge-loss”
• Comparison with logistic regression

Kernel SVMs

• SVM dual training and prediction
• Dot-products
• Kernel trick: redefine dot-product
• Feature expansions
• Example kernels
• SVM with RBF kernel visualization

• TBD

Probability review

• Fourth Hour
• Probability spaces and events
• Conditional probability
• Random variables
• Expected value

Bayes / Naive Bayes

• Bayes rule
• Generative model
• Naive Bayes
• Text classification

Gaussian mixture models

• This may be too hard… there is no way to get thre