This outline is subject to change
- What is machine learning?
- Course logistics
- An example: curve fitting
- Motivation: optimization
- Derivatives: intuition and rules
- Convex, concave functions
- Find minimum of convex function by setting derivative to zero
- Paradigm: supervised learning
- Linear regression
- Setup
- Cost function
- Minimize cost function (one parameter—slope only) by setting derivative to zero
- Gradient descent
- Geometry of functions in higher dimensions
- GD for linear regression with two parameters (slope, intercept)
- Intuition of partial derivatives
- Partial derivatives of the linear regression cost function
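The gradient-descent procedure for the two-parameter (slope, intercept) cost function above can be sketched as follows. This is an illustrative Python sketch, not course code (the course uses MATLAB); the function name `fit_1d` and its defaults are my own.

```python
# Fit y = w*x + b by gradient descent on the mean squared error cost.
def fit_1d(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of (1/2n) * sum((w*x + b - y)^2)
        dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        # Simultaneous update of both parameters
        w, b = w - lr * dw, b - lr * db
    return w, b
```

Note the simultaneous update: both partial derivatives are computed from the old `(w, b)` before either parameter changes.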
- Interactive session to show basic / important MATLAB features
- From Brown tutorial:
- Basics
- Comments
- Suppressing output
- Statements separated by commas, semicolons, or newlines
- help / doc
- Types, assignments, literals
- Entering vectors and matrices
- Accessing entries and submatrices
- Operations on vectors and matrices
      - Elementwise operators and functions
- Other vector and matrix functions
- Control flow (briefly)
- Functions
- Multiple inputs / output
      - Assignment of multiple outputs
- Did not cover
- Debugging
- Plotting
- Load and save
- Formatting strings: disp / sprintf / fprintf
- Advanced topics (to cover in future classes)
- logical indexing
- cell arrays
- structs
- Exercises
- Partial derivatives
- Intuition and geometry
- Problems
- 1D linear regression derivations
- Implement 1D linear regression by gradient descent
- Convergence of gradient descent
- Run 1D linear regression on own data
- Follow-up notes
- Feature normalization
- Gradient descent: simultaneous updates of all parameters
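The feature-normalization note above amounts to rescaling each feature to zero mean and unit standard deviation so gradient descent takes comparably sized steps in every direction. A minimal Python sketch for illustration (the helper name `normalize` is my own):

```python
def normalize(xs):
    # Rescale values to zero mean and unit standard deviation.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = var ** 0.5
    return [(x - mean) / std for x in xs]
```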
- Motivation: want to move to more complicated ML setups
- Many inputs \(x_1, \ldots, x_n\)
- More complex functions, e.g. polynomials
- Linear algebra
- Succinct language for linear expressions of many variables
- Saves coding
- Inspires new ML methods
- Matrices
- Vectors
- Matrix-Matrix multiplication (and special cases)
  - Transpose
- Inverse
- MATLAB pointers
- Concatenation of vectors / matrices
- Subscripted assignment
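The core matrix operations listed above can be sketched in a few lines. This is pure Python for illustration only; in the course these are built-in MATLAB operators (`*`, `'`), and the function names here are my own:

```python
def matmul(A, B):
    # (m x k) times (k x n) -> (m x n): each entry is the dot product
    # of a row of A with a column of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    # Swap rows and columns.
    return [list(row) for row in zip(*A)]
```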
- First multivariate prediction models
- Geometry of linear functions in high dimensions
- “Tilted” planes through the origin
- (Affine function = linear function translated away from origin)
- Contours are parallel lines
- Gradient is vector orthogonal to contours
- Length of gradient = “slope” of plane
- Multivariate linear regression
- Motivation
- Model
- Cost function
- Normal equations
- Gradient descent
- Features
- Normalization
- Feature design
- Non-linearity by feature expansion
- Polynomial regression
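The normal equations give the least-squares parameters in closed form, with no gradient descent. A hedged Python sketch for the simplest case, one feature plus an intercept (the function name and the direct 2x2 solve via Cramer's rule are my own; the general case solves \((X^T X)\theta = X^T y\)):

```python
def normal_equations(xs, ys):
    # Closed-form least squares for y = w*x + b: solve the 2x2 system
    #   [sum(x^2)  sum(x)] [w]   [sum(x*y)]
    #   [sum(x)    n     ] [b] = [sum(y)  ]
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    det = sxx * n - sx * sx
    w = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b
```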
- Linear algebra exercises
- Normal equations
- Features
- Feature engineering
- Feature normalization
- Stochastic gradient descent?
- Polynomial regression
- First classifier
- Widely used “workhorse” of predictive stats and ML
- Examples
- MNIST: 4 vs. 9
- Breast cancer
- Outline
- Classification
- Model
- Cost function
- Gradient descent
- Decision boundaries
- Fit a non-linear function using linear models
- What is Overfitting?
- How to Diagnose Overfitting
- Regularization
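The logistic-regression model and its gradient-descent fit can be sketched as follows. Illustrative Python only, with names of my choosing; the key point is that the gradient of the log loss has the same form as in linear regression, with the prediction passed through a sigmoid:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    # Minimize the log loss for p(y=1|x) = sigmoid(w*x + b).
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        dw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        db = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * dw, b - lr * db
    return w, b
```

The decision boundary is where the model outputs 0.5, i.e. where `w*x + b = 0`.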
- Logistic regression
- Feature normalization
- Log loss
- Different evaluation goals
- Estimate performance of deployed system
- Model selection
- Compare algorithms
- Data splits
- Train / validation / test
- Cross-validation
- Classification performance measures
- Accuracy
- Confusion matrix
- Precision
- Recall
- F1
- Precision-recall curve
- Tools
- Grid search
- Training curve
- Precision-recall curve
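The classification measures above are all derived from confusion-matrix counts. A minimal Python sketch for the positive class (label 1); the function name is my own:

```python
def precision_recall_f1(y_true, y_pred):
    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```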
- First inherently non-linear methods
- Different learning paradigm: no training phase
- Example: time-series classification
- Nearest neighbor
- k-NN
- Kernel regression
- Linear separators
- Idea of margin
- Formulate max-margin optimization problem
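The "no training phase" paradigm above is easiest to see in code: k-NN just stores the training set and does all its work at prediction time. A one-dimensional Python sketch for illustration (function name and distance choice are my own):

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    # No training phase: classify by majority vote among the k
    # training points closest (in absolute distance) to the query.
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```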
- Guest lecture by Kevin Winner, UMass Ph.D. student
- Learn about Weka
- Comprehensive ML toolkit
- Easy to use
- Java based: both interactive GUI and programmatic API
- What if data is not linearly separable?
- Penalty in optimization problem
- “Hinge-loss”
- Comparison with logistic regression
- SVM dual training and prediction
- Dot-products
- Kernel trick: redefine dot-product
- Feature expansions
- Example kernels
- SVM with RBF kernel visualization
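The kernel trick above replaces the dot product \(\langle \phi(x), \phi(z) \rangle\) with a function computed directly on the raw inputs. A sketch of the RBF kernel in Python for illustration (parameter name `gamma` follows common convention):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    # exp(-gamma * ||x - z||^2): equivalent to a dot product in an
    # infinite-dimensional feature expansion, but cheap to evaluate.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
```

Note that the kernel is 1 when the points coincide and decays toward 0 as they move apart.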
- Fourth Hour
- Probability spaces and events
- Conditional probability
- Random variables
- Expected value
- Bayes rule
- Generative model
- Naive Bayes
- Text classification
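The Naive Bayes text classifier above can be sketched with word counts, class priors, and Bayes rule in log space. An illustrative Python sketch with add-one (Laplace) smoothing; the function names and data layout are my own:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    # Per-class word counts plus class priors.
    classes = set(labels)
    counts = {c: Counter() for c in classes}
    priors = Counter(labels)
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    vocab = {w for cnt in counts.values() for w in cnt}
    return classes, counts, priors, vocab, len(docs)

def predict_nb(model, doc):
    classes, counts, priors, vocab, n = model
    def score(c):
        # log p(c) + sum of log p(word | c), with add-one smoothing.
        total = sum(counts[c].values())
        s = math.log(priors[c] / n)
        for w in doc.split():
            s += math.log((counts[c][w] + 1) / (total + len(vocab)))
        return s
    return max(classes, key=score)
```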
- This may be too hard… there is no way to get there
- Stochastic gradient descent
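Stochastic gradient descent updates the parameters after each single example rather than after a full pass over the data. A hedged Python sketch for the 1D linear-regression case (function name and hyperparameter defaults are my own):

```python
import random

def sgd_fit(xs, ys, lr=0.01, epochs=50, seed=0):
    # Update (w, b) from one example at a time, in shuffled order.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            err = w * xs[i] + b - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b
```

Each step uses a noisy single-example gradient, so the path is jittery, but each epoch costs the same as one batch gradient step.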