Machine Learning, Fall 2016


Course Number: COMPSCI 589
Time: MW / 2:30-3:45 PM
Room: Engineering Laboratory Room 323
Instructor: Justin Domke
Staff Email: Please use Piazza
Course Website: Detailed materials for the course will be hosted on Moodle. The syllabus (this page) is at http://people.cs.umass.edu/domke/courses/cs589/
Instructor Office Hours: Monday 3:45pm-4:45pm, CS 208
TA Hours: Tuesday 10am-12pm, LGRT 220

Course Description: This course will introduce core machine learning models and algorithms for classification, regression, clustering, and dimensionality reduction. On the theory side, the course will focus on understanding models and the relationships between them. On the applied side, the course will focus on effectively using machine learning methods to solve real-world problems with an emphasis on model selection, regularization, design of experiments, and presentation and interpretation of results.

Textbooks: The course will use several open textbooks, including:

Course Midterm: In class
Course Final: Friday, December 16th, 3:30 - 5:30pm (in the normal classroom)

Grading Scheme:
Homework: 20%
Midterm: 15%
Final: 30%
Project Proposal: 5%
Project Report: 30%
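
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course score. The function name, the dictionary keys, and the sample scores are hypothetical, and the syllabus does not state how a letter grade is derived from this total.

```python
# Weights taken from the grading scheme above (fractions of the course grade).
WEIGHTS = {
    "homework": 0.20,
    "midterm": 0.15,
    "final": 0.30,
    "project_proposal": 0.05,
    "project_report": 0.30,
}


def weighted_score(scores):
    """Combine per-component scores (each on a 0-100 scale) into one total.

    `scores` is a hypothetical dict keyed like WEIGHTS. This is an
    illustration only, not the official grade calculation.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example = {
    "homework": 90,
    "midterm": 80,
    "final": 85,
    "project_proposal": 100,
    "project_report": 88,
}
total = weighted_score(example)
print(f"Weighted score: {total:.1f}")
```

Note that the final exam and the project report together account for 60% of the total, so they dominate the result.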

Preliminary Schedule

  1. Introduction and Overview

    (Unit 1: Classification)

  2. K-Nearest Neighbors and Decision Trees
  3. Naive Bayes, LDA, and Logistic Regression
  4. Overfitting, Regularization, and Cross-Validation
  5. Support Vector Machines, Basis Expansion, and Kernels
  6. Neural Networks and Deep Learning
  7. Ensembles and Classification

    (Unit 2: Regression)

  8. Linear Regression, Ridge, and Lasso
  9. KNN Regression, Regression Trees, and Feature Selection
  10. Support Vector and Neural Network Regression
  11. KOLS and Gaussian Process Regression

    (Midterm)

  12. Midterm review
  13. Midterm

    (Unit 3: Probabilistic Methods)

  14. Bayesian Methods 1
  15. Bayesian Methods 2
  16. Markov Chain Monte Carlo
  17. Generative and Discriminative Methods

    (Unit 4: Clustering)

  18. Hierarchical Clustering
  19. K-Means
  20. Mixture Models

    (Unit 5: Dimensionality Reduction)

  21. Linear Dimensionality Reduction and SVD
  22. Principal Components Analysis
  23. Sparse Coding, Non-Negative Matrix Factorization and Independent Components Analysis
  24. Kernel Principal Components Analysis and Spectral Clustering
  25. Multidimensional Scaling and Isomap

    (Final)

  26. Final Review

What is the difference between CMPSCI 589 and CMPSCI 689?: 589 has been designed to focus on understanding and applying core machine learning models and algorithms, while 689 focuses on the mathematical foundations of machine learning. While both courses require a background in multivariate calculus, linear algebra, and probability, 689 is more theoretically focused and will use more of this background material than 589. In particular, 589 will not focus on deriving learning or optimization algorithms.

Should I take CMPSCI 589 or CMPSCI 689?: 589 is appropriate as an introductory machine learning course for senior undergraduate students, master's students, and MS/PhD students interested in applying machine learning in their research. Note that 589 can count for credit for MS/PhD students, but it does not satisfy an AI core requirement. Graduate students who intend to pursue research in machine learning or who need a course to satisfy the AI core requirement should take 689. Note also that students can take 589 followed by 689, but may not take the courses in the reverse order.

Required Background: While this course has an applied focus, it still requires appropriate mathematical background in probability and statistics, calculus, and linear algebra. The official prerequisites for undergrads are CMPSCI 383 and MATH 235 (CMPSCI 240 provides sufficient background in probability, and MATH 131/132 provide sufficient background in calculus). Graduate students can check the descriptions for these courses to verify that they have sufficient mathematical background for 589. The course will also use Python as its programming language, including the numpy, scipy, and scikit-learn libraries. Some familiarity with Python will be helpful, but senior CS students should be able to learn Python during the course if needed. Graduate students from outside computer science with sufficient background are also welcome to take the course. The following references can provide a useful review:

Course Policies

  • Homework Submission: Homework assignments will generally consist of developing machine learning systems in Python, evaluating the systems, and producing written reports. Both the code and report must be submitted through Moodle by the due date for a submission to be considered on time.
  • Late Homework: To allow some flexibility to complete assignments given other constraints, you have a total of five free late days. You will be charged one late day for handing in an assignment within 24 hours after it is due, two late days for handing in an assignment within 48 hours after it is due, etc. Your assignment is considered late if either the written or code portions are submitted late. The late homework clock stops when both the written and code portions are submitted. After you have used up your late days, late homework will not count for credit except in special circumstances (e.g., illness documented by a doctor's note). If you do not hand in an assignment at all, this will count as using all five late days.
  • Homework Collaboration: You are encouraged to discuss assignments and course material with other students in person or on the course forums. However, you must show that you fully understand the solution to any homework problem arising from such collaboration by writing your own code, running your own experiments, and producing your own write-up for the problem.
  • Academic Honesty Policy: You are required to list the names of anyone you discuss problems with on the first page of your solutions. Copying any solution materials from external sources (books, web pages, etc.) or other students is considered cheating. To emphasize: no detectable copying is acceptable, even, e.g., copying a single sentence from an outside source. Sharing your code or solutions with other students is also considered cheating. Any detected cheating will result in a grade of -100% on the assignment for all students involved (negative credit), and potentially a grade of F in the course.
  • Re-grading Policy: Errors in grading of assignments and exams can occur despite the best efforts of the course staff. If you believe you've found a grading error, complete the online re-grade request form. Re-grade requests must be submitted no later than one week after the assignment is returned. Note that re-grading may result in your original grade increasing or decreasing as appropriate.
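
The late-day arithmetic in the Late Homework policy can be sketched in Python as follows. This is a minimal illustration, not an official calculator: the function name and the example timestamps are hypothetical, and treating a submission exactly 24 hours late as one late day is an assumption about how boundaries are handled.

```python
import math
from datetime import datetime

TOTAL_LATE_DAYS = 5  # total free late days stated in the policy


def late_days_charged(due, code_submitted, report_submitted):
    """Return the late days charged for a single assignment.

    The late clock stops only when BOTH the code and the written report
    are in, so the later of the two timestamps is the one that counts.
    Each started 24-hour period after the deadline costs one late day.
    """
    finished = max(code_submitted, report_submitted)
    seconds_late = (finished - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)


# Hypothetical deadline and submission times, for illustration only.
due = datetime(2016, 10, 3, 23, 59)
code_time = datetime(2016, 10, 4, 10, 0)    # code is about 10 hours late
report_time = datetime(2016, 10, 5, 8, 0)   # report is about 32 hours late

charged = late_days_charged(due, code_time, report_time)
remaining = TOTAL_LATE_DAYS - charged
print(f"Late days charged: {charged}, remaining: {remaining}")
```

Here the report arrives more than 24 but less than 48 hours after the deadline, so two late days are charged even though the code alone would have cost only one.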