How does Netflix learn what shows a person likes? How do computers read handwritten addresses on packages, or detect faces in images? Machine learning is the practice of programming computers to learn and improve through experience, and it is becoming pervasive in technology and science. This course will cover the mathematical underpinnings, algorithms, and practices that enable a computer to learn. Topics will include supervised learning, unsupervised learning, evaluation methodology, and Bayesian probabilistic modeling. Students will learn to program in Python and apply course skills to solve real-world prediction and pattern recognition problems. Programming intensive.

| | |
|---|---|
| Instructor | Dan Sheldon, dsheldon (at) mtholyoke.edu |
| Lecture | Tuesday, Thursday 10:00am–11:15am |
| Location | Clapp 206 |
| Piazza | https://piazza.com/mtholyoke/spring2019/cs335 |
| Moodle | https://moodle.mtholyoke.edu/course/view.php?id=15996 |
| Gradescope | https://www.gradescope.com/courses/37951 |
| Textbook | none |
| Office Hours | Tue 3:30–4:30pm, Thu 1:00–2:00pm, Clapp 200 |

- CS 211 Data Structures
- Math 232 Discrete Math
- Math 101 Calculus I (or equivalent)

The goal of these prerequisites is to ensure that you are comfortable programming in some language, are familiar with basic CS paradigms, know elementary probability and calculus, and are generally comfortable with mathematical tools and reasoning.

There is **no required textbook** for this course. Here are some useful resources:

- An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani: accessible undergraduate ML textbook with statistics focus. **Freely downloadable.**
- Introduction to Machine Learning by Alpaydin: approachable undergraduate ML text with CS focus.
- Artificial Intelligence: A Modern Approach by Russell and Norvig: the most widely used AI textbook. Chapters 13–15, 18, and 20 cover material related to machine learning.
- Pattern Recognition and Machine Learning by Bishop. Graduate / advanced undergraduate level ML text with a probabilistic / Bayesian focus.
- Machine Learning: a Probabilistic Perspective by Murphy. Comprehensive new ML textbook at graduate / advanced undergraduate level.
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. Graduate level statistical view of many machine learning topics. **Freely downloadable.**
- Coursera Machine Learning course by Andrew Ng: outstanding free online ML course.
- Course handouts from Stanford CS 229 by Andrew Ng

The books above that are not freely downloadable (Alpaydin; Russell and Norvig; Bishop; Murphy) will be on three-hour reserve at MHC library.

Programming assignments will use Python, NumPy, and SciPy. The **required Python environment is the Anaconda 2018.12 distribution of Python 3.7.** I will not grade or help debug work unless you are working in this environment. Anaconda 2018.12 will be installed on lab computers in Clapp 202 and Kendade 307. You are also encouraged to download and install it on your personal computer.

Please see this page on getting started with Python for CS 335.
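As a quick sanity check of your setup, a short script along the following lines can report whether the interpreter is Python 3.7 and whether NumPy and SciPy can be found. This is only a sketch: the function name `check_environment` is made up for illustration, and the script does not verify that you are running the Anaconda 2018.12 distribution itself, only the Python version and the presence of the two packages.

```python
import importlib.util
import sys

# Python 3.7 is the version named in the course requirements.
REQUIRED_PYTHON = (3, 7)
REQUIRED_PACKAGES = ("numpy", "scipy")


def check_environment(version_info=sys.version_info):
    """Return a dict describing how the interpreter compares to the course setup.

    Hypothetical helper for illustration; it does not detect Anaconda itself.
    """
    report = {
        "python_ok": tuple(version_info[:2]) == REQUIRED_PYTHON,
        "python_version": "{}.{}".format(version_info[0], version_info[1]),
    }
    for name in REQUIRED_PACKAGES:
        # find_spec returns None when a top-level package is not installed.
        report[name + "_installed"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

If `python_ok` is false or a package is reported missing, you are probably not running inside the required environment.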

Here are some **general / comprehensive** resources on Python and SciPy:

- Google’s Python class
- Norm Matloff’s Fast Lane to Python
- Introduction to Python for Computational Science and Engineering by Hans Fangohr
- The SciPy Lecture Notes
- The Python Tutorial

Here are some more **focused** Python resources oriented toward a class like ours:

The goals of the course are

- To understand the basic building blocks and general principles that allow one to design machine learning algorithms
- To become familiar with specific, widely used machine learning algorithms
- To learn methodology and tools to apply machine learning algorithms to real data and evaluate their performance

Like many ML courses, this one is organized primarily as a sequence of specific techniques (see the schedule), which comprise a small subset of the available machine learning algorithms. We will learn about details of these specific techniques and also use them to explore cross-cutting concepts:

- **Mathematical tools**: probability, matrix and vector manipulation, geometry of machine learning problems, basic optimization
- **Machine learning principles**: problem formulations, notation, overfitting, regularization
- **Methodology**: evaluation, parameter tuning, model selection, diagnosing and controlling overfitting
- **Applications**: different applications of ML; the “messy” stuff: data preparation, feature engineering, feature normalization

The skills learned in this class will prepare the student to explore much more widely within the field of machine learning.

The coursework will consist of:

- 4–5 homework assignments
- 1–2 quizzes
- a final project

The grading breakdown is:

- Homework: 35%
- Project: 30%
- Quizzes: 25%
- Class participation: 10%
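To see how the breakdown combines, here is a minimal sketch that computes a weighted course percentage. It assumes each component is already summarized as a single percentage between 0 and 100 and that the components are combined as a plain weighted average. The syllabus does not specify rounding or letter-grade cutoffs, so none are modeled. The name `course_grade` is hypothetical.

```python
# Weights in percent, taken from the grading breakdown above.
WEIGHTS = {"homework": 35, "project": 30, "quizzes": 25, "participation": 10}


def course_grade(scores):
    """Combine per-component percentages (0-100) into one weighted percentage.

    Assumes a plain weighted average; this is an illustration only.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing components: {}".format(sorted(missing)))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {"homework": 90, "project": 80, "quizzes": 70, "participation": 100}
print(course_grade(example))  # 83.0
```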

Homework will be assigned and due every 1–2 weeks during the first part of the semester. Assignments will be a mix of written problems, programming exercises, and experiments. Later in the semester, assigned work will shift toward the final project.

All homework will be submitted electronically through Moodle or Gradescope (details TBA). Written work should be submitted as a PDF, either typed or a *high-quality* scan. Specific instructions will accompany each assignment.

- Students have three free late days to be used on **homeworks only**.
- Each late day buys exactly 24 hours from the original due date (so 24.5 hours = 2 late days).
- If you use up your late days, you will be penalized 33% of the assignment’s value for each day or fraction thereof that it is late (0–24 hours = 33% penalty; 24–48 hours = 66% penalty; 48+ hours = no credit).
- An assignment is considered late until all components (written and digital) are submitted.
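The arithmetic of the late policy can be sketched as follows. This is one reading of the rules above, not an official calculator. It assumes that remaining free late days are applied automatically and that any penalty is counted in whole days beyond the days those free late days cover. The function name `late_outcome` and its parameters are hypothetical.

```python
import math


def late_outcome(hours_late, free_days_left=3):
    """Return (free_late_days_used, penalty_fraction) for one homework.

    hours_late is measured from the original due date. Assumes free late
    days are applied automatically before any penalty.
    """
    if hours_late <= 0:
        return 0, 0.0

    # Any fraction of a day counts as a full day (24.5 hours -> 2 days).
    days_late = math.ceil(hours_late / 24)
    days_used = min(days_late, free_days_left)
    penalized_days = days_late - days_used

    if penalized_days == 0:
        penalty = 0.0
    elif penalized_days == 1:
        penalty = 0.33
    elif penalized_days == 2:
        penalty = 0.66
    else:
        penalty = 1.0  # 48+ hours past what late days cover: no credit
    return days_used, penalty


print(late_outcome(24.5))                    # (2, 0.0)
print(late_outcome(30, free_days_left=1))    # (1, 0.33)
print(late_outcome(49, free_days_left=0))    # (0, 1.0)
```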

Collaboration on assignments is encouraged. However, **every student must write their own code, run their own experiments, and write their own solutions**. Sharing of code or written solutions will be considered a violation of the honor code. Also, I highly encourage each student to first attempt problems on their own, especially for the shorter exercises that are designed to test and reinforce concepts taught in class. Please write the names of all collaborators at the beginning of the written portion of the submission.

Students will work as individuals or in small groups on a final project. This can be either a hands-on application of machine learning algorithms learned in class to an interesting data set, or an in-depth exploration of a machine learning topic not covered in this class. Details will be announced later in the course.

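- We will use Piazza for the class discussion forum, announcements, etc. (https://piazza.com/mtholyoke/spring2019/cs335)
- We will use Moodle for submitting HW and posting HW solutions and grades. (https://moodle.mtholyoke.edu/course/view.php?id=15996)
- We will use Gradescope for submitting and returning homework and exams. (https://www.gradescope.com/courses/37951)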

Participation includes arriving on time to class, engaging meaningfully in lecture activities (e.g., peer discussions or exercises), giving a project presentation, and contributing in the way you are most comfortable to course discussions during lecture or on the class forum.

If you have a disability and would like to request accommodations, please contact AccessAbility Services, located in Wilder Hall B4, at (413) 538-2634 or accessability-services@mtholyoke.edu. If you are eligible, they will give you an accommodation letter which you should bring to me as soon as possible.

The Computer Science Department follows the Mount Holyoke College Honor Code. Work submitted for grading must be entirely your own, unless you were instructed to work in groups. The purpose of course assignments is to practice skills, gain a deeper understanding of the course material, and apply that knowledge to new situations. Assignments are designed to challenge you, stimulate critical thinking, and help you understand the concepts related to the course. Your grade is a reflection of your understanding of the material. We recognize that collaboration can help you master course material. In fact, there are certain ways in which we will encourage you to collaborate. These include: discussing course content at a high level, getting hints or debugging help, talking about problem-solving strategies, and discussing ideas together. However, you must do **all coding and write-ups on your own**. Writing code and solutions on your own will test and demonstrate your mastery of course material. **Looking at solutions from other students or any other source (including the web), or collaborating to write solutions to individual work, is considered a violation of the honor code.** All suspected violations will be referred to the academic honor board. If you are uncertain whether something is allowed, it is your responsibility to ask.

If you have engaged in any of the above acceptable collaboration activities for an assignment, you MUST acknowledge the classmates or TAs with whom you spoke – this should be done in a comment at the top of your main submission file.

The Association for Computing Machinery has a strong Code of Ethics and Professional Conduct. At this site you can read both the current Code from 1992 and the draft of the new 2018 version.

The internet is a useful resource when learning to solve computer science problems. In general, it’s OK to look at resources for a broad topic (e.g., dynamic programming), but it is not OK to look at solutions for a specific problem (e.g., interval scheduling) that is the same as or substantially similar to one you are working on for the class. If you are unsure whether something is allowed, ask. **You must cite all online sources used while working on an assignment**. It is always **your responsibility** to learn if a source is allowed.

These lists are intended to clarify what types of behaviors are and are not generally permissible. Follow these guidelines unless specifically directed otherwise.

Do:

- Organize study groups.
- Clarify ambiguities or vague points in class handouts, textbooks, assignments, and labs.
- Discuss assignments at a high level to understand what is being asked for, and to discuss related concepts and the high-level approach.
- Refine high-level ideas/concepts for projects (i.e. brainstorming).
- Outline solutions to assignments with others using diagrams or pseudocode, but not actual code.
- Walk away from the computer or write-up to discuss conceptual issues if you get stuck.
- Get or give help on how to operate the computer, terminal, or course software.
- Get or give limited debugging help. Debugging includes identifying a syntax or logical error but not helping to write or rewrite code.

Don’t:

- Look at another student’s solutions.
- Use solutions to the same or similar problems found online or elsewhere.
- Search for homework solutions online.
- Turn in any part of someone else’s work as your own (with or without their knowledge).
- Share your solutions or code with another student.
- Share solutions or code online.
- Allow someone else to turn in your work as their own. (Be sure to disconnect your network drive when you log out and remove any printouts promptly from printers.)
- Collaborate while writing programs or solutions (unless it is a group assignment).
- Collaborate with anyone outside your group for a group assignment.
- Use resources during a quiz or exam beyond those explicitly allowed in the quiz/exam instructions. (If it is not listed, don’t use it. Ask if you are unsure.)
- Submit the same or similar work in more than one course. (Always ask the instructor if it is OK to reuse any part of a different project in their course.)

The instructor and students in CS 335 are expected to be respectful, inclusive of all students, and to not discriminate. Mount Holyoke resources on diversity, equity, and inclusion can be found here. Bias incidents can be reported here. Students are encouraged to bring concerns or feedback to the attention of the instructor.