COMPSCI 514: Algorithms for Data Science (Fall 2022)
Time: Tuesday/Thursday 1:00pm-2:15pm
Location: Morrill Science Center, Room 131. All lectures will be recorded and posted along with annotated slides under the Schedule tab.
Professor:
Cameron Musco
- Email: cmusco at cs dot umass dot edu.
- Office: CS 234
- Office Hours: Tuesday 2:30pm-3:30pm (directly after class) in CS 234.
- How to Contact: If you need to chat or schedule an individual meeting, you can reach out over email, via a Piazza message, or in person, after class or during office hours.
Teaching Assistants:
- An La
- Email: anla at umass dot edu
- Office Hours: Wednesday 12:30pm-1:30pm in CS207 Cube 1. Friday 4:30pm-5:30pm, over Zoom.
- Mohit Yadav
- Email: ymohit at cs dot umass dot edu
- Office Hours: Monday 9:30am-10:30am, Thursday 9:30am-10:30am. Over Zoom.
- Forsad Al Hossein
- Email: falhossain at cs dot umass dot edu
- Office Hours: Monday 4:00pm-5:00pm, in CS207. Tuesday 9:00am-10:00am, over Zoom.
Online Section:
Andrew McGregor is teaching an online section of 514 that will closely parallel this section. You may attend his virtual lectures Tuesday/Thursday 11:30am-12:45pm if it is helpful. See Moodle for the Zoom link.
Course Description:
With the advent of social networks, ubiquitous sensors, and large-scale computational science, data scientists must deal with data that is massive in size, arrives at blinding speeds, and often must be processed within interactive or quasi-interactive time frames. This course studies the mathematical foundations of big data processing, developing algorithms and learning how to analyze them. We explore methods for sampling, sketching, and distributed processing of large scale databases, graphs, and data streams for purposes of scalable statistical description, querying, pattern mining, and learning. 3 credits.
Prerequisites:
The undergraduate prerequisites are COMPSCI 240 or STAT 515 (Probability) and COMPSCI 311 (Algorithms). This is a theoretical course with an emphasis on algorithm design, correctness proofs, and analysis. Aside from a general background in algorithms, a strong mathematical background, particularly in linear algebra and probability, is required. If you are a master's student with a limited background in either of these subjects, please email me at the start of the semester to discuss your preparation.
Textbooks: There is no official textbook for this class. We will use some material from:
Readings from these books and other sources will be posted before class under the Schedule tab.
Related Classes: You may also find some helpful reference material in these similar classes taught at other universities:
Piazza: We will use Piazza for class discussion, questions, and announcements. Sign up here. We hope for Piazza to be a key interactive component of the class. Thus, we encourage posting questions and giving good answers to other students' questions, which can count toward up to 5% extra credit for class participation (see below).
Grading:
- Problem Sets (5 total): 40%, weighted equally.
- Weekly Quizzes: 10%, weighted equally, lowest score dropped.
- Midterm: 25%.
- Final: 25%.
Problem Sets: Problem sets can be completed in groups of up to three students. If you work in a group, you submit a single problem set together. You may talk to people not in your group about the problem sets at a high level, but may not work through the detailed solutions together, write them up together, etc. We very strongly encourage you to work in a three-person group, as it will give you an advantage on the problem sets. At the beginning of the semester we will make a Piazza post where you can look for teammates.
- Problem set submissions will be via Gradescope. If working in a group, only one member of each group should submit the problem set, marking the other members in the group as part of the submission in Gradescope.
- The entry code for Gradescope is 2KBPNG.
- No late homework submissions will be accepted unless there are extenuating circumstances, approved by the instructor before the deadline.
- I strongly encourage students to type up problem sets using either LaTeX or Markdown. A LaTeX template for problem sets can be downloaded here. For editing Markdown, I use Typora, which supports LaTeX-style math equations (see here). While they may seem cumbersome at first, these tools will save you a lot of time in the long run!
Weekly Quizzes: A quiz will be posted in Moodle each Thursday after class, due the following Monday at 8pm. These are short quizzes (designed to take ~15 minutes) to check that you are following the material and help me make adjustments if needed. Quizzes will include check-in questions asking for feedback on class pacing and on topics that need clarification, or that you would like to see discussed more.
The lowest quiz grade will be dropped.
Exams: The midterm will be held in class on Thursday, October 20th, and the final will be held during final exams week. Both will be closed notes. We will be posting extensive review material and practice questions to help you prepare.
Class Participation: Up to 5% extra credit may be awarded for class participation. This may come in many forms, e.g.:
- Asking good clarifying questions and answering questions during lecture.
- Actively participating in office hours.
- Asking good clarifying questions and answering other students' or instructor questions on Piazza.
- Posting helpful links on Piazza, e.g., resources that cover class material, research articles related to the topics covered in class, etc.
Course Academic Honesty Policy: If caught violating the problem set or quiz rules, students will receive a 0% on the assignment for the first violation, and fail the class for a second violation. Any cheating on the midterm or final will lead to failing the class. For fairness, we apply these rules universally, without exceptions.
UMass Academic Honesty Statement: Since the integrity of the academic enterprise of any institution of higher education requires honesty in scholarship and research, academic honesty is required of all students at the University of Massachusetts Amherst. Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty. Appropriate sanctions may be imposed on any student who has committed an act of academic dishonesty. Instructors should take reasonable steps to address academic misconduct. Any person who has reason to believe that a student has committed academic dishonesty should bring such information to the attention of the appropriate course instructor as soon as possible. Instances of academic dishonesty not related to a specific course should be brought to the attention of the appropriate department Head or Chair. Since students are expected to be familiar with this policy and the commonly accepted standards of academic integrity, ignorance of such standards is not normally sufficient evidence of lack of intent.
Disability Accommodations: The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements.
I understand that people have different learning needs, home situations, etc. If something isn’t working for you in the class, please reach out and let’s try to work it out.
Learning Objectives:
- Students will learn about modern tools for data processing, including random sampling and hashing, low-memory streaming algorithms, linear and non-linear dimensionality reduction, spectral graph theory, and continuous optimization. A major goal is to be familiar at a high level with a breadth of algorithmic tools beyond combinatorial algorithms, which are the main focus of most undergraduate algorithms courses.
- Through problem sets, students will develop the ability to apply and modify these algorithmic tools to tackle new problems, beyond those discussed in class. They will strengthen their ability to think creatively about algorithmic problems and push beyond known approaches, to develop solutions of their own.
- Through assessments that emphasize formal proofs, students will strengthen their ability to formulate problems mathematically and analyze them rigorously.
- Through algorithmic problems, students will practice applying fundamental tools in probability theory and linear algebra, which are broadly applicable in data science and machine learning. These include concentration bounds and methods for decomposing complex random variables, eigendecomposition, orthogonal projection, important matrix identities, and fundamentals of high-dimensional geometry and random matrix theory.