COMPSCI 514: Algorithms for Data Science (Spring 2026)
Time: Tuesday/Thursday 1:00pm-2:15pm
Location: Computer Science Labs, E110. Lectures will be recorded and posted along with annotated slides under the Schedule tab.
Professor:
Cameron Musco
- Email: cmusco at cs dot umass dot edu.
- Office: CS Building (CSB) 234
- Office Hours: Tuesday 2:30pm-3:30pm (directly after class) in CSB 234.
- How to Contact: If you need to chat or schedule an individual meeting, you can reach out over email or in person, after class or during office hours.
Teaching Assistants:
- Shruti Chanumolu
- Email: schanumolu at umass dot edu
- Office Hours: Thursday 9am-10am, in CSL E215. Friday 9am-10am, over Zoom.
- Shiv Shankar
- Email: sshankar at umass dot edu
- Office Hours: Monday 1pm-2pm, hybrid in CSL E211 and over Zoom. Wednesday 1pm-2pm, hybrid in CSL E211 and over Zoom.
Course Description:
With the advent of social networks, ubiquitous sensors, and large-scale machine learning and computational science, data scientists must deal with data that is massive in size, arrives at blinding speeds, and often must be processed within interactive or quasi-interactive time frames. This course studies the mathematical foundations of big data processing, developing algorithms and learning how to analyze them. We explore methods for sampling, sketching, and distributed processing of large scale databases, graphs, and data streams for purposes of scalable statistical description, querying, pattern mining, and learning. 3 credits.
Prerequisites:
The undergraduate prerequisites are COMPSCI 240 or STAT 515 (Probability) and COMPSCI 311 (Algorithms). This is a theoretical course with an emphasis on algorithm design, correctness proofs, and analysis. Aside from a general background in algorithms, a strong mathematical background, particularly in linear algebra and probability, is required. If you are a master's student with a limited background in either of these subjects, please email me at the start of the semester to discuss your preparation.
Textbooks: There is no official textbook for this class. We will use some material from:
Readings from these books and other sources will be posted before class under the Schedule tab.
Related Classes: You may also find some helpful reference material in these similar classes taught at other universities:
Piazza: We will use Piazza for class discussion, questions, and announcements. Sign up here. We hope for Piazza to be a key interactive component of the class. Thus, we encourage posting questions and giving good answers to other students' questions, which counts toward the up to 5% extra credit for class participation (see below).
Grade Components:
- Midterm 1 + Midterm 2: 40% total. 25% for the one you score higher on and 15% for the one you score lower on.
- Cumulative Final: 25%.
- Problem Sets (4 total): 25%, weighted equally.
- Weekly Quizzes: 10%, weighted equally, lowest score dropped.
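As a rough illustration of how the components above combine, here is a minimal sketch in Python. It assumes every score is already expressed as a percentage in [0, 100]; the function name `course_percentage` and its parameters are hypothetical, and the sketch leaves out the participation extra credit and any end-of-semester scale shift.

```python
def course_percentage(midterm1, midterm2, final, problem_sets, quizzes):
    """Combine component scores (each a percentage in [0, 100]).

    Illustrative only: ignores extra credit and any scale shift.
    """
    # The higher midterm counts for 25%, the lower one for 15%.
    higher = max(midterm1, midterm2)
    lower = min(midterm1, midterm2)

    # Problem sets are weighted equally.
    pset_avg = sum(problem_sets) / len(problem_sets)

    # Quizzes are weighted equally after dropping the single lowest score.
    kept = sorted(quizzes)[1:] if len(quizzes) > 1 else list(quizzes)
    quiz_avg = sum(kept) / len(kept)

    return (0.25 * higher + 0.15 * lower + 0.25 * final
            + 0.25 * pset_avg + 0.10 * quiz_avg)
```

For example, with midterms of 80 and 90, a final of 85, problem sets of 100, 100, 50, 50, and quizzes of 0, 100, 100, the missed quiz is dropped and the result is 84.5.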
Grade Scale: The course is graded on a standard scale. I will typically shift this scale down to account for any difficult exams/problem sets. I will never shift it up. I.e., if you obtain a 90% in the course, you will definitely achieve an A-, and potentially an A. If applicable, I will publish the shifted scale at the conclusion of the course. The standard grade scale is: A (100-93), A- (92-90), B+ (89-87), B (86-83), B- (82-80), C+ (79-77), C (76-73), C- (72-70), D+ (69-67), D (66-63), D- (62-60), F (below 60).
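The standard scale can be read as a list of lower cutoffs. The sketch below is illustrative only: the names `STANDARD_SCALE` and `letter_grade` are hypothetical, fractional percentages are compared against the cutoffs without rounding (an assumption), and the `shift` parameter models a downward shift as a uniform number of points, which is also an assumption since the actual shifted scale is published only at the end of the course.

```python
# (lower cutoff, letter) pairs for the standard, unshifted scale.
STANDARD_SCALE = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(percentage, shift=0.0):
    """Map a course percentage to a letter grade.

    `shift` lowers every cutoff by that many points (assumed uniform);
    it can never be negative, since the scale is never shifted up.
    """
    if shift < 0:
        raise ValueError("the scale is only ever shifted down")
    for cutoff, letter in STANDARD_SCALE:
        if percentage >= cutoff - shift:
            return letter
    return "F"
```

With no shift, `letter_grade(90)` gives an A-; with a hypothetical 3-point downward shift, the same 90 would map to an A.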
Problem Sets: Problem sets are an opportunity to get practice with the course material, build confidence working through problems on your own, and prepare for the exams (some exam problems will closely resemble problem set problems). They are designed to be a learning tool, not a means for assessment.
- For this reason, the problem sets will be graded lightly, according to the following rubric:
- ✓+: (2 points) Submitted work demonstrates a full understanding of the problem. There may be some errors, omissions, or unclear steps, but overall, a reader would be able to understand how to solve the problem by looking at the submitted work.
- ✓-: (1 point) Submitted work demonstrates partial understanding of the concepts, but contains significant omissions or errors.
- X: (0 points) Submitted work does not provide enough information to determine whether there is understanding of the problem.
- Problem sets can be completed in groups of up to three students. If you work in a group, you submit a single problem set together.
- We strongly discourage the use of LLMs on problem sets. LLMs can be very helpful in giving explanations of course concepts, generating example problems, helping fix LaTeX bugs, etc. But if you rely on them to solve the problem sets, you won't be getting the practice you need to internalize the course material and perform well on the exams.
- Problem set submissions will be via Gradescope. If working in a group, only one member of each group should submit the problem set, marking the other members in the group as part of the submission in Gradescope.
- The entry code for Gradescope is 3DRE6R.
- No late homework submissions will be accepted unless there are extenuating circumstances, approved by the instructor before the deadline.
- I strongly encourage students to type up problem sets using LaTeX. Many students choose to use Overleaf, which is a good online LaTeX editor that allows collaborative editing. A LaTeX template for problem sets can be downloaded here. While it may seem cumbersome at first, getting proficient in LaTeX will save you a lot of time in the long run!
Weekly Quizzes: A quiz will be posted on Canvas each Thursday after class, due the following Monday at 8pm. These are short quizzes (designed to take ~15 minutes) to check that you are following the material and help me make adjustments if needed. Quizzes will include check-in questions asking for feedback on class pacing and on topics that need clarification, or that you would like to see discussed more. While we will not allow any excused misses of quizzes, the lowest quiz grade will be dropped, so that each student can miss one quiz during the course of the semester without it affecting their grade.
Exams: Midterm 1 will be held in class on Thursday 3/12. Midterm 2 will be held in class on Thursday 4/23. The cumulative Final Exam will be held during exam week, Tuesday 5/12 from 1-3pm in the regular lecture hall. All exams are closed notes. We will post extensive review material, past exams, and other practice questions to help you prepare. There is no option to take the exams remotely. Any makeup exams needed due to illness or other excused absences will be held in person.
Class Participation: Up to 5% extra credit may be awarded for class participation. This may come in many forms, e.g.:
- Asking good clarifying questions and answering questions during lecture.
- Asking good clarifying questions and answering other students' or instructor questions on Piazza.
- Posting helpful links on Piazza, e.g., resources that cover class material, research articles related to the topics covered in class, etc.
Course Academic Honesty Policy: If caught violating the problem set or quiz rules, students will receive a 0% on the assignment for the first violation, and fail the class for a second violation. Any cheating on a midterm or final will lead to failing the class. For fairness, we apply these rules universally, without exceptions.
UMass Academic Integrity Statement: UMass Amherst is strongly committed to academic integrity, which is defined as completing all academic work without cheating, lying, stealing, or receiving unauthorized assistance from any other person, or using any source of information not appropriately authorized or attributed. As a community, we hold each other accountable and support each other’s knowledge and understanding of academic integrity. Academic dishonesty is prohibited in all programs of the University and includes but is not limited to: Cheating, fabrication, plagiarism, lying, and facilitating dishonesty, via analogue and digital means. Sanctions may be imposed on any student who has committed or participated in an academic integrity infraction. Any person who has reason to believe that a student has committed an academic integrity infraction should bring such information to the attention of the appropriate course instructor as soon as possible. All students at the University of Massachusetts Amherst have read and acknowledged the Commitment to Academic Integrity and are knowingly responsible for completing all work with integrity and in accordance with the policy:
Academic Integrity Policy.
Disability Accommodations: The University of Massachusetts Amherst is committed to making reasonable, effective, and appropriate accommodations to meet the needs of students with disabilities and help create a barrier-free campus. If you have a disability and require accommodations, please register with Disability Services, meet with an Access Coordinator in Disability Services, and send your accommodation letter to your faculty. Information on services and registration is available on the Disability Services website.
I understand that people have different learning needs, home situations, etc. If something isn’t working for you in the class, please reach out and let’s try to work it out.
Title IX Statement: In accordance with Title IX of the Education Amendments of 1972 that prohibits gender-based discrimination in educational settings that receive federal funds, the University of Massachusetts Amherst is committed to providing a safe learning environment for all students, free from all forms of discrimination, including sexual assault, sexual harassment, domestic violence, dating violence, stalking, and retaliation. This includes interactions in person or online through digital platforms and social media. Title IX also protects against discrimination on the basis of pregnancy, childbirth, false pregnancy, miscarriage, abortion, or related conditions, including recovery. There are resources here on campus to support you. A summary of the available Title IX resources (confidential and non-confidential) can be found at the following
link. You do not need to make a formal report to access them. If you need immediate support, you are not alone. Free and confidential support is available 24 hours a day / 7 days a week / 365 days a year at the SASA Hotline 413-545-0800.
Learning Objectives:
- Students will learn about modern tools for data processing, including random sampling and hashing, low-memory streaming algorithms, linear and non-linear dimensionality reduction, spectral graph theory, and continuous optimization. A major goal is to be familiar at a high level with a breadth of algorithmic tools beyond combinatorial algorithms, which are the main focus of most undergraduate algorithms courses.
- Through problem sets, students will develop the ability to apply and modify these algorithmic tools to tackle new problems, beyond those discussed in class. They will strengthen their ability to think creatively about algorithmic problems and push beyond known approaches, to develop solutions of their own.
- Through assessments that emphasize formal proofs, students will strengthen their ability to formulate problems mathematically and analyze them rigorously.
- Through algorithmic problems, students will practice applying fundamental tools in probability theory and linear algebra, which are broadly applicable in data science and machine learning. These include concentration bounds and methods for decomposing complex random variables, eigendecomposition, orthogonal projection, important matrix identities, and fundamentals of high-dimensional geometry and random matrix theory.