COMPSCI 514: Algorithms for Data Science

COMPSCI 514: Algorithms for Data Science (Fall 2019)

Time: Tue/Thurs 10am-11:15am

Location: Thompson Hall, Room 104

Instructor: Cameron Musco

Email: cmusco at cs dot umass dot edu.
Office: CS 234
Office Hours: Tue 11:30am-12:30pm (right after class) in CS 234.

Teaching Assistants:

Raj Kumar Maity

Email: rajkmaity at cs dot umass dot edu
Office Hours: Fri 3-4pm in CS 207

Pratheba Selvaraju

Email: pselvaraju at cs dot umass dot edu
Office Hours: Mon 2-3pm in CS 207

Course Description: With the advent of social networks, ubiquitous sensors, and large-scale computational science, data scientists must deal with data that is massive in size, arrives at blinding speeds, and often must be processed within interactive or quasi-interactive time frames. This course studies the mathematical foundations of big data processing, developing algorithms and learning how to analyze them. We explore methods for sampling, sketching, and distributed processing of large-scale databases, graphs, and data streams for purposes of scalable statistical description, querying, pattern mining, and learning. This course was previously COMPSCI 590D. 3 credits.

Prerequisites: The undergraduate prerequisites are COMPSCI 240 (Probability) and COMPSCI 311 (Algorithms). This is a theoretical course with an emphasis on algorithm design, correctness proofs, and analysis. Aside from a general background in algorithms, a strong mathematical background, particularly in linear algebra and probability, is required.

Textbooks: There is no official textbook for this class. We will use some material from:

Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman.
Foundations of Data Science, Avrim Blum, John Hopcroft and Ravi Kannan.

Related Classes: You may also find some helpful reference material in these similar classes taught at other universities:

The Modern Algorithmic Toolbox, Gregory Valiant at Stanford.
Sketching Algorithms for Big Data, Piotr Indyk and Jelani Nelson at MIT/Harvard.
Algorithmic Techniques for Big Data, Moses Charikar at Stanford.

Piazza: We will use Piazza for class discussion and questions. Sign up here. Our goal is for students to answer each other's questions on Piazza as much as the TAs and instructor do. Thus, we encourage good question answering with extra credit (see extra credit policy below).

Homework: Problem sets will be completed and submitted in groups of 3. You will choose these groups at the beginning of the semester and they will remain fixed for the rest of the course. You may talk to people in other groups about the problem sets at a high level, but may not work through the detailed solutions together, write them up together, etc.

Problem set submissions will be via Gradescope. Only one member of each group needs to submit the problem set, marking the other members in the group as part of the submission in Gradescope.
The sign-up code for Gradescope is in the slides PDF for Lecture 1. Please sign up and complete the Gradescope consent poll in Piazza by 9/12.
No late submissions will be accepted unless there are extenuating circumstances, approved by the instructor before the deadline.
I strongly encourage students to type up problem sets using either LaTeX or Markdown. A LaTeX template for problem sets can be downloaded here. For editing Markdown, I use Typora, which supports LaTeX-style math equations (see here). While they may seem cumbersome at first, these tools will save you a lot of time in the long run!

Exams: We will have an in-class midterm exam (October 17th) along with a final (December 19th, 10:30am-12:30pm).

Grading:

Problem Sets (4 total): 40%, weighted equally.
Midterm: 30%.
Final: 30%.

Extra Credit: Students may be awarded up to 5% extra credit for in-class and Piazza participation (asking good clarifying questions in class and on Piazza, answering the instructor's questions in class, answering other students' questions on Piazza, etc.).

Disability Services: UMass Amherst is committed to making reasonable, effective, and appropriate accommodations to meet the needs of students with disabilities and help create a barrier-free campus. If you have a documented disability on file with Disability Services, you may be eligible for reasonable accommodations in this course. If your disability requires an accommodation, please notify me within the first two weeks of the course so that we may make arrangements in a timely manner.

Helpful UMass Resources:

Academic Honesty Policy