COMPSCI 614: Randomized Algorithms with Applications to Data Science (Spring 2024)

Time: Tue Thu 10:00am-11:15am
Location: LGRC A203. All lectures will be recorded and posted under the Schedule.
Professor: Cameron Musco
Teaching Assistant: Weronika Nguyen
Course Description: Randomness has proven itself to be a useful resource for developing provably efficient algorithms and protocols for large-scale data processing. As a result, the study of randomized algorithms has become a major research topic in recent years. This course will explore a collection of techniques for effectively using randomization and for analyzing randomized algorithms, as well as examples from a variety of settings and problem areas. The course is a natural follow-on to both COMPSCI 514: Algorithms for Data Science and COMPSCI 611: Advanced Algorithms. 3 credits.
Prerequisites: This is a theoretical course with an emphasis on algorithm design, correctness proofs, and analysis. A strong background in algorithms and mathematics, particularly in linear algebra and probability, is required. I would not recommend taking this course if you are an undergraduate or Masters student unless you have previously taken either 514 or 611 and done well. If you are unsure about your preparation, please reach out to me and we can discuss.
Textbooks: There is no official textbook for this class. I will post reading material related to the lecture content from a variety of online sources. Two optional textbooks that you might find useful (and can find available online) are: Readings from these books and other sources will be posted before class under the Schedule tab.
Piazza: We will use Piazza for class discussion, questions, and announcements. Sign up here. We hope for Piazza to be a key interactive component of the class. Thus, we encourage posting questions and giving good answers to other students' questions, which can count toward the up to 5% extra credit for class participation (see below).
Grading:
  • Problem Sets (5 total): 40%, weighted equally.
  • Weekly Quizzes: 10%, weighted equally, lowest score dropped.
  • Midterm: 20%.
  • Final OR Final Project: 30%.
Grade Scale: The course is graded on a standard scale. I will typically shift this scale down to account for any difficult exams or problem sets. I will never shift it up. For example, if you obtain a 90% in the course, you will definitely achieve an A-, and potentially an A. If applicable, I will publish the shifted scale at the conclusion of the course. The standard grade scale is: A (100-93), A- (92-90), B+ (89-87), B (86-83), B- (82-80), C+ (79-77), C (76-73), C- (72-70), D+ (69-67), D (66-63), D- (62-60), F (below 60).
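As a rough illustration of how the weights and scale above combine, here is a minimal Python sketch. It is not an official grade calculator. The function names, the `participation_bonus` and `shift` parameters, and the choice to add extra credit directly to the overall percentage are all assumptions made for illustration. The syllabus also does not say how fractional percentages are handled, so the sketch simply treats the lower end of each range as a threshold.

```python
# Lower bounds of the standard scale listed above; anything below 60 is an F.
STANDARD_SCALE = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def mean(scores):
    """Average of a non-empty list of percentage scores."""
    return sum(scores) / len(scores)


def course_percentage(problem_sets, quizzes, midterm, final, participation_bonus=0.0):
    """Weighted course percentage; all inputs are percentages in [0, 100].

    `final` stands for either the final exam or the final project score.
    ASSUMPTION: extra credit (capped at 5) is added to the overall percentage.
    """
    if len(quizzes) < 2:
        raise ValueError("need at least two quizzes to drop the lowest one")
    kept_quizzes = sorted(quizzes)[1:]  # drop the single lowest quiz score
    bonus = min(max(participation_bonus, 0.0), 5.0)
    return (
        0.40 * mean(problem_sets)   # problem sets, weighted equally
        + 0.10 * mean(kept_quizzes) # quizzes, weighted equally after the drop
        + 0.20 * midterm
        + 0.30 * final
        + bonus
    )


def letter_grade(percentage, shift=0.0):
    """Map a percentage to a letter grade.

    ASSUMPTION: a shifted scale lowers every threshold by `shift` points.
    The shift is never negative, since the scale is never shifted up.
    """
    if shift < 0:
        raise ValueError("the scale is only ever shifted down")
    for lower_bound, letter in STANDARD_SCALE:
        if percentage >= lower_bound - shift:
            return letter
    return "F"


if __name__ == "__main__":
    pct = course_percentage([90] * 5, [100, 80, 90], midterm=85, final=88)
    print(round(pct, 1), letter_grade(pct), letter_grade(pct, shift=2))
```

With a hypothetical two-point downward shift, the same percentage can map to a higher letter grade, which is the only direction the scale moves.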
Problem Sets: Problem sets can be completed in groups of up to three students. If you work in a group, you submit a single problem set together. You may talk to people not in your group about the problem sets at a high level, but may not work through the detailed solutions together, write them up together, etc. We very strongly encourage you to work in a three-person group, as it will give you an advantage in doing the problem sets. At the beginning of the semester we will make a Piazza post where you can look for teammates.
  • Problem set submissions will be via Gradescope. If working in a group, only one member of each group should submit the problem set, marking the other members in the group as part of the submission in Gradescope.
  • The entry code for Gradescope is 5JBN2D.
  • No late homework submissions will be accepted unless there are extenuating circumstances, approved by the instructor before the deadline.
  • I strongly encourage students to type up problem sets using LaTeX. A LaTeX template for problem sets can be downloaded here.
Weekly Quizzes: A quiz will be posted in Moodle each Thursday after class, due the following Monday at 8pm. These are short quizzes (designed to take ~15 minutes) to check that you are following the material and help me make adjustments if needed. Quizzes will include check-in questions asking for feedback on class pacing and on topics that need clarification, or that you would like to see discussed more. The lowest quiz grade will be dropped.
Exams: The midterm will be held in class, and the final will be held during final exams week. Both will be closed notes. We will be posting extensive review material and practice questions to help you prepare. You can either take the final OR complete the final project -- details below.
Final Project: Optionally, instead of taking the final exam, you can complete a final project. This will involve identifying a topic of current research in randomized algorithms, formulating a research problem related to that topic, and making efforts to tackle that problem. Background on the topic and a description of these efforts will be recorded in a ~10 page final report. Final projects should be completed in groups of two -- if you would like to work alone, please email the instructor to request permission. For more details on the final project requirements and milestones, see the Assignments page.
Class Participation: Up to 5% extra credit may be awarded for class participation. This may come in many forms, e.g.:
  • Asking good clarifying questions and answering questions during lecture.
  • Asking good clarifying questions and answering other students' or instructor questions on Piazza.
  • Posting helpful links on Piazza, e.g., resources that cover class material, research articles related to the topics covered in class, etc.
Course Academic Honesty Policy: If caught violating the problem set or quiz rules, students will receive a 0% on the assignment for the first violation, and fail the class for a second violation. Any cheating on the midterm, final, or project will lead to failing the class. For fairness, we apply these rules universally, without exceptions.
UMass Academic Honesty Statement: Since the integrity of the academic enterprise of any institution of higher education requires honesty in scholarship and research, academic honesty is required of all students at the University of Massachusetts Amherst. Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty. Appropriate sanctions may be imposed on any student who has committed an act of academic dishonesty. Instructors should take reasonable steps to address academic misconduct. Any person who has reason to believe that a student has committed academic dishonesty should bring such information to the attention of the appropriate course instructor as soon as possible. Instances of academic dishonesty not related to a specific course should be brought to the attention of the appropriate department Head or Chair. Since students are expected to be familiar with this policy and the commonly accepted standards of academic integrity, ignorance of such standards is not normally sufficient evidence of lack of intent.
Disability Services: UMass Amherst is committed to making reasonable, effective, and appropriate accommodations to meet the needs of students with disabilities and help create a barrier-free campus. If you have a documented disability on file with Disability Services, you may be eligible for reasonable accommodations in this course. If your disability requires an accommodation, please notify me within the first two weeks of the course so that we may make arrangements in a timely manner.
Learning Objectives: After taking this course, it is expected that students will be able to:
  • Identify computational costs associated with the design of algorithms for large-scale data processing, including space usage, communication complexity, and running time.
  • Design and formally analyze the correctness and running time of randomized algorithms for a wide range of data processing tasks using popular paradigms, such as random hashing, random sketching, importance-based sampling, and Markov chain Monte Carlo sampling.
  • Apply key tools of probabilistic analysis, such as linearity of expectation and variance, scalar and matrix concentration bounds, and epsilon-net style arguments, not only to algorithm analysis, but to other problems in data science, machine learning, and the analysis of stochastic processes.
  • Articulate how randomized algorithms fit into the broader computational complexity landscape, and in what settings they can give unconditional improvements over deterministic methods (e.g., in communication complexity).
  • Use their knowledge to read and understand cutting edge research on new algorithms for big data processing and the theory of algorithms in general.