CS514 (S21) | Andrew McGregor

Welcome to the Spring 2021 homepage for CMPSCI 514 - Algorithms for Data Science. See the Moodle page for links to lecture recordings, the syllabus, homework, etc. Slides for future lectures may be modified significantly depending on our progress. Slides for previous lectures may be updated if, e.g., we spot a typo during the lecture.

Date	Topic	Reading and Background

2 Feb	Course overview. Probability review.	Slides
4 Feb	Estimating set size by counting duplicates. Concentration Bounds: Markov's inequality. Random hashing for efficient lookup.	Slides
9 Feb	Finish up hash tables. 2-universal and pairwise independent hashing. Hashing for load balancing.	Slides
11 Feb	Concentration Bounds Continued: Chebyshev's inequality. The union bound. Exponential tail bounds (Bernstein's inequality).	Slides
16 Feb	Finish up exponential concentration bounds and the central limit theorem. Bloom filters and their applications.	Slides
18 Feb	Finish up Bloom filters. Start on streaming algorithms. Min-Hashing for distinct elements.	Slides
23 Feb	Finish up distinct elements and the median trick. Flajolet-Martin and HyperLogLog. Jaccard similarity estimation with MinHash for audio fingerprinting, document comparison, etc. Start on locality sensitive hashing and nearest neighbor search.	Slides
25 Feb	Finish up MinHash for Jaccard similarity and locality sensitive hashing. Similarity search. SimHash for Cosine similarity.	Slides
2 Mar	The frequent elements problem and count-min sketch.	Slides
4 Mar	Dimensionality reduction, low-distortion embeddings, and the Johnson-Lindenstrauss Lemma.	Slides
9 Mar	Finish up the JL Lemma. Example application to clustering. Connections to high-dimensional geometry.	Slides
11 Mar	Finish up high-dimensional geometry and connection to the JL Lemma.	Slides
16 Mar	Midterm Review	Slides
18 Mar	No Class: Midterm
23 Mar	Intro to principal component analysis, low-rank approximation, data-dependent dimensionality reduction.	Slides
25 Mar	Projection matrices and best fit subspaces.	Slides
30 Mar	Optimal low-rank approximation via eigendecomposition. Principal component analysis.	Slides
1 Apr	No class.
6 Apr	SVD and applications of low-rank approximation beyond compression. Matrix completion, LSA, and word embeddings.	Slides
8 Apr	Linear algebraic view of graphs. Spectral graph partitioning and clustering.	Slides
13 Apr	Stochastic block model.	Slides
15 Apr	Computing the SVD: power method, Krylov methods. Connection to random walks and Markov chains.	Slides
22 Apr	Optimization and gradient descent analysis for convex functions.	Slides
27 Apr	Finish gradient descent analysis. Constrained optimization and projected gradient descent.	Slides
29 Apr	Online learning and regret. Online gradient descent.	Slides
4 May	Finish up online gradient descent and stochastic gradient descent analysis. Course conclusion/review.	Slides

Andrew McGregor

Associate Professor