Andrew McGregor

Associate Professor

Welcome to the Spring 2021 homepage for CMPSCI 514 - Algorithms for Data Science. See the Moodle page for links to lecture recordings, the syllabus, homework, etc. Slides for future lectures may be modified significantly depending on our progress. Slides for previous lectures may be updated if, e.g., we spot a typo during the lecture.

Date Topic Reading and Background
2 Feb Course overview. Probability review. Slides
4 Feb Estimating set size by counting duplicates. Concentration Bounds: Markov's inequality. Random hashing for efficient lookup. Slides
9 Feb Finish up hash tables. 2-universal and pairwise independent hashing. Hashing for load balancing. Slides
11 Feb Concentration Bounds Continued: Chebyshev's inequality. The union bound. Exponential tail bounds (Bernstein's inequality). Slides
16 Feb Finish up exponential concentration bounds and the central limit theorem. Bloom filters and their applications. Slides
18 Feb Finish up Bloom filters. Start on streaming algorithms. Min-Hashing for distinct elements. Slides
23 Feb Finish up distinct elements and the median trick. Flajolet-Martin and HyperLogLog. Jaccard similarity estimation with MinHash for audio fingerprinting, document comparison, etc. Start on locality sensitive hashing and nearest neighbor search. Slides
25 Feb Finish up MinHash for Jaccard similarity and locality sensitive hashing. Similarity search. SimHash for Cosine similarity. Slides
2 Mar The frequent elements problem and count-min sketch. Slides
4 Mar Dimensionality reduction, low-distortion embeddings, and the Johnson-Lindenstrauss Lemma. Slides
9 Mar Finish up the JL Lemma. Example application to clustering. Connections to high-dimensional geometry. Slides
11 Mar Finish up high-dimensional geometry and connection to the JL Lemma. Slides
16 Mar Midterm Review Slides
18 Mar No Class: Midterm
23 Mar Intro to principal component analysis, low-rank approximation, data-dependent dimensionality reduction. Slides
25 Mar Projection matrices and best fit subspaces. Slides
30 Mar Optimal low-rank approximation via eigendecomposition. Principal component analysis. Slides
1 Apr No class.
6 Apr SVD and applications of low-rank approximation beyond compression. Matrix completion, LSA, and word embeddings. Slides
8 Apr Linear algebraic view of graphs. Spectral graph partitioning and clustering. Slides
13 Apr Stochastic block model. Slides
15 Apr Computing the SVD: power method, Krylov methods. Connection to random walks and Markov chains. Slides
22 Apr Optimization and gradient descent analysis for convex functions. Slides
27 Apr Finish gradient descent analysis. Constrained optimization and projected gradient descent. Slides
29 Apr Online learning and regret. Online gradient descent. Slides
4 May Finish up online gradient descent and stochastic gradient descent analysis. Course conclusion/review. Slides