Course Schedule (Evolving)

Lecture recordings from Echo360 can be accessed here.

Lecture      Day Topic Materials/Reading
1/29 Thu No Class. Class begins on Tuesday 2/3.
1. 2/3 Tue Course overview. Probability review. Linearity of expectation and variance. Slides. Compressed slides. Reading: MIT short videos and exercises on probability (go to Unit 4). Khan academy probability lessons (a bit more basic). Chapters 1-3 of Probability and Computing with content and exercises on basic probability, expectation, variance, and concentration bounds.
Randomized Methods, Sketching & Streaming
2. 2/5 Thu Estimating set size by counting duplicates. Markov's inequality. Random hashing for efficient lookup. Collision-free hashing. Slides. Compressed slides. Reading: Chapters 1-3 of Probability and Computing with content and exercises on basic probability, expectation, variance, and concentration bounds.
3. 2/10 Tue 2-level hashing. 2-universal and pairwise independent hashing. Slides. Compressed slides. Reading: Chapter 2.2 of Foundations of Data Science with content on Markov's inequality and Chebyshev's inequality. Exercises 2.1-2.6. Chapters 1-3 of Probability and Computing with content and exercises on basic probability, expectation, variance, and concentration bounds. Some notes (Arora and Kothari at Princeton) proving that the ax+b mod p hash function described in class is 2-universal.
4. 2/12 Thu Hashing for load balancing and Chebyshev's inequality. The union bound. Motivate exponential concentration bounds. Slides. Compressed slides. Reading: Chapter 2.2 of Foundations of Data Science with content on Markov's inequality and Chebyshev's inequality. Exercises 2.1-2.6. Chapters 1-3 of Probability and Computing with content and exercises on basic probability, expectation, variance, and concentration bounds.
5. 2/17 Tue Class held over Zoom. Exponential concentration bounds and the central limit theorem. Slides. Reading: Chapter 4 of Probability and Computing on exponential concentration bounds. Some notes (Goemans at MIT) showing how to prove exponential tail bounds using the moment generating function + Markov's inequality approach.
2/19 Thu No Class. Monday class schedule followed.
6. 2/24 Tue Finish up applications of exponential concentration bounds. Bloom Filters. Slides. Reading: Chapter 4 of Mining of Massive Datasets, with content on Bloom filters. See here for full Bloom filter analysis. See Wikipedia for a discussion of the many Bloom filter variants, including counting Bloom filters and Bloom filters with deletions.
7. 2/26 Thu Finish up Bloom filters. Start on streaming algorithms and frequent elements estimation. Slides. Reading: Chapter 4 of Mining of Massive Datasets, with content on Bloom filters. Notes (Amit Chakrabarti at Dartmouth) on streaming algorithms. See Chapters 1 and 5 for frequent elements. Some more notes on the frequent elements problem.
8. 3/3 Tue Frequent elements estimation via Count-min sketch. Min-Hashing for Distinct elements. Slides. Reading: Notes (Amit Chakrabarti at Dartmouth) on streaming algorithms. See Chapters 1 and 5 for frequent elements. Some more notes on the frequent elements problem. A website with lots of resources, implementations, and example applications of count-min sketch. Chapter 4 of Mining of Massive Datasets, with content on distinct elements counting.
9. 3/5 Thu Finish up distinct elements counting. The median trick. Distinct elements in practice: Flajolet-Martin and HyperLogLog. Slides. Reading: Chapter 4 of Mining of Massive Datasets, with content on distinct elements counting. The 2007 paper introducing the popular HyperLogLog distinct elements algorithm.
10. 3/10 Tue Jaccard similarity, fast similarity search, and locality sensitive hashing. Slides. Reading: Chapter 3 of Mining of Massive Datasets, with content on Jaccard similarity, MinHash, and locality sensitive hashing.
3/12 Thu Midterm 1. 1-2:15pm. In class. Study guide and review questions.
3/17 Tue No Class. Spring Break.
3/19 Thu No Class. Spring Break.
Spectral Methods
12. 3/24 Tue Compressing high dimensional data: low-distortion embeddings and the Johnson-Lindenstrauss Lemma. Example application to clustering. Slides. Reading: Chapter 2.7 of Foundations of Data Science on the Johnson-Lindenstrauss lemma. Notes on the JL-Lemma (Anupam Gupta at CMU). Sparse random projections which can be multiplied by more quickly. Some good videos for linear algebra review. See also: Khan academy.
13. 3/26 Thu Intro to principal component analysis, low-rank approximation, data-dependent dimensionality reduction. Orthogonal bases and projection matrices. Dual column/row view of low-rank approximation. Slides. Reading: Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets on low-rank approximation and the SVD. Some good videos overviewing the SVD and related topics (like orthogonal projection and low-rank approximation).
14. 3/31 Tue Best fit subspaces and optimal low-rank approximation via eigendecomposition. Slides. Reading: Proof that optimal low-rank approximation can be found greedily (see Section 1.1). Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets on low-rank approximation.
15. 4/2 Thu Finish up optimal low-rank approximation via eigendecomposition. Eigenvalues as a measure of low-rank approximation error. General linear algebra review. Slides. Reading: Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets on low-rank approximation.
16. 4/7 Tue The singular value decomposition and connections to low-rank approximation. Applications of low-rank approximation beyond compression. Matrix completion and entity embeddings. Slides. Reading: Notes on SVD and its connection to eigendecomposition/PCA (Roughgarden and Valiant at Stanford). Levy and Goldberg paper on word embeddings as implicit low-rank approximation.
17. 4/9 Thu Spectral graph theory and spectral clustering. Slides. Reading: Chapter 10.4 of Mining of Massive Datasets on spectral graph partitioning. For a lot more interesting material on spectral graph methods see Dan Spielman's lecture notes. Great notes on spectral graph methods (Roughgarden and Valiant at Stanford).
18. 4/14 Tue The stochastic block model. Slides. Reading: Dan Spielman's lecture notes on stochastic block model, including matrix concentration + Davis-Kahan perturbation analysis. Further stochastic block model notes (Alessandro Rinaldo at CMU). A survey of the vast literature on the stochastic block model, beyond the spectral methods discussed in class (Emmanuel Abbe at Princeton).
19. 4/16 Thu Computing the SVD: power method. Slides. Reading: Chapter 3.7 of Foundations of Data Science on the power method for SVD. Some notes on the power method (Roughgarden and Valiant at Stanford).
20. 4/21 Tue Finish up power method analysis. Krylov methods. Connection to random walks and Markov chains. Slides. Reading: Chapter 3.7 of Foundations of Data Science on the power method for SVD. Some notes on the power method (Roughgarden and Valiant at Stanford). Multivariable calc review, e.g., through Khan academy.
4/23 Thu Midterm 2. 1-2:15pm. In class. Study guide and review questions.
Optimization
21. 4/28 Tue Intro to gradient descent and its analysis for convex Lipschitz functions. Slides. Reading: Chapters I and III of these notes (Hardt at Berkeley).
22. 4/30 Thu Finish gradient descent analysis. Constrained optimization and projected gradient descent. Start on motivation for stochastic gradient descent. Slides. Reading: Short notes, proving regret bound for online gradient descent. A good book (by Elad Hazan) on online optimization, including online gradient descent and connection to stochastic gradient descent.
23. 5/5 Tue Online gradient descent and application to the analysis of stochastic gradient descent. Slides. Reading: Short notes, proving regret bound for online gradient descent. A good book (by Elad Hazan) on online optimization, including online gradient descent and connection to stochastic gradient descent.
24. 5/7 Thu Course wrap-up and final exam review. Slides.
5/12 Tue Final Exam. 1-3pm. In class. Study guide and review questions.