CS 685, Spring 2024, UMass Amherst
Assignments
- Homework 0 released, due 2/16
- Quiz 1 released, due 2/19
- Quiz 2 released, due 3/1
- Final project proposals due 3/8, use this Overleaf template
- Quiz 3 released, due 3/15
- Homework 1 released, due 3/15
- Homework 2 released, due 4/8
- Quiz 4 released, due 5/1
- Quiz 5 released, due 5/10
- Extra credit talk summaries due by 5/17, use this Overleaf template
- Final project reports due 5/17, use this Overleaf template
Schedule
Make sure to reload this page to ensure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Spring 2023 offering.
Week 1 (2/5-7): introduction, language modeling
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 2/16
- Final projects: proposals due 3/8 (see Assignments above)
Week 2 (2/12-14): neural language models, backpropagation
Week 3 (2/21-22): attention mechanisms and Transformers
Week 4 (2/26-28): Transformers (cont'd): architecture, pretrain/finetune
- Transformers (cont'd) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP
- BERT + Instruction tuning // [video] // [notes]
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
- [reading] Instruction tuning (Wei et al., 2022, FLAN)
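A minimal sketch (not from the course materials) of the masked-language-modeling objective behind BERT, using the Hugging Face transformers fill-mask pipeline; the model name and printed fields are standard, but check them against your installed library version:

```python
# Illustrative only: query a pretrained BERT masked LM (pip install transformers torch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the token hidden behind [MASK] from bidirectional context.
for pred in fill_mask("The course covers [MASK] language models."):
    print(f"{pred['token_str']:>12}  p={pred['score']:.3f}")
```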
Week 5 (3/4-6): Tokenization and efficient fine-tuning
- Tokenization & T5 // [video] // [slides] // [notes]
- [reading] Neural Machine Translation... with Subword Units (Sennrich et al., ACL 2016)
- [reading] ByT5: Towards a token-free future... (Xue et al., 2021)
- Parameter-efficient adaptation // [video] // [notes]
- [reading] Power of Scale for Prompt Tuning (Lester et al., EMNLP 2021)
- [reading] LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
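To make the LoRA reading concrete, here is a toy numpy sketch of the low-rank update idea (dimensions and scaling values are made up for illustration, not taken from the paper's released code):

```python
# Sketch of LoRA (Hu et al., 2021): freeze the pretrained weight W and learn a
# low-rank correction B @ A instead of updating W itself. Toy example, numpy only.
import numpy as np

d, r, alpha = 768, 8, 16            # hidden size, LoRA rank, scaling (illustrative values)
W = np.random.randn(d, d)           # frozen pretrained weight
A = np.random.randn(r, d) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                # starts at zero, so the update is initially a no-op

x = np.random.randn(d)
h = W @ x + (alpha / r) * (B @ (A @ x))   # forward pass with the low-rank correction

# Only A and B are trained: 2*d*r parameters instead of d*d.
print(f"trainable: {A.size + B.size:,} params vs. full fine-tuning: {W.size:,}")
```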
Week 6 (3/11-13): LLM alignment
Week 7 (3/27): Decoding from language models
- No class Monday 3/25 (Mohit traveling)
Week 8 (4/1-4/3): Prompt engineering and evaluation
- Evaluating text generation models // [video] // [slides]
- [reading] BLEURT: robust metrics for text generation (Sellam et al., 2020)
- [reading] Judging LLM-as-a-Judge (MT-Bench, Zheng et al., NeurIPS 2023)
- [reading] FactScore: Fine-grained atomic evaluation of factual precision (Min et al., EMNLP 2023)
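A rough sketch of the pairwise LLM-as-a-judge setup from the MT-Bench reading; call_judge_llm is a hypothetical placeholder for whatever LLM API you use, and the prompt wording is mine, not the paper's:

```python
# Pairwise judging sketch (illustrative). Scoring both orderings is one way to
# mitigate the position bias discussed in the MT-Bench paper.
def build_pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are an impartial judge. Compare the two assistant answers below.\n"
        f"[Question]\n{question}\n\n"
        f"[Answer A]\n{answer_a}\n\n[Answer B]\n{answer_b}\n\n"
        "Reply with exactly one of: A, B, tie."
    )

def judge_pair(question, answer_a, answer_b, call_judge_llm):
    first = call_judge_llm(build_pairwise_prompt(question, answer_a, answer_b))
    second = call_judge_llm(build_pairwise_prompt(question, answer_b, answer_a))
    if first == "A" and second == "B":
        return "A wins"
    if first == "B" and second == "A":
        return "B wins"
    return "tie / inconsistent"
```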
Week 9 (4/8-4/12): Scaling LLMs, midterm review
- Position embeddings + efficient attention // [video] // [notes]
- [reading] Rotary position embeddings (RoPE, Su et al., 2021)
- [reading] Ring attention (Liu et al., 2023)
- [reading] Flash attention (Dao et al., 2022)
- Scaling laws of LLMs // [video] // [slides] // [notes]
- [reading] Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- [reading] Training Compute-Optimal Large Language Models (Hoffmann et al., 2022)
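A back-of-the-envelope sketch of the compute-optimal trade-off from the Chinchilla reading, using the standard C ≈ 6ND approximation (FLOPs ≈ 6 × parameters × training tokens) and the commonly quoted ~20 tokens-per-parameter rule of thumb; the numbers are illustrative, not the paper's fitted constants:

```python
# Given a training compute budget C (FLOPs), estimate a roughly compute-optimal
# model size N and token count D under C = 6*N*D and D ≈ 20*N.
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(1e23)  # e.g. a 1e23-FLOP budget
print(f"~{n/1e9:.0f}B parameters trained on ~{d/1e12:.2f}T tokens")
```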
Week 10 (4/22-4/24): Vision-language models, understanding in-context learning
- Vision-language models // [video] // [slides]
- [reading] Learning... visual models from NL supervision (Radford et al., 2021, "CLIP")
- [reading] An image is worth 16x16 words (Dosovitskiy et al., 2021)
- [reading] Breaking resolution curse of VLMs (HuggingFace blogpost)
- Understanding in-context learning // [video] // [slides]
- [reading] Rethinking the role of demonstrations (Min et al., 2022)
- [reading] What learning algorithm is in-context learning? (Akyürek et al., 2023)
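A toy sketch of what in-context learning means operationally: the model is conditioned on a few labeled demonstrations in the prompt and is never gradient-updated. The task, demonstrations, and labels below are made up; the Min et al. reading examines how much the correctness of such demonstration labels actually matters.

```python
# Build a few-shot prompt from (text, label) demonstrations; an LLM would then
# complete the final "Sentiment:" line. Purely illustrative.
def build_icl_prompt(demos, query):
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demos = [
    ("A delightful, well-paced film.", "positive"),
    ("Flat characters and a predictable plot.", "negative"),
]
print(build_icl_prompt(demos, "Surprisingly moving from start to finish."))
```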
Week 11 (4/29-5/1): LLM security, probing LLMs
- LLM detection & security risks // [video] // [slides]
- [reading] A watermark for LLMs (Kirchenbauer et al., ICML 2023)
- [reading] Paraphrasing attacks on AI-generated text detectors (Krishna et al., NeurIPS 2023)
- Interpretability: probing, editing, induction heads // [video] // [slides]
- [reading] Control probes (Hewitt & Liang et al., 2019)
- [reading] Measuring / manipulating knowledge representations in LMs (Hernandez et al., 2023)
- [reading] In-context learning and induction heads (Olsson et al., 2022)
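A toy sketch of the probing setup behind the control-probe reading: train a small classifier on frozen LM hidden states to test what they encode (a control task with shuffled labels would then measure the probe's selectivity). The features and labels below are random stand-ins, not real representations or annotations:

```python
# Linear probe sketch (illustrative; requires scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((1000, 64))   # stand-in for frozen LM representations
pos_tags = rng.integers(0, 5, size=1000)          # stand-in for part-of-speech labels

probe = LogisticRegression(max_iter=1000).fit(hidden_states[:800], pos_tags[:800])
print("probe accuracy:", probe.score(hidden_states[800:], pos_tags[800:]))
# With real representations, compare against a shuffled-label control task.
```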
Week 12 (5/6): Mamba/Griffin, no class 5/8
- Beyond Transformers: Mamba & Griffin // [video] // [slides]
- [reading] Mamba: Linear-Time Sequence Modeling with SSMs (Gu & Dao, 2023)
- [reading] Griffin: Mixing Gated Linear Recurrences with Local Attention... (De et al., 2024)
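A toy sketch of the linear recurrence at the core of the Mamba and Griffin readings: the state is updated as h_t = a_t * h_{t-1} + b_t * x_t with output y_t = c * h_t, so generation cost grows linearly in sequence length rather than quadratically as with attention. Everything here is scalar and made up; in Mamba the (a, b) terms are input-dependent ("selective") and the scan is computed in parallel:

```python
# Minimal sequential scan over a 1-D state-space recurrence (illustrative).
import numpy as np

def selective_scan(x, a, b, c=1.0):
    """x, a, b: length-T arrays; in a real model a and b would be computed from x."""
    h, ys = 0.0, []
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]   # linear state update; no attention over past tokens
        ys.append(c * h)
    return np.array(ys)

T = 8
x = np.random.randn(T)
a = np.full(T, 0.9)   # decay / forget term; input-dependent in Mamba and Griffin
b = np.ones(T)
print(selective_scan(x, a, b))
```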