CS 685, Spring 2024, UMass Amherst
Assignments
Schedule
Reload this page to ensure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Spring 2023 offering.
Week 1 (2/5-7): introduction, language modeling
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 2/16
- Final projects:
Week 2 (2/12-14): neural language models, backpropagation
Week 3 (2/21-22): attention mechanisms and Transformers
Week 4 (2/26-28): Transformers (cont'd): architecture, pretrain/finetune
- Transformers (cont'd) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP
- BERT + Instruction tuning // [video] // [notes]
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
- [reading] Instruction tuning (Wei et al., 2022, FLAN)
Week 5 (3/4-6): Tokenization and efficient fine-tuning
- Tokenization & T5 // [video] // [slides] // [notes]
- [reading] Neural Machine Translation... with Subword Units (Sennrich et al., ACL 2016)
- [reading] ByT5: Towards a token-free future... (Xue et al., 2021)
- Parameter-efficient adaptation // [video] // [notes]
- [reading] Power of Scale for Prompt Tuning (Lester et al., EMNLP 2021)
- [reading] LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
Week 6 (3/11-13): LLM alignment
Week 7 (3/27): Decoding from language models
- No class Monday 3/25 (Mohit traveling)