CS 685, Spring 2024, UMass Amherst
Assignments
- Homework 0 released, due 2/16
- Quiz 1 released, due 2/19
- Quiz 2 released, due 3/1
- Final project proposals due 3/8, use this Overleaf template
- Quiz 3 released, due 3/15
- Homework 1 released, due 3/15
- Homework 2 released, due 4/8
- Quiz 4 released, due 5/1
- Quiz 5 released, due 5/10
- Extra credit talk summaries due by 5/17, use this Overleaf template
- Final project reports due 5/17, use this Overleaf template
Schedule
Make sure to reload this page to ensure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Spring 2023 offering.
Week 1 (2/5-7): introduction, language modeling
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 2/16
- Final projects: proposals due 3/8 (see Assignments above)
Week 2 (2/12-14): neural language models, backpropagation
Week 3 (2/21-22): attention mechanisms and Transformers
Week 4 (2/26-28): Transformers (cont'd): architecture, pretrain/finetune
- Transformers (cont'd) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP
- BERT + Instruction tuning // [video] // [notes]
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
- [reading] Instruction tuning (Wei et al., 2022, FLAN)
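A minimal sketch (not from the course materials) of the masked-language-modeling objective behind BERT, using the Hugging Face transformers fill-mask pipeline; the model name and printed fields are standard, but check them against your installed library version:

```python
# Illustrative only: query a pretrained BERT masked LM (pip install transformers torch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the token hidden behind [MASK] from bidirectional context.
for pred in fill_mask("The course covers [MASK] language models."):
    print(f"{pred['token_str']:>12}  p={pred['score']:.3f}")
```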
Week 5 (3/4-6): Tokenization and efficient fine-tuning
- Tokenization & T5 // [video] // [slides] // [notes]
- [reading] Neural Machine Translation... with Subword Units (Sennrich et al., ACL 2016)
- [reading] ByT5: Towards a token-free future... (Xue et al., 2021)
- Parameter-efficient adaptation // [video] // [notes]
- [reading] Power of Scale for Prompt Tuning (Lester et al., EMNLP 2021)
- [reading] LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
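To make the LoRA reading concrete, here is a toy numpy sketch of the low-rank update idea (dimensions and scaling values are made up for illustration, not taken from the paper's released code):

```python
# Sketch of LoRA (Hu et al., 2021): freeze the pretrained weight W and learn a
# low-rank correction B @ A instead of updating W itself. Toy example, numpy only.
import numpy as np

d, r, alpha = 768, 8, 16            # hidden size, LoRA rank, scaling (illustrative values)
W = np.random.randn(d, d)           # frozen pretrained weight
A = np.random.randn(r, d) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                # starts at zero, so the update is initially a no-op

x = np.random.randn(d)
h = W @ x + (alpha / r) * (B @ (A @ x))   # forward pass with the low-rank correction

# Only A and B are trained: 2*d*r parameters instead of d*d.
print(f"trainable: {A.size + B.size:,} params vs. full fine-tuning: {W.size:,}")
```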
Week 6 (3/11-13): LLM alignment
Week 7 (3/27): Decoding from language models
- No class Monday 3/25 (Mohit traveling)
Week 8 (4/1-4/3): Prompt engineering and evaluation
- Evaluating text generation models // [video] // [slides]
- [reading] BLEURT: robust metrics for text generation (Sellam et al., 2020)
- [reading] Judging LLM-as-a-Judge (MT-Bench, Zheng et al., NeurIPS 2023)
- [reading] FactScore: Fine-grained atomic evaluation of factual precision (Min et al., EMNLP 2023)
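A rough sketch of the pairwise LLM-as-a-judge setup from the MT-Bench reading; call_judge_llm is a hypothetical placeholder for whatever LLM API you use, and the prompt wording is mine, not the paper's:

```python
# Pairwise judging sketch (illustrative). Scoring both orderings is one way to
# mitigate the position bias discussed in the MT-Bench paper.
def build_pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are an impartial judge. Compare the two assistant answers below.\n"
        f"[Question]\n{question}\n\n"
        f"[Answer A]\n{answer_a}\n\n[Answer B]\n{answer_b}\n\n"
        "Reply with exactly one of: A, B, tie."
    )

def judge_pair(question, answer_a, answer_b, call_judge_llm):
    first = call_judge_llm(build_pairwise_prompt(question, answer_a, answer_b))
    second = call_judge_llm(build_pairwise_prompt(question, answer_b, answer_a))
    if first == "A" and second == "B":
        return "A wins"
    if first == "B" and second == "A":
        return "B wins"
    return "tie / inconsistent"
```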
Week 9 (4/8-4/12): Scaling LLMs, midterm review
- Position embeddings + efficient attention // [video] // [notes]
- [reading] Rotary position embeddings (RoPE, Su et al., 2021)
- [reading] Ring attention (Liu et al., 2023)
- [reading] Flash attention (Dao et al., 2022)
- Scaling laws of LLMs // [video] // [slides] // [notes]
- [reading] Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- [reading] Training Compute-Optimal Large Language Models (Hoffmann et al., 2022)
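A back-of-the-envelope sketch of the compute-optimal trade-off from the Chinchilla reading, using the standard C ≈ 6ND approximation (FLOPs ≈ 6 × parameters × training tokens) and the commonly quoted ~20 tokens-per-parameter rule of thumb; the numbers are illustrative, not the paper's fitted constants:

```python
# Given a training compute budget C (FLOPs), estimate a roughly compute-optimal
# model size N and token count D under C = 6*N*D and D ≈ 20*N.
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(1e23)  # e.g. a 1e23-FLOP budget
print(f"~{n/1e9:.0f}B parameters trained on ~{d/1e12:.2f}T tokens")
```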
Week 10 (4/22-4/24): Vision-language models, understanding in-context learning
- Vision-language models // [video] // [slides]
- [reading] Learning... visual models from NL supervision (Radford et al., 2021, "CLIP")
- [reading] An image is worth 16x16 words (Dosovitskiy et al., 2021)
- [reading] Breaking resolution curse of VLMs (HuggingFace blogpost)
- Understanding in-context learning // [video] // [slides]
- [reading] Rethinking the role of demonstrations (Min et al., 2022)
- [reading] What learning algorithm is in-context learning? (Akyürek et al., 2023)
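A toy sketch of what in-context learning means operationally: the model is conditioned on a few labeled demonstrations in the prompt and is never gradient-updated. The task, demonstrations, and labels below are made up; the Min et al. reading examines how much the correctness of such demonstration labels actually matters.

```python
# Build a few-shot prompt from (text, label) demonstrations; an LLM would then
# complete the final "Sentiment:" line. Purely illustrative.
def build_icl_prompt(demos, query):
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demos = [
    ("A delightful, well-paced film.", "positive"),
    ("Flat characters and a predictable plot.", "negative"),
]
print(build_icl_prompt(demos, "Surprisingly moving from start to finish."))
```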
Week 11 (4/29-5/1): LLM security, probing LLMs
- LLM detection & security risks // [video] // [slides]
- [reading] A watermark for LLMs (Kirchenbauer et al., ICML 2023)
- [reading] Paraphrasing attacks on AI-generated text detectors (Krishna et al., NeurIPS 2023)
- Interpretability: probing, editing, induction heads // [video] // [slides]
- [reading] Control probes (Hewitt & Liang et al., 2019)
- [reading] Measuring / manipulating knowledge representations in LMs (Hernandez et al., 2023)
- [reading] In-context learning and induction heads (Olsson et al., 2022)
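A toy sketch of the probing setup behind the control-probe reading: train a small classifier on frozen LM hidden states to test what they encode (a control task with shuffled labels would then measure the probe's selectivity). The features and labels below are random stand-ins, not real representations or annotations:

```python
# Linear probe sketch (illustrative; requires scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((1000, 64))   # stand-in for frozen LM representations
pos_tags = rng.integers(0, 5, size=1000)          # stand-in for part-of-speech labels

probe = LogisticRegression(max_iter=1000).fit(hidden_states[:800], pos_tags[:800])
print("probe accuracy:", probe.score(hidden_states[800:], pos_tags[800:]))
# With real representations, compare against a shuffled-label control task.
```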
Week 12 (5/6): Mamba/Griffin, no class 5/8
- Beyond Transformers: Mamba & Griffin // [video] // [slides]
- [reading] Mamba: Linear-Time Sequence Modeling with SSMs (Gu & Dao, 2023)
- [reading] Griffin: Mixing Gated Linear Recurrences with Local Attention... (De et al., 2024)
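A toy sketch of the linear recurrence at the core of the Mamba and Griffin readings: the state is updated as h_t = a_t * h_{t-1} + b_t * x_t with output y_t = c * h_t, so generation cost grows linearly in sequence length rather than quadratically as with attention. Everything here is scalar and made up; in Mamba the (a, b) terms are input-dependent ("selective") and the scan is computed in parallel:

```python
# Minimal sequential scan over a 1-D state-space recurrence (illustrative).
import numpy as np

def selective_scan(x, a, b, c=1.0):
    """x, a, b: length-T arrays; in a real model a and b would be computed from x."""
    h, ys = 0.0, []
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]   # linear state update; no attention over past tokens
        ys.append(c * h)
    return np.array(ys)

T = 8
x = np.random.randn(T)
a = np.full(T, 0.9)   # decay / forget term; input-dependent in Mamba and Griffin
b = np.ones(T)
print(selective_scan(x, a, b))
```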