CS 685, Spring 2023, UMass Amherst
Assignments
- Homework 0 released, due 2/17
- Extra credit talk summaries due by 5/17, use this Overleaf template
- Quiz 1 released, due 2/27
- Final project proposals due 3/8, use this Overleaf template
- Quiz 2 released, due 3/8
- Homework 1 released, due 4/5
- Midterm scheduled for 4/12
- Quiz 3 released, due 4/28
- Quiz 4 released, due 5/12
- Homework 2 released, due 5/13
- Final project reports due 5/17, use this Overleaf template
Schedule
Make sure to reload this page to ensure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Fall 2022 offering.
Week 1 (2/6-8): introduction, language modeling
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 2/17
- Final projects:
Week 2 (2/13-15): neural language models, backpropagation
Week 3 (2/22): attention mechanisms
Week 4 (2/27-3/1): Transformers
- Transformer language models // [video] // [notes]
- [reading] Vaswani et al., NeurIPS 2017 (paper that introduced Transformers)
- [reading] An easy-to-read blog post on Transformer language models
- Transformers (cont'd) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP
Week 5 (3/6-3/8): Transformers (cont'd)
- BERT // [video] // [notes]
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
Week 6 (3/20-3/22): Using large language models
Week 7 (3/27-3/29): Aligning large language models to human preferences
- Instruction tuning + reinforcement learning from human feedback // [video] // [notes]
- [reading] Instruction tuning (Wei et al., 2022, FLAN)
- [reading] Reinforcement learning from human feedback (Ouyang et al., 2022, RLHF)
- Tokenization // [video] // [slides]
- [reading] Neural Machine Translation... with Subword Units (Sennrich et al., ACL 2016)
- [reading] ByT5: Towards a token-free future... (Xue et al., 2021)
Week 8 (4/3-4/5): Evaluating language generation, and prompt engineering
- Evaluating text generation models // [video] // [slides]
- [reading] Evaluation of text generation survey (Celikyilmaz et al., 2020)
- [reading] BLEURT: robust metrics for text generation (Sellam et al., 2020)
- [optional reading] Do massively pretrained LMs make better storytellers? (See et al., 2019)
Week 9 (4/10-12): Midterm exam
- No class on 4/12 due to midterm
Week 10 (4/19): Security risks with LLMs
- Model extraction & other attacks // [video] // [slides]
- [reading] Imitation attacks on MT systems (Wallace et al., 2020)
- [reading] Thieves on Sesame Street! (Krishna et al., 2020)
- [reading] Paraphrasing attacks on AI-generated text detectors (Krishna et al., 2023)
Week 11 (4/24-26): Scaling laws and probe tasks
- Scaling laws of LLMs // [video] // [slides]
- [reading] Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- [reading] Training Compute-Optimal Large Language Models (Hoffmann et al., 2022)
- Linguistic probe tasks // [video] // [slides]
- [reading] What you can cram into a single $&!#* vector (Conneau et al., 2018)
- [reading] Control probes (Hewitt & Liang, 2019)
- [reading] Measuring / manipulating knowledge representations in LMs (Hernandez et al., 2023)
Week 12 (5/1-3): Understanding in-context learning, and using it for translation
- Why does in-context learning work? // [video] // [slides]
- [reading] Rethinking the role of demonstrations (Min et al., 2022)
- [reading] What learning algorithm is in-context learning? (Akyürek et al., 2023)
Week 13 (5/8-10): Multilingual LMs, ethics & NLP
Week 14 (5/17): Vision & language
- Multimodal language modeling // [video] // [slides]
- [reading] Visual-semantic alignments for image captioning (Karpathy & Fei-Fei, 2014)
- [reading] Learning... visual models from NL supervision (Radford et al., 2021, "CLIP")
- [reading] An image is worth 16x16 words (Dosovitskiy et al., 2021)