CS 685, Fall 2021, UMass Amherst
Schedule
Make sure to reload this page to ensure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Fall 2020 offering.
Week 1 (9/1): introduction
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 9/13 on Gradescope
Week 2 (9/8): Language models
Week 3 (9/13-15): Neural LMs, RNNs, backpropagation
- Backpropagation // [video] // [notes]
- See the Fall 2020 video (timestamped here) for the derivation of the gradient through a linear layer
- [reading] Andrej Karpathy's coding-based backpropagation post
- [optional reading] Pascanu et al., ICML 2013 (vanishing gradients in RNNs)
- Quiz 1 released, due 9/20 on Gradescope
Week 4 (9/20-22): Attention mechanisms, Transformers
- Transformer language models // [video] // [slides] // [notes]
- [reading] Vaswani et al., NeurIPS 2017 (paper that introduced Transformers)
- [reading] An easy-to-read blog post on Transformer language models
- Quiz 2 released, due 9/24 on Gradescope
Week 4 (9/27-29): Transfer learning with pretrained LMs
- Transfer learning with Muppets (ELMo and BERT) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP
- Quiz 3 released, due 10/1 on Gradescope
Week 5 (10/5-7): Text-to-text transfer learning, tokenization
- Transfer learning with text-to-text models, and decoding from LMs // [video] // [slides] // [notes]
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
- [reading] Language Models are Few-Shot Learners (Brown et al., NeurIPS 2020, "GPT-3")
- [optional reading] Nucleus sampling (Holtzman et al., ICLR 2020)
- Quiz 4 released, due 10/8 on Gradescope
Week 6 (10/13): Prompt-based learning
- HW 1 released here, due 11/5 on Gradescope & via email
Week 7 (10/18-20): Evaluating text generation, multilingual NLP
- Evaluating text generation models // [video] // [slides]
- [reading] Evaluation of text generation survey (Celikyilmaz et al., 2020)
- [optional reading] Do massively pretrained LMs make better storytellers? (See et al., 2019)
- Multilingual transfer learning // [video] // [slides]
- [reading] Beyond English-centric multilingual machine translation (Fan et al., 2020)
- [reading] MAD-X: Multi-task cross-lingual transfer (Pfeiffer et al., EMNLP 2020)
- Quiz 5 released, due 10/22 on Gradescope
Week 8 (10/25-27): Retrieval-augmented text generation, efficient Transformers
- Retrieval-augmented LMs // [video] // [slides]
- [reading] REALM: retrieval-augmented LMs (Guu et al., 2020)
- [reading] Nearest neighbor machine translation (Khandelwal et al., ICLR 2021)
- [optional reading] Hurdles to progress in long-form QA (Krishna et al., NAACL 2021)
- Efficient / long-range Transformers // [video] // [slides]
- [reading] Survey of Efficient Transformers (Tay et al., 2020)
- [reading] Routing Transformers (Roy et al., TACL 2020)
- [optional reading] Do long-range LMs use long-range context? (Sun et al., EMNLP 2021)
- Quiz 6 released, due 10/29 on Gradescope
Week 9 (11/1-3): Vision & language, midterm review
- Vision + language // [video] // [slides]
- [reading] Visual-semantic alignments for image captioning (Karpathy & Fei-Fei, 2014)
- [reading] Learning... visual models from NL supervision (Radford et al., 2021, "CLIP")
- [reading] NLVR: A corpus of natural language for visual reasoning (Suhr et al., 2017)
Week 10 (11/8-10): Commonsense reasoning, midterm (Mohit out, no live/in-person classes!)
- Midterm (released 11/9, due 11/11 on Gradescope)
Week 11 (11/15-17): Probe tasks, semantic parsing
- Linguistic probe tasks // [video] // [slides]
- [reading] What you can cram into a single $&!#* vector (Conneau et al., 2018)
- [reading] Control probes (Hewitt & Liang, 2019)
- Quiz 7 released, due 11/23 on Gradescope
Week 12 (11/22): Ethics in NLP
Week 13 (11/29, 12/1): Psycholinguistics & parsing
Week 14 (12/6-8): Knowledge distillation & story generation
- Model distillation // [video] // [slides]
- [reading] Imitation attacks on MT systems (Wallace et al., 2020)
- [reading] Thieves on Sesame Street! (Krishna et al., 2020)
- [reading] Lottery ticket hypothesis for BERT (Chen et al., 2020)
- Story generation // [video] // [slides]
- [reading] Hierarchical neural story generation (Fan et al., 2018)
- [reading] STORIUM: Machine-in-the-loop story generation (Akoury et al., 2020)