CS 685, Fall 2020, UMass Amherst
Schedule
Reload this page to make sure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos.
Week 1 (8/24-28): introduction, language models, representation learning
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- Language modeling // [video] // [slides]
- [reading] Jurafsky & Martin, 3.1-3.5 (language modeling)
- [reading] Jurafsky & Martin, 7 (neural language models)
- HW 0 released here, due 9/4 on Gradescope
Week 2 (8/31-9/4): neural LMs, RNNs, backpropagation
Week 3 (9/7-11): attention mechanisms
- Quiz 2 released, due 9/18 on Gradescope
Week 4 (9/14-18): Transformers, transfer learning
- Transformers and sequence-to-sequence models // [video] // [slides] // [notes]
- [reading] An easy-to-read blog post on Transformers
- [reading] "Attention is All You Need": Transformers research paper (Vaswani et al., 2017)
- Transfer learning via neural language models // [stream] // [slides] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] Easy-to-read blog post on transfer learning in NLP
- Quiz 3 released, due 9/25 on Gradescope
Week 5 (9/21-25): BERT and how to use it for downstream tasks
- Question answering // [stream] // [slides]
- [reading] SQuAD: 100,000+ Questions for Machine Comprehension of Text (Rajpurkar et al., 2016)
- [reading] ELI5: Long Form Question Answering (Fan et al., 2019)
Week 6 (9/28-10/2): further improving transfer learning in NLP
- Quiz 4 released, due 10/9 on Gradescope
Week 7 (10/5-9): improving text generation
- Brute-force scaling of language models // [video] // [slides]
- [reading] Language models are few-shot learners: GPT-3 (Brown et al., 2020)
- [reading] Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (Bender & Koller, 2020)
- [optional reading] Julian Michael's blog post on the Octopus Test
- [optional reading] Chris Potts' article on GPT-3 & the Bender & Koller paper
- Evaluating text generation models // [video] // [slides]
- [reading] BLEURT: robust metrics for text generation (Sellam et al., 2020)
- [reading] Do massively pretrained LMs make better storytellers? (See et al., 2019)
- Quiz 5 released, due 10/16 on Gradescope
Week 8 (10/12-16): data augmentation and collection
- Paraphrase generation // [video] // [slides]
- [reading] Neural syntactic preordering for paraphrase generation (Goyal & Durrett, 2020)
- [reading] Adversarial examples via paraphrasing (Iyyer et al., 2018)
- Crowdsourced data collection // [video] // [slides]
- [reading] Annotation artifacts in NLI (Gururangan et al., 2018)
- [reading] Adversarial Examples for SQuAD (Jia & Liang, 2017)
Week 9 (10/19-23): model distillation and retrieval-augmented LMs
- Model distillation // [video] // [slides]
- [reading] Imitation attacks on MT systems (Wallace et al., 2020)
- [reading] Thieves on Sesame Street! (Krishna et al., 2020)
- [reading] Lottery ticket hypothesis for BERT (Chen et al., 2020)
- [reading] Layer dropout for Transformers (Fan et al., 2019)
- Retrieval-augmented LMs // [video] // [slides]
- [reading] REALM: retrieval-augmented LMs (Guu et al., 2020)
- [reading] Nearest neighbor machine translation (Khandelwal et al., 2020)
Week 10 (10/26-30): Transformer implementation, vision + language
- Vision + language // [video] // [slides]
- [reading] CEREALBAR: executing instructions in situated collaborative interactions (Suhr et al., 2019)
- [reading] Visual-semantic alignments for image captioning (Karpathy & Fei-Fei, 2014)
- [reading] NLVR: A corpus of natural language for visual reasoning (Suhr et al., 2017)
Week 11 (11/2-6): exam week!
- No class Wed 11/4, prepare for your exam!
Week 12 (11/9-13): ethics and probe tasks
- Linguistic probe tasks // [video] // [slides]
- [reading] What you can cram into a single $&!#* vector (Conneau et al., 2018)
- [reading] Control tasks for probes (Hewitt & Liang, 2019)
- Quiz 6 released, due 11/20 on Gradescope
Week 13 (11/16-20): semantic parsing and commonsense reasoning