CS 685, Fall 2021, UMass Amherst

Schedule

Make sure to reload this page to ensure you're seeing the latest version. Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Fall 2020 offering.

Week 1 (9/1): introduction

- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 9/13 on Gradescope

- Final projects:
  - organize into groups by 9/15
  - proposal due 9/24 on Gradescope, use this Overleaf template

Week 2 (9/8): language models

- Language modeling // [video] // [slides]
- [reading] Jurafsky & Martin, 3.1-3.5 (language modeling)
- [reading] Jurafsky & Martin, 7 (neural language models)
- [F2020 video]

Week 3 (9/13-15): neural LMs, RNNs, backpropagation

- Neural language models // [video] // [slides]
- [reading] Neural language models (Bengio et al., 2003)
- [F2020 video]
- [F2020 video] and colab notebook on implementing a simple neural LM

- Backpropagation // [video] // [notes]
- See F2020 video timestamped here for derivation of gradient through linear layer
- [reading] Andrej Karpathy's coding-based backpropagation post
- [optional reading] Pascanu et al., ICML 2013 (vanishing gradients in RNNs)

- Quiz 1 released, due 9/20 on Gradescope

Week 4 (9/20-22): Attention mechanisms, Transformers

- Attention mechanisms & self-attention // [video] // [slides] // [notes]
- [reading] Bahdanau et al., ICLR 2015 (paper that introduced attention)
- [optional reading] An easy-to-read blog post on attention
- [F2020 video]

- Transformer language models // [video] // [slides] // [notes]
- [reading] Vaswani et al., NeurIPS 2017 (paper that introduced Transformers)
- [reading] An easy-to-read blog post on Transformer language models

- Quiz 2 released, due 9/24 on Gradescope

Week 4 (9/27-29): Transfer learning with pretrained LMs

- Transfer learning with Muppets (ELMo and BERT) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP

- Using BERT for downstream tasks, and advanced BERT variants // [video] // [notes]
- [optional reading] RoBERTa: more data (Liu et al., 2019)
- [optional reading] XLNet: longer contexts (Yang et al., 2019)
- [optional reading] ELECTRA: faster training (Clark et al., 2020)
- [optional reading] ALBERT: reduced parameters (Lan et al., 2019)

- Quiz 3 released, due 10/1 on Gradescope

Week 5 (10/5-7): Text-to-text transfer learning, tokenization

- Transfer learning with text-to-text models, and decoding from LMs // [video] // [slides] // [notes]
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
- [reading] Language Models are Few-Shot Learners (Brown et al., NeurIPS 2020, "GPT-3")
- [optional reading] Nucleus sampling (Holtzman et al., ICLR 2020)

- Tokenization // [video] // [slides]
- [reading] Neural Machine Translation... with Subword Units (Sennrich et al., ACL 2016)
- [reading] ByT5: Towards a token-free future... (Xue et al., 2021)
- [optional reading] CANINE... Tokenization-free encoder... (Clark et al., 2021)
- [optional reading] Charformer... (Tay et al., 2021)

- Quiz 4 released, due 10/8 on Gradescope

Week 6 (10/13): Prompt-based learning

- Prompt-based learning // [video] // [slides] // [notes]
- [reading] Pre-train, Prompt, and Predict... (Liu et al., 2021, survey paper)
- [optional reading] Power of Scale for Prompt Tuning (Lester et al., EMNLP 2021)
- [optional reading] Prefix tuning... Prompts for Generation (Li & Liang, ACL 2021)

- HW 1 released here, due 11/5 on Gradescope & via email

Week 7 (10/18-20): Evaluating text generation, multilingual NLP

- Evaluating text generation models // [video] // [slides]
- [reading] Evaluation of text generation survey (Celikyilmaz et al., 2020)
- [optional reading] Do massively pretrained LMs make better storytellers? (See et al., 2019)

- Multilingual transfer learning // [video] // [slides]
- [reading] Beyond English-centric multilingual machine translation (Fan et al., 2020)
- [reading] MAD-X: Multi-task cross-lingual transfer (Pfeiffer et al., EMNLP 2020)

- Quiz 5 released, due 10/22 on Gradescope

Week 8 (10/25-27): Retrieval-augmented text generation, efficient Transformers

- Retrieval-augmented LMs // [video] // [slides]
- [reading] REALM: retrieval-augmented LMs (Guu et al., 2020)
- [reading] Nearest neighbor machine translation (Khandelwal et al., ICLR 2021)
- [optional reading] Hurdles to progress in long-form QA (Krishna et al., NAACL 2021)

- Efficient / long-range Transformers // [video] // [slides]
- [reading] Survey of Efficient Transformers (Tay et al., 2020)
- [reading] Routing Transformers (Roy et al., TACL 2020)
- [optional reading] Do long-range LMs use long-range context? (Sun et al., EMNLP 2021)

- Quiz 6 released, due 10/29 on Gradescope
- Extra credit released, due 12/16 on Gradescope

Week 9 (11/1-3): Vision & language, midterm review

- Vision + language // [video] // [slides]
- [reading] Visual-semantic alignments for image captioning (Karpathy & Fei-Fei, 2014)
- [reading] Learning... visual models from NL supervision (Radford et al., 2021, "CLIP")
- [reading] NLVR: A corpus of natural language for visual reasoning (Suhr et al., 2017)

- Midterm review // [video] // [notes]

Week 10 (11/8-10): Commonsense reasoning, midterm (Mohit out, no live/in-person classes!)

- Commonsense reasoning (pre-recorded guest lecture by Lorraine Li) // [video] // [slides]

- Midterm (released 11/9, due 11/11 on Gradescope)

Week 11 (11/15-17): Probe tasks, semantic parsing

- Linguistic probe tasks // [video] // [slides]
- [reading] What you can cram into a single $&!#* vector (Conneau et al., 2018)
- [reading] Control probes (Hewitt & Liang, 2019)

- Semantic parsing // [video] // [slides]
- [reading] WikiTableQuestions (Pasupat & Liang, 2015)

- Quiz 7 released, due 11/23 on Gradescope

Week 12 (11/22): Ethics in NLP

- Ethics in NLP // [video] // [slides]
- [optional readings] Yulia Tsvetkov's Ethics for NLP class

Week 13 (11/29, 12/1): Psycholinguistics & parsing

- Computational psycholinguistics // [video] // [slides]
- No readings!

- Syntactic parsing // [video] // [slides]
- [reading] Jurafsky & Martin Ch. 13
- [optional reading] Learning syntax from bracketings (Shi et al., NAACL 2021)
- [optional reading] Unsupervised constituency parsing (Drozdov et al., NAACL 2019, "DIORA")

- Final report due Dec 16th, use this Overleaf template

Week 14 (12/6-8): Knowledge distillation & story generation

- Model distillation // [video] // [slides]
- [reading] Imitation attacks on MT systems (Wallace et al., 2020)
- [reading] Thieves on Sesame Street! (Krishna et al., 2020)
- [reading] Lottery ticket hypothesis for BERT (Chen et al., 2020)

- Story generation // [video] // [slides]
- [reading] Hierarchical neural story generation (Fan et al., 2018)
- [reading] STORIUM: Machine-in-the-loop story generation (Akoury et al., 2020)