CS 685, Spring 2023, UMass Amherst
Assignments
- Homework 0 released, due 2/17
- Extra credit talk summaries due by 5/17, use this Overleaf template
- Quiz 1 released, due 2/27
- Final project proposals due 3/8, use this Overleaf template
- Quiz 2 released, due 3/8
- Homework 1 released, due 4/5
- Midterm scheduled for 4/12
- Quiz 3 released, due 4/28
- Quiz 4 released, due 5/12
- Homework 2 released, due 5/13
- Final project reports due 5/17, use this Overleaf template
Schedule
Make sure to reload this page to ensure you're seeing the latest version.
Readings should be done before watching the corresponding lecture videos. See this page for materials (videos / slides / reading) from the Fall 2022 offering.
Week 1 (2/6-8): introduction, language modeling
- Course introduction // [video] // [slides]
- No associated readings or weekly quiz!
- HW 0 released here, due 2/17
- Final projects:
Week 2 (2/13-15): neural language models, backpropagation
Week 3 (2/22): attention mechanisms
Week 4 (2/27-3/1): Transformers
- Transformer language models // [video] // [notes]
- [reading] Vaswani et al., NeurIPS 2017 (paper that introduced Transformers)
- [reading] An easy-to-read blog post on Transformer language models
- Transformers (cont'd) // [video] // [notes]
- [reading] Deep contextualized word representations (Peters et al., 2018, "ELMo")
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Easy-to-read blog post on transfer learning in NLP
Week 5 (3/6-3/8): Transformers (cont'd)
- BERT // [video] // [notes]
- [reading] BERT: Pre-training of Deep Bidirectional Transformers... (Devlin et al., 2019)
- [reading] Exploring the Limits of Transfer Learning... (Raffel et al., JMLR 2020, "T5")
Week 6 (3/20-3/22): Using large language models
Week 7 (3/27-3/29): Aligning large language models to human preferences
- Instruction tuning + reinforcement learning from human feedback // [video] // [notes]
- [reading] Instruction tuning (Wei et al., 2022, FLAN)
- [reading] Reinforcement learning from human feedback (Ouyang et al., 2022, RLHF)
- Tokenization // [video] // [slides]
- [reading] Neural Machine Translation... with Subword Units (Sennrich et al., ACL 2016)
- [reading] ByT5: Towards a token-free future... (Xue et al., 2021)
Week 8 (4/3-4/5): Evaluating language generation, and prompt engineering
- Evaluating text generation models // [video] // [slides]
- [reading] Evaluation of text generation survey (Celikyilmaz et al., 2020)
- [reading] BLEURT: robust metrics for text generation (Sellam et al., 2020)
- [optional reading] Do massively pretrained LMs make better storytellers? (See et al., 2019)
Week 9 (4/10-12): Midterm exam
- No class on 4/12 due to midterm
Week 10 (4/19): Security risks with LLMs
- Model extraction & other attacks // [video] // [slides]
- [reading] Imitation attacks on MT systems (Wallace et al., 2020)
- [reading] Thieves on Sesame Street! (Krishna et al., 2020)
- [reading] Paraphrasing attacks on AI-generated text detectors (Krishna et al., 2023)
Week 11 (4/24-26): Scaling laws and probe tasks
- Scaling laws of LLMs // [video] // [slides]
- [reading] Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- [reading] Training Compute-Optimal Large Language Models (Hoffmann et al., 2022)
- Linguistic probe tasks // [video] // [slides]
- [reading] What you can cram into a single $&!#* vector (Conneau et al., 2018)
- [reading] Control probes (Hewitt & Liang, 2019)
- [reading] Measuring / manipulating knowledge representations in LMs (Hernandez et al., 2023)
Week 12 (5/1-3): Understanding in-context learning, and using it for translation
- Why does in-context learning work? // [video] // [slides]
- [reading] Rethinking the role of demonstrations (Min et al., 2022)
- [reading] What learning algorithm is in-context learning? (Akyürek et al., 2023)
Week 13 (5/8-10): Multilingual LMs, ethics & NLP
Week 14 (5/17): Vision & language
- Multimodal language modeling // [video] // [slides]
- [reading] Visual-semantic alignments for image captioning (Karpathy & Fei-Fei, 2014)
- [reading] Learning... visual models from NL supervision (Radford et al., 2021, "CLIP")
- [reading] An image is worth 16x16 words (Dosovitskiy et al., 2021)