Homework assignments
See Gradescope for up-to-date due dates and other details.
Course topics outline
- Words and Classification Models
- Datasets and Classification Applications
- Linguistic Structure
- Embeddings and Neural Language Models
- Pre-trained LMs, BERT, generative LMs
- NLP Applications
Homeworks
Schedule
Make sure to reload this page to ensure you’re seeing the latest version.
Readings should be done before the indicated class.
The main textbook is available free online:
- JM = Jurafsky and Martin, Speech and Language Processing, 3rd edition. (Main textbook.)
- INLP = Eisenstein, Intro to NLP, online text (or book version; we'll have readings from this one occasionally, and it's recommended in general.)
- See the class survey exercise on Gradescope.
Thu 9/7: class cancelled
- Instructor Zoom office hours will be held instead.
- Reading: JM ch. 5
- Exercise: Logistic regression example.
Make-up version: create an example document with at least 4 features and a given weight vector.
Calculate the probability of the y=1 class as per slide 12, showing your work. Submit on Gradescope.
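As a rough illustration of the kind of calculation involved, here is a minimal sketch. It assumes the standard binary logistic regression formula from JM ch. 5, P(y=1|x) = sigmoid(w·x + b); slide 12 is not reproduced here, so check that it matches. The feature names, feature values, weights, and bias below are all hypothetical and chosen only for the example.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_positive(features, weights, bias=0.0):
    """P(y=1 | x) = sigmoid(w . x + b) for sparse feature dicts."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)

# Hypothetical document with 4 features (not from the course materials).
features = {"count_good": 2, "count_bad": 1, "has_exclaim": 1, "log_length": 3.0}
# Hypothetical weight vector; the bias is assumed to be 0.
weights = {"count_good": 1.0, "count_bad": -1.5, "has_exclaim": 0.5, "log_length": 0.0}

# Show the work: z = 2*1.0 + 1*(-1.5) + 1*0.5 + 3.0*0.0 = 1.0
p = prob_positive(features, weights)
print(round(p, 3))  # 0.731
```

Writing out the dot product term by term, as in the comment above, is the "showing your work" part; the code only checks the arithmetic.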
Tue 10/10: no class
- Reading: INLP 9.2.
Other sections in Chapter 9 are optional but may be of interest.
- Exercise: English CFG
- Reading: JM 18.1-18.2 (Dependency relations, Transition-based parsers; can skip neural for now)
Tue 10/31: Word embeddings
Thu 11/2: Word embeddings / Midterm review
Tue 11/7: In-class midterm
Thu 11/9: No class
- Extra credit exercise: visit one of the poster sessions at TADA 2023, on either late Thursday afternoon or Friday morning. Take notes on a poster and write a reflection of at least two paragraphs: one paragraph summarizing the work (include the main research question, what they did, and what they found), and one paragraph with your questions about the work or future work, and any other thoughts. Submit via Gradescope.
Thu 11/23: no class
Some resources:
- Blog post: The Illustrated GPT-2 (Alammar, 2018)
- T5 (Raffel et al., JMLR 2020)
- Nucleus sampling (Holtzman et al., ICLR 2020)
- The GPT* line of work consists of non-peer-reviewed tech reports with progressively less technical detail: for example, GPT-2 (Radford et al. 2019), GPT-3 (Brown et al. 2020), and instruction tuning thereof (Ouyang et al. 2022), the last of which was used in ChatGPT. To keep the various GPT variants straight, consider the Wikipedia page.
- Recent (2023) prompt engineering resources: Prompt Engineering Guide (promptingguide.ai), Prompt Engineering (Lilian Weng)
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Bender et al., FAccT 2021)
Note: HW4 is due this weekend (link at the top of the page).
Tue 12/5, Thu 12/7: Final presentations
- Project presentations will be in class this week. See instructions in the Google Sheets slide decks.
- Day 1 program, Day 2 program
- Exercise, due Friday: project presentation writeup
Final project reports are due by the last day of the final exam period (Dec 15; see project page). Late days cannot be used.