[Intro to NLP, CMPSCI 585, Fall 2014]


Schedule

This schedule is provisional and will change as the semester progresses. Readings that are not linked here are available on the Piazza site. (I initially put some up on the Moodle site but some people had problems accessing them there.)

Keynote ’09 versions of the slides are available by changing “.pdf” to “.key” in the URL.

Date Topic/Readings Homework
1 Tue 9/2 Introduction [slides]

Optional reading: J&M Intro, ch 1

2 Thu 9/4 Word Counting, Probability Review [slides] [IPython demo HTML] [IPython demo .ipynb] [IPython install notes]

Reading: J&M ch 4.1-4.7

Exercise1 due
3 Tue 9/9 Language Models [slides] [extra notes]; next lecture has a few slides too

Reading: J&M ch 4.1-4.7

For PS1: Download [ps1.ipynb], load it in IPython Notebook, record your work there, and submit your finished .ipynb file. For viewing convenience only, here is an HTML version of the file: [ps1.html]. See also the IPython install notes.

PS1 out: [ps1.ipynb], [ps1.html]
4 Thu 9/11 Machine Translation (Model 1 and EM) [slides] [note on model1 and em]

Readings. In this course we cover only Model 1 in technical depth, and only the high-level aspects of the other IBM models.

5 Tue 9/16 Machine Translation [slides]

Reading: J&M, 25.1-25.9 (MT chapter)

Exercise3 due
6 Thu 9/18 Classification: Naïve Bayes [slides]

Reading: J&M 3rd ed draft, ch 6 (here).

Optional: to better understand the relationship between Naïve Bayes and logistic regression, see Mitchell’s draft chapter here

PS1 due (Friday 11:59pm)
7 Tue 9/23 Classification: Logistic Regression [slides]

J&M 3ed ch6 covers this, but these notes/tutorials on logreg, linear classifiers, and log-linear models may also be helpful:

  • Hastie ESL book, chapter 4, especially 4.4-4.5. (free online). Covers binary logistic regression especially well.
  • Collins notes on log-linear classifiers. (no exercises). Uses the log-linear notation, which we will prefer to use for multiclass outputs.
  • Murphy MLPP book, chapter 8. Exercises 8.6, 8.7 are nice.
  • Eisenstein GTNLP notes, chapter 5, especially 5.4-5.6. (no exercises).
  • Videos: Ng’s Coursera lectures, week 3, on binary logistic regression.
Wed 9/24

Exercise4 due online
8 Thu 9/25 Multiclass/Log-Linear Models, Evaluation [slides], [softmax demo html] [softmax demo ipynb]

Bring Exercise4 to class
9 Tue 9/30 Finite-State Automata, Regexes [slides]

J&M 2ed, chapter 2

10 Thu 10/2 Finite-State Transducers, Morphology [slides]

J&M 2ed, 3.1-3.8

Exercise5 due
11 Tue 10/7 Part-of-speech Tagging [slides]

J&M 3ed, 7.1-7.4 (HMMs) and J&M 3ed 8.1-8.4 (POS/HMM)

Exercise6 due
12 Thu 10/9 Sequence Models: HMM and Viterbi [notes]

Mon 10/13 PS2 due at midnight. Starter zip is [ps2.zip]. For viewing convenience, here’s also part1 as html and part2 as html.

Tue 10/14

No class (Monday schedule day)

Midterm review session: 5:30-7pm, CS 142

Thu 10/16

In-class midterm [midterm topics/questions]; partial solutions on Piazza

The midterm will consist of short-answer and multiple-choice questions. It may include any topic covered so far in class or in the readings. In a few cases, I’ve added notes on the schedule next to readings, explaining which parts of them you do not have to understand. The midterm may include some limited mathematical derivations, but no full-blown proofs.

The midterm is closed-book, but if you like, you may bring a cheatsheet: a single sheet of paper with whatever notes you want to have for the test. (Front and back … use standard-sized paper, please, like letter or A4.)

13 Tue 10/21 Discriminative Sequence Models [slides]

J&M 3ed, 8.5-8.8 (POS/MEMM)

This material is used in PS3 part 1
14 Thu 10/23 Disc. Seq Models, Applications, Social Media NLP [slides]

15 Tue 10/28

Syntactic Parsing: CFGs [slides pdf], [slides with builds pdf].

course project slides

J&M 2ed, Ch 12 and 13

This material is used in PS3 part 2
16 Thu 10/30 Syntactic Parsing: PCFGs [slides]; CKY solution on Piazza

J&M 2ed, Ch 14

17 Tue 11/4 Constit. Parsing / Dependencies [slides]

PS3 part1 is due at start of lecture.
18 Thu 11/6 Dependency Parsing [slides]

PS3 part2 is due at start of lecture.
Tue 11/11 NO CLASS (Veterans Day)

19 Wed 11/12

Coreference [slides]

In-class coref exercise

J&M 2ed, Ch 21.3-21.8

Check out PS4
20 Thu 11/13 Lexical Semantics [slides]

J&M 2ed, Ch 19

Sun 11/16

Final project milestone due. Links: final project, milestone

21 Tue 11/18 Using Unlabeled Data [slides]

22 Thu 11/20 Topic Models [slides]

Blei 2012’s overview of topic models. More good readings from David Mimno’s seminar.

Exercise 9 (NER anno) due
23 Tue 11/25 Word representations & Neural networks [slides]

PS4 due at the start of lecture

24 Tue 12/2 Semantic Parsing

Kaggle testset submissions for project due
25 Thu 12/4 Review

Fri 12/5

PS5 due
Thu 12/11

Final exam: 3:30–5:30pm, Hasbrouck Lab Add room 124

See final exam topics

Fri 12/12

Final deadline for projects (report and code)