CMPSCI 591N : Computational Linguistics
Spring 2006
Homework #5: Part of Speech Tagging with HMMs

Out: Tuesday March 28, 2006
Due: Thursday April 9, 2006, by 11:59pm, by email to compling@cs.umass.edu

In this homework assignment you will implement and experiment with a hidden Markov model (HMM) part-of-speech tagger, and write a short report about your experiences and findings.

After being trained, the hidden Markov model should take a sequence of words as input and produce a sequence of part-of-speech tags as output. As you did with the document classification (spam vs non-spam) exercise, you should report the accuracy of your HMM, that is, the percentage of tags predicted correctly.
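As a small illustration of the accuracy measure described above, here is a minimal sketch. The function name tagging_accuracy and the example tag lists are hypothetical and are not part of hmm.py; the sketch assumes the gold and predicted tags are available as two parallel lists.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Return the percentage of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        return 0.0
    correct = sum(1 for g, p in zip(gold_tags, predicted_tags) if g == p)
    return 100.0 * correct / len(gold_tags)


# Hypothetical example: one of four tags is wrong.
gold = ["DT", "NN", "VBZ", "JJ"]
pred = ["DT", "NN", "VBD", "JJ"]
print(tagging_accuracy(gold, pred))
```

When evaluating on many sentences, one reasonable choice is to concatenate the tags of all test sentences before calling the function, so that accuracy is computed per tag rather than averaged per sentence.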

Everyone should estimate the parameters of an HMM from counts (as we did in class on the board) and implement Viterbi, as described in the first bullet below. There are additional bullets below describing further optional exercises. As usual, you need not be limited by the suggestions in these extra bullets; you are free to come up with your own tasks.
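To make the required part concrete, the following is a rough sketch of count-based estimation followed by Viterbi decoding. It is not the code in hmm.py: the names train_hmm, viterbi, and START, and the input format (a list of sentences, each a list of (word, tag) pairs), are assumptions made for illustration. Probabilities are plain relative frequencies with no smoothing, so a word never seen in training gets probability zero under every tag.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence state, not a real tag


def train_hmm(tagged_sentences):
    """Estimate transition and emission probabilities by relative frequency."""
    trans_counts = defaultdict(Counter)  # previous tag -> next tag -> count
    emit_counts = defaultdict(Counter)   # tag -> word -> count
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag

    def normalize(counts):
        # Turn each row of counts into a conditional probability distribution.
        return {given: {x: c / sum(row.values()) for x, c in row.items()}
                for given, row in counts.items()}

    return normalize(trans_counts), normalize(emit_counts)


def log_prob(table, given, outcome):
    """Log of table[given][outcome], or -inf if the event was never observed."""
    p = table.get(given, {}).get(outcome, 0.0)
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for words under the HMM."""
    if not words:
        return []
    tags = list(emit)
    # best[i][t]: log probability of the best tag path for words[:i+1] ending in t
    best = [{t: log_prob(trans, START, t) + log_prob(emit, t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: the tag preceding t on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_prob(trans, p, t))
            best[i][t] = (best[i - 1][prev] + log_prob(trans, prev, t)
                          + log_prob(emit, t, words[i]))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Hypothetical toy training set.
toy = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
       [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
trans, emit = train_hmm(toy)
print(viterbi(["the", "cat", "barks"], trans, emit))
```

Working in log space avoids numerical underflow on long sentences. Because the sketch does no smoothing, a sentence containing an unseen word scores negative infinity on every path, and the returned tags are then arbitrary; handling unknown words is one natural thing to explore further.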

Please re-check this page, as well as the homework column of the syllabus on the course Web site, for any updates and clarifications to this assignment.

Python and Data Infrastructure available

You may begin with hmm.py which is available at http://www.cs.umass.edu/~mccallum/courses/cl2006/code. You are also welcome to develop your own Python programs from scratch, if you prefer.

For training data, you will use the same POS-tagged Wall Street Journal data that we have used previously for regular expressions. This can be found at
http://www.cs.umass.edu/~mccallum/courses/cl2006/data/wsj15-18.pos

Tasks

What to hand in, and how

The homework should be emailed to compling@cs.umass.edu before 11:59pm on Tuesday April 4, 2006.

In addition to writing your Python program, write a short report about your experiences. Feel free to suggest additional things you might like to do next that build on what you've done so far. This report should be clear and well-written, but needn't be long--one page is fine. Also, there is no need for fancy formatting. In fact, we prefer to receive this report as the body of your email. Your program can also be included in the body, or included as an email attachment.

Grading

The assignment will be graded for (a) correctness of your implementation, (b) quality/clarity of your written report, and (c) creativity, effort and success in the task(s) you choose.

Questions?

Feel free to ask! Send email to compling@cs.umass.edu, or if you'd like your classmates to be able to help answer your question, use compling-class@cs.umass.edu.