CMPSCI 585 : Introduction to Natural Language Processing
Fall 2007
Homework #6: Maximum Entropy Classifier

 

In this homework assignment you will implement and experiment with a maximum entropy classifier, and write a short report about your experiences and findings.

You may begin with source code provided by Prof. McCallum, or start from scratch if you prefer. If you begin with the provided code, the main task is to implement the gradient function, then train and test your classifier in whatever ways you find interesting. See the tasks below.
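As a rough illustration of what a gradient function for a maximum entropy classifier computes, here is a minimal sketch in plain Python. It is not the provided maxent.py and does not follow its interface: the names class_probabilities and log_likelihood_gradient, the dictionary-based weight layout, and the feature names are all assumptions made for this example. For each class and feature it computes the observed feature value minus the value expected under the current model, with no regularization term.

```python
import math


def class_probabilities(weights, features):
    """Return p(label | document) for every label under a log-linear model.

    weights:  dict mapping label -> dict mapping feature name -> weight
              (hypothetical layout, chosen only for this sketch)
    features: dict mapping feature name -> value for one document
    """
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {label: math.exp(s - max_score) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: e / total for label, e in exp_scores.items()}


def log_likelihood_gradient(weights, data):
    """Gradient of the conditional log-likelihood with respect to each weight.

    data: list of (features, true_label) pairs.
    Each entry is the observed feature value minus the expected feature
    value under the current model, summed over all documents.
    """
    gradient = {label: {} for label in weights}
    for features, true_label in data:
        probs = class_probabilities(weights, features)
        for label in weights:
            indicator = 1.0 if label == true_label else 0.0
            for name, value in features.items():
                gradient[label][name] = (
                    gradient[label].get(name, 0.0)
                    + value * (indicator - probs[label])
                )
    return gradient
```

With all weights at zero and two classes, each class gets probability 0.5, so a single spam example in which the feature "free" has value 1 yields a gradient of 0.5 for (spam, free) and -0.5 for (ham, free). If the optimizer you use minimizes rather than maximizes, you would hand it the negated log-likelihood and the negated gradient.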

See the class slides. For significantly more detail, you might also want to see http://www.cs.berkeley.edu/~klein/papers/maxent-tutorial-slides.pdf. More pointers are available at http://homepages.inf.ed.ac.uk/s0450736/maxent.html.

Please re-check this page, as well as the homework column of the syllabus on the course Web site, for any updates and clarifications to this assignment.

Python and Data Infrastructure available

You may begin with maxent.py and optimize.py, which are available at http://www.cs.umass.edu/~mccallum/courses/inlp2007/code.

The package optimize.py depends on the Python Numeric package, which you will also have to install if you don't have it already. (Numeric is deprecated in favor of NumPy, but the only version of optimize.py that we could find depends on the old Numeric instead.) The package optimize.py also imports MLab, which is provided by the Numeric installation.
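Before running the provided code, it may help to confirm that these modules can be imported in your Python installation. The following is a small sketch, not part of the provided code; the helper name missing_modules is made up for this example.

```python
def missing_modules(names=("Numeric", "MLab")):
    """Return the module names in `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable: " + ", ".join(missing))
    else:
        print("Numeric and MLab are importable.")
```

If either name is reported as not importable, install Numeric for the same Python interpreter you will use to run optimize.py.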

As with HW#4, we are providing training and testing data in the form of spam and ham email, but you are welcome to find your own data.

Tasks

What to hand in, and how

The homework should be emailed to cs585-staff@cs.umass.edu.

In addition to writing your Python program, write a short report about your experiences. Feel free to suggest additional things you might like to do next that build on what you've done so far. The report should be clear and well written, but it needn't be long; one page is fine. There is also no need for fancy formatting. In fact, we prefer to receive the report as the body of your email. Your program can be included in the body as well, or sent as an email attachment.

Grading

The assignment will be graded for (a) correctness of your implementation, (b) quality/clarity of your written report, and (c) creativity, effort and success in the task(s) you choose.

Questions?

Feel free to ask! Send email to cs585-staff@cs.umass.edu, or if you'd like your classmates to be able to help answer your question, use cs585-class@cs.umass.edu.