CMPSCI 585 : Introduction to Natural Language Processing
Fall 2007
Homework #6: Probabilistic Context Free Grammars


In this homework assignment you will implement and experiment with a probabilistic (weighted) context free grammar, and write a short report about your experiences and findings.

You may begin with your context free CYK parser from homework #2. Change this implementation so that it accepts a grammar with weights (log-probabilities, or negative log-probabilities) on production rules, and implement dynamic programming so that it properly accumulates the total weights of the various parses.
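As a rough illustration of the kind of change involved, here is a minimal sketch of a weighted CYK chart. It assumes a grammar in Chomsky normal form with log-probabilities on rules and keeps only the best-scoring parse per span; your HW#2 parser may be organized quite differently. The names `viterbi_cyk`, `LEXICAL`, and `BINARY` and the toy grammar are made up for this example and are not part of any provided code.

```python
import math

# Hypothetical toy grammar in Chomsky normal form, weights are log-probabilities.
LEXICAL = {  # (nonterminal, word) -> log P(nonterminal -> word)
    ("NP", "she"): math.log(0.5),
    ("NP", "fish"): math.log(0.5),
    ("V", "eats"): math.log(1.0),
}
BINARY = {  # (parent, left child, right child) -> log P(parent -> left right)
    ("S", "NP", "VP"): math.log(1.0),
    ("VP", "V", "NP"): math.log(1.0),
}


def _update(cell, label, score, back):
    """Keep only the highest-scoring analysis of `label` in this chart cell."""
    if score > cell.get(label, (-math.inf, None))[0]:
        cell[label] = (score, back)


def _build(chart, i, j, label):
    """Follow backpointers to recover the best tree for `label` over words[i:j]."""
    _, back = chart[(i, j)][label]
    if isinstance(back, str):  # lexical rule: the backpointer is the word itself
        return (label, back)
    k, left, right = back
    return (label, _build(chart, i, k, left), _build(chart, k, j, right))


def viterbi_cyk(words, lexical, binary, start="S"):
    """Return (best log-probability, best tree), or (-inf, None) if no parse."""
    n = len(words)
    if n == 0:
        return -math.inf, None
    # chart[(i, j)] maps a nonterminal to (score, backpointer) for words[i:j].
    chart = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    for i, word in enumerate(words):
        for (label, w), logp in lexical.items():
            if w == word:
                _update(chart[(i, i + 1)], label, logp, word)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                left, right = chart[(i, k)], chart[(k, j)]
                for (parent, b, c), logp in binary.items():
                    if b in left and c in right:
                        # Log-probabilities add where probabilities multiply.
                        score = logp + left[b][0] + right[c][0]
                        _update(chart[(i, j)], parent, score, (k, b, c))
    if start not in chart[(0, n)]:
        return -math.inf, None
    return chart[(0, n)][start][0], _build(chart, 0, n, start)


if __name__ == "__main__":
    print(viterbi_cyk("she eats fish".split(), LEXICAL, BINARY))
```

This sketch takes the maximum over competing analyses of a span. If you want the total probability of all parses instead, replace the maximum in `_update` with a log-sum-exp of the old and new scores (and drop the backpointers). With negative log-probabilities, the maximum becomes a minimum.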

You should then explore some parsing issue that leverages such a weighted grammar. There are several suggestions below. You are also free to show your creativity by creating your own substantive exploratory question.

Everyone should estimate the parameters of an HMM from counts (as we did in class on the board), and implement Viterbi, as described in the first bullet below. There are additional bullets below describing further optional exercises. As usual, you need not be limited by the suggestions of these extra bullets. You are free to come up with your own tasks.
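For orientation only, here is a minimal sketch of estimating HMM parameters from counts and decoding with Viterbi in log space. It assumes training data in the form of sentences of (word, tag) pairs, uses unsmoothed relative frequencies, and introduces a start symbol `"<s>"`. The function names and the two-sentence toy corpus are invented for this example and may not match what was done on the board.

```python
import math
from collections import Counter


def estimate_hmm(tagged_sentences):
    """Relative-frequency (unsmoothed) estimates from lists of (word, tag) pairs."""
    trans, emit = Counter(), Counter()
    prev_count, tag_count = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = "<s>"  # assumed start-of-sentence symbol
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            prev_count[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
    log_trans = {k: math.log(c / prev_count[k[0]]) for k, c in trans.items()}
    log_emit = {k: math.log(c / tag_count[k[0]]) for k, c in emit.items()}
    return log_trans, log_emit, sorted(tag_count)


def viterbi(words, log_trans, log_emit, tags):
    """Return the most probable tag sequence, or None if every path scores zero."""
    if not words:
        return []
    neg_inf = -math.inf
    # best[t][tag] = (best log-score of a path ending in tag at time t, previous tag)
    best = [{
        tag: (log_trans.get(("<s>", tag), neg_inf)
              + log_emit.get((tag, words[0]), neg_inf), None)
        for tag in tags
    }]
    for t in range(1, len(words)):
        column = {}
        for tag in tags:
            e = log_emit.get((tag, words[t]), neg_inf)
            column[tag] = max(
                ((best[t - 1][p][0] + log_trans.get((p, tag), neg_inf) + e, p)
                 for p in tags),
                key=lambda pair: pair[0],
            )
        best.append(column)
    last = max(tags, key=lambda tag: best[-1][tag][0])
    if best[-1][last][0] == neg_inf:
        return None
    path = [last]
    for t in range(len(words) - 1, 0, -1):  # follow backpointers right to left
        path.append(best[t][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    corpus = [
        [("the", "D"), ("dog", "N"), ("barks", "V")],
        [("the", "D"), ("cat", "N"), ("sleeps", "V")],
    ]
    log_trans, log_emit, tags = estimate_hmm(corpus)
    print(viterbi("the cat barks".split(), log_trans, log_emit, tags))
```

Because the estimates are unsmoothed, any unseen word or tag transition gives a path probability of zero, which is why `viterbi` can return `None`. Adding smoothing is one natural extension.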

Please re-check this page, as well as the homework column of the syllabus on the course Web site, for any updates and clarifications to this assignment.

Python and Data Infrastructure available

You may begin with which is available at And/or your own assignment for HW#2.A

As with HW#2, we are not providing training or testing data, but you may make up your own data.


What to hand in, and how

The homework should be emailed to

In addition to writing your Python program, write a short report about your experiences. Feel free to suggest additional things you might like to do next that build on what you've done so far. This report should be clear and well-written, but needn't be long--one page is fine. Also, no need for fancy formatting. In fact, we prefer to receive this report as the body of your email. Your program can also be included in the body, or included as an email attachment.


The assignment will be graded for (a) correctness of your implementation, (b) quality/clarity of your written report, and (c) creativity, effort and success in the task(s) you choose.


Feel free to ask! Send email to, or if you'd like your classmates to be able to help answer your question, use