CMPSCI 585 : Introduction to Natural Language Processing
Fall 2007
Homework #3: Deterministic Parsing with CYK

 

In this homework assignment you will finish implementing CYK based on the provided Python code, then create a grammar for a given lexicon, demonstrate that the sentences produced by your grammar are valid English, and finally write a short report about your experiences. There are also several exciting suggested extension tasks in the bullets below. You should do the first and second bullets; the later bullets are optional. As usual, you need not be limited by the suggestions in these extra bullets: you are free to come up with your own tasks, as long as you complete the first two bullets.

Please re-check this page, as well as the homework column of the course Web site syllabus, for any updates and clarifications to this assignment.

Python Infrastructure available

Begin with cfg.py, available at http://www.cs.umass.edu/~mccallum/courses/inlp2007/code. (This is the module that was demonstrated in class on Thursday.) You are also welcome to develop your own Python programs from scratch, if you prefer.

You can use the code as follows (where $ is your command-line prompt):

$ python
>>> import cfg
>>> cfg.printsentence()
the ball sees the dogs

This file contains a specification of a grammar in Chomsky normal form and provides several functions. For example, the function generate randomly generates a sentence from the grammar. The function generate_tree also randomly generates a sentence from the grammar, but produces bracket notation indicating the rules used in its generation. Your main job is to finish the implementation of the function parse.
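For orientation, here is a minimal sketch of the CYK recognizer idea in Python 3. It assumes a hypothetical grammar representation (two dictionaries, BINARY and LEXICAL, plus a function named cyk_recognize) that is not taken from cfg.py; you will need to adapt it to however cfg.py actually stores its rules and to what parse is expected to return. The key idea is that the table cell for the span words[i:j] collects every nonterminal that can derive that span, built up from pairs of shorter spans.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption, not cfg.py's format).
# BINARY maps a pair of right-hand-side nonterminals (B, C) to the set of A with A -> B C.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# LEXICAL maps a word w to the set of preterminals A with A -> w.
LEXICAL = {
    "the": {"Det"},
    "ball": {"N"},
    "dogs": {"N"},
    "sees": {"V"},
}

def cyk_recognize(words, start="S"):
    """Return True if the grammar above derives the given list of words."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the set of nonterminals that can derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Length-1 spans come straight from the lexical rules
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length                  # span end (exclusive)
            for k in range(i + 1, j):       # split point: words[i:k] + words[k:j]
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]
```

With this toy grammar, cyk_recognize("the ball sees the dogs".split()) returns True. The three nested loops over span length, start position, and split point are what give CYK its O(n^3) running time in the sentence length n (times a factor that depends on the size of the grammar).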

Tasks

What to hand in, and how

The homework should be emailed to cs585-staff@cs.umass.edu.

In addition to writing your Python program, write a short report about your experiences implementing CYK and creating the first grammar, with as much detail as you like about any extra tasks you choose to do. Feel free to suggest other things you might like to do next that build on what you've done so far. The report should be clear and well written, but it needn't be long; one page is fine. There is no need for fancy formatting; in fact, we prefer to receive the report as the body of your email. Your program can also be included in the body, or attached to the email.

Extended version ??

First extension: implement Extra task 1 above, "implement a parser rather than just a recognizer." This means that your Python program should print out (in a format of your choosing) all valid parse trees of the input sentence.
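One common way to turn a CYK recognizer into a parser is to store backpointers in each chart cell and then expand them recursively into trees. The sketch below is a hypothetical illustration, not the required format: the grammar is a small made-up CNF grammar with a prepositional-phrase attachment ambiguity, and the function name cyk_parse and its bracketed-string output are assumptions chosen for this example.

```python
from itertools import product

# Hypothetical ambiguous CNF grammar: a PP can attach to a VP or to an NP.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
LEXICAL = {
    "the": {"Det"}, "man": {"N"}, "dog": {"N"}, "telescope": {"N"},
    "saw": {"V"}, "with": {"P"},
}

def cyk_parse(words, start="S"):
    """Return every parse tree of words, as bracketed strings."""
    n = len(words)
    # chart[(i, j)][label] lists the backpointers for each way label derives words[i:j]:
    # a word (lexical rule) or a triple (k, left_label, right_label) (binary rule).
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {a: [word] for a in LEXICAL.get(word, ())}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            for k in range(i + 1, j):
                for b, c in product(chart[(i, k)], chart[(k, j)]):
                    for a in BINARY.get((b, c), ()):
                        cell.setdefault(a, []).append((k, b, c))
            chart[(i, j)] = cell

    def build(label, i, j):
        """Expand every backpointer for label over words[i:j] into trees."""
        trees = []
        for bp in chart[(i, j)][label]:
            if isinstance(bp, str):
                trees.append(f"({label} {bp})")
            else:
                k, b, c = bp
                for left in build(b, i, k):
                    for right in build(c, k, j):
                        trees.append(f"({label} {left} {right})")
        return trees

    if n == 0 or start not in chart[(0, n)]:
        return []
    return build(start, 0, n)
```

For "the man saw the dog with the telescope", this returns two trees: one attaching "with the telescope" to the verb phrase and one attaching it to the noun phrase "the dog". Keep in mind that the number of parse trees can grow exponentially with sentence length, so printing all of them is practical only for short or mildly ambiguous sentences.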

Second extension: devise an "interesting" grammar that explores some particular language issue, and experiment with it using your parser.

As usual, in your report you should describe your experiences implementing the algorithm, and write about your experiments and their outcomes.

 

Grading

The assignment will be graded for (a) correctness of your CYK implementation, (b) correctness and comprehensiveness of your grammar, (c) quality/clarity of your written report, and (d) creativity, effort and success in the extra tasks.

Questions?

Feel free to ask! Send email to cs585-staff@cs.umass.edu, or if you'd like your classmates to be able to help answer your question, use cs585-class@cs.umass.edu.