[Intro to NLP, CMPSCI 585, Fall 2014]


PS3 part 2 – CKY algorithm

This is due on Thursday at the start of lecture, as a hard copy. (Part 1 is due on Tuesday at the start of lecture.) On your printout, please include your name and collaborators if any.

In this problem, you will implement the CKY algorithm for an unweighted CFG. See the starter code cky.py. Please print out your finished version of cky.py to turn in along with the answers to the questions below.

1. (2 points) The grammar given in cky.py should give at least two parses for the following sentence. Please draw two of these parses by hand.

2. (6 points) Implement the acceptance version of CKY, as cky_accept(), which returns true if there is an S covering the entire sentence. Does it return true or false for the following sentences? Please pprint() the chart cells for each case as well.

Hint: a simple way to implement the chart cells is by maintaining a list of nonterminals at the span. This list represents all possible nonterminals over that span.

Hint: pprint()ing the CKY chart cells may be useful for debugging.

Hint: Python dictionaries allow tuples as keys. For example, d={}; d[(3,4)] = []. A slight shortcut is that d[3,4] means the same thing as d[(3,4)].

3. (14 points) Implement the parsing version of CKY, which returns one of the legal parses for the sentence (and returns None if there are none). If there are multiple real parses, we don’t care which one you print. Implement this as cky_parse(). You probably want to start by copying your cky_acceptance() answer and modifying it. Have it return the parse in the following format, using nested lists to represent the tree (this is a simple Python variant of the Lisp-style S-expression that’s usually used for this.)

    ['S',
        [['NP', [['Det', 'the'], ['Noun', 'cat']]],
         ['VP', [['Verb', 'attacked'], 
                 ['NP', [['Det', 'the'], ['Noun', 'food']]]]]]]

Print out the parses for the following sentences.

Hint: in the chart cells, you will now have to store backpointers as well. One way to do it is to store a list of tuples, each of which is (nonterminal, splitpoint, leftchild nonterm, rightchild nonterm). For example, if the state ('NP', 3, 'Det', 'Noun') is in the cell for span (2,4), that means this is a chart state of symbol NP, which came from a Det at position (2,3) and a Noun at position (3,4).

Hint: it may be useful to use a recursive function for the backtrace.

4. (6 points) Revise the grammar as follows.

The new grammar should accept and reject the following sentences. Please run your parser on these sentences and report the parse trees for the accepted ones. Also, describe how you changed the grammar, and why.

Hint: you will need to introduce new nonterminal symbols, and modify the currently existing ones.

5. (2 points) Describe an enhancement to CFGs or the version of the CKY algorithm you wrote, that would have made it easier to represent and compute grammatical agreement constraints.