[Intro to NLP, CMPSCI 585, Fall 2014]


This exercise is due online before lecture on Tuesday.


Part-of-speech tagging

This homework is about part-of-speech tagging. Your data will be from Twitter. Select three English-language tweets that you’d like to work with.

Using the annotation framework described in this paper, label the part-of-speech tags of each word in your three tweets.

Next, using the Penn Treebank part-of-speech tags, annotate the fine-grained POS tag of each verb in your data (e.g., VBD, VBZ, MD). You do not need to tag the non-verbs.
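
If you want to sanity-check your verb tags, one possible way to record them is sketched below in Python. This is only an illustration: the example sentence is made up (it is not a real tweet), and the names PTB_VERB_TAGS, annotation, check_verb_tags, and verbs_only are hypothetical, not part of any library. The exercise itself only asks for the annotations, in whatever format you like.

```python
# Penn Treebank verb tags: the six VB* tags plus the modal tag MD.
PTB_VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"}

# Hypothetical example sentence, not real data. Each entry is (token, tag);
# the tag is None for non-verbs, which do not need a fine-grained tag here.
annotation = [
    ("she", None),
    ("went", "VBD"),      # past tense
    ("home", None),
    ("and", None),
    ("is", "VBZ"),        # 3rd person singular present
    ("sleeping", "VBG"),  # gerund / present participle
    ("now", None),
]


def check_verb_tags(pairs):
    """Return the (token, tag) pairs whose tag is not a Penn Treebank verb tag."""
    return [
        (token, tag)
        for token, tag in pairs
        if tag is not None and tag not in PTB_VERB_TAGS
    ]


def verbs_only(pairs):
    """Return only the tokens that were given a tag (i.e., the verbs)."""
    return [(token, tag) for token, tag in pairs if tag is not None]


if __name__ == "__main__":
    for token, tag in verbs_only(annotation):
        print(f"{token}\t{tag}")
    bad = check_verb_tags(annotation)
    if bad:
        print("Tags outside the Penn Treebank verb set:", bad)
```

The check only catches tags that are not in the verb set at all (for example a typo such as VBX); it cannot tell you whether VBD or VBN is the right choice for a particular word.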

Finally, using the Brown part-of-speech tagset, annotate the fine-grained POS tag of each verb in one sentence of your data (e.g., MD*, MD+HV, VBG+TO). Try to choose a sentence that involves a tag that is not covered by either of the previous two tagsets.


(Exercise borrowed from gtnlp)