[Intro to NLP, CMPSCI 585, Fall 2014]
This exercise is due online before lecture on Tuesday.
This homework is about part-of-speech tagging. Your data will be from Twitter. Select three English-language tweets that you’d like to work with.
Using the annotation framework described in this paper, label each word in your three tweets with its part-of-speech tag.
Next, using the Penn Treebank part-of-speech tagset, annotate each verb in your data with its fine-grained POS tag (e.g., VBD, VBZ, MD). You do not need to annotate the non-verbs.
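If you keep your annotations in a machine-readable form, a small script can catch typos in your Penn Treebank verb tags. The sketch below is only one possible way to do this and is not part of the assignment: the (token, tag) list format, the function name invalid_verb_tags, and the example sentence are all assumptions made for illustration. It treats MD as a verb tag, following the examples above.

```python
# Penn Treebank tags for verbs, plus MD for modals.
PTB_VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"}


def invalid_verb_tags(annotations):
    """Return the (token, tag) pairs whose tag is not a Penn Treebank verb tag.

    `annotations` is a list of (token, tag) pairs, where tag is None for
    words left unannotated (the non-verbs). This format is hypothetical.
    """
    return [
        (token, tag)
        for token, tag in annotations
        if tag is not None and tag not in PTB_VERB_TAGS
    ]


# Made-up example sentence, not real Twitter data.
example = [("I", None), ("will", "MD"), ("eat", "VB"), ("soon", None)]
print(invalid_verb_tags(example))
```

An empty list means every annotated verb carries a recognized tag. The check does not tell you whether the tag is the right one for that verb, which is still a judgment you have to make yourself.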
Finally, using the Brown part-of-speech tagset, annotate each verb in one sentence of your data with its fine-grained POS tag (e.g., MD*, MD+HV, VBG+TO). Try to choose a sentence that involves a tag that is not covered by the previous two tagsets.
(Exercise borrowed from gtnlp)