CS 685, Spring 2021, UMass Amherst
Reading Review Assignments
We will have a series of reading review assignments, which are part of the "weekly
reading reviews" grade, in which you review a research paper. Please post your
reading review to Gradescope by the time it's due.
Your reading review should be one or two paragraphs long. Make sure to include:
- Full paper citation: include the authors, year, venue/publication,
and title of the paper. If you omit any of these, it is significantly more
difficult to tell at a glance what the paper is about.
- Summary: Several sentences describing the motivation of the work, what the authors actually did, and what results they found.
- Discussion/Reaction: Two or so questions, criticisms, or thoughts about future or related work that you have in reaction to the paper.
Proper Citations (avoiding plagiarism)
For both reading reviews and the literature review, you'll be extensively citing and discussing other work. Whenever possible, you should paraphrase that work in your own words -- that's the point of the synthesis and interpretation process.
Of course, sometimes it is clearest to quote directly from a paper.
If you copy text, images, or other media from another source for use in your paper, you must appropriately cite the source and make it clear that you're quoting from it.
Not doing so is a form of plagiarism and a violation of the Academic Honesty Policy; in this class we take academic honesty seriously, and will pursue remedies for violations (see the link for details).
For more information on plagiarism,
please see the Purdue Online Writing Lab FAQ on plagiarism.
It includes this non-exhaustive (and slightly reordered) list of what requires citation:
- Words or ideas presented in a magazine, book, newspaper, song, TV program, movie, website, computer program, letter, advertisement, or any other medium.
- When you copy the exact words or a unique phrase.
- When you reprint any diagrams, illustrations, charts, pictures, or other visual materials.
- When you reuse or repost any digital media, including images, audio, video, or other media.
- Information you gain through interviewing or conversing with another person, face to face, over the phone, or in writing.
The rest of this webpage is about the larger literature review assignment.
Literature Review
Due: March 26
The literature review is a paper that reviews a subfield of NLP of your choice.
To ensure some intellectual diversity and depth of literature search,
your review must cover at least 10 research papers,
with at least 2 papers in each decade since 1990 and 2 papers from before 1990.
That is, you need at least 2 papers in each of the buckets <=1989, 1990-1999, 2000-2009, and 2010-2021.
You can reuse papers you reviewed in the reading review assignments.
It can sometimes be harder to find older NLP papers that are also good/relevant.
But machine learning, statistics, and linguistics are substantially older disciplines
and nearly all NLP work builds on their ideas; see more discussion under "General tips" below.
Not all of your reviewed papers have to be NLP -- in fact, it's fine if a majority aren't NLP,
as long as they're helping the reader understand the overall NLP topic.
Long papers: If you read and review a long journal paper (say, twice as long as a typical 8-page conference paper), it counts as 2 papers toward the 10-paper requirement.
If you review a full-length book, it counts as 3 papers.
(Please say what you're counting things as!)
We generally expect your review to be 8-15 pages long, not including the references list at the end.
You must use
the ACL style files, such as the LaTeX template linked from the ACL CFP.
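As a rough sketch, a review built on the ACL template might start like the following. The style-file name (assumed here to be acl2021.sty) and the bibliography style depend on the exact template version you download from the ACL CFP, so check the template's README rather than copying these names verbatim:

```latex
\documentclass[11pt]{article}
\usepackage{acl2021}  % assumed name; use whatever .sty the template ships with
\usepackage{times,latexsym}

\title{A Survey on [Your Subfield]}
\author{Your Name \\ UMass Amherst \\ \texttt{you@umass.edu}}

\begin{document}
\maketitle

\section{Introduction}
% ... your review ...

% the template typically ships a matching .bst file for the references
\bibliographystyle{acl_natbib}  % assumed name from the template
\bibliography{refs}             % your refs.bib file
\end{document}
```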
Literature reviews must be completed individually.
Your review should not merely describe the papers, but also synthesize, organize, and
relate them to one another and to the broader literature in NLP, and ideally also ML
and linguistics.
Here are two excellent examples of papers that were originally lit reviews for a course:
- Kamath and Das (2019), "A Survey on Semantic Parsing," was originally a lit review for this course! Its authors worked on it more and successfully submitted it to AKBC the next year. They got these pretty positive reviews.
- Das and Martins (2007), "A Survey on Automatic Text Summarization," was originally written for a similar lit review class assignment, and has been cited at least a dozen times since.
There are different ways to structure a literature review.
Typically, you should have something like:
- An introduction section that explains what the area is
and the motivation to study it --- why is it an interesting area of research, and why should the reader care?
- A main body, typically divided into several sections, that describes previous work and specific research papers.
The most boring way to structure this is as a long list of papers
with a paragraph describing each. That's OK when you're writing notes for yourself,
but it's better to do some
synthesis as well.
Try to group papers by common themes, methods, datasets, or assumptions.
What is similar and different among them?
Did the body of research change over the years?
Do different areas of research approach it differently?
-
A discussion and (potentially brief) conclusion,
which sums up the main points you made.
This may be a good place to discuss interesting possibilities
for future work,
or class projects!
Also make sure to:
- Have a title, author name, and date.
- Have a properly and consistently formatted references list.
There are different standards for this (for example,
opinions differ on whether page numbers are still relevant; in this class we don't care),
but make sure there is enough key information
for others to be able to find the paper,
know the exact version you are referring to,
and to make a quick determination of its credibility.
That means at least: authors, article title, name of publication or venue,
date.
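As a sketch of what "enough key information" looks like, here is a BibTeX entry for one of the suggested papers below, carrying all the required fields (authors, title, venue, date); volume/number/pages are optional extras:

```bibtex
@article{brown1993mathematics,
  author  = {Brown, Peter F. and Della Pietra, Stephen A. and
             Della Pietra, Vincent J. and Mercer, Robert L.},
  title   = {The Mathematics of Statistical Machine Translation:
             Parameter Estimation},
  journal = {Computational Linguistics},
  year    = {1993},
  volume  = {19},
  number  = {2},
  pages   = {263--311}
}
```

Whatever format you use, make sure every entry in your references list has at least the author/title/venue/date fields filled in consistently.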
Research paper reading
When reading and discussing a research paper, here are some things to write up, or make sure you can answer to your satisfaction:
- What is the authors' research question? What problem are they trying to solve? Distill it to one sentence. Sometimes the paper authors do not concisely state this, so it is up to you to figure this out.
- What is a concrete example of the NLP task, or type of data or linguistic phenomenon,
that the paper is about? Sometimes the paper authors do not do a good job at
this; you can better understand what they're doing by thinking or discussing
specific examples.
- Describe what the authors did, their results, and any implications the work may have.
- Discussion: what questions did you have? What do you wish had been clarified? What do you think of the work, and do you have suggestions or ideas for future work?
It's OK to explicitly use questions like these when structuring your reading assignment writeups or your initial notes to yourself.
For your actual literature review document, it may be awkward or clunky to explicitly structure your discussion of each paper with the above questions,
but whatever you write should implicitly
address these questions.
General tips for researching the literature
- Use Google Scholar or
Semantic Scholar
to find related papers, and papers that cite a particular paper you've found.
- Always look at a paper's references list and try to find the most interesting-looking, or most-cited previous papers. Keep a text file or bookmarks of references that look promising, to check back on later.
- Look at other papers written by the paper's authors, especially the more senior ones who may have worked in this area for a while. Sometimes the paper you're looking at is less interesting or relevant than a related one written by one of its authors.
- Note there are different types of papers that all go into understanding an area of NLP. Empirical, task-driven modeling papers dominate current NLP publishing, but behind them are more supporting types of work that feed into them.
For example, there are more theoretical modeling papers on the machine learning or algorithmic underpinnings of the modeling being used.
Or linguistics or psychology papers on the language and human behavioral phenomena that inform or help define the NLP task.
And there are dataset papers that introduce and discuss the datasets themselves, apart from any NLP models or systems which might later be built with or for them.
Your literature review will be stronger if you blend insights from these
different viewpoints on your area!
- Learn the important terms in the research area, which will help find more
relevant or interesting papers -- for example, if you keep searching for
"chatbots", you might find there is a field called "dialog systems" which is
extremely relevant. Find the names of journals, conferences, and workshops in
the field, as well as particular people who do lots of research in the area;
you can use all of this to find more related work.
- Make sure to skim papers when you first encounter them:
read the abstract and jump ahead to the results to roughly understand what they did. Decide later whether it's worth a deeper read.
- Search tip: for, say, NLP involving Wikipedia, try expanding with NLP or site keywords; e.g., "wikipedia nlp," "wikipedia site:aclweb.org," "wikipedia ACL", etc.
- Look at the ACL Anthology website:
www.aclweb.org/anthology.
Suggested Papers
Here is a random sampling of papers that may be of interest,
either in themselves or as jumping-off points to others.
- Brown et al., 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics.
- Grosz et al., 1995. Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics.
- Pang et al., 2002. Thumbs Up? Sentiment Classification Using Machine Learning Techniques. Proceedings of EMNLP.
- Collins, 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Proceedings of EMNLP.
- Sauri and Pustejovsky, 2012. Are You Sure That This Happened? Assessing the Factuality Degree of Events in Text. Computational Linguistics.
- Tsvetkov and Dyer, 2016. Cross-Lingual Bridges with Models of Lexical Borrowing. JAIR.
- Eisenstein, 2013. What to do about bad language on the internet. Proceedings of NAACL.
- Dodge et al., 2019. Show Your Work: Improved Reporting of Experimental Results. Proceedings of EMNLP-IJCNLP.
- Caliskan et al., 2017. Semantics derived automatically from language corpora contain human-like biases. Science.
- Ramiro et al., 2018. Algorithms in the historical emergence of word senses. PNAS.
Possibly of interest: these ACL Anthology pages let you see citation counts of papers for entire venues; you can rank by citation count (it only tracks citations within ACL Anthology papers) to see popular ones. The most-cited papers aren't always interesting, but sometimes they are.
Papers on text analysis as a tool for social science and the humanities:
- Several papers from the Journal of Digital Humanities, 2(1). To start, see the overview: Weingart and Meeks, 2012. The Digital Humanities Contribution to Topic Modeling.
- Grimmer and Stewart, 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis.
- Monroe et al., 2008. Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis.
- One of the social science-oriented papers from the webpage for Structural Topic Models.
Other areas.
- Look at the list of Workshops on the ACL Anthology here.
Workshops tend to have topically focused sets of papers, whose organizers and paper authors typically also have published in the area.
All the other major conferences also have workshops.
- Wikipedia can be hit-or-miss, but it isn't a terrible place to look. For example,
Computational humor or
Stylometry.