Lecture: Tuesday/Thursday 2:30-3:45pm in Computer Science Building, Room 140
Course web page:
http://www.cs.umass.edu/~mccallum/courses/cl2006
Mailing lists:
compling-class@cs.umass.edu
Announcements, revisions, homework hints, etc.
It is important that all students are subscribed to this mailing list.
To subscribe send email to mccallum@cs.umass.edu with SUBSCRIBE cs591N in the subject line.
All students who were registered in SPIRE as of January 30, 2004, are already subscribed under their student.umass.edu addresses.
compling@cs.umass.edu
Send your questions here! Questions, clarifications, and comments go to the course professor, TAs, and assistant.
We welcome all comments and suggestions about the course!
Professor: Andrew McCallum
Office: Computer Science Building, Room 242
Office Hours: Tuesday 3:45-4:45pm (subject to change)
Phone: (413) 545-1323
Fax: (413) 545-1789
E-mail: mccallum@cs.umass.edu
Professor McCallum's work aims to dramatically increase our ability to mine actionable knowledge from unstructured text. He is especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature & community. Toward this end his research group develops and employs various methods in statistical machine learning, natural language processing, information retrieval and data mining---tending toward probabilistic approaches and graphical models.
Andrew McCallum is an Associate Professor at University of Massachusetts, Amherst. He was previously Vice President of Research and Development at WhizBang Labs, a company that used machine learning for information extraction and data fusion from the Web. In the late 1990's he was a Research Scientist and Coordinator at Justsystem Pittsburgh Research Center, where he spearheaded the creation of CORA, an early research paper search engine that used machine learning for spidering, extraction, classification and citation analysis. He was a post-doctoral fellow at Carnegie Mellon University after receiving his PhD from the University of Rochester with Dana Ballard in 1995. He is an action editor for the flagship Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, document classification, clustering, finite state models, semi-supervised learning, and social network analysis. He has given invited talks in the past few years at MIT, Stanford, CMU, U. Washington, Brown, Xerox PARC, IBM Almaden, IBM Watson, SRI, AT&T Research, Yahoo and Google.
TA: Pallika Kanani (graduate student)
Office: Computer Science Building, Room 264
Office hours: TBD, see course web site.
Phone: (413) 545-3616 (during office hours only)
E-mail: pallika@cs.umass.edu
TA: Greg Druck (graduate student)
Office: Computer Science Building, Room 264
Office hours: TBD, see course web site.
Phone: (413) 545-3616 (during office hours only)
E-mail: gdruck@cs.umass.edu
Assistant: Gideon Mann (post-doc)
Office: Computer Science Building, Room 264
Office hours: TBD, see course web site.
Phone: (413) 545-3616 (during office hours only)
E-mail: gmann@cs.umass.edu
To introduce students to fundamental concepts of computational linguistics and natural language processing (NLP), as well as some current research in the area. To give students hands-on experience using computational tools to manipulate natural languages.
Computational Linguistics addresses the fundamental questions at the intersection of human languages and computer science. How can computers acquire, comprehend and produce natural languages, such as English? How can computational methods give us insight into observed human language phenomena? How can you get a job at Google? In this interdisciplinary introductory course, you will learn how computers can do useful things with human languages, such as translate from French into English, summarize a magazine article into a few sentences, and find the main topics in the day's news. You will also learn about how computational methods can help linguists explain language phenomena, including automatic discovery of different word senses and phrase structure. Over the past decade, computational linguistics has been revolutionized by statistical and probabilistic methods; you will learn about robust methods of probabilistic parameter estimation and inference. Our work will include learning new methods, discussions, and hands-on laboratories. While some limited computer programming will be necessary, the course does not assume previous experience in programming.
Intended Audience:
This course is aimed at CS and Linguistics undergraduates, and Linguistics graduate students.
Prerequisites:
Either CMPSCI 287, or LINGUIST 401, or graduate standing in Linguistics. (Computer Science graduate students are not encouraged to attend.)
Expected skills:
• Basic familiarity with logic, basic mathematics (logs, exponents, etc.), and basic probability as a ratio of counts.
• Ability to use a computer and a word processor. Readiness to learn a basic programming language, with hand-holding.
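As a rough illustration of what "probability by ratio of counts" means, here is a minimal Python sketch. The function name `relative_frequencies` and the toy sentence are assumptions made up for this example; they are not course materials.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Estimate P(word) as count(word) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy sentence, used only for illustration.
tokens = "the cat sat on the mat".split()
probs = relative_frequencies(tokens)

# "the" occurs 2 times out of 6 tokens, so its estimated probability is 2/6.
print(probs["the"])
```

The estimates over all distinct words sum to 1, which is a quick sanity check on any count-based probability table.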
The required text is
• Jurafsky and Martin, Speech and Language Processing, Prentice Hall (January 26, 2000).
ISBN: 0130950696.
See also http://www.cs.colorado.edu/~martin/slp.html for supplementary information, including errata, and new versions of some chapters.
You can read the text online at http://cognet.mit.edu/library/books/view?isbn=0262133601
You should be able to log in with your "Umail" (OIT) username and password. According to the UMass Library, all students should have such an account, because it is required for all other UMass services (even if you primarily use a CS account).
Alternatively, if you are on campus, you can get in through the UMass library page without a password. Go to http://www.library.umass.edu/, click on "databases", and type cognet. Then click on the "cognet" site to get access to books, journals, etc.
See also http://nlp.stanford.edu/fsnlp/ for supplementary information about the text, including errata, and pointers to online resources.
The following texts are useful but optional:
• Chris Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
• James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, 2ed.
• Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in X. Addison-Wesley.
• Dan Jurafsky and James Martin. 2000. Speech and Language Processing. Prentice Hall.
We will be using the programming language Python in this class. There are many excellent Python tutorials online, including some for experienced programmers, some for those new to programming, and even some for linguists who are new to programming.
• For linguists new to programming: http://www.zacharski.org/python/
• Other Python pointers for linguists: http://www.ai.uga.edu/mc/PythonForNewbieLinguists.html
• The Natural Language Toolkit (NLTK) in Python: http://nltk.sourceforge.net/
Additional handouts and papers will occasionally be distributed and discussed during the course of the class. Electronic copies (when available) can be accessed from the syllabus.
Students can use their own computers. If you do not have access to a computer, see the Instructor as soon as possible, and we will make other arrangements for you. Materials for class assignments will be made available via the web, and so internet access will be required.
25% | homework assignments (these will also include opportunities for extra credit)
20% | final project
20% | midterm exam
25% | final exam
10% | classroom participation & possible "collaborative exercise" quizzes
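To see how the percentages above combine, here is a small Python sketch. It assumes each component is scored from 0 to 100 and that the final score is a plain weighted average; the syllabus does not specify the exact formula, and the component names and example scores below are hypothetical.

```python
# Weights taken from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "homework": 25,
    "final_project": 20,
    "midterm_exam": 20,
    "final_exam": 25,
    "participation": 10,
}


def weighted_score(scores):
    """Combine per-component scores (each 0-100) into a 0-100 weighted average.

    Assumption: a simple weighted average; extra credit is not modeled.
    """
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student, for illustration only.
example = {
    "homework": 90,
    "final_project": 85,
    "midterm_exam": 80,
    "final_exam": 75,
    "participation": 100,
}
print(weighted_score(example))  # 84.25
```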
Homework submission: Homework is due by email attachment to compling@cs.umass.edu by 11:59pm on the date indicated on the homework assignment. Late homework submissions may be accepted at the discretion of the instructor, but not after a solution set has been handed out. There will be grading penalties for late assignments.
Project Collaboration: One of the exciting things about this course is that we will be bringing together linguists and computer scientists! I plan to take advantage of this by requiring that final projects be done in mixed teams. We will all learn a lot from each other. As part of the write-up for the project, each student will write a brief assessment of their own contribution to the project, as well as that of their teammates. These, along with my own impressions of the contributions and teamwork, will go into the individual grades assigned.
Homework Collaboration: This fruitful collaboration shouldn't wait until the project, however. I encourage linguists and computer scientists to meet outside of class, discuss the classwork, and even work side-by-side on homework assignments. For each homework assignment, you can of course do it on your own, but you also have the option of working closely in a small group that combines computer scientists and linguists. You can discuss the assignment, the solutions, and possible extensions to the assignment that you might want to add. You will not, however, hand in a single, joint assignment. In the end, each student should write up their own assignment, their own program, and hand in their own work. You also must state clearly at the top of the assignment who you collaborated with, and in what capacity. (See also "Academic Honesty" below.) If the line between "encouraged collaboration" and "cheating" isn't clear, please ask the instructor!
One recommended way to do the homework, especially for those new to programming, is to do the entire assignment during office hours, with a TA by your side. There are multiple TAs and extensive office hours especially for this purpose. Learning to program can be frustrating when done in isolation. I don't want "programming frustrations" to be a factor in this course, so you are welcome to do all of your programming in the presence of a TA, who will help you through the technical details and silly "gotchas", so you can focus on the Computational Linguistics material. Note that you can combine this recommendation with the collaboration recommendation, and show up to TA office hours with your collaborative group, and do the assignment all together there.
Rescheduling exams: Exams may be taken other than at the scheduled time, but only under exceptional circumstances and then only if approved by the instructor well before the exam. Makeup exams will rarely be the same as the original exam, and will usually be all or partly oral.
Academic Honesty: Your work must be your own, or that of your own project team. You are encouraged to discuss problems, ideas and inspirations with other students, but the final answers, the programming, the writing, and the final result that you hand in must be your own or your own project team's effort. If you have questions about what is honest, please ask! You are strongly encouraged to cite your sources if you received extraordinary help from any person or text (including the Web). Department policy specifies that the penalty for cheating or plagiarism is (1) a final course grade of "F" and (2) possible referral to the Academic Dishonesty Committee. The UMass policy can be found here.
Policy on Regrading: We do make every effort to ensure that your exam or assignment is graded right the first time! However, sometimes people miss things, or there can be disagreements in interpretation. If you're unhappy with the grade for a question, you need to make a written request for a regrade and to resubmit your entire exam or homework, either to one of the TAs or to the instructor. The request doesn't have to be formal and long. Simply writing on a sheet of paper "8 points were taken off question 3, but I think it's a perfectly valid answer to the question" is sufficient. Normally, the TA will regrade it. If you're still not happy, you should repeat this process, but indicate that you want the instructor to re-regrade it. Note what this policy excludes: you should not e-mail grading complaints, and you can't expect assignments to be regraded "while you wait".
Auditing: If you are interested in auditing the course, please contact the instructor. Official auditors will normally be expected to complete all of the homeworks and programming assignments, and to achieve at least a C-level performance. Anyone enrolled for audit should contact the instructor early in the semester to discuss the requirements for receiving audit credit for this course. If the course is heavily over-enrolled, auditing may not be possible.
Attendance: Students are expected to attend each class. Attendance will not be taken directly, but absences may be noted because of occasional in-class assignments. The official means of communication for this course will be in-class announcements, though every effort will be made to ensure that important announcements go out on the course mailing list or appear on the course Web pages.
Course Web page: The class World Wide Web page is http://www.cs.umass.edu/~mccallum/courses/cl2006. Assignments, online materials, and notes about assignments will be available from this page.