Natural Language Processing (CS 4705), Fall 2003
Time: TuTh 1:10-2:25 | Place: 702 Hamilton
Professor: Julia Hirschberg | Office Hours: TBA; Th 2:30-3:30, CEPSR 705
Email: julia@cs.columbia.edu | Phone: 212-939-7114
Teaching Assistant: Jackson Liscombe | Office Hours: M/W 2-2:50; 3-3:50, CEPSR 702
Email: jaxin@cs.columbia.edu | Phone: 212-939-7111
Announcements || News || Academic Integrity || Description || Links to Resources || Requirements || Syllabus || Text || Thanks
This course provides an introduction to the field of computational linguistics, also known as natural language processing (NLP): the creation of computer programs that can understand, generate, and learn natural language. We will study the three major subfields of NLP: syntax (the structure of an utterance), semantics (the truth-functional meaning of an utterance), and pragmatics/discourse (the context-dependent meaning of an utterance). The course will introduce both knowledge-based and statistical methods for NLP, and will illustrate the use of such methods in a variety of text- and speech-based application areas.
Speech and Language Processing by Jurafsky and Martin. It should be available from Papyrus Books, as well as from Amazon and other online providers, and should also be on reserve in the Engineering Library. Please check the online errata for each chapter as you read it; links to the errata will be provided next to each chapter assignment in the syllabus. The authors are planning a new edition, so if you find an undocumented error, please let Professor Hirschberg know and she will pass the information along to the authors.
Three homework assignments, a midterm, and a final exam. Each student is allowed a total of 4 late days on homework assignments, no questions asked. Homework is due by midnight on the due date.
Homework 1 submission procedure.
Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.
We currently have two available projects. Please contact Noemie Elhadad (noemie@cs.columbia.edu) for more information.
"Patients with an elevated heart rate are more likely to have hypertension." If a patient has a heart rate of 120, is it elevated?
In many NLP applications, we need knowledge bases that map vague values such as "elevated" to corresponding precise numerical values. Some manually built knowledge bases exist in the medical domain, but they are incomplete. In this project, we would like to use Natural Language Processing to mine these correspondences automatically from a large corpus of medical texts.
Good programming skills and knowledge of probabilities/statistics are required.
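As a rough illustration of the kind of mining the first project describes, the following Python sketch scans sentences for a qualifier followed by a heart-rate value and summarizes the numbers observed with each qualifier. The regular expression, the qualifier list, the example sentences, and the name `mine_ranges` are all hypothetical; a real system would need far more robust extraction (units, negation, ranges, other measurements).

```python
import re
from collections import defaultdict
from statistics import median

# Hypothetical extraction pattern: the qualifier list and the phrase
# "<qualifier> heart rate (of) <number>" are illustrative assumptions.
PATTERN = re.compile(
    r"\b(elevated|high|normal|low)\s+heart\s+rate\s+(?:of\s+)?(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def mine_ranges(sentences):
    """Map each qualifier to (min, median, max) of the numbers seen with it."""
    values = defaultdict(list)
    for sentence in sentences:
        # findall returns (qualifier, number) tuples, one per match.
        for qualifier, number in PATTERN.findall(sentence):
            values[qualifier.lower()].append(float(number))
    return {q: (min(v), median(v), max(v)) for q, v in values.items()}


corpus = [
    "The patient presented with an elevated heart rate of 118 bpm.",
    "Elevated heart rate 125 was noted on admission.",
    "She had a normal heart rate of 72.",
    "An elevated heart rate of 110 persisted overnight.",
]
print(mine_ranges(corpus))
# {'elevated': (110.0, 118.0, 125.0), 'normal': (72.0, 72.0, 72.0)}
```

Summarizing by min/median/max is only one simple choice; over a large corpus, the spread of values reported alongside "elevated" could suggest a numeric threshold, which is the kind of vague-to-precise correspondence the project aims to recover.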
Come and work in an exciting emerging area of Natural Language Processing: text-to-text generation. The ultimate goal is to design algorithms that take a text as input (say, a complicated text containing many technical terms and phrasings) and rewrite it into a new text that conveys essentially the same information in simpler language. Ideally, we would like to use machine learning techniques to learn rules that transform input sentences into simpler output sentences. To do so, we need training instances consisting of sentence pairs (complicated version, simple version). Given a collection of texts, the goal of this project is to find such sentence pairs automatically.
Good programming skills are required.
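One plausible starting point for finding (complicated, simple) sentence pairs, assuming the collection contains complicated and simplified versions of the same document already split into sentences, is to pair each simple sentence with its most lexically similar complicated sentence. This Python sketch uses bag-of-words cosine similarity; the function names, the crude tokenizer, the 0.4 threshold, and the example sentences are hypothetical choices, not the project's prescribed method.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; a deliberately crude tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def align_sentences(complex_sents, simple_sents, threshold=0.4):
    """Pair each simple sentence with its most similar complex sentence.

    The 0.4 threshold is an arbitrary, hypothetical cutoff.
    """
    if not complex_sents:
        return []
    complex_vecs = [bag_of_words(s) for s in complex_sents]
    pairs = []
    for simple in simple_sents:
        vec = bag_of_words(simple)
        scores = [cosine(vec, c) for c in complex_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Keep only pairs whose word overlap clears the threshold.
        if scores[best] >= threshold:
            pairs.append((complex_sents[best], simple))
    return pairs


complex_doc = [
    "Myocardial infarction occurs when coronary blood flow is interrupted.",
    "Treatment typically involves percutaneous coronary intervention.",
]
simple_doc = [
    "A heart attack happens when blood flow to the heart is interrupted.",
    "Doctors often open the blocked artery.",
]
pairs = align_sentences(complex_doc, simple_doc)
# Only the first simple sentence is paired (cosine about 0.45);
# the second shares no words with either complex sentence.
for pair in pairs:
    print(pair)
```

Lexical overlap is a crude signal: here the pair is found through shared words such as "blood", "flow", and "interrupted", while the actual paraphrase ("myocardial infarction" versus "heart attack") contributes nothing, and the second simple sentence, which loosely paraphrases the treatment, goes unmatched. A practical system would likely need a better similarity measure.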
Places to look up definitions and descriptions of terminology:
Try out one of the many versions of Eliza on the web.
AT&T Labs - Research Finite State Machine Library
To James Martin, Diane Litman, Johanna Moore, and Regina Barzilay, whose course materials have been very helpful in the preparation of this course, and to Ani Nenkova for her useful comments.