CS 4705: Introduction to Natural Language Processing, Fall 2009

 

Time:

TTh: 2:40-3:55

Place:

 1024 Mudd

Professor: 

Kathy McKeown

Office Hours: 

Tu 4-5, We 4-5, 722 CEPSR

Email: 

kathy@cs.columbia.edu

Phone: 

212-939-7114

Teaching Assistants:

Sara Rosenthal

Office Hours: 

M 5:30-6:30, 726 CEPSR

Th 1:30-2:30

Kaushal Lahankar

Th 4-5, TA Room, Mudd 122A

M 2-3

Email:

ss3067@columbia.edu

Phone: 

212-939-7122

Announcements || Academic Integrity ||  Submission Directions || Description
Links to Resources || Requirements || Syllabus || Text

Announcements:

1.    Check Columbia Courseworks for announcements, your grades (only you will see them), and discussion. Professor McKeown and your TA will monitor the discussion lists to answer questions.

2.    If you are interested in doing NLP research projects for credit, please let Professor McKeown know. The NLP group often has research opportunities available. 

__________________________

 

Description:

This course provides an introduction to the field of computational linguistics, also known as natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structures), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems computational linguists currently work on, including analyzing and extracting information from large online corpora.

 

Textbook:

Speech and Language Processing, 2nd Edition, by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library.

 

Requirements:

Four homework assignments, a midterm, and a final exam. Each student in the course is allowed a total of 4 late days on homework assignments with no questions asked; after that, 10% per late day will be deducted from the homework grade, unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".

 

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

 

Syllabus:

 

Week | Class | Topic | Reading | Assignments
1 | Sep 8 | Introduction and Course Overview | Ch 1 |
 | Sep 10 | Natural Language and Formal Language: Regular Expressions and Finite State Automata | Ch 2, 3 |
2 | Sep 15 | N-grams and Language Models | Ch 4 | HW 1 assigned; WSJ article
 | Sep 17 | POS Tagging | Ch 5 |
3 | Sep 22 | HMMs and POS | Ch 6 |
 | Sep 24 | Context-Free Grammars | Ch 12.1-12.14 |
4 | Sep 29 | Parsing with Context Free Grammars and Evaluation of POS taggers | Ch 13 | HW 1 due
 | Oct 1 | Probabilistic and Lexicalized Parsing; Presentation on using NLTK | Ch 14.1-14.7 | HW 2 assigned; Where the Wild Things Are; NLTK Readme; POS guide; Text for the Stanford parser
5 | Oct 6 | Representing Meaning | Ch 17-17.2, 17.4 |
 | Oct 8 | Semantic Analysis | Ch 18.1-18.4 |
6 | Oct 13 | Semantic Analysis | Ch 19 |
 | Oct 15 | Word Sense Disambiguation | Ch 20.1-20.5 |
7 | Oct 20 | Machine Learning Approaches to NLP and Introduction to Weka | Jansche&Abney02 | HW 2 due at midnight (err.. 11:59PM) on 10/21
 | Oct 22 | Semantic Application and Midterm Review | |
 | Oct 27 | Midterm (midterm solutions) | Sample midterm |
 | Oct 29 | Words Eye | | Guest Speaker: Robert Coyne
8 | Nov 3 | Holiday | Holiday | Holiday
 | Nov 5 | Summarization, Weka slides | Ch 23.3-23.7 | HW 3 assigned, HW 3 FAQ
10 | Nov 10 | Summarization | |
 | Nov 12 | Organizational Details; Information Extraction | Ch 22.1-22.4 |
 | Nov 17 | Information Retrieval and Question Answering | Ch 23.1-23.2 |
 | Nov 19 | Pronouns and Algorithms for Reference Resolution | Ch 21.3-21.5 | HW 3 due, 11:58PM, Sunday Nov. 22nd; EXTENSION: HW 3 due 11:58 Nov. 25th
 | Nov 24 | Text Coherence and Discourse Structure | Ch. 25 | HW 4
 | Dec 1 | Generation | Papers |
13 | Dec 3 | Machine Translation and Philipp Koehn’s tutorial | Ch 21.1-21.2 |
 | Dec 8 | Applications using discourse | Papers |
14 | Dec 10 | The future and Final Review | A practice final exam | HW 4 due, 11:58 pm, Dec. 13th
 | Dec. 15-16 | Study Days | |
 | Dec. 17-23 | Final Exams | |

 

Links to Resources (cf. also resources available from the text homepage):

General:

1.    Karen Chung Language and Linguistics links

2.    CatSpeak

Places to look up definitions and descriptions of terminology:

1.    Oxford Dictionary of Linguistics

2.    Interesting Language Factoids and Non

 

Chapters 1 and 2:

Try out one of the many versions of Eliza on the web.

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Later Chapters:

1.    Appelt and Israel's information extraction tutorial (IJCAI-99).

2.    Framenet.

Chapter 19:

1.    Ask Jeeves -- a search engine that answers questions in plain English.

2.    Answer Bus -- another Q/A system.

3.    Columbia's NewsBlaster summarizer

4.    IBM summarizer demo (canned)

5.    Systran machine translation (also in use at Babelfish)

6.    AT&T Labs - Research Finite State Machine Library

7.    Michael Collins' Parser

8.    On-line dictionaries in many languages.

9.    WordNet

10.    Framenet

11.    CoBuildDirect Corpus

12.    AT&T's SCANMail voicemail browsing/search system

13.    DiaLeague 2001 -- includes a link to an online dialogue system demo.

14.    James Allen's Dialogue Modeling for Spoken Language Systems ACL 1997 Tutorial

15.    Festival speech synthesizer demo and links to other TTS systems

16.    Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial
