COMS W4705: Introduction to Natural Language Processing, Fall 2007

Professor: Julia Hirschberg
Email: julia@cs.columbia.edu

Course Manager: Fadi Biadsy
Email: fadi@cs.columbia.edu

Announcements || Academic Integrity || Contributions || Description || Links to Resources || Requirements || Syllabus || Text

Announcements:

Description:

This course provides an introduction to the field of computational linguistics, also known as natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structures), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems computational linguists currently work on, including analyzing and extracting information from large online corpora.

Textbook:

Speech and Language Processing by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library. Please check the online errata for each chapter as you read it. NB: Several chapters of this text have been updated by the authors. Links to them will appear in the syllabus below. Chapter assignments without hyperlinks refer to the hard-copy textbook.

Requirements:

Four homework assignments, a midterm and a final exam. Each student in the course is allowed a total of 7 late days on homeworks with no questions asked; after that, points will be deducted for late submission, unless you have a note from your doctor.  Do not use these up early!  Save them for real emergencies.  Homeworks are due by midnight on the due date. 

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

 

Syllabus:

 

| Week | Class | Topic | Reading | Assignments |
| 1 | Sep 4 | Introduction and Course Overview | | |
| | Sep 6 | Natural Language and Formal Language: Regular Expressions and Finite State Automata | Ch 1-2 | |
| 2 | Sep 11 | Words and Their Parts: Morphology | Ch 3.3.1, 3.12 (new version) | Homework 1 |
| | Sep 13 | Word Construction and Analysis: Morphological Parsing | Ch 3.3.2-3.8 (new version) | |
| 3 | Sep 18 | Words: Tokenization and Spelling | Ch 3.3.9-3.11, 3.13 (new version) | Guest Speaker: Martin Jansche |
| | Sep 20 | N-grams and Language Models | Ch 4 (new version) | Guest Speaker: Dragomir Radev |
| 4 | Sep 25 | Word Classes and POS Tagging | Ch 5 (new version) | |
| | Sep 25 | Machine Learning Approaches to NLP and Introduction to Weka | Jansche&Abney02 | Homework 1 due; Homework 2 |
| 5 | Sep 27 | Context-Free Grammars | Ch 10 (new version) | |
| | Oct 2 | Parsing with Context Free Grammars | Ch 11 (new version) | |
| 6 | Oct 4 | Probabilistic and Lexicalized Parsing | Ch 12 (Be sure to replace figure 12.3 with new version) | Guest Speaker: Srinivas Bangalore |
| | Oct 9 | Representing Meaning and Midterm Review sheet | Ch 14 | |
| 7 | Oct 16 | Semantic Analysis | Ch 15: 15.1-15.4 | |
| | Oct 18 | Midterm Examination | Sample midterm | |
| 9 | Oct 23 | Lexical Semantics: Word Sense Disambiguation | Ch 18: 18.1-18.5 (new version) | Homework 2 due; Homework 3 assigned |
| | Oct 25 | Lexical Semantics: Word Relations | Ch 18: 18.6-18.9 (new version) | |
| 10 | Oct 30 | Lexical Semantics: Semantic Roles | Ch 18: 18.10-- (new version) | |
| | Nov 1 | Robust Semantics and Information Extraction | Ch 15: 15.5-15.6 | |
| 11 | Nov 6 | Pronouns and Reference Resolution | Ch 18: 18.1 (old) | |
| | Nov 8 | Algorithms for Reference Resolution | | |
| 12 | Nov 13 | Reference Resolution Continued | | |
| | Nov 15 | Machine Translation | Ch 24 | Guest Speaker: Nizar Habash |
| 13 | Nov 20 | Summarization | Check out: NewsAtSeven | Guest Speaker: Kathy McKeown |
| | Nov 22 | Text Coherence and Discourse Structure | Ch 18.2-18.5, 20.4 (old); Grosz&Sidner86 | Homework 3 due |
| 14 | Nov 27 | Dialogue Systems | Ch 22 (new version) | |
| | Nov 29 | Natural Language Generation: Story Generation | Optional | Guest Speaker: David Elson |
| | Dec 20 | Final Examination | | |

Links to Resources (cf. also resources available from the text homepage):

General:

Places to look up definitions and descriptions of terminology:

Chapters 1 and 2:

Try out one of the many versions of Eliza on the web.

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Later Chapters:

Chapter 19:
