
 

CS 4705: Introduction to Natural Language Processing, Fall 2006

Time:

TTh: 2:40-3:55

Place:

545 Mudd

Professor: 

Julia Hirschberg

Office Hours: 

Tu 4-5, We 12:30-1:30, CEPSR 705

Email: 

julia@cs.columbia.edu

Phone: 

212-939-7114

Teaching Assistant: 

Andrew Rosenberg

Office Hours: 

M 10:30-11:30; TR 11-12, 7LW1-A

Email:

amaxwell@cs.columbia.edu

Phone: 

212-939-7147

Announcements || Academic Integrity || Contributions || Description || Links to Resources || Requirements || Syllabus || Text

Announcements:

Description:

This course provides an introduction to the field of computational linguistics, also known as natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structure), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems that computational linguists currently work on, including analyzing and extracting information from large online corpora.

Textbook:

Speech and Language Processing by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library. Please check the online errata for each chapter as you read it. NB: The authors have updated several chapters of this text, and links to the updated chapters will appear in the syllabus below. Chapter readings without hyperlinks refer to the hard copy textbook.

Requirements:

Four homework assignments, a midterm, and a final exam. Each student is allowed a total of 7 late days on homework with no questions asked. After that, points will be deducted for late submission unless you have a note from your doctor. Do not use these up early! Save them for real emergencies. Homework is due by midnight on the due date.

All students are required to have a Computer Science account for this class. To sign up for one, go to the CRF website and click on "Apply for an Account".

Homework submission procedure.

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

 

Syllabus:

 

Week | Class | Topic | Reading | Assignments and Notes
1 | Sep 5 | Introduction and Course Overview | |
1 | Sep 7 | Natural Language and Formal Language: Regular Expressions and Finite State Automata | Ch 1-2 |
2 | Sep 12 | Words and Their Parts: Morphology | Ch 3: 3.1, 3.12 (new version) | Homework 1 assigned; available on Courseworks (see Homework submission procedure)
2 | Sep 14 | Word Construction and Analysis: Morphological Parsing | Ch 3: 3.2-3.8 (new version) |
3 | Sep 19 | Words: Tokenization and Spelling | Ch 3: 3.9-3.11, 3.13 (new version) | Guest Speaker: Martin Jansche
3 | Sep 21 | N-grams and Language Models | Ch 4 (new version) | Guest Speaker: Dragomir Radev
4 | Sep 26 | Word Classes and POS Tagging | Ch 5 (new version) |
4 | Sep 28 | Machine Learning Approaches to NLP and Introduction to Weka | Jansche&Abney02 | Homework 1 due; Homework 2 assigned
5 | Oct 3 | Context-Free Grammars | Ch 10 (new version) |
5 | Oct 5 | Parsing with Context-Free Grammars | Ch 11 (new version) |
6 | Oct 10 | Probabilistic and Lexicalized Parsing | Ch 12 (be sure to replace Figure 12.3 with the new version) | Guest Speaker: Srinivas Bangalore
6 | Oct 12 | Representing Meaning; Midterm Review Sheet | Ch 14 |
7 | Oct 17 | Semantic Analysis | Ch 15: 15.1-15.4 |
7 | Oct 19 | Midterm Examination | Sample midterm |
8 | Oct 24 | Lexical Semantics: Word Sense Disambiguation | Ch 18: 18.1-18.5 (new version) |
8 | Oct 26 | Lexical Semantics: Word Relations | Ch 18: 18.6-18.9 (new version) |
9 | Oct 31 | Lexical Semantics: Semantic Roles | Ch 18: 18.10 to end (new version) | Homework 2 due; Homework 3 assigned
9 | Nov 2 | Robust Semantics and Information Extraction | Ch 15: 15.5-15.6 |
10 | Nov 7 | Holiday | |
10 | Nov 9 | Pronouns and Reference Resolution | Ch 18: 18.1 (old) |
11 | Nov 14 | Algorithms for Reference Resolution | |
11 | Nov 16 | Reference Resolution Continued | |
12 | Nov 21 | Machine Translation | Ch 24 | Guest Speaker: Nizar Habash
12 | Nov 23 | Thanksgiving Holiday | |
13 | Nov 28 | Summarization | Check out: NewsAtSeven | Guest Speaker: Kathy McKeown
13 | Nov 30 | Text Coherence and Discourse Structure | Ch 18.2-18.5, 20.4 (old); Grosz&Sidner86 |
14 | Dec 5 | Dialogue Systems | Ch 22 (new version) | Homework 3 due
14 | Dec 7 | Natural Language Generation: Story Generation | Ch 20 | Guest Speaker: David Elson
| Dec 21 | Final Examinations | |

Links to Resources (see also the resources available from the textbook homepage):

General:

Places to look up definitions and descriptions of terminology:

Chapters 1 and 2:

Try out one of the many versions of Eliza on the web.

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Later Chapters:

Chapter 19:
