CS 4705: Introduction to Natural Language Processing, Fall 2004

Time: MW 1:10-2:25
Place: Mudd 545

Professor: Julia Hirschberg
Office Hours: M 2:30-3:30; Th 3:30-4:30, CEPSR 705
Email: julia@cs.columbia.edu
Phone: 212-939-7114

Teaching Assistant: Sameer Maskey
Office Hours: MW 4-5
Email: smaskey@cs.columbia.edu
Phone: 212-939-7116

Announcements || Academic Integrity || Contributions || Description || Links to Resources || Requirements || Syllabus || Text

Announcements:

Description:

This course provides an introduction to the field of computational linguistics, also known as natural language processing (NLP): the creation of computer programs that can understand, generate, and learn natural language. We will study the three major subfields of NLP: syntax (the structure of an utterance), semantics (the truth-functional meaning of an utterance), and pragmatics/discourse (the context-dependent meaning of an utterance). The course will introduce both linguistic (knowledge-based) and statistical approaches to language processing, and will illustrate the use of such methods in a variety of text- and speech-based application areas, including spoken dialogue systems, speech recognition and synthesis, machine translation, and language summarization.

Textbook:

Speech and Language Processing by Jurafsky and Martin. It will be available from the Morningside Bookshop (formerly Papyrus Books), as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library. Please check the online errata for each chapter of the text as you read it.

Requirements:

Three homework assignments, a midterm, and a final exam. Graduate students will have one additional assignment. Each student in the course is allowed a total of 5 late days on homework assignments, with no questions asked; after that, points will be deducted for late submission unless you have a note from your doctor. Do not use these up early! Save them for real emergencies. Homework assignments are due by midnight on the due date.

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".

Homework submission procedure.

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

 

Syllabus:

 

Week | Class | Topic | Reading | Assignments
1 | Sep 8 | Introduction and Course Overview | |
 | Sep 13 | Regular Expressions and Automata | Ch 1-2 |
2 | Sep 15 | Morphology and FSTs | Ch 3 | Homework 1 Assigned (nb: Homework submission procedure)
 | Sep 20 | Phonetics, Phonology and Text-to-Speech | Ch 4 |
3 | Sep 22 | N-grams and Machine Learning | Ch 6 | Guest Speaker: Sameer Maskey
 | Sep 27 | Word Pronunciation and Spelling | Ch 5 |
4 | Sep 29 | Automatic Speech Recognition | Ch 7 |
 | Oct 4 | Word Classes and POS Tagging | Ch 8 | Guest Speaker: Martin Jansche
5 | Oct 6 | CFGs for English | Ch 9 | Guest Speaker: Owen Rambow
 | Oct 11 | Basic Parsing with CFGs | Ch 10:1-3 | Homework 1 due
 | Oct 13 | Parsing Problems and Some Solutions | Ch 10:4-6; 11:0-3 |
7 | Oct 18 | Probabilistic and Lexicalized Parsing | Ch 12 | Be sure to replace figure 12.3 with new version
 | Oct 20 | | Sample midterm | Midterm Examination; Grad assignment paper list due
8 | Oct 25 | Meaning Representations and Semantic Analysis | Ch 14-15 (15.1-3 opt) | Homework 2 Assigned
 | Oct 27 | Lexical Semantics | Ch 16 |
 | Nov 1 | | | Holiday
9 | Nov 3 | Word Sense Disambiguation | Ch 17.1-2, TBA |
 | Nov 8 | Robust Semantics and Information Retrieval | Ch 17.3-5 |
 | Nov 10 | YALE Review | Nov 12 |
11 | Nov 15 | Text Coherence and Discourse Structure | Ch 18.2-3,5; Grosz&Sidner86 |
 | Nov 17 | Reference Resolution | Ch 18.1,4 | Guest Lecturer: Ani Nenkova; Homework 2 First Report due
12 | Nov 22 | Information Status | Prince92 |
 | Nov 24 | Information Status 2 | |
 | Nov 25 | | | Thanksgiving Holiday
13 | Nov 29 | Spoken Dialogue Systems | Ch 19 |
 | Dec 1 | Intonation in TTS Systems | Ch 4.7 |
14 | Dec 6 | New Approaches to Story Modeling for Understanding, Generation and Summarization | Sengers, Smith97, and optional: NYT | Guest Lecturer: David Elson
 | Dec 8 | Machine Translation | Ch 20 | Guest Lecturer: Nizar Habash; Homework 2 Final Report due
17 | Dec 13 | Summing Up: NLP Research and Applications | | Grad assignment report due
 | Dec 20 | | | Final Examination

Links to Resources (cf. also resources available from the text homepage):

General:

Places to look up definitions and descriptions of terminology:

Chapters 1 and 2:

Try out one of the many versions of Eliza on the web.

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Later Chapters:

Chapter 19:
