Advanced Natural Language Processing (CS 6998), Spring 2004: Speech Research and Technologies

Time: W 4:10-6:00 Place: 825 MUD
Professor:  Julia Hirschberg Office Hours:  Tu 2-3; Th 12:30-1:30, CEPSR 705
Email:  julia@cs.columbia.edu Phone:  212-939-7114
Teaching Assistant: Sameer Maskey Office Hours: Tu 2:30-4; Th 12-1, CEPSR 720
Email: smaskey@cs.columbia.edu Phone: 212-939-7116

Announcements || Academic Integrity || Description
 
Resources || Requirements || Syllabus || Readings

Description:

This course introduces students to research in spoken language in computational linguistics, aka natural language processing (NLP). We will examine approaches to the analysis of the speech signal of particular interest to NLP research, look at several speech technologies in some detail, including approaches to recognizing and to generating speech, and consider several important application areas for these technologies, including spoken dialogue systems and speech data mining. There are no formal prerequisites for the course (i.e. no knowledge of speech or signal processing is assumed) but some knowledge of/serious interest in NLP is assumed. Format for the class will be lecture and discussion. NB: This course can be counted as a PhD elective in Advanced AI.

Readings:

Acoustic & Auditory Phonetics by Keith Johnson is available from Papyrus. Speech and Language Processing by Jurafsky and Martin will be a useful reference for those with no formal background in NLP. The early chapters of either of the following paperbacks will also be useful: The Speech Chain by Peter Denes and Elliot Pinson (on reserve in the Psychology Library) or Elements of Acoustic Phonetics by Peter Ladefoged. Used copies of all these should be available in local bookstores, as well as from Amazon and other online providers. Most course readings will be available either on the web or as in-class handouts. NB: '*' means that the reading is optional.

Requirements:

Class participation and a term project. Class participation will include a) bringing 3 discussion questions to class each week, based on the readings; and b) helping with one class during the semester. The project (done alone or in collaboration), on one of the topics covered in the course or some other topic in spoken language, will be defined by each class participant in consultation with the professor. These projects will involve a) a project description; b) periodic reports on progress; c) a class presentation of project results; and d) delivery of the final project. Project formats may include literature reviews, data collection and analysis, experiments, and/or systems or system components.

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the professor in advance of the due date.

Announcements:

Resources:

Help using xwaves
Help using ToBI
Getting wavesurfer


Text-to-Song synthesis!

Syllabus:

Week | Date | Topic | Readings and Assignments
1 | Jan 21 | Introduction to the Course |
2 | Jan 28 | Interpreting Speech Variation | Hirschberg03, ToBI labeling conventions, and see ToBI examples
3 | Feb 4 | Analyzing the Speech Signal | Handout; Recommended: Johnson (Chs 1,2) or Denes&Pinson (Chs 1,3,4); TTS exercises (extended to Feb 20)
4 | Feb 11 | Speech Generation: From Concept and from Text | HLT96-ch5, TTS systems; Guest Speaker: Martin Jansche
5 | Feb 18 | Meanings of Intonational Contours (Agus G) | Pierrehumbert&Hirschberg '90, Liberman&Sag '75, Hirschberg&Ward '92 (all handouts)
6 | Feb 25 | Predicting Accents and Phrasing | Pan99, Sun02, Koehn00, Rambow01
7 | Mar 3 | Tools for Speech Analysis; Guest Lecturer: Jean-Philippe Goldman | See also Praat tutorial
8 | Mar 10 | Information Status: Focus and Given/New (Ani N) | *Nakatani99, GBrown83, Bard99, Prince92, Dahan02; Midterm Reports Due
9 | Mar 17 | Spring Break |
10 | Mar 25 | Speech Recognition and Understanding | HLT96-ch1 (Sameer will hold a TTS clinic in the Speech Lab, CEPSR, 7th floor; do the recognition reading as background for weeks 11-15)
11 | Mar 31 | Speech Acts and Topic Segmentation (Eric, Corey) | Jurafsky98, Shriberg00, Nickerson&Chu-Carroll99, *Shriberg98
12 | Apr 7 | Speech Disfluencies and Turntaking (Michael Mu, Sarah, Judd, Aron) | Turn-taking in Conversational Analysis (follow links), *Sacksetal74, Bear92, *Brennan&Schober99, Hindle83
13 | Apr 14 | Spoken Dialogue Systems (Vera, Andy, David) | Walkeretal97, Goldberg03, Bell&Gustafson00, Krahmer01
14 | Apr 21 | Speech Search, Data Mining, and Summarization (Aaron H, Kyle, Matthew) | SCANMail demo, Furui02, Barzilay00, Hearst99
15 | Apr 28 | Emotional Speech (Michael Ma, Wayne, Jared) | Cowie00, Pereira00, Schroeder01, Bosch00, *Burkhardt00, *Ang02
16 | May 5 | No Class |
17 | May 12 | Presentation of Term Projects in CEPSR 415, 3-6:30pm |
